athena.data.datasets.speaker_recognition

speaker dataset

Module Contents

Classes

SpeakerRecognitionDatasetBuilder SpeakerRecognitionDatasetBuilder
SpeakerVerificationDatasetBuilder SpeakerVerificationDatasetBuilder
class athena.data.datasets.speaker_recognition.SpeakerRecognitionDatasetBuilder(config=None)

Bases: athena.data.datasets.base.BaseDatasetBuilder

SpeakerRecognitionDatasetBuilder

default_config
num_class

@property

Returns:the number of speakers
Return type:int
sample_type

@property

Returns:sample_type of the dataset:
{
    "input": tf.float32,
    "input_length": tf.int32,
    "output_length": tf.int32,
    "output": tf.int32
}
Return type:dict
sample_shape

@property

Returns:sample_shape of the dataset:
{
    "input": tf.TensorShape([None, dim, nc]),
    "input_length": tf.TensorShape([]),
    "output_length": tf.TensorShape([]),
    "output": tf.TensorShape([None])
}
Return type:dict
sample_signature

@property

Returns:sample_signature of the dataset:
{
    "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
}
Return type:dict
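
Comparing sample_shape with sample_signature, each signature shape appears to be the per-sample shape with one leading batch axis of unknown size. The sketch below illustrates that relationship with plain tuples instead of TensorFlow objects. The helper add_batch_dim and the values of dim and nc are hypothetical and not part of the module. Note that in Python `(None)` is just `None`, so a one-dimensional unknown shape is written `(None,)`.

```python
def add_batch_dim(sample_shape):
    """Prepend an unknown-size batch axis (None) to every per-sample shape."""
    return {name: (None,) + tuple(shape) for name, shape in sample_shape.items()}


# Hypothetical feature dimension and channel count, for illustration only.
dim, nc = 40, 1

# Per-sample shapes, mirroring the sample_shape entries as plain tuples.
sample_shape = {
    "input": (None, dim, nc),
    "input_length": (),
    "output_length": (),
    "output": (None,),
}

batched_shape = add_batch_dim(sample_shape)
for name, shape in batched_shape.items():
    print(name, shape)
```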
reload_config(self, config)

reload the config

preprocess_data(self, data_csv_path)

generate a list of tuples (wav_filename, wav_length_ms, speaker_id, speaker_name).
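
As a rough illustration of the entry format, the sketch below parses text into tuples of that form. It is not the module's implementation: the function read_speaker_entries, the tab delimiter, the header names, and the integer conversions are all assumptions made for the example.

```python
import csv
import io


def read_speaker_entries(csv_text):
    """Parse tab-separated text into
    (wav_filename, wav_length_ms, speaker_id, speaker_name) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t")
    entries = []
    for row in reader:
        entries.append((
            row["wav_filename"],
            int(row["wav_length_ms"]),   # assumed to be an integer duration
            int(row["speaker_id"]),      # assumed to be an integer id
            row["speaker_name"],
        ))
    return entries


# Hypothetical file contents; the real column names and order may differ.
example_csv = (
    "wav_filename\twav_length_ms\tspeaker_id\tspeaker_name\n"
    "a.wav\t3200\t0\tspk_a\n"
    "b.wav\t4100\t1\tspk_b\n"
)
entries = read_speaker_entries(example_csv)
```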

cut_features(self, feature)

cut acoustic features

load_csv(self, data_csv_path)

load csv file

__getitem__(self, index)

get a sample

Parameters:index (int) – index of the entries
Returns:sample:
{
    "input": feat,
    "input_length": feat_length,
    "output_length": 1,
    "output": spkid
}
Return type:dict
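
The structure of a returned sample can be sketched with plain Python lists. The helper make_sample is hypothetical. It assumes that input_length is the number of feature frames and that the speaker id is wrapped in a one-element list to match the [None] shape given for "output"; the source confirms neither detail.

```python
def make_sample(feat, spkid):
    """Assemble one sample dict from feature frames and an integer speaker id."""
    return {
        "input": feat,
        "input_length": len(feat),  # number of frames (assumed meaning)
        "output_length": 1,         # one label per utterance
        "output": [spkid],          # list form is an assumption
    }


# Hypothetical features: 3 frames, 40 dimensions, 1 channel.
num_frames, dim, nc = 3, 40, 1
feat = [[[0.0] * nc for _ in range(dim)] for _ in range(num_frames)]

sample = make_sample(feat, spkid=7)
```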
__len__(self)

return the number of data samples

filter_sample_by_input_length(self)

filter samples by input length

The length of filtered samples will be in [min_length, max_length)

Returns:a filtered list of tuples (wav_filename, wav_len, transcripts, speed, speaker)
Return type:entries
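
The half-open interval [min_length, max_length) keeps an entry whose length equals min_length and drops one whose length equals max_length. A minimal sketch of that rule follows. The function filter_by_length and the position of the length field in each tuple are assumptions, and the entries are made up.

```python
def filter_by_length(entries, min_length, max_length, length_index=1):
    """Keep entries whose length lies in [min_length, max_length)."""
    return [
        entry for entry in entries
        if min_length <= entry[length_index] < max_length
    ]


# Made-up entries of the form (wav_filename, wav_len, ...).
entries = [
    ("short.wav", 100, "spk_a"),
    ("medium.wav", 200, "spk_b"),
    ("long.wav", 300, "spk_a"),
]

# The lower bound is inclusive, the upper bound exclusive.
kept = filter_by_length(entries, min_length=100, max_length=300)
```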
compute_cmvn_if_necessary(self, is_necessary=True)

compute cmvn file

class athena.data.datasets.speaker_recognition.SpeakerVerificationDatasetBuilder(config=None)

Bases: athena.data.datasets.speaker_recognition.SpeakerRecognitionDatasetBuilder

SpeakerVerificationDatasetBuilder

sample_type

@property

Returns:sample_type of the dataset:
{
    "input_a": tf.float32,
    "input_b": tf.float32,
    "output": tf.int32
}
Return type:dict
sample_shape

@property

Returns:sample_shape of the dataset:
{
    "input_a": tf.TensorShape([None, dim, nc]),
    "input_b": tf.TensorShape([None, dim, nc]),
    "output": tf.TensorShape([None])
}
Return type:dict
sample_signature

@property

Returns:sample_signature of the dataset:
{
    "input_a": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "input_b": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
}
Return type:dict
preprocess_data(self, data_csv_path)

generate a list of tuples (wav_filename_a, speaker_a, wav_filename_b, speaker_b, label).
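
To make the trial-pair entry format concrete, the sketch below reads tab-separated lines into tuples of that form. The function read_trial_entries, the delimiter, the column order, and the absence of a header row are assumptions. The final check further assumes that label 1 marks a same-speaker pair, a common convention that the source does not confirm.

```python
import csv
import io


def read_trial_entries(text):
    """Parse tab-separated lines into
    (wav_filename_a, speaker_a, wav_filename_b, speaker_b, label) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    entries = []
    for wav_a, spk_a, wav_b, spk_b, label in reader:
        entries.append((wav_a, spk_a, wav_b, spk_b, int(label)))
    return entries


# Hypothetical trial list: one same-speaker pair, one different-speaker pair.
example_trials = (
    "a1.wav\tspk_a\ta2.wav\tspk_a\t1\n"
    "a1.wav\tspk_a\tb1.wav\tspk_b\t0\n"
)
trials = read_trial_entries(example_trials)

# Under the assumed convention, label == 1 exactly when the speakers match.
labels_consistent = all(
    (spk_a == spk_b) == (label == 1)
    for _, spk_a, _, spk_b, label in trials
)
```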

__getitem__(self, index)

get a sample

Parameters:index (int) – index of the entries
Returns:sample:
{
    "input_a": feat_a,
    "input_b": feat_b,
    "output": [label]
}
Return type:dict