`athena.data.datasets.speaker_recognition`¶

speaker dataset

Module Contents¶

Classes¶

`SpeakerRecognitionDatasetBuilder`	SpeakerRecognitionDatasetBuilder
`SpeakerVerificationDatasetBuilder`	SpeakerVerificationDatasetBuilder

class athena.data.datasets.speaker_recognition.SpeakerRecognitionDatasetBuilder(config=None)¶

Bases: athena.data.datasets.base.BaseDatasetBuilder

SpeakerRecognitionDatasetBuilder

default_config¶

num_class¶

@property

Returns:	the number of speakers
Return type:	int

sample_type¶

@property

Returns:	sample_type of the dataset: { "input": tf.float32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.int32 }
Return type:	dict

sample_shape¶

@property

Returns:	sample_shape of the dataset: { "input": tf.TensorShape([None, dim, nc]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None]) }
Return type:	dict

sample_signature¶

@property

Returns:

sample_signature of the dataset:

{
    "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
}

Return type: dict
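
Compared with `sample_shape`, the signature adds a leading batch dimension, and the time axis stays `None` because utterances differ in length. The sketch below illustrates that relationship with nested lists instead of tensors. `pad_batch` is a hypothetical helper, and zero-padding to the longest utterance is an assumption about how batches are formed, not something this page states.

```python
def pad_batch(inputs, dim, nc):
    """Zero-pad a list of [T_i][dim][nc] features to a common length.

    Returns (padded, lengths): padded has layout [batch][max_T][dim][nc],
    lengths holds the original T_i of each utterance.
    """
    lengths = [len(feat) for feat in inputs]
    max_len = max(lengths)
    padded = []
    for feat in inputs:
        # one all-zero frame for every missing time step
        pad = [[[0.0] * nc for _ in range(dim)]
               for _ in range(max_len - len(feat))]
        padded.append(list(feat) + pad)
    return padded, lengths


# Two utterances with 2 and 3 frames, dim=2, nc=1 (made-up values).
shorter = [[[1.0], [2.0]], [[3.0], [4.0]]]
longer = [[[5.0], [6.0]], [[7.0], [8.0]], [[9.0], [0.5]]]
batch, input_length = pad_batch([shorter, longer], dim=2, nc=1)
```

Here `batch` plays the role of `"input"` and `input_length` the role of `"input_length"` in the signature.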

reload_config(self, config)¶: reload the config

preprocess_data(self, data_csv_path)¶: generate a list of tuples (wav_filename, wav_length_ms, speaker_id, speaker_name).
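
A minimal sketch of producing such tuples. It assumes a tab-separated file with one header row and columns in the same order as the tuple; the delimiter and column layout are not specified on this page, and `read_speaker_entries` is a hypothetical helper, not part of Athena.

```python
import csv
import io


def read_speaker_entries(csv_text, delimiter="\t"):
    """Parse rows into (wav_filename, wav_length_ms, speaker_id, speaker_name)."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader)  # skip the assumed header row
    entries = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        wav_filename, wav_length_ms, speaker_id, speaker_name = row
        entries.append(
            (wav_filename, int(wav_length_ms), int(speaker_id), speaker_name)
        )
    return entries


# Made-up file contents for illustration only.
example = (
    "wav_filename\twav_length_ms\tspeaker_id\tspeaker_name\n"
    "a.wav\t3200\t0\tid001\n"
    "b.wav\t4100\t1\tid002\n"
)
entries = read_speaker_entries(example)
```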

cut_features(self, feature)¶: cut acoustic features
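
This page does not say how the features are cut. One common approach in speaker recognition is to crop a random fixed-length window of frames, sketched below on a plain list of frames. Both the strategy and the names `cut_frames` and `length` are assumptions for illustration, not the documented behavior of `cut_features`.

```python
import random


def cut_frames(feature, length, rng=random):
    """Return a random contiguous window of `length` frames.

    If the utterance has `length` frames or fewer, it is returned unchanged.
    """
    max_start = len(feature) - length
    if max_start <= 0:
        return feature
    start = rng.randrange(max_start + 1)  # start index in [0, max_start]
    return feature[start:start + length]


# Ten one-dimensional frames; a seeded generator keeps the crop reproducible.
frames = [[i] for i in range(10)]
window = cut_frames(frames, 4, random.Random(0))
```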

load_csv(self, data_csv_path)¶: load csv file

__getitem__(self, index)¶

get a sample

Parameters:	index (int) – index of the entries
Returns:	sample: { "input": feat, "input_length": feat_length, "output_length": 1, "output": spkid }
Return type:	dict

__len__(self)¶: return the number of data samples
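
The sample layout returned by `__getitem__` can be mimicked with a small stand-in class. This is a sketch, not the Athena implementation: `ToySpeakerDataset` and its `features` mapping are hypothetical, and wrapping the speaker id in a list is an assumption made to match the rank-1 `"output"` shape given by `sample_shape`.

```python
class ToySpeakerDataset:
    """Stand-in that mimics the documented sample layout."""

    def __init__(self, entries, features):
        # entries: (wav_filename, wav_length_ms, speaker_id, speaker_name)
        # features: hypothetical mapping from wav_filename to a list of frames
        self.entries = entries
        self.features = features

    def __getitem__(self, index):
        wav_filename, _, speaker_id, _ = self.entries[index]
        feat = self.features[wav_filename]
        return {
            "input": feat,
            "input_length": len(feat),
            "output_length": 1,
            "output": [speaker_id],  # assumption: rank-1 label
        }

    def __len__(self):
        return len(self.entries)


dataset = ToySpeakerDataset(
    entries=[("a.wav", 3200, 0, "id001")],
    features={"a.wav": [[0.1], [0.2], [0.3]]},
)
sample = dataset[0]
```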

filter_sample_by_input_length(self)¶

filter samples by input length

The length of filtered samples will be in [min_length, max_length)

Returns:	a filtered list of tuples (wav_filename, wav_len, transcripts, speed, speaker)
Return type:	entries
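
The half-open interval means `min_length` is kept and `max_length` is dropped. A minimal sketch of that rule on plain tuples; `filter_by_length` and `length_index` are hypothetical names, and the position of the length field within each entry is an assumption.

```python
def filter_by_length(entries, min_length, max_length, length_index=1):
    """Keep entries whose length lies in [min_length, max_length)."""
    return [
        entry for entry in entries
        if min_length <= entry[length_index] < max_length
    ]


# Made-up entries: (wav_filename, wav_len)
entries = [("a.wav", 100), ("b.wav", 500), ("c.wav", 1000), ("d.wav", 99)]
kept = filter_by_length(entries, min_length=100, max_length=1000)
```

Only `a.wav` and `b.wav` remain: `c.wav` sits exactly on the excluded upper bound and `d.wav` is below the lower bound.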

compute_cmvn_if_necessary(self, is_necessary=True)¶: compute the CMVN (cepstral mean and variance normalization) file
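
CMVN amounts to estimating a per-dimension mean and variance over the training frames and normalizing each frame with them. The sketch below shows the arithmetic on plain lists. `compute_cmvn` and `apply_cmvn` are hypothetical helpers; how Athena accumulates or stores these statistics is not described on this page.

```python
import math


def compute_cmvn(utterances):
    """Return per-dimension (mean, variance) over all frames of all utterances."""
    dim = len(utterances[0][0])
    count = 0
    total = [0.0] * dim
    total_sq = [0.0] * dim
    for frames in utterances:
        for frame in frames:
            count += 1
            for d, value in enumerate(frame):
                total[d] += value
                total_sq[d] += value * value
    mean = [s / count for s in total]
    # E[x^2] - E[x]^2, clamped at zero against rounding error
    var = [max(sq / count - m * m, 0.0) for sq, m in zip(total_sq, mean)]
    return mean, var


def apply_cmvn(frames, mean, var, eps=1e-8):
    """Normalize every frame to zero mean and unit variance per dimension."""
    return [
        [(v - m) / math.sqrt(s + eps) for v, m, s in zip(frame, mean, var)]
        for frame in frames
    ]


# Two made-up utterances with 2-dimensional frames.
utterances = [[[1.0, 2.0], [3.0, 2.0]], [[5.0, 2.0]]]
mean, var = compute_cmvn(utterances)
normalized = apply_cmvn(utterances[0], mean, var)
```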

class athena.data.datasets.speaker_recognition.SpeakerVerificationDatasetBuilder(config=None)¶

Bases: athena.data.datasets.speaker_recognition.SpeakerRecognitionDatasetBuilder

SpeakerVerificationDatasetBuilder

sample_type¶

@property

Returns:	sample_type of the dataset: { "input_a": tf.float32, "input_b": tf.float32, "output": tf.int32 }
Return type:	dict

sample_shape¶

@property

Returns:	sample_shape of the dataset: { "input_a": tf.TensorShape([None, dim, nc]), "input_b": tf.TensorShape([None, dim, nc]), "output": tf.TensorShape([None]) }
Return type:	dict

sample_signature¶

@property

Returns:

sample_signature of the dataset:

{
    "input_a": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "input_b": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
}

Return type: dict

preprocess_data(self, data_csv_path)¶: generate a list of tuples (wav_filename_a, speaker_a, wav_filename_b, speaker_b, label).

__getitem__(self, index)¶

get a sample

Parameters:	index (int) – index of the entries
Returns:	sample: { "input_a": feat_a, "input_b": feat_b, "output": [label] }
Return type:	dict
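
A verification sample pairs two utterances with a single label. The sketch below builds one from an entry tuple of the form produced by `preprocess_data`. `make_pair_sample` and the `features` mapping are hypothetical, and the meaning of the label values is not stated on this page.

```python
def make_pair_sample(entry, features):
    """Build a sample dict from a trial entry.

    entry: (wav_filename_a, speaker_a, wav_filename_b, speaker_b, label)
    features: hypothetical mapping from wav filename to a list of frames
    """
    wav_a, _speaker_a, wav_b, _speaker_b, label = entry
    return {
        "input_a": features[wav_a],
        "input_b": features[wav_b],
        "output": [label],
    }


# Made-up trial and features for illustration only.
features = {"a.wav": [[0.1], [0.2]], "b.wav": [[0.3]]}
pair = make_pair_sample(("a.wav", "id001", "b.wav", "id002", 0), features)
```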
