athena.data.datasets.speaker_recognition
Speaker dataset
Module Contents
Classes
SpeakerRecognitionDatasetBuilder | SpeakerRecognitionDatasetBuilder
SpeakerVerificationDatasetBuilder | SpeakerVerificationDatasetBuilder
-
class athena.data.datasets.speaker_recognition.SpeakerRecognitionDatasetBuilder(config=None)
Bases: athena.data.datasets.base.BaseDatasetBuilder
SpeakerRecognitionDatasetBuilder
-
default_config
-
num_class
@property
Returns: the number of speakers
Return type: int
-
sample_type
@property
Returns: sample_type of the dataset:
    {
        "input": tf.float32,
        "input_length": tf.int32,
        "output_length": tf.int32,
        "output": tf.int32
    }
Return type: dict
-
sample_shape
@property
Returns: sample_shape of the dataset:
    {
        "input": tf.TensorShape([None, dim, nc]),
        "input_length": tf.TensorShape([]),
        "output_length": tf.TensorShape([]),
        "output": tf.TensorShape([None])
    }
Return type: dict
-
sample_signature
@property
Returns: sample_signature of the dataset:
    {
        "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
        "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
        "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
        "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    }
Return type: dict
-
reload_config(self, config)
Reload the config.
-
preprocess_data(self, data_csv_path)
Generate a list of tuples (wav_filename, wav_length_ms, speaker_id, speaker_name) from the CSV file at data_csv_path.
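The shape of the entry list can be pictured with a small sketch. It assumes a tab-separated CSV with a header row and columns in the order wav_filename, wav_length_ms, speaker_id, speaker_name; the helper name `read_speaker_entries` is hypothetical and is not part of Athena.

```python
import csv
import io


def read_speaker_entries(csv_text):
    """Parse CSV text into (wav_filename, wav_length_ms, speaker_id, speaker_name) tuples.

    Hypothetical helper: the tab delimiter and the header row are assumptions.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter="\t")
    next(reader)  # skip the assumed header row
    entries = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        wav_filename, wav_length_ms, speaker_id, speaker_name = row
        entries.append((wav_filename, int(wav_length_ms), int(speaker_id), speaker_name))
    return entries


example_csv = (
    "wav_filename\twav_length_ms\tspeaker_id\tspeaker_name\n"
    "a.wav\t3200\t0\tspk_a\n"
    "b.wav\t4100\t1\tspk_b\n"
)
entries = read_speaker_entries(example_csv)
```

Here `entries` holds one tuple per utterance, which is the form the other methods of the builder are described as working on.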
-
cut_features(self, feature)
Cut acoustic features.
-
load_csv(self, data_csv_path)
Load the CSV file.
-
__getitem__(self, index)
Get a sample.
Parameters: index (int) – index of the entry
Returns: sample:
    {
        "input": feat,
        "input_length": feat_length,
        "output_length": 1,
        "output": spkid
    }
Return type: dict
-
__len__(self)
Return the number of data samples.
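As an illustration of the sample layout returned by `__getitem__` and the count returned by `__len__`, here is a toy stand-in in plain Python. The class `ToySpeakerDataset`, the precomputed `features` dictionary, and the choice to wrap the speaker id in a list (to match the documented output shape `[None]`) are all assumptions made for this sketch, not Athena code.

```python
class ToySpeakerDataset:
    """Hypothetical stand-in that mimics the documented sample layout."""

    def __init__(self, entries, features):
        # entries: (wav_filename, wav_length_ms, speaker_id, speaker_name) tuples
        # features: dict mapping wav_filename to a list of frames (assumed precomputed)
        self.entries = entries
        self.features = features

    def __getitem__(self, index):
        wav_filename, _, speaker_id, _ = self.entries[index]
        feat = self.features[wav_filename]
        return {
            "input": feat,
            "input_length": len(feat),  # number of frames
            "output_length": 1,         # one speaker label per utterance
            "output": [speaker_id],     # assumption: id wrapped in a list
        }

    def __len__(self):
        return len(self.entries)


entries = [("a.wav", 3200, 0, "spk_a"), ("b.wav", 4100, 1, "spk_b")]
features = {"a.wav": [[0.1, 0.2]] * 3, "b.wav": [[0.3, 0.4]] * 5}
dataset = ToySpeakerDataset(entries, features)
sample = dataset[1]
```

The real builder extracts features from audio; the sketch only shows which keys a sample carries and how they relate to one entry.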
-
filter_sample_by_input_length(self)
Filter samples by input length.
The lengths of the filtered samples will be in [min_length, max_length).
Returns: entries, a filtered list of tuples (wav_filename, wav_length_ms, speaker_id, speaker_name), in the same form as generated by preprocess_data
Return type: list
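The half-open range means that an entry whose length equals min_length is kept while one whose length equals max_length is dropped. A minimal sketch, assuming each entry stores its length in the second field; the function name `filter_entries_by_length` is hypothetical:

```python
def filter_entries_by_length(entries, min_length, max_length):
    """Keep entries whose length lies in the half-open range [min_length, max_length)."""
    return [entry for entry in entries if min_length <= entry[1] < max_length]


entries = [
    ("a.wav", 1000, 0, "spk_a"),
    ("b.wav", 3000, 1, "spk_b"),
    ("c.wav", 5000, 2, "spk_c"),
]
# a.wav sits exactly on the lower bound and is kept;
# c.wav sits exactly on the upper bound and is dropped.
kept = filter_entries_by_length(entries, min_length=1000, max_length=5000)
```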
-
compute_cmvn_if_necessary(self, is_necessary=True)
Compute the CMVN (cepstral mean and variance normalization) file.
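For readers unfamiliar with CMVN statistics, the following plain-Python sketch computes a per-dimension mean and variance over all frames. It is an illustration only: whether Athena accumulates statistics globally or per speaker, and how it stores them on disk, is not stated here, and the function name `compute_global_cmvn` is hypothetical.

```python
def compute_global_cmvn(utterances):
    """Return per-dimension (mean, variance) over all frames of all utterances.

    utterances: list of utterances, each a list of frames, each a list of floats.
    """
    dim = len(utterances[0][0])
    total = [0.0] * dim
    total_sq = [0.0] * dim
    count = 0
    for frames in utterances:
        for frame in frames:
            for d, value in enumerate(frame):
                total[d] += value
                total_sq[d] += value * value
            count += 1
    mean = [s / count for s in total]
    # Population variance: E[x^2] - (E[x])^2
    variance = [sq / count - m * m for sq, m in zip(total_sq, mean)]
    return mean, variance


utterances = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0]],
]
mean, variance = compute_global_cmvn(utterances)
```

Normalizing a frame then amounts to subtracting `mean` and dividing by the square root of `variance` in each dimension.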
-
-
class athena.data.datasets.speaker_recognition.SpeakerVerificationDatasetBuilder(config=None)
Bases: athena.data.datasets.speaker_recognition.SpeakerRecognitionDatasetBuilder
SpeakerVerificationDatasetBuilder
-
sample_type
@property
Returns: sample_type of the dataset:
    {
        "input_a": tf.float32,
        "input_b": tf.float32,
        "output": tf.int32
    }
Return type: dict
-
sample_shape
@property
Returns: sample_shape of the dataset:
    {
        "input_a": tf.TensorShape([None, dim, nc]),
        "input_b": tf.TensorShape([None, dim, nc]),
        "output": tf.TensorShape([None])
    }
Return type: dict
-
sample_signature
@property
Returns: sample_signature of the dataset:
    {
        "input_a": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
        "input_b": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
        "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    }
Return type: dict
-
preprocess_data(self, data_csv_path)
Generate a list of tuples (wav_filename_a, speaker_a, wav_filename_b, speaker_b, label) from the CSV file at data_csv_path.
-
__getitem__(self, index)
Get a sample.
Parameters: index (int) – index of the entry
Returns: sample:
    {
        "input_a": feat_a,
        "input_b": feat_b,
        "output": [label]
    }
Return type: dict
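The relation between a verification entry and its sample can be sketched as follows. The helper `make_verification_sample`, the precomputed `features` dictionary, and the convention that label 1 marks a same-speaker pair are assumptions made for illustration, not Athena code.

```python
def make_verification_sample(entry, features):
    """Build a sample dict from one (wav_a, speaker_a, wav_b, speaker_b, label) entry."""
    wav_filename_a, _speaker_a, wav_filename_b, _speaker_b, label = entry
    return {
        "input_a": features[wav_filename_a],
        "input_b": features[wav_filename_b],
        "output": [label],  # label wrapped in a list, as in the documented sample
    }


# features: dict mapping wav_filename to a list of frames (assumed precomputed)
features = {
    "a1.wav": [[0.1, 0.2]] * 3,
    "a2.wav": [[0.2, 0.1]] * 4,
}
# Assumed convention: label 1 for a same-speaker pair.
entry = ("a1.wav", "spk_a", "a2.wav", "spk_a", 1)
sample = make_verification_sample(entry, features)
```

Unlike the recognition sample, the verification sample carries two feature inputs and no length fields.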
-