`athena.data.datasets.speech_set`¶

audio dataset

Module Contents¶

Classes¶

SpeechDatasetBuilder SpeechDatasetBuilder

class athena.data.datasets.speech_set.SpeechDatasetBuilder(config=None)¶

Bases: athena.data.datasets.base.BaseDatasetBuilder

SpeechDatasetBuilder

default_config¶

num_class¶

@property

Returns:	the target dim
Return type:	int

speaker_list¶: return the speaker list

audio_featurizer_func¶: return the audio_featurizer function

sample_type¶

@property

Returns:

sample_type of the dataset:

{
    "input": tf.float32,
    "input_length": tf.int32,
    "output": tf.float32,
    "output_length": tf.int32,
}

Return type:	dict

sample_shape¶

@property

Returns:

sample_shape of the dataset:

{
    "input": tf.TensorShape(
        [None, self.audio_featurizer.dim, self.audio_featurizer.num_channels]
    ),
    "input_length": tf.TensorShape([]),
    "output": tf.TensorShape([None, None]),
    "output_length": tf.TensorShape([]),
}

Return type: dict

sample_signature¶

@property

Returns:

sample_signature of the dataset:

{
    "input": tf.TensorSpec(
        shape=(None, None, None, None), dtype=tf.float32
    ),
    "input_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None, None), dtype=tf.float32),
    "output_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
}

Return type: dict

reload_config(self, config)¶: reload the config

preprocess_data(self, file_path)¶: generate a list of tuples (wav_filename, wav_length_ms, speaker).

load_csv(self, file_path)¶: load csv file
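
As a rough illustration of the entry format described for preprocess_data, the sketch below parses tab-separated text into (wav_filename, wav_length_ms, speaker) tuples. The column names, the tab delimiter, and the helper name `read_entries` are assumptions made for this example; they are not taken from the Athena source.

```python
import csv
import io


def read_entries(csv_text):
    """Parse tab-separated text into (wav_filename, wav_length_ms, speaker) tuples.

    Hypothetical helper: the header names and the delimiter are assumed.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t")
    entries = []
    for row in reader:
        # Lengths are stored as text in the file, so convert them to int.
        entries.append(
            (row["wav_filename"], int(row["wav_length_ms"]), row["speaker"])
        )
    return entries


# Made-up rows, used only to exercise the helper.
sample_csv = (
    "wav_filename\twav_length_ms\tspeaker\n"
    "a.wav\t1200\tspk1\n"
    "b.wav\t3400\tspk2\n"
)
entries = read_entries(sample_csv)
print(entries)
```

Reading into plain tuples keeps the entries easy to sort and filter by length later on.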

__getitem__(self, index)¶

get a sample

Parameters:	index (int) – index of the entries
Returns:

sample:

{
    "input": input_data,
    "input_length": input_data.shape[0],
    "output": output_data,
    "output_length": output_data.shape[0],
}

Return type:	dict
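
The sample layout returned by __getitem__ can be sketched without TensorFlow as follows. This is a minimal illustration, not the library's implementation: `make_sample` is a hypothetical helper, and nested lists stand in for feature arrays, so `len(x)` plays the role of `x.shape[0]` (the number of frames).

```python
def make_sample(input_data, output_data):
    """Assemble a sample dict in the layout documented for __getitem__.

    Hypothetical helper: the data are nested lists, one inner list per frame.
    """
    return {
        "input": input_data,
        "input_length": len(input_data),    # number of input frames
        "output": output_data,
        "output_length": len(output_data),  # number of output frames
    }


# Three input frames and two output frames of 2-dimensional features (made up).
input_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
output_frames = [[1.0, 2.0], [3.0, 4.0]]
sample = make_sample(input_frames, output_frames)
print(sample["input_length"], sample["output_length"])
```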

__len__(self)¶: return the number of data samples

filter_sample_by_input_length(self)¶

filter samples by input length

The length of each filtered sample will be in [min_length, max_length).

Parameters:	self.hparams.input_length_range = [min_len, max_len] – min_len: the minimal length (ms); max_len: the maximal length (ms)
Returns:	entries, a filtered list of tuples (wav_filename, wav_len, speaker)
Return type:	list
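
A minimal sketch of this kind of length filter, assuming entries are (wav_filename, wav_len, speaker) tuples and that the range is half-open as stated above. The function name `filter_by_length` and the sample data are hypothetical.

```python
def filter_by_length(entries, min_len, max_len):
    """Keep entries whose length in ms lies in [min_len, max_len).

    Hypothetical helper: each entry is (wav_filename, wav_len, speaker).
    """
    return [entry for entry in entries if min_len <= entry[1] < max_len]


# Made-up entries, used only to exercise the helper.
all_entries = [
    ("short.wav", 150, "spk1"),
    ("ok.wav", 2000, "spk2"),
    ("edge.wav", 5000, "spk1"),
    ("long.wav", 9000, "spk3"),
]
kept = filter_by_length(all_entries, min_len=200, max_len=5000)
print(kept)
```

With a half-open range, an entry whose length equals max_len is dropped while one equal to min_len is kept.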

compute_cmvn_if_necessary(self, is_necessary=True)¶: compute cmvn file
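
CMVN conventionally refers to cepstral mean and variance normalization. The source does not describe how the statistics are computed or stored here, so the following is only a generic sketch of per-dimension mean and variance statistics and their use. The function names and the list-of-frames representation are assumptions.

```python
def compute_cmvn_stats(frames):
    """Return per-dimension (mean, variance) over a list of feature frames.

    Hypothetical helper: `frames` is a non-empty list of equal-length lists.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    var = [
        sum((f[d] - mean[d]) ** 2 for f in frames) / num_frames
        for d in range(dim)
    ]
    return mean, var


def apply_cmvn(frames, mean, var, eps=1e-8):
    """Normalize each dimension to roughly zero mean and unit variance."""
    return [
        [(f[d] - mean[d]) / (var[d] + eps) ** 0.5 for d in range(len(mean))]
        for f in frames
    ]


# Two made-up frames of 2-dimensional features.
frames = [[1.0, 2.0], [3.0, 6.0]]
mean, var = compute_cmvn_stats(frames)
normalized = apply_cmvn(frames, mean, var)
print(mean, var)
```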
