athena.data.datasets.base
Base dataset
Module Contents
Classes
BaseDatasetBuilder | Base dataset builder
Functions
data_loader(dataset_builder, batch_size=16, num_threads=1) | Data loader
-
athena.data.datasets.base.data_loader(dataset_builder, batch_size=16, num_threads=1)
Data loader.
Parameters:
- dataset_builder – dataset builder
- batch_size (int, optional) – batch size. Defaults to 16.
- num_threads (int, optional) – number of threads used to load data. Defaults to 1.
-
class athena.data.datasets.base.BaseDatasetBuilder
Base dataset builder.
-
entries_list
Returns the entries list.
-
sample_type
Example types.
-
sample_shape
Example shapes.
-
sample_signature
Example signature.
-
__getitem__(self, index)
-
__len__(self)
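Together, __getitem__ and __len__ let a builder be used like an indexable sequence. The following is a minimal sketch, not Athena's implementation: ToyBuilder, iterate, and the dictionary keys are hypothetical names invented for illustration, and the entry layout follows the (audio_file, file_size, transcript) format described under batch_wise_shuffle.

```python
class ToyBuilder:
    """Hypothetical stand-in for a dataset builder (not part of Athena)."""

    def __init__(self, entries):
        # Each entry mimics the (audio_file, file_size, transcript) layout.
        self.entries_list = list(entries)

    def __getitem__(self, index):
        audio_file, file_size, transcript = self.entries_list[index]
        # A real builder would load audio and extract features here;
        # this sketch only repackages the fields under made-up keys.
        return {"input": audio_file, "input_length": file_size, "output": transcript}

    def __len__(self):
        return len(self.entries_list)


def iterate(builder):
    """Yield samples using only len() and integer indexing."""
    for i in range(len(builder)):
        yield builder[i]


builder = ToyBuilder([("a.wav", 120, "hello"), ("b.wav", 80, "hi")])
samples = list(iterate(builder))
```

A loader that relies only on these two methods can work with any subclass that implements them, which is presumably why they are part of the base interface.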
-
as_dataset(self, batch_size=16, num_threads=1)
Returns a tf.data.Dataset object.
-
shard(self, num_shards, index)
Creates a Dataset that includes only 1/num_shards of this dataset.
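To illustrate what keeping 1/num_shards of a dataset can look like, here is a hedged sketch on a plain list. It assumes stride-based selection (every num_shards-th entry, starting at index), which is how tf.data.Dataset.shard behaves; Athena's own method may select or truncate entries differently. The helper name shard_entries is hypothetical.

```python
def shard_entries(entries, num_shards, index):
    """Hypothetical helper: keep every num_shards-th entry, starting at `index`."""
    if not 0 <= index < num_shards:
        raise ValueError("index must satisfy 0 <= index < num_shards")
    return [entry for i, entry in enumerate(entries) if i % num_shards == index]


entries = list(range(10))
# One shard per worker; together the shards cover every entry exactly once.
shards = [shard_entries(entries, 3, k) for k in range(3)]
```

In a multi-worker setup, each worker would typically pass its own rank as index so that no two workers read the same entries.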
-
batch_wise_shuffle(self, batch_size=64)
Batch-wise shuffling of the data entries.
Each data entry has the form (audio_file, file_size, transcript). If epoch_index is 0 and sortagrad is true, no shuffling is performed and the entries are returned sorted by file_size. Otherwise, batch-wise shuffling is applied.
Parameters: batch_size (int, optional) – the batch size. Defaults to 64.
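The idea of batch-wise shuffling can be sketched on a plain list: consecutive entries are grouped into batches, the batch order is shuffled, and the order inside each batch is preserved, so entries of similar file_size stay together. This is an illustrative sketch rather than Athena's code; the function name, the seed parameter, and the choice to leave a trailing partial batch at the end are assumptions.

```python
import random


def batch_wise_shuffle_sketch(entries, batch_size=64, seed=None):
    """Shuffle the order of consecutive batches; keep order inside each batch."""
    rng = random.Random(seed)
    num_full = len(entries) // batch_size
    order = list(range(num_full))
    rng.shuffle(order)
    shuffled = []
    for b in order:
        shuffled.extend(entries[b * batch_size:(b + 1) * batch_size])
    # Entries that do not fill a whole batch stay at the end
    # (an assumption of this sketch).
    shuffled.extend(entries[num_full * batch_size:])
    return shuffled


# Entries sorted by file_size, in the (audio_file, file_size, transcript) layout.
entries = [("f%d.wav" % i, 100 + i, "text %d" % i) for i in range(10)]
result = batch_wise_shuffle_sketch(entries, batch_size=4, seed=0)
```

Because only whole batches move, a list that was sorted by file_size before the call still yields batches whose members have similar lengths, which keeps padding overhead low.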
-
compute_cmvn_if_necessary(self, is_necessary=True)
Virtual interface.
-