athena.data.datasets.base

base dataset

Module Contents

Classes

BaseDatasetBuilder base dataset builder

Functions

data_loader(dataset_builder, batch_size=16, num_threads=1) data loader
athena.data.datasets.base.data_loader(dataset_builder, batch_size=16, num_threads=1)

data loader

Parameters:
  • dataset_builder – dataset builder
  • batch_size (int, optional) – batch size. Defaults to 16.
  • num_threads (int, optional) – number of threads to load data. Defaults to 1.
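
As a rough illustration of what batching over a builder can look like, the sketch below groups items from any object that supports `__len__` and `__getitem__` into lists of `batch_size`. It does not use Athena or TensorFlow: `ToyBuilder` and `iter_batches` are hypothetical names introduced only for this example, and keeping the final partial batch is an assumption, not documented behaviour of `data_loader`.

```python
class ToyBuilder:
    """Hypothetical stand-in for a dataset builder (not the Athena class)."""

    def __init__(self, entries):
        self.entries = list(entries)

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, index):
        return self.entries[index]


def iter_batches(builder, batch_size=16):
    """Yield consecutive lists of up to batch_size items from builder."""
    batch = []
    for i in range(len(builder)):
        batch.append(builder[i])
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Assumption: the last, possibly shorter, batch is kept.
    if batch:
        yield batch


batches = list(iter_batches(ToyBuilder(range(5)), batch_size=2))
```

The real loader additionally takes `num_threads` for parallel loading, which this single-threaded sketch leaves out.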
class athena.data.datasets.base.BaseDatasetBuilder

base dataset builder

entries_list

return the entries list

sample_type

example types

sample_shape

example shapes

sample_signature

example signature

__getitem__(self, index)
__len__(self)
as_dataset(self, batch_size=16, num_threads=1)

Returns a tf.data.Dataset object.

shard(self, num_shards, index)

creates a Dataset that includes only 1/num_shards of this dataset
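
One plausible way to keep only 1/num_shards of a list of entries is to take every `num_shards`-th entry starting at position `index`. The sketch below works on a plain list; `shard_entries` is a hypothetical helper, and dropping trailing entries so that all shards have equal size is an assumption rather than something this page states.

```python
def shard_entries(entries, num_shards, index):
    """Return the entries belonging to shard `index` out of `num_shards`."""
    if not 0 <= index < num_shards:
        raise ValueError("index must satisfy 0 <= index < num_shards")
    # Assumption: drop the tail so every shard has the same number of entries.
    usable = (len(entries) // num_shards) * num_shards
    return [entries[i] for i in range(index, usable, num_shards)]


entries = list(range(10))
shards = [shard_entries(entries, 3, i) for i in range(3)]
```

With ten entries and three shards, each shard holds three entries and the tenth entry is left out under the equal-size assumption.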

batch_wise_shuffle(self, batch_size=64)

Batch-wise shuffling of the data entries.

Each data entry has the format (audio_file, file_size, transcript). If epoch_index is 0 and sortagrad is true, no shuffling is performed and the entries are returned sorted by file_size. Otherwise, batch-wise shuffling is performed.

Parameters:
  • batch_size (int, optional) – an integer for the batch size. Defaults to 64.
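
Batch-wise shuffling, as described above, can be read as: split the entries into consecutive groups of `batch_size`, shuffle the order of the groups, and leave the order inside each group untouched. The sketch below is a minimal version of that idea on a plain list. `shuffle_batches`, its `seed` argument, and the sample entries are hypothetical; keeping leftover entries at the end is an assumption.

```python
import random


def shuffle_batches(entries, batch_size=64, seed=None):
    """Shuffle the order of whole batches, keeping each batch intact."""
    rng = random.Random(seed)
    num_full = len(entries) // batch_size
    order = list(range(num_full))
    rng.shuffle(order)
    shuffled = []
    for b in order:
        shuffled.extend(entries[b * batch_size:(b + 1) * batch_size])
    # Assumption: entries that do not fill a whole batch stay at the end.
    shuffled.extend(entries[num_full * batch_size:])
    return shuffled


# Entries in (audio_file, file_size, transcript) form, sorted by file_size.
entries = [("utt%d.wav" % i, 100 + i, "text %d" % i) for i in range(10)]
result = shuffle_batches(entries, batch_size=4, seed=0)
```

Because entries inside a batch stay adjacent, utterances of similar file_size remain grouped when the input was sorted beforehand, while the batch order still varies between epochs.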
compute_cmvn_if_necessary(self, is_necessary=True)

virtual interface