athena.data.datasets.language_set
audio dataset
Module Contents
Classes
LanguageDatasetBuilder
- class athena.data.datasets.language_set.LanguageDatasetBuilder(config=None)
  Bases: athena.data.datasets.base.BaseDatasetBuilder
  LanguageDatasetBuilder
- default_config
- num_class
  @property
  Returns: the maximum index of the vocabulary
  Return type: int
- input_vocab_size
  @property
  Returns: the input vocab size
  Return type: int
- sample_type
  @property
  Returns: sample_type of the dataset:
    {
        "input": tf.int32,
        "input_length": tf.int32,
        "output": tf.int32,
        "output_length": tf.int32,
    }
  Return type: dict
- sample_shape
  @property
  Returns: sample_shape of the dataset:
    {
        "input": tf.TensorShape([None]),
        "input_length": tf.TensorShape([]),
        "output": tf.TensorShape([None]),
        "output_length": tf.TensorShape([]),
    }
  Return type: dict
- sample_signature
  @property
  Returns: sample_signature of the dataset:
    {
        "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
        "input_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
        "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
        "output_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
    }
  Return type: dict
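The per-sample shapes in sample_shape have one dimension fewer than the specs in sample_signature, which suggests that the signature describes batched tensors with a leading batch dimension. The following is a minimal pure-Python sketch of that relationship, not Athena's or TensorFlow's batching code. The helper name pad_batch and the padding id 0 are assumptions made for illustration.

```python
from typing import Dict, List


def pad_batch(samples: List[Dict[str, object]], pad_id: int = 0) -> Dict[str, list]:
    """Stack per-sample dicts into one batch dict (hypothetical helper).

    Label lists are right-padded with pad_id (an assumed padding value)
    so that every row of the batch has the same length.
    """
    batch = {}
    for key in ("input", "output"):
        max_len = max(len(s[key]) for s in samples)
        # Per-sample shape [None] becomes batched shape (None, None).
        batch[key] = [list(s[key]) + [pad_id] * (max_len - len(s[key])) for s in samples]
        # Per-sample scalar shape [] becomes batched shape (None,).
        batch[key + "_length"] = [s[key + "_length"] for s in samples]
    return batch


samples = [
    {"input": [5, 8, 2], "input_length": 3, "output": [8, 2, 9], "output_length": 3},
    {"input": [7, 4], "input_length": 2, "output": [4, 9], "output_length": 2},
]
batch = pad_batch(samples)
```

Here batch["input"] is a list of two rows of equal length, matching the two unknown dimensions in the "input" spec, while batch["input_length"] is a flat list with one entry per sample.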
- load_csv(self, file_path)
  Load a CSV file.
- __getitem__(self, index)
  Get a sample.
  Parameters: index (int) – index of the entry to fetch
  Returns: sample:
    {
        "input": input_labels,
        "input_length": input_length,
        "output": output_labels,
        "output_length": output_length,
    }
  Return type: dict
- __len__(self)
  Return the number of data samples.
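To make the __getitem__ and __len__ contract concrete, here is a small stand-in class that returns samples with the same four keys. It is a hypothetical sketch, not Athena's implementation: the class name ToyLanguageDataset, the in-memory entries list, and the choice to compute each length as the number of labels are all assumptions.

```python
class ToyLanguageDataset:
    """Hypothetical stand-in that mimics the documented sample layout."""

    def __init__(self, entries):
        # Each entry is an (input_labels, output_labels) pair of token-id lists.
        self.entries = list(entries)

    def __getitem__(self, index):
        input_labels, output_labels = self.entries[index]
        return {
            "input": input_labels,
            # Assumption: the length fields count the labels in each list.
            "input_length": len(input_labels),
            "output": output_labels,
            "output_length": len(output_labels),
        }

    def __len__(self):
        # Number of data samples held by the dataset.
        return len(self.entries)


dataset = ToyLanguageDataset([([1, 5, 8], [5, 8, 2]), ([1, 7], [7, 2])])
sample = dataset[1]
```

Indexing the object yields one dict per entry, and len(dataset) reports how many entries are available, which is the pairing a generator-style data pipeline typically relies on.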