`athena.data.datasets.language_set`

audio dataset

Module Contents

Classes

LanguageDatasetBuilder LanguageDatasetBuilder

class athena.data.datasets.language_set.LanguageDatasetBuilder(config=None)

Bases: athena.data.datasets.base.BaseDatasetBuilder

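Dataset builder for text data: each sample holds an input and an output sequence of integer labels, together with their lengths.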

default_config

num_class

@property

Returns:	the max_index of the vocabulary
Return type:	int

input_vocab_size

@property

Returns:	the input vocab size
Return type:	int

sample_type

@property

Returns:

sample_type of the dataset:

{
    "input": tf.int32,
    "input_length": tf.int32,
    "output": tf.int32,
    "output_length": tf.int32,
}

Return type: dict

sample_shape

@property

Returns:

sample_shape of the dataset:

{
    "input": tf.TensorShape([None]),
    "input_length": tf.TensorShape([]),
    "output": tf.TensorShape([None]),
    "output_length": tf.TensorShape([]),
}

Return type: dict

sample_signature

@property

Returns:

sample_signature of the dataset:

{
    "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    "input_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
}

Return type: dict
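
The shapes listed under sample_shape are per sample (variable-length sequences and scalar lengths), whereas sample_signature appears to describe the same fields with an extra leading batch dimension. The following is a minimal plain-Python sketch of how that extra dimension could arise from padding and stacking. `pad_batch` and `pad_id` are hypothetical names, not part of Athena, and the padding value 0 is an assumption. In TensorFlow, this step is typically handled by `tf.data.Dataset.padded_batch`.

```python
from typing import Dict, List


def pad_batch(samples: List[Dict[str, object]], pad_id: int = 0) -> Dict[str, list]:
    """Stack per-sample dicts into one batch (hypothetical helper, not Athena API)."""
    batch: Dict[str, list] = {}
    for key in ("input", "output"):
        max_len = max(len(s[key]) for s in samples)
        # Right-pad every sequence so the stacked field has shape (batch, max_len).
        batch[key] = [list(s[key]) + [pad_id] * (max_len - len(s[key])) for s in samples]
    for key in ("input_length", "output_length"):
        # Per-sample scalars become a 1-D field of shape (batch,).
        batch[key] = [s[key] for s in samples]
    return batch


samples = [
    {"input": [1, 2, 3], "input_length": 3, "output": [2, 3, 4], "output_length": 3},
    {"input": [5, 6], "input_length": 2, "output": [6, 7], "output_length": 2},
]
batch = pad_batch(samples)
print(batch["input"])         # [[1, 2, 3], [5, 6, 0]]
print(batch["input_length"])  # [3, 2]
```

The length fields keep the unpadded sizes, so downstream code can still tell real labels from padding.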

load_csv(self, file_path): Loads the CSV file at file_path.

__getitem__(self, index)

Get a sample.

Parameters:	index (int) – index of the entries

Returns:

sample:

{
    "input": input_labels,
    "input_length": input_length,
    "output": output_labels,
    "output_length": output_length,
}

Return type: dict

__len__(self): Returns the number of data samples.
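
To make the indexing protocol concrete, here is a small sketch of a class that returns samples in the documented dict layout. `ToyLanguageDataset` is a hypothetical stand-in, not Athena's `LanguageDatasetBuilder`. It assumes the entries are already (input_labels, output_labels) pairs held in memory, which may differ from how the real class stores what `load_csv` reads.

```python
from typing import Dict, List, Tuple, Union

Sample = Dict[str, Union[List[int], int]]


class ToyLanguageDataset:
    """Hypothetical stand-in that mimics the documented sample layout."""

    def __init__(self, entries: List[Tuple[List[int], List[int]]]) -> None:
        # Assumption: each entry is an (input_labels, output_labels) pair.
        self.entries = entries

    def __getitem__(self, index: int) -> Sample:
        input_labels, output_labels = self.entries[index]
        return {
            "input": input_labels,
            "input_length": len(input_labels),
            "output": output_labels,
            "output_length": len(output_labels),
        }

    def __len__(self) -> int:
        return len(self.entries)


dataset = ToyLanguageDataset([([1, 2, 3], [2, 3, 4]), ([5, 6], [6, 7])])
for i in range(len(dataset)):
    sample = dataset[i]
    print(sample["input"], sample["input_length"])
```

Because the class implements both `__len__` and `__getitem__`, it can be walked with a plain index loop, as shown.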
