athena.data.datasets.language_set

language (text) dataset

Module Contents

Classes

LanguageDatasetBuilder
class athena.data.datasets.language_set.LanguageDatasetBuilder(config=None)

Bases: athena.data.datasets.base.BaseDatasetBuilder

LanguageDatasetBuilder

default_config
num_class

@property

Returns: the max_index of the vocabulary
Return type: int
input_vocab_size

@property

Returns: the input vocab size
Return type: int
sample_type

@property

Returns: sample_type of the dataset:
{
    "input": tf.int32,
    "input_length": tf.int32,
    "output": tf.int32,
    "output_length": tf.int32,
}
Return type: dict
sample_shape

@property

Returns: sample_shape of the dataset:
{
    "input": tf.TensorShape([None]),
    "input_length": tf.TensorShape([]),
    "output": tf.TensorShape([None]),
    "output_length": tf.TensorShape([]),
}
Return type: dict
sample_signature

@property

Returns: sample_signature of the dataset (shapes include the batch dimension):
{
    "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    "input_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=([None]), dtype=tf.int32),
}
Return type: dict
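
sample_type and sample_shape describe a single sample, while sample_signature describes a batch: the variable-length sequences gain a leading batch dimension and are padded to a common length. The sketch below illustrates that relationship in plain Python. The helper `pad_batch` and the padding value 0 are assumptions made for illustration and are not part of athena; in a TensorFlow pipeline this step would typically be handled by `tf.data.Dataset.padded_batch`.

```python
def pad_batch(samples, pad_value=0):
    """Stack per-sample dicts into one batch dict, padding the sequences.

    Hypothetical helper: `samples` is a list of dicts laid out like the
    output of __getitem__ ("input"/"output" are 1-D lists of ints,
    "input_length"/"output_length" are scalars).
    """
    batch = {}
    for key in ("input", "output"):
        max_len = max(len(s[key]) for s in samples)
        # Per-sample shape [None] becomes batch shape (None, None).
        batch[key] = [
            list(s[key]) + [pad_value] * (max_len - len(s[key]))
            for s in samples
        ]
    for key in ("input_length", "output_length"):
        # Per-sample shape [] becomes batch shape (None,).
        batch[key] = [s[key] for s in samples]
    return batch


samples = [
    {"input": [1, 5, 7], "input_length": 3, "output": [5, 7, 2], "output_length": 3},
    {"input": [1, 9], "input_length": 2, "output": [9, 2], "output_length": 2},
]
batch = pad_batch(samples)
print(batch["input"])         # [[1, 5, 7], [1, 9, 0]]
print(batch["input_length"])  # [3, 2]
```

The length fields are kept alongside the padded sequences so that downstream code can tell real tokens from padding.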
load_csv(self, file_path)

Load a CSV file.

__getitem__(self, index)

Get a sample.

Parameters: index (int) – index of the entries
Returns: sample:
{
    "input": input_labels,
    "input_length": input_length,
    "output": output_labels,
    "output_length": output_length,
}
Return type: dict
__len__(self)

Return the number of data samples.
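
To make the indexing protocol concrete, here is a minimal stand-in that returns samples in the same dict layout. `ToyLanguageDataset`, its token ids, and the choice to make the output the input shifted by one position (a common language-modeling convention) are assumptions for illustration; this reference does not describe how the real builder encodes its entries.

```python
class ToyLanguageDataset:
    """Hypothetical stand-in that mimics the documented sample layout."""

    def __init__(self, token_id_sequences):
        # Each entry is a list of integer token ids (assumed already encoded).
        self.entries = [list(seq) for seq in token_id_sequences]

    def __getitem__(self, index):
        tokens = self.entries[index]
        # Assumption: next-token prediction, so output is input shifted by one.
        input_labels = tokens[:-1]
        output_labels = tokens[1:]
        return {
            "input": input_labels,
            "input_length": len(input_labels),
            "output": output_labels,
            "output_length": len(output_labels),
        }

    def __len__(self):
        # Number of data samples.
        return len(self.entries)


dataset = ToyLanguageDataset([[1, 5, 7, 2], [1, 9, 2]])
print(len(dataset))  # 2
print(dataset[1])    # {'input': [1, 9], 'input_length': 2, 'output': [9, 2], 'output_length': 2}
```

Because the class implements both `__getitem__` and `__len__`, it can be iterated with `for i in range(len(dataset))`, which is the access pattern a generator-based input pipeline would rely on.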