`athena.data.datasets.preprocess`

Preprocessing for speech features.

Module Contents

Classes

SpecAugment Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”

class athena.data.datasets.preprocess.SpecAugment(preprocess_config)

Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”.

Parameters:

preprocess_config – contains the following configs:

time_warping: time warping parameter W, should be in (0, time_steps / 2);
    a random horizontal center point in (W, time_steps - W) is warped
    to the left or right by a distance chosen at random from [0, W).
time_masking: time masking parameter T, should be in (0, time_steps);
    the masked time steps are [t_0, t_0 + t), where
    t is drawn at random from [0, T) and t_0 from [0, time_steps - t).
frequency_masking: frequency masking parameter F, should be in (0, dimension);
    the masked frequencies are [f_0, f_0 + f), where
    f is drawn at random from [0, F) and f_0 from [0, dimension - f).
mask_cols: the masking operation is executed mask_cols times on each axis.
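
The range constraints above can be written down as a small check. The sketch below is illustrative only: the helper name `check_spec_augment_config` and the example values are hypothetical and are not part of Athena, and the values shown are not the contents of `default_config`. It assumes the config can be read like a dictionary.

```python
# Illustrative values only; these are NOT Athena's defaults.
example_config = {
    "time_warping": 5,        # W
    "time_masking": 30,       # T
    "frequency_masking": 15,  # F
    "mask_cols": 2,           # number of masks applied on each axis
}


def check_spec_augment_config(config, time_steps, dimension):
    """Hypothetical helper: list which documented range constraints are violated."""
    problems = []
    if not 0 < config["time_warping"] < time_steps / 2:
        problems.append("time_warping should be in (0, time_steps / 2)")
    if not 0 < config["time_masking"] < time_steps:
        problems.append("time_masking should be in (0, time_steps)")
    if not 0 < config["frequency_masking"] < dimension:
        problems.append("frequency_masking should be in (0, dimension)")
    return problems


# A feature of 100 frames with 40 frequency bins satisfies every constraint.
print(check_spec_augment_config(example_config, time_steps=100, dimension=40))
```

Because the valid ranges depend on `time_steps` and `dimension`, a config that suits long utterances may be out of range for short ones.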

default_config

__call__(self, feat)

Applies SpecAugment preprocessing to audio features.

Parameters:	feat – audio features, shape should be [time_steps, dimension, channels]
Returns:	processed features

feat_time_warping(self, feat)

Time warping for SpecAugment.

Parameters:	feat – audio features, shape should be [time_steps, dimension, channels]
Returns:	time warped features
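
To make the warping rule concrete, here is a minimal NumPy sketch. It is an approximation under stated assumptions and not Athena's implementation: the function name `time_warp_sketch` is hypothetical, and the warp is done by piecewise-linear resampling along the time axis, which may differ from what the library does internally. It picks a center in (W, time_steps - W), moves it left or right by a distance from [0, W), and keeps the first and last frames fixed.

```python
import numpy as np


def time_warp_sketch(feat, W, rng):
    """Hypothetical sketch: warp feat of shape [time_steps, dimension, channels]."""
    time_steps = feat.shape[0]
    # Warping needs a non-empty open interval (W, time_steps - W) for the center.
    if W <= 0 or time_steps <= 2 * W + 1:
        return feat

    center = int(rng.integers(W + 1, time_steps - W))      # center in (W, time_steps - W)
    shift = int(rng.integers(0, W)) * int(rng.choice([-1, 1]))  # distance in [0, W), either side
    target = center + shift                                 # where the center frame ends up

    # Piecewise-linear map from each output frame index to a source position:
    # frame 0 stays at 0, `target` reads from `center`, the last frame stays last.
    out_idx = np.arange(time_steps)
    src_pos = np.interp(out_idx, [0, target, time_steps - 1],
                        [0, center, time_steps - 1])

    # Linear interpolation between the two neighbouring source frames.
    lo = np.floor(src_pos).astype(int)
    hi = np.minimum(lo + 1, time_steps - 1)
    frac = (src_pos - lo).reshape(-1, 1, 1)
    return (1.0 - frac) * feat[lo] + frac * feat[hi]


rng = np.random.default_rng(0)
feat = rng.standard_normal((50, 40, 1))
warped = time_warp_sketch(feat, W=5, rng=rng)
print(warped.shape)
```

The output keeps the input shape, so the warped features can be passed on to masking unchanged.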

feat_masking(self, feat, axis=0, mask_num=0)

Masking for SpecAugment.

Parameters:
    feat – audio features, shape should be [time_steps, dimension, channels]
    axis (int, optional) – the axis to be masked. Defaults to 0.
    mask_num (int, optional) – masked time or frequency range. Defaults to 0.
Returns:	masked features
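
The masking rule from the class description (width drawn from [0, mask_num), start drawn from [0, size - width)) can be sketched in NumPy as follows. This is a hedged illustration, not the library's code: the name `mask_axis_sketch`, the extra `rng` argument, and the choice to fill the masked band with zeros are all assumptions.

```python
import numpy as np


def mask_axis_sketch(feat, axis=0, mask_num=0, rng=None):
    """Hypothetical sketch: zero out one random band of feat along `axis`."""
    if rng is None:
        rng = np.random.default_rng()
    size = feat.shape[axis]
    masked = feat.copy()          # leave the caller's array untouched
    if mask_num <= 0:
        return masked
    mask_num = min(mask_num, size)

    width = int(rng.integers(0, mask_num))       # t (or f) in [0, mask_num)
    start = int(rng.integers(0, size - width))   # t_0 (or f_0) in [0, size - width)

    # Build an index that selects [start, start + width) on the chosen axis only.
    index = [slice(None)] * feat.ndim
    index[axis] = slice(start, start + width)
    masked[tuple(index)] = 0.0    # assumption: masked values are set to zero
    return masked


rng = np.random.default_rng(1)
feat = np.ones((100, 40, 1))
out = feat
for _ in range(2):                # plays the role of mask_cols = 2
    out = mask_axis_sketch(out, axis=0, mask_num=30, rng=rng)  # time masks
    out = mask_axis_sketch(out, axis=1, mask_num=15, rng=rng)  # frequency masks
print(out.shape)
```

With `axis=0` the band covers time steps and with `axis=1` it covers frequency bins. Since the width may be drawn as 0, a single call can leave the features unchanged.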
