athena.data.datasets.preprocess

Preprocessing for speech features.

Module Contents

Classes

SpecAugment: Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”
class athena.data.datasets.preprocess.SpecAugment(preprocess_config)
Implementation of SpecAugment from the paper “SpecAugment: A Simple Data
Augmentation Method for Automatic Speech Recognition”.
Parameters:preprocess_config – configuration containing the options below:


time_warping: time warp parameter W, should be in (0, time_steps / 2);
    a random horizontal center point in (W, time_steps - W) is warped
    either left or right by a distance chosen randomly from [0, W).
time_masking: time mask parameter T, should be in (0, time_steps);
    the masked time steps are [t_0, t_0 + t), where
    t is random from [0, T) and t_0 is random from [0, time_steps - t).
frequency_masking: frequency mask parameter F, should be in (0, dimension);
    the masked frequencies are [f_0, f_0 + f), where
    f is random from [0, F) and f_0 is random from [0, dimension - f).
mask_cols: the masking operation is executed mask_cols times along each axis.
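
The range constraints listed above can be checked before features are augmented. The following is a minimal sketch, not part of Athena: the helper name `check_specaugment_config` and the values in `example_config` are hypothetical, and only the four key names are taken from the list above.

```python
def check_specaugment_config(config, time_steps, dimension):
    """Return a list of range violations; an empty list means the config fits.

    Hypothetical helper: the bounds come from the option descriptions,
    the function itself is not an Athena API.
    """
    problems = []
    if not 0 < config["time_warping"] < time_steps / 2:
        problems.append("time_warping must be in (0, time_steps / 2)")
    if not 0 < config["time_masking"] < time_steps:
        problems.append("time_masking must be in (0, time_steps)")
    if not 0 < config["frequency_masking"] < dimension:
        problems.append("frequency_masking must be in (0, dimension)")
    return problems


# Illustrative values only; these are not Athena's defaults.
example_config = {
    "time_warping": 5,
    "time_masking": 30,
    "frequency_masking": 15,
    "mask_cols": 2,
}

# A 100-frame, 40-dimensional feature satisfies every constraint.
print(check_specaugment_config(example_config, time_steps=100, dimension=40))
```

Very short utterances are the usual cause of violations, since both time_warping and time_masking are bounded by the number of frames.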
default_config
__call__(self, feat)

Apply SpecAugment preprocessing to audio features.

Parameters:feat – audio features, shape should be [time_steps, dimension, channels]
Returns:processed features
feat_time_warping(self, feat)

Time warping for SpecAugment.

Parameters:feat – audio features, shape should be [time_steps, dimension, channels]
Returns:time warped features
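
To make the warping idea concrete, here is a simplified NumPy sketch. It is an assumption-laden stand-in, not Athena's implementation: it swaps in a piecewise-linear resampling along the time axis, and the name `time_warp_sketch`, the `rng` argument, and the early return for short inputs are all hypothetical. It picks a center in (W, time_steps - W), moves it left or right by a distance from [0, W), and stretches or compresses the two sides accordingly.

```python
import numpy as np


def time_warp_sketch(feat, W, rng=None):
    """Piecewise-linear time warp for feat of shape [time_steps, dimension, channels]."""
    rng = np.random.default_rng() if rng is None else rng
    time_steps = feat.shape[0]
    # Assumption: leave the input untouched when no center fits in (W, time_steps - W).
    if W <= 0 or time_steps <= 2 * W + 1:
        return feat
    center = int(rng.integers(W + 1, time_steps - W))  # center in (W, time_steps - W)
    shift = int(rng.integers(0, W))                    # distance in [0, W)
    if rng.random() < 0.5:
        shift = -shift                                 # warp to the left
    target = center + shift                            # where the center frame ends up
    # For every output frame, compute the fractional source frame it reads from.
    out_idx = np.arange(time_steps, dtype=np.float64)
    src_pos = np.interp(out_idx,
                        [0, target, time_steps - 1],
                        [0, center, time_steps - 1])
    lo = np.floor(src_pos).astype(int)
    hi = np.minimum(lo + 1, time_steps - 1)
    frac = (src_pos - lo)[:, None, None]
    # Linear interpolation between the two neighbouring source frames.
    return (1.0 - frac) * feat[lo] + frac * feat[hi]


feat = np.random.default_rng(0).normal(size=(50, 4, 1))
warped = time_warp_sketch(feat, W=5, rng=np.random.default_rng(1))
print(warped.shape)
```

The first and last frames are fixed points of the mapping, so the output keeps the input shape and only the interior of the utterance is stretched or compressed.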
feat_masking(self, feat, axis=0, mask_num=0)

Masking for SpecAugment.

Parameters:
  • feat – audio features, shape should be [time_steps, dimension, channels]
  • axis (int, optional) – the axis to be masked. Defaults to 0.
  • mask_num (int, optional) – masked time or frequency range. Defaults to 0.
Returns:masked features
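
The masking step can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than Athena's code: the name `mask_sketch` and the `rng` argument are hypothetical, masked values are set to zero (the fill value is not specified here), and an out-of-range `mask_num` simply returns an unmasked copy. The width and start follow the sampling rule given for time_masking and frequency_masking.

```python
import numpy as np


def mask_sketch(feat, axis=0, mask_num=0, rng=None):
    """Zero out one random span along `axis` of a [time_steps, dimension, channels] array."""
    rng = np.random.default_rng() if rng is None else rng
    size = feat.shape[axis]
    out = feat.copy()
    # Assumption: mask_num outside (0, size) means "no masking".
    if mask_num <= 0 or mask_num >= size:
        return out
    width = int(rng.integers(0, mask_num))      # t (or f) in [0, mask_num)
    start = int(rng.integers(0, size - width))  # t_0 (or f_0) in [0, size - width)
    index = [slice(None)] * feat.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = 0                       # assumed fill value
    return out


# Apply the mask mask_cols times along each axis (illustrative values).
rng = np.random.default_rng(0)
feat = np.ones((100, 40, 1))
masked = feat
for _ in range(2):                              # mask_cols = 2
    masked = mask_sketch(masked, axis=0, mask_num=30, rng=rng)  # time mask
    masked = mask_sketch(masked, axis=1, mask_num=15, rng=rng)  # frequency mask
print(masked.shape)
```

Because the width is drawn from [0, mask_num), a span of width zero is possible, in which case that pass leaves the features unchanged.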