athena.data.datasets.preprocess
Preprocessing for speech features.
Module Contents
Classes
SpecAugment | Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”
class athena.data.datasets.preprocess.SpecAugment(preprocess_config)
Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”.
Parameters: preprocess_config – a configuration containing the entries below:
- time_warping – time warping parameter W, should be in (0, time_steps / 2). A random horizontal center point in (W, time_steps - W) is warped to the left or right by a distance chosen randomly from [0, W).
- time_masking – masked time range T, should be in (0, time_steps). The masked time steps are [t_0, t_0 + t), where t is drawn randomly from [0, T) and t_0 from [0, time_steps - t).
- frequency_masking – masked frequency range F, should be in (0, dimension). The masked frequencies are [f_0, f_0 + f), where f is drawn randomly from [0, F) and f_0 from [0, dimension - f).
- mask_cols – the masking operation is executed mask_cols times on each axis.
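The documented ranges can be checked before a configuration is used. The following is a minimal sketch, not part of athena: the helper name `check_specaugment_config`, the plain-dict layout of the configuration, and the example values are all assumptions made for illustration. Only the ranges themselves come from the description above.

```python
def check_specaugment_config(config, time_steps, dimension):
    """Return a list of violations of the documented parameter ranges.

    Hypothetical helper: athena itself may validate differently or not at all.
    """
    problems = []
    # time_warping W should lie in (0, time_steps / 2)
    if not 0 < config["time_warping"] < time_steps / 2:
        problems.append("time_warping must be in (0, time_steps / 2)")
    # time_masking T should lie in (0, time_steps)
    if not 0 < config["time_masking"] < time_steps:
        problems.append("time_masking must be in (0, time_steps)")
    # frequency_masking F should lie in (0, dimension)
    if not 0 < config["frequency_masking"] < dimension:
        problems.append("frequency_masking must be in (0, dimension)")
    return problems


# Illustrative values only; they are not athena's defaults.
example_config = {
    "time_warping": 5,
    "time_masking": 30,
    "frequency_masking": 15,
    "mask_cols": 2,
}
```

For a feature of 100 time steps and 40 dimensions, `check_specaugment_config(example_config, 100, 40)` returns an empty list, since every value falls inside its documented range.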
default_config
__call__(self, feat)
Apply SpecAugment preprocessing to audio features.
Parameters: feat – audio features, shape should be [time_steps, dimension, channels]
Returns: processed features
feat_time_warping(self, feat)
Time warping for SpecAugment.
Parameters: feat – audio features, shape should be [time_steps, dimension, channels]
Returns: time-warped features
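To make the warping idea concrete, here is a simplified NumPy sketch. It is not athena's implementation: it swaps in a piecewise-linear resampling along the time axis for whatever image-warping routine the library uses. The function name `time_warp_sketch` and the `rng` argument are hypothetical. A center frame is picked in (W, time_steps - W), moved left or right by a distance in [0, W), and the frames on either side are stretched or compressed to fit.

```python
import numpy as np


def time_warp_sketch(feat, W, rng=None):
    """Simplified time warp for feat of shape [time_steps, dimension, channels]."""
    rng = np.random.default_rng() if rng is None else rng
    time_steps = feat.shape[0]
    # Need at least one integer strictly inside (W, time_steps - W).
    if W <= 0 or time_steps <= 2 * W + 1:
        return feat
    center = int(rng.integers(W + 1, time_steps - W))  # center in (W, time_steps - W)
    shift = int(rng.integers(0, W)) * int(rng.choice([-1, 1]))  # distance in [0, W)
    target = center + shift  # where the center frame ends up

    # Piecewise-linear map from each output frame to a source position:
    # output [0, target] reads from source [0, center], the rest from [center, end].
    out_idx = np.arange(time_steps, dtype=np.float64)
    src = np.interp(out_idx, [0, target, time_steps - 1], [0, center, time_steps - 1])

    # Linear interpolation between the two neighbouring source frames.
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, time_steps - 1)
    frac = (src - lo).reshape(-1, 1, 1)
    return (1.0 - frac) * feat[lo] + frac * feat[hi]
```

The first and last frames are left in place, so the output keeps the input shape and only the interior of the time axis is deformed.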
feat_masking(self, feat, axis=0, mask_num=0)
Masking for SpecAugment.
Parameters:
- feat – audio features, shape should be [time_steps, dimension, channels]
- axis (int, optional) – the axis to be masked. Defaults to 0.
- mask_num (int, optional) – masked time or frequency range. Defaults to 0.
Returns: masked features
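The masking rule described for time_masking and frequency_masking can be sketched in NumPy as follows. This is an illustration under stated assumptions, not athena's code: the name `mask_sketch` and the `rng` argument are hypothetical, and filling the masked band with zeros is an assumption, since the library may use a different fill value. The width is drawn from [0, mask_num) and the start from [0, size - width), where size is the length of the chosen axis.

```python
import numpy as np


def mask_sketch(feat, axis=0, mask_num=0, rng=None):
    """Mask one random band along `axis` of a non-empty feature array."""
    rng = np.random.default_rng() if rng is None else rng
    out = feat.copy()
    size = feat.shape[axis]
    if mask_num <= 0:
        return out  # masking disabled
    width = int(rng.integers(0, min(mask_num, size)))  # t in [0, T)
    start = int(rng.integers(0, size - width))         # t_0 in [0, size - t)
    index = [slice(None)] * feat.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = 0.0  # assumption: zero fill
    return out


# Repeat the operation on each axis, as mask_cols describes (values are illustrative).
rng = np.random.default_rng(0)
feat = np.ones((100, 40, 1))
masked = feat
for _ in range(2):  # mask_cols = 2
    masked = mask_sketch(masked, axis=0, mask_num=30, rng=rng)  # time axis
    masked = mask_sketch(masked, axis=1, mask_num=15, rng=rng)  # frequency axis
```

Because each call works on a copy, the input `feat` is left unchanged and `masked` has the same shape with some time and frequency bands set to zero.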