athena.transform.feats.fbank
This module extracts Fbank features per frame.
Module Contents
Classes
Fbank | Computing filter banks applies triangular filters on a Mel scale to the power spectrum to extract frequency bands.
-
class athena.transform.feats.fbank.Fbank(config: dict)
Bases: athena.transform.feats.base_frontend.BaseFrontend
Computing filter banks applies triangular filters on a Mel scale to the power spectrum to extract frequency bands. Returns a float tensor with shape (num_channels, num_frames, num_frequencies).
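The description above can be illustrated with a minimal pure-Python sketch of triangular mel filters applied to one frame's power spectrum. This is not Athena's implementation: the function names (`hz_to_mel`, `mel_filterbank`, `apply_filterbank`), the mel formula, and the FFT size of 257 bins are assumptions chosen for illustration. Only the defaults of 23 channels, a 20 Hz lower limit, and the "offset from Nyquist" rule come from the documented parameters.

```python
import math


def hz_to_mel(hz):
    # HTK/Kaldi-style mel formula (assumed; the docs do not state which scale is used).
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_filterbank(num_bins, num_fft_bins, sample_rate, low_hz=20.0, high_hz=0.0):
    """Return num_bins triangular filters, each a list of num_fft_bins weights."""
    if high_hz <= 0:
        # Mirrors the documented rule: values <= 0 are offsets from Nyquist.
        high_hz = sample_rate / 2.0 + high_hz
    mel_low, mel_high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    # Filter edges and centers are equally spaced on the mel axis.
    step = (mel_high - mel_low) / (num_bins + 1)
    hz_per_bin = (sample_rate / 2.0) / (num_fft_bins - 1)
    filters = []
    for b in range(num_bins):
        left = mel_low + b * step
        center = left + step
        right = center + step
        weights = []
        for k in range(num_fft_bins):
            mel = hz_to_mel(k * hz_per_bin)
            if left < mel <= center:
                weights.append((mel - left) / (center - left))  # rising slope
            elif center < mel < right:
                weights.append((right - mel) / (right - center))  # falling slope
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def apply_filterbank(power_spectrum, filters):
    # Each output band is the filter-weighted sum of the power spectrum.
    return [sum(w * p for w, p in zip(f, power_spectrum)) for f in filters]
```

For example, `apply_filterbank([1.0] * 257, mel_filterbank(23, 257, 16000))` yields 23 band energies for a flat spectrum; a log would typically be applied afterwards to obtain log-mel features.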
-
classmethod params(cls, config=None)
Set params.
Parameters: config – contains the following optional parameters:
- window_length: Window length in seconds. (float, default = 0.025)
- frame_length: Hop length in seconds. (float, default = 0.010)
- snip_edges: If 1, the last frame (shorter than window_length) will be cut off. If 2, half a frame_length of data will be padded to the data. (int, default = 1)
- raw_energy: If 1, compute frame energy before preemphasis and windowing. If 2, compute frame energy after preemphasis and windowing. (int, default = 1)
- preEph_coeff: Coefficient for use in frame-signal preemphasis. (float, default = 0.97)
- window_type: Type of window ("hamm"|"hann"|"povey"|"rect"|"blac"|"tria"). (string, default = "povey")
- remove_dc_offset: Subtract mean from waveform on each frame. (bool, default = true)
- is_fbank: If true, compute the power spectrum without frame energy. If false, use the frame energy instead of the square of the constant component of the signal. (bool, default = true)
- is_log10: If true, apply log10 to fbank. If false, use the natural logarithm (loge). (bool, default = false)
- output_type: If 1, return power spectrum. If 2, return log-power spectrum. (int, default = 1)
- upper_frequency_limit: High cutoff frequency for mel bins (if <= 0, offset from Nyquist). (float, default = 0)
- lower_frequency_limit: Low cutoff frequency for mel bins. (float, default = 20)
- filterbank_channel_count: Number of triangular mel-frequency bins. (float, default = 23)
- dither: Dithering constant (0.0 means no dither). (float, default = 1) [adds robustness to training]
Returns: An object of class HParams, which is a set of hyperparameters as name-value pairs.
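To make a few of these parameters concrete, the sketch below shows how window_length and frame_length translate into sample counts, and how remove_dc_offset, preEph_coeff, and the "povey" window might act on a single frame. It is a hedged illustration, not Athena's code: the helper names (`samples`, `povey_window`, `prepare_frame`) are hypothetical, and the window and preemphasis formulas follow the Kaldi convention, which is assumed here rather than confirmed by this page.

```python
import math

# Documented defaults from params(); the dict itself is only an illustration.
DEFAULTS = {
    "window_length": 0.025,
    "frame_length": 0.010,
    "preEph_coeff": 0.97,
    "window_type": "povey",
    "remove_dc_offset": True,
}


def samples(seconds, sample_rate):
    # Convert a duration in seconds to a whole number of samples.
    return int(round(seconds * sample_rate))


def povey_window(n):
    # Kaldi-style "povey" window (assumed): a Hann window raised to the power 0.85.
    return [(0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))) ** 0.85
            for i in range(n)]


def prepare_frame(frame, coeff=DEFAULTS["preEph_coeff"], remove_dc_offset=True):
    if remove_dc_offset:
        # Subtract the frame mean so the frame has no constant component.
        mean = sum(frame) / len(frame)
        frame = [x - mean for x in frame]
    # Preemphasis: y[t] = x[t] - coeff * x[t-1]; the first sample reuses itself.
    emphasized = [frame[0] - coeff * frame[0]]
    emphasized += [frame[t] - coeff * frame[t - 1] for t in range(1, len(frame))]
    window = povey_window(len(frame))
    return [e * w for e, w in zip(emphasized, window)]
```

At a 16 kHz sample rate, `samples(0.025, 16000)` gives a 400-sample window and `samples(0.010, 16000)` gives a 160-sample hop.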
-
call(self, audio_data, sample_rate)
Calculate fbank features of audio data.
Parameters:
- audio_data – the audio signal from which to compute the spectrum. Should be a (1, N) tensor.
- sample_rate – the sample rate of the signal we are working with; default is 16 kHz.
Returns: A float tensor of size (num_channels, num_frames, num_frequencies) containing the fbank features of every frame in the speech.
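As a rough guide to the returned shape, the following sketch estimates num_frames from the number of input samples. It is an approximation under stated assumptions, not the library's logic: the helper names are hypothetical, the snip_edges = 2 branch assumes Kaldi-style padding, and num_channels = 1 and num_frequencies = filterbank_channel_count are assumed rather than documented here.

```python
def num_frames(num_samples, sample_rate=16000, window_length=0.025,
               frame_length=0.010, snip_edges=1):
    win = int(round(window_length * sample_rate))  # window size in samples
    hop = int(round(frame_length * sample_rate))   # hop size in samples
    if snip_edges == 1:
        # Only complete windows are kept; a shorter trailing frame is cut off.
        if num_samples < win:
            return 0
        return 1 + (num_samples - win) // hop
    # Assumed padded behaviour (Kaldi convention): about one frame per hop.
    return (num_samples + hop // 2) // hop


def expected_shape(num_samples, num_channels=1, filterbank_channel_count=23,
                   **kwargs):
    # (num_channels, num_frames, num_frequencies), under the assumptions above.
    return (num_channels, num_frames(num_samples, **kwargs),
            filterbank_channel_count)
```

Under these assumptions, one second of 16 kHz audio with the default parameters gives `expected_shape(16000) == (1, 98, 23)`.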
-
dim(self)
Return the dimension of fbank.
-
num_channels(self)
Return the number of channels of fbank.
-
classmethod