athena.transform.feats.fbank_pitch

This module extracts Fbank and pitch features per frame.

Module Contents

Classes

FbankPitch: Compute Fbank and pitch features separately and concatenate them.
class athena.transform.feats.fbank_pitch.FbankPitch(config: dict)

Bases: athena.transform.feats.base_frontend.BaseFrontend

Compute Fbank and pitch features separately, then concatenate them. Returns a tensor with shape (num_frames, dim_features).

classmethod params(cls, config=None)

Set params.

Parameters:config – contains twenty-nine optional parameters:

window_length: Window length in seconds. (float, default = 0.025)
frame_length: Hop length in seconds. (float, default = 0.010)
snip_edges: If 1, the last frame (shorter than window_length) will be cut off. If 2, half of frame_length of data will be padded to the signal. (int, default = 1)
raw_energy: If 1, compute frame energy before preemphasis and windowing. If 2, compute frame energy after preemphasis and windowing. (int, default = 1)
preEph_coeff: Coefficient for use in frame-signal preemphasis. (float, default = 0.97)
window_type: Type of window ("hamm"|"hann"|"povey"|"rect"|"blac"|"tria"). (string, default = "povey")
remove_dc_offset: Subtract mean from waveform on each frame. (bool, default = true)
is_fbank: If true, compute power spectrum without frame energy. If false, use the frame energy instead of the square of the constant component of the signal. (bool, default = true)
output_type: If 1, return power spectrum. If 2, return log-power spectrum. (int, default = 1)
upper_frequency_limit: High cutoff frequency for mel bins (if <= 0, offset from Nyquist). (float, default = 0)
lower_frequency_limit: Low cutoff frequency for mel bins. (float, default = 20)
filterbank_channel_count: Number of triangular mel-frequency bins. (float, default = 23)
dither: Dithering constant (0.0 means no dither). (float, default = 1) [adds robustness to training]
delta-pitch: Smallest relative change in pitch that our algorithm measures. (float, default = 0.005)
frames-per-chunk: Only relevant for offline pitch extraction (e.g. compute-kaldi-pitch-feats); you can set it to a small nonzero value, such as 10, for better feature compatibility with online decoding (affects energy normalization in the algorithm). (int, default = 0)
lowpass-cutoff: Cutoff frequency for the low-pass filter (Hz). (float, default = 1000)
lowpass-filter-width: Integer that determines the filter width of the low-pass filter; more gives a sharper filter. (int, default = 1)
max-f0: Maximum F0 to search for (Hz). (float, default = 400)
max-frames-latency: Maximum number of frames of latency that we allow pitch tracking to introduce into the feature processing (affects output only if --frames-per-chunk > 0 and --simulate-first-pass-online=true). (int, default = 0)
min-f0: Minimum F0 to search for (Hz). (float, default = 50)
nccf-ballast: Increasing this factor reduces NCCF for quiet frames. (float, default = 7000)
nccf-ballast-online: This is useful mainly for debug; it affects how the NCCF ballast is computed. (bool, default = false)
penalty-factor: Cost factor for F0 change. (float, default = 0.1)
preemphasis-coefficient: Coefficient for use in signal preemphasis (deprecated). (float, default = 0)
recompute-frame: Only relevant for online pitch extraction, or for compatibility with online pitch extraction. A non-critical parameter; the frame at which we recompute some of the forward pointers, after revising our estimate of the signal energy. Relevant if --frames-per-chunk > 0. (int, default = 500)
resample-frequency: Frequency that we down-sample the signal to. Must be more than twice lowpass-cutoff. (float, default = 4000)
simulate-first-pass-online: If true, compute-kaldi-pitch-feats will output features that correspond to what an online decoder would see in the first pass of decoding, not the final version of the features, which is the default. Relevant if --frames-per-chunk > 0. (bool, default = false)
soft-min-f0: Minimum F0, applied in a soft way; must not exceed min-f0. (float, default = 10)
upsample-filter-width: Integer that determines filter width when upsampling NCCF. (int, default = 5)

Returns:An object of class HParams, which is a set of hyperparameters as name-value pairs.
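
The parameter list states two cross-parameter constraints: resample-frequency must be more than twice lowpass-cutoff, and soft-min-f0 must not exceed min-f0. The sketch below checks them on a plain dict before it is handed to params. The helper name check_pitch_constraints is hypothetical and not part of athena, and the keys are spelled as they appear in the list above; the keys the library actually accepts may be spelled differently.

```python
def check_pitch_constraints(config):
    """Return a list of violated constraints for a plain config dict.

    Hypothetical helper, not part of athena. Keys follow the spelling used
    in the parameter list; missing keys fall back to the documented defaults.
    """
    # Defaults copied from the parameter list.
    defaults = {
        "lowpass-cutoff": 1000.0,
        "resample-frequency": 4000.0,
        "min-f0": 50.0,
        "soft-min-f0": 10.0,
    }
    merged = {**defaults, **config}

    problems = []
    # "Must be more than twice lowpass-cutoff" means strictly greater.
    if merged["resample-frequency"] <= 2 * merged["lowpass-cutoff"]:
        problems.append("resample-frequency must be more than twice lowpass-cutoff")
    # "Must not exceed min-f0" allows equality.
    if merged["soft-min-f0"] > merged["min-f0"]:
        problems.append("soft-min-f0 must not exceed min-f0")
    return problems


if __name__ == "__main__":
    print(check_pitch_constraints({}))                          # []
    print(check_pitch_constraints({"lowpass-cutoff": 2500.0}))  # one violation
```

With the documented defaults both constraints hold, so an empty config passes; raising lowpass-cutoff to 2500 without also raising resample-frequency is reported as a violation.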
call(self, audio_data, sample_rate)

Calculate the concatenated Fbank and pitch features of a waveform.

Parameters:audio_data – the audio signal from which to compute the spectrum. Should be a (1, N) tensor.
sample_rate – the sample rate of the signal we are working with.
Returns:A tensor with shape (num_frames, dim_features), containing the Fbank and pitch features of every frame in the speech.
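
As a rough guide to the num_frames dimension of the returned tensor, the following sketch estimates the frame count from the signal length, window_length, frame_length (the hop) and snip_edges. It assumes Kaldi-style framing, which matches the snip_edges description in params but is not confirmed by this page; the function estimate_num_frames is hypothetical and not part of athena.

```python
def estimate_num_frames(num_samples, sample_rate, window_length=0.025,
                        frame_length=0.010, snip_edges=1):
    """Estimate the number of frames, assuming Kaldi-style framing.

    Hypothetical helper, not part of athena. window_length and frame_length
    are in seconds, as in the params documentation (frame_length is the hop).
    """
    window = int(round(window_length * sample_rate))  # samples per window
    hop = int(round(frame_length * sample_rate))      # samples per hop

    if snip_edges == 1:
        # Only windows that fit entirely inside the signal are kept.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // hop

    # Otherwise the signal is padded, giving about one frame per hop.
    return (num_samples + hop // 2) // hop


if __name__ == "__main__":
    # One second of 16 kHz audio with the documented defaults.
    print(estimate_num_frames(16000, 16000))                # 98
    print(estimate_num_frames(16000, 16000, snip_edges=2))  # 100
```

For one second of 16 kHz audio the window is 400 samples and the hop is 160 samples, so the estimate is 98 frames with snip_edges = 1 and 100 frames with snip_edges = 2.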
dim(self)