`athena.transform.feats.fbank_pitch`

This module extracts Fbank and pitch features per frame.

Module Contents

Classes

FbankPitch Compute Fbank and pitch features separately and concatenate them. Returns a tensor with shape (num_frames, dim_features).

class athena.transform.feats.fbank_pitch.FbankPitch(config: dict)

Bases: athena.transform.feats.base_frontend.BaseFrontend

Compute Fbank and pitch features separately and concatenate them. Returns a tensor with shape (num_frames, dim_features).
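The concatenation step can be pictured with a minimal pure-Python sketch. The helper name `concat_features` and the toy values are hypothetical, and the pitch width of two values per frame is an assumption made for illustration; this page does not state it.

```python
def concat_features(fbank, pitch):
    """Join per-frame Fbank and pitch rows along the feature axis."""
    if len(fbank) != len(pitch):
        raise ValueError("fbank and pitch must have the same number of frames")
    return [list(f) + list(p) for f, p in zip(fbank, pitch)]


# Toy data: 3 frames, 4 Fbank bins and 2 pitch values per frame (assumed sizes).
fbank = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
pitch = [[0.9, 120.0] for _ in range(3)]

features = concat_features(fbank, pitch)
shape = (len(features), len(features[0]))  # (num_frames, dim_features)
print(shape)
```

The frame axis is left untouched; only the feature dimension grows, which is why both inputs must have the same number of frames.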

classmethod params(cls, config=None)

Set params.

Parameters:	config – contains twenty-nine optional parameters:

window_length: Window length in seconds. (float, default = 0.025)
frame_length: Hop length in seconds. (float, default = 0.010)
snip_edges: If 1, the last frame (shorter than window_length) will be cut off. If 2, half of frame_length worth of data will be padded onto the signal. (int, default = 1)
raw_energy: If 1, compute frame energy before preemphasis and windowing. If 2, compute frame energy after preemphasis and windowing. (int, default = 1)
preEph_coeff: Coefficient for use in frame-signal preemphasis. (float, default = 0.97)
window_type: Type of window ("hamm"|"hann"|"povey"|"rect"|"blac"|"tria"). (string, default = "povey")
remove_dc_offset: Subtract mean from waveform on each frame. (bool, default = true)
is_fbank: If true, compute the power spectrum without frame energy. If false, use the frame energy instead of the square of the constant component of the signal. (bool, default = true)
output_type: If 1, return power spectrum. If 2, return log-power spectrum. (int, default = 1)
upper_frequency_limit: High cutoff frequency for mel bins (if <= 0, offset from Nyquist). (float, default = 0)
lower_frequency_limit: Low cutoff frequency for mel bins. (float, default = 20)
filterbank_channel_count: Number of triangular mel-frequency bins. (float, default = 23)
dither: Dithering constant (0.0 means no dither). (float, default = 1) [adds robustness to training]

delta-pitch: Smallest relative change in pitch that the algorithm measures. (float, default = 0.005)
frames-per-chunk: Only relevant for offline pitch extraction (e.g. compute-kaldi-pitch-feats); you can set it to a small nonzero value, such as 10, for better feature compatibility with online decoding (affects energy normalization in the algorithm). (int, default = 0)
lowpass-cutoff: Cutoff frequency for the low-pass filter (Hz). (float, default = 1000)
lowpass-filter-width: Integer that determines the filter width of the low-pass filter; more gives a sharper filter. (int, default = 1)
max-f0: Maximum F0 to search for (Hz). (float, default = 400)
max-frames-latency: Maximum number of frames of latency that pitch tracking is allowed to introduce into the feature processing (affects output only if --frames-per-chunk > 0 and --simulate-first-pass-online=true). (int, default = 0)
min-f0: Minimum F0 to search for (Hz). (float, default = 50)
nccf-ballast: Increasing this factor reduces NCCF for quiet frames. (float, default = 7000)
nccf-ballast-online: Useful mainly for debugging; it affects how the NCCF ballast is computed. (bool, default = false)
penalty-factor: Cost factor for F0 change. (float, default = 0.1)
preemphasis-coefficient: Coefficient for use in signal preemphasis (deprecated). (float, default = 0)
recompute-frame: Only relevant for online pitch extraction, or for compatibility with online pitch extraction. A non-critical parameter; the frame at which some of the forward pointers are recomputed, after revising the estimate of the signal energy. Relevant if --frames-per-chunk > 0. (int, default = 500)
resample-frequency: Frequency that the signal is down-sampled to. Must be more than twice lowpass-cutoff. (float, default = 4000)
simulate-first-pass-online: If true, compute-kaldi-pitch-feats will output features that correspond to what an online decoder would see in the first pass of decoding, not the final version of the features, which is the default. Relevant if --frames-per-chunk > 0. (bool, default = false)
soft-min-f0: Minimum F0, applied in a soft way; must not exceed min-f0. (float, default = 10)
upsample-filter-width: Integer that determines the filter width when upsampling NCCF. (int, default = 5)

Returns:	An object of class HParams, which is a set of hyperparameters as name-value pairs.
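A few of the constraints stated in the parameter list can be checked before a configuration is used. The following is a minimal sketch, not Athena code: `resolve_options` and the `effective_upper_hz` key are hypothetical, the option names are copied as they are spelled in the list, and whether `params` accepts exactly these spellings is an assumption.

```python
# Documented defaults for a few of the options (names as spelled in the list).
DEFAULTS = {
    "upper_frequency_limit": 0.0,
    "lower_frequency_limit": 20.0,
    "lowpass-cutoff": 1000.0,
    "resample-frequency": 4000.0,
    "min-f0": 50.0,
    "soft-min-f0": 10.0,
}


def resolve_options(overrides, sample_rate):
    """Hypothetical helper: merge overrides and check documented constraints."""
    opts = dict(DEFAULTS)
    opts.update(overrides or {})

    # resample-frequency must be more than twice lowpass-cutoff.
    if opts["resample-frequency"] <= 2 * opts["lowpass-cutoff"]:
        raise ValueError("resample-frequency must exceed 2 * lowpass-cutoff")

    # soft-min-f0 must not exceed min-f0.
    if opts["soft-min-f0"] > opts["min-f0"]:
        raise ValueError("soft-min-f0 must not exceed min-f0")

    # A non-positive upper limit is read as an offset from the Nyquist frequency.
    nyquist = sample_rate / 2.0
    upper = opts["upper_frequency_limit"]
    opts["effective_upper_hz"] = upper if upper > 0 else nyquist + upper
    return opts


opts = resolve_options({"upper_frequency_limit": -200.0}, sample_rate=16000)
print(opts["effective_upper_hz"])  # 7800.0
```

With the default `upper_frequency_limit` of 0 and a 16 kHz signal, the same helper resolves the upper cutoff to the Nyquist frequency, 8000 Hz.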

call(self, audio_data, sample_rate)

Calculate the concatenated Fbank and pitch features of a wav.

Parameters:
audio_data – the audio signal from which to compute the spectrum. Should be a (1, N) tensor.
sample_rate – the sample rate of the signal being processed.
Returns:	A tensor with shape (num_frames, dim_features), containing the Fbank and pitch features of every frame in the speech.
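To get a feel for the `num_frames` dimension of the returned tensor, the sketch below estimates it from the signal length N, `window_length`, and `frame_length`. It assumes the Kaldi framing convention for the two `snip_edges` modes; the helper name `num_frames` is hypothetical, and Athena's own implementation may differ at the edges.

```python
def num_frames(num_samples, sample_rate, window_length=0.025,
               frame_length=0.010, snip_edges=1):
    """Estimate the frame count under the Kaldi framing convention (assumed)."""
    window = int(round(window_length * sample_rate))  # samples per window
    hop = int(round(frame_length * sample_rate))      # samples per hop
    if snip_edges == 1:
        # Only windows that fit entirely inside the signal are kept.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // hop
    # Otherwise the signal is padded, giving roughly one frame per hop.
    return (num_samples + hop // 2) // hop


# One second of 16 kHz audio with the documented default window and hop.
print(num_frames(16000, 16000))                # 98
print(num_frames(16000, 16000, snip_edges=2))  # 100
```

A signal shorter than one window yields no frames when `snip_edges` is 1, so very short inputs are worth checking before feature extraction.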

dim(self)
