athena.transform.feats.pitch
This module extracts pitch features per frame.
Module Contents
Classes
Pitch | Compute pitch features of every frame in speech, return a float tensor with size (num_frames, 2). |
-
class athena.transform.feats.pitch.Pitch(config: dict)
Bases: athena.transform.feats.base_frontend.BaseFrontend
Compute pitch features of every frame in speech, return a float tensor with size (num_frames, 2).
-
classmethod params(cls, config=None)
Set params.
Parameters: config – a dict of optional parameters:
- delta-pitch: Smallest relative change in pitch that our algorithm measures. (float, default = 0.005)
- window_length: Frame length in seconds. (float, default = 0.025)
- frame_length: Frame shift in seconds. (float, default = 0.010)
- frames-per-chunk: Only relevant for offline pitch extraction (e.g. compute-kaldi-pitch-feats); you can set it to a small nonzero value, such as 10, for better feature compatibility with online decoding (affects energy normalization in the algorithm). (int, default = 0)
- lowpass-cutoff: Cutoff frequency for the lowpass filter (Hz). (float, default = 1000)
- lowpass-filter-width: Integer that determines the filter width of the lowpass filter; more gives a sharper filter. (int, default = 1)
- max-f0: Maximum F0 to search for (Hz). (float, default = 400)
- max-frames-latency: Maximum number of frames of latency that we allow pitch tracking to introduce into the feature processing (affects output only if --frames-per-chunk > 0 and --simulate-first-pass-online=true). (int, default = 0)
- min-f0: Minimum F0 to search for (Hz). (float, default = 50)
- nccf-ballast: Increasing this factor reduces NCCF for quiet frames. (float, default = 7000)
- nccf-ballast-online: This is useful mainly for debugging; it affects how the NCCF ballast is computed. (bool, default = false)
- penalty-factor: Cost factor for F0 change. (float, default = 0.1)
- preemphasis-coefficient: Coefficient for use in signal preemphasis (deprecated). (float, default = 0)
- recompute-frame: Only relevant for online pitch extraction, or for compatibility with online pitch extraction. A non-critical parameter; the frame at which we recompute some of the forward pointers, after revising our estimate of the signal energy. Relevant if --frames-per-chunk > 0. (int, default = 500)
- resample-frequency: Frequency that we down-sample the signal to. Must be more than twice lowpass-cutoff. (float, default = 4000)
- simulate-first-pass-online: If true, compute-kaldi-pitch-feats will output features that correspond to what an online decoder would see in the first pass of decoding, not the final version of the features, which is the default. Relevant if --frames-per-chunk > 0. (bool, default = false)
- snip-edges: If this is set to false, the incomplete frames near the ending edge won’t be snipped, so that the number of frames is the file size divided by the frame-shift. This makes different types of features give the same number of frames. (bool, default = true)
- soft-min-f0: Minimum F0, applied in a soft way; must not exceed min-f0. (float, default = 10)
- upsample-filter-width: Integer that determines the filter width when upsampling NCCF. (int, default = 5)
- add-delta-pitch: If true, time derivative of log-pitch is added to output features. (bool, default = true)
- add-pov-feature: If true, the warped NCCF is added to output features. (bool, default = true)
- add-raw-log-pitch: If true, log(pitch) is added to output features. (bool, default = false)
- delay: Number of frames by which the pitch information is delayed. (int, default = 0)
- delta-pitch-noise-stddev: Standard deviation for noise we add to the delta log-pitch (before scaling); should be about the same as delta-pitch option to pitch creation. The purpose is to get rid of peaks in the delta-pitch caused by discretization of pitch values. (float, default = 0.005)
- delta-pitch-scale: Term to scale the final delta log-pitch feature. (float, default = 10)
- delta-window: Number of frames on each side of central frame, to use for delta window. (int, default = 2)
- normalization-left-context: Left-context (in frames) for moving window normalization. (int, default = 75)
- normalization-right-context: Right-context (in frames) for moving window normalization. (int, default = 75)
- pitch-scale: Scaling factor for the final normalized log-pitch value. (float, default = 2)
- pov-offset: This can be used to add an offset to the POV feature. Intended for use in online decoding as a substitute for CMN. (float, default = 0)
- pov-scale: Scaling factor for final POV (probability of voicing) feature. (float, default = 2)
Returns: An object of class HParams, which is a set of hyperparameters as name-value pairs.
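Two of the options above carry explicit constraints: resample-frequency must be more than twice lowpass-cutoff, and soft-min-f0 must not exceed min-f0. A minimal sketch of checking both before building a config is given here. The helper check_pitch_config is hypothetical and not part of athena, and the underscore key spelling (resample_frequency rather than resample-frequency) is an assumption about how the config dict is keyed.

```python
# Hypothetical helper, not part of athena: checks the two constraints
# stated in the parameter list. Underscore key names are an assumption.
DEFAULTS = {
    "lowpass_cutoff": 1000.0,
    "resample_frequency": 4000.0,
    "min_f0": 50.0,
    "soft_min_f0": 10.0,
}


def check_pitch_config(config=None):
    """Merge config over the documented defaults and return a list of problems."""
    merged = dict(DEFAULTS)
    merged.update(config or {})
    problems = []
    if merged["resample_frequency"] <= 2 * merged["lowpass_cutoff"]:
        problems.append("resample_frequency must be more than twice lowpass_cutoff")
    if merged["soft_min_f0"] > merged["min_f0"]:
        problems.append("soft_min_f0 must not exceed min_f0")
    return problems


# The documented defaults satisfy both constraints.
print(check_pitch_config())
# Raising the cutoff to 2500 Hz violates the first one (4000 <= 2 * 2500).
print(check_pitch_config({"lowpass_cutoff": 2500}))
```

An empty list means the config respects both documented constraints; whether athena itself rejects or silently accepts a violating config is not stated in this reference.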
-
call(self, audio_data, sample_rate)
Calculate pitch features of audio data.
Parameters:
- audio_data – the audio signal from which to compute pitch features. Should be a (1, N) tensor.
- sample_rate – the sample rate of the signal we are working with.
Returns: A float tensor of size (num_frames, 2) containing pitch and POV features of every frame in speech.
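The num_frames dimension of the returned tensor depends on window_length, frame_length (the frame shift) and snip-edges. The sketch below estimates it with the usual Kaldi-style framing rule. That rule and the helper name expected_num_frames are assumptions, not taken from the athena source, so the count athena actually produces may differ slightly.

```python
def expected_num_frames(num_samples, sample_rate,
                        window_length=0.025, frame_length=0.010,
                        snip_edges=True):
    """Estimate the frame count (hypothetical helper, Kaldi-style rule assumed)."""
    window = int(round(window_length * sample_rate))  # frame length in samples
    shift = int(round(frame_length * sample_rate))    # frame shift in samples
    if snip_edges:
        # Only frames that fit entirely inside the signal are kept.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // shift
    # Incomplete trailing frames are kept: roughly num_samples / shift.
    return (num_samples + shift // 2) // shift


# One second of 16 kHz audio with the default 25 ms window and 10 ms shift.
print(expected_num_frames(16000, 16000, snip_edges=True))
print(expected_num_frames(16000, 16000, snip_edges=False))
```

With these inputs the sketch prints 98 and 100. The second figure matches the description of snip-edges = false, where the number of frames is the signal length divided by the frame shift.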
-
dim(self)
Return the dimension of the output features.
-
classmethod