Files | |
file | resample.h |
Classes | |
struct | ExampleFeatureComputerOptions |
This class is only added for documentation; it is not intended to ever be used. More... | |
class | ExampleFeatureComputer |
This class is only added for documentation; it is not intended to ever be used. More... | |
class | OfflineFeatureTpl< F > |
This templated class is intended for offline feature extraction, i.e. More... | |
struct | FbankOptions |
FbankOptions contains basic options for computing filterbank features. More... | |
class | FbankComputer |
Class for computing mel-filterbank features; see Computing MFCC features for more information. More... | |
struct | DeltaFeaturesOptions |
class | DeltaFeatures |
struct | ShiftedDeltaFeaturesOptions |
class | ShiftedDeltaFeatures |
struct | SlidingWindowCmnOptions |
struct | MfccOptions |
MfccOptions contains basic options for computing MFCC features. More... | |
class | MfccComputer |
struct | PlpOptions |
PlpOptions contains basic options for computing PLP features. More... | |
class | PlpComputer |
This is the new-style interface to the PLP computation. More... | |
struct | SpectrogramOptions |
SpectrogramOptions contains basic options for computing spectrogram features. More... | |
class | SpectrogramComputer |
Class for computing spectrogram features. More... | |
struct | FrameExtractionOptions |
struct | FeatureWindowFunction |
struct | MelBanksOptions |
class | MelBanks |
struct | PitchExtractionOptions |
struct | ProcessPitchOptions |
class | OnlinePitchFeature |
class | OnlineProcessPitch |
This online-feature class implements post processing of pitch features. More... | |
class | ArbitraryResample |
Class ArbitraryResample allows you to resample a signal (assumed zero outside the sample region, not periodic) at arbitrary specified time values, which don't have to be linearly spaced. More... | |
class | LinearResample |
LinearResample is a special case of ArbitraryResample, where we want to resample a signal at linearly spaced intervals (this means we want to upsample or downsample the signal). More... | |
Typedefs | |
typedef OfflineFeatureTpl< FbankComputer > | Fbank |
typedef OfflineFeatureTpl< MfccComputer > | Mfcc |
typedef OfflineFeatureTpl< PlpComputer > | Plp |
typedef OfflineFeatureTpl< SpectrogramComputer > | Spectrogram |
Functions | |
void | ComputePowerSpectrum (VectorBase< BaseFloat > *waveform) |
void | ComputeDeltas (const DeltaFeaturesOptions &delta_opts, const MatrixBase< BaseFloat > &input_features, Matrix< BaseFloat > *output_features) |
void | ComputeShiftedDeltas (const ShiftedDeltaFeaturesOptions &delta_opts, const MatrixBase< BaseFloat > &input_features, Matrix< BaseFloat > *output_features) |
void | SpliceFrames (const MatrixBase< BaseFloat > &input_features, int32 left_context, int32 right_context, Matrix< BaseFloat > *output_features) |
void | ReverseFrames (const MatrixBase< BaseFloat > &input_features, Matrix< BaseFloat > *output_features) |
void | InitIdftBases (int32 n_bases, int32 dimension, Matrix< BaseFloat > *mat_out) |
void | SlidingWindowCmn (const SlidingWindowCmnOptions &opts, const MatrixBase< BaseFloat > &input, MatrixBase< BaseFloat > *output) |
Applies sliding-window cepstral mean and/or variance normalization. More... | |
int32 | NumFrames (int64 num_samples, const FrameExtractionOptions &opts, bool flush=true) |
This function returns the number of frames that we can extract from a wave file with the given number of samples in it (assumed to have the same sampling rate as specified in 'opts'). More... | |
int64 | FirstSampleOfFrame (int32 frame, const FrameExtractionOptions &opts) |
void | Dither (VectorBase< BaseFloat > *waveform, BaseFloat dither_value) |
void | Preemphasize (VectorBase< BaseFloat > *waveform, BaseFloat preemph_coeff) |
void | ProcessWindow (const FrameExtractionOptions &opts, const FeatureWindowFunction &window_function, VectorBase< BaseFloat > *window, BaseFloat *log_energy_pre_window=NULL) |
This function does all the windowing steps after actually extracting the windowed signal: depending on the configuration, it does dithering, dc offset removal, preemphasis, and multiplication by the windowing function. More... | |
void | ExtractWindow (int64 sample_offset, const VectorBase< BaseFloat > &wave, int32 f, const FrameExtractionOptions &opts, const FeatureWindowFunction &window_function, Vector< BaseFloat > *window, BaseFloat *log_energy_pre_window) |
void | ComputeLifterCoeffs (BaseFloat Q, VectorBase< BaseFloat > *coeffs) |
BaseFloat | Durbin (int n, const BaseFloat *pAC, BaseFloat *pLP, BaseFloat *pTmp) |
BaseFloat | ComputeLpc (const VectorBase< BaseFloat > &autocorr_in, Vector< BaseFloat > *lpc_out) |
void | Lpc2Cepstrum (int n, const BaseFloat *pLPC, BaseFloat *pCepst) |
void | GetEqualLoudnessVector (const MelBanks &mel_banks, Vector< BaseFloat > *ans) |
void | ComputeKaldiPitch (const PitchExtractionOptions &opts, const VectorBase< BaseFloat > &wave, Matrix< BaseFloat > *output) |
This function extracts (pitch, NCCF) per frame, using the pitch extraction method described in "A Pitch Extraction Algorithm Tuned for Automatic Speech Recognition", Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal and Sanjeev Khudanpur, ICASSP 2014. More... | |
void | ProcessPitch (const ProcessPitchOptions &opts, const MatrixBase< BaseFloat > &input, Matrix< BaseFloat > *output) |
This function takes the raw (NCCF, pitch) quantities computed by ComputeKaldiPitch and processes them into features. More... | |
void | ComputeAndProcessKaldiPitch (const PitchExtractionOptions &pitch_opts, const ProcessPitchOptions &process_opts, const VectorBase< BaseFloat > &wave, Matrix< BaseFloat > *output) |
This function combines ComputeKaldiPitch and ProcessPitch. More... | |
void | ResampleWaveform (BaseFloat orig_freq, const VectorBase< BaseFloat > &wave, BaseFloat new_freq, Vector< BaseFloat > *new_wave) |
Downsample or upsample a waveform. More... | |
void | DownsampleWaveForm (BaseFloat orig_freq, const VectorBase< BaseFloat > &wave, BaseFloat new_freq, Vector< BaseFloat > *new_wave) |
This function is deprecated. More... | |
typedef OfflineFeatureTpl<FbankComputer> Fbank |
Definition at line 143 of file feature-fbank.h.
typedef OfflineFeatureTpl<MfccComputer> Mfcc |
Definition at line 147 of file feature-mfcc.h.
typedef OfflineFeatureTpl<PlpComputer> Plp |
Definition at line 169 of file feature-plp.h.
typedef OfflineFeatureTpl<SpectrogramComputer> Spectrogram |
Definition at line 122 of file feature-spectrogram.h.
void ComputeAndProcessKaldiPitch | ( | const PitchExtractionOptions & | pitch_opts, |
const ProcessPitchOptions & | process_opts, | ||
const VectorBase< BaseFloat > & | wave, | ||
Matrix< BaseFloat > * | output | ||
) |
This function combines ComputeKaldiPitch and ProcessPitch.
A separate function is needed so that the online pitch processing can be simulated accurately, both for testing and for training models matched to the "first-pass" features. It is sensitive to the variables in pitch_opts that relate to online processing, i.e. max_frames_latency, frames_per_chunk, simulate_first_pass_online, recompute_frame.
Definition at line 1597 of file pitch-functions.cc.
References OnlinePitchFeature::AcceptWaveform(), VectorBase< Real >::Dim(), OnlineProcessPitch::Dim(), PitchExtractionOptions::frame_shift_ms, PitchExtractionOptions::frames_per_chunk, OnlineProcessPitch::GetFrame(), OnlinePitchFeature::InputFinished(), KALDI_ASSERT, KALDI_WARN, kaldi::kCopyData, OnlineProcessPitch::NumFramesReady(), Matrix< Real >::Resize(), MatrixBase< Real >::RowRange(), PitchExtractionOptions::samp_freq, and PitchExtractionOptions::simulate_first_pass_online.
Referenced by main(), kaldi::UnitTestSimple(), and kaldi::UnitTestSnipEdges().
void ComputeDeltas | ( | const DeltaFeaturesOptions & | delta_opts, |
const MatrixBase< BaseFloat > & | input_features, | ||
Matrix< BaseFloat > * | output_features | ||
) |
Definition at line 160 of file feature-functions.cc.
References MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), DeltaFeaturesOptions::order, DeltaFeatures::Process(), and Matrix< Real >::Resize().
Referenced by OnlineProcessPitch::GetDeltaPitchFeature(), main(), kaldi::TestOnlineDeltaFeature(), kaldi::TestOnlineDeltaInput(), UnitTestCompareWithDeltaFeatures(), UnitTestHTKCompare1(), UnitTestHTKCompare2(), UnitTestHTKCompare3(), UnitTestHTKCompare4(), UnitTestHTKCompare5(), and UnitTestHTKCompare6().
void ComputeKaldiPitch | ( | const PitchExtractionOptions & | opts, |
const VectorBase< BaseFloat > & | wave, | ||
Matrix< BaseFloat > * | output | ||
) |
This function extracts (pitch, NCCF) per frame, using the pitch extraction method described in "A Pitch Extraction Algorithm Tuned for Automatic Speech Recognition", Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal and Sanjeev Khudanpur, ICASSP 2014.
The output will have as many rows as there are frames, and two columns corresponding to (NCCF, pitch).
Definition at line 1291 of file pitch-functions.cc.
References OnlinePitchFeature::AcceptWaveform(), kaldi::ComputeKaldiPitchFirstPass(), VectorBase< Real >::Dim(), PitchExtractionOptions::frame_shift_ms, PitchExtractionOptions::frames_per_chunk, OnlinePitchFeature::GetFrame(), OnlinePitchFeature::InputFinished(), KALDI_ASSERT, KALDI_WARN, OnlinePitchFeature::NumFramesReady(), Matrix< Real >::Resize(), PitchExtractionOptions::samp_freq, and PitchExtractionOptions::simulate_first_pass_online.
Referenced by main(), kaldi::UnitTestDiffSampleRate(), kaldi::UnitTestKeele(), kaldi::UnitTestKeeleNccfBallast(), kaldi::UnitTestPenaltyFactor(), kaldi::UnitTestPieces(), kaldi::UnitTestPitchExtractionSpeed(), kaldi::UnitTestPitchExtractorCompareKeele(), kaldi::UnitTestProcess(), and kaldi::UnitTestSearch().
void ComputeLifterCoeffs | ( | BaseFloat | Q, |
VectorBase< BaseFloat > * | coeffs | ||
) |
Definition at line 253 of file mel-computations.cc.
References VectorBase< Real >::Dim(), rnnlm::i, and M_PI.
Referenced by MfccComputer::MfccComputer(), and PlpComputer::PlpComputer().
BaseFloat ComputeLpc | ( | const VectorBase< BaseFloat > & | autocorr_in, |
Vector< BaseFloat > * | lpc_out | ||
) |
Definition at line 326 of file mel-computations.cc.
References VectorBase< Real >::Data(), VectorBase< Real >::Dim(), kaldi::Durbin(), KALDI_ASSERT, KALDI_WARN, kaldi::Log(), and rnnlm::n.
Referenced by PlpComputer::Compute().
void ComputePowerSpectrum | ( | VectorBase< BaseFloat > * | waveform | ) |
Definition at line 29 of file feature-functions.cc.
References VectorBase< Real >::Dim(), and rnnlm::i.
Referenced by SpectrogramComputer::Compute(), MfccComputer::Compute(), FbankComputer::Compute(), and PlpComputer::Compute().
void ComputeShiftedDeltas | ( | const ShiftedDeltaFeaturesOptions & | delta_opts, |
const MatrixBase< BaseFloat > & | input_features, | ||
Matrix< BaseFloat > * | output_features | ||
) |
Definition at line 173 of file feature-functions.cc.
References ShiftedDeltaFeaturesOptions::num_blocks, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), ShiftedDeltaFeatures::Process(), and Matrix< Real >::Resize().
Referenced by main(), UnitTestCompareWithDeltaFeatures(), UnitTestEndEffects(), and UnitTestParams().
void Dither | ( | VectorBase< BaseFloat > * | waveform, |
BaseFloat | dither_value | ||
) |
Definition at line 90 of file feature-window.cc.
References VectorBase< Real >::Data(), VectorBase< Real >::Dim(), rnnlm::i, and kaldi::RandGauss().
Referenced by kaldi::ProcessWindow().
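void DownsampleWaveForm | ( | BaseFloat | orig_freq, |
const VectorBase< BaseFloat > & | wave, | ||
BaseFloat | new_freq, | ||
Vector< BaseFloat > * | new_wave | ||
) |
inline |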
This function is deprecated.
It is provided for backward compatibility, to avoid breaking older code.
Definition at line 279 of file resample.h.
References kaldi::ResampleWaveform().
BaseFloat Durbin | ( | int | n, |
const BaseFloat * | pAC, | ||
BaseFloat * | pLP, | ||
BaseFloat * | pTmp | ||
) |
Definition at line 267 of file mel-computations.cc.
References rnnlm::i, rnnlm::j, and rnnlm::n.
Referenced by kaldi::ComputeLpc().
void ExtractWindow | ( | int64 | sample_offset, |
const VectorBase< BaseFloat > & | wave, | ||
int32 | f, | ||
const FrameExtractionOptions & | opts, | ||
const FeatureWindowFunction & | window_function, | ||
Vector< BaseFloat > * | window, | ||
BaseFloat * | log_energy_pre_window | ||
) |
Definition at line 166 of file feature-window.cc.
References VectorBase< Real >::Dim(), kaldi::FirstSampleOfFrame(), KALDI_ASSERT, kaldi::kUndefined, FrameExtractionOptions::PaddedWindowSize(), kaldi::ProcessWindow(), VectorBase< Real >::Range(), Vector< Real >::Resize(), FrameExtractionOptions::snip_edges, and FrameExtractionOptions::WindowSize().
Referenced by OfflineFeatureTpl< F >::Compute(), and OnlineGenericBaseFeature< C >::ComputeFeatures().
int64 FirstSampleOfFrame | ( | int32 | frame, |
const FrameExtractionOptions & | opts | ||
) |
Definition at line 30 of file feature-window.cc.
References FrameExtractionOptions::snip_edges, FrameExtractionOptions::WindowShift(), and FrameExtractionOptions::WindowSize().
Referenced by OnlineGenericBaseFeature< C >::ComputeFeatures(), kaldi::ExtractWindow(), and kaldi::NumFrames().
void GetEqualLoudnessVector | ( | const MelBanks & | mel_banks, |
Vector< BaseFloat > * | ans | ||
) |
Definition at line 311 of file mel-computations.cc.
References MelBanks::GetCenterFreqs(), rnnlm::i, rnnlm::n, MelBanks::NumBins(), and Vector< Real >::Resize().
Referenced by PlpComputer::GetEqualLoudness().
void InitIdftBases | ( | int32 | n_bases, |
int32 | dimension, | ||
Matrix< BaseFloat > * | mat_out | ||
) |
Definition at line 188 of file feature-functions.cc.
References rnnlm::i, rnnlm::j, M_PI, and Matrix< Real >::Resize().
Referenced by PlpComputer::PlpComputer().
void Lpc2Cepstrum | ( | int | n, |
const BaseFloat * | pLPC, | ||
BaseFloat * | pCepst | ||
) |
Definition at line 300 of file mel-computations.cc.
References rnnlm::i, rnnlm::j, and rnnlm::n.
Referenced by PlpComputer::Compute().
int32 NumFrames | ( | int64 | num_samples, |
const FrameExtractionOptions & | opts, | ||
bool | flush = true |
||
) |
This function returns the number of frames that we can extract from a wave file with the given number of samples in it (assumed to have the same sampling rate as specified in 'opts').
[in] | num_samples | The number of samples in the wave file. |
[in] | opts | The frame-extraction options class |
[in] | flush | True if we are asserting that this number of samples is 'all there is', false if we are expecting more data to possibly come in. This only makes a difference to the answer if opts.snip_edges == false. For offline feature extraction you always want flush == true. In an online-decoding context, once you know (or decide) that no more data is coming in, you'd call it with flush == true at the end to flush out any remaining data. |
Definition at line 42 of file feature-window.cc.
References kaldi::FirstSampleOfFrame(), FrameExtractionOptions::snip_edges, FrameExtractionOptions::WindowShift(), and FrameExtractionOptions::WindowSize().
Referenced by OfflineFeatureTpl< F >::Compute(), OnlineFeInput< E >::Compute(), OnlineGenericBaseFeature< C >::ComputeFeatures(), and OnlineIvectorFeature::PrintDiagnostics().
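The role of snip_edges and flush described above can be illustrated with a standalone sketch. It does not use the library's types: FrameOptsSketch, FirstSampleOfFrameSketch and NumFramesSketch are hypothetical names, window sizes are given directly in samples, and the rounding conventions are assumptions chosen to match the description rather than a copy of feature-window.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the frame-extraction options; sizes are in samples.
struct FrameOptsSketch {
  int64_t window_size;   // samples per frame, e.g. 400 for 25 ms at 16 kHz
  int64_t window_shift;  // samples between frame starts, e.g. 160 for 10 ms
  bool snip_edges;       // true: keep only frames that fit entirely in the data
};

// Index of the first sample of a frame. With snip_edges == false the frame is
// centered on its shift interval, so the index can be negative for frame 0.
int64_t FirstSampleOfFrameSketch(int32_t frame, const FrameOptsSketch &opts) {
  if (opts.snip_edges) return frame * opts.window_shift;
  int64_t midpoint = opts.window_shift * frame + opts.window_shift / 2;
  return midpoint - opts.window_size / 2;
}

// Number of frames available from num_samples samples (assumed rounding rules).
int32_t NumFramesSketch(int64_t num_samples, const FrameOptsSketch &opts,
                        bool flush = true) {
  assert(opts.window_size > 0 && opts.window_shift > 0 && num_samples >= 0);
  if (opts.snip_edges) {
    // Only whole windows count; flush makes no difference here.
    if (num_samples < opts.window_size) return 0;
    return static_cast<int32_t>(
        1 + (num_samples - opts.window_size) / opts.window_shift);
  }
  int32_t num_frames = static_cast<int32_t>(
      (num_samples + opts.window_shift / 2) / opts.window_shift);
  if (flush) return num_frames;
  // More data may still arrive: hold back frames whose window would extend
  // past the samples seen so far.
  int64_t end_of_last =
      FirstSampleOfFrameSketch(num_frames - 1, opts) + opts.window_size;
  while (num_frames > 0 && end_of_last > num_samples) {
    --num_frames;
    end_of_last -= opts.window_shift;
  }
  return num_frames;
}
```

With a 400-sample window and a 160-sample shift, 16000 samples give 98 frames under this sketch when snip_edges is true, 100 frames when snip_edges is false and flush is true, and 99 frames when snip_edges is false and flush is false.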
void Preemphasize | ( | VectorBase< BaseFloat > * | waveform, |
BaseFloat | preemph_coeff | ||
) |
Definition at line 101 of file feature-window.cc.
References VectorBase< Real >::Dim(), rnnlm::i, and KALDI_ASSERT.
Referenced by kaldi::ProcessWindow().
void ProcessPitch | ( | const ProcessPitchOptions & | opts, |
const MatrixBase< BaseFloat > & | input, | ||
Matrix< BaseFloat > * | output | ||
) |
This function takes the raw (NCCF, pitch) quantities computed by ComputeKaldiPitch and processes them into features.
By default it will output three-dimensional features, (POV-feature, mean-subtracted-log-pitch, delta-of-raw-pitch), but this is configurable in the options. The number of rows of "output" will be the number of frames (rows) in "input", and the number of columns will be the number of different types of features requested (by default, 3; 4 is the max). The four config variables --add-pov-feature, --add-normalized-log-pitch, --add-delta-pitch, --add-raw-log-pitch determine which features we create; by default we create the first three.
Definition at line 1581 of file pitch-functions.cc.
References OnlineProcessPitch::Dim(), OnlineProcessPitch::GetFrame(), OnlineProcessPitch::NumFramesReady(), and Matrix< Real >::Resize().
Referenced by main(), kaldi::UnitTestPieces(), and kaldi::UnitTestProcess().
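As a small illustration of how the four config variables determine the number of output columns, the following standalone sketch counts the enabled feature types. PitchFeatureFlagsSketch and OutputDimSketch are hypothetical names; only the flag names and their defaults are taken from the description above.

```cpp
// Hypothetical mirror of the four feature-selection flags; defaults follow the
// text above (the first three enabled, raw log pitch disabled).
struct PitchFeatureFlagsSketch {
  bool add_pov_feature = true;
  bool add_normalized_log_pitch = true;
  bool add_delta_pitch = true;
  bool add_raw_log_pitch = false;
};

// Number of columns in the processed output: one per enabled feature type.
int OutputDimSketch(const PitchFeatureFlagsSketch &flags) {
  return static_cast<int>(flags.add_pov_feature) +
         static_cast<int>(flags.add_normalized_log_pitch) +
         static_cast<int>(flags.add_delta_pitch) +
         static_cast<int>(flags.add_raw_log_pitch);
}
```

A default-constructed PitchFeatureFlagsSketch yields 3, and enabling add_raw_log_pitch as well yields the maximum of 4.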
void ProcessWindow | ( | const FrameExtractionOptions & | opts, |
const FeatureWindowFunction & | window_function, | ||
VectorBase< BaseFloat > * | window, | ||
BaseFloat * | log_energy_pre_window = NULL |
||
) |
This function does all the windowing steps after actually extracting the windowed signal: depending on the configuration, it does dithering, dc offset removal, preemphasis, and multiplication by the windowing function.
[in] | opts | The options class to be used |
[in] | window_function | The windowing function; it should have been initialized using 'opts'. |
[in,out] | window | A vector of size opts.WindowSize(). Note: it will typically be a sub-vector of a larger vector of size opts.PaddedWindowSize(), with the remaining samples zero, as the FFT code is more efficient if it operates on data with power-of-two size. |
[out] | log_energy_pre_window | If non-NULL, then after dithering and DC offset removal, this function will write to this pointer the log of the total energy (i.e. sum-squared) of the frame. |
Definition at line 137 of file feature-window.cc.
References VectorBase< Real >::Add(), VectorBase< Real >::Dim(), FrameExtractionOptions::dither, kaldi::Dither(), KALDI_ASSERT, kaldi::Log(), VectorBase< Real >::MulElements(), FrameExtractionOptions::preemph_coeff, kaldi::Preemphasize(), FrameExtractionOptions::remove_dc_offset, VectorBase< Real >::Sum(), kaldi::VecVec(), FeatureWindowFunction::window, and FrameExtractionOptions::WindowSize().
Referenced by kaldi::ExtractWindow().
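The order of the windowing steps can be sketched without the library's vector classes. This is a simplified illustration under stated assumptions: WindowOptsSketch and ProcessWindowSketch are hypothetical names, dithering is left out so the result is deterministic, the energy is floored at a small positive value before taking the log, and the treatment of the first sample during preemphasis is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical options; field names echo the ones referenced above.
struct WindowOptsSketch {
  bool remove_dc_offset = true;
  float preemph_coeff = 0.97f;
};

// Processes one frame in place. `window_function` holds one weight per sample.
void ProcessWindowSketch(const WindowOptsSketch &opts,
                         const std::vector<float> &window_function,
                         std::vector<float> *window,
                         float *log_energy_pre_window = nullptr) {
  assert(window != nullptr && window->size() == window_function.size());
  std::vector<float> &w = *window;
  const std::size_t n = w.size();
  if (n == 0) return;

  // Step 1 (dithering) is omitted in this sketch.

  // Step 2: DC offset removal, i.e. subtract the frame mean.
  if (opts.remove_dc_offset) {
    double sum = 0.0;
    for (float x : w) sum += x;
    const float mean = static_cast<float>(sum / static_cast<double>(n));
    for (float &x : w) x -= mean;
  }

  // Optional output: log of the sum-squared energy before preemphasis and
  // windowing, floored to avoid log(0).
  if (log_energy_pre_window != nullptr) {
    double energy = 0.0;
    for (float x : w) energy += static_cast<double>(x) * x;
    energy = std::max(
        energy, static_cast<double>(std::numeric_limits<float>::epsilon()));
    *log_energy_pre_window = static_cast<float>(std::log(energy));
  }

  // Step 3: preemphasis, y[i] = x[i] - c * x[i-1], computed back to front so
  // that each step still sees the unmodified previous sample.
  if (opts.preemph_coeff != 0.0f) {
    for (std::size_t i = n - 1; i > 0; --i) w[i] -= opts.preemph_coeff * w[i - 1];
    w[0] -= opts.preemph_coeff * w[0];  // assumed handling of the first sample
  }

  // Step 4: multiply by the windowing function.
  for (std::size_t i = 0; i < n; ++i) w[i] *= window_function[i];
}
```

For the frame {1, 2, 3} with an all-ones window function and preemph_coeff set to 0.5, DC removal gives {-1, 0, 1}, the reported log energy is log(2), and the final frame is {-0.5, 0.5, 1}.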
void ResampleWaveform | ( | BaseFloat | orig_freq, |
const VectorBase< BaseFloat > & | wave, | ||
BaseFloat | new_freq, | ||
Vector< BaseFloat > * | new_wave | ||
) |
Downsample or upsample a waveform.
This is a convenience wrapper for the class 'LinearResample'. The low-pass filter cutoff used in 'LinearResample' is 0.99 of the Nyquist, where the Nyquist is half of the minimum of (orig_freq, new_freq). The resampling is done with a symmetric FIR filter with N_z (the number of zeros) set to 6.
We compared the downsampling results with those from the sox resampling toolkit. Sox's design is inspired by Laurent De Soras' paper, https://ccrma.stanford.edu/~jos/resample/Implementation.html
Note: we expect that while orig_freq and new_freq are of type BaseFloat, they are actually required to have exact integer values (like 16000 or 8000) with a ratio between them that can be expressed as a rational number with reasonably small integer factors.
Definition at line 368 of file resample.cc.
References LinearResample::Resample().
Referenced by OfflineFeatureTpl< F >::ComputeFeatures(), and kaldi::DownsampleWaveForm().
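The filter parameters described above reduce to a little arithmetic, sketched here in standalone form. ResampleFilterSketch, ResampleFilterParamsSketch, RatioSketch and ReducedRatioSketch are hypothetical names for illustration; the sketch only computes the cutoff and the reduced integer ratio and does not perform any resampling.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the two filter parameters mentioned above.
struct ResampleFilterSketch {
  float lowpass_cutoff_hz;  // 0.99 of the Nyquist
  int num_zeros;            // N_z
};

// Cutoff = 0.99 * Nyquist, where Nyquist = min(orig_freq, new_freq) / 2.
ResampleFilterSketch ResampleFilterParamsSketch(float orig_freq, float new_freq) {
  assert(orig_freq > 0.0f && new_freq > 0.0f);
  const float nyquist = 0.5f * std::min(orig_freq, new_freq);
  return ResampleFilterSketch{0.99f * nyquist, 6};
}

// Greatest common divisor by Euclid's algorithm.
int GcdSketch(int a, int b) {
  while (b != 0) {
    const int t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// The two sampling rates reduced to lowest terms; small values here are what
// "a ratio with reasonably small integer factors" refers to.
struct RatioSketch {
  int orig;
  int target;
};

RatioSketch ReducedRatioSketch(int orig_freq, int new_freq) {
  assert(orig_freq > 0 && new_freq > 0);
  const int g = GcdSketch(orig_freq, new_freq);
  return RatioSketch{orig_freq / g, new_freq / g};
}
```

Going from 16000 Hz to 8000 Hz gives a Nyquist of 4000 Hz and a cutoff of about 3960 Hz with a reduced ratio of 2:1, while 44100 Hz to 16000 Hz reduces to 441:160.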
void ReverseFrames | ( | const MatrixBase< BaseFloat > & | input_features, |
Matrix< BaseFloat > * | output_features | ||
) |
Definition at line 228 of file feature-functions.cc.
References VectorBase< Real >::CopyFromVec(), KALDI_ERR, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and Matrix< Real >::Resize().
void SlidingWindowCmn | ( | const SlidingWindowCmnOptions & | opts, |
const MatrixBase< BaseFloat > & | input, | ||
MatrixBase< BaseFloat > * | output | ||
) |
Applies sliding-window cepstral mean and/or variance normalization.
See the strings registering the options in the options class for information on how this works and what the options are. input and output must have the same dimension.
Definition at line 350 of file feature-functions.cc.
References MatrixBase< Real >::CopyFromMat(), KALDI_ASSERT, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), kaldi::SameDim(), and kaldi::SlidingWindowCmnInternal().
Referenced by main(), SlidingWindowCmnOptions::Register(), and kaldi::UnitTestOnlineCmvn().
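The general idea of sliding-window mean normalization can be sketched as follows. This is a deliberately simplified illustration, not the library's algorithm: it handles only mean subtraction, uses a centered window that is clipped at the start and end of the utterance, and ignores every other option. MatrixSketch and SlidingWindowMeanNormSketch are hypothetical names, and all rows are assumed to have the same length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Rows are frames, columns are feature dimensions.
using MatrixSketch = std::vector<std::vector<float>>;

// Subtracts, from each frame, the per-dimension mean over a window of up to
// `cmn_window` frames around it (assumed: centered and clipped at the edges).
void SlidingWindowMeanNormSketch(const MatrixSketch &input,
                                 std::size_t cmn_window,
                                 MatrixSketch *output) {
  assert(output != nullptr && cmn_window > 0);
  const std::size_t num_frames = input.size();
  MatrixSketch result = input;  // same dimensions as the input
  const std::size_t half = cmn_window / 2;
  for (std::size_t t = 0; t < num_frames; ++t) {
    // Window [begin, end) around frame t, kept inside the utterance.
    const std::size_t begin = t > half ? t - half : 0;
    const std::size_t end = std::min(num_frames, begin + cmn_window);
    for (std::size_t d = 0; d < input[t].size(); ++d) {
      double sum = 0.0;
      for (std::size_t s = begin; s < end; ++s) sum += input[s][d];
      const double mean = sum / static_cast<double>(end - begin);
      result[t][d] = static_cast<float>(input[t][d] - mean);
    }
  }
  *output = std::move(result);
}
```

When the window is at least as long as the utterance, this reduces to ordinary per-utterance mean subtraction: four one-dimensional frames {1}, {2}, {3}, {4} become {-1.5}, {-0.5}, {0.5}, {1.5}.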
void SpliceFrames | ( | const MatrixBase< BaseFloat > & | input_features, |
int32 | left_context, | ||
int32 | right_context, | ||
Matrix< BaseFloat > * | output_features | ||
) |
Definition at line 205 of file feature-functions.cc.
References rnnlm::j, KALDI_ASSERT, KALDI_ERR, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and Matrix< Real >::Resize().
Referenced by OnlineLdaInput::Dim(), main(), kaldi::TestOnlineLdaInput(), and kaldi::TestOnlineSpliceFrames().