This class does an online version of the cepstral mean and [optionally] variance normalization, but note that this is not equivalent to the offline version.
#include <online-feature.h>
Public Member Functions | |
virtual int32 | Dim () const |
virtual bool | IsLastFrame (int32 frame) const |
Returns true if this is the last frame. | |
virtual BaseFloat | FrameShiftInSeconds () const |
virtual int32 | NumFramesReady () const |
Returns the total number of frames, since the start of the utterance, that are now available. | |
virtual void | GetFrame (int32 frame, VectorBase< BaseFloat > *feat) |
Gets the feature vector for this frame. | |
OnlineCmvn (const OnlineCmvnOptions &opts, const OnlineCmvnState &cmvn_state, OnlineFeatureInterface *src) | |
Initializer that sets the cmvn state. | |
OnlineCmvn (const OnlineCmvnOptions &opts, OnlineFeatureInterface *src) | |
Initializer that does not set the cmvn state: after calling this, you should call SetState(). | |
void | GetState (int32 cur_frame, OnlineCmvnState *cmvn_state) |
void | SetState (const OnlineCmvnState &cmvn_state) |
void | Freeze (int32 cur_frame) |
virtual | ~OnlineCmvn () |
Public Member Functions inherited from OnlineFeatureInterface | |
virtual void | GetFrames (const std::vector< int32 > &frames, MatrixBase< BaseFloat > *feats) |
This is like GetFrame() but for a collection of frames. | |
virtual | ~OnlineFeatureInterface () |
Virtual destructor. | |
Private Member Functions | |
void | GetMostRecentCachedFrame (int32 frame, int32 *cached_frame, MatrixBase< double > *stats) |
Get the most recent cached frame of CMVN stats. | |
void | CacheFrame (int32 frame, const MatrixBase< double > &stats) |
Cache this frame of stats. | |
void | InitRingBufferIfNeeded () |
Initialize ring buffer for caching stats. | |
void | ComputeStatsForFrame (int32 frame, MatrixBase< double > *stats) |
Computes the raw CMVN stats for this frame, making use of (and updating if necessary) the cached statistics in cached_stats_modulo_ and cached_stats_ring_. | |
Static Private Member Functions | |
static void | SmoothOnlineCmvnStats (const MatrixBase< double > &speaker_stats, const MatrixBase< double > &global_stats, const OnlineCmvnOptions &opts, MatrixBase< double > *stats) |
Smooth the CMVN stats "stats" (which are stored in the normal format as a 2 x (dim+1) matrix), by possibly adding some stats from "global_stats" and/or "speaker_stats", controlled by the config. | |
Private Attributes | |
OnlineCmvnOptions | opts_ |
std::vector< int32 > | skip_dims_ |
OnlineCmvnState | orig_state_ |
Matrix< double > | frozen_state_ |
std::vector< Matrix< double > * > | cached_stats_modulo_ |
std::vector< std::pair< int32, Matrix< double > > > | cached_stats_ring_ |
Matrix< double > | temp_stats_ |
Vector< BaseFloat > | temp_feats_ |
Vector< double > | temp_feats_dbl_ |
OnlineFeatureInterface * | src_ |
This class does an online version of the cepstral mean and [optionally] variance normalization, but note that this is not equivalent to the offline version.
This is necessarily so, as the offline computation involves looking into the future. If you plan to use features normalized with this type of CMVN then you need to train in a "matched" way, i.e. with the same type of features. We normally only do so in the "online" GMM-based decoding, e.g. in online2bin/online2-wav-gmm-latgen-faster.cc; see also the scripts steps/online/prepare_online_decoding.sh and steps/online/decode.sh.
In the steady state (in the middle of a long utterance), this class accumulates CMVN statistics from the previous "cmn_window" frames (default 600 frames, or 6 seconds), and uses these to normalize the mean and possibly variance of the current frame.
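The steady-state behaviour described above can be sketched in standalone C++. This is a minimal illustration, not the Kaldi implementation: `SlidingWindowCmvn` is a hypothetical helper, it recomputes the window sums from scratch instead of caching them, it assumes the window includes the current frame, and the variance floor of 1e-20 is an arbitrary choice made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Kaldi): normalizes frame `t` of `feats`
// using statistics from the window of up to `cmn_window` frames ending at t.
std::vector<double> SlidingWindowCmvn(
    const std::vector<std::vector<double>> &feats, std::size_t t,
    std::size_t cmn_window, bool normalize_variance) {
  assert(t < feats.size() && cmn_window > 0);
  const std::size_t dim = feats[t].size();
  // First frame of the window: at most cmn_window frames, ending at t.
  const std::size_t begin = (t + 1 > cmn_window) ? t + 1 - cmn_window : 0;
  const double count = static_cast<double>(t + 1 - begin);

  // Accumulate the sum of x and the sum of x^2 over the window.
  std::vector<double> sum(dim, 0.0), sum_sq(dim, 0.0);
  for (std::size_t f = begin; f <= t; ++f) {
    for (std::size_t d = 0; d < dim; ++d) {
      sum[d] += feats[f][d];
      sum_sq[d] += feats[f][d] * feats[f][d];
    }
  }

  std::vector<double> out(dim);
  for (std::size_t d = 0; d < dim; ++d) {
    const double mean = sum[d] / count;
    out[d] = feats[t][d] - mean;
    if (normalize_variance) {
      // Floor the variance so a constant window does not divide by zero.
      const double var = std::max(sum_sq[d] / count - mean * mean, 1e-20);
      out[d] /= std::sqrt(var);
    }
  }
  return out;
}
```

With the default of 600 frames, a call for frame 1000 would use frames 401 through 1000 under the assumptions above; for frames earlier than the window length, only the frames seen so far contribute.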
The config variables "speaker_frames" and "global_frames" relate to what happens at the beginning of the utterance when we have seen fewer than "cmn_window" frames of context, and so might not have very good stats to normalize with. Basically, we first augment any existing stats with up to "speaker_frames" frames of stats from previous utterances of the current speaker, and if this doesn't take us up to the required "cmn_window" frame count, we further augment with up to "global_frames" frames of global stats. The global stats are CMVN stats accumulated from training or testing data, that give us a reasonable source of mean and variance for "typical" data.
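The augmentation order described above (current window first, then speaker stats, then global stats) might look roughly like the following standalone sketch. `CmvnStats`, `AddScaled` and `SmoothStats` are hypothetical names introduced for illustration; the sketch assumes that borrowed stats are scaled so that they contribute exactly the borrowed frame count, and it omits the error handling of the real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for accumulated stats (not a Kaldi type).
struct CmvnStats {
  std::vector<double> sum;     // per-dimension sum of x
  std::vector<double> sum_sq;  // per-dimension sum of x^2
  double count;                // number of frames accumulated
};

// Adds `weight` times `src` into `dst`.
void AddScaled(double weight, const CmvnStats &src, CmvnStats *dst) {
  assert(src.sum.size() == dst->sum.size());
  for (std::size_t d = 0; d < src.sum.size(); ++d) {
    dst->sum[d] += weight * src.sum[d];
    dst->sum_sq[d] += weight * src.sum_sq[d];
  }
  dst->count += weight * src.count;
}

// Tops up `stats` towards `cmn_window` frames: first from the speaker stats
// (at most `speaker_frames`), then from the global stats (at most
// `global_frames`).
void SmoothStats(const CmvnStats &speaker, const CmvnStats &global,
                 double cmn_window, double speaker_frames,
                 double global_frames, CmvnStats *stats) {
  if (stats->count >= cmn_window) return;  // window already full
  if (speaker.count > 0.0) {
    const double take = std::min(
        {cmn_window - stats->count, speaker_frames, speaker.count});
    if (take > 0.0) AddScaled(take / speaker.count, speaker, stats);
  }
  if (stats->count >= cmn_window) return;
  if (global.count > 0.0) {
    const double take = std::min(cmn_window - stats->count, global_frames);
    if (take > 0.0) AddScaled(take / global.count, global, stats);
  }
}
```

For example, with a 600-frame window, 100 frames seen so far, 200 frames of speaker stats and a global limit of 200 frames, the sketch borrows all 200 speaker frames and then 200 frames' worth of global stats, ending with a count of 500.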
Definition at line 321 of file online-feature.h.
OnlineCmvn (const OnlineCmvnOptions &opts, const OnlineCmvnState &cmvn_state, OnlineFeatureInterface *src)
Initializer that sets the cmvn state.
If you don't have previous utterances from the same speaker, you are supposed to initialize the CMVN state from some global CMVN stats, which you can get by summing all the CMVN stats in your training data using "sum-matrix". This just gives it a reasonable starting point at the start of the file. If you do have previous utterances from the same speaker, or at least from a similar environment, you are supposed to initialize it by calling GetState() on the object from the previous utterance.
Definition at line 238 of file online-feature.cc.
References KALDI_ERR, OnlineCmvn::SetState(), OnlineCmvnOptions::skip_dims, OnlineCmvn::skip_dims_, and kaldi::SplitStringToIntegers().
OnlineCmvn (const OnlineCmvnOptions &opts, OnlineFeatureInterface *src)
Initializer that does not set the cmvn state: after calling this, you should call SetState().
Definition at line 250 of file online-feature.cc.
References KALDI_ERR, OnlineCmvnOptions::skip_dims, OnlineCmvn::skip_dims_, and kaldi::SplitStringToIntegers().
~OnlineCmvn () | virtual |
Definition at line 331 of file online-feature.cc.
References OnlineCmvn::cached_stats_modulo_, and rnnlm::i.
void CacheFrame (int32 frame, const MatrixBase< double > &stats) | private |
Cache this frame of stats.
Definition at line 305 of file online-feature.cc.
References OnlineCmvn::cached_stats_modulo_, OnlineCmvn::cached_stats_ring_, OnlineCmvn::InitRingBufferIfNeeded(), KALDI_ASSERT, KALDI_WARN, OnlineCmvnOptions::modulus, rnnlm::n, and OnlineCmvn::opts_.
Referenced by OnlineCmvn::ComputeStatsForFrame().
void ComputeStatsForFrame (int32 frame, MatrixBase< double > *stats) | private |
Computes the raw CMVN stats for this frame, making use of (and updating if necessary) the cached statistics in cached_stats_modulo_ and cached_stats_ring_.
This means the (x, x^2, count) stats for the last up to opts_.cmn_window frames.
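As an illustration of the (x, x^2, count) layout, the sketch below keeps the stats in a 2 x (dim+1) row-major buffer and shows how a mean and variance can be read back from it. `WindowStats` is a hypothetical class, not a Kaldi type; the use of a negative weight to drop a frame that has left the window is an assumption about how such a window can be maintained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2 x (dim+1) stats matrix, stored row-major.
// Row 0 holds the per-dimension sums of x, with the frame count in the last
// column; row 1 holds the per-dimension sums of x^2 (last column unused).
class WindowStats {
 public:
  explicit WindowStats(std::size_t dim)
      : dim_(dim), data_(2 * (dim + 1), 0.0) {}

  // Adds `weight` copies of frame `x`; weight = -1.0 removes a frame that
  // has dropped out of the window.
  void AddFrame(const std::vector<double> &x, double weight) {
    assert(x.size() == dim_);
    for (std::size_t d = 0; d < dim_; ++d) {
      At(0, d) += weight * x[d];
      At(1, d) += weight * x[d] * x[d];
    }
    At(0, dim_) += weight;
  }

  double Count() const { return data_[dim_]; }

  double Mean(std::size_t d) const {
    assert(d < dim_ && Count() > 0.0);
    return data_[d] / Count();
  }

  // Variance as E[x^2] - (E[x])^2 over the accumulated frames.
  double Variance(std::size_t d) const {
    const double mean = Mean(d);
    return data_[(dim_ + 1) + d] / Count() - mean * mean;
  }

 private:
  double &At(std::size_t row, std::size_t col) {
    return data_[row * (dim_ + 1) + col];
  }
  std::size_t dim_;
  std::vector<double> data_;
};
```

Adding each new frame with weight 1.0 and the frame that falls out of the window with weight -1.0 keeps the count at or below the window length without re-summing the whole window.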
Definition at line 337 of file online-feature.cc.
References OnlineCmvn::CacheFrame(), OnlineCmvnOptions::cmn_window, VectorBase< Real >::CopyFromVec(), OnlineCmvn::Dim(), OnlineFeatureInterface::GetFrame(), OnlineCmvn::GetMostRecentCachedFrame(), KALDI_ASSERT, OnlineCmvnOptions::normalize_variance, OnlineCmvn::NumFramesReady(), OnlineCmvn::opts_, MatrixBase< Real >::Row(), OnlineCmvn::src_, OnlineCmvn::temp_feats_, and OnlineCmvn::temp_feats_dbl_.
Referenced by OnlineCmvn::Freeze(), and OnlineCmvn::GetFrame().
virtual int32 Dim () const | inline, virtual |
Implements OnlineFeatureInterface.
Definition at line 327 of file online-feature.h.
Referenced by OnlineCmvn::ComputeStatsForFrame(), OnlineCmvn::Freeze(), OnlineCmvn::GetFrame(), OnlineCmvn::GetState(), and OnlineCmvn::InitRingBufferIfNeeded().
virtual BaseFloat FrameShiftInSeconds () const | inline, virtual |
Implements OnlineFeatureInterface.
Definition at line 332 of file online-feature.h.
void Freeze (int32 cur_frame)
Definition at line 454 of file online-feature.cc.
References OnlineCmvn::ComputeStatsForFrame(), OnlineCmvn::Dim(), OnlineCmvn::frozen_state_, OnlineCmvnState::global_cmvn_stats, OnlineCmvn::opts_, OnlineCmvn::orig_state_, OnlineCmvn::SmoothOnlineCmvnStats(), and OnlineCmvnState::speaker_cmvn_stats.
Referenced by OnlineFeaturePipeline::FreezeCmvn().
virtual void GetFrame (int32 frame, VectorBase< BaseFloat > *feat) | virtual |
Gets the feature vector for this frame.
Before calling this for a given frame, it is assumed that you called NumFramesReady() and it returned a number greater than "frame". Otherwise this call will likely crash with an assert failure. This function is not declared const, in case there is some kind of caching going on, but most of the time it shouldn't modify the class.
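The calling convention above (check NumFramesReady() before requesting a frame) can be sketched with a small stand-in interface. `FrameSource`, `ToySource` and `ConsumeReadyFrames` are hypothetical names that only mirror the method names on this page; real code would use the Kaldi classes and vector types instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in exposing the three methods used below.
class FrameSource {
 public:
  virtual ~FrameSource() {}
  virtual int Dim() const = 0;
  virtual int NumFramesReady() const = 0;
  virtual void GetFrame(int frame, std::vector<float> *feat) = 0;
};

// Toy in-memory source, present only to make the sketch self-contained.
class ToySource : public FrameSource {
 public:
  explicit ToySource(int dim) : dim_(dim) {}
  void Push(const std::vector<float> &frame) {
    assert(static_cast<int>(frame.size()) == dim_);
    frames_.push_back(frame);
  }
  int Dim() const override { return dim_; }
  int NumFramesReady() const override {
    return static_cast<int>(frames_.size());
  }
  void GetFrame(int frame, std::vector<float> *feat) override {
    // Mirrors the documented precondition: frame < NumFramesReady().
    assert(frame >= 0 && frame < NumFramesReady());
    *feat = frames_[frame];
  }

 private:
  int dim_;
  std::vector<std::vector<float>> frames_;
};

// Fetches every frame that is ready but not yet consumed, and returns the
// index of the next frame to request on a later call.
int ConsumeReadyFrames(FrameSource *src, int next_frame,
                       std::vector<std::vector<float>> *out) {
  const int ready = src->NumFramesReady();
  for (; next_frame < ready; ++next_frame) {
    std::vector<float> feat(src->Dim());
    src->GetFrame(next_frame, &feat);
    out->push_back(feat);
  }
  return next_frame;
}
```

A caller would invoke `ConsumeReadyFrames` again whenever more audio has arrived, passing back the index returned by the previous call, so that no frame is requested before it is available.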
Implements OnlineFeatureInterface.
Definition at line 421 of file online-feature.cc.
References kaldi::ApplyCmvn(), OnlineCmvn::ComputeStatsForFrame(), MatrixBase< Real >::CopyFromMat(), VectorBase< Real >::Data(), VectorBase< Real >::Dim(), OnlineCmvn::Dim(), kaldi::FakeStatsForSomeDims(), OnlineCmvn::frozen_state_, OnlineFeatureInterface::GetFrame(), OnlineCmvnState::global_cmvn_stats, KALDI_ASSERT, kaldi::kUndefined, OnlineCmvnOptions::normalize_mean, OnlineCmvnOptions::normalize_variance, MatrixBase< Real >::NumRows(), OnlineCmvn::opts_, OnlineCmvn::orig_state_, Matrix< Real >::Resize(), OnlineCmvn::skip_dims_, OnlineCmvn::SmoothOnlineCmvnStats(), OnlineCmvnState::speaker_cmvn_stats, OnlineCmvn::src_, and OnlineCmvn::temp_stats_.
Referenced by main().
void GetMostRecentCachedFrame (int32 frame, int32 *cached_frame, MatrixBase< double > *stats) | private |
Get the most recent cached frame of CMVN stats.
[If no frames were cached, sets up empty stats for frame zero and returns that].
Definition at line 261 of file online-feature.cc.
References OnlineCmvn::cached_stats_modulo_, OnlineCmvn::cached_stats_ring_, MatrixBase< Real >::CopyFromMat(), OnlineCmvn::InitRingBufferIfNeeded(), KALDI_ASSERT, OnlineCmvnOptions::modulus, rnnlm::n, OnlineCmvn::opts_, OnlineCmvnOptions::ring_buffer_size, and MatrixBase< Real >::SetZero().
Referenced by OnlineCmvn::ComputeStatsForFrame().
void GetState (int32 cur_frame, OnlineCmvnState *cmvn_state)
Definition at line 467 of file online-feature.cc.
References VectorBase< Real >::CopyFromVec(), OnlineCmvn::Dim(), OnlineCmvnState::frozen_state, OnlineCmvn::frozen_state_, OnlineFeatureInterface::GetFrame(), MatrixBase< Real >::NumRows(), OnlineCmvn::orig_state_, Matrix< Real >::Resize(), MatrixBase< Real >::Row(), OnlineCmvnState::speaker_cmvn_stats, and OnlineCmvn::src_.
Referenced by OnlineFeaturePipeline::GetCmvnState(), OnlineNnet2FeaturePipeline::GetCmvnState(), and main().
void InitRingBufferIfNeeded () | inline, private |
Initialize ring buffer for caching stats.
Definition at line 297 of file online-feature.cc.
References OnlineCmvn::cached_stats_ring_, OnlineCmvn::Dim(), OnlineCmvn::opts_, and OnlineCmvnOptions::ring_buffer_size.
Referenced by OnlineCmvn::CacheFrame(), and OnlineCmvn::GetMostRecentCachedFrame().
virtual bool IsLastFrame (int32 frame) const | inline, virtual |
Returns true if this is the last frame.
Frame indices are zero-based, so the first frame is zero. IsLastFrame(-1) will return false, unless the file is empty (which is a case that I'm not sure all the code will handle, so be careful). This function may return false for some frame if we haven't yet decided to terminate decoding, but later true if we decide to terminate decoding. This function exists mainly to correctly handle end effects in feature extraction, and is not a mechanism to determine how many frames are in the decodable object (as it used to be, and for backward compatibility, still is, in the Decodable interface).
Implements OnlineFeatureInterface.
Definition at line 329 of file online-feature.h.
virtual int32 NumFramesReady () const | inline, virtual |
Returns the total number of frames, since the start of the utterance, that are now available. In an online-decoding context, this will likely increase with time as more data becomes available.
Implements OnlineFeatureInterface.
Definition at line 337 of file online-feature.h.
Referenced by OnlineCmvn::ComputeStatsForFrame(), OnlineFeaturePipeline::FreezeCmvn(), OnlineFeaturePipeline::GetCmvnState(), OnlineNnet2FeaturePipeline::GetCmvnState(), and OnlineSpliceFrames::GetFrame().
void SetState (const OnlineCmvnState &cmvn_state)
Definition at line 489 of file online-feature.cc.
References OnlineCmvn::cached_stats_modulo_, OnlineCmvnState::frozen_state, OnlineCmvn::frozen_state_, KALDI_ASSERT, and OnlineCmvn::orig_state_.
Referenced by OnlineCmvn::OnlineCmvn(), OnlineIvectorFeature::SetAdaptationState(), OnlineFeaturePipeline::SetCmvnState(), and OnlineNnet2FeaturePipeline::SetCmvnState().
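static void SmoothOnlineCmvnStats (const MatrixBase< double > &speaker_stats, const MatrixBase< double > &global_stats, const OnlineCmvnOptions &opts, MatrixBase< double > *stats) | static, private |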
Smooth the CMVN stats "stats" (which are stored in the normal format as a 2 x (dim+1) matrix), by possibly adding some stats from "global_stats" and/or "speaker_stats", controlled by the config.
The best way to understand the smoothing rule we use is just to look at the code.
Definition at line 372 of file online-feature.cc.
References MatrixBase< Real >::AddMat(), OnlineCmvnOptions::cmn_window, OnlineCmvnOptions::global_frames, KALDI_ASSERT, KALDI_ERR, OnlineCmvnOptions::normalize_variance, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), MatrixBase< Real >::RowRange(), and OnlineCmvnOptions::speaker_frames.
Referenced by OnlineCmvn::Freeze(), and OnlineCmvn::GetFrame().
std::vector< Matrix< double > * > cached_stats_modulo_ | private |
Definition at line 431 of file online-feature.h.
Referenced by OnlineCmvn::CacheFrame(), OnlineCmvn::GetMostRecentCachedFrame(), OnlineCmvn::SetState(), and OnlineCmvn::~OnlineCmvn().
std::vector< std::pair< int32, Matrix< double > > > cached_stats_ring_ | private |
Definition at line 434 of file online-feature.h.
Referenced by OnlineCmvn::CacheFrame(), OnlineCmvn::GetMostRecentCachedFrame(), and OnlineCmvn::InitRingBufferIfNeeded().
Matrix< double > frozen_state_ | private |
Definition at line 423 of file online-feature.h.
Referenced by OnlineCmvn::Freeze(), OnlineCmvn::GetFrame(), OnlineCmvn::GetState(), and OnlineCmvn::SetState().
OnlineCmvnOptions opts_ | private |
Definition at line 419 of file online-feature.h.
Referenced by OnlineCmvn::CacheFrame(), OnlineCmvn::ComputeStatsForFrame(), OnlineCmvn::Freeze(), OnlineCmvn::GetFrame(), OnlineCmvn::GetMostRecentCachedFrame(), and OnlineCmvn::InitRingBufferIfNeeded().
OnlineCmvnState orig_state_ | private |
Definition at line 421 of file online-feature.h.
Referenced by OnlineCmvn::Freeze(), OnlineCmvn::GetFrame(), OnlineCmvn::GetState(), and OnlineCmvn::SetState().
std::vector< int32 > skip_dims_ | private |
Definition at line 420 of file online-feature.h.
Referenced by OnlineCmvn::GetFrame(), and OnlineCmvn::OnlineCmvn().
OnlineFeatureInterface * src_ | private |
Definition at line 442 of file online-feature.h.
Referenced by OnlineCmvn::ComputeStatsForFrame(), OnlineCmvn::GetFrame(), OnlineSpliceFrames::GetFrame(), OnlineCmvn::GetState(), and OnlineSpliceFrames::NumFramesReady().
Vector< BaseFloat > temp_feats_ | private |
Definition at line 439 of file online-feature.h.
Referenced by OnlineCmvn::ComputeStatsForFrame().
Vector< double > temp_feats_dbl_ | private |
Definition at line 440 of file online-feature.h.
Referenced by OnlineCmvn::ComputeStatsForFrame().
Matrix< double > temp_stats_ | private |
Definition at line 438 of file online-feature.h.
Referenced by OnlineCmvn::GetFrame().