OnlineNnet2FeaturePipeline is a class that's responsible for putting together the various parts of the feature-processing pipeline for neural networks, in an online setting.
#include <online-nnet2-feature-pipeline.h>
Public Member Functions
OnlineNnet2FeaturePipeline (const OnlineNnet2FeaturePipelineInfo &info)
Constructor from the "info" object.
virtual int32 | Dim () const
Returns the feature dimension; the first of the member functions from OnlineFeatureInterface.
virtual bool | IsLastFrame (int32 frame) const
Returns true if this is the last frame.
virtual int32 | NumFramesReady () const
Returns the total number of frames, since the start of the utterance, that are now available.
virtual void | GetFrame (int32 frame, VectorBase< BaseFloat > *feat)
Gets the feature vector for this frame.
void | UpdateFrameWeights (const std::vector< std::pair< int32, BaseFloat > > &delta_weights)
If you are downweighting silence, you can call OnlineSilenceWeighting::GetDeltaWeights and supply the output to this class using UpdateFrameWeights().
void | SetAdaptationState (const OnlineIvectorExtractorAdaptationState &adaptation_state)
Set the adaptation state to a particular value, e.g. reflecting previous utterances of the same speaker.
void | GetAdaptationState (OnlineIvectorExtractorAdaptationState *adaptation_state) const
Get the adaptation state; you may want to call this before destroying this object, to get adaptation state that can be used to improve decoding of later utterances of this speaker.
void | SetCmvnState (const OnlineCmvnState &cmvn_state)
Set the CMVN state to a particular value.
void | GetCmvnState (OnlineCmvnState *cmvn_state)
void | AcceptWaveform (BaseFloat sampling_rate, const VectorBase< BaseFloat > &waveform)
Accept more data to process.
BaseFloat | FrameShiftInSeconds () const
void | InputFinished ()
If you call InputFinished(), it tells the class you won't be providing any more waveform.
OnlineIvectorFeature * | IvectorFeature ()
This function returns the iVector-extracting part of the feature pipeline (or NULL if iVectors are not being used); the pointer ownership is retained by this object and not transferred to the caller.
const OnlineIvectorFeature * | IvectorFeature () const
A const accessor for the iVector extractor.
OnlineFeatureInterface * | InputFeature ()
This function returns the part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications.
virtual | ~OnlineNnet2FeaturePipeline ()
Public Member Functions inherited from OnlineFeatureInterface
virtual void | GetFrames (const std::vector< int32 > &frames, MatrixBase< BaseFloat > *feats)
This is like GetFrame() but for a collection of frames.
virtual | ~OnlineFeatureInterface ()
Virtual destructor.
Private Attributes
const OnlineNnet2FeaturePipelineInfo & | info_
OnlineBaseFeature * | base_feature_
MFCC/PLP/filterbank.
OnlinePitchFeature * | pitch_
Raw pitch, if used.
OnlineProcessPitch * | pitch_feature_
Processed pitch, if pitch used.
OnlineCmvn * | cmvn_feature_
Matrix< BaseFloat > | lda_mat_
LDA matrix, if supplied.
Matrix< double > | global_cmvn_stats_
Global CMVN stats.
OnlineFeatureInterface * | feature_plus_optional_pitch_
feature_plus_optional_pitch_ is the base_feature_ appended (OnlineAppendFeature) with pitch_feature_, if used; otherwise, points to the same address as base_feature_.
OnlineFeatureInterface * | feature_plus_optional_cmvn_
feature_plus_optional_cmvn_ is the feature_plus_optional_pitch_ transformed with OnlineCmvn if cmvn is active; otherwise, points to the same address as feature_plus_optional_pitch_.
OnlineIvectorFeature * | ivector_feature_
iVector feature, if used.
OnlineFeatureInterface * | nnet3_feature_
Part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications. This pointer is returned by InputFeature().
OnlineFeatureInterface * | final_feature_
final_feature_ is feature_plus_optional_cmvn_ appended (OnlineAppendFeature) with ivector_feature_, if ivector_feature_ is used; otherwise, points to the same address as feature_plus_optional_cmvn_.
int32 | dim_
We cache the feature dimension, to save time when calling Dim().
OnlineNnet2FeaturePipeline is a class that's responsible for putting together the various parts of the feature-processing pipeline for neural networks, in an online setting.
The recipe here does not include fMLLR; instead, it assumes we're giving raw features such as MFCC or PLP or filterbank (with no CMVN) to the neural network, and optionally augmenting these with an iVector that describes the speaker characteristics. The iVector is extracted using class OnlineIvectorFeature (see that class for more info on how it's done). No splicing is done in this code, as we currently support only the nnet2 neural network, in which the splicing is done inside the network. Our strategy for nnet1 network conversion would probably be to convert to nnet2 and just add layers to do the splicing.
Definition at line 198 of file online-nnet2-feature-pipeline.h.
OnlineNnet2FeaturePipeline (const OnlineNnet2FeaturePipelineInfo &info) | explicit
Constructor from the "info" object.
The main feature extraction pipeline is constructed in this constructor.
After calling this for a non-initial utterance of a speaker, you may want to call SetAdaptationState().
Definition at line 90 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipelineInfo::add_pitch, OnlineNnet2FeaturePipeline::base_feature_, OnlineNnet2FeaturePipeline::cmvn_feature_, OnlineNnet2FeaturePipelineInfo::cmvn_opts, OnlineFeatureInterface::Dim(), OnlineNnet2FeaturePipeline::dim_, OnlineNnet2FeaturePipelineInfo::fbank_opts, OnlineNnet2FeaturePipeline::feature_plus_optional_cmvn_, OnlineNnet2FeaturePipeline::feature_plus_optional_pitch_, OnlineNnet2FeaturePipelineInfo::feature_type, OnlineNnet2FeaturePipeline::final_feature_, OnlineNnet2FeaturePipeline::global_cmvn_stats_, OnlineNnet2FeaturePipelineInfo::global_cmvn_stats_rxfilename, OnlineNnet2FeaturePipeline::info_, OnlineNnet2FeaturePipelineInfo::ivector_extractor_info, OnlineNnet2FeaturePipeline::ivector_feature_, KALDI_ASSERT, KALDI_ERR, OnlineNnet2FeaturePipelineInfo::mfcc_opts, OnlineNnet2FeaturePipeline::nnet3_feature_, OnlineNnet2FeaturePipeline::pitch_, OnlineNnet2FeaturePipeline::pitch_feature_, OnlineNnet2FeaturePipelineInfo::pitch_opts, OnlineNnet2FeaturePipelineInfo::pitch_process_opts, OnlineNnet2FeaturePipelineInfo::plp_opts, kaldi::ReadKaldiObject(), OnlineNnet2FeaturePipelineInfo::use_cmvn, and OnlineNnet2FeaturePipelineInfo::use_ivectors.
~OnlineNnet2FeaturePipeline () | virtual
Definition at line 201 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::base_feature_, OnlineNnet2FeaturePipeline::cmvn_feature_, OnlineNnet2FeaturePipeline::feature_plus_optional_cmvn_, OnlineNnet2FeaturePipeline::feature_plus_optional_pitch_, OnlineNnet2FeaturePipeline::final_feature_, OnlineNnet2FeaturePipeline::ivector_feature_, OnlineNnet2FeaturePipeline::pitch_, and OnlineNnet2FeaturePipeline::pitch_feature_.
void AcceptWaveform (BaseFloat sampling_rate, const VectorBase< BaseFloat > &waveform)
Accept more data to process.
This function only copies the data; nothing is actually processed until you call GetFrame() (probably indirectly, via the decoder's AdvanceDecoding()). The sampling_rate argument is required only so that it can be asserted equal to the value in the config.
Definition at line 217 of file online-nnet2-feature-pipeline.cc.
References OnlineBaseFeature::AcceptWaveform(), OnlinePitchFeature::AcceptWaveform(), OnlineNnet2FeaturePipeline::base_feature_, and OnlineNnet2FeaturePipeline::pitch_.
Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), and main().
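The copy-now, compute-later behaviour described for AcceptWaveform() can be illustrated with a small stand-alone sketch. This is not the Kaldi implementation: ToyWaveformBuffer, its frame_length and frame_shift parameters, and the use of a frame's mean sample value as its "feature" are all assumptions made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the buffering behaviour; NOT the Kaldi implementation.
class ToyWaveformBuffer {
 public:
  ToyWaveformBuffer(float expected_rate, std::size_t frame_length,
                    std::size_t frame_shift)
      : expected_rate_(expected_rate),
        frame_length_(frame_length),
        frame_shift_(frame_shift) {}

  // Only checks the sampling rate and copies the samples; no features are
  // computed here.
  void AcceptWaveform(float sampling_rate, const std::vector<float> &wave) {
    assert(sampling_rate == expected_rate_);
    samples_.insert(samples_.end(), wave.begin(), wave.end());
  }

  // Number of complete frames that the buffered samples can yield.
  std::size_t NumFramesReady() const {
    if (samples_.size() < frame_length_) return 0;
    return 1 + (samples_.size() - frame_length_) / frame_shift_;
  }

  // Work happens lazily, when a frame is first requested. The "feature" is
  // just the mean of the frame's samples (an illustrative choice).
  float GetFrame(std::size_t frame) {
    assert(frame < NumFramesReady());
    while (cache_.size() <= frame) {
      std::size_t start = cache_.size() * frame_shift_;
      float sum = 0.0f;
      for (std::size_t i = 0; i < frame_length_; ++i) sum += samples_[start + i];
      cache_.push_back(sum / static_cast<float>(frame_length_));
    }
    return cache_[frame];
  }

  // How many frames have actually been computed so far.
  std::size_t NumFramesComputed() const { return cache_.size(); }

 private:
  float expected_rate_;
  std::size_t frame_length_;
  std::size_t frame_shift_;
  std::vector<float> samples_;
  std::vector<float> cache_;
};
```

In this sketch, calling AcceptWaveform() leaves NumFramesComputed() at zero; the count only grows once GetFrame() is called.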
int32 Dim () const | virtual
Member functions from OnlineFeatureInterface:
Dim() will return the base-feature dimension (e.g. 13 for normal MFCC); plus the pitch-feature dimension (e.g. 3), if used; plus the iVector dimension, if used. Any frame-splicing happens inside the neural-network code.
Implements OnlineFeatureInterface.
Definition at line 149 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::dim_.
Referenced by SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().
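As a worked illustration of how the dimension returned by Dim() is composed, the following sketch sums the parts named above. ToyDims and TotalDim are hypothetical names, not part of Kaldi, and the iVector dimension of 100 used in the comment is an assumed example value.

```cpp
#include <cassert>

// Hypothetical option bundle; the field names are illustrative and are not
// taken from OnlineNnet2FeaturePipelineInfo.
struct ToyDims {
  int base_dim;       // e.g. 13 for normal MFCC
  bool add_pitch;     // whether pitch features are appended
  int pitch_dim;      // e.g. 3
  bool use_ivectors;  // whether an iVector is appended
  int ivector_dim;    // e.g. 100 (assumed value)
};

// Sum of the parts that make up one output frame, as described for Dim().
int TotalDim(const ToyDims &d) {
  assert(d.base_dim > 0);
  int dim = d.base_dim;
  if (d.add_pitch) dim += d.pitch_dim;
  if (d.use_ivectors) dim += d.ivector_dim;
  return dim;
}
```

With 13-dimensional MFCCs, 3 pitch dimensions and a 100-dimensional iVector, the sketch gives 13 + 3 + 100 = 116.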
BaseFloat FrameShiftInSeconds () const | inline virtual
Implements OnlineFeatureInterface.
Definition at line 257 of file online-nnet2-feature-pipeline.h.
Referenced by SingleUtteranceNnet2DecoderThreaded::EndpointDetected(), SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), SingleUtteranceNnet2DecoderThreaded::GetRemainingWaveform(), and SingleUtteranceNnet2DecoderThreaded::NumFramesReceivedApprox().
void GetAdaptationState (OnlineIvectorExtractorAdaptationState *adaptation_state) const
Get the adaptation state; you may want to call this before destroying this object, to get adaptation state that can be used to improve decoding of later utterances of this speaker.
You might not want to do this, though, if you have reason to believe that something went wrong in the recognition (e.g., low confidence).
Definition at line 177 of file online-nnet2-feature-pipeline.cc.
References OnlineIvectorFeature::GetAdaptationState(), OnlineNnet2FeaturePipeline::info_, OnlineNnet2FeaturePipeline::ivector_feature_, and OnlineNnet2FeaturePipelineInfo::use_ivectors.
Referenced by SingleUtteranceNnet2DecoderThreaded::GetAdaptationState().
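The pattern of carrying adaptation state from one utterance of a speaker to the next, and of discarding it after a low-confidence result, might be sketched as follows. Everything here is a toy: ToyAdaptationState, ToyUtterance, ProcessSpeaker, and the confidence threshold are assumptions for illustration and do not reflect how OnlineIvectorExtractorAdaptationState works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy adaptation state: a running sum and count (illustrative only).
struct ToyAdaptationState {
  double sum = 0.0;
  int count = 0;
};

// Toy per-utterance object exposing the same get/set pattern.
class ToyUtterance {
 public:
  void SetAdaptationState(const ToyAdaptationState &s) { state_ = s; }
  void GetAdaptationState(ToyAdaptationState *s) const { *s = state_; }
  void Accept(double x) {
    state_.sum += x;
    ++state_.count;
  }

 private:
  ToyAdaptationState state_;
};

// Processes the utterances of one speaker in order. The state is carried
// forward only when the (hypothetical) confidence reaches min_confidence.
ToyAdaptationState ProcessSpeaker(
    const std::vector<std::vector<double>> &utterances,
    const std::vector<double> &confidences, double min_confidence) {
  assert(utterances.size() == confidences.size());
  ToyAdaptationState carried;
  for (std::size_t u = 0; u < utterances.size(); ++u) {
    ToyUtterance utt;                 // a fresh object per utterance
    utt.SetAdaptationState(carried);  // start from the previous state
    for (double x : utterances[u]) utt.Accept(x);
    // Read the state back before the object is destroyed, unless the
    // result looks unreliable.
    if (confidences[u] >= min_confidence) utt.GetAdaptationState(&carried);
  }
  return carried;
}
```

In the sketch, a low-confidence utterance leaves the carried state exactly as it was before that utterance.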
void GetCmvnState (OnlineCmvnState *cmvn_state)
Definition at line 191 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::cmvn_feature_, OnlineCmvn::GetState(), and OnlineCmvn::NumFramesReady().
Referenced by SingleUtteranceNnet2DecoderThreaded::GetCmvnState().
void GetFrame (int32 frame, VectorBase< BaseFloat > *feat) | virtual
Gets the feature vector for this frame.
Before calling this for a given frame, it is assumed that you called NumFramesReady() and it returned a number greater than "frame". Otherwise this call will likely crash with an assert failure. This function is not declared const, in case there is some kind of caching going on, but most of the time it shouldn't modify the class.
Implements OnlineFeatureInterface.
Definition at line 159 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::final_feature_, and OnlineFeatureInterface::GetFrame().
Referenced by SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().
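The precondition on GetFrame() suggests a consumer loop that always checks NumFramesReady() first. The sketch below uses a toy source rather than the Kaldi classes; ToySource and DrainReadyFrames are hypothetical names, and std::vector<float> stands in for VectorBase< BaseFloat >.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal toy source exposing the two calls used below.
struct ToySource {
  std::vector<std::vector<float>> frames;

  std::int32_t NumFramesReady() const {
    return static_cast<std::int32_t>(frames.size());
  }

  // Mirrors the documented precondition: asking for a frame that is not
  // ready triggers an assert failure.
  void GetFrame(std::int32_t frame, std::vector<float> *feat) {
    assert(frame >= 0 && frame < NumFramesReady());
    *feat = frames[static_cast<std::size_t>(frame)];
  }
};

// Consumes every frame in [*next_frame, NumFramesReady()) and returns how
// many frames were consumed on this call.
template <typename Source>
std::int32_t DrainReadyFrames(Source *src, std::int32_t *next_frame,
                              std::vector<std::vector<float>> *out) {
  std::int32_t consumed = 0;
  while (*next_frame < src->NumFramesReady()) {
    std::vector<float> feat;
    src->GetFrame(*next_frame, &feat);
    out->push_back(feat);
    ++(*next_frame);
    ++consumed;
  }
  return consumed;
}
```

Calling DrainReadyFrames() again after more data has arrived picks up where the previous call stopped, because the caller keeps next_frame between calls.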
OnlineFeatureInterface * InputFeature () | inline
This function returns the part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications.
Definition at line 284 of file online-nnet2-feature-pipeline.h.
void InputFinished ()
If you call InputFinished(), it tells the class you won't be providing any more waveform.
This will help flush out the last few frames of delta or LDA features, and finalize the pitch features (making them more accurate)... although since in neural-net decoding we don't anticipate rescoring the lattices, this may not be much of an issue.
Definition at line 225 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::base_feature_, OnlineBaseFeature::InputFinished(), OnlinePitchFeature::InputFinished(), and OnlineNnet2FeaturePipeline::pitch_.
Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), and main().
bool IsLastFrame (int32 frame) const | virtual
Returns true if this is the last frame.
Frame indices are zero-based, so the first frame is zero. IsLastFrame(-1) will return false, unless the file is empty (which is a case that I'm not sure all the code will handle, so be careful). This function may return false for some frame if we haven't yet decided to terminate decoding, but later true if we decide to terminate decoding. This function exists mainly to correctly handle end effects in feature extraction, and is not a mechanism to determine how many frames are in the decodable object (as it used to be, and for backward compatibility, still is, in the Decodable interface).
Implements OnlineFeatureInterface.
Definition at line 151 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::final_feature_, and OnlineFeatureInterface::IsLastFrame().
Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), and SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().
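The behaviour described for IsLastFrame(), including the IsLastFrame(-1) case and the change from false to true once input ends, can be captured in a toy model. ToyFrameSource and its fields are assumptions for illustration; the real class delegates this call to its final feature object.

```cpp
#include <cassert>

// Toy frame source; the member names are illustrative.
struct ToyFrameSource {
  int num_frames_ready = 0;     // frames available so far
  bool input_finished = false;  // set once no more waveform will arrive

  // A frame can only be known to be the last one after input has finished.
  bool IsLastFrame(int frame) const {
    assert(frame >= -1);
    return input_finished && frame == num_frames_ready - 1;
  }
};
```

In this model IsLastFrame(-1) is true only for an empty, finished input, and the same frame index can go from false to true when input_finished is set.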
OnlineIvectorFeature * IvectorFeature () | inline
This function returns the iVector-extracting part of the feature pipeline (or NULL if iVectors are not being used); the pointer ownership is retained by this object and not transferred to the caller.
This function is used in nnet3, and also in the silence-weighting code used to exclude silence from the iVector estimation.
Definition at line 271 of file online-nnet2-feature-pipeline.h.
Referenced by main(), SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal(), and OnlineNnet2FeaturePipeline::UpdateFrameWeights().
const OnlineIvectorFeature * IvectorFeature () const | inline
A const accessor for the iVector extractor.
Returns NULL if iVectors are not being used.
Definition at line 277 of file online-nnet2-feature-pipeline.h.
int32 NumFramesReady () const | virtual
Returns the total number of frames, since the start of the utterance, that are now available. In an online-decoding context, this will likely increase with time as more data becomes available.
Implements OnlineFeatureInterface.
Definition at line 155 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::final_feature_, and OnlineFeatureInterface::NumFramesReady().
Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), main(), and SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().
void SetAdaptationState (const OnlineIvectorExtractorAdaptationState &adaptation_state)
Set the adaptation state to a particular value, e.g. reflecting previous utterances of the same speaker; this will generally be called after Copy().
Definition at line 169 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::info_, OnlineNnet2FeaturePipeline::ivector_feature_, OnlineIvectorFeature::SetAdaptationState(), and OnlineNnet2FeaturePipelineInfo::use_ivectors.
Referenced by SingleUtteranceNnet2DecoderThreaded::SingleUtteranceNnet2DecoderThreaded().
void SetCmvnState (const OnlineCmvnState &cmvn_state)
Set the CMVN state to a particular value (for features on nnet3 input, not the i-vector input).
Definition at line 185 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::cmvn_feature_, and OnlineCmvn::SetState().
Referenced by SingleUtteranceNnet2DecoderThreaded::SingleUtteranceNnet2DecoderThreaded().
void UpdateFrameWeights (const std::vector< std::pair< int32, BaseFloat > > &delta_weights)
If you are downweighting silence, you can call OnlineSilenceWeighting::GetDeltaWeights and supply the output to this class using UpdateFrameWeights().
The reason why this call happens outside this class, rather than this class pulling in the data weights, relates to multi-threaded operation and also to not wanting this class to have excessive dependencies.
You must either always call this as soon as new data becomes available, ideally just after calling AcceptWaveform(), or never call it for the lifetime of this object.
Definition at line 164 of file online-nnet2-feature-pipeline.cc.
References OnlineNnet2FeaturePipeline::IvectorFeature(), and OnlineIvectorFeature::UpdateFrameWeights().
Referenced by main().
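To make the shape of the delta_weights argument concrete, here is a sketch that applies (frame index, weight change) pairs to a per-frame weight table. It assumes that each frame starts with weight 1.0 and that deltas are additive; ApplyDeltaWeights is a hypothetical helper, and the real iVector code does not keep its weights in a plain vector like this.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Applies (frame, delta) pairs to a table of per-frame weights. Frames not
// seen before are assumed to start at weight 1.0 (an assumption of this
// sketch).
void ApplyDeltaWeights(
    const std::vector<std::pair<int, float>> &delta_weights,
    std::vector<float> *frame_weights) {
  for (const auto &p : delta_weights) {
    assert(p.first >= 0);
    std::size_t frame = static_cast<std::size_t>(p.first);
    if (frame >= frame_weights->size())
      frame_weights->resize(frame + 1, 1.0f);
    (*frame_weights)[frame] += p.second;
  }
}
```

Because the pairs are changes rather than absolute values, a frame that is first downweighted to 0.5 and later judged to be pure silence receives a second delta of -0.5 rather than a new weight of 0.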
OnlineBaseFeature * base_feature_ | private
MFCC/PLP/filterbank.
OnlineCmvn * cmvn_feature_ | private
Definition at line 298 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::GetCmvnState(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), OnlineNnet2FeaturePipeline::SetCmvnState(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().
int32 dim_ | private
We cache the feature dimension, to save time when calling Dim().
Definition at line 325 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::Dim(), and OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline().
OnlineFeatureInterface * feature_plus_optional_cmvn_ | private
feature_plus_optional_cmvn_ is the feature_plus_optional_pitch_ transformed with OnlineCmvn if cmvn is active; otherwise, points to the same address as feature_plus_optional_pitch_.
Definition at line 310 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().
OnlineFeatureInterface * feature_plus_optional_pitch_ | private
feature_plus_optional_pitch_ is the base_feature_ appended (OnlineAppendFeature) with pitch_feature_, if used; otherwise, points to the same address as base_feature_.
Definition at line 305 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().
OnlineFeatureInterface * final_feature_ | private
final_feature_ is feature_plus_optional_cmvn_ appended (OnlineAppendFeature) with ivector_feature_, if ivector_feature_ is used; otherwise, points to the same address as feature_plus_optional_cmvn_.
Definition at line 322 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::GetFrame(), OnlineNnet2FeaturePipeline::IsLastFrame(), OnlineNnet2FeaturePipeline::NumFramesReady(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().
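The chain of "optional stage, otherwise same address" pointers described for feature_plus_optional_pitch_, feature_plus_optional_cmvn_ and final_feature_ can be sketched with a toy pipeline. This is a simplified model under stated assumptions: ToyFeature and ToyPipeline are hypothetical, each stage is reduced to a description string, and a disabled stage is assumed to reuse the pointer of the stage before it.

```cpp
#include <cassert>
#include <string>

// Toy stage that only records what it represents.
struct ToyFeature {
  std::string description;
};

class ToyPipeline {
 public:
  ToyPipeline(bool add_pitch, bool use_cmvn, bool use_ivectors) {
    base_ = new ToyFeature{"base"};
    // Each optional stage either wraps the previous one or aliases it.
    plus_pitch_ = add_pitch
        ? new ToyFeature{base_->description + "+pitch"} : base_;
    plus_cmvn_ = use_cmvn
        ? new ToyFeature{"cmvn(" + plus_pitch_->description + ")"}
        : plus_pitch_;
    final_ = use_ivectors
        ? new ToyFeature{plus_cmvn_->description + "+ivector"} : plus_cmvn_;
  }

  ~ToyPipeline() {
    // Delete from the end of the chain, skipping pointers that merely
    // alias an earlier stage, so nothing is freed twice.
    if (final_ != plus_cmvn_) delete final_;
    if (plus_cmvn_ != plus_pitch_) delete plus_cmvn_;
    if (plus_pitch_ != base_) delete plus_pitch_;
    delete base_;
  }

  ToyPipeline(const ToyPipeline &) = delete;
  ToyPipeline &operator=(const ToyPipeline &) = delete;

  const std::string &Describe() const {
    assert(final_ != nullptr);
    return final_->description;
  }

  // True when every optional stage is disabled.
  bool FinalAliasesBase() const { return final_ == base_; }

 private:
  ToyFeature *base_;
  ToyFeature *plus_pitch_;
  ToyFeature *plus_cmvn_;
  ToyFeature *final_;
};
```

The pointer comparisons in the destructor are the reason the aliasing matters: without them, a pipeline with all optional stages disabled would delete the same object four times.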
Matrix< double > global_cmvn_stats_ | private
Global CMVN stats.
Definition at line 300 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline().
const OnlineNnet2FeaturePipelineInfo & info_ | private
Definition at line 291 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::GetAdaptationState(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::SetAdaptationState().
Matrix< BaseFloat > lda_mat_ | private
LDA matrix, if supplied.
Definition at line 299 of file online-nnet2-feature-pipeline.h.
OnlineFeatureInterface * nnet3_feature_ | private
Part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications. This pointer is returned by InputFeature().
Definition at line 317 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline().
OnlinePitchFeature * pitch_ | private
Raw pitch, if used.
Definition at line 295 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::AcceptWaveform(), OnlineNnet2FeaturePipeline::InputFinished(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().
OnlineProcessPitch * pitch_feature_ | private
Processed pitch, if pitch used.
Definition at line 296 of file online-nnet2-feature-pipeline.h.
Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().