OnlineNnet2FeaturePipeline is a class that's responsible for putting together the various parts of the feature-processing pipeline for neural networks, in an online setting.

#include <online-nnet2-feature-pipeline.h>


Public Member Functions
	OnlineNnet2FeaturePipeline (const OnlineNnet2FeaturePipelineInfo &info)
	Constructor from the "info" object.

virtual int32	Dim () const
	Returns the feature dimension.

virtual bool	IsLastFrame (int32 frame) const
	Returns true if this is the last frame.

virtual int32	NumFramesReady () const
	Returns the total number of frames, since the start of the utterance, that are now available.

virtual void	GetFrame (int32 frame, VectorBase< BaseFloat > *feat)
	Gets the feature vector for this frame.

void	UpdateFrameWeights (const std::vector< std::pair< int32, BaseFloat > > &delta_weights)
	If you are downweighting silence, you can call OnlineSilenceWeighting::GetDeltaWeights and supply the output to this class using UpdateFrameWeights().

void	SetAdaptationState (const OnlineIvectorExtractorAdaptationState &adaptation_state)
	Set the adaptation state to a particular value, e.g. reflecting previous utterances of the same speaker.

void	GetAdaptationState (OnlineIvectorExtractorAdaptationState *adaptation_state) const
	Get the adaptation state; you may want to call this before destroying this object, to get adaptation state that can be used to improve decoding of later utterances of this speaker.

void	SetCmvnState (const OnlineCmvnState &cmvn_state)
	Set the CMVN state to a particular value.

void	GetCmvnState (OnlineCmvnState *cmvn_state)

void	AcceptWaveform (BaseFloat sampling_rate, const VectorBase< BaseFloat > &waveform)
	Accept more data to process.

BaseFloat	FrameShiftInSeconds () const

void	InputFinished ()
	If you call InputFinished(), it tells the class you won't be providing any more waveform.

OnlineIvectorFeature *	IvectorFeature ()
	This function returns the iVector-extracting part of the feature pipeline (or NULL if iVectors are not being used); the pointer ownership is retained by this object and not transferred to the caller.

const OnlineIvectorFeature *	IvectorFeature () const
	A const accessor for the iVector extractor.

OnlineFeatureInterface *	InputFeature ()
	This function returns the part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications.

virtual	~OnlineNnet2FeaturePipeline ()

Public Member Functions inherited from OnlineFeatureInterface
virtual void	GetFrames (const std::vector< int32 > &frames, MatrixBase< BaseFloat > *feats)
	This is like GetFrame() but for a collection of frames.

virtual	~OnlineFeatureInterface ()
	Virtual destructor.

Private Attributes
const OnlineNnet2FeaturePipelineInfo &	info_

OnlineBaseFeature *	base_feature_
	MFCC/PLP/filterbank.

OnlinePitchFeature *	pitch_
	Raw pitch, if used.

OnlineProcessPitch *	pitch_feature_
	Processed pitch, if pitch used.

OnlineCmvn *	cmvn_feature_
	Online CMVN applied to feature_plus_optional_pitch_, if CMVN is used.

Matrix< BaseFloat >	lda_mat_
	LDA matrix, if supplied.

Matrix< double >	global_cmvn_stats_
	Global CMVN stats.

OnlineFeatureInterface *	feature_plus_optional_pitch_
	feature_plus_optional_pitch_ is the base_feature_ appended (OnlineAppendFeature) with pitch_feature_, if used; otherwise, points to the same address as base_feature_.

OnlineFeatureInterface *	feature_plus_optional_cmvn_
	feature_plus_optional_cmvn_ is the feature_plus_optional_pitch_ transformed with OnlineCmvn if cmvn is active; otherwise, points to the same address as feature_plus_optional_pitch_.

OnlineIvectorFeature *	ivector_feature_
	iVector feature, if used.

OnlineFeatureInterface *	nnet3_feature_
	Part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications.

OnlineFeatureInterface *	final_feature_
	final_feature_ is feature_plus_optional_cmvn_ appended (OnlineAppendFeature) with ivector_feature_, if ivector_feature_ is used; otherwise, points to the same address as feature_plus_optional_cmvn_.

int32	dim_
	We cache the feature dimension, to save time when calling Dim().

Detailed Description

OnlineNnet2FeaturePipeline is a class that's responsible for putting together the various parts of the feature-processing pipeline for neural networks, in an online setting.

The recipe here does not include fMLLR; instead, it assumes we're giving raw features such as MFCC, PLP or filterbank (optionally with pitch, and with online CMVN only if use_cmvn is set) to the neural network, and optionally augmenting these with an iVector that describes the speaker characteristics. The iVector is extracted using class OnlineIvectorFeature (see that class for more info on how it's done). No splicing is currently done in this code, as we're currently only supporting the nnet2 neural network, in which the splicing is done inside the network. Probably our strategy for nnet1 network conversion would be to convert to nnet2 and just add layers to do the splicing.
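To make the optional stages concrete, here is a minimal, Kaldi-free C++ sketch of the order in which the constructor chains them. PipelineFlags and DescribePipeline are hypothetical names; the three booleans stand in for the add_pitch, use_cmvn and use_ivectors fields of OnlineNnet2FeaturePipelineInfo, and the returned strings only name the classes the constructor creates.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the switches in OnlineNnet2FeaturePipelineInfo.
struct PipelineFlags {
  std::string feature_type;  // "mfcc", "plp" or "fbank"
  bool add_pitch = false;
  bool use_cmvn = false;
  bool use_ivectors = false;
};

// Returns, in construction order, the objects the constructor would create.
// An empty result stands for the KALDI_ERR on an unknown feature type.
std::vector<std::string> DescribePipeline(const PipelineFlags &f) {
  std::vector<std::string> stages;
  if (f.feature_type == "mfcc") {
    stages.push_back("OnlineMfcc");
  } else if (f.feature_type == "plp") {
    stages.push_back("OnlinePlp");
  } else if (f.feature_type == "fbank") {
    stages.push_back("OnlineFbank");
  } else {
    return stages;  // empty
  }
  if (f.add_pitch) {
    stages.push_back("OnlinePitchFeature");
    stages.push_back("OnlineProcessPitch");
    stages.push_back("OnlineAppendFeature(base, pitch)");
  }
  if (f.use_cmvn) stages.push_back("OnlineCmvn");
  if (f.use_ivectors) {
    // The iVector extractor reads base_feature_, not the CMVN output.
    stages.push_back("OnlineIvectorFeature(base)");
    stages.push_back("OnlineAppendFeature(main, ivector)");
  }
  return stages;
}
```

Note that the iVector extractor is fed base_feature_ directly, so it never sees the pitch or CMVN stages.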

Definition at line 198 of file online-nnet2-feature-pipeline.h.

Constructor & Destructor Documentation

◆ OnlineNnet2FeaturePipeline()

OnlineNnet2FeaturePipeline ( const OnlineNnet2FeaturePipelineInfo & info )

explicit

Constructor from the "info" object.

The main feature extraction pipeline is constructed in this constructor.

After calling this for a non-initial utterance of a speaker, you may want to call SetAdaptationState().

Definition at line 90 of file online-nnet2-feature-pipeline.cc.

 OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(
     const OnlineNnet2FeaturePipelineInfo &info):
     info_(info), base_feature_(NULL),
     pitch_(NULL), pitch_feature_(NULL),
     cmvn_feature_(NULL),
     feature_plus_optional_pitch_(NULL),
     feature_plus_optional_cmvn_(NULL),
     ivector_feature_(NULL),
     nnet3_feature_(NULL),
     final_feature_(NULL) {
 
   if (info_.feature_type == "mfcc") {
     base_feature_ = new OnlineMfcc(info_.mfcc_opts);
   } else if (info_.feature_type == "plp") {
     base_feature_ = new OnlinePlp(info_.plp_opts);
   } else if (info_.feature_type == "fbank") {
     base_feature_ = new OnlineFbank(info_.fbank_opts);
   } else {
     KALDI_ERR << "Code error: invalid feature type " << info_.feature_type;
   }
 
   if (info_.add_pitch) {
     pitch_ = new OnlinePitchFeature(info_.pitch_opts);
     pitch_feature_ = new OnlineProcessPitch(info_.pitch_process_opts,
                                             pitch_);
     feature_plus_optional_pitch_ = new OnlineAppendFeature(base_feature_,
                                                            pitch_feature_);
   } else {
     feature_plus_optional_pitch_ = base_feature_;
   }
 
   if (info_.use_cmvn) {
     KALDI_ASSERT(info.global_cmvn_stats_rxfilename != "");
     ReadKaldiObject(info.global_cmvn_stats_rxfilename, &global_cmvn_stats_);
     OnlineCmvnState initial_state(global_cmvn_stats_);
     cmvn_feature_ = new OnlineCmvn(info_.cmvn_opts, initial_state,
         feature_plus_optional_pitch_);
     feature_plus_optional_cmvn_ = cmvn_feature_;
   } else {
     feature_plus_optional_cmvn_ = feature_plus_optional_pitch_;
   }
 
   if (info_.use_ivectors) {
     nnet3_feature_ = feature_plus_optional_cmvn_;
     // Note: the i-vector extractor OnlineIvectorFeature gets 'base_feature_'
     // without cmvn (the online cmvn is applied inside the class)
     ivector_feature_ = new OnlineIvectorFeature(info_.ivector_extractor_info,
                                                 base_feature_);
     final_feature_ = new OnlineAppendFeature(feature_plus_optional_cmvn_,
                                              ivector_feature_);
   } else {
     nnet3_feature_ = feature_plus_optional_cmvn_;
     final_feature_ = feature_plus_optional_cmvn_;
   }
   dim_ = final_feature_->Dim();
 }

◆ ~OnlineNnet2FeaturePipeline()

~OnlineNnet2FeaturePipeline ( )

virtual

Definition at line 201 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::base_feature_, OnlineNnet2FeaturePipeline::cmvn_feature_, OnlineNnet2FeaturePipeline::feature_plus_optional_cmvn_, OnlineNnet2FeaturePipeline::feature_plus_optional_pitch_, OnlineNnet2FeaturePipeline::final_feature_, OnlineNnet2FeaturePipeline::ivector_feature_, OnlineNnet2FeaturePipeline::pitch_, and OnlineNnet2FeaturePipeline::pitch_feature_.

 OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline() {
   // Note: calling delete on a NULL pointer is a no-op, and not all of the
   // pointers below will be non-NULL.
   // Some of the online-feature pointers are just copies of other pointers,
   // and we have to avoid deleting them in those cases.
   if (final_feature_ != feature_plus_optional_cmvn_)
     delete final_feature_;
   delete ivector_feature_;
   delete cmvn_feature_;
   if (feature_plus_optional_pitch_ != base_feature_)
     delete feature_plus_optional_pitch_;
   delete pitch_feature_;
   delete pitch_;
   delete base_feature_;
 }
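Because several member pointers may alias one another, the destructor has to compare them before deleting. One alternative design, sketched below under stated assumptions, keeps every object the pipeline creates in a container of std::unique_ptr and treats the member pointers as non-owning views, so aliasing can never cause a double delete. Stage and OwningPipeline are hypothetical types, not Kaldi classes, and the three-object pitch chain is collapsed into a single stage for brevity.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an online feature stage; counts live objects.
struct Stage {
  static int live;
  Stage() { ++live; }
  ~Stage() { --live; }
};
int Stage::live = 0;

// Every object created goes into 'owned'; the raw pointers below are
// non-owning and may alias each other, exactly as in the real class.
struct OwningPipeline {
  std::vector<std::unique_ptr<Stage>> owned;
  Stage *base = nullptr;
  Stage *with_pitch = nullptr;
  Stage *with_cmvn = nullptr;
  Stage *final_feature = nullptr;

  Stage *Make() {
    owned.emplace_back(new Stage());
    return owned.back().get();
  }

  OwningPipeline(bool add_pitch, bool use_cmvn, bool use_ivectors) {
    base = Make();
    with_pitch = add_pitch ? Make() : base;          // alias when unused
    with_cmvn = use_cmvn ? Make() : with_pitch;      // alias when unused
    final_feature = use_ivectors ? Make() : with_cmvn;
  }
  // No user-written destructor: 'owned' deletes each object exactly once.
};
```

The trade-off is an extra container per pipeline in exchange for never having to reason about which pointers are aliases at destruction time.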

Member Function Documentation

◆ AcceptWaveform()

void AcceptWaveform	(	BaseFloat	sampling_rate,
		const VectorBase< BaseFloat > &	waveform
	)

Accept more data to process.

It won't actually process the data until you call GetFrame() (probably indirectly, via (decoder).AdvanceDecoding()); when you call this function it will just copy it. sampling_rate is needed only to assert that it equals the value in the config.

Definition at line 217 of file online-nnet2-feature-pipeline.cc.

References OnlineBaseFeature::AcceptWaveform(), OnlinePitchFeature::AcceptWaveform(), OnlineNnet2FeaturePipeline::base_feature_, and OnlineNnet2FeaturePipeline::pitch_.

Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), and main().

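 void OnlineNnet2FeaturePipeline::AcceptWaveform(
     BaseFloat sampling_rate,
     const VectorBase<BaseFloat> &waveform) {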
   base_feature_->AcceptWaveform(sampling_rate, waveform);
   if (pitch_)
     pitch_->AcceptWaveform(sampling_rate, waveform);
 }
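The following sketch illustrates the split described above: AcceptWaveform() only appends samples, and the number of available frames is derived from the buffered samples when asked. LazyFrameCounter is a hypothetical class; the frame-count formula assumes "snip-edges" framing (a frame exists only once all of its samples have arrived), and the 400-sample window and 160-sample shift (25 ms and 10 ms at 16 kHz) are illustrative values, not taken from this class's config.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of an online base feature that defers computation.
class LazyFrameCounter {
 public:
  LazyFrameCounter(float sampling_rate, int32_t frame_length,
                   int32_t frame_shift)
      : sampling_rate_(sampling_rate), frame_length_(frame_length),
        frame_shift_(frame_shift) {}

  // Only copies the samples; no feature computation happens here.
  void AcceptWaveform(float sampling_rate, const std::vector<float> &wave) {
    assert(sampling_rate == sampling_rate_);  // mirrors the config check
    samples_.insert(samples_.end(), wave.begin(), wave.end());
  }

  // Frames available so far under snip-edges framing.
  int32_t NumFramesReady() const {
    int32_t n = static_cast<int32_t>(samples_.size());
    if (n < frame_length_) return 0;
    return 1 + (n - frame_length_) / frame_shift_;
  }

 private:
  float sampling_rate_;
  int32_t frame_length_;
  int32_t frame_shift_;
  std::vector<float> samples_;
};
```

With these settings, one second of 16 kHz audio (16000 samples) yields 1 + (16000 - 400) / 160 = 98 frames.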

◆ Dim()

int32 Dim ( ) const

virtual

Dim() will return the base-feature dimension (e.g. 13 for normal MFCC); plus the pitch-feature dimension (e.g. 3), if used; plus the iVector dimension, if used. Any frame-splicing happens inside the neural-network code.
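The dimension arithmetic above amounts to a sum. PipelineDim below is a hypothetical helper, and the iVector dimension of 100 used in the usage example is illustrative only; CMVN is left out because it does not change the dimension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper reproducing what Dim() reports.
// Pass 0 for any component that is not in use.
int32_t PipelineDim(int32_t base_dim, int32_t pitch_dim, int32_t ivector_dim) {
  // Appending features concatenates them, so the dimensions add up.
  return base_dim + pitch_dim + ivector_dim;
}
```

For example, PipelineDim(13, 3, 100) gives 116 for 13-dimensional MFCC plus 3 pitch features plus a 100-dimensional iVector.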

Implements OnlineFeatureInterface.

Definition at line 149 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::dim_.

Referenced by SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().

 int32 OnlineNnet2FeaturePipeline::Dim() const { return dim_; }


◆ FrameShiftInSeconds()

BaseFloat FrameShiftInSeconds ( ) const

inline virtual

Implements OnlineFeatureInterface.

Definition at line 257 of file online-nnet2-feature-pipeline.h.

Referenced by SingleUtteranceNnet2DecoderThreaded::EndpointDetected(), SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), SingleUtteranceNnet2DecoderThreaded::GetRemainingWaveform(), and SingleUtteranceNnet2DecoderThreaded::NumFramesReceivedApprox().

   virtual BaseFloat FrameShiftInSeconds() const { return info_.FrameShiftInSeconds(); }

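Callers such as the endpointing code convert between frame counts and seconds using the value returned by FrameShiftInSeconds(). The helpers below are hypothetical, and the 0.01 s shift used in the usage example is a common setting rather than a value fixed by this class.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: duration covered by num_frames frames.
double FramesToSeconds(int32_t num_frames, double frame_shift_seconds) {
  return num_frames * frame_shift_seconds;
}

// Hypothetical helper: whole frames in 'seconds' of audio, rounded down.
int32_t SecondsToFrames(double seconds, double frame_shift_seconds) {
  // The small epsilon guards against values like 149.9999999 from rounding.
  return static_cast<int32_t>(
      std::floor(seconds / frame_shift_seconds + 1e-6));
}
```

With a 0.01 s shift, 150 frames correspond to 1.5 seconds, and 1.5 seconds back to 150 frames.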

◆ GetAdaptationState()

void GetAdaptationState ( OnlineIvectorExtractorAdaptationState * adaptation_state ) const

Get the adaptation state; you may want to call this before destroying this object, to get adaptation state that can be used to improve decoding of later utterances of this speaker.

You might not want to do this, though, if you have reason to believe that something went wrong in the recognition (e.g., low confidence).

Definition at line 177 of file online-nnet2-feature-pipeline.cc.

References OnlineIvectorFeature::GetAdaptationState(), OnlineNnet2FeaturePipeline::info_, OnlineNnet2FeaturePipeline::ivector_feature_, and OnlineNnet2FeaturePipelineInfo::use_ivectors.

Referenced by SingleUtteranceNnet2DecoderThreaded::GetAdaptationState().

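 void OnlineNnet2FeaturePipeline::GetAdaptationState(
     OnlineIvectorExtractorAdaptationState *adaptation_state) const {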
   if (info_.use_ivectors) {
     ivector_feature_->GetAdaptationState(adaptation_state);
   }
   // else silently do nothing, as there is nothing to do.
 }
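A typical per-speaker flow, following the advice above and in the constructor documentation, is: construct a pipeline, call SetAdaptationState() with the speaker's state, decode, and call GetAdaptationState() before destruction only if the result looks trustworthy. The sketch models this with hypothetical types (AdaptationState, UtterancePipeline, DecodeUtterance); the num_utterances counter stands in for the real iVector statistics, and the confidence threshold is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for OnlineIvectorExtractorAdaptationState.
struct AdaptationState {
  int num_utterances = 0;  // utterances folded into the state so far
};

// Hypothetical stand-in for one utterance's feature pipeline.
class UtterancePipeline {
 public:
  explicit UtterancePipeline(bool use_ivectors) : use_ivectors_(use_ivectors) {}
  void SetAdaptationState(const AdaptationState &s) {
    if (use_ivectors_) state_ = s;  // silently a no-op without iVectors
  }
  void GetAdaptationState(AdaptationState *s) const {
    if (use_ivectors_) *s = state_;
  }
  void Decode() {
    if (use_ivectors_) ++state_.num_utterances;  // "adapts" to the speaker
  }

 private:
  bool use_ivectors_;
  AdaptationState state_;
};

// Decodes one utterance; keeps the updated state only if the (hypothetical)
// confidence score reaches min_confidence.
void DecodeUtterance(AdaptationState *speaker_state, double confidence,
                     double min_confidence) {
  UtterancePipeline pipeline(true);
  pipeline.SetAdaptationState(*speaker_state);  // right after construction
  pipeline.Decode();
  if (confidence >= min_confidence)
    pipeline.GetAdaptationState(speaker_state);  // before destruction
}
```

A low-confidence utterance therefore leaves the speaker's state exactly as it was before that utterance.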

◆ GetCmvnState()

void GetCmvnState ( OnlineCmvnState * cmvn_state )

Definition at line 191 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::cmvn_feature_, OnlineCmvn::GetState(), and OnlineCmvn::NumFramesReady().

Referenced by SingleUtteranceNnet2DecoderThreaded::GetCmvnState().

 void OnlineNnet2FeaturePipeline::GetCmvnState(OnlineCmvnState *cmvn_state) {
   if (NULL != cmvn_feature_) {
     int32 frame = cmvn_feature_->NumFramesReady() - 1;
     // the following call will crash if no frames are ready.
     cmvn_feature_->GetState(frame, cmvn_state);
   }
 }
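As the comment in the code notes, GetCmvnState() will crash if no frames are ready, because it asks for the state at frame NumFramesReady() - 1. A caller can guard against this itself. In the sketch, ToyCmvn is a hypothetical one-dimensional stand-in whose "state" is just a running sum, and TryGetCmvnState is a hypothetical wrapper that reports failure instead of asserting.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for OnlineCmvn: the state after frame t is the
// running sum of frames 0..t (one dimension, for brevity).
class ToyCmvn {
 public:
  void AddFrame(double x) {
    sums_.push_back((sums_.empty() ? 0.0 : sums_.back()) + x);
  }
  int NumFramesReady() const { return static_cast<int>(sums_.size()); }
  double GetState(int frame) const {
    assert(frame >= 0 && frame < NumFramesReady());  // the "crash" case
    return sums_[frame];
  }

 private:
  std::vector<double> sums_;
};

// Returns false, leaving *state untouched, when no frames are ready yet.
bool TryGetCmvnState(const ToyCmvn &cmvn, double *state) {
  int frame = cmvn.NumFramesReady() - 1;
  if (frame < 0) return false;
  *state = cmvn.GetState(frame);
  return true;
}
```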

◆ GetFrame()

void GetFrame	(	int32	frame,
		VectorBase< BaseFloat > *	feat
	)

virtual

Gets the feature vector for this frame.

Before calling this for a given frame, it is assumed that you called NumFramesReady() and it returned a number greater than "frame". Otherwise this call will likely crash with an assert failure. This function is not declared const, in case there is some kind of caching going on, but most of the time it shouldn't modify the class.

Implements OnlineFeatureInterface.

Definition at line 159 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::final_feature_, and OnlineFeatureInterface::GetFrame().

Referenced by SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().

 void OnlineNnet2FeaturePipeline::GetFrame(int32 frame,
                                           VectorBase<BaseFloat> *feat) {
   return final_feature_->GetFrame(frame, feat);
 }
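The contract described for GetFrame(), NumFramesReady() and IsLastFrame() suggests a polling loop: on each call, consume only the frames that are ready, and stop once IsLastFrame() returns true. The sketch below shows such a loop against a hypothetical FrameSource interface that mirrors those three methods; it is not the OnlineFeatureInterface class itself, and ChunkedSource is a toy source whose frames arrive in chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical interface with the same contract as the methods above.
class FrameSource {
 public:
  virtual ~FrameSource() {}
  virtual int NumFramesReady() const = 0;
  virtual bool IsLastFrame(int frame) const = 0;
  virtual void GetFrame(int frame, std::vector<float> *feat) = 0;
};

// Toy source: frames become ready in chunks; frame t holds the value t.
class ChunkedSource : public FrameSource {
 public:
  explicit ChunkedSource(int total) : total_(total) {}
  void Advance(int n) { ready_ = std::min(ready_ + n, total_); }
  int NumFramesReady() const override { return ready_; }
  bool IsLastFrame(int frame) const override {
    return frame == total_ - 1 && ready_ == total_;
  }
  void GetFrame(int frame, std::vector<float> *feat) override {
    assert(frame < ready_);  // only ask for frames that are ready
    feat->assign(1, static_cast<float>(frame));
  }

 private:
  int total_;
  int ready_ = 0;
};

// Consumes every ready frame starting at *next_frame; returns true once the
// last frame has been processed.
bool ConsumeReadyFrames(FrameSource *src, int *next_frame,
                        std::vector<float> *out) {
  std::vector<float> feat;
  while (*next_frame < src->NumFramesReady()) {
    int t = (*next_frame)++;
    src->GetFrame(t, &feat);
    out->push_back(feat[0]);
    if (src->IsLastFrame(t)) return true;
  }
  return false;
}
```

Because the loop keeps its position in next_frame, it can be called again each time more audio has been passed to AcceptWaveform().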

◆ InputFeature()

OnlineFeatureInterface* InputFeature ( )

inline

This function returns the part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications.

Definition at line 284 of file online-nnet2-feature-pipeline.h.

   OnlineFeatureInterface *InputFeature() {
     return nnet3_feature_;
   }

◆ InputFinished()

void InputFinished ( )

If you call InputFinished(), it tells the class you won't be providing any more waveform.

This helps flush out the last few frames of delta or LDA features and finalizes the pitch features (making them more accurate), although, since in neural-net decoding we don't anticipate rescoring the lattices, this may not be much of an issue.

Definition at line 225 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::base_feature_, OnlineBaseFeature::InputFinished(), OnlinePitchFeature::InputFinished(), and OnlineNnet2FeaturePipeline::pitch_.

Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), and main().

 void OnlineNnet2FeaturePipeline::InputFinished() {
   base_feature_->InputFinished();
   if (pitch_)
     pitch_->InputFinished();
 }
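One reason InputFinished() matters is that the pitch stream may lag the base features, and a frame of the appended (base plus pitch) feature can only be ready once both sources have it. The toy model below assumes that an appended feature reports the minimum of its two sources' NumFramesReady() (our understanding of OnlineAppendFeature, stated here as an assumption) and that the pitch stream catches up once input is finished; ToyStreams and the 20-frame latency are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical toy model of a lagging pitch stream.
struct ToyStreams {
  int base_ready = 0;      // frames available from the base feature
  int pitch_latency = 0;   // how far pitch lags until input is finished
  bool input_finished = false;

  int PitchReady() const {
    if (input_finished) return base_ready;  // flushed: pitch catches up
    return std::max(0, base_ready - pitch_latency);
  }
  // Assumed behaviour of an appended feature: a frame is ready only when
  // both of its sources have it.
  int AppendedReady() const { return std::min(base_ready, PitchReady()); }
};
```

Without the final flush, the last pitch_latency frames of the utterance would never become available to the decoder.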

◆ IsLastFrame()

bool IsLastFrame ( int32 frame ) const

virtual

Returns true if this is the last frame.

Frame indices are zero-based, so the first frame is zero. IsLastFrame(-1) will return false, unless the file is empty (which is a case that I'm not sure all the code will handle, so be careful). This function may return false for some frame if we haven't yet decided to terminate decoding, but later true if we decide to terminate decoding. This function exists mainly to correctly handle end effects in feature extraction, and is not a mechanism to determine how many frames are in the decodable object (as it used to be, and for backward compatibility, still is, in the Decodable interface).

Implements OnlineFeatureInterface.

Definition at line 151 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::final_feature_, and OnlineFeatureInterface::IsLastFrame().

Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), and SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().

 bool OnlineNnet2FeaturePipeline::IsLastFrame(int32 frame) const {
   return final_feature_->IsLastFrame(frame);
 }

◆ IvectorFeature() [1/2]

OnlineIvectorFeature* IvectorFeature ( )

inline

This function returns the iVector-extracting part of the feature pipeline (or NULL if iVectors are not being used); the pointer ownership is retained by this object and not transferred to the caller.

This function is used in nnet3, and also in the silence-weighting code used to exclude silence from the iVector estimation.

Definition at line 271 of file online-nnet2-feature-pipeline.h.

Referenced by main(), SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal(), and OnlineNnet2FeaturePipeline::UpdateFrameWeights().

   OnlineIvectorFeature *IvectorFeature() {
     return ivector_feature_;
   }

◆ IvectorFeature() [2/2]

const OnlineIvectorFeature* IvectorFeature ( ) const

inline

A const accessor for the iVector extractor.

Returns NULL if iVectors are not being used.

Definition at line 277 of file online-nnet2-feature-pipeline.h.

   const OnlineIvectorFeature *IvectorFeature() const {
     return ivector_feature_;
   }

◆ NumFramesReady()

int32 NumFramesReady ( ) const

virtual

Returns the total number of frames, since the start of the utterance, that are now available. In an online-decoding context, this will likely increase with time as more data becomes available.

Implements OnlineFeatureInterface.

Definition at line 155 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::final_feature_, and OnlineFeatureInterface::NumFramesReady().

Referenced by SingleUtteranceNnet2DecoderThreaded::FeatureComputation(), main(), and SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal().

 int32 OnlineNnet2FeaturePipeline::NumFramesReady() const {
   return final_feature_->NumFramesReady();
 }

◆ SetAdaptationState()

void SetAdaptationState ( const OnlineIvectorExtractorAdaptationState & adaptation_state )

Set the adaptation state to a particular value, e.g. reflecting previous utterances of the same speaker; this will generally be called right after constructing the pipeline.

Definition at line 169 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::info_, OnlineNnet2FeaturePipeline::ivector_feature_, OnlineIvectorFeature::SetAdaptationState(), and OnlineNnet2FeaturePipelineInfo::use_ivectors.

Referenced by SingleUtteranceNnet2DecoderThreaded::SingleUtteranceNnet2DecoderThreaded().

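 void OnlineNnet2FeaturePipeline::SetAdaptationState(
     const OnlineIvectorExtractorAdaptationState &adaptation_state) {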
   if (info_.use_ivectors) {
     ivector_feature_->SetAdaptationState(adaptation_state);
   }
   // else silently do nothing, as there is nothing to do.
 }

◆ SetCmvnState()

void SetCmvnState ( const OnlineCmvnState & cmvn_state )

Set the CMVN state to a particular value (for features on the nnet3 input, not the i-vector input). If CMVN is not in use, this does nothing.

Definition at line 185 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::cmvn_feature_, and OnlineCmvn::SetState().

Referenced by SingleUtteranceNnet2DecoderThreaded::SingleUtteranceNnet2DecoderThreaded().

 void OnlineNnet2FeaturePipeline::SetCmvnState(const OnlineCmvnState &cmvn_state) {
   if (NULL != cmvn_feature_)
     cmvn_feature_->SetState(cmvn_state);
 }
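The CMVN state is initialized from global_cmvn_stats_, which uses Kaldi's usual CMVN statistics layout: a 2 x (dim+1) matrix whose first row holds per-dimension sums followed by the frame count, and whose second row holds sums of squares. The sketch below applies mean normalization from such statistics. It is a simplified, hypothetical illustration (ApplyMeanNorm is not a Kaldi function); as we understand it, the real OnlineCmvn also maintains a moving window over recent frames and can normalize variances, neither of which is modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row 0: per-dimension sums, then the frame count. Row 1: sums of squares
// (unused here, since only the mean is normalized).
using CmvnStats = std::vector<std::vector<double>>;

std::vector<double> ApplyMeanNorm(const CmvnStats &stats,
                                  const std::vector<double> &frame) {
  std::size_t dim = frame.size();
  assert(stats.size() == 2 && stats[0].size() == dim + 1);
  double count = stats[0][dim];
  assert(count > 0.0);
  std::vector<double> out(dim);
  for (std::size_t d = 0; d < dim; ++d)
    out[d] = frame[d] - stats[0][d] / count;  // subtract the mean
  return out;
}
```

For example, stats with sums (20, 40) over 10 frames give a mean of (2, 4), so the frame (3, 5) normalizes to (1, 1).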

◆ UpdateFrameWeights()

void UpdateFrameWeights ( const std::vector< std::pair< int32, BaseFloat > > & delta_weights )

If you are downweighting silence, you can call OnlineSilenceWeighting::GetDeltaWeights and supply the output to this class using UpdateFrameWeights().

The reason this call happens outside this class, rather than this class pulling in the data weights, relates to multi-threaded operation and to not wanting this class to have excessive dependencies.

You must either always call this as soon as new data becomes available, ideally just after calling AcceptWaveform(), or never call it for the lifetime of this object.

Definition at line 164 of file online-nnet2-feature-pipeline.cc.

References OnlineNnet2FeaturePipeline::IvectorFeature(), and OnlineIvectorFeature::UpdateFrameWeights().

Referenced by main().

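 void OnlineNnet2FeaturePipeline::UpdateFrameWeights(
     const std::vector<std::pair<int32, BaseFloat> > &delta_weights) {
   // IvectorFeature() returns NULL when iVectors are not in use, so this
   // function must only be called when use_ivectors is true.
   IvectorFeature()->UpdateFrameWeights(delta_weights);
 }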
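The sketch below shows one way (frame, delta) pairs can be folded into a per-frame weight table, which is our reading of what "delta weights" means: each pair gives a change to the weight of one frame, and a frame may be revisited when the silence decision for it is revised. ApplyDeltaWeights is hypothetical and is not taken from OnlineIvectorFeature; it only illustrates the data format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Adds each delta to the weight of its frame, growing the table as needed.
void ApplyDeltaWeights(
    const std::vector<std::pair<int32_t, float>> &delta_weights,
    std::vector<float> *weights) {
  for (const auto &dw : delta_weights) {
    assert(dw.first >= 0);
    std::size_t frame = static_cast<std::size_t>(dw.first);
    if (frame >= weights->size())
      weights->resize(frame + 1, 0.0f);  // frames not seen yet start at 0
    (*weights)[frame] += dw.second;
  }
}
```

For example, after three frames are first given weight 1.0, a later delta of -0.9 for frame 1 (say, once it is judged to be silence) leaves that frame with weight 0.1.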

Member Data Documentation

◆ base_feature_

OnlineBaseFeature* base_feature_

private

MFCC/PLP/filterbank.

Definition at line 293 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::AcceptWaveform(), OnlineNnet2FeaturePipeline::InputFinished(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().

◆ cmvn_feature_

OnlineCmvn* cmvn_feature_

private

Online CMVN applied to feature_plus_optional_pitch_, if CMVN is used (NULL otherwise).

Definition at line 298 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::GetCmvnState(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), OnlineNnet2FeaturePipeline::SetCmvnState(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().

◆ dim_

int32 dim_

private

we cache the feature dimension, to save time when calling Dim().

Definition at line 325 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::Dim(), and OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline().

◆ feature_plus_optional_cmvn_

OnlineFeatureInterface* feature_plus_optional_cmvn_

private

feature_plus_optional_cmvn_ is the feature_plus_optional_pitch_ transformed with OnlineCmvn if cmvn is active; otherwise, points to the same address as feature_plus_optional_pitch_.

Definition at line 310 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().

◆ feature_plus_optional_pitch_

OnlineFeatureInterface* feature_plus_optional_pitch_

private

feature_plus_optional_pitch_ is the base_feature_ appended (OnlineAppendFeature) with pitch_feature_, if used; otherwise, points to the same address as base_feature_.

Definition at line 305 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().

◆ final_feature_

OnlineFeatureInterface* final_feature_

private

final_feature_ is feature_plus_optional_cmvn_ appended (OnlineAppendFeature) with ivector_feature_, if ivector_feature_ is used; otherwise, points to the same address as feature_plus_optional_cmvn_.

Definition at line 322 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::GetFrame(), OnlineNnet2FeaturePipeline::IsLastFrame(), OnlineNnet2FeaturePipeline::NumFramesReady(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().

◆ global_cmvn_stats_

Matrix<double> global_cmvn_stats_

private

Global CMVN stats, read from info.global_cmvn_stats_rxfilename when CMVN is used.

Definition at line 300 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline().

◆ info_

const OnlineNnet2FeaturePipelineInfo& info_

private

Definition at line 291 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::GetAdaptationState(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), and OnlineNnet2FeaturePipeline::SetAdaptationState().

◆ ivector_feature_

OnlineIvectorFeature* ivector_feature_

private

iVector feature, if used (NULL otherwise).

Definition at line 312 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::GetAdaptationState(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), OnlineNnet2FeaturePipeline::SetAdaptationState(), and OnlineNnet2FeaturePipeline::~OnlineNnet2FeaturePipeline().

◆ lda_mat_

Matrix<BaseFloat> lda_mat_

private

LDA matrix, if supplied.

Definition at line 299 of file online-nnet2-feature-pipeline.h.

◆ nnet3_feature_

OnlineFeatureInterface* nnet3_feature_

private

Part of the feature pipeline that would be given as the primary (non-iVector) input to the neural network in nnet3 applications. This pointer is returned by InputFeature().

Definition at line 317 of file online-nnet2-feature-pipeline.h.

Referenced by OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline().