Classes
struct	OnlineEndpointRule
	This header contains a simple facility for endpointing that should be used in conjunction with the "online2" online decoding code; see ../online2bin/online2-wav-gmm-latgen-faster-endpoint.cc.

struct	OnlineEndpointConfig

struct	OnlineGmmDecodingAdaptationPolicyConfig
	This configuration class controls when to re-estimate the basis-fMLLR during online decoding.

struct	OnlineGmmDecodingConfig

class	OnlineGmmDecodingModels
	This class is used to read, store and give access to the models used for the 3 phases of decoding (first pass with online-CMN features; the ML models used for estimating transforms; and the discriminatively trained models).

struct	OnlineGmmAdaptationState

class	SingleUtteranceGmmDecoder
	You will instantiate this class when you want to decode a single utterance using the online-decoding setup.

class	ThreadSynchronizer
	class ThreadSynchronizer guards an arbitrary type of buffer shared between a producing and a consuming thread (note: the interface is symmetric between the two thread types).

struct	OnlineNnet2DecodingThreadedConfig

class	SingleUtteranceNnet2DecoderThreaded
	You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.

struct	OnlineNnet2DecodingConfig

class	SingleUtteranceNnet2Decoder
	You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.

class	SingleUtteranceNnet3DecoderTpl< FST >
	You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.

class	SingleUtteranceNnet3IncrementalDecoderTpl< FST >
	You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.

class	OnlineTimingStats
	class OnlineTimingStats stores timing statistics from online decoding, which enable the Print() function to print out the average real-time factor and average delay per utterance.

class	OnlineTimer
	class OnlineTimer is used to test real-time decoding algorithms and evaluate how long the decoding of a particular utterance would take.

Typedefs
typedef SingleUtteranceNnet3DecoderTpl< fst::Fst< fst::StdArc > >	SingleUtteranceNnet3Decoder

typedef SingleUtteranceNnet3IncrementalDecoderTpl< fst::Fst< fst::StdArc > >	SingleUtteranceNnet3IncrementalDecoder

Functions
bool	EndpointDetected (const OnlineEndpointConfig &config, int32 num_frames_decoded, int32 trailing_silence_frames, BaseFloat frame_shift_in_seconds, BaseFloat final_relative_cost)
	This function returns true if this set of endpointing rules thinks we should terminate decoding.

template<typename FST , typename DEC >
int32	TrailingSilenceLength (const TransitionModel &tmodel, const std::string &silence_phones, const DEC &decoder)
	Returns the number of frames of trailing silence in the best-path traceback (not using final-probs).

template<typename FST >
bool	EndpointDetected (const OnlineEndpointConfig &config, const TransitionModel &tmodel, BaseFloat frame_shift_in_seconds, const LatticeFasterOnlineDecoderTpl< FST > &decoder)
	This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.

template<typename FST >
bool	EndpointDetected (const OnlineEndpointConfig &config, const TransitionModel &tmodel, BaseFloat frame_shift_in_seconds, const LatticeIncrementalOnlineDecoderTpl< FST > &decoder)
	This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.

Detailed Description

Typedef Documentation

◆ SingleUtteranceNnet3Decoder

typedef SingleUtteranceNnet3DecoderTpl<fst::Fst<fst::StdArc> > SingleUtteranceNnet3Decoder

Definition at line 121 of file online-nnet3-decoding.h.

◆ SingleUtteranceNnet3IncrementalDecoder

typedef SingleUtteranceNnet3IncrementalDecoderTpl<fst::Fst<fst::StdArc> > SingleUtteranceNnet3IncrementalDecoder

Definition at line 140 of file online-nnet3-incremental-decoding.h.

Function Documentation

◆ EndpointDetected() [1/3]

bool EndpointDetected	(	const OnlineEndpointConfig &	config,
		int32	num_frames_decoded,
		int32	trailing_silence_frames,
		BaseFloat	frame_shift_in_seconds,
		BaseFloat	final_relative_cost
	)

This function returns true if this set of endpointing rules thinks we should terminate decoding.

Note: in verbose mode it will print logging information when returning true.

Definition at line 46 of file online-endpoint.cc.

References KALDI_ASSERT, OnlineEndpointConfig::rule1, OnlineEndpointConfig::rule2, OnlineEndpointConfig::rule3, OnlineEndpointConfig::rule4, OnlineEndpointConfig::rule5, and kaldi::RuleActivated().

Referenced by SingleUtteranceNnet3DecoderTpl< FST >::EndpointDetected(), SingleUtteranceNnet2Decoder::EndpointDetected(), kaldi::EndpointDetected(), SingleUtteranceNnet3IncrementalDecoderTpl< FST >::EndpointDetected(), SingleUtteranceGmmDecoder::EndpointDetected(), SingleUtteranceNnet2DecoderThreaded::EndpointDetected(), SingleUtteranceGmmDecoder::FinalRelativeCost(), and OnlineEndpointConfig::Register().

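 bool EndpointDetected(const OnlineEndpointConfig &config,
                       int32 num_frames_decoded,
                       int32 trailing_silence_frames,
                       BaseFloat frame_shift_in_seconds,
                       BaseFloat final_relative_cost) {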
   KALDI_ASSERT(num_frames_decoded >= trailing_silence_frames);
 
   BaseFloat utterance_length = num_frames_decoded * frame_shift_in_seconds,
       trailing_silence = trailing_silence_frames * frame_shift_in_seconds;
 
   if (RuleActivated(config.rule1, "rule1",
                     trailing_silence, final_relative_cost, utterance_length))
     return true;
   if (RuleActivated(config.rule2, "rule2",
                     trailing_silence, final_relative_cost, utterance_length))
     return true;
   if (RuleActivated(config.rule3, "rule3",
                     trailing_silence, final_relative_cost, utterance_length))
     return true;
   if (RuleActivated(config.rule4, "rule4",
                     trailing_silence, final_relative_cost, utterance_length))
     return true;
   if (RuleActivated(config.rule5, "rule5",
                     trailing_silence, final_relative_cost, utterance_length))
     return true;
   return false;
 }
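
RuleActivated() is referenced above, but its body is not shown on this page. The following is a minimal standalone sketch of how such a rule check could work, assuming that a rule is simply a conjunction of thresholds on trailing silence, relative cost and utterance length. The struct RuleSketch, its field names, and both function names are illustrative assumptions rather than the library's definitions, and a vector of rules stands in for the five named members rule1 to rule5.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one endpointing rule. The field names are
// assumptions made for this sketch, not definitions copied from this page.
struct RuleSketch {
  bool must_contain_nonsilence;  // require some non-silence before firing
  float min_trailing_silence;    // seconds
  float max_relative_cost;       // upper bound on the final relative cost
  float min_utterance_length;    // seconds
};

// Returns true only if every threshold of the rule is satisfied.
bool RuleActivatedSketch(const RuleSketch &rule, float trailing_silence,
                         float final_relative_cost, float utterance_length) {
  bool contains_nonsilence = utterance_length > trailing_silence;
  return (contains_nonsilence || !rule.must_contain_nonsilence) &&
         trailing_silence >= rule.min_trailing_silence &&
         final_relative_cost <= rule.max_relative_cost &&
         utterance_length >= rule.min_utterance_length;
}

// Same shape as the function above: convert frame counts to seconds,
// then return true as soon as any rule fires.
bool EndpointDetectedSketch(const std::vector<RuleSketch> &rules,
                            int num_frames_decoded,
                            int trailing_silence_frames,
                            float frame_shift_in_seconds,
                            float final_relative_cost) {
  assert(num_frames_decoded >= trailing_silence_frames);
  float utterance_length = num_frames_decoded * frame_shift_in_seconds;
  float trailing_silence = trailing_silence_frames * frame_shift_in_seconds;
  for (const RuleSketch &rule : rules) {
    if (RuleActivatedSketch(rule, trailing_silence, final_relative_cost,
                            utterance_length))
      return true;
  }
  return false;
}
```

With a frame shift of 0.01 seconds, 300 decoded frames of which the last 100 are silence correspond to a 3 second utterance with 1 second of trailing silence, so a rule requiring at least 0.5 seconds of trailing silence would fire under these assumptions.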

◆ EndpointDetected() [2/3]

bool EndpointDetected	(	const OnlineEndpointConfig &	config,
		const TransitionModel &	tmodel,
		BaseFloat	frame_shift_in_seconds,
		const LatticeFasterOnlineDecoderTpl< FST > &	decoder
	)

This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.

Definition at line 110 of file online-endpoint.cc.

References kaldi::EndpointDetected(), LatticeFasterDecoderTpl< FST, decoder::BackpointerToken >::FinalRelativeCost(), LatticeFasterDecoderTpl< FST, decoder::BackpointerToken >::NumFramesDecoded(), and OnlineEndpointConfig::silence_phones.

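 template <typename FST>
 bool EndpointDetected(const OnlineEndpointConfig &config,
                       const TransitionModel &tmodel,
                       BaseFloat frame_shift_in_seconds,
                       const LatticeFasterOnlineDecoderTpl<FST> &decoder) {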
   if (decoder.NumFramesDecoded() == 0) return false;
 
   BaseFloat final_relative_cost = decoder.FinalRelativeCost();
 
   int32 num_frames_decoded = decoder.NumFramesDecoded(),
       trailing_silence_frames = TrailingSilenceLength<FST, LatticeFasterOnlineDecoderTpl<FST>>(tmodel,
                                                       config.silence_phones,
                                                       decoder);
 
   return EndpointDetected(config, num_frames_decoded, trailing_silence_frames,
                           frame_shift_in_seconds, final_relative_cost);
 }

◆ EndpointDetected() [3/3]

bool EndpointDetected	(	const OnlineEndpointConfig &	config,
		const TransitionModel &	tmodel,
		BaseFloat	frame_shift_in_seconds,
		const LatticeIncrementalOnlineDecoderTpl< FST > &	decoder
	)

This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.

Definition at line 129 of file online-endpoint.cc.

References kaldi::EndpointDetected(), LatticeIncrementalDecoderTpl< FST, decoder::BackpointerToken >::FinalRelativeCost(), LatticeIncrementalDecoderTpl< FST, decoder::BackpointerToken >::NumFramesDecoded(), and OnlineEndpointConfig::silence_phones.

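 template <typename FST>
 bool EndpointDetected(const OnlineEndpointConfig &config,
                       const TransitionModel &tmodel,
                       BaseFloat frame_shift_in_seconds,
                       const LatticeIncrementalOnlineDecoderTpl<FST> &decoder) {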
   if (decoder.NumFramesDecoded() == 0) return false;
 
   BaseFloat final_relative_cost = decoder.FinalRelativeCost();
 
   int32 num_frames_decoded = decoder.NumFramesDecoded(),
       trailing_silence_frames = TrailingSilenceLength<FST, LatticeIncrementalOnlineDecoderTpl<FST>>(tmodel,
                                                       config.silence_phones,
                                                       decoder);
 
   return EndpointDetected(config, num_frames_decoded, trailing_silence_frames,
                           frame_shift_in_seconds, final_relative_cost);
 }
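
The decoder-based overloads above are meant to be polled while audio is still arriving. The following standalone sketch illustrates that polling pattern with a mock decoder, since the real decoder classes cannot be built in isolation. MockDecoder, its methods, EndpointDetectedSketch and DecodeUntilEndpoint are all hypothetical names invented for this illustration, and the single silence-duration rule is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for an online decoder. It only tracks how many
// frames have been consumed and whether each frame is silence.
class MockDecoder {
 public:
  explicit MockDecoder(std::vector<bool> frame_is_silence)
      : frame_is_silence_(std::move(frame_is_silence)), num_decoded_(0) {}

  // Pretends to decode up to max_frames additional frames.
  void AdvanceDecoding(int max_frames) {
    assert(max_frames > 0);
    int total = static_cast<int>(frame_is_silence_.size());
    num_decoded_ = std::min(total, num_decoded_ + max_frames);
  }

  int NumFramesDecoded() const { return num_decoded_; }

  bool InputExhausted() const {
    return num_decoded_ == static_cast<int>(frame_is_silence_.size());
  }

  // Counts silence frames at the end of what has been decoded so far.
  int TrailingSilenceFrames() const {
    int n = 0;
    for (int t = num_decoded_ - 1; t >= 0 && frame_is_silence_[t]; --t) ++n;
    return n;
  }

 private:
  std::vector<bool> frame_is_silence_;
  int num_decoded_;
};

// Simplified endpoint check with one assumed rule: some speech was seen and
// it is followed by at least min_trailing_silence seconds of silence.
bool EndpointDetectedSketch(const MockDecoder &decoder,
                            float frame_shift_in_seconds,
                            float min_trailing_silence) {
  if (decoder.NumFramesDecoded() == 0) return false;  // nothing decoded yet
  int trailing = decoder.TrailingSilenceFrames();
  bool contains_speech = decoder.NumFramesDecoded() > trailing;
  return contains_speech &&
         trailing * frame_shift_in_seconds >= min_trailing_silence;
}

// Decodes chunk by chunk and stops early once an endpoint is detected.
// Returns the number of frames decoded when the loop ended.
int DecodeUntilEndpoint(MockDecoder *decoder, int chunk_frames,
                        float frame_shift_in_seconds,
                        float min_trailing_silence) {
  while (!decoder->InputExhausted()) {
    decoder->AdvanceDecoding(chunk_frames);
    if (EndpointDetectedSketch(*decoder, frame_shift_in_seconds,
                               min_trailing_silence))
      break;
  }
  return decoder->NumFramesDecoded();
}
```

Checking after every chunk, rather than after every frame, keeps the cost of the check low; the trade-off in this sketch is that the endpoint is reported with a delay of up to one chunk.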

◆ TrailingSilenceLength()

int32 TrailingSilenceLength	(	const TransitionModel &	tmodel,
		const std::string &	silence_phones,
		const DEC &	decoder
	)

Returns the number of frames of trailing silence in the best-path traceback (not using final-probs).

"silence_phones" is a colon-separated list of integer IDs of the phones that we consider silence. We use the BestPathEnd() and TraceBackBestPath() functions of the decoder (for example LatticeFasterOnlineDecoder) to do this efficiently.

Definition at line 75 of file online-endpoint.cc.

References kaldi::IsSortedAndUniq(), KALDI_ASSERT, KALDI_ERR, kaldi::SplitStringToIntegers(), and TransitionModel::TransitionIdToPhone().

Referenced by OnlineEndpointConfig::Register().

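 template <typename FST, typename DEC>
 int32 TrailingSilenceLength(const TransitionModel &tmodel,
                             const std::string &silence_phones_str,
                             const DEC &decoder) {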
   std::vector<int32> silence_phones;
   if (!SplitStringToIntegers(silence_phones_str, ":", false, &silence_phones))
     KALDI_ERR << "Bad --silence-phones option in endpointing config: "
               << silence_phones_str;
   std::sort(silence_phones.begin(), silence_phones.end());
   KALDI_ASSERT(IsSortedAndUniq(silence_phones) &&
                "Duplicates in --silence-phones option in endpointing config");
   KALDI_ASSERT(!silence_phones.empty() &&
                "Endpointing requires nonempty --endpoint.silence-phones option");
   ConstIntegerSet<int32> silence_set(silence_phones);
 
   bool use_final_probs = false;
   typename DEC::BestPathIterator iter =
       decoder.BestPathEnd(use_final_probs, NULL);
   int32 num_silence_frames = 0;
   while (!iter.Done()) {  // we're going backwards in time from the most
                           // recently decoded frame...
     LatticeArc arc;
     iter = decoder.TraceBackBestPath(iter, &arc);
     if (arc.ilabel != 0) {
       int32 phone = tmodel.TransitionIdToPhone(arc.ilabel);
       if (silence_set.count(phone) != 0) {
         num_silence_frames++;
       } else {
         break; // stop counting as soon as we hit non-silence.
       }
     }
   }
   return num_silence_frames;
 }
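
The counting logic above can be tried in isolation with the following standalone sketch. It assumes the best path has already been reduced to one phone ID per frame, so it skips the traceback, the epsilon-arc check and the transition-ID-to-phone lookup. ParseSilencePhones and TrailingSilenceSketch are hypothetical names for this illustration; the real code uses SplitStringToIntegers() and the decoder's traceback iterator.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Parses a colon-separated list such as "1:2:3" into a set of phone IDs.
// Hypothetical helper; std::stoi throws if a field is not an integer.
std::set<int> ParseSilencePhones(const std::string &silence_phones_str) {
  std::set<int> phones;
  std::stringstream ss(silence_phones_str);
  std::string field;
  while (std::getline(ss, field, ':')) {
    if (!field.empty()) phones.insert(std::stoi(field));
  }
  assert(!phones.empty() && "silence phone list must not be empty");
  return phones;
}

// Walks backwards from the most recent frame and counts silence frames,
// stopping at the first frame whose phone is not in the silence set.
int TrailingSilenceSketch(const std::vector<int> &phone_per_frame,
                          const std::string &silence_phones_str) {
  const std::set<int> silence = ParseSilencePhones(silence_phones_str);
  int num_silence_frames = 0;
  for (auto it = phone_per_frame.rbegin(); it != phone_per_frame.rend();
       ++it) {
    if (silence.count(*it) == 0) break;  // hit non-silence, stop counting
    ++num_silence_frames;
  }
  return num_silence_frames;
}
```

Because the walk stops at the first non-silence frame, the cost is proportional to the length of the trailing silence rather than to the length of the whole utterance. Note that this sketch silently ignores duplicate IDs, whereas the function above asserts that there are none.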
