OnlineDecoding

Classes

struct  OnlineEndpointRule
 This header contains a simple facility for endpointing, which should be used in conjunction with the "online2" online decoding code; see ../online2bin/online2-wav-gmm-latgen-faster-endpoint.cc.
 
struct  OnlineEndpointConfig
 
struct  OnlineGmmDecodingAdaptationPolicyConfig
 This configuration class controls when to re-estimate the basis-fMLLR during online decoding.
 
struct  OnlineGmmDecodingConfig
 
class  OnlineGmmDecodingModels
 This class is used to read, store and give access to the models used for 3 phases of decoding (first-pass with online-CMN features; the ML models used for estimating transforms; and the discriminatively trained models).
 
struct  OnlineGmmAdaptationState
 
class  SingleUtteranceGmmDecoder
 You will instantiate this class when you want to decode a single utterance using the online-decoding setup.
 
class  ThreadSynchronizer
 class ThreadSynchronizer acts to guard an arbitrary type of buffer between a producing and a consuming thread (note: it's all symmetric between the two thread types).
 
struct  OnlineNnet2DecodingThreadedConfig
 
class  SingleUtteranceNnet2DecoderThreaded
 You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.
 
struct  OnlineNnet2DecodingConfig
 
class  SingleUtteranceNnet2Decoder
 You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.
 
class  SingleUtteranceNnet3DecoderTpl< FST >
 You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.
 
class  SingleUtteranceNnet3IncrementalDecoderTpl< FST >
 You will instantiate this class when you want to decode a single utterance using the online-decoding setup for neural nets.
 
class  OnlineTimingStats
 class OnlineTimingStats stores statistics from timing of online decoding, which will enable the Print() function to print out the average real-time factor and average delay per utterance.
 
class  OnlineTimer
 class OnlineTimer is used to test real-time decoding algorithms and evaluate how long the decoding of a particular utterance would take.
 

Typedefs

typedef SingleUtteranceNnet3DecoderTpl< fst::Fst< fst::StdArc > > SingleUtteranceNnet3Decoder
 
typedef SingleUtteranceNnet3IncrementalDecoderTpl< fst::Fst< fst::StdArc > > SingleUtteranceNnet3IncrementalDecoder
 

Functions

bool EndpointDetected (const OnlineEndpointConfig &config, int32 num_frames_decoded, int32 trailing_silence_frames, BaseFloat frame_shift_in_seconds, BaseFloat final_relative_cost)
 This function returns true if this set of endpointing rules thinks we should terminate decoding.
 
template<typename FST , typename DEC >
int32 TrailingSilenceLength (const TransitionModel &tmodel, const std::string &silence_phones, const DEC &decoder)
 Returns the number of frames of trailing silence in the best-path traceback (not using final-probs).
 
template<typename FST >
bool EndpointDetected (const OnlineEndpointConfig &config, const TransitionModel &tmodel, BaseFloat frame_shift_in_seconds, const LatticeFasterOnlineDecoderTpl< FST > &decoder)
 This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.
 
template<typename FST >
bool EndpointDetected (const OnlineEndpointConfig &config, const TransitionModel &tmodel, BaseFloat frame_shift_in_seconds, const LatticeIncrementalOnlineDecoderTpl< FST > &decoder)
 This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.
 

Detailed Description

Typedef Documentation

◆ SingleUtteranceNnet3Decoder

◆ SingleUtteranceNnet3IncrementalDecoder

Function Documentation

◆ EndpointDetected() [1/3]

bool EndpointDetected ( const OnlineEndpointConfig &  config,
int32  num_frames_decoded,
int32  trailing_silence_frames,
BaseFloat  frame_shift_in_seconds,
BaseFloat  final_relative_cost 
)

This function returns true if this set of endpointing rules thinks we should terminate decoding.

Note: in verbose mode it will print logging information when returning true.

Definition at line 46 of file online-endpoint.cc.

References KALDI_ASSERT, OnlineEndpointConfig::rule1, OnlineEndpointConfig::rule2, OnlineEndpointConfig::rule3, OnlineEndpointConfig::rule4, OnlineEndpointConfig::rule5, and kaldi::RuleActivated().

Referenced by SingleUtteranceNnet3DecoderTpl< FST >::EndpointDetected(), SingleUtteranceNnet2Decoder::EndpointDetected(), kaldi::EndpointDetected(), SingleUtteranceNnet3IncrementalDecoderTpl< FST >::EndpointDetected(), SingleUtteranceGmmDecoder::EndpointDetected(), SingleUtteranceNnet2DecoderThreaded::EndpointDetected(), SingleUtteranceGmmDecoder::FinalRelativeCost(), and OnlineEndpointConfig::Register().

{
  KALDI_ASSERT(num_frames_decoded >= trailing_silence_frames);

  BaseFloat utterance_length = num_frames_decoded * frame_shift_in_seconds,
      trailing_silence = trailing_silence_frames * frame_shift_in_seconds;

  if (RuleActivated(config.rule1, "rule1",
                    trailing_silence, final_relative_cost, utterance_length))
    return true;
  if (RuleActivated(config.rule2, "rule2",
                    trailing_silence, final_relative_cost, utterance_length))
    return true;
  if (RuleActivated(config.rule3, "rule3",
                    trailing_silence, final_relative_cost, utterance_length))
    return true;
  if (RuleActivated(config.rule4, "rule4",
                    trailing_silence, final_relative_cost, utterance_length))
    return true;
  if (RuleActivated(config.rule5, "rule5",
                    trailing_silence, final_relative_cost, utterance_length))
    return true;
  return false;
}
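
The listing above converts frame counts to seconds and returns true as soon as any one rule fires. The following is a minimal standalone sketch of that control flow, not the Kaldi implementation: the `Rule` struct, its field names, and the functions `RuleFires()` and `AnyRuleFires()` are assumptions made for illustration, and the conditions inside `RuleFires()` are a guess at what a rule might check. Consult online-endpoint.h for the actual OnlineEndpointRule members.

```cpp
#include <cassert>

// Hypothetical stand-in for one endpointing rule. The field names are
// assumptions for illustration only, not taken from this page.
struct Rule {
  bool must_contain_nonsilence;  // require some non-silence before firing
  float min_trailing_silence;    // seconds
  float max_relative_cost;       // upper bound on the final relative cost
  float min_utterance_length;    // seconds
};

// Assumed semantics: a rule fires only if all of its conditions hold.
bool RuleFires(const Rule &rule, float trailing_silence,
               float relative_cost, float utterance_length) {
  bool contains_nonsilence = utterance_length > trailing_silence;
  return (contains_nonsilence || !rule.must_contain_nonsilence) &&
         trailing_silence >= rule.min_trailing_silence &&
         relative_cost <= rule.max_relative_cost &&
         utterance_length >= rule.min_utterance_length;
}

// Same shape as the listing: convert frames to seconds, then return true
// as soon as any rule fires.
bool AnyRuleFires(const Rule *rules, int num_rules, int num_frames_decoded,
                  int trailing_silence_frames, float frame_shift_in_seconds,
                  float final_relative_cost) {
  assert(num_frames_decoded >= trailing_silence_frames);
  float utterance_length = num_frames_decoded * frame_shift_in_seconds;
  float trailing_silence = trailing_silence_frames * frame_shift_in_seconds;
  for (int i = 0; i < num_rules; ++i) {
    if (RuleFires(rules[i], trailing_silence, final_relative_cost,
                  utterance_length))
      return true;
  }
  return false;
}
```

With two illustrative rules, one requiring 5 seconds of silence regardless of content and one requiring 0.5 seconds of silence after some speech with a relative cost of at most 2.0, an utterance of 200 frames at a 0.01 s frame shift with 100 trailing silence frames would trigger the second rule.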

◆ EndpointDetected() [2/3]

bool EndpointDetected ( const OnlineEndpointConfig &  config,
const TransitionModel &  tmodel,
BaseFloat  frame_shift_in_seconds,
const LatticeFasterOnlineDecoderTpl< FST > &  decoder 
)

This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.

Definition at line 110 of file online-endpoint.cc.

References kaldi::EndpointDetected(), LatticeFasterDecoderTpl< FST, decoder::BackpointerToken >::FinalRelativeCost(), LatticeFasterDecoderTpl< FST, decoder::BackpointerToken >::NumFramesDecoded(), and OnlineEndpointConfig::silence_phones.

{
  if (decoder.NumFramesDecoded() == 0) return false;

  BaseFloat final_relative_cost = decoder.FinalRelativeCost();

  int32 num_frames_decoded = decoder.NumFramesDecoded(),
      trailing_silence_frames =
          TrailingSilenceLength<FST, LatticeFasterOnlineDecoderTpl<FST>>(
              tmodel, config.silence_phones, decoder);

  return EndpointDetected(config, num_frames_decoded, trailing_silence_frames,
                          frame_shift_in_seconds, final_relative_cost);
}
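
The wrapper above follows a simple pattern: bail out if nothing has been decoded, read the remaining arguments from the decoder, and forward them to the low-level check. The sketch below illustrates that pattern with stand-in types so it can be compiled without Kaldi. `FakeDecoder`, `SimpleConfig`, `LowLevelEndpoint()`, `EndpointFromDecoder()` and the `TrailingSilenceFrames()` accessor are all hypothetical; the real code obtains the silence count from TrailingSilenceLength().

```cpp
#include <cassert>

// Hypothetical single-rule configuration (illustration only).
struct SimpleConfig {
  float min_trailing_silence;  // seconds
  float max_relative_cost;
};

// Hypothetical decoder stand-in. NumFramesDecoded() and FinalRelativeCost()
// mirror the accessors used in the listing; TrailingSilenceFrames() replaces
// the call to TrailingSilenceLength() with a stored value.
struct FakeDecoder {
  int num_frames_decoded;
  float final_relative_cost;
  int trailing_silence_frames;
  int NumFramesDecoded() const { return num_frames_decoded; }
  float FinalRelativeCost() const { return final_relative_cost; }
  int TrailingSilenceFrames() const { return trailing_silence_frames; }
};

// Low-level check working on plain numbers.
bool LowLevelEndpoint(const SimpleConfig &config, int num_frames_decoded,
                      int trailing_silence_frames,
                      float frame_shift_in_seconds,
                      float final_relative_cost) {
  assert(num_frames_decoded >= trailing_silence_frames);
  float trailing_silence = trailing_silence_frames * frame_shift_in_seconds;
  return trailing_silence >= config.min_trailing_silence &&
         final_relative_cost <= config.max_relative_cost;
}

// Convenience wrapper: works out the arguments from the decoder.
template <typename Decoder>
bool EndpointFromDecoder(const SimpleConfig &config,
                         float frame_shift_in_seconds,
                         const Decoder &decoder) {
  if (decoder.NumFramesDecoded() == 0) return false;  // nothing decoded yet
  return LowLevelEndpoint(config, decoder.NumFramesDecoded(),
                          decoder.TrailingSilenceFrames(),
                          frame_shift_in_seconds,
                          decoder.FinalRelativeCost());
}
```

In this sketch the early return matters: with a permissive configuration such as `{0.0f, 10.0f}`, the low-level check alone would report an endpoint before any audio had been decoded.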

◆ EndpointDetected() [3/3]

bool EndpointDetected ( const OnlineEndpointConfig &  config,
const TransitionModel &  tmodel,
BaseFloat  frame_shift_in_seconds,
const LatticeIncrementalOnlineDecoderTpl< FST > &  decoder 
)

This is a higher-level convenience function that works out the arguments to the EndpointDetected function above, from the decoder.

Definition at line 129 of file online-endpoint.cc.

References kaldi::EndpointDetected(), LatticeIncrementalDecoderTpl< FST, decoder::BackpointerToken >::FinalRelativeCost(), LatticeIncrementalDecoderTpl< FST, decoder::BackpointerToken >::NumFramesDecoded(), and OnlineEndpointConfig::silence_phones.

{
  if (decoder.NumFramesDecoded() == 0) return false;

  BaseFloat final_relative_cost = decoder.FinalRelativeCost();

  int32 num_frames_decoded = decoder.NumFramesDecoded(),
      trailing_silence_frames =
          TrailingSilenceLength<FST, LatticeIncrementalOnlineDecoderTpl<FST>>(
              tmodel, config.silence_phones, decoder);

  return EndpointDetected(config, num_frames_decoded, trailing_silence_frames,
                          frame_shift_in_seconds, final_relative_cost);
}

◆ TrailingSilenceLength()

int32 TrailingSilenceLength ( const TransitionModel &  tmodel,
const std::string &  silence_phones,
const DEC &  decoder 
)

Returns the number of frames of trailing silence in the best-path traceback (not using final-probs).

"silence_phones" is a colon-separated list of integer IDs of phones that we consider silence. We use the BestPathEnd() and TraceBackBestPath() functions of the decoder (e.g. LatticeFasterOnlineDecoder) to do this efficiently.

Definition at line 75 of file online-endpoint.cc.

References kaldi::IsSortedAndUniq(), KALDI_ASSERT, KALDI_ERR, kaldi::SplitStringToIntegers(), and TransitionModel::TransitionIdToPhone().

Referenced by OnlineEndpointConfig::Register().

{
  std::vector<int32> silence_phones;
  if (!SplitStringToIntegers(silence_phones_str, ":", false, &silence_phones))
    KALDI_ERR << "Bad --silence-phones option in endpointing config: "
              << silence_phones_str;
  std::sort(silence_phones.begin(), silence_phones.end());
  KALDI_ASSERT(IsSortedAndUniq(silence_phones) &&
               "Duplicates in --silence-phones option in endpointing config");
  KALDI_ASSERT(!silence_phones.empty() &&
               "Endpointing requires nonempty --endpoint.silence-phones option");
  ConstIntegerSet<int32> silence_set(silence_phones);

  bool use_final_probs = false;
  typename DEC::BestPathIterator iter =
      decoder.BestPathEnd(use_final_probs, NULL);
  int32 num_silence_frames = 0;
  while (!iter.Done()) {  // we're going backwards in time from the most
                          // recently decoded frame...
    LatticeArc arc;
    iter = decoder.TraceBackBestPath(iter, &arc);
    if (arc.ilabel != 0) {
      int32 phone = tmodel.TransitionIdToPhone(arc.ilabel);
      if (silence_set.count(phone) != 0) {
        num_silence_frames++;
      } else {
        break;  // stop counting as soon as we hit non-silence.
      }
    }
  }
  return num_silence_frames;
}
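
The counting logic above can be tried out without a decoder. The sketch below is a simplified stand-in under stated assumptions: the best path is given as a plain vector of phone IDs (oldest first) rather than traced back arc by arc, and an entry of 0 plays the role of an arc whose ilabel is 0. `ParseSilencePhones()` and `CountTrailingSilence()` are hypothetical helper names, and standard-library exceptions replace KALDI_ERR and KALDI_ASSERT.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses a colon-separated list such as "1:2:3" into a set of phone IDs.
// Throws std::invalid_argument on malformed fields, duplicates, or an
// empty list.
std::set<int> ParseSilencePhones(const std::string &str) {
  std::set<int> phones;
  std::istringstream stream(str);
  std::string field;
  while (std::getline(stream, field, ':')) {
    std::size_t pos = 0;
    int phone = std::stoi(field, &pos);  // throws on non-numeric input
    if (pos != field.size())
      throw std::invalid_argument("bad phone ID: " + field);
    if (!phones.insert(phone).second)
      throw std::invalid_argument("duplicate phone ID: " + field);
  }
  if (phones.empty())
    throw std::invalid_argument("empty silence-phone list");
  return phones;
}

// Walks the path backwards from the most recent entry, skipping entries
// equal to 0, and stops at the first non-silence phone.
int CountTrailingSilence(const std::vector<int> &best_path_phones,
                         const std::set<int> &silence) {
  assert(!silence.empty());
  int num_silence_frames = 0;
  for (auto it = best_path_phones.rbegin(); it != best_path_phones.rend();
       ++it) {
    if (*it == 0) continue;              // no input label: not a frame
    if (silence.count(*it) == 0) break;  // hit non-silence: stop counting
    ++num_silence_frames;
  }
  return num_silence_frames;
}
```

Only silence at the very end of the path is counted; silence earlier in the utterance, before the last non-silence phone, does not contribute.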