OnlineEndpointRule Struct Reference

This header contains a simple facility for endpointing, that should be used in conjunction with the "online2" online decoding code; see ../online2bin/ More...

#include <online-endpoint.h>

Collaboration diagram for OnlineEndpointRule:

Public Member Functions

 OnlineEndpointRule (bool must_contain_nonsilence=true, BaseFloat min_trailing_silence=1.0, BaseFloat max_relative_cost=std::numeric_limits< BaseFloat >::infinity(), BaseFloat min_utterance_length=0.0)
void Register (OptionsItf *opts)
void RegisterWithPrefix (const std::string &prefix, OptionsItf *opts)

Public Attributes

bool must_contain_nonsilence
BaseFloat min_trailing_silence
BaseFloat max_relative_cost
BaseFloat min_utterance_length

Detailed Description

This header contains a simple facility for endpointing, that should be used in conjunction with the "online2" online decoding code; see ../online2bin/

By endpointing in this context we mean "deciding when to stop decoding", and not generic speech/silence segmentation. The use-case that we have in mind is some kind of dialog system where, as more speech data comes in, we decode more and more, and we have to decide when to stop decoding.

The endpointing rule is a disjunction of conjunctions. The way we have it configured, it's an OR of five rules, and each rule has the following form:

(<contains-nonsilence> || !rule.must_contain_nonsilence) && <length-of-trailing-silence> >= rule.min_trailing_silence && <relative-cost> <= rule.max_relative_cost && <utterance-length> >= rule.min_utterance_length)

where: <contains-nonsilence> is true if the best traceback contains any nonsilence phone; <length-of-trailing-silence> is the length in seconds of silence phones at the end of the best traceback (we stop counting when we hit non-silence), <relative-cost> is a value >= 0 extracted from the decoder, that is zero if a final-state of the grammar FST had the best cost at the final frame, and infinity if no final-state was active (and >0 for in-between cases). <utterance-length> is the number of seconds of the utterance that we have decoded so far.

All of these pieces of information are obtained from the best-path traceback from the decoder, which is output by the function GetBestPath(). We do this every time we're finished processing a chunk of data. [ Note: we're changing the decoders so that GetBestPath() is efficient and does not require operations on the entire lattice. ]

For details of the default rules, see struct OnlineEndpointConfig.

It's up to the caller whether to use final-probs or not when generating the best-path, i.e. whether to call decoder.GetBestPath(&lat, (true or false)), but my recommendation is not to use them. If you do use them, then depending on the grammar, you may force the best-path to decode non-silence even though that was not what it really preferred to decode.

Definition at line 88 of file online-endpoint.h.

Constructor & Destructor Documentation

◆ OnlineEndpointRule()

OnlineEndpointRule ( bool  must_contain_nonsilence = true,
BaseFloat  min_trailing_silence = 1.0,
BaseFloat  max_relative_cost = std::numeric_limits<BaseFloat>::infinity(),
BaseFloat  min_utterance_length = 0.0 

Member Function Documentation

◆ Register()

void Register ( OptionsItf opts)

Definition at line 103 of file online-endpoint.h.

References OptionsItf::Register().

Referenced by OnlineEndpointRule::RegisterWithPrefix().

103  {
104  opts->Register("must-contain-nonsilence", &must_contain_nonsilence,
105  "If true, for this endpointing rule to apply there must"
106  "be nonsilence in the best-path traceback.");
107  opts->Register("min-trailing-silence", &min_trailing_silence,
108  "This endpointing rule requires duration of trailing silence"
109  "(in seconds) to be >= this value.");
110  opts->Register("max-relative-cost", &max_relative_cost,
111  "This endpointing rule requires relative-cost of final-states"
112  " to be <= this value (describes how good the probability "
113  "of final-states is).");
114  opts->Register("min-utterance-length", &min_utterance_length,
115  "This endpointing rule requires utterance-length (in seconds) "
116  "to be >= this value.");
117  };

◆ RegisterWithPrefix()

void RegisterWithPrefix ( const std::string &  prefix,
OptionsItf opts 

Definition at line 121 of file online-endpoint.h.

References OnlineEndpointRule::Register().

Referenced by OnlineEndpointConfig::Register().

121  {
122  ParseOptions po_prefix(prefix, opts);
123  this->Register(&po_prefix);
124  }
void Register(OptionsItf *opts)

Member Data Documentation

◆ max_relative_cost

BaseFloat max_relative_cost

Definition at line 91 of file online-endpoint.h.

Referenced by kaldi::RuleActivated().

◆ min_trailing_silence

BaseFloat min_trailing_silence

Definition at line 90 of file online-endpoint.h.

Referenced by kaldi::RuleActivated().

◆ min_utterance_length

BaseFloat min_utterance_length

Definition at line 92 of file online-endpoint.h.

Referenced by kaldi::RuleActivated().

◆ must_contain_nonsilence

bool must_contain_nonsilence

Definition at line 89 of file online-endpoint.h.

Referenced by kaldi::RuleActivated().

The documentation for this struct was generated from the following file: