This header contains a simple facility for endpointing, that should be used in conjunction with the "online2" online decoding code; see ../online2bin/online2-wav-gmm-latgen-faster-endpoint.cc. More...

#include <online-endpoint.h>

Collaboration diagram for OnlineEndpointRule:

[legend]

Public Member Functions
	OnlineEndpointRule (bool must_contain_nonsilence=true, BaseFloat min_trailing_silence=1.0, BaseFloat max_relative_cost=std::numeric_limits< BaseFloat >::infinity(), BaseFloat min_utterance_length=0.0)

void	Register (OptionsItf *opts)

void	RegisterWithPrefix (const std::string &prefix, OptionsItf *opts)

Public Attributes
bool	must_contain_nonsilence

BaseFloat	min_trailing_silence

BaseFloat	max_relative_cost

BaseFloat	min_utterance_length

Detailed Description

This header contains a simple facility for endpointing, that should be used in conjunction with the "online2" online decoding code; see ../online2bin/online2-wav-gmm-latgen-faster-endpoint.cc.

By endpointing in this context we mean "deciding when to stop decoding", and not generic speech/silence segmentation. The use-case that we have in mind is some kind of dialog system where, as more speech data comes in, we decode more and more, and we have to decide when to stop decoding.

The endpointing rule is a disjunction of conjunctions. The way we have it configured, it's an OR of five rules, and each rule has the following form:

(<contains-nonsilence> || !rule.must_contain_nonsilence) && <length-of-trailing-silence> >= rule.min_trailing_silence && <relative-cost> <= rule.max_relative_cost && <utterance-length> >= rule.min_utterance_length)

where: <contains-nonsilence> is true if the best traceback contains any nonsilence phone; <length-of-trailing-silence> is the length in seconds of silence phones at the end of the best traceback (we stop counting when we hit non-silence), <relative-cost> is a value >= 0 extracted from the decoder, that is zero if a final-state of the grammar FST had the best cost at the final frame, and infinity if no final-state was active (and >0 for in-between cases). <utterance-length> is the number of seconds of the utterance that we have decoded so far.

All of these pieces of information are obtained from the best-path traceback from the decoder, which is output by the function GetBestPath(). We do this every time we're finished processing a chunk of data. [ Note: we're changing the decoders so that GetBestPath() is efficient and does not require operations on the entire lattice. ]

For details of the default rules, see struct OnlineEndpointConfig.

It's up to the caller whether to use final-probs or not when generating the best-path, i.e. whether to call decoder.GetBestPath(&lat, (true or false)), but my recommendation is not to use them. If you do use them, then depending on the grammar, you may force the best-path to decode non-silence even though that was not what it really preferred to decode.

Definition at line 88 of file online-endpoint.h.

Constructor & Destructor Documentation

◆ OnlineEndpointRule()

OnlineEndpointRule	(	bool	must_contain_nonsilence = `true`,
		BaseFloat	min_trailing_silence = `1.0`,
		BaseFloat	max_relative_cost = `std::numeric_limits<BaseFloat>::infinity()`,
		BaseFloat	min_utterance_length = `0.0`
	)

inline

Definition at line 94 of file online-endpoint.h.

                                                           :
       must_contain_nonsilence(must_contain_nonsilence),
       min_trailing_silence(min_trailing_silence),
       max_relative_cost(max_relative_cost),
       min_utterance_length(min_utterance_length) { }

Member Function Documentation

◆ Register()

void Register ( OptionsItf * opts )

inline

Definition at line 103 of file online-endpoint.h.

References OptionsItf::Register().

Referenced by OnlineEndpointRule::RegisterWithPrefix().

                                   {
     opts->Register("must-contain-nonsilence", &must_contain_nonsilence,
                    "If true, for this endpointing rule to apply there must"
                    "be nonsilence in the best-path traceback.");
     opts->Register("min-trailing-silence", &min_trailing_silence,
                    "This endpointing rule requires duration of trailing silence"
                    "(in seconds) to be >= this value.");
     opts->Register("max-relative-cost", &max_relative_cost,
                    "This endpointing rule requires relative-cost of final-states"
                    " to be <= this value (describes how good the probability "
                    "of final-states is).");
     opts->Register("min-utterance-length", &min_utterance_length,
                    "This endpointing rule requires utterance-length (in seconds) "
                    "to be >= this value.");
   };

◆ RegisterWithPrefix()

void RegisterWithPrefix	(	const std::string &	prefix,
		OptionsItf *	opts
	)

inline

Definition at line 121 of file online-endpoint.h.

References OnlineEndpointRule::Register().

Referenced by OnlineEndpointConfig::Register().

                                                                      {
     ParseOptions po_prefix(prefix, opts);
     this->Register(&po_prefix);
   }