OnlineFasterDecoder Class Reference

#include <online-faster-decoder.h>


Public Types

enum  DecodeState { kEndFeats = 1, kEndUtt = 2, kEndBatch = 4 }
 
- Public Types inherited from FasterDecoder
typedef fst::StdArc Arc
 
typedef Arc::Label Label
 
typedef Arc::StateId StateId
 
typedef Arc::Weight Weight
 

Public Member Functions

 OnlineFasterDecoder (const fst::Fst< fst::StdArc > &fst, const OnlineFasterDecoderOpts &opts, const std::vector< int32 > &sil_phones, const TransitionModel &trans_model)
 
DecodeState Decode (DecodableInterface *decodable)
 
bool PartialTraceback (fst::MutableFst< LatticeArc > *out_fst)
 
void FinishTraceBack (fst::MutableFst< LatticeArc > *fst_out)
 
bool EndOfUtterance ()
 
int32 frame ()
 
- Public Member Functions inherited from FasterDecoder
 FasterDecoder (const fst::Fst< fst::StdArc > &fst, const FasterDecoderOptions &config)
 
void SetOptions (const FasterDecoderOptions &config)
 
 ~FasterDecoder ()
 
void Decode (DecodableInterface *decodable)
 
bool ReachedFinal () const
 Returns true if a final state was active on the last frame.
 
bool GetBestPath (fst::MutableFst< LatticeArc > *fst_out, bool use_final_probs=true)
 GetBestPath gets the decoding traceback.
 
void InitDecoding ()
 As a new alternative to Decode(), you can call InitDecoding and then (possibly multiple times) AdvanceDecoding().
 
void AdvanceDecoding (DecodableInterface *decodable, int32 max_num_frames=-1)
 This will decode until there are no more frames ready in the decodable object, but if max_num_frames is >= 0 it will decode no more than that many frames.
 
int32 NumFramesDecoded () const
 Returns the number of frames already decoded.
 

Private Member Functions

void ResetDecoder (bool full)
 
void TracebackNFrames (int32 nframes, fst::MutableFst< LatticeArc > *out_fst)
 
void MakeLattice (const Token *start, const Token *end, fst::MutableFst< LatticeArc > *out_fst) const
 
void UpdateImmortalToken ()
 
 KALDI_DISALLOW_COPY_AND_ASSIGN (OnlineFasterDecoder)
 

Private Attributes

const OnlineFasterDecoderOpts opts_
 
const ConstIntegerSet< int32 > silence_set_
 
const TransitionModel & trans_model_
 
const BaseFloat max_beam_
 
BaseFloat & effective_beam_
 
DecodeState state_
 
int32 frame_
 
int32 utt_frames_
 
Token * immortal_tok_
 
Token * prev_immortal_tok_
 

Additional Inherited Members

- Protected Types inherited from FasterDecoder
typedef HashList< StateId, Token * >::Elem Elem
 
- Protected Member Functions inherited from FasterDecoder
double GetCutoff (Elem *list_head, size_t *tok_count, BaseFloat *adaptive_beam, Elem **best_elem)
 Gets the weight cutoff. Also counts the active tokens.
 
void PossiblyResizeHash (size_t num_toks)
 
double ProcessEmitting (DecodableInterface *decodable)
 
void ProcessNonemitting (double cutoff)
 
void ClearToks (Elem *list)
 
 KALDI_DISALLOW_COPY_AND_ASSIGN (FasterDecoder)
 
- Protected Attributes inherited from FasterDecoder
HashList< StateId, Token * > toks_
 
const fst::Fst< fst::StdArc > & fst_
 
FasterDecoderOptions config_
 
std::vector< const Elem *> queue_
 
std::vector< BaseFloat > tmp_array_
 
int32 num_frames_decoded_
 

Detailed Description

Definition at line 69 of file online-faster-decoder.h.

Member Enumeration Documentation

◆ DecodeState

Enumerator
kEndFeats 
kEndUtt 
kEndBatch 

Definition at line 72 of file online-faster-decoder.h.

enum DecodeState {
  kEndFeats = 1, // No more scores are available from the Decodable
  kEndUtt = 2,   // End of utterance, caused by e.g. a sufficiently long silence
  kEndBatch = 4  // End of batch - end of utterance not reached yet
};
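
The enumerator values are distinct bits, so a caller can test several states with one mask. The sketch below is a self-contained illustration, not Kaldi code: SelectState mirrors the final if/else shown in the Decode() listing on this page, while UtteranceFinished is a hypothetical caller-side helper introduced only for this example.

```cpp
// Local copy of the enum so the sketch compiles on its own.
enum DecodeState { kEndFeats = 1, kEndUtt = 2, kEndBatch = 4 };

// Follows the state selection at the end of Decode():
// a full batch with more features pending yields kEndUtt or kEndBatch,
// anything else means the feature stream is exhausted.
DecodeState SelectState(bool batch_full, bool last_frame_reached,
                        bool end_of_utterance) {
  if (batch_full && !last_frame_reached)
    return end_of_utterance ? kEndUtt : kEndBatch;
  return kEndFeats;
}

// Hypothetical helper: true when the utterance is over, which is the point
// at which a caller would typically ask for the final traceback.
bool UtteranceFinished(DecodeState s) {
  return (s & (kEndFeats | kEndUtt)) != 0;
}
```

Under this reading, a caller would request a partial traceback while the state is kEndBatch and a final traceback otherwise.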

Constructor & Destructor Documentation

◆ OnlineFasterDecoder()

OnlineFasterDecoder ( const fst::Fst< fst::StdArc > &  fst,
const OnlineFasterDecoderOpts &  opts,
const std::vector< int32 > &  sil_phones,
const TransitionModel &  trans_model 
)
inline

Definition at line 79 of file online-faster-decoder.h.

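OnlineFasterDecoder(const fst::Fst<fst::StdArc> &fst,
                    const OnlineFasterDecoderOpts &opts,
                    const std::vector<int32> &sil_phones,
                    const TransitionModel &trans_model)
    : FasterDecoder(fst, opts), opts_(opts),
      silence_set_(sil_phones), trans_model_(trans_model),
      max_beam_(opts.beam), effective_beam_(FasterDecoder::config_.beam),
      state_(kEndFeats), frame_(0), utt_frames_(0) {}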

Member Function Documentation

◆ Decode()

DecodeState Decode ( DecodableInterface *  decodable)

Definition at line 231 of file online-faster-decoder.cc.

References OnlineFasterDecoderOpts::batch_size, OnlineFasterDecoderOpts::beam_update, OnlineFasterDecoder::effective_beam_, Timer::Elapsed(), OnlineFasterDecoder::EndOfUtterance(), OnlineFasterDecoder::frame_, DecodableInterface::IsLastFrame(), KALDI_VLOG, OnlineFasterDecoder::kEndBatch, OnlineFasterDecoder::kEndFeats, OnlineFasterDecoder::kEndUtt, OnlineFasterDecoder::max_beam_, OnlineFasterDecoderOpts::max_beam_update, OnlineFasterDecoder::opts_, FasterDecoder::ProcessEmitting(), FasterDecoder::ProcessNonemitting(), OnlineFasterDecoder::ResetDecoder(), OnlineFasterDecoderOpts::rt_max, OnlineFasterDecoderOpts::rt_min, OnlineFasterDecoder::state_, OnlineFasterDecoderOpts::update_interval, and OnlineFasterDecoder::utt_frames_.

Referenced by main().

OnlineFasterDecoder::DecodeState
OnlineFasterDecoder::Decode(DecodableInterface *decodable) {
  if (state_ == kEndFeats || state_ == kEndUtt) // new utterance
    ResetDecoder(state_ == kEndFeats);
  ProcessNonemitting(std::numeric_limits<float>::max());
  int32 batch_frame = 0;
  Timer timer;
  double64 tstart = timer.Elapsed(), tstart_batch = tstart;
  BaseFloat factor = -1;
  for (; !decodable->IsLastFrame(frame_ - 1) && batch_frame < opts_.batch_size;
       ++frame_, ++utt_frames_, ++batch_frame) {
    if (batch_frame != 0 && (batch_frame % opts_.update_interval) == 0) {
      // adjust the beam if needed
      BaseFloat tend = timer.Elapsed();
      BaseFloat elapsed = (tend - tstart) * 1000;
      // warning: hardcoded 10ms frames assumption!
      factor = elapsed / (opts_.rt_max * opts_.update_interval * 10);
      BaseFloat min_factor = (opts_.rt_min / opts_.rt_max);
      if (factor > 1 || factor < min_factor) {
        BaseFloat update_factor = (factor > 1)?
            -std::min(opts_.beam_update * factor, opts_.max_beam_update):
             std::min(opts_.beam_update / factor, opts_.max_beam_update);
        effective_beam_ += effective_beam_ * update_factor;
        effective_beam_ = std::min(effective_beam_, max_beam_);
      }
      tstart = tend;
    }
    if (batch_frame != 0 && (frame_ % 200) == 0)
      // one log message at every 2 seconds assuming 10ms frames
      KALDI_VLOG(3) << "Beam: " << effective_beam_
                    << "; Speed: "
                    << ((timer.Elapsed() - tstart_batch) * 1000) / (batch_frame*10)
                    << " xRT";
    BaseFloat weight_cutoff = ProcessEmitting(decodable);
    ProcessNonemitting(weight_cutoff);
  }
  if (batch_frame == opts_.batch_size && !decodable->IsLastFrame(frame_ - 1)) {
    if (EndOfUtterance())
      state_ = kEndUtt;
    else
      state_ = kEndBatch;
  } else {
    state_ = kEndFeats;
  }
  return state_;
}
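
The beam adaptation inside the Decode() loop can be isolated as a pure function, which makes the arithmetic easier to follow. This is a minimal sketch, not the Kaldi implementation: BeamOpts is a hypothetical stand-in for the fields of OnlineFasterDecoderOpts that the listing uses, and the final clamp to max_beam is an assumption based on max_beam_ appearing in the reference list for Decode().

```cpp
#include <algorithm>

// Hypothetical stand-in for the option fields used by the beam update.
struct BeamOpts {
  float rt_min;           // lower end of the acceptable real-time factor
  float rt_max;           // upper end of the acceptable real-time factor
  float beam_update;      // base relative change applied to the beam
  float max_beam_update;  // cap on the relative change
  int update_interval;    // number of frames between beam updates
};

// Returns the beam after one update step. elapsed_ms is the wall-clock time
// spent on the last update_interval frames; frames are assumed to be 10 ms
// long, as in the listing.
float UpdateBeam(float beam, float max_beam, float elapsed_ms,
                 const BeamOpts &o) {
  float factor = elapsed_ms / (o.rt_max * o.update_interval * 10);
  float min_factor = o.rt_min / o.rt_max;
  if (factor > 1 || factor < min_factor) {
    // Too slow: shrink the beam. Faster than needed: widen it.
    float update_factor = (factor > 1)
        ? -std::min(o.beam_update * factor, o.max_beam_update)
        : std::min(o.beam_update / factor, o.max_beam_update);
    beam += beam * update_factor;
    beam = std::min(beam, max_beam);  // assumed clamp, see the note above
  }
  return beam;
}
```

With rt_max = 1.0 and update_interval = 4, the time budget per interval is 40 ms, so an elapsed time of 80 ms gives factor = 2 and the beam is narrowed.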

◆ EndOfUtterance()

bool EndOfUtterance ( )

Definition at line 210 of file online-faster-decoder.cc.

References ConstIntegerSet< I >::count(), fst::GetLinearSymbolSequence(), rnnlm::i, OnlineFasterDecoderOpts::inter_utt_sil, OnlineFasterDecoderOpts::max_utt_len_, OnlineFasterDecoder::opts_, OnlineFasterDecoder::silence_set_, kaldi::SplitToPhones(), OnlineFasterDecoder::TracebackNFrames(), OnlineFasterDecoder::trans_model_, TransitionModel::TransitionIdToPhone(), and OnlineFasterDecoder::utt_frames_.

Referenced by OnlineFasterDecoder::Decode().

bool OnlineFasterDecoder::EndOfUtterance() {
  fst::VectorFst<LatticeArc> trace;
  int32 sil_frm = opts_.inter_utt_sil / (1 + utt_frames_ / opts_.max_utt_len_);
  TracebackNFrames(sil_frm, &trace);
  std::vector<int32> isymbols;
  fst::GetLinearSymbolSequence(trace, &isymbols,
                               static_cast<std::vector<int32>* >(0),
                               static_cast<LatticeArc::Weight*>(0));
  std::vector<std::vector<int32> > split;
  SplitToPhones(trans_model_, isymbols, &split);
  for (size_t i = 0; i < split.size(); i++) {
    int32 tid = split[i][0];
    int32 phone = trans_model_.TransitionIdToPhone(tid);
    if (silence_set_.count(phone) == 0)
      return false;
  }
  return true;
}
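
The end-of-utterance test amounts to asking whether every phone in a trailing window of the best path is a silence phone. The sketch below shows that check on plain integers. Both function names are hypothetical, and the window-length formula in SilenceWindow is an assumption built from the option names in the reference list (inter_utt_sil, max_utt_len_) and utt_frames_; it should be checked against the source.

```cpp
#include <set>
#include <vector>

// Assumed window length: the required silence shrinks as the utterance
// grows past multiples of max_utt_len (integer division throughout).
int SilenceWindow(int inter_utt_sil, int utt_frames, int max_utt_len) {
  return inter_utt_sil / (1 + utt_frames / max_utt_len);
}

// trailing_phones holds one phone id per segment of the trailing window,
// i.e. what SplitToPhones() followed by TransitionIdToPhone() would yield.
// Returns true only if every phone is in the silence set; an empty window
// also returns true, matching the loop in the listing.
bool AllSilence(const std::vector<int> &trailing_phones,
                const std::set<int> &silence_phones) {
  for (int phone : trailing_phones)
    if (silence_phones.count(phone) == 0)
      return false;
  return true;
}
```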

◆ FinishTraceBack()

void FinishTraceBack ( fst::MutableFst< LatticeArc > *  fst_out)

Definition at line 136 of file online-faster-decoder.cc.

References FasterDecoder::fst_, HashList< I, T >::GetList(), OnlineFasterDecoder::immortal_tok_, OnlineFasterDecoder::MakeLattice(), FasterDecoder::ReachedFinal(), HashList< I, T >::Elem::tail, and FasterDecoder::toks_.

Referenced by main().

void OnlineFasterDecoder::FinishTraceBack(fst::MutableFst<LatticeArc> *out_fst) {
  Token *best_tok = NULL;
  bool is_final = ReachedFinal();
  if (!is_final) {
    for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail)
      if (best_tok == NULL || *best_tok < *(e->val) )
        best_tok = e->val;
  } else {
    double best_cost = std::numeric_limits<double>::infinity();
    for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail) {
      double this_cost = e->val->cost_ + fst_.Final(e->key).Value();
      if (this_cost != std::numeric_limits<double>::infinity() &&
          this_cost < best_cost) {
        best_cost = this_cost;
        best_tok = e->val;
      }
    }
  }
  MakeLattice(best_tok, immortal_tok_, out_fst);
}

◆ frame()

int32 frame ( )
inline

Definition at line 102 of file online-faster-decoder.h.

References kaldi::full.

Referenced by main().

◆ KALDI_DISALLOW_COPY_AND_ASSIGN()

KALDI_DISALLOW_COPY_AND_ASSIGN ( OnlineFasterDecoder  )
private

◆ MakeLattice()

void MakeLattice ( const Token *  start,
const Token *  end,
fst::MutableFst< LatticeArc > *  out_fst 
) const
private

Definition at line 45 of file online-faster-decoder.cc.

References FasterDecoder::Token::arc_, FasterDecoder::Token::cost_, FasterDecoder::fst_, rnnlm::i, LatticeWeightTpl< BaseFloat >::One(), FasterDecoder::Token::prev_, and fst::RemoveEpsLocal().

Referenced by OnlineFasterDecoder::FinishTraceBack(), and OnlineFasterDecoder::PartialTraceback().

void OnlineFasterDecoder::MakeLattice(const Token *start,
                                      const Token *end,
                                      fst::MutableFst<LatticeArc> *out_fst) const {
  out_fst->DeleteStates();
  if (start == NULL) return;
  bool is_final = false;
  double this_cost = start->cost_ + fst_.Final(start->arc_.nextstate).Value();
  if (this_cost != std::numeric_limits<double>::infinity())
    is_final = true;
  std::vector<LatticeArc> arcs_reverse;  // arcs in reverse order.
  for (const Token *tok = start; tok != end; tok = tok->prev_) {
    BaseFloat tot_cost = tok->cost_ -
        (tok->prev_ ? tok->prev_->cost_ : 0.0),
        graph_cost = tok->arc_.weight.Value(),
        ac_cost = tot_cost - graph_cost;
    LatticeArc l_arc(tok->arc_.ilabel,
                     tok->arc_.olabel,
                     LatticeWeight(graph_cost, ac_cost),
                     tok->arc_.nextstate);
    arcs_reverse.push_back(l_arc);
  }
  if(arcs_reverse.back().nextstate == fst_.Start()) {
    arcs_reverse.pop_back();  // that was a "fake" token... gives no info.
  }
  StateId cur_state = out_fst->AddState();
  out_fst->SetStart(cur_state);
  for (ssize_t i = static_cast<ssize_t>(arcs_reverse.size())-1; i >= 0; i--) {
    LatticeArc arc = arcs_reverse[i];
    arc.nextstate = out_fst->AddState();
    out_fst->AddArc(cur_state, arc);
    cur_state = arc.nextstate;
  }
  if (is_final) {
    Weight final_weight = fst_.Final(start->arc_.nextstate);
    out_fst->SetFinal(cur_state, LatticeWeight(final_weight.Value(), 0.0));
  } else {
    out_fst->SetFinal(cur_state, LatticeWeight::One());
  }
  RemoveEpsLocal(out_fst);
}
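
The core of MakeLattice() is a backward walk over token back-pointers that splits each step's cost into a graph part and an acoustic part, followed by a reversal into forward order. The sketch below reproduces only that step on plain structs. CostTok, ArcCost and Traceback are hypothetical names, and the extra null check in the loop is a defensive addition that the listing does not have.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal token: cumulative path cost, the graph cost of the
// arc that created it, the arc's input label, and a back-pointer.
struct CostTok {
  double cost;
  float graph_cost;
  int ilabel;
  const CostTok *prev;
};

struct ArcCost {
  int ilabel;
  float graph;     // graph (LM and transition) cost of this step
  float acoustic;  // remainder of the step cost
};

// Walks from start back to (but not including) end and returns the steps
// in forward order.
std::vector<ArcCost> Traceback(const CostTok *start, const CostTok *end) {
  std::vector<ArcCost> arcs;
  for (const CostTok *t = start; t != nullptr && t != end; t = t->prev) {
    // Cost added by this step = cumulative cost minus predecessor's cost.
    float tot = static_cast<float>(t->cost - (t->prev ? t->prev->cost : 0.0));
    arcs.push_back(ArcCost{t->ilabel, t->graph_cost, tot - t->graph_cost});
  }
  std::reverse(arcs.begin(), arcs.end());
  return arcs;
}
```

Passing the previous immortal token as end, as PartialTraceback() does, yields only the newly stabilized part of the path.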

◆ PartialTraceback()

bool PartialTraceback ( fst::MutableFst< LatticeArc > *  out_fst)

Definition at line 126 of file online-faster-decoder.cc.

References OnlineFasterDecoder::immortal_tok_, OnlineFasterDecoder::MakeLattice(), OnlineFasterDecoder::prev_immortal_tok_, and OnlineFasterDecoder::UpdateImmortalToken().

Referenced by main().

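bool OnlineFasterDecoder::PartialTraceback(fst::MutableFst<LatticeArc> *out_fst) {
  UpdateImmortalToken();
  if(immortal_tok_ == prev_immortal_tok_)
    return false; //no partial traceback at that point of time
  MakeLattice(immortal_tok_, prev_immortal_tok_, out_fst);
  return true;
}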

◆ ResetDecoder()

void ResetDecoder ( bool  full)
private

Definition at line 30 of file online-faster-decoder.cc.

References HashList< I, T >::Clear(), FasterDecoder::ClearToks(), OnlineFasterDecoder::frame_, FasterDecoder::fst_, OnlineFasterDecoder::immortal_tok_, HashList< I, T >::Insert(), KALDI_ASSERT, OnlineFasterDecoder::prev_immortal_tok_, FasterDecoder::toks_, and OnlineFasterDecoder::utt_frames_.

Referenced by OnlineFasterDecoder::Decode().

void OnlineFasterDecoder::ResetDecoder(bool full) {
  ClearToks(toks_.Clear());
  StateId start_state = fst_.Start();
  KALDI_ASSERT(start_state != fst::kNoStateId);
  Arc dummy_arc(0, 0, Weight::One(), start_state);
  Token *dummy_token = new Token(dummy_arc, NULL);
  toks_.Insert(start_state, dummy_token);
  prev_immortal_tok_ = immortal_tok_ = dummy_token;
  utt_frames_ = 0;
  if (full)
    frame_ = 0;
}

◆ TracebackNFrames()

void TracebackNFrames ( int32  nframes,
fst::MutableFst< LatticeArc > *  out_fst 
)
private

Definition at line 159 of file online-faster-decoder.cc.

References FasterDecoder::Token::arc_, FasterDecoder::Token::cost_, FasterDecoder::fst_, HashList< I, T >::GetList(), rnnlm::i, LatticeWeightTpl< BaseFloat >::One(), FasterDecoder::Token::prev_, fst::RemoveEpsLocal(), HashList< I, T >::Elem::tail, and FasterDecoder::toks_.

Referenced by OnlineFasterDecoder::EndOfUtterance().

void OnlineFasterDecoder::TracebackNFrames(int32 nframes,
                                           fst::MutableFst<LatticeArc> *out_fst) {
  Token *best_tok = NULL;
  for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail)
    if (best_tok == NULL || *best_tok < *(e->val) )
      best_tok = e->val;
  if (best_tok == NULL) {
    out_fst->DeleteStates();
    return;
  }

  bool is_final = false;
  double this_cost = best_tok->cost_ +
      fst_.Final(best_tok->arc_.nextstate).Value();

  if (this_cost != std::numeric_limits<double>::infinity())
    is_final = true;
  std::vector<LatticeArc> arcs_reverse;  // arcs in reverse order.
  for (Token *tok = best_tok; (tok != NULL) && (nframes > 0); tok = tok->prev_) {
    if (tok->arc_.ilabel != 0) // count only the non-epsilon arcs
      --nframes;
    BaseFloat tot_cost = tok->cost_ -
        (tok->prev_ ? tok->prev_->cost_ : 0.0);
    BaseFloat graph_cost = tok->arc_.weight.Value();
    BaseFloat ac_cost = tot_cost - graph_cost;
    LatticeArc larc(tok->arc_.ilabel,
                    tok->arc_.olabel,
                    LatticeWeight(graph_cost, ac_cost),
                    tok->arc_.nextstate);
    arcs_reverse.push_back(larc);
  }
  if(arcs_reverse.back().nextstate == fst_.Start())
    arcs_reverse.pop_back();  // that was a "fake" token... gives no info.
  StateId cur_state = out_fst->AddState();
  out_fst->SetStart(cur_state);
  for (ssize_t i = static_cast<ssize_t>(arcs_reverse.size())-1; i >= 0; i--) {
    LatticeArc arc = arcs_reverse[i];
    arc.nextstate = out_fst->AddState();
    out_fst->AddArc(cur_state, arc);
    cur_state = arc.nextstate;
  }
  if (is_final) {
    Weight final_weight = fst_.Final(best_tok->arc_.nextstate);
    out_fst->SetFinal(cur_state, LatticeWeight(final_weight.Value(), 0.0));
  } else {
    out_fst->SetFinal(cur_state, LatticeWeight::One());
  }
  RemoveEpsLocal(out_fst);
}
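
TracebackNFrames() limits the traceback by frames rather than by arcs: only arcs with a non-zero input label consume the frame budget, while epsilon arcs met along the way are still collected. A minimal sketch of that counting rule, using a hypothetical FrameTok struct and returning input labels instead of lattice arcs:

```cpp
#include <vector>

// Hypothetical minimal token: input label of the incoming arc
// (0 means epsilon, i.e. non-emitting) and a back-pointer.
struct FrameTok {
  int ilabel;
  const FrameTok *prev;
};

// Collects the input labels of the arcs covering the last nframes emitting
// steps that end at best, in forward order.
std::vector<int> LastNFrames(const FrameTok *best, int nframes) {
  std::vector<int> reversed;
  for (const FrameTok *t = best; t != nullptr && nframes > 0; t = t->prev) {
    if (t->ilabel != 0)  // count only the non-epsilon arcs
      --nframes;
    reversed.push_back(t->ilabel);
  }
  return std::vector<int>(reversed.rbegin(), reversed.rend());
}
```

Because the budget is checked before each step, an epsilon arc lying between two counted frames is kept, but one that precedes the oldest counted frame is not.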

◆ UpdateImmortalToken()

void UpdateImmortalToken ( )
private

Definition at line 87 of file online-faster-decoder.cc.

References FasterDecoder::Token::arc_, HashList< I, T >::GetList(), OnlineFasterDecoder::immortal_tok_, FasterDecoder::Token::prev_, OnlineFasterDecoder::prev_immortal_tok_, HashList< I, T >::Elem::tail, and FasterDecoder::toks_.

Referenced by OnlineFasterDecoder::PartialTraceback().

void OnlineFasterDecoder::UpdateImmortalToken() {
  unordered_set<Token*> emitting;
  for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail) {
    Token* tok = e->val;
    while (tok != NULL && tok->arc_.ilabel == 0) //deal with non-emitting ones ...
      tok = tok->prev_;
    if (tok != NULL)
      emitting.insert(tok);
  }
  Token* the_one = NULL;
  while (1) {
    if (emitting.size() == 1) {
      the_one = *(emitting.begin());
      break;
    }
    if (emitting.size() == 0)
      break;
    unordered_set<Token*> prev_emitting;
    unordered_set<Token*>::iterator it;
    for (it = emitting.begin(); it != emitting.end(); ++it) {
      Token* tok = *it;
      Token* prev_token = tok->prev_;
      while ((prev_token != NULL) && (prev_token->arc_.ilabel == 0))
        prev_token = prev_token->prev_; //deal with non-emitting ones
      if (prev_token == NULL)
        continue;
      prev_emitting.insert(prev_token);
    } // for
    emitting = prev_emitting;
  } // while
  if (the_one != NULL) {
    prev_immortal_tok_ = immortal_tok_;
    immortal_tok_ = the_one;
    return;
  }
}
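
The immortal token is the most recent emitting token that every currently active path passes through. The listing finds it by stepping all active tokens back one emitting arc at a time until the set collapses to a single token. The sketch below shows the same idea on a hypothetical minimal Tok struct; it is an illustration of the technique, not the Kaldi code.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical minimal token: input label of the incoming arc
// (0 means non-emitting) and a back-pointer.
struct Tok {
  int ilabel;
  const Tok *prev;
};

// Skips back over non-emitting tokens; may return nullptr.
static const Tok *SkipEpsilons(const Tok *t) {
  while (t != nullptr && t->ilabel == 0)
    t = t->prev;
  return t;
}

// Returns the latest emitting token shared by all active paths, or nullptr
// if the paths never converge.
const Tok *FindImmortal(const std::vector<const Tok*> &active) {
  std::unordered_set<const Tok*> emitting;
  for (const Tok *t : active) {
    const Tok *e = SkipEpsilons(t);
    if (e != nullptr) emitting.insert(e);
  }
  // Step every path back by one emitting token until only one remains.
  while (emitting.size() > 1) {
    std::unordered_set<const Tok*> prev_emitting;
    for (const Tok *t : emitting) {
      const Tok *p = SkipEpsilons(t->prev);
      if (p != nullptr) prev_emitting.insert(p);
    }
    emitting.swap(prev_emitting);
  }
  return emitting.empty() ? nullptr : *emitting.begin();
}
```

Since everything up to the immortal token is common to all surviving hypotheses, that prefix can be emitted early as a partial result.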

Member Data Documentation

◆ effective_beam_

BaseFloat& effective_beam_
private

Definition at line 123 of file online-faster-decoder.h.

Referenced by OnlineFasterDecoder::Decode().

◆ frame_

int32 frame_
private

◆ immortal_tok_

Token* immortal_tok_
private

◆ max_beam_

const BaseFloat max_beam_
private

Definition at line 122 of file online-faster-decoder.h.

Referenced by OnlineFasterDecoder::Decode().

◆ opts_

const OnlineFasterDecoderOpts opts_
private

◆ prev_immortal_tok_

Token* prev_immortal_tok_
private

◆ silence_set_

const ConstIntegerSet<int32> silence_set_
private

Definition at line 120 of file online-faster-decoder.h.

Referenced by OnlineFasterDecoder::EndOfUtterance().

◆ state_

DecodeState state_
private

Definition at line 124 of file online-faster-decoder.h.

Referenced by OnlineFasterDecoder::Decode().

◆ trans_model_

const TransitionModel& trans_model_
private

Definition at line 121 of file online-faster-decoder.h.

Referenced by OnlineFasterDecoder::EndOfUtterance().

◆ utt_frames_

int32 utt_frames_
private


The documentation for this class was generated from the following files:
online-faster-decoder.h
online-faster-decoder.cc