BiglmFasterDecoder Class Reference

This decoder works like FasterDecoder, but composes HCLG on the fly with a "difference language model": a deterministic FST that represents the difference between the language model you want and the language model HCLG was compiled with.

#include <biglm-faster-decoder.h>


Classes

class  Token
 

Public Types

typedef fst::StdArc Arc
 
typedef Arc::Label Label
 
typedef Arc::StateId StateId
 
typedef uint64 PairId
 
typedef Arc::Weight Weight
 

Public Member Functions

 BiglmFasterDecoder (const fst::Fst< fst::StdArc > &fst, const BiglmFasterDecoderOptions &opts, fst::DeterministicOnDemandFst< fst::StdArc > *lm_diff_fst)
 
void SetOptions (const BiglmFasterDecoderOptions &opts)
 
 ~BiglmFasterDecoder ()
 
void Decode (DecodableInterface *decodable)
 
bool ReachedFinal ()
 
bool GetBestPath (fst::MutableFst< LatticeArc > *fst_out, bool use_final_probs=true)
 

Private Types

typedef HashList< PairId, Token * >::Elem  Elem
 

Private Member Functions

PairId ConstructPair (StateId fst_state, StateId lm_state)
 
BaseFloat GetCutoff (Elem *list_head, size_t *tok_count, BaseFloat *adaptive_beam, Elem **best_elem)
 Gets the weight cutoff. Also counts the active tokens.
 
void PossiblyResizeHash (size_t num_toks)
 
StateId PropagateLm (StateId lm_state, Arc *arc)
 
BaseFloat ProcessEmitting (DecodableInterface *decodable, int frame)
 
void ProcessNonemitting (BaseFloat cutoff)
 
void ClearToks (Elem *list)
 
 KALDI_DISALLOW_COPY_AND_ASSIGN (BiglmFasterDecoder)
 

Static Private Member Functions

static StateId PairToState (PairId state_pair)
 
static StateId PairToLmState (PairId state_pair)
 

Private Attributes

HashList< PairId, Token * > toks_
 
const fst::Fst< fst::StdArc > & fst_
 
fst::DeterministicOnDemandFst< fst::StdArc > *  lm_diff_fst_
 
BiglmFasterDecoderOptions opts_
 
bool warned_noarc_
 
std::vector< PairId >  queue_
 
std::vector< BaseFloat >  tmp_array_
 

Detailed Description

This decoder works like FasterDecoder, but composes HCLG on the fly with a "difference language model": a deterministic FST that represents the difference between the language model you want and the language model HCLG was compiled with.

The class DeterministicOnDemandFst follows the epsilon arcs in G for you (assuming G is a standard backoff language model), so that G looks like a determinized FST. In practice, DeterministicOnDemandFst operates in a mode where it composes two G's together: one has negated likelihoods and removes the LM probabilities that HCLG was built with, and the other is the language model you want to use.
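The effect of the difference LM on a single word-crossing arc can be sketched in terms of costs (negated log-probabilities). This is a minimal illustration only; `DiffLmCost` and `RescoredArcCost` are hypothetical helpers, not part of Kaldi.

```cpp
#include <cassert>
#include <cmath>

// Costs are negated log-probabilities, as in the tropical semiring:
// combining weights along a path means adding their costs.

// Cost contributed by the difference LM for one word: it cancels the cost
// of the LM compiled into HCLG and adds the cost of the desired LM.
inline float DiffLmCost(float old_lm_cost, float new_lm_cost) {
  return -old_lm_cost + new_lm_cost;
}

// Cost of an HCLG arc after composition with the difference LM.
inline float RescoredArcCost(float hclg_arc_cost, float old_lm_cost,
                             float new_lm_cost) {
  assert(std::isfinite(hclg_arc_cost));
  return hclg_arc_cost + DiffLmCost(old_lm_cost, new_lm_cost);
}
```

When the two language models agree on a word, the difference cost is zero and the HCLG arc cost is left unchanged.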

Definition at line 51 of file biglm-faster-decoder.h.

Member Typedef Documentation

typedef fst::StdArc Arc

Definition at line 53 of file biglm-faster-decoder.h.

typedef HashList<PairId, Token*>::Elem Elem
private

Definition at line 238 of file biglm-faster-decoder.h.

typedef Arc::Label Label

Definition at line 54 of file biglm-faster-decoder.h.

typedef uint64 PairId

Definition at line 57 of file biglm-faster-decoder.h.

typedef Arc::StateId StateId

Definition at line 55 of file biglm-faster-decoder.h.

typedef Arc::Weight Weight

Definition at line 58 of file biglm-faster-decoder.h.

Constructor & Destructor Documentation

BiglmFasterDecoder ( const fst::Fst< fst::StdArc > &  fst,
const BiglmFasterDecoderOptions &  opts,
fst::DeterministicOnDemandFst< fst::StdArc > *  lm_diff_fst 
)
inline

Definition at line 70 of file biglm-faster-decoder.h.

References FasterDecoderOptions::hash_ratio, KALDI_ASSERT, FasterDecoderOptions::max_active, BiglmFasterDecoder::opts_, DeterministicOnDemandFst< Arc >::Start(), and BiglmFasterDecoder::toks_.

72  :
73  fst_(fst), lm_diff_fst_(lm_diff_fst), opts_(opts), warned_noarc_(false) {
74  KALDI_ASSERT(opts_.hash_ratio >= 1.0); // less doesn't make much sense.
76  KALDI_ASSERT(fst.Start() != fst::kNoStateId &&
77  lm_diff_fst->Start() != fst::kNoStateId);
78  toks_.SetSize(1000); // just so on the first frame we do something reasonable.
79  }
~BiglmFasterDecoder ( )
inline

Definition at line 83 of file biglm-faster-decoder.h.

References BiglmFasterDecoder::ClearToks(), and BiglmFasterDecoder::toks_.

83  {
84  ClearToks(toks_.Clear());
85  }

Member Function Documentation

void ClearToks ( Elem *  list)
inline private

Definition at line 489 of file biglm-faster-decoder.h.

References HashList< I, T >::Delete(), and BiglmFasterDecoder::Token::TokenDelete().

Referenced by BiglmFasterDecoder::Decode(), and BiglmFasterDecoder::~BiglmFasterDecoder().

489  {
490  for (Elem *e = list, *e_tail; e != NULL; e = e_tail) {
491  Token::TokenDelete(e->val);
492  e_tail = e->tail;
493  toks_.Delete(e);
494  }
495  }
PairId ConstructPair ( StateId  fst_state,
StateId  lm_state 
)
inline private

Definition at line 183 of file biglm-faster-decoder.h.

Referenced by BiglmFasterDecoder::Decode(), BiglmFasterDecoder::ProcessEmitting(), and BiglmFasterDecoder::ProcessNonemitting().

183  {
184  return static_cast<PairId>(fst_state) + (static_cast<PairId>(lm_state) << 32);
185  }
void Decode ( DecodableInterface *  decodable)
inline

Definition at line 87 of file biglm-faster-decoder.h.

References BiglmFasterDecoder::ClearToks(), BiglmFasterDecoder::ConstructPair(), BiglmFasterDecoder::fst_, DecodableInterface::IsLastFrame(), BiglmFasterDecoder::lm_diff_fst_, BiglmFasterDecoder::ProcessEmitting(), BiglmFasterDecoder::ProcessNonemitting(), DeterministicOnDemandFst< Arc >::Start(), and BiglmFasterDecoder::toks_.

Referenced by main().

87  {
88  // clean up from last time:
89  ClearToks(toks_.Clear());
90  PairId start_pair = ConstructPair(fst_.Start(), lm_diff_fst_->Start());
91  Arc dummy_arc(0, 0, Weight::One(), fst_.Start()); // actually, the last element of
92  // the Arcs (fst_.Start(), here) is never needed.
93  toks_.Insert(start_pair, new Token(dummy_arc, NULL));
94  ProcessNonemitting(std::numeric_limits<float>::max());
95  for (int32 frame = 0; !decodable->IsLastFrame(frame-1); frame++) {
96  BaseFloat weight_cutoff = ProcessEmitting(decodable, frame);
97  ProcessNonemitting(weight_cutoff);
98  }
99  }
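The loop condition `!decodable->IsLastFrame(frame-1)` visits every frame exactly once and visits none for an empty utterance. The sketch below checks this with a hypothetical `ToyDecodable`; it assumes that `IsLastFrame(f)` is true exactly when `f` is the index of the last frame, so `IsLastFrame(-1)` is true only when there are no frames.

```cpp
#include <cassert>

// Hypothetical stand-in for a decodable object with a fixed frame count.
struct ToyDecodable {
  int num_frames;
  // True when `frame` is the last valid index. For frame == -1 this is
  // true only for an empty utterance (num_frames == 0).
  bool IsLastFrame(int frame) const {
    assert(frame >= -1 && frame < num_frames);
    return frame == num_frames - 1;
  }
};

// Counts how many frames a Decode()-style loop would process.
inline int CountProcessedFrames(const ToyDecodable &decodable) {
  int processed = 0;
  for (int frame = 0; !decodable.IsLastFrame(frame - 1); frame++) {
    processed++;  // ProcessEmitting and ProcessNonemitting would run here.
  }
  return processed;
}
```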
bool GetBestPath ( fst::MutableFst< LatticeArc > *  fst_out,
bool  use_final_probs = true 
)
inline

Definition at line 115 of file biglm-faster-decoder.h.

References DeterministicOnDemandFst< Arc >::Final(), BiglmFasterDecoder::fst_, rnnlm::i, KALDI_ASSERT, BiglmFasterDecoder::lm_diff_fst_, LatticeWeightTpl< BaseFloat >::One(), BiglmFasterDecoder::PairToLmState(), BiglmFasterDecoder::PairToState(), BiglmFasterDecoder::Token::prev_, BiglmFasterDecoder::ReachedFinal(), fst::RemoveEpsLocal(), fst::Times(), BiglmFasterDecoder::toks_, and BiglmFasterDecoder::Token::weight_.

Referenced by main().

116  {
117  // GetBestPath gets the decoding output. If "use_final_probs" is true
118  // AND we reached a final state, it limits itself to final states;
119  // otherwise it gets the most likely token not taking into
120  // account final-probs. fst_out will be empty (Start() == kNoStateId) if
121  // nothing was available. It returns true if it got output (thus, fst_out
122  // will be nonempty).
123  fst_out->DeleteStates();
124  Token *best_tok = NULL;
125  Weight best_final; // only set if is_final == true. The final-prob corresponding
126  // to the best final token (i.e. the one with best weight best_weight, below).
127  bool is_final = ReachedFinal();
128  if (!is_final) {
129  for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail)
130  if (best_tok == NULL || *best_tok < *(e->val) )
131  best_tok = e->val;
132  } else {
133  Weight best_weight = Weight::Zero();
134  for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail) {
135  Weight fst_final = fst_.Final(PairToState(e->key)),
136  lm_final = lm_diff_fst_->Final(PairToLmState(e->key)),
137  final = Times(fst_final, lm_final);
138  Weight this_weight = Times(e->val->weight_, final);
139  if (this_weight != Weight::Zero() &&
140  this_weight.Value() < best_weight.Value()) {
141  best_weight = this_weight;
142  best_final = final;
143  best_tok = e->val;
144  }
145  }
146  }
147  if (best_tok == NULL) return false; // No output.
148 
149  std::vector<LatticeArc> arcs_reverse; // arcs in reverse order.
150 
151  for (Token *tok = best_tok; tok != NULL; tok = tok->prev_) {
152  BaseFloat tot_cost = tok->weight_.Value() -
153  (tok->prev_ ? tok->prev_->weight_.Value() : 0.0),
154  graph_cost = tok->arc_.weight.Value(),
155  ac_cost = tot_cost - graph_cost;
156  LatticeArc l_arc(tok->arc_.ilabel,
157  tok->arc_.olabel,
158  LatticeWeight(graph_cost, ac_cost),
159  tok->arc_.nextstate);
160  arcs_reverse.push_back(l_arc);
161  }
162  KALDI_ASSERT(arcs_reverse.back().nextstate == fst_.Start());
163  arcs_reverse.pop_back(); // that was a "fake" token... gives no info.
164 
165  StateId cur_state = fst_out->AddState();
166  fst_out->SetStart(cur_state);
167  for (ssize_t i = static_cast<ssize_t>(arcs_reverse.size())-1; i >= 0; i--) {
168  LatticeArc arc = arcs_reverse[i];
169  arc.nextstate = fst_out->AddState();
170  fst_out->AddArc(cur_state, arc);
171  cur_state = arc.nextstate;
172  }
173  if (is_final && use_final_probs) {
174  fst_out->SetFinal(cur_state, LatticeWeight(best_final.Value(), 0.0));
175  } else {
176  fst_out->SetFinal(cur_state, LatticeWeight::One());
177  }
178  RemoveEpsLocal(fst_out);
179  return true;
180  }
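The traceback splits each arc's cost into a graph part and an acoustic part: the total cost of an arc is the difference between consecutive accumulated token weights, and the acoustic cost is what remains after subtracting the arc's graph cost. A minimal sketch of that arithmetic follows; `ArcCosts` and `SplitCosts` are hypothetical names, and the sketch walks the path forward rather than backward through `prev_` pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ArcCosts {
  float graph_cost;
  float ac_cost;
};

// cumulative[i] is the accumulated path cost after arc i;
// graph[i] is the graph (HCLG plus LM-difference) cost of arc i.
inline std::vector<ArcCosts> SplitCosts(const std::vector<float> &cumulative,
                                        const std::vector<float> &graph) {
  assert(cumulative.size() == graph.size());
  std::vector<ArcCosts> out;
  float prev = 0.0f;  // cost before the first arc
  for (std::size_t i = 0; i < cumulative.size(); i++) {
    float tot_cost = cumulative[i] - prev;   // cost added by this arc
    ArcCosts c = {graph[i], tot_cost - graph[i]};
    out.push_back(c);
    prev = cumulative[i];
  }
  return out;
}
```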
BaseFloat GetCutoff ( Elem *  list_head,
size_t *  tok_count,
BaseFloat *  adaptive_beam,
Elem **  best_elem 
)
inline private

Gets the weight cutoff. Also counts the active tokens.

Definition at line 242 of file biglm-faster-decoder.h.

References FasterDecoderOptions::beam, FasterDecoderOptions::beam_delta, count, FasterDecoderOptions::max_active, FasterDecoderOptions::min_active, BiglmFasterDecoder::opts_, HashList< I, T >::Elem::tail, and BiglmFasterDecoder::tmp_array_.

Referenced by BiglmFasterDecoder::ProcessEmitting().

243  {
244  BaseFloat best_weight = 1.0e+10; // positive == high cost == bad.
245  size_t count = 0;
246  if (opts_.max_active == std::numeric_limits<int32>::max() &&
247  opts_.min_active == 0) {
248  for (Elem *e = list_head; e != NULL; e = e->tail, count++) {
249  BaseFloat w = static_cast<BaseFloat>(e->val->weight_.Value());
250  if (w < best_weight) {
251  best_weight = w;
252  if (best_elem) *best_elem = e;
253  }
254  }
255  if (tok_count != NULL) *tok_count = count;
256  if (adaptive_beam != NULL) *adaptive_beam = opts_.beam;
257  return best_weight + opts_.beam;
258  } else {
259  tmp_array_.clear();
260  for (Elem *e = list_head; e != NULL; e = e->tail, count++) {
261  BaseFloat w = e->val->weight_.Value();
262  tmp_array_.push_back(w);
263  if (w < best_weight) {
264  best_weight = w;
265  if (best_elem) *best_elem = e;
266  }
267  }
268  if (tok_count != NULL) *tok_count = count;
269 
270  BaseFloat beam_cutoff = best_weight + opts_.beam,
271  min_active_cutoff = std::numeric_limits<BaseFloat>::infinity(),
272  max_active_cutoff = std::numeric_limits<BaseFloat>::infinity();
273 
274  if (tmp_array_.size() > static_cast<size_t>(opts_.max_active)) {
275  std::nth_element(tmp_array_.begin(),
276  tmp_array_.begin() + opts_.max_active,
277  tmp_array_.end());
278  max_active_cutoff = tmp_array_[opts_.max_active];
279  }
280  if (tmp_array_.size() > static_cast<size_t>(opts_.min_active)) {
281  if (opts_.min_active == 0) min_active_cutoff = best_weight;
282  else {
283  std::nth_element(tmp_array_.begin(),
284  tmp_array_.begin() + opts_.min_active,
285  tmp_array_.size() > static_cast<size_t>(opts_.max_active) ?
286  tmp_array_.begin() + opts_.max_active :
287  tmp_array_.end());
288  min_active_cutoff = tmp_array_[opts_.min_active];
289  }
290  }
291 
292  if (max_active_cutoff < beam_cutoff) { // max_active is tighter than beam.
293  if (adaptive_beam)
294  *adaptive_beam = max_active_cutoff - best_weight + opts_.beam_delta;
295  return max_active_cutoff;
296  } else if (min_active_cutoff > beam_cutoff) { // min_active is looser than beam.
297  if (adaptive_beam)
298  *adaptive_beam = min_active_cutoff - best_weight + opts_.beam_delta;
299  return min_active_cutoff;
300  } else {
301  if (adaptive_beam) *adaptive_beam = opts_.beam;
302  return beam_cutoff;
303  }
304  }
305  }
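The interaction between the beam and `max_active` can be sketched in isolation. The helper below is a simplified, hypothetical `BeamCutoff` that ignores `min_active` and the adaptive beam; it only shows how `std::nth_element` finds the cost of the token at rank `max_active` so that the cutoff can be tightened.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the pruning cutoff for a set of token costs: best cost plus beam,
// tightened when more than max_active tokens would otherwise survive.
// Tokens with cost strictly below the cutoff are kept.
inline float BeamCutoff(std::vector<float> costs, float beam,
                        std::size_t max_active) {
  assert(!costs.empty());
  float best = *std::min_element(costs.begin(), costs.end());
  float cutoff = best + beam;
  if (costs.size() > max_active) {
    // Partially sort so that costs[max_active] is the element that would
    // sit at that position in fully sorted order.
    std::nth_element(costs.begin(), costs.begin() + max_active, costs.end());
    cutoff = std::min(cutoff, costs[max_active]);
  }
  return cutoff;
}
```

The vector is taken by value because `std::nth_element` reorders its input; the decoder avoids the copy by reusing the `tmp_array_` member.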
KALDI_DISALLOW_COPY_AND_ASSIGN ( BiglmFasterDecoder  )
private
static StateId PairToLmState ( PairId  state_pair)
inline static private

Definition at line 190 of file biglm-faster-decoder.h.

Referenced by BiglmFasterDecoder::GetBestPath(), BiglmFasterDecoder::ProcessEmitting(), BiglmFasterDecoder::ProcessNonemitting(), and BiglmFasterDecoder::ReachedFinal().

190  {
191  return static_cast<StateId>(static_cast<uint32>(state_pair >> 32));
192  }
static StateId PairToState ( PairId  state_pair)
inline static private

Definition at line 187 of file biglm-faster-decoder.h.

Referenced by BiglmFasterDecoder::GetBestPath(), BiglmFasterDecoder::ProcessEmitting(), BiglmFasterDecoder::ProcessNonemitting(), and BiglmFasterDecoder::ReachedFinal().

187  {
188  return static_cast<StateId>(static_cast<uint32>(state_pair));
189  }
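ConstructPair, PairToState and PairToLmState together pack two 32-bit state ids into one 64-bit hash key and unpack them again. A standalone round-trip sketch, assuming `StateId` is a 32-bit signed integer; `PackPair`, `UnpackFstState` and `UnpackLmState` are hypothetical names mirroring the members above.

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t StateId;   // assumption: 32-bit signed state ids
typedef uint64_t PairId;

// Low 32 bits hold the HCLG state, high 32 bits hold the LM state.
inline PairId PackPair(StateId fst_state, StateId lm_state) {
  // A negative id would sign-extend and corrupt the high half.
  assert(fst_state >= 0 && lm_state >= 0);
  return static_cast<PairId>(fst_state) +
         (static_cast<PairId>(lm_state) << 32);
}

inline StateId UnpackFstState(PairId pair) {
  return static_cast<StateId>(static_cast<uint32_t>(pair));
}

inline StateId UnpackLmState(PairId pair) {
  return static_cast<StateId>(static_cast<uint32_t>(pair >> 32));
}
```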
void PossiblyResizeHash ( size_t  num_toks)
inline private

Definition at line 307 of file biglm-faster-decoder.h.

References FasterDecoderOptions::hash_ratio, BiglmFasterDecoder::opts_, and BiglmFasterDecoder::toks_.

Referenced by BiglmFasterDecoder::ProcessEmitting().

307  {
308  size_t new_sz = static_cast<size_t>(static_cast<BaseFloat>(num_toks)
309  * opts_.hash_ratio);
310  if (new_sz > toks_.Size()) {
311  toks_.SetSize(new_sz);
312  }
313  }
BaseFloat ProcessEmitting ( DecodableInterface *  decodable,
int  frame 
)
inline private

Definition at line 340 of file biglm-faster-decoder.h.

References BiglmFasterDecoder::Token::arc_, BiglmFasterDecoder::ConstructPair(), BiglmFasterDecoder::fst_, BiglmFasterDecoder::GetCutoff(), KALDI_ASSERT, HashList< I, T >::Elem::key, DecodableInterface::LogLikelihood(), BiglmFasterDecoder::PairToLmState(), BiglmFasterDecoder::PairToState(), BiglmFasterDecoder::PossiblyResizeHash(), BiglmFasterDecoder::PropagateLm(), BiglmFasterDecoder::Token::TokenDelete(), BiglmFasterDecoder::toks_, HashList< I, T >::Elem::val, and BiglmFasterDecoder::Token::weight_.

Referenced by BiglmFasterDecoder::Decode().

340  {
341  Elem *last_toks = toks_.Clear();
342  size_t tok_cnt;
343  BaseFloat adaptive_beam;
344  Elem *best_elem = NULL;
345  BaseFloat weight_cutoff = GetCutoff(last_toks, &tok_cnt,
346  &adaptive_beam, &best_elem);
347  PossiblyResizeHash(tok_cnt); // This makes sure the hash is always big enough.
348 
349  // This is the cutoff we use after adding in the log-likes (i.e.
350  // for the next frame). This is a bound on the cutoff we will use
351  // on the next frame.
352  BaseFloat next_weight_cutoff = 1.0e+10;
353 
354  // First process the best token to get a hopefully
355  // reasonably tight bound on the next cutoff.
356  if (best_elem) {
357  PairId state_pair = best_elem->key;
358  StateId state = PairToState(state_pair),
359  lm_state = PairToLmState(state_pair);
360  Token *tok = best_elem->val;
361  for (fst::ArcIterator<fst::Fst<Arc> > aiter(fst_, state);
362  !aiter.Done();
363  aiter.Next()) {
364  Arc arc = aiter.Value();
365  if (arc.ilabel != 0) { // we'd propagate..
366  PropagateLm(lm_state, &arc); // may affect "arc.weight".
367  // We don't need the return value (the new LM state).
368  BaseFloat ac_cost = - decodable->LogLikelihood(frame, arc.ilabel),
369  new_weight = arc.weight.Value() + tok->weight_.Value() + ac_cost;
370  if (new_weight + adaptive_beam < next_weight_cutoff)
371  next_weight_cutoff = new_weight + adaptive_beam;
372  }
373  }
374  }
375 
376  // the tokens are now owned here, in last_toks, and the hash is empty.
377  // 'owned' is a complex thing here; the point is we need to call toks_.Delete(e)
378  // on each elem 'e' to let toks_ know we're done with them.
379  for (Elem *e = last_toks, *e_tail; e != NULL; e = e_tail) { // loop this way
380  // because we delete "e" as we go.
381  PairId state_pair = e->key;
382  StateId state = PairToState(state_pair),
383  lm_state = PairToLmState(state_pair);
384  Token *tok = e->val;
385  if (tok->weight_.Value() < weight_cutoff) { // not pruned.
386  KALDI_ASSERT(state == tok->arc_.nextstate);
387  for (fst::ArcIterator<fst::Fst<Arc> > aiter(fst_, state);
388  !aiter.Done();
389  aiter.Next()) {
390  Arc arc = aiter.Value();
391  if (arc.ilabel != 0) { // propagate.
392  StateId next_lm_state = PropagateLm(lm_state, &arc);
393  Weight ac_weight(-decodable->LogLikelihood(frame, arc.ilabel));
394  BaseFloat new_weight = arc.weight.Value() + tok->weight_.Value()
395  + ac_weight.Value();
396  if (new_weight < next_weight_cutoff) { // not pruned..
397  PairId next_pair = ConstructPair(arc.nextstate, next_lm_state);
398  Token *new_tok = new Token(arc, ac_weight, tok);
399  Elem *e_found = toks_.Find(next_pair);
400  if (new_weight + adaptive_beam < next_weight_cutoff)
401  next_weight_cutoff = new_weight + adaptive_beam;
402  if (e_found == NULL) {
403  toks_.Insert(next_pair, new_tok);
404  } else {
405  if ( *(e_found->val) < *new_tok ) {
406  Token::TokenDelete(e_found->val);
407  e_found->val = new_tok;
408  } else {
409  Token::TokenDelete(new_tok);
410  }
411  }
412  }
413  }
414  }
415  }
416  e_tail = e->tail;
417  Token::TokenDelete(e->val);
418  toks_.Delete(e);
419  }
420  return next_weight_cutoff;
421  }
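The core of the inner loop is token recombination: a candidate is dropped if it is not below the cutoff, inserted if its (state, LM state) pair is new, and otherwise kept only if it is cheaper than the token already stored. A minimal sketch using a plain `std::unordered_map` in place of `HashList`, and a bare cost in place of `Token`; `ToyTokenMap` and `Recombine` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>

typedef std::unordered_map<uint64_t, float> ToyTokenMap;  // key -> best cost

// Returns true if the candidate token was kept.
inline bool Recombine(ToyTokenMap *toks, uint64_t key, float new_cost,
                      float cutoff) {
  assert(toks != nullptr);
  if (!(new_cost < cutoff)) return false;       // pruned by the cutoff
  ToyTokenMap::iterator it = toks->find(key);
  if (it == toks->end()) {                      // first token for this pair
    toks->insert(std::make_pair(key, new_cost));
    return true;
  }
  if (new_cost < it->second) {                  // better than the stored one
    it->second = new_cost;
    return true;
  }
  return false;                                 // stored token is at least as good
}
```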
void ProcessNonemitting ( BaseFloat  cutoff)
inline private

Definition at line 424 of file biglm-faster-decoder.h.

References BiglmFasterDecoder::ConstructPair(), BiglmFasterDecoder::fst_, KALDI_ASSERT, BiglmFasterDecoder::PairToLmState(), BiglmFasterDecoder::PairToState(), BiglmFasterDecoder::PropagateLm(), BiglmFasterDecoder::queue_, BiglmFasterDecoder::Token::TokenDelete(), BiglmFasterDecoder::toks_, HashList< I, T >::Elem::val, and BiglmFasterDecoder::Token::weight_.

Referenced by BiglmFasterDecoder::Decode().

424  {
425  // Processes nonemitting arcs for one frame.
426  KALDI_ASSERT(queue_.empty());
427  for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail)
428  queue_.push_back(e->key);
429  while (!queue_.empty()) {
430  PairId state_pair = queue_.back();
431  queue_.pop_back();
432  Token *tok = toks_.Find(state_pair)->val; // would segfault if state not
433  // in toks_ but this can't happen.
434  if (tok->weight_.Value() > cutoff) { // Don't bother processing successors.
435  continue;
436  }
437  KALDI_ASSERT(tok != NULL);
438  StateId state = PairToState(state_pair),
439  lm_state = PairToLmState(state_pair);
440  for (fst::ArcIterator<fst::Fst<Arc> > aiter(fst_, state);
441  !aiter.Done();
442  aiter.Next()) {
443  const Arc &arc_ref = aiter.Value();
444  if (arc_ref.ilabel == 0) { // propagate nonemitting only...
445  Arc arc(arc_ref);
446  StateId next_lm_state = PropagateLm(lm_state, &arc);
447  PairId next_pair = ConstructPair(arc.nextstate, next_lm_state);
448  Token *new_tok = new Token(arc, tok);
449  if (new_tok->weight_.Value() > cutoff) { // prune
450  Token::TokenDelete(new_tok);
451  } else {
452  Elem *e_found = toks_.Find(next_pair);
453  if (e_found == NULL) {
454  toks_.Insert(next_pair, new_tok);
455  queue_.push_back(next_pair);
456  } else {
457  if ( *(e_found->val) < *new_tok ) {
458  Token::TokenDelete(e_found->val);
459  e_found->val = new_tok;
460  queue_.push_back(next_pair);
461  } else {
462  Token::TokenDelete(new_tok);
463  }
464  }
465  }
466  }
467  }
468  }
469  }
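The queue-based propagation over epsilon-input arcs can be sketched on a toy graph. The sketch tracks only the best cost per state (no LM state, no traceback) and assumes there are no negative-cost epsilon cycles; `EpsArc` and `EpsilonClosure` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct EpsArc {
  int next_state;
  float cost;
};

// graph[s] lists the epsilon-input arcs leaving state s.
// best maps each active state to its best cost and is updated in place.
inline void EpsilonClosure(const std::vector<std::vector<EpsArc> > &graph,
                           std::map<int, float> *best, float cutoff) {
  assert(best != nullptr);
  std::vector<int> queue;
  for (std::map<int, float>::const_iterator it = best->begin();
       it != best->end(); ++it)
    queue.push_back(it->first);
  while (!queue.empty()) {
    int state = queue.back();
    queue.pop_back();
    assert(state >= 0 && static_cast<std::size_t>(state) < graph.size());
    float cost = (*best)[state];
    if (cost > cutoff) continue;  // do not expand pruned states
    for (std::size_t i = 0; i < graph[state].size(); i++) {
      const EpsArc &arc = graph[state][i];
      float new_cost = cost + arc.cost;
      if (new_cost > cutoff) continue;  // prune the new token
      std::map<int, float>::iterator found = best->find(arc.next_state);
      if (found == best->end() || new_cost < found->second) {
        (*best)[arc.next_state] = new_cost;
        queue.push_back(arc.next_state);  // revisit: its cost improved
      }
    }
  }
}
```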
StateId PropagateLm ( StateId  lm_state,
Arc *  arc 
)
inline private

Definition at line 315 of file biglm-faster-decoder.h.

References DeterministicOnDemandFst< Arc >::GetArc(), KALDI_WARN, BiglmFasterDecoder::lm_diff_fst_, fst::Times(), and BiglmFasterDecoder::warned_noarc_.

Referenced by BiglmFasterDecoder::ProcessEmitting(), and BiglmFasterDecoder::ProcessNonemitting().

316  { // returns new LM state.
317  if (arc->olabel == 0) {
318  return lm_state; // no change in LM state if no word crossed.
319  } else { // Propagate in the LM-diff FST.
320  Arc lm_arc;
321  bool ans = lm_diff_fst_->GetArc(lm_state, arc->olabel, &lm_arc);
322  if (!ans) { // this case is unexpected for statistical LMs.
323  if (!warned_noarc_) {
324  warned_noarc_ = true;
325  KALDI_WARN << "No arc available in LM (unlikely to be correct "
326  "if a statistical language model); will not warn again";
327  }
328  arc->weight = Weight::Zero();
329  return lm_state; // doesn't really matter what we return here; will
330  // be pruned.
331  } else {
332  arc->weight = Times(arc->weight, lm_arc.weight);
333  arc->olabel = lm_arc.olabel; // probably will be the same.
334  return lm_arc.nextstate; // return the new LM state.
335  }
336  }
337  }
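The three cases handled here (no word on the arc, word missing from the LM, word found) can be sketched with a toy LM stored in a map. Costs stand in for tropical weights, so `Times` becomes addition and `Weight::Zero()` becomes infinite cost; `ToyArc`, `ToyLmArc`, `ToyLm` and `ToyPropagateLm` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>
#include <utility>

struct ToyArc {
  int olabel;   // 0 means epsilon (no word)
  float cost;
};

struct ToyLmArc {
  int next_state;
  float cost;
};

// Deterministic LM: at most one arc per (lm_state, word).
typedef std::map<std::pair<int, int>, ToyLmArc> ToyLm;

// Updates arc->cost with the LM cost and returns the new LM state.
inline int ToyPropagateLm(const ToyLm &lm, int lm_state, ToyArc *arc) {
  assert(arc != nullptr);
  if (arc->olabel == 0) return lm_state;  // no word crossed: state unchanged
  ToyLm::const_iterator it = lm.find(std::make_pair(lm_state, arc->olabel));
  if (it == lm.end()) {
    // No matching arc: make the path impossible so that it gets pruned.
    arc->cost = std::numeric_limits<float>::infinity();
    return lm_state;
  }
  arc->cost += it->second.cost;  // Times() in the tropical semiring
  return it->second.next_state;
}
```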
bool ReachedFinal ( )
inline

Definition at line 101 of file biglm-faster-decoder.h.

References DeterministicOnDemandFst< Arc >::Final(), BiglmFasterDecoder::fst_, BiglmFasterDecoder::lm_diff_fst_, BiglmFasterDecoder::PairToLmState(), BiglmFasterDecoder::PairToState(), fst::Times(), and BiglmFasterDecoder::toks_.

Referenced by BiglmFasterDecoder::GetBestPath(), and main().

101  {
102  for (const Elem *e = toks_.GetList(); e != NULL; e = e->tail) {
103  PairId state_pair = e->key;
104  StateId state = PairToState(state_pair),
105  lm_state = PairToLmState(state_pair);
106  Weight this_weight =
107  Times(e->val->weight_,
108  Times(fst_.Final(state), lm_diff_fst_->Final(lm_state)));
109  if (this_weight != Weight::Zero())
110  return true;
111  }
112  return false;
113  }
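In cost terms, a token counts as final only when both its HCLG state and its LM state have a finite final cost, because the product of the three weights is `Weight::Zero()` (infinite cost) if any factor is. A small sketch of that test; `ToyActiveToken` and `AnyTokenFinal` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Flattened view of one active token and the final costs of its states.
struct ToyActiveToken {
  float cost;            // accumulated path cost (finite)
  float fst_final_cost;  // +infinity if the HCLG state is not final
  float lm_final_cost;   // +infinity if the LM state is not final
};

inline bool AnyTokenFinal(const std::vector<ToyActiveToken> &toks) {
  const float kInf = std::numeric_limits<float>::infinity();
  for (std::size_t i = 0; i < toks.size(); i++) {
    assert(toks[i].cost < kInf);
    // Adding costs corresponds to Times() on tropical weights.
    float total =
        toks[i].cost + toks[i].fst_final_cost + toks[i].lm_final_cost;
    if (total != kInf) return true;
  }
  return false;
}
```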
void SetOptions ( const BiglmFasterDecoderOptions &  opts)
inline

Definition at line 81 of file biglm-faster-decoder.h.

References BiglmFasterDecoder::opts_.

81 { opts_ = opts; }

Member Data Documentation

std::vector<PairId> queue_
private

Definition at line 479 of file biglm-faster-decoder.h.

Referenced by BiglmFasterDecoder::ProcessNonemitting().

std::vector<BaseFloat> tmp_array_
private

Definition at line 480 of file biglm-faster-decoder.h.

Referenced by BiglmFasterDecoder::GetCutoff().

bool warned_noarc_
private

Definition at line 478 of file biglm-faster-decoder.h.

Referenced by BiglmFasterDecoder::PropagateLm().


The documentation for this class was generated from the following file: