LatticeIncrementalDeterminizer Class Reference

This class is used inside LatticeIncrementalDecoderTpl; it handles some of the details of incremental determinization. More...

#include <lattice-incremental-decoder.h>

Collaboration diagram for LatticeIncrementalDeterminizer:

Public Types

enum  { kStateLabelOffset = (int)1e8, kTokenLabelOffset = (int)2e8, kMaxTokenLabel = (int)3e8 }
 
using Label = typename LatticeArc::Label
 

Public Member Functions

 LatticeIncrementalDeterminizer (const TransitionModel &trans_model, const LatticeIncrementalDecoderConfig &config)
 
void Init ()
 
const CompactLatticeGetDeterminizedLattice () const
 
void InitializeRawLatticeChunk (Lattice *olat, unordered_map< Label, LatticeArc::StateId > *token_label2state)
 Starts the process of creating a raw lattice chunk. More...
 
bool AcceptRawLatticeChunk (Lattice *raw_fst)
 This function accepts the raw FST (state-level lattice) corresponding to a single chunk of the lattice, determinizes it and appends it to this->clat_. More...
 
void SetFinalCosts (const unordered_map< Label, BaseFloat > *token_label2final_cost=NULL)
 
const CompactLatticeGetLattice ()
 

Private Member Functions

void GetRawLatticeFinalCosts (const Lattice &raw_fst, std::unordered_map< Label, BaseFloat > *old_final_costs)
 
void GetNonFinalRedetStates ()
 
bool ProcessArcsFromChunkStartState (const CompactLattice &chunk_clat, std::unordered_map< CompactLattice::StateId, CompactLattice::StateId > *state_map)
 [called from AcceptRawLatticeChunk()] Processes arcs that leave the start-state of `chunk_clat` (if this is not the first chunk); does nothing if this is the first chunk. More...
 
void TransferArcsToClat (const CompactLattice &chunk_clat, bool is_first_chunk, const std::unordered_map< CompactLattice::StateId, CompactLattice::StateId > &state_map, const std::unordered_map< CompactLattice::StateId, Label > &chunk_state_to_token, const std::unordered_map< Label, BaseFloat > &old_final_costs)
 This function, called from AcceptRawLatticeChunk(), transfers arcs from `chunk_clat` to clat_. More...
 
void AddArcToClat (CompactLattice::StateId state, const CompactLatticeArc &arc)
 Adds one arc to `clat_`. More...
 
CompactLattice::StateId AddStateToClat ()
 
void IdentifyTokenFinalStates (const CompactLattice &chunk_clat, std::unordered_map< CompactLattice::StateId, CompactLatticeArc::Label > *token_map) const
 
 KALDI_DISALLOW_COPY_AND_ASSIGN (LatticeIncrementalDeterminizer)
 

Private Attributes

const TransitionModeltrans_model_
 
const LatticeIncrementalDecoderConfigconfig_
 
std::unordered_set< CompactLattice::StateId > non_final_redet_states_
 
CompactLattice clat_
 
std::vector< std::vector< std::pair< CompactLattice::StateId, int32 > > > arcs_in_
 
std::vector< CompactLatticeArcfinal_arcs_
 
std::vector< BaseFloatforward_costs_
 
std::unordered_set< int32temp_
 

Detailed Description

This class is used inside LatticeIncrementalDecoderTpl; it handles some of the details of incremental determinization.

https://www.danielpovey.com/files/ *TBD*.pdf for the paper.

Definition at line 196 of file lattice-incremental-decoder.h.

Member Typedef Documentation

◆ Label

using Label = typename LatticeArc::Label

Definition at line 198 of file lattice-incremental-decoder.h.

Member Enumeration Documentation

◆ anonymous enum

Constructor & Destructor Documentation

◆ LatticeIncrementalDeterminizer()

LatticeIncrementalDeterminizer ( const TransitionModel trans_model,
const LatticeIncrementalDecoderConfig config 
)
inline

Definition at line 203 of file lattice-incremental-decoder.h.

205  :
206  trans_model_(trans_model), config_(config) { }
const LatticeIncrementalDecoderConfig & config_

Member Function Documentation

◆ AcceptRawLatticeChunk()

bool AcceptRawLatticeChunk ( Lattice raw_fst)

This function accepts the raw FST (state-level lattice) corresponding to a single chunk of the lattice, determinizes it and appends it to this->clat_.

Unless this was the

Note: final-probs in `raw_fst` are treated specially: they are used to guide the pruned determinization, but when you call GetLattice() it will be – except for pruning effects– as if all nonzero final-probs in `raw_fst` were: One() if final_costs == NULL; else the value present in `final_costs`.

Parameters
[in]raw_fst(Consumed destructively). The input raw (state-level) lattice. Would correspond to the FST A in the paper if first_frame == 0, and B otherwise.
Returns
returns false if determinization finished earlier than the beam or the determinized lattice was empty; true otherwise.

NOTE: if this is not the final chunk, you will probably want to call SetFinalCosts() directly after calling this.

Definition at line 1547 of file lattice-incremental-decoder.cc.

References fst::DeterminizeLatticePhonePrunedWrapper(), KALDI_ASSERT, KALDI_WARN, and kaldi::TopSortCompactLatticeIfNeeded().

1548  {
1551 
1552  // old_final_costs is a map from a `token-label` (see glossary) to the
1553  // associated final-prob in a final-state of `raw_fst`, that is associated
1554  // with that Token. These are Tokens that were active at the end of the
1555  // chunk. The final-probs may arise from beta (backward) costs, introduced
1556  // for pruning purposes, and/or from final-probs in HCLG. Those costs will
1557  // not be included in anything we store permamently in this class; they used
1558  // only to guide pruned determinization, and we will use `old_final_costs`
1559  // later to cancel them out.
1560  std::unordered_map<Label, BaseFloat> old_final_costs;
1561  GetRawLatticeFinalCosts(*raw_fst, &old_final_costs);
1562 
1563  CompactLattice chunk_clat;
1564  bool determinized_till_beam = DeterminizeLatticePhonePrunedWrapper(
1565  trans_model_, raw_fst, config_.lattice_beam, &chunk_clat,
1566  config_.det_opts);
1567 
1568  TopSortCompactLatticeIfNeeded(&chunk_clat);
1569 
1570  std::unordered_map<StateId, Label> chunk_state_to_token;
1571  IdentifyTokenFinalStates(chunk_clat,
1572  &chunk_state_to_token);
1573 
1574  StateId chunk_num_states = chunk_clat.NumStates();
1575  if (chunk_num_states == 0) {
1576  // This will be an error but user-level calling code can detect it from the
1577  // lattice being empty.
1578  KALDI_WARN << "Empty lattice, something went wrong.";
1579  clat_.DeleteStates();
1580  return false;
1581  }
1582 
1583  StateId start_state = chunk_clat.Start(); // would be 0.
1584  KALDI_ASSERT(start_state == 0);
1585 
1586  // Process arcs leaving the start state of chunk_clat. Unless this is the
1587  // first chunk in the lattice, all arcs leaving the start state of chunk_clat
1588  // will have `state labels` on them (identifying redeterminized-states in
1589  // clat_), and will transition to a state in `chunk_clat` that we can identify
1590  // with that redeterminized-state.
1591 
1592  // state_map maps from (non-initial, non-token-final state s in chunk_clat) to
1593  // a state in clat_.
1594  std::unordered_map<StateId, StateId> state_map;
1595 
1596 
1597  bool is_first_chunk = ProcessArcsFromChunkStartState(chunk_clat, &state_map);
1598 
1599  // Remove any existing arcs in clat_ that leave redeterminized-states, and
1600  // make those states non-final. Below, we'll add arcs leaving those states
1601  // (and possibly new final-probs.)
1602  for (StateId clat_state: non_final_redet_states_) {
1603  clat_.DeleteArcs(clat_state);
1604  clat_.SetFinal(clat_state, CompactLatticeWeight::Zero());
1605  }
1606 
1607  // The previous final-arc info is no longer relevant; we'll recreate it below.
1608  final_arcs_.clear();
1609 
1610  // assume chunk_lat.Start() == 0; we asserted it above. Allocate state-ids
1611  // for all remaining states in chunk_clat, except for token-final states.
1612  for (StateId state = (is_first_chunk ? 0 : 1);
1613  state < chunk_num_states; state++) {
1614  if (chunk_state_to_token.count(state) != 0)
1615  continue; // these `token-final` states don't get a state allocated.
1616 
1617  StateId new_clat_state = clat_.NumStates();
1618  if (state_map.insert({state, new_clat_state}).second) {
1619  // If it was inserted then we need to actually allocate that state
1620  StateId s = AddStateToClat();
1621  KALDI_ASSERT(s == new_clat_state);
1622  } // else do nothing; it would have been a redeterminized-state and no
1623  } // allocation is needed since they already exist in clat_. and
1624  // in state_map.
1625 
1626  if (is_first_chunk) {
1627  auto iter = state_map.find(start_state);
1628  KALDI_ASSERT(iter != state_map.end());
1629  CompactLattice::StateId clat_start_state = iter->second;
1630  KALDI_ASSERT(clat_start_state == 0); // topological order.
1631  clat_.SetStart(clat_start_state);
1632  forward_costs_[clat_start_state] = 0.0;
1633  }
1634 
1635  TransferArcsToClat(chunk_clat, is_first_chunk,
1636  state_map, chunk_state_to_token, old_final_costs);
1637 
1639 
1640  return determinized_till_beam;
1641 }
fst::StdArc::StateId StateId
std::unordered_set< CompactLattice::StateId > non_final_redet_states_
void GetRawLatticeFinalCosts(const Lattice &raw_fst, std::unordered_map< Label, BaseFloat > *old_final_costs)
fst::DeterminizeLatticePhonePrunedOptions det_opts
bool ProcessArcsFromChunkStartState(const CompactLattice &chunk_clat, std::unordered_map< CompactLattice::StateId, CompactLattice::StateId > *state_map)
[called from AcceptRawLatticeChunk()] Processes arcs that leave the start-state of `chunk_clat` (if t...
void TransferArcsToClat(const CompactLattice &chunk_clat, bool is_first_chunk, const std::unordered_map< CompactLattice::StateId, CompactLattice::StateId > &state_map, const std::unordered_map< CompactLattice::StateId, Label > &chunk_state_to_token, const std::unordered_map< Label, BaseFloat > &old_final_costs)
This function, called from AcceptRawLatticeChunk(), transfers arcs from `chunk_clat` to clat_...
const LatticeIncrementalDecoderConfig & config_
#define KALDI_WARN
Definition: kaldi-error.h:150
fst::StdArc::Label Label
fst::VectorFst< CompactLatticeArc > CompactLattice
Definition: kaldi-lattice.h:46
void IdentifyTokenFinalStates(const CompactLattice &chunk_clat, std::unordered_map< CompactLattice::StateId, CompactLatticeArc::Label > *token_map) const
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
static const CompactLatticeWeightTpl< WeightType, IntType > Zero()
void TopSortCompactLatticeIfNeeded(CompactLattice *clat)
Topologically sort the compact lattice if not already topologically sorted.
bool DeterminizeLatticePhonePrunedWrapper(const kaldi::TransitionModel &trans_model, MutableFst< kaldi::LatticeArc > *ifst, double beam, MutableFst< kaldi::CompactLatticeArc > *ofst, DeterminizeLatticePhonePrunedOptions opts)
This function is a wrapper of DeterminizeLatticePhonePruned() that works for Lattice type FSTs...
std::vector< CompactLatticeArc > final_arcs_

◆ AddArcToClat()

void AddArcToClat ( CompactLattice::StateId  state,
const CompactLatticeArc arc 
)
private

Adds one arc to `clat_`.

It's like clat_.AddArc(state, arc), except it also modifies arcs_in_ and forward_costs_.

Definition at line 1150 of file lattice-incremental-decoder.cc.

References fst::ConvertToCost().

1152  {
1153  BaseFloat forward_cost = forward_costs_[state] +
1154  ConvertToCost(arc.weight);
1155  if (forward_cost == std::numeric_limits<BaseFloat>::infinity())
1156  return;
1157  int32 arc_idx = clat_.NumArcs(state);
1158  clat_.AddArc(state, arc);
1159  arcs_in_[arc.nextstate].push_back({state, arc_idx});
1160  if (forward_cost < forward_costs_[arc.nextstate])
1161  forward_costs_[arc.nextstate] = forward_cost;
1162 }
kaldi::int32 int32
float BaseFloat
Definition: kaldi-types.h:29
double ConvertToCost(const LatticeWeightTpl< Float > &w)
std::vector< std::vector< std::pair< CompactLattice::StateId, int32 > > > arcs_in_

◆ AddStateToClat()

CompactLattice::StateId AddStateToClat ( )
private

Definition at line 1142 of file lattice-incremental-decoder.cc.

References KALDI_ASSERT.

1142  {
1143  CompactLattice::StateId ans = clat_.AddState();
1144  forward_costs_.push_back(std::numeric_limits<BaseFloat>::infinity());
1145  KALDI_ASSERT(forward_costs_.size() == ans + 1);
1146  arcs_in_.resize(ans + 1);
1147  return ans;
1148 }
fst::StdArc::StateId StateId
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
std::vector< std::vector< std::pair< CompactLattice::StateId, int32 > > > arcs_in_

◆ GetDeterminizedLattice()

const CompactLattice& GetDeterminizedLattice ( ) const
inline

Definition at line 212 of file lattice-incremental-decoder.h.

◆ GetLattice()

const CompactLattice& GetLattice ( )
inline

Definition at line 284 of file lattice-incremental-decoder.h.

◆ GetNonFinalRedetStates()

void GetNonFinalRedetStates ( )
private

Definition at line 1190 of file lattice-incremental-decoder.cc.

1190  {
1192  non_final_redet_states_.clear();
1193  non_final_redet_states_.reserve(final_arcs_.size());
1194 
1195  std::vector<StateId> state_queue;
1196  for (const CompactLatticeArc &arc: final_arcs_) {
1197  // Note: we abuse the .nextstate field to store the state which is really
1198  // the source of that arc.
1199  StateId redet_state = arc.nextstate;
1200  if (forward_costs_[redet_state] != std::numeric_limits<BaseFloat>::infinity()) {
1201  // if it is accessible..
1202  if (non_final_redet_states_.insert(redet_state).second) {
1203  // it was not already there
1204  state_queue.push_back(redet_state);
1205  }
1206  }
1207  }
1208  // Add any states that are reachable from the states above.
1209  while (!state_queue.empty()) {
1210  StateId s = state_queue.back();
1211  state_queue.pop_back();
1212  for (fst::ArcIterator<CompactLattice> aiter(clat_, s); !aiter.Done();
1213  aiter.Next()) {
1214  const CompactLatticeArc &arc = aiter.Value();
1215  StateId nextstate = arc.nextstate;
1216  if (non_final_redet_states_.insert(nextstate).second)
1217  state_queue.push_back(nextstate); // it was not already there
1218  }
1219  }
1220 }
fst::StdArc::StateId StateId
std::unordered_set< CompactLattice::StateId > non_final_redet_states_
fst::ArcTpl< CompactLatticeWeight > CompactLatticeArc
Definition: kaldi-lattice.h:42
std::vector< CompactLatticeArc > final_arcs_

◆ GetRawLatticeFinalCosts()

void GetRawLatticeFinalCosts ( const Lattice raw_fst,
std::unordered_map< Label, BaseFloat > *  old_final_costs 
)
private

Definition at line 1329 of file lattice-incremental-decoder.cc.

References KALDI_ERR, LatticeWeightTpl< FloatType >::Value1(), and LatticeWeightTpl< FloatType >::Value2().

1331  {
1332  LatticeArc::StateId raw_fst_num_states = raw_fst.NumStates();
1333  for (LatticeArc::StateId s = 0; s < raw_fst_num_states; s++) {
1334  for (fst::ArcIterator<Lattice> aiter(raw_fst, s); !aiter.Done();
1335  aiter.Next()) {
1336  const LatticeArc &value = aiter.Value();
1337  if (value.olabel >= (Label)kTokenLabelOffset &&
1338  value.olabel < (Label)kMaxTokenLabel) {
1339  LatticeWeight final_weight = raw_fst.Final(value.nextstate);
1340  if (final_weight != LatticeWeight::Zero() &&
1341  final_weight.Value2() != 0) {
1342  KALDI_ERR << "Label " << value.olabel << " from state " << s
1343  << " looks like a token-label but its next-state "
1344  << value.nextstate <<
1345  " has unexpected final-weight " << final_weight.Value1() << ','
1346  << final_weight.Value2();
1347  }
1348  auto r = old_final_costs->insert({value.olabel,
1349  final_weight.Value1()});
1350  if (!r.second && r.first->second != final_weight.Value1()) {
1351  // For any given token-label, all arcs in raw_fst with that
1352  // olabel should go to the same state, so this should be
1353  // impossible.
1354  KALDI_ERR << "Unexpected mismatch in final-costs for tokens, "
1355  << r.first->second << " vs " << final_weight.Value1();
1356  }
1357  }
1358  }
1359  }
1360 }
fst::StdArc::StateId StateId
fst::ArcTpl< LatticeWeight > LatticeArc
Definition: kaldi-lattice.h:40
static const LatticeWeightTpl Zero()
#define KALDI_ERR
Definition: kaldi-error.h:147
fst::StdArc::Label Label

◆ IdentifyTokenFinalStates()

void IdentifyTokenFinalStates ( const CompactLattice chunk_clat,
std::unordered_map< CompactLattice::StateId, CompactLatticeArc::Label > *  token_map 
) const
private

Definition at line 1165 of file lattice-incremental-decoder.cc.

References KALDI_ASSERT.

1167  {
1168  token_map->clear();
1171 
1172  StateId num_states = chunk_clat.NumStates();
1173  for (StateId state = 0; state < num_states; state++) {
1174  for (fst::ArcIterator<CompactLattice> aiter(chunk_clat, state);
1175  !aiter.Done(); aiter.Next()) {
1176  const CompactLatticeArc &arc = aiter.Value();
1177  if (arc.olabel >= kTokenLabelOffset && arc.olabel < kMaxTokenLabel) {
1178  StateId nextstate = arc.nextstate;
1179  auto r = token_map->insert({nextstate, arc.olabel});
1180  // Check consistency of labels on incoming arcs
1181  KALDI_ASSERT(r.first->second == arc.olabel);
1182  }
1183  }
1184  }
1185 }
fst::StdArc::StateId StateId
fst::StdArc::Label Label
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
fst::ArcTpl< CompactLatticeWeight > CompactLatticeArc
Definition: kaldi-lattice.h:42

◆ Init()

void Init ( )

Definition at line 1134 of file lattice-incremental-decoder.cc.

1134  {
1135  non_final_redet_states_.clear();
1136  clat_.DeleteStates();
1137  final_arcs_.clear();
1138  forward_costs_.clear();
1139  arcs_in_.clear();
1140 }
std::unordered_set< CompactLattice::StateId > non_final_redet_states_
std::vector< std::vector< std::pair< CompactLattice::StateId, int32 > > > arcs_in_
std::vector< CompactLatticeArc > final_arcs_

◆ InitializeRawLatticeChunk()

void InitializeRawLatticeChunk ( Lattice olat,
unordered_map< Label, LatticeArc::StateId > *  token_label2state 
)

Starts the process of creating a raw lattice chunk.

(Search the glossary for "raw lattice chunk"). This just sets up the initial states and redeterminized-states in the chunk. Relates to sec. 5.2 in the paper, specifically the initial-state i and the redeterminized-states.

After calling this, the caller would add the remaining arcs and states to `olat` and then call AcceptRawLatticeChunk() with the result.

Parameters
[out]olatThe lattice to be (partially) created
[out]token_label2stateThis function outputs to here a map from `token-label` to the state we created for it in *olat. See glossary for `token-label`. The keys actually correspond to the .nextstate fields in the arcs in final_arcs_; values are states in `olat`. See the last bullet point before Sec. 5.3 in the paper.

Definition at line 1223 of file lattice-incremental-decoder.cc.

References kaldi::AddCompactLatticeArcToLattice(), and KALDI_ASSERT.

1225  {
1226  using namespace fst;
1227 
1228  olat->DeleteStates();
1229  LatticeArc::StateId start_state = olat->AddState();
1230  olat->SetStart(start_state);
1231  token_label2state->clear();
1232 
1233  // redet_state_map maps from state-ids in clat_ to state-ids in olat. This
1234  // will be the set of states from which the arcs to final-states in the
1235  // canonical appended lattice leave (physically, these are in the .nextstate
1236  // elements of arcs_, since we use that field for the source state), plus any
1237  // states reachable from those states.
1238  unordered_map<CompactLattice::StateId, LatticeArc::StateId> redet_state_map;
1239 
1241  redet_state_map[redet_state] = olat->AddState();
1242 
1243  // First, process any arcs leaving the non-final redeterminized states that
1244  // are not to final-states. (What we mean by "not to final states" is, not to
1245  // stats that are final in the `canonical appended lattice`.. they may
1246  // actually be physically final in clat_, because we make clat_ what we want
1247  // to return to the user.
1248  for (CompactLattice::StateId redet_state: non_final_redet_states_) {
1249  LatticeArc::StateId lat_state = redet_state_map[redet_state];
1250 
1251  for (ArcIterator<CompactLattice> aiter(clat_, redet_state);
1252  !aiter.Done(); aiter.Next()) {
1253  const CompactLatticeArc &arc = aiter.Value();
1254  CompactLattice::StateId nextstate = arc.nextstate;
1255  LatticeArc::StateId lat_nextstate = olat->NumStates();
1256  auto r = redet_state_map.insert({nextstate, lat_nextstate});
1257  if (r.second) { // Was inserted.
1258  LatticeArc::StateId s = olat->AddState();
1259  KALDI_ASSERT(s == lat_nextstate);
1260  } else {
1261  // was not inserted -> was already there.
1262  lat_nextstate = r.first->second;
1263  }
1264  CompactLatticeArc clat_arc(arc);
1265  clat_arc.nextstate = lat_nextstate;
1266  AddCompactLatticeArcToLattice(clat_arc, lat_state, olat);
1267  }
1268  clat_.DeleteArcs(redet_state);
1269  clat_.SetFinal(redet_state, CompactLatticeWeight::Zero());
1270  }
1271 
1272  for (const CompactLatticeArc &arc: final_arcs_) {
1273  // We abuse the `nextstate` field to store the source state.
1274  CompactLattice::StateId src_state = arc.nextstate;
1275  auto iter = redet_state_map.find(src_state);
1276  if (forward_costs_[src_state] == std::numeric_limits<BaseFloat>::infinity())
1277  continue; /* Unreachable state */
1278  KALDI_ASSERT(iter != redet_state_map.end());
1279  LatticeArc::StateId src_lat_state = iter->second;
1280  Label token_label = arc.ilabel; // will be == arc.olabel.
1281  KALDI_ASSERT(token_label >= kTokenLabelOffset &&
1282  token_label < kMaxTokenLabel);
1283  auto r = token_label2state->insert({token_label,
1284  olat->NumStates()});
1285  LatticeArc::StateId dest_lat_state = r.first->second;
1286  if (r.second) { // was inserted
1287  LatticeArc::StateId new_state = olat->AddState();
1288  KALDI_ASSERT(new_state == dest_lat_state);
1289  }
1290  CompactLatticeArc new_arc;
1291  new_arc.nextstate = dest_lat_state;
1292  /* We convert the token-label to epsilon; it's not needed anymore. */
1293  new_arc.ilabel = new_arc.olabel = 0;
1294  new_arc.weight = arc.weight;
1295  AddCompactLatticeArcToLattice(new_arc, src_lat_state, olat);
1296  }
1297 
1298  // Now deal with the initial-probs. Arcs from initial-states to
1299  // redeterminized-states in the raw lattice have an olabel that identifies the
1300  // id of that redeterminized-state in clat_, and a cost that is derived from
1301  // its entry in forward_costs_. These forward-probs are used to get the
1302  // pruned lattice determinization to behave correctly, and will be canceled
1303  // out later on.
1304  //
1305  // In the paper this is the second-from-last bullet in Sec. 5.2. NOTE: in the
1306  // paper we state that we only include such arcs for "each redeterminized
1307  // state that is either initial in det(A) or that has an arc entering it from
1308  // a state that is not a redeterminized state." In fact, we include these
1309  // arcs for all redeterminized states. I realized that it won't make a
1310  // difference to the outcome, and it's easier to do it this way.
1311  for (CompactLattice::StateId state_id: non_final_redet_states_) {
1312  BaseFloat forward_cost = forward_costs_[state_id];
1313  LatticeArc arc;
1314  arc.ilabel = 0;
1315  // The olabel (which appears where the word-id would) is what
1316  // we call a 'state-label'. It identifies a state in clat_.
1317  arc.olabel = state_id + kStateLabelOffset;
1318  // It doesn't matter what field we put forward_cost in (or whether we
1319  // divide it among them both; the effect on pruning is the same, and
1320  // we will cancel it out later anyway.
1321  arc.weight = LatticeWeight(forward_cost, 0);
1322  auto iter = redet_state_map.find(state_id);
1323  KALDI_ASSERT(iter != redet_state_map.end());
1324  arc.nextstate = iter->second;
1325  olat->AddArc(start_state, arc);
1326  }
1327 }
fst::StdArc::StateId StateId
fst::ArcTpl< LatticeWeight > LatticeArc
Definition: kaldi-lattice.h:40
std::unordered_set< CompactLattice::StateId > non_final_redet_states_
For an extended explanation of the framework of which grammar-fsts are a part, please see Support for...
Definition: graph.dox:21
fst::LatticeWeightTpl< BaseFloat > LatticeWeight
Definition: kaldi-lattice.h:32
float BaseFloat
Definition: kaldi-types.h:29
fst::StdArc::Label Label
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
static const CompactLatticeWeightTpl< WeightType, IntType > Zero()
fst::ArcTpl< CompactLatticeWeight > CompactLatticeArc
Definition: kaldi-lattice.h:42
static void AddCompactLatticeArcToLattice(const CompactLatticeArc &clat_arc, LatticeArc::StateId src_state, Lattice *lat)
std::vector< CompactLatticeArc > final_arcs_

◆ KALDI_DISALLOW_COPY_AND_ASSIGN()

KALDI_DISALLOW_COPY_AND_ASSIGN ( LatticeIncrementalDeterminizer  )
private

◆ ProcessArcsFromChunkStartState()

bool ProcessArcsFromChunkStartState ( const CompactLattice chunk_clat,
std::unordered_map< CompactLattice::StateId, CompactLattice::StateId > *  state_map 
)
private

[called from AcceptRawLatticeChunk()] Processes arcs that leave the start-state of `chunk_clat` (if this is not the first chunk); does nothing if this is the first chunk.

This includes using the `state-labels` to work out which states in clat_ these states correspond to, and writing that mapping to `state_map`.

Also modifies forward_costs_, because it has to do a kind of reweighting of the clat states that are the values it puts in `state_map`, to take account of the probabilities on the arcs from the start state of chunk_clat to the states corresponding to those redeterminized-states (i.e. the states in clat corresponding to the values it puts in `*state_map`). It also modifies arcs_in_, mostly because there are rare cases when we end up `merging` sets of those redeterminized-states, because the determinization process mapped them to a single state, and that means we need to reroute the arcs into members of that set into one single member (which will appear as a value in `*state_map`).

Parameters
[in]chunk_clatThe determinized chunk of lattice we are processing
[out]state_mapMapping from states in chunk_clat to the state in clat_ they correspond to.
Returns
Returns true if this is the first chunk.

Definition at line 1363 of file lattice-incremental-decoder.cc.

References fst::ConvertToCost(), KALDI_ASSERT, CompactLatticeWeightTpl< WeightType, IntType >::SetWeight(), fst::Times(), and CompactLatticeWeightTpl< WeightType, IntType >::Weight().

1365  {
1367  StateId clat_num_states = clat_.NumStates();
1368 
1369  // Process arcs leaving the start state of chunk_clat. These arcs will have
1370  // state-labels on them (unless this is the first chunk).
1371  // For destination-states of those arcs, work out which states in
1372  // clat_ they correspond to and update their forward_costs.
1373  for (fst::ArcIterator<CompactLattice> aiter(chunk_clat, chunk_clat.Start());
1374  !aiter.Done(); aiter.Next()) {
1375  const CompactLatticeArc &arc = aiter.Value();
1376  Label label = arc.ilabel; // ilabel == olabel; would be the olabel
1377  // in a Lattice.
1378  if (!(label >= kStateLabelOffset &&
1379  label - kStateLabelOffset < clat_num_states)) {
1380  // The label was not a state-label. This should only be possible on the
1381  // first chunk.
1382  KALDI_ASSERT(state_map->empty());
1383  return true; // this is the first chunk.
1384  }
1385  StateId clat_state = label - kStateLabelOffset;
1386  StateId chunk_state = arc.nextstate;
1387  auto p = state_map->insert({chunk_state, clat_state});
1388  StateId dest_clat_state = p.first->second;
1389  // We deleted all its arcs in InitializeRawLatticeChunk
1390  KALDI_ASSERT(clat_.NumArcs(clat_state) == 0);
1391  /*
1392  In almost all cases, dest_clat_state and clat_state will be the same state;
1393  but there may be situations where two arcs with different state-labels
1394  left the start state and entered the same next-state in chunk_clat; and in
1395  these cases, they will be different.
1396 
1397  We didn't address this issue in the paper (or actually realize it could be
1398  a problem). What we do is pick one of the clat_states as the "canonical"
1399  one, and redirect all incoming transitions of the others to enter the
1400  "canonical" one. (Search below for new_in_arc.nextstate =
1401  dest_clat_state).
1402  */
1403  if (clat_state != dest_clat_state) {
1404  // Check that the start state isn't getting merged with any other state.
1405  // If this were possible, we'd need to deal with it specially, but it
1406  // can't be, because to be merged, 2 states must have identical arcs
1407  // leaving them with identical weights, so we'd need to have another state
1408  // on frame 0 identical to the start state, which is not possible if the
1409  // lattice is deterministic and epsilon-free.
1410  KALDI_ASSERT(clat_state != 0 && dest_clat_state != 0);
1411  }
1412 
1413  // in_weight is an extra weight that we'll include on arcs entering this
1414  // state from the previous chunk. We need to cancel out
1415  // `forward_costs[clat_state]`, which was included in the corresponding arc
1416  // in the raw lattice for pruning purposes; and we need to include the
1417  // weight on the arc from the start-state of `chunk_clat` to this state.
1418  CompactLatticeWeight extra_weight_in = arc.weight;
1419  extra_weight_in.SetWeight(
1420  fst::Times(extra_weight_in.Weight(),
1421  LatticeWeight(-forward_costs_[clat_state], 0.0)));
1422 
1423  // We don't allow state 0 to be a redeterminized-state; calling code assures
1424  // this. Search for `determinizer_.GetLattice().Final(0) !=
1425  // CompactLatticeWeight::Zero())` to find that calling code.
1426  KALDI_ASSERT(clat_state != 0);
1427 
1428  // Note: 0 is the start state of clat_. This was checked.
1429  forward_costs_[clat_state] = (clat_state == 0 ? 0 :
1430  std::numeric_limits<BaseFloat>::infinity());
1431  std::vector<std::pair<StateId, int32> > arcs_in;
1432  arcs_in.swap(arcs_in_[clat_state]);
1433  for (auto p: arcs_in) {
1434  // Note: we'll be doing `continue` below if this input arc came from
1435  // another redeterminized-state, because we did DeleteArcs() for them in
1436  // InitializeRawLatticeChunk(). Those arcs will be transferred
1437  // from chunk_clat later on.
1438  CompactLattice::StateId src_state = p.first;
1439  int32 arc_pos = p.second;
1440 
1441  if (arc_pos >= (int32)clat_.NumArcs(src_state))
1442  continue;
1443  fst::MutableArcIterator<CompactLattice> aiter(&clat_, src_state);
1444  aiter.Seek(arc_pos);
1445  if (aiter.Value().nextstate != clat_state)
1446  continue; // This arc record has become invalidated.
1447  CompactLatticeArc new_in_arc(aiter.Value());
1448  // In most cases we will have dest_clat_state == clat_state, so the next
1449  // line won't change the value of .nextstate
1450  new_in_arc.nextstate = dest_clat_state;
1451  new_in_arc.weight = fst::Times(new_in_arc.weight, extra_weight_in);
1452  aiter.SetValue(new_in_arc);
1453 
1454  BaseFloat new_forward_cost = forward_costs_[src_state] +
1455  ConvertToCost(new_in_arc.weight);
1456  if (new_forward_cost < forward_costs_[dest_clat_state])
1457  forward_costs_[dest_clat_state] = new_forward_cost;
1458  arcs_in_[dest_clat_state].push_back(p);
1459  }
1460  }
1461  return false; // this is not the first chunk.
1462 }
fst::StdArc::StateId StateId
kaldi::int32 int32
fst::LatticeWeightTpl< BaseFloat > LatticeWeight
Definition: kaldi-lattice.h:32
LatticeWeightTpl< FloatType > Times(const LatticeWeightTpl< FloatType > &w1, const LatticeWeightTpl< FloatType > &w2)
float BaseFloat
Definition: kaldi-types.h:29
double ConvertToCost(const LatticeWeightTpl< Float > &w)
fst::StdArc::Label Label
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
fst::ArcTpl< CompactLatticeWeight > CompactLatticeArc
Definition: kaldi-lattice.h:42
std::vector< std::vector< std::pair< CompactLattice::StateId, int32 > > > arcs_in_

◆ SetFinalCosts()

void SetFinalCosts ( const unordered_map< Label, BaseFloat > *  token_label2final_cost = NULL)

Definition at line 1645 of file lattice-incremental-decoder.cc.

References KALDI_WARN, fst::Plus(), and fst::Times().

1646  {
1647  if (final_arcs_.empty()) {
1648  KALDI_WARN << "SetFinalCosts() called when final_arcs_.empty()... possibly "
1649  "means you are calling this after Finalize()? Not allowed: could "
1650  "indicate a code error. Or possibly decoding failed somehow.";
1651  }
1652 
1653  /*
1654  prefinal states a terminology that does not appear in the paper. What it
1655  means is: the set of states that have an arc with a Token-label as the label
1656  leaving them in the canonical appended lattice.
1657  */
1658  std::unordered_set<int32> &prefinal_states(temp_);
1659  prefinal_states.clear();
1660  for (const auto &arc: final_arcs_) {
1661  /* Caution: `state` is actually the state the arc would
1662  leave from in the canonical appended lattice; we just store
1663  that in the .nextstate field. */
1664  CompactLattice::StateId state = arc.nextstate;
1665  prefinal_states.insert(state);
1666  }
1667 
1668  for (int32 state: prefinal_states)
1669  clat_.SetFinal(state, CompactLatticeWeight::Zero());
1670 
1671 
1672  for (const CompactLatticeArc &arc: final_arcs_) {
1673  Label token_label = arc.ilabel;
1674  /* Note: we store the source state in the .nextstate field. */
1675  CompactLattice::StateId src_state = arc.nextstate;
1676  BaseFloat graph_final_cost;
1677  if (token_label2final_cost == NULL) {
1678  graph_final_cost = 0.0;
1679  } else {
1680  auto iter = token_label2final_cost->find(token_label);
1681  if (iter == token_label2final_cost->end())
1682  continue;
1683  else
1684  graph_final_cost = iter->second;
1685  }
1686  /* It might seem odd to set a final-prob on the src-state of the arc..
1687  the point is that the symbol on the arc is a token-label, which should not
1688  appear in the lattice the user sees, so after that token-label is removed
1689  the arc would just become a final-prob.
1690  */
1691  clat_.SetFinal(src_state,
1692  fst::Plus(clat_.Final(src_state),
1693  fst::Times(arc.weight,
1695  LatticeWeight(graph_final_cost, 0), {}))));
1696  }
1697 }
fst::StdArc::StateId StateId
LatticeWeightTpl< FloatType > Plus(const LatticeWeightTpl< FloatType > &w1, const LatticeWeightTpl< FloatType > &w2)
kaldi::int32 int32
fst::LatticeWeightTpl< BaseFloat > LatticeWeight
Definition: kaldi-lattice.h:32
LatticeWeightTpl< FloatType > Times(const LatticeWeightTpl< FloatType > &w1, const LatticeWeightTpl< FloatType > &w2)
float BaseFloat
Definition: kaldi-types.h:29
#define KALDI_WARN
Definition: kaldi-error.h:150
fst::StdArc::Label Label
static const CompactLatticeWeightTpl< WeightType, IntType > Zero()
fst::ArcTpl< CompactLatticeWeight > CompactLatticeArc
Definition: kaldi-lattice.h:42
std::vector< CompactLatticeArc > final_arcs_

◆ TransferArcsToClat()

void TransferArcsToClat ( const CompactLattice chunk_clat,
bool  is_first_chunk,
const std::unordered_map< CompactLattice::StateId, CompactLattice::StateId > &  state_map,
const std::unordered_map< CompactLattice::StateId, Label > &  chunk_state_to_token,
const std::unordered_map< Label, BaseFloat > &  old_final_costs 
)
private

This function, called from AcceptRawLatticeChunk(), transfers arcs from `chunk_clat` to clat_.

For those arcs that have `token-labels` on them, they don't get written to clat_ but instead are stored in the arcs_ array.

Parameters
[in]chunk_clatThe determinized lattice for the chunk we are processing; this is the source of the arcs we are moving.
[in]is_first_chunkTrue if this is the first chunk in the utterance; it's needed because if it is, we will also transfer arcs from the start state of chunk_clat.
[in]state_mapMap from state-ids in chunk_clat to state-ids in clat_.
[in]chunk_state_to_tokenMap from `token-final states` (see glossary) in chunk_clat, to the token-label on arcs entering those states.
[in]old_final_costsMap from token-label to the final-costs that were on the corresponding token-final states in the undeterminized lattice; these final-costs need to be removed when we record the weights in final_arcs_, because they were just temporary.

Definition at line 1464 of file lattice-incremental-decoder.cc.

References KALDI_ASSERT, and fst::Times().

1469  {
1471  StateId chunk_num_states = chunk_clat.NumStates();
1472 
1473  // Now transfer arcs from chunk_clat to clat_.
1474  for (StateId chunk_state = (is_first_chunk ? 0 : 1);
1475  chunk_state < chunk_num_states; chunk_state++) {
1476  auto iter = state_map.find(chunk_state);
1477  if (iter == state_map.end()) {
1478  KALDI_ASSERT(chunk_state_to_token.count(chunk_state) != 0);
1479  // Don't process token-final states. Anyway they have no arcs leaving
1480  // them.
1481  continue;
1482  }
1483  StateId clat_state = iter->second;
1484 
1485  // We know that this point that `clat_state` is not a token-final state
1486  // (see glossary for definition) as if it were, we would have done
1487  // `continue` above.
1488  //
1489  // Only in the last chunk of the lattice would be there be a final-prob on
1490  // states that are not `token-final states`; these final-probs would
1491  // normally all be Zero() at this point. So in almost all cases the following
1492  // call will do nothing.
1493  clat_.SetFinal(clat_state, chunk_clat.Final(chunk_state));
1494 
1495  // Process arcs leaving this state.
1496  for (fst::ArcIterator<CompactLattice> aiter(chunk_clat, chunk_state);
1497  !aiter.Done(); aiter.Next()) {
1498  CompactLatticeArc arc(aiter.Value());
1499 
1500  auto next_iter = state_map.find(arc.nextstate);
1501  if (next_iter != state_map.end()) {
1502  // The normal case (when the .nextstate has a corresponding
1503  // state in clat_) is very simple. Just copy the arc over.
1504  arc.nextstate = next_iter->second;
1505  KALDI_ASSERT(arc.ilabel < kTokenLabelOffset ||
1506  arc.ilabel > kMaxTokenLabel);
1507  AddArcToClat(clat_state, arc);
1508  } else {
1509  // This is the case when the arc is to a `token-final` state (see
1510  // glossary.)
1511 
1512  // TODO: remove the following slightly excessive assertion?
1513  KALDI_ASSERT(chunk_clat.Final(arc.nextstate) != CompactLatticeWeight::Zero() &&
1514  arc.olabel >= (Label)kTokenLabelOffset &&
1515  arc.olabel < (Label)kMaxTokenLabel &&
1516  chunk_state_to_token.count(arc.nextstate) != 0 &&
1517  old_final_costs.count(arc.olabel) != 0);
1518 
1519  // Include the final-cost of the next state (which should be final)
1520  // in arc.weight.
1521  arc.weight = fst::Times(arc.weight,
1522  chunk_clat.Final(arc.nextstate));
1523 
1524  auto cost_iter = old_final_costs.find(arc.olabel);
1525  KALDI_ASSERT(cost_iter != old_final_costs.end());
1526  BaseFloat old_final_cost = cost_iter->second;
1527 
1528  // `arc` is going to become an element of final_arcs_. These
1529  // contain information about transitions from states in clat_ to
1530  // `token-final` states (i.e. states that have a token-label on the arc
1531  // to them and that are final in the canonical compact lattice).
1532  // We subtract the old_final_cost as it was just a temporary cost
1533  // introduced for pruning purposes.
1534  arc.weight.SetWeight(fst::Times(arc.weight.Weight(),
1535  LatticeWeight{-old_final_cost, 0.0}));
1536  // In a slight abuse of the Arc data structure, the nextstate is set to
1537  // the source state. The label (ilabel == olabel) indicates the
1538  // token it is associated with.
1539  arc.nextstate = clat_state;
1540  final_arcs_.push_back(arc);
1541  }
1542  }
1543  }
1544 
1545 }
fst::StdArc::StateId StateId
LatticeWeightTpl< FloatType > Times(const LatticeWeightTpl< FloatType > &w1, const LatticeWeightTpl< FloatType > &w2)
float BaseFloat
Definition: kaldi-types.h:29
fst::StdArc::Label Label
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
static const CompactLatticeWeightTpl< WeightType, IntType > Zero()
fst::ArcTpl< CompactLatticeWeight > CompactLatticeArc
Definition: kaldi-lattice.h:42
void AddArcToClat(CompactLattice::StateId state, const CompactLatticeArc &arc)
Adds one arc to `clat_`.
std::vector< CompactLatticeArc > final_arcs_

Member Data Documentation

◆ arcs_in_

std::vector<std::vector<std::pair<CompactLattice::StateId, int32> > > arcs_in_
private

Definition at line 420 of file lattice-incremental-decoder.h.

◆ clat_

CompactLattice clat_
private

Definition at line 411 of file lattice-incremental-decoder.h.

◆ config_

const LatticeIncrementalDecoderConfig& config_
private

Definition at line 395 of file lattice-incremental-decoder.h.

◆ final_arcs_

std::vector<CompactLatticeArc> final_arcs_
private

Definition at line 429 of file lattice-incremental-decoder.h.

◆ forward_costs_

std::vector<BaseFloat> forward_costs_
private

Definition at line 435 of file lattice-incremental-decoder.h.

◆ non_final_redet_states_

std::unordered_set<CompactLattice::StateId> non_final_redet_states_
private

Definition at line 403 of file lattice-incremental-decoder.h.

◆ temp_

std::unordered_set<int32> temp_
private

Definition at line 438 of file lattice-incremental-decoder.h.

◆ trans_model_

const TransitionModel& trans_model_
private

Definition at line 392 of file lattice-incremental-decoder.h.


The documentation for this class was generated from the following files: