InverseContextFst Class Reference

#include <context-fst.h>

Public Types

typedef StdArc Arc
 
typedef StdArc::StateId StateId
 
typedef StdArc::Weight Weight
 
typedef StdArc::Label Label
 
- Public Types inherited from DeterministicOnDemandFst< StdArc >
typedef StdArc ::StateId StateId
 
typedef StdArc ::Weight Weight
 
typedef StdArc ::Label Label
 

Public Member Functions

 InverseContextFst (Label subsequential_symbol, const std::vector< int32 > &phones, const std::vector< int32 > &disambig_syms, int32 context_width, int32 central_position)
 Constructor.
 
virtual StateId Start ()
 
virtual Weight Final (StateId s)
 
virtual bool GetArc (StateId s, Label ilabel, Arc *arc)
 Note: ilabel must not be epsilon.
 
 ~InverseContextFst ()
 
const std::vector< std::vector< int32 > > & IlabelInfo () const
 
void SwapIlabelInfo (std::vector< std::vector< int32 > > *vec)
 
- Public Member Functions inherited from DeterministicOnDemandFst< StdArc >
virtual Weight Final (StateId s)=0
 
virtual bool GetArc (StateId s, Label ilabel, StdArc *oarc)=0
 Note: ilabel must not be epsilon.
 
virtual ~DeterministicOnDemandFst ()
 

Private Types

typedef unordered_map< std::vector< int32 >, StateId, kaldi::VectorHasher< int32 > > VectorToStateMap
 
typedef unordered_map< std::vector< int32 >, Label, kaldi::VectorHasher< int32 > > VectorToLabelMap
 

Private Member Functions

StateId FindState (const std::vector< int32 > &seq)
 Returns the state-id corresponding to this vector of phones; creates the state if necessary.
 
Label FindLabel (const std::vector< int32 > &label_info)
 Finds the label index corresponding to this context-window of phones (likely of width context_width_).
 
bool IsDisambigSymbol (Label lab)
 
bool IsPhoneSymbol (Label lab)
 
void CreateDisambigArc (StateId s, Label ilabel, Arc *arc)
 Creates a disambiguation-symbol self-loop arc; 'ilabel' must correspond to a disambiguation symbol.
 
void CreatePhoneOrEpsArc (StateId src, StateId dst, Label ilabel, const std::vector< int32 > &phone_seq, Arc *arc)
 Creates an arc; this function is to be called only when 'ilabel' corresponds to a phone.
 
void ShiftSequenceLeft (Label label, std::vector< int32 > *phone_seq)
 If phone_seq is nonempty, this function shifts it left by one and appends 'label' to it; otherwise it does nothing.
 
void GetFullPhoneSequence (const std::vector< int32 > &seq, Label label, std::vector< int32 > *full_phone_sequence)
 This utility function does something equivalent to the following 3 steps: (1) *full_phone_sequence = seq; (2) full_phone_sequence->append(label); (3) replace any values equal to 'subsequential_symbol_' in full_phone_sequence with zero (this is to avoid having to keep track of the value of 'subsequential_symbol_' outside of this program).
 

Private Attributes

int32 context_width_
 
int32 central_position_
 
kaldi::ConstIntegerSet< Label > phone_syms_
 
kaldi::ConstIntegerSet< Label > disambig_syms_
 
Label subsequential_symbol_
 
int32 pseudo_eps_symbol_
 
VectorToStateMap state_map_
 
std::vector< std::vector< int32 > > state_seqs_
 
VectorToLabelMap ilabel_map_
 
std::vector< std::vector< int32 > > ilabel_info_
 

Detailed Description

Definition at line 152 of file context-fst.h.

Member Typedef Documentation

◆ Arc

typedef StdArc Arc

Definition at line 154 of file context-fst.h.

◆ Label

typedef StdArc::Label Label

Definition at line 157 of file context-fst.h.

◆ StateId

typedef StdArc::StateId StateId

Definition at line 155 of file context-fst.h.

◆ VectorToLabelMap

typedef unordered_map<std::vector<int32>, Label, kaldi::VectorHasher<int32> > VectorToLabelMap
private

Definition at line 256 of file context-fst.h.

◆ VectorToStateMap

typedef unordered_map<std::vector<int32>, StateId, kaldi::VectorHasher<int32> > VectorToStateMap
private

Definition at line 250 of file context-fst.h.

◆ Weight

typedef StdArc::Weight Weight

Definition at line 156 of file context-fst.h.

Constructor & Destructor Documentation

◆ InverseContextFst()

InverseContextFst ( Label  subsequential_symbol,
const std::vector< int32 > &  phones,
const std::vector< int32 > &  disambig_syms,
int32  context_width,
int32  central_position 
)

Constructor.

Parameters
[in] subsequential_symbol: The integer id of the 'subsequential symbol' (usually represented as '$') that terminates sequences on the output of C.fst (input of InverseContextFst). Search for "quential" in https://cs.nyu.edu/~mohri/pub/hbka.pdf. This may just be the first unused integer id. Must be nonzero.
[in] phones: List of integer ids of phones, as you would see in phones.txt.
[in] disambig_syms: List of integer ids of disambiguation symbols, e.g. the ids of #0, #1, #2 in phones.txt.
[in] context_width: Size of context window, e.g. 3 for triphone.
[in] central_position: Central position in context window (zero-based), e.g. 1 for triphone. See "The ContextFst object" for more details.
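
The constraints on these arguments can be summarized in a small standalone check. This is only a sketch and not part of Kaldi: `ValidContextArgs` is a hypothetical helper, `int32_t` stands in for Kaldi's `int32`, and the range check on the central position is an assumption drawn from the "zero-based" wording above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not part of Kaldi: returns true if the arguments
// satisfy the preconditions documented for the InverseContextFst constructor.
bool ValidContextArgs(int32_t subsequential_symbol,
                      const std::vector<int32_t> &phones,
                      const std::vector<int32_t> &disambig_syms,
                      int32_t context_width, int32_t central_position) {
  std::unordered_set<int32_t> phone_set(phones.begin(), phones.end());
  std::unordered_set<int32_t> disambig_set(disambig_syms.begin(),
                                           disambig_syms.end());
  // The subsequential symbol must be nonzero and must not collide with any
  // phone or disambiguation symbol.
  if (subsequential_symbol == 0 ||
      phone_set.count(subsequential_symbol) != 0 ||
      disambig_set.count(subsequential_symbol) != 0)
    return false;
  // Zero is reserved for epsilon.
  if (phone_set.count(0) != 0 || disambig_set.count(0) != 0) return false;
  // Assumption: a zero-based central position must lie inside the window.
  if (central_position < 0 || central_position >= context_width) return false;
  // Phones and disambiguation symbols must be disjoint.
  for (int32_t p : phones)
    if (disambig_set.count(p) != 0) return false;
  return true;
}
```

For a triphone system with phones 1 to 3 and disambiguation symbols 4 and 5, `ValidContextArgs(10, {1, 2, 3}, {4, 5}, 3, 1)` would accept 10 as the subsequential symbol.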

Definition at line 27 of file context-fst.cc.

References InverseContextFst::central_position_, InverseContextFst::context_width_, ConstIntegerSet< I >::count(), InverseContextFst::disambig_syms_, ConstIntegerSet< I >::empty(), InverseContextFst::FindLabel(), InverseContextFst::FindState(), KALDI_ASSERT, KALDI_WARN, InverseContextFst::phone_syms_, and InverseContextFst::pseudo_eps_symbol_.

InverseContextFst::InverseContextFst(
    Label subsequential_symbol,
    const vector<int32> &phones,
    const vector<int32> &disambig_syms,
    int32 context_width,
    int32 central_position):
    context_width_(context_width),
    central_position_(central_position),
    phone_syms_(phones),
    disambig_syms_(disambig_syms),
    subsequential_symbol_(subsequential_symbol) {

  {  // This block checks the inputs.
    KALDI_ASSERT(subsequential_symbol != 0
                 && disambig_syms_.count(subsequential_symbol) == 0
                 && phone_syms_.count(subsequential_symbol) == 0);
    if (phone_syms_.empty())
      KALDI_WARN << "Context FST created but there are no phone symbols: probably "
                    "input FST was empty.";
    KALDI_ASSERT(phone_syms_.count(0) == 0 && disambig_syms_.count(0) == 0 &&
                 central_position_ >= 0 && central_position_ < context_width_);
    for (size_t i = 0; i < phones.size(); i++) {
      KALDI_ASSERT(disambig_syms_.count(phones[i]) == 0);
    }
  }

  // empty vector, will be the ilabel_info vector that corresponds to epsilon,
  // in case our FST needs to output epsilons.
  vector<int32> empty_vec;
  Label epsilon_label = FindLabel(empty_vec);

  // epsilon_vec is the phonetic context window we have at the very start of a
  // sequence, meaning "no real phones have been seen yet".
  vector<int32> epsilon_vec(context_width_ - 1, 0);
  StateId start_state = FindState(epsilon_vec);

  KALDI_ASSERT(epsilon_label == 0 && start_state == 0);

  if (context_width_ > central_position_ + 1 && !disambig_syms_.empty()) {
    // We add a symbol whose sequence representation is [ 0 ], and whose
    // symbol-id is 1. This is treated as a disambiguation symbol, we call it
    // #-1 in printed form. It is necessary to ensure that all determinizable
    // LG's will have determinizable CLG's. The problem it fixes is quite
    // subtle-- it relates to reordering of disambiguation symbols (they appear
    // earlier in CLG than in LG, relative to phones), and the fact that if a
    // disambig symbol appears at the very start of a sequence in CLG, it's not
    // clear exactly where it appeared on the corresponding sequence at the input
    // of LG.
    vector<int32> pseudo_eps_vec;
    pseudo_eps_vec.push_back(0);
    pseudo_eps_symbol_ = FindLabel(pseudo_eps_vec);
    KALDI_ASSERT(pseudo_eps_symbol_ == 1);
  } else {
    pseudo_eps_symbol_ = 0;  // use actual epsilon.
  }
}

◆ ~InverseContextFst()

~InverseContextFst ( )
inline

Definition at line 188 of file context-fst.h.

~InverseContextFst() { }

Member Function Documentation

◆ CreateDisambigArc()

void CreateDisambigArc ( StateId  s,
Label  ilabel,
Arc arc 
)
inlineprivate

Creates a disambiguation-symbol self-loop arc; 'ilabel' must correspond to a disambiguation symbol.

Called from GetArc().

Definition at line 184 of file context-fst.cc.

References InverseContextFst::FindLabel().

Referenced by InverseContextFst::GetArc(), and InverseContextFst::IsPhoneSymbol().

inline void InverseContextFst::CreateDisambigArc(StateId s, Label ilabel,
                                                 Arc *arc) {
  // Creates a self-loop arc corresponding to the disambiguation symbol.
  vector<int32> label_info;       // This will be a vector containing just [ -ilabel ].
  label_info.push_back(-ilabel);  // ilabel is a disambiguation symbol. Use its negative
                                  // so we can more easily distinguish it from phones.
  Label olabel = FindLabel(label_info);
  arc->ilabel = ilabel;
  arc->olabel = olabel;
  arc->weight = Weight::One();
  arc->nextstate = s;  // self-loop.
}
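
Because disambiguation symbols are stored as a one-element vector holding the negated id, a consumer of the ilabel_info table can tell the entry kinds apart by shape and sign. The sketch below illustrates that idea; `EntryKind` and `ClassifyIlabelEntry` are hypothetical names, not Kaldi functions, and the entry [ 0 ] is grouped with the disambiguation symbols because the constructor comments describe it that way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical classification of one ilabel_info entry (not a Kaldi API).
enum class EntryKind { kEpsilon, kDisambig, kPhoneInContext };

EntryKind ClassifyIlabelEntry(const std::vector<int32_t> &entry) {
  // The empty vector is the entry for epsilon.
  if (entry.empty()) return EntryKind::kEpsilon;
  // A single nonpositive value is a negated disambiguation symbol; [ 0 ] is
  // the special symbol that is treated as a disambiguation symbol.
  if (entry.size() == 1 && entry[0] <= 0) return EntryKind::kDisambig;
  // Anything else is a context window of phones.
  return EntryKind::kPhoneInContext;
}
```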

◆ CreatePhoneOrEpsArc()

void CreatePhoneOrEpsArc ( StateId  src,
StateId  dst,
Label  ilabel,
const std::vector< int32 > &  phone_seq,
Arc arc 
)
inlineprivate

Creates an arc; this function is to be called only when 'ilabel' corresponds to a phone.

Called from GetArc(). The olabel may end up being epsilon, instead of a phone-in-context, if the system has right context and we are very near the beginning of the phone sequence.

Definition at line 196 of file context-fst.cc.

References InverseContextFst::central_position_, InverseContextFst::FindLabel(), KALDI_PARANOID_ASSERT, InverseContextFst::pseudo_eps_symbol_, and InverseContextFst::subsequential_symbol_.

Referenced by InverseContextFst::GetArc(), and InverseContextFst::IsPhoneSymbol().

inline void InverseContextFst::CreatePhoneOrEpsArc(StateId src, StateId dest,
                                                   Label ilabel,
                                                   const vector<int32> &phone_seq,
                                                   Arc *arc) {
  KALDI_PARANOID_ASSERT(phone_seq[central_position_] != subsequential_symbol_);

  arc->ilabel = ilabel;
  arc->weight = Weight::One();
  arc->nextstate = dest;
  if (phone_seq[central_position_] == 0) {
    // This can happen at the beginning of the graph. In this case we don't
    // output a real phone, we create an epsilon arc (but sometimes we need to
    // use a special disambiguation symbol instead of epsilon).
    arc->olabel = pseudo_eps_symbol_;
  } else {
    // We have a phone in the central position.
    arc->olabel = FindLabel(phone_seq);
  }
}

◆ Final()

InverseContextFst::Weight Final ( StateId  s)
virtual

Definition at line 109 of file context-fst.cc.

References InverseContextFst::central_position_, InverseContextFst::context_width_, KALDI_ASSERT, InverseContextFst::state_seqs_, and InverseContextFst::subsequential_symbol_.

Referenced by InverseContextFst::Start().

InverseContextFst::Weight InverseContextFst::Final(StateId s) {
  KALDI_ASSERT(static_cast<size_t>(s) < state_seqs_.size());

  const vector<int32> &phone_context = state_seqs_[s];

  KALDI_ASSERT(phone_context.size() == context_width_ - 1);

  bool has_final_prob;

  if (central_position_ < context_width_ - 1) {
    has_final_prob = (phone_context[central_position_] == subsequential_symbol_);
    // if phone_context[central_position_] != subsequential_symbol_ then we have
    // pending phones-in-context that we still need to output, so we need to
    // consume more subsequential symbols before we can terminate.
  } else {
    has_final_prob = true;
  }
  return has_final_prob ? Weight::One() : Weight::Zero();
}
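
The finality rule can be tried out in isolation. The following is a sketch that restates the same test as a free function over plain integers; `HasFinalProb` is a hypothetical name, and the subsequential symbol value is passed in explicitly rather than read from a member.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical standalone restatement of the finality test (not Kaldi code).
// phone_context is the state's window of context_width - 1 symbols.
bool HasFinalProb(const std::vector<int32_t> &phone_context,
                  int32_t context_width, int32_t central_position,
                  int32_t subsequential_symbol) {
  assert(static_cast<int32_t>(phone_context.size()) == context_width - 1);
  if (central_position < context_width - 1) {
    // With right context, the state is final only once the subsequential
    // symbol has reached the central position.
    return phone_context[central_position] == subsequential_symbol;
  }
  // Without right context every state is final.
  return true;
}
```

With a triphone window (width 3, central position 1) and subsequential symbol 10, the state [ 2, 3 ] is not final, while [ 3, 10 ] is.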

◆ FindLabel()

StdArc::Label FindLabel ( const std::vector< int32 > &  label_info)
private

Finds the label index corresponding to this context-window of phones (likely of width context_width_).

Inserts it into the ilabel_info_/ilabel_map_ tables if necessary.

Definition at line 231 of file context-fst.cc.

References InverseContextFst::ilabel_info_, and InverseContextFst::ilabel_map_.

Referenced by InverseContextFst::CreateDisambigArc(), InverseContextFst::CreatePhoneOrEpsArc(), InverseContextFst::InverseContextFst(), and InverseContextFst::SwapIlabelInfo().

StdArc::Label InverseContextFst::FindLabel(const vector<int32> &label_vec) {
  // Finds the ilabel corresponding to this vector (creates a new ilabel if
  // necessary).
  VectorToLabelMap::const_iterator iter = ilabel_map_.find(label_vec);
  if (iter == ilabel_map_.end()) {  // Not already in map.
    Label this_label = ilabel_info_.size();
    ilabel_info_.push_back(label_vec);
    ilabel_map_[label_vec] = this_label;
    return this_label;
  } else {
    return iter->second;
  }
}
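
FindLabel() and FindState() share one pattern: a hash map from an integer vector to a dense id, paired with a table that maps each id back to its vector. A minimal self-contained sketch of that pattern follows; `VectorInterner` and `VectorHash` are hypothetical names, and the hash function is a simple stand-in, not necessarily what kaldi::VectorHasher computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in hash for integer vectors (an assumption, not kaldi::VectorHasher).
struct VectorHash {
  size_t operator()(const std::vector<int32_t> &v) const {
    size_t h = 0;
    for (int32_t x : v) h = h * 7853 + static_cast<size_t>(x);
    return h;
  }
};

// Hypothetical helper: assigns consecutive ids to vectors on first sight.
class VectorInterner {
 public:
  int32_t Find(const std::vector<int32_t> &key) {
    auto iter = map_.find(key);
    if (iter != map_.end()) return iter->second;  // already known
    int32_t id = static_cast<int32_t>(entries_.size());
    entries_.push_back(key);  // id -> vector
    map_[key] = id;           // vector -> id
    return id;
  }
  const std::vector<std::vector<int32_t> > &Entries() const { return entries_; }

 private:
  std::unordered_map<std::vector<int32_t>, int32_t, VectorHash> map_;
  std::vector<std::vector<int32_t> > entries_;
};
```

Interning the empty vector first and then [ 0 ] yields ids 0 and 1, which matches the order in which the constructor registers the epsilon entry and the [ 0 ] entry.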

◆ FindState()

StdArc::StateId FindState ( const std::vector< int32 > &  seq)
private

Returns the state-id corresponding to this vector of phones; creates the state if necessary.

Requires seq.size() == context_width_ - 1.

Definition at line 216 of file context-fst.cc.

References InverseContextFst::context_width_, KALDI_ASSERT, InverseContextFst::state_map_, and InverseContextFst::state_seqs_.

Referenced by InverseContextFst::GetArc(), InverseContextFst::InverseContextFst(), and InverseContextFst::SwapIlabelInfo().

StdArc::StateId InverseContextFst::FindState(const vector<int32> &seq) {
  // Finds state-id corresponding to this vector of phones. Inserts it if
  // necessary.
  KALDI_ASSERT(static_cast<int32>(seq.size()) == context_width_ - 1);
  VectorToStateMap::const_iterator iter = state_map_.find(seq);
  if (iter == state_map_.end()) {  // Not already in map.
    StateId this_state_id = (StateId)state_seqs_.size();
    state_seqs_.push_back(seq);
    state_map_[seq] = this_state_id;
    return this_state_id;
  } else {
    return iter->second;
  }
}

◆ GetArc()

bool GetArc ( StateId  s,
Label  ilabel,
Arc arc 
)
virtual

Note: ilabel must not be epsilon.

Definition at line 129 of file context-fst.cc.

References InverseContextFst::central_position_, InverseContextFst::context_width_, InverseContextFst::CreateDisambigArc(), InverseContextFst::CreatePhoneOrEpsArc(), InverseContextFst::FindState(), InverseContextFst::GetFullPhoneSequence(), InverseContextFst::IsDisambigSymbol(), InverseContextFst::IsPhoneSymbol(), KALDI_ASSERT, KALDI_ERR, InverseContextFst::ShiftSequenceLeft(), InverseContextFst::state_seqs_, and InverseContextFst::subsequential_symbol_.

Referenced by InverseContextFst::Start().

bool InverseContextFst::GetArc(StateId s, Label ilabel, Arc *arc) {
  KALDI_ASSERT(ilabel != 0 && static_cast<size_t>(s) < state_seqs_.size() &&
               state_seqs_[s].size() == context_width_ - 1);

  if (IsDisambigSymbol(ilabel)) {
    // A disambiguation-symbol self-loop arc.
    CreateDisambigArc(s, ilabel, arc);
    return true;
  } else if (IsPhoneSymbol(ilabel)) {
    const vector<int32> &seq = state_seqs_[s];
    if (!seq.empty() && seq.back() == subsequential_symbol_) {
      return false;  // A real phone is not allowed to follow the subsequential
                     // symbol.
    }

    // next_seq will be 'seq' shifted left by 1, with 'ilabel' appended.
    vector<int32> next_seq(seq);
    ShiftSequenceLeft(ilabel, &next_seq);

    // full-seq will be the full context window of size context_width_.
    vector<int32> full_seq;
    GetFullPhoneSequence(seq, ilabel, &full_seq);

    StateId next_s = FindState(next_seq);

    CreatePhoneOrEpsArc(s, next_s, ilabel, full_seq, arc);
    return true;
  } else if (ilabel == subsequential_symbol_) {
    const vector<int32> &seq = state_seqs_[s];

    if (central_position_ + 1 == context_width_ ||
        seq[central_position_] == subsequential_symbol_) {
      // We already had "enough" subsequential symbols in a row and don't want to
      // accept any more, or we'd be making the subsequential symbol the central phone.
      return false;
    }

    // full-seq will be the full context window of size context_width_.
    vector<int32> full_seq;
    GetFullPhoneSequence(seq, ilabel, &full_seq);

    vector<int32> next_seq(seq);
    ShiftSequenceLeft(ilabel, &next_seq);
    StateId next_s = FindState(next_seq);

    CreatePhoneOrEpsArc(s, next_s, ilabel, full_seq, arc);
    return true;
  } else {
    KALDI_ERR << "ContextFst: CreateArc, invalid ilabel supplied [confusion "
              << "about phone list or disambig symbols?]: " << ilabel;
  }
  return false;  // won't get here. suppress compiler error.
}
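
To see what the phone and subsequential-symbol branches produce over a whole utterance, the window logic can be simulated without any FST machinery. The function below is a simplified model, not Kaldi code: `EmitWindows` is a hypothetical name, disambiguation symbols and the rejection cases are left out, and the input is assumed to be a valid phone sequence followed by the right number of subsequential symbols. An empty vector in the result stands for an epsilon-like output.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of the context-window logic (hypothetical, not Kaldi code).
// Returns, for each input symbol, the context window that would label the
// output; an empty vector means no central phone was available yet.
std::vector<std::vector<int32_t> > EmitWindows(
    const std::vector<int32_t> &input, int32_t context_width,
    int32_t central_position, int32_t subsequential_symbol) {
  assert(central_position >= 0 && central_position < context_width);
  std::vector<int32_t> state(context_width - 1, 0);  // start state: all zeros
  std::vector<std::vector<int32_t> > outputs;
  for (int32_t label : input) {
    // Full window = current state followed by the new label.
    std::vector<int32_t> full(state);
    full.push_back(label);
    // Subsequential symbols right of the center are written as zero.
    for (int32_t i = central_position + 1; i < context_width; i++)
      if (full[i] == subsequential_symbol) full[i] = 0;
    // Next state: shift left by one and append the label.
    if (!state.empty()) {
      state.erase(state.begin());
      state.push_back(label);
    }
    if (full[central_position] == 0)
      outputs.emplace_back();  // nothing in the central position yet
    else
      outputs.push_back(full);
  }
  return outputs;
}
```

For a triphone window (width 3, central position 1) and subsequential symbol 10, the input 1 2 3 10 gives an empty output followed by the windows [ 0, 1, 2 ], [ 1, 2, 3 ] and [ 2, 3, 0 ]: each phone's window is emitted one step late, once its right context is known.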

◆ GetFullPhoneSequence()

void GetFullPhoneSequence ( const std::vector< int32 > &  seq,
Label  label,
std::vector< int32 > *  full_phone_sequence 
)
inlineprivate

This utility function does something equivalent to the following 3 steps:
- *full_phone_sequence = seq;
- full_phone_sequence->append(label)
- Replace any values equal to 'subsequential_symbol_' in full_phone_sequence with zero (this is to avoid having to keep track of the value of 'subsequential_symbol_' outside of this program).

This function assumes that seq.size() == context_width_ - 1, and also that 'subsequential_symbol_' does not appear in positions 0 through central_position_ of 'seq'.

Definition at line 93 of file context-fst.cc.

References InverseContextFst::central_position_, InverseContextFst::context_width_, and InverseContextFst::subsequential_symbol_.

Referenced by InverseContextFst::GetArc(), and InverseContextFst::IsPhoneSymbol().

inline void InverseContextFst::GetFullPhoneSequence(
    const vector<int32> &seq, Label label,
    vector<int32> *full_phone_sequence) {
  int32 context_width = context_width_;
  full_phone_sequence->reserve(context_width);
  full_phone_sequence->insert(full_phone_sequence->end(),
                              seq.begin(), seq.end());
  full_phone_sequence->push_back(label);
  for (int32 i = central_position_ + 1; i < context_width; i++) {
    if ((*full_phone_sequence)[i] == subsequential_symbol_) {
      (*full_phone_sequence)[i] = 0;
    }
  }
}
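
The three-step description can also be written out literally. The sketch below does exactly that with standard-library calls; `FullPhoneSequence` is a hypothetical free function that returns the window by value, and it gives the same result as scanning only the positions right of the center as long as the stated assumption about 'seq' holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical literal version of the three documented steps (not Kaldi code).
std::vector<int32_t> FullPhoneSequence(const std::vector<int32_t> &seq,
                                       int32_t label,
                                       int32_t subsequential_symbol) {
  const int32_t zero = 0;
  std::vector<int32_t> full(seq);  // step 1: copy the state's sequence
  full.push_back(label);           // step 2: append the new label
  // Step 3: replace every occurrence of the subsequential symbol with zero.
  std::replace(full.begin(), full.end(), subsequential_symbol, zero);
  return full;
}
```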

◆ IlabelInfo()

const std::vector<std::vector<int32> >& IlabelInfo ( ) const
inline

Definition at line 194 of file context-fst.h.

References InverseContextFst::ilabel_info_.

Referenced by fst::TestContextFst().

const std::vector<std::vector<int32> > &IlabelInfo() const {
  return ilabel_info_;
}

◆ IsDisambigSymbol()

bool IsDisambigSymbol ( Label  lab)
inlineprivate

Definition at line 213 of file context-fst.h.

References ConstIntegerSet< I >::count(), and InverseContextFst::disambig_syms_.

Referenced by InverseContextFst::GetArc().

bool IsDisambigSymbol(Label lab) { return (disambig_syms_.count(lab) != 0); }

◆ IsPhoneSymbol()

bool IsPhoneSymbol ( Label  lab)
inlineprivate

Definition at line 215 of file context-fst.h.

◆ ShiftSequenceLeft()

void ShiftSequenceLeft ( Label  label,
std::vector< int32 > *  phone_seq 
)
inlineprivate

If phone_seq is nonempty, this function shifts it left by one and appends 'label' to it; otherwise it does nothing.

We expect (but do not check) that phone_seq->size() == context_width_ - 1.

Definition at line 85 of file context-fst.cc.

Referenced by InverseContextFst::GetArc(), and InverseContextFst::IsPhoneSymbol().

inline void InverseContextFst::ShiftSequenceLeft(Label label,
                                                 std::vector<int32> *phone_seq) {
  if (!phone_seq->empty()) {
    phone_seq->erase(phone_seq->begin());
    phone_seq->push_back(label);
  }
}

◆ Start()

virtual StateId Start ( )
inlinevirtual

Implements DeterministicOnDemandFst< StdArc >.

Definition at line 181 of file context-fst.h.

References InverseContextFst::Final(), and InverseContextFst::GetArc().

virtual StateId Start() { return 0; }

◆ SwapIlabelInfo()

void SwapIlabelInfo ( std::vector< std::vector< int32 > > *  vec)
inline

Definition at line 200 of file context-fst.h.

References InverseContextFst::FindLabel(), InverseContextFst::FindState(), and InverseContextFst::ilabel_info_.

Referenced by fst::ComposeContext().

void SwapIlabelInfo(std::vector<std::vector<int32> > *vec) { ilabel_info_.swap(*vec); }

Member Data Documentation

◆ central_position_

int32 central_position_
private

◆ context_width_

int32 context_width_
private

◆ disambig_syms_

kaldi::ConstIntegerSet<Label> disambig_syms_
private

◆ ilabel_info_

std::vector<std::vector<int32> > ilabel_info_
private

◆ ilabel_map_

VectorToLabelMap ilabel_map_
private

Definition at line 325 of file context-fst.h.

Referenced by InverseContextFst::FindLabel().

◆ phone_syms_

kaldi::ConstIntegerSet<Label> phone_syms_
private

◆ pseudo_eps_symbol_

int32 pseudo_eps_symbol_
private

◆ state_map_

VectorToStateMap state_map_
private

Definition at line 314 of file context-fst.h.

Referenced by InverseContextFst::FindState().

◆ state_seqs_

std::vector<std::vector<int32> > state_seqs_
private

◆ subsequential_symbol_

Label subsequential_symbol_
private


The documentation for this class was generated from the following files: