InverseLeftBiphoneContextFst Class Reference

#include <grammar-context-fst.h>


Public Types

typedef StdArc Arc
 
typedef StdArc::StateId StateId
 
typedef StdArc::Weight Weight
 
typedef StdArc::Label Label
 
- Public Types inherited from DeterministicOnDemandFst< StdArc >
typedef StdArc ::StateId StateId
 
typedef StdArc ::Weight Weight
 
typedef StdArc ::Label Label
 

Public Member Functions

 InverseLeftBiphoneContextFst (Label nonterm_phones_offset, const std::vector< int32 > &phones, const std::vector< int32 > &disambig_syms)
 Constructor.
 
virtual StateId Start ()
 Here is a note on the state space of InverseLeftBiphoneContextFst; see Special symbols in C.fst which has some documentation on this.
 
virtual Weight Final (StateId s)
 
virtual bool GetArc (StateId s, Label ilabel, Arc *arc)
 Note: ilabel must not be epsilon.
 
 ~InverseLeftBiphoneContextFst ()
 
const std::vector< std::vector< int32 > > & IlabelInfo () const
 
void SwapIlabelInfo (std::vector< std::vector< int32 > > *vec)
 
- Public Member Functions inherited from DeterministicOnDemandFst< StdArc >
virtual Weight Final (StateId s)=0
 
virtual bool GetArc (StateId s, Label ilabel, StdArc *oarc)=0
 Note: ilabel must not be epsilon.
 
virtual ~DeterministicOnDemandFst ()
 

Private Types

typedef unordered_map< std::vector< int32 >, Label, kaldi::VectorHasher< int32 > > VectorToLabelMap
 

Private Member Functions

int32 GetPhoneSymbolFor (enum NonterminalValues n)
 
Label FindLabel (const std::vector< int32 > &label_info)
 Finds the label index corresponding to this context-window of phones (likely of width context_width_).
 

Private Attributes

int32 nonterm_phones_offset_
 
kaldi::ConstIntegerSet< Label > phone_syms_
 
kaldi::ConstIntegerSet< Label > disambig_syms_
 
VectorToLabelMap ilabel_map_
 
std::vector< std::vector< int32 > > ilabel_info_
 

Detailed Description

Definition at line 146 of file grammar-context-fst.h.

Member Typedef Documentation

◆ Arc

typedef StdArc Arc

Definition at line 148 of file grammar-context-fst.h.

◆ Label

typedef StdArc::Label Label

Definition at line 151 of file grammar-context-fst.h.

◆ StateId

typedef StdArc::StateId StateId

Definition at line 149 of file grammar-context-fst.h.

◆ VectorToLabelMap

typedef unordered_map<std::vector<int32>, Label, kaldi::VectorHasher<int32> > VectorToLabelMap
private

Definition at line 248 of file grammar-context-fst.h.

◆ Weight

typedef StdArc::Weight Weight

Definition at line 150 of file grammar-context-fst.h.

Constructor & Destructor Documentation

◆ InverseLeftBiphoneContextFst()

InverseLeftBiphoneContextFst ( Label  nonterm_phones_offset,
const std::vector< int32 > &  phones,
const std::vector< int32 > &  disambig_syms 
)

Constructor.

This does not take the arguments 'context_width' or 'central_position' because they are assumed to be (2, 1) meaning a system with left-biphone context; and there is no subsequential symbol because it is not needed in systems without right context.

Parameters
[in]nonterm_phones_offsetThe integer id of the symbol #nonterm_bos in the phones.txt file. You can just set this to a large value (like 1 million) if you are not actually using nonterminals (e.g. for testing purposes).
[in]phonesList of integer ids of phones, as you would see in phones.txt
[in]disambig_symsList of integer ids of disambiguation symbols, e.g. the ids of #0, #1, #2 in phones.txt

See The ContextFst object for more details.
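
The consistency checks that the constructor applies to its three arguments can be sketched without Kaldi or OpenFst. This is a minimal standalone illustration, not the Kaldi implementation: the helper name `CheckContextInputs` and the use of a returned error string in place of KALDI_ERR are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of Kaldi): returns an empty string if the
// inputs are consistent, otherwise a description of the first problem found.
std::string CheckContextInputs(int32_t nonterm_phones_offset,
                               const std::vector<int32_t> &phones,
                               const std::vector<int32_t> &disambig_syms) {
  std::vector<int32_t> all_inputs(phones);
  all_inputs.insert(all_inputs.end(), disambig_syms.begin(),
                    disambig_syms.end());
  all_inputs.push_back(nonterm_phones_offset);
  const std::size_t size = all_inputs.size();
  // Sort and remove duplicates; a change in size means some id was repeated.
  std::sort(all_inputs.begin(), all_inputs.end());
  all_inputs.erase(std::unique(all_inputs.begin(), all_inputs.end()),
                   all_inputs.end());
  if (all_inputs.size() != size)
    return "overlap between phones, disambig symbols and/or the offset";
  if (all_inputs.front() <= 0)
    return "symbols <= 0 are not allowed";
  if (all_inputs.back() != nonterm_phones_offset) {
    // The offset is not the largest id, so the ids just above it must be free.
    for (int32_t i = 1; i < 4; i++) {
      if (std::binary_search(all_inputs.begin(), all_inputs.end(),
                             nonterm_phones_offset + i))
        return "offset + " + std::to_string(i) +
               " was listed as a phone or disambig symbol";
    }
  }
  return "";
}
```

With phones {1, 2}, a disambiguation symbol 20 and an offset of 10, the sketch accepts the input; moving the disambiguation symbol to 11 makes it collide with offset + 1 and the sketch reports an error.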

Definition at line 27 of file grammar-context-fst.cc.

References ConstIntegerSet< I >::empty(), InverseLeftBiphoneContextFst::FindLabel(), rnnlm::i, KALDI_ASSERT, KALDI_ERR, KALDI_WARN, InverseLeftBiphoneContextFst::phone_syms_, and kaldi::SortAndUniq().

    :
    nonterm_phones_offset_(nonterm_phones_offset),
    phone_syms_(phones),
    disambig_syms_(disambig_syms) {

  {  // This block does some checks.
    std::vector<int32> all_inputs(phones);
    all_inputs.insert(all_inputs.end(), disambig_syms.begin(),
                      disambig_syms.end());
    all_inputs.push_back(nonterm_phones_offset);
    size_t size = all_inputs.size();
    kaldi::SortAndUniq(&all_inputs);
    if (all_inputs.size() != size) {
      KALDI_ERR << "There was overlap between disambig symbols, phones, "
          "and/or --nonterm-phones-offset";
    }
    if (all_inputs.front() <= 0)
      KALDI_ERR << "Symbols <= 0 were passed in as phones, disambig-syms, "
          "or nonterm-phones-offset.";
    if (all_inputs.back() != nonterm_phones_offset) {
      // the value passed --nonterm-phones-offset is not higher numbered
      // than all the phones and disambig syms... do some more checking.
      for (int32 i = 1; i < 4; i++) {
        int32 symbol = nonterm_phones_offset + i;
        // None of the symbols --nonterm-phones-offset + {kNontermBos, kNontermBegin,
        // kNontermEnd, kNontermReenter, kNontermUserDefined}
        // (i.e. the special symbols plus the first user-defined symbol) may be
        // listed as phones or disambig symbols... this doesn't make sense.  We
        // do allow disambig symbols to be higher-numbered than the nonterminal
        // symbols, just in case that happens to be needed, but they can't overlap.
        if (std::binary_search(all_inputs.begin(), all_inputs.end(), symbol)) {
          KALDI_ERR << "The symbol " << symbol
                    << " = --nonterm-phones-offset + " << i
                    << " was listed as a phone or disambig symbol.";
        }
      }
    }
    if (phone_syms_.empty())
      KALDI_WARN << "Context FST created but there are no phone symbols: probably "
          "input FST was empty.";
  }

  // empty vector, will be the ilabel_info vector that corresponds to epsilon,
  // in case our FST needs to output epsilons.
  vector<int32> empty_vec;
  Label epsilon_label = FindLabel(empty_vec);
  // Make sure that a label is assigned for epsilon.
  KALDI_ASSERT(epsilon_label == 0);
}

◆ ~InverseLeftBiphoneContextFst()

Definition at line 218 of file grammar-context-fst.h.

{ }

Member Function Documentation

◆ Final()

Definition at line 81 of file grammar-context-fst.cc.

References ConstIntegerSet< I >::count(), InverseLeftBiphoneContextFst::GetPhoneSymbolFor(), fst::kNontermEnd, and InverseLeftBiphoneContextFst::phone_syms_.

Referenced by InverseLeftBiphoneContextFst::Start().

{
  if (s == 0 || phone_syms_.count(s) != 0 ||
      s == GetPhoneSymbolFor(kNontermEnd))
    return Weight::One();
  else
    return Weight::Zero();
}

◆ FindLabel()

StdArc::Label FindLabel ( const std::vector< int32 > &  label_info)
private

Finds the label index corresponding to this context-window of phones (likely of width context_width_).

Inserts it into the ilabel_info_/ilabel_map_ tables if necessary.
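
The find-or-insert behaviour described above can be sketched in isolation. This is an illustrative stand-in rather than the Kaldi code: `LabelTableSketch` and `VectorHasherSketch` are hypothetical names, and the hash formula is an assumption standing in for kaldi::VectorHasher.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for kaldi::VectorHasher<int32>; the formula is an
// assumption chosen only so that the example is self-contained.
struct VectorHasherSketch {
  std::size_t operator()(const std::vector<int32_t> &v) const {
    std::size_t h = 0;
    for (int32_t x : v) h = h * 7853 + static_cast<std::size_t>(x);
    return h;
  }
};

// Hypothetical table pairing a vector-to-label map with its inverse list,
// in the same way ilabel_map_ and ilabel_info_ are paired in the class.
class LabelTableSketch {
 public:
  // Returns the label for `vec`, assigning the next free index if it is new.
  int32_t FindLabel(const std::vector<int32_t> &vec) {
    auto iter = map_.find(vec);
    if (iter != map_.end()) return iter->second;  // already known
    int32_t label = static_cast<int32_t>(info_.size());
    info_.push_back(vec);
    map_[vec] = label;
    return label;
  }
  const std::vector<std::vector<int32_t> > &Info() const { return info_; }

 private:
  std::unordered_map<std::vector<int32_t>, int32_t, VectorHasherSketch> map_;
  std::vector<std::vector<int32_t> > info_;
};
```

Because labels are handed out in order of first use, looking up the empty vector first guarantees that it receives label 0, which is what the constructor relies on when it reserves a label for epsilon.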

Definition at line 181 of file grammar-context-fst.cc.

References InverseLeftBiphoneContextFst::ilabel_info_, and InverseLeftBiphoneContextFst::ilabel_map_.

Referenced by InverseLeftBiphoneContextFst::GetArc(), InverseLeftBiphoneContextFst::GetPhoneSymbolFor(), and InverseLeftBiphoneContextFst::InverseLeftBiphoneContextFst().

{
  // Finds the ilabel corresponding to this vector (creates a new ilabel if
  // necessary).
  VectorToLabelMap::const_iterator iter = ilabel_map_.find(label_vec);
  if (iter == ilabel_map_.end()) {  // Not already in map.
    Label this_label = ilabel_info_.size();
    ilabel_info_.push_back(label_vec);
    ilabel_map_[label_vec] = this_label;
    return this_label;
  } else {
    return iter->second;
  }
}

◆ GetArc()

bool GetArc ( StateId  s,
Label  ilabel,
Arc *  arc 
)
virtual

Note: ilabel must not be epsilon.
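
As a rough illustration of the most common case, the sketch below reproduces only the handling of phones and disambiguation symbols when the current state is 0 or a phone state. It is a simplified stand-in, not the Kaldi function: `ArcSketch` and `GetArcSketch` are hypothetical names, nonterminal symbols are left out, and the output is reported as the context vector itself rather than as an olabel index.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical arc type: olabel_info holds what ilabel_info_[olabel] would
// contain for the real arc.
struct ArcSketch {
  int32_t ilabel;
  std::vector<int32_t> olabel_info;
  int32_t nextstate;
};

// Hypothetical, reduced version of GetArc. `s` is assumed to be 0 or a phone
// state; nonterminal symbols are not handled and simply yield false.
bool GetArcSketch(const std::set<int32_t> &phones,
                  const std::set<int32_t> &disambig,
                  int32_t s, int32_t ilabel, ArcSketch *arc) {
  assert(ilabel != 0);  // epsilon is never a valid input label
  arc->ilabel = ilabel;
  if (phones.count(ilabel) != 0) {
    // Phone: emit the pair (left context, phone) and move to the phone's state.
    arc->olabel_info = {s, ilabel};
    arc->nextstate = ilabel;
    return true;
  } else if (disambig.count(ilabel) != 0) {
    // Disambiguation symbol: self-loop, encoded as the negated symbol.
    arc->olabel_info = {-ilabel};
    arc->nextstate = s;
    return true;
  }
  return false;
}
```

Feeding the phone sequence 2, 3 from state 0 yields the context windows (0, 2) and (2, 3), with the state after each step equal to the phone just consumed.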

Definition at line 89 of file grammar-context-fst.cc.

References ConstIntegerSet< I >::count(), InverseLeftBiphoneContextFst::disambig_syms_, InverseLeftBiphoneContextFst::FindLabel(), InverseLeftBiphoneContextFst::GetPhoneSymbolFor(), KALDI_ASSERT, KALDI_ERR, fst::kNontermBegin, fst::kNontermBos, fst::kNontermEnd, fst::kNontermReenter, fst::kNontermUserDefined, and InverseLeftBiphoneContextFst::phone_syms_.

Referenced by InverseLeftBiphoneContextFst::Start().

{
  // it's a rule of the DeterministicOnDemandFst that the ilabel cannot be zero.
  KALDI_ASSERT(ilabel != 0);

  arc->ilabel = ilabel;
  arc->weight = Weight::One();

  if (s == 0 || phone_syms_.count(s) != 0) {
    // This is an epsilon or phone state.
    if (phone_syms_.count(ilabel) != 0) {
      // The ilabel is a phone.
      std::vector<int32> context_window(2);
      context_window[0] = s;
      context_window[1] = ilabel;
      arc->olabel = FindLabel(context_window);
      arc->nextstate = ilabel;
      return true;
    } else if (disambig_syms_.count(ilabel) != 0) {
      // the ilabel is a disambiguation symbol.  Make a self-loop arc that
      // replicates the disambiguation symbol on the input.
      // The ilabel-info vector for disambig symbols is just a single element
      // consisting of the negative of the disambig symbols (for easier
      // identification from code).
      std::vector<int32> this_ilabel_info(1);
      this_ilabel_info[0] = -ilabel;
      arc->olabel = FindLabel(this_ilabel_info);
      arc->nextstate = s;
      return true;
    } else if (ilabel == GetPhoneSymbolFor(kNontermBegin) &&
               s == 0) {
      // We were at the start state and saw the symbol #nonterm_begin.
      // Output nothing, but transition to the special #nonterm_begin state.
      // when we're in that state, arcs for phones generate special
      // osymbols corresponding to pairs like (#nonterm_begin, p1).
      arc->olabel = 0;
      arc->nextstate = GetPhoneSymbolFor(kNontermBegin);
      return true;
    } else if (ilabel == GetPhoneSymbolFor(kNontermEnd)) {
      // we saw #nonterm_end.
      std::vector<int32> this_ilabel_info(2);
      this_ilabel_info[0] = -(GetPhoneSymbolFor(kNontermEnd));
      this_ilabel_info[1] = (s != 0 ? s : GetPhoneSymbolFor(kNontermBos));
      arc->olabel = FindLabel(this_ilabel_info);
      arc->nextstate = GetPhoneSymbolFor(kNontermEnd);
      return true;
    } else if (ilabel >= GetPhoneSymbolFor(kNontermUserDefined)) {
      // Assume this ilabel is a user-defined nonterminal.
      // Transition to the state kNontermUserDefined, with an olabel
      // (#nonterm:foo, p1) where 'p1' is the current left-context.
      std::vector<int32> this_ilabel_info(2);
      this_ilabel_info[0] = -ilabel;
      this_ilabel_info[1] = (s != 0 ? s : GetPhoneSymbolFor(kNontermBos));
      arc->olabel = FindLabel(this_ilabel_info);
      // the destination state is not specific to this user-defined symbol, it's
      // a generic destination state.
      arc->nextstate = GetPhoneSymbolFor(kNontermUserDefined);
      return true;
    } else {
      return false;
    }
  } else if (s == GetPhoneSymbolFor(kNontermBegin)) {
    if (phone_syms_.count(ilabel) != 0 || ilabel == GetPhoneSymbolFor(kNontermBos)) {
      std::vector<int32> this_ilabel_info(2);
      this_ilabel_info[0] = -GetPhoneSymbolFor(kNontermBegin);
      this_ilabel_info[1] = ilabel;
      arc->nextstate = (ilabel == GetPhoneSymbolFor(kNontermBos) ? 0 : ilabel);
      arc->olabel = FindLabel(this_ilabel_info);
      return true;
    } else {
      return false;
    }
  } else if (s == GetPhoneSymbolFor(kNontermEnd)) {
    return false;
  } else if (s == GetPhoneSymbolFor(kNontermUserDefined)) {
    if (phone_syms_.count(ilabel) != 0 || ilabel == GetPhoneSymbolFor(kNontermBos)) {
      std::vector<int32> this_ilabel_info(2);
      this_ilabel_info[0] = -GetPhoneSymbolFor(kNontermReenter);
      this_ilabel_info[1] = ilabel;
      arc->nextstate = (ilabel == GetPhoneSymbolFor(kNontermBos) ? 0 : ilabel);
      arc->olabel = FindLabel(this_ilabel_info);
      return true;
    } else {
      return false;
    }
  } else {
    // likely code error.
    KALDI_ERR << "Invalid state encountered";
    return false;  // won't get here.  suppress compiler error.
  }
}

◆ GetPhoneSymbolFor()

int32 GetPhoneSymbolFor ( enum NonterminalValues  n)
inline private

◆ IlabelInfo()

const std::vector<std::vector<int32> >& IlabelInfo ( ) const
inline

Definition at line 224 of file grammar-context-fst.h.

References InverseLeftBiphoneContextFst::ilabel_info_.

{
  return ilabel_info_;
}

◆ Start()

virtual StateId Start ( )
inline virtual

Here is a note on the state space of InverseLeftBiphoneContextFst; see Special symbols in C.fst which has some documentation on this.

The state space uses the same numbering as phones.txt.

State 0 means the beginning-of-sequence state, where there is no left context.

For each phone p in the list 'phones' passed to the constructor, the state 'p' corresponds to a left-context of that phone.

If p is equal to nonterm_phones_offset_ + kNontermBegin (i.e. the integer form of `#nonterm_begin`), then this is the state we transition to when we see that symbol starting from left-context==0 (no context). The transition to this special state has epsilon on the output (we are talking here about inv(C), not C, so input and output are reversed). When we see a regular phone p1 or #nonterm_bos in this state, instead of outputting that phone in context, we output the pair (#nonterm_begin,p1) or (#nonterm_begin,#nonterm_bos). This state is not final.

If p is equal to nonterm_phones_offset_ + kNontermUserDefined, then this is the state we transition to when we see any user-defined nonterminal. Transitions to this special state have olabels of the form (#nonterm:foo,p1) where p1 is the preceding context (with #nonterm_bos if that context was 0); transitions out of it have olabels of the form (#nonterm_reenter,p2), where p2 is the phone (or #nonterm_bos) on the ilabel of that transition. Again, we are talking about inv(C). This state is not final.

If p is equal to nonterm_phones_offset_ + kNontermEnd, then this is the state we transition to when we see the ilabel #nonterm_end. The olabels on the transitions to it (talking here about inv(C), so ilabels and olabels are reversed) are of the form (#nonterm_end, p1) where p1 corresponds to the context we were in. This state is final.
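
The state numbering described above can be summarised as a small classifier. This is a sketch under stated assumptions rather than Kaldi code: the names `NontermSketch`, `StateKind`, `Classify` and `IsFinal` are hypothetical, and the numeric values 0 to 4 assigned to the nonterminal constants are assumed from the order in which the symbols are listed on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Assumed numeric values for the nonterminal constants (hypothetical mirror
// of the NonterminalValues enum referenced on this page).
enum NontermSketch {
  kBos = 0, kBegin = 1, kEnd = 2, kReenter = 3, kUserDefined = 4
};

enum class StateKind {
  kStart,               // state 0: beginning of sequence, no left context
  kPhone,               // state p: left context is phone p
  kNontermBegin,        // reached on #nonterm_begin from state 0
  kNontermEnd,          // reached on #nonterm_end
  kNontermUserDefined,  // reached on any user-defined nonterminal
  kInvalid
};

// Hypothetical helper: maps a state id to the role it plays in inv(C).
StateKind Classify(const std::set<int32_t> &phones, int32_t offset, int32_t s) {
  if (s == 0) return StateKind::kStart;
  if (phones.count(s) != 0) return StateKind::kPhone;
  if (s == offset + kBegin) return StateKind::kNontermBegin;
  if (s == offset + kEnd) return StateKind::kNontermEnd;
  if (s == offset + kUserDefined) return StateKind::kNontermUserDefined;
  return StateKind::kInvalid;
}

// The start state, the phone states and the #nonterm_end state are final;
// the #nonterm_begin and user-defined-nonterminal states are not.
bool IsFinal(StateKind k) {
  return k == StateKind::kStart || k == StateKind::kPhone ||
         k == StateKind::kNontermEnd;
}
```

With phones {1, 2, 3} and an offset of 100, state 102 classifies as the #nonterm_end state and is final, while states 101 and 104 are the non-final #nonterm_begin and user-defined-nonterminal states.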

Implements DeterministicOnDemandFst< StdArc >.

Definition at line 211 of file grammar-context-fst.h.

References InverseLeftBiphoneContextFst::Final(), and InverseLeftBiphoneContextFst::GetArc().

{ return 0; }

◆ SwapIlabelInfo()

void SwapIlabelInfo ( std::vector< std::vector< int32 > > *  vec)
inline

Definition at line 230 of file grammar-context-fst.h.

References InverseLeftBiphoneContextFst::ilabel_info_.

Referenced by fst::ComposeContextLeftBiphone().

{ ilabel_info_.swap(*vec); }

Member Data Documentation

◆ disambig_syms_

kaldi::ConstIntegerSet<Label> disambig_syms_
private

Definition at line 264 of file grammar-context-fst.h.

Referenced by InverseLeftBiphoneContextFst::GetArc().

◆ ilabel_info_

std::vector<std::vector<int32> > ilabel_info_
private

◆ ilabel_map_

VectorToLabelMap ilabel_map_
private

Definition at line 272 of file grammar-context-fst.h.

Referenced by InverseLeftBiphoneContextFst::FindLabel().

◆ nonterm_phones_offset_

int32 nonterm_phones_offset_
private

◆ phone_syms_


The documentation for this class was generated from the following files: