// fstext/push-special.cc

// Copyright 2012  Johns Hopkins University (author: Daniel Povey)
//                 2012  Ehsan Variani, Pegah Ghahrmani

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//  http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

#include "fstext/push-special.h"
#include "base/kaldi-error.h"
#include "base/kaldi-math.h"

namespace fst {


/*

 This algorithm was briefly described in "COMBINING FORWARD AND BACKWARD SEARCH
 IN DECODING" by Hannemann and Povey, ICASSP 2013,
 http://www.danielpovey.com/files/2013_icassp_pingpong.pdf

 Below is the most relevant excerpt of the LaTeX source.

Real backoff language models represented as WFSTs (\cite{Mohri:08}) will not
exactly sum to one because the backoff structure leads to duplicate paths for
some word sequences. In fact, such language models cannot be pushed at all in
the general case, because the total weight of the entire WFST may not be finite.
For our language model reversal we need a suitable pushing operation that will
always succeed.

Our solution is to require a modified pushing operation such that each state
``sums to'' the same quantity.
We were able to find an iterative algorithm that does this very efficiently in practice;
it is based on the power method for finding the top eigenvalue of a matrix.
Both for the math and the implementation, we find it more convenient to
use the probability semiring, i.e. we represent the transition-probabilities
as actual probabilities, not negative logs.
Let the transitions be written as a sparse matrix $\mathbf{P}$,
where $p_{ij}$ is the sum of all the probabilities of transitions between state $i$ and state $j$.
As a special case, if $j$ is the initial state,
then $p_{ij}$ is the final-probability of state $i$.
In our method we find the dominant eigenvector $\mathbf{v}$ of the matrix $\mathbf{P}$,
by starting from a random positive vector and iterating with the power method:
each time we let $\mathbf{v} \leftarrow \mathbf{P} \mathbf{v}$
and then renormalize the length of $\mathbf{v}$.
It is convenient to renormalize $\mathbf{v}$ so that $v_I$ is 1,
where $I$ is the initial state of the WFST\footnote{Note: in order to correctly
deal with the case of linear WFSTs, which have different eigenvalues
with the same magnitude but different complex phase,
we modify the iteration to $\mathbf{v} \leftarrow \mathbf{P} \mathbf{v} + 0.1 \mathbf{v}$.}.
This generally converges within several tens of iterations.
At the end we have a vector $\mathbf{v}$ with $v_I = 1$, and a scalar $\lambda > 0$, such that
\begin{equation}
  \lambda \mathbf{v} = \mathbf{P} \mathbf{v} . \label{eqn:lambdav}
\end{equation}
Suppose we compute a modified transition matrix $\mathbf{P}'$, by letting
\begin{equation}
  p'_{ij} = p_{ij} v_j / v_i .
\end{equation}
Then it is easy to show each row of $\mathbf{P}'$ sums to $\lambda$:
writing one element of Eq.~\ref{eqn:lambdav} as
\begin{equation}
  \lambda v_i = \sum_j p_{ij} v_j,
\end{equation}
it easily follows that $\lambda = \sum_j p'_{ij}$.
We need to perform a similar transformation on the transition-probabilities and
final-probabilities of the WFST; the details are quite obvious, and the
equivalence with the original WFST is easy to show. Our algorithm is in
practice an order of magnitude faster than the more generic algorithm for
conventional weight-pushing of \cite{Mohri:02}, when applied to cyclic WFSTs.

 */

class PushSpecialClass {
  typedef StdArc Arc;
  typedef Arc::Weight Weight;
  typedef Arc::StateId StateId;

 public:
  // Everything happens in the initializer.
  PushSpecialClass(VectorFst<StdArc> *fst,
                   float delta): fst_(fst) {
    num_states_ = fst_->NumStates();
    initial_state_ = fst_->Start();
    occ_.resize(num_states_, 1.0 / sqrt(num_states_));  // unit length

    pred_.resize(num_states_);
    for (StateId s = 0; s < num_states_; s++) {
      for (ArcIterator<VectorFst<StdArc> > aiter(*fst, s);
           !aiter.Done(); aiter.Next()) {
        const Arc &arc = aiter.Value();
        StateId t = arc.nextstate;
        double weight = kaldi::Exp(-arc.weight.Value());
        pred_[t].push_back(std::make_pair(s, weight));
      }
      double final = kaldi::Exp(-fst_->Final(s).Value());
      if (final != 0.0)
        pred_[initial_state_].push_back(std::make_pair(s, final));
    }
    Iterate(delta);
    ModifyFst();
  }
 private:
  double TestAccuracy() {  // returns the error (the difference
    // between the min and max weights).
    double min_sum = 0, max_sum = 0;
    for (StateId s = 0; s < num_states_; s++) {
      double sum = 0.0;
      for (ArcIterator<VectorFst<StdArc> > aiter(*fst_, s);
           !aiter.Done(); aiter.Next()) {
        const Arc &arc = aiter.Value();
        StateId t = arc.nextstate;
        sum += kaldi::Exp(-arc.weight.Value()) * occ_[t] / occ_[s];
      }
      sum += kaldi::Exp(-(fst_->Final(s).Value())) * occ_[initial_state_] / occ_[s];
      if (s == 0) {
        min_sum = sum;
        max_sum = sum;
      } else {
        min_sum = std::min(min_sum, sum);
        max_sum = std::max(max_sum, sum);
      }
    }
    KALDI_VLOG(4) << "min,max is " << min_sum << " " << max_sum;
    return kaldi::Log(max_sum / min_sum);  // In the FST world we're actually
    // dealing with logs, so the log of the ratio is more suitable
    // to compare with delta (makes testing the algorithm easier).
  }


  void Iterate(float delta) {
    // This is like the power method to find the top eigenvalue of a matrix.
    // We limit it to 200 iters max, just in case something unanticipated
    // happens, but we should exit due to the "delta" thing, usually after
    // several tens of iterations.
    int iter, max_iter = 200;

    for (iter = 0; iter < max_iter; iter++) {
      std::vector<double> new_occ(num_states_);
      // We initialize new_occ to 0.1 * occ.  A simpler algorithm would
      // initialize them to zero, so it's like the pure power method.  This is
      // like the power method on (M + 0.1 I), and we do it this way to avoid a
      // problem we encountered with certain very simple linear FSTs where the
      // eigenvalues of the weight matrix (including negative and imaginary
      // ones) all have the same magnitude.
      for (int i = 0; i < num_states_; i++)
        new_occ[i] = 0.1 * occ_[i];

      for (int i = 0; i < num_states_; i++) {
        std::vector<std::pair<StateId, double> >::const_iterator iter,
            end = pred_[i].end();
        for (iter = pred_[i].begin(); iter != end; ++iter) {
          StateId j = iter->first;
          double p = iter->second;
          new_occ[j] += occ_[i] * p;
        }
      }
      double sumsq = 0.0;
      for (int i = 0; i < num_states_; i++) sumsq += new_occ[i] * new_occ[i];
      lambda_ = std::sqrt(sumsq);
      double inv_lambda = 1.0 / lambda_;
      for (int i = 0; i < num_states_; i++) occ_[i] = new_occ[i] * inv_lambda;
      KALDI_VLOG(4) << "Lambda is " << lambda_;
      if (iter % 5 == 0 && iter > 0 && TestAccuracy() <= delta) {
        KALDI_VLOG(3) << "Weight-pushing converged after " << iter
                      << " iterations.";
        return;
      }
    }
    KALDI_WARN << "push-special: finished " << iter
               << " iterations without converging.  Output will be inaccurate.";
  }


  // Modifies the FST weights and the final-prob to take account of these potentials.
  void ModifyFst() {
    // First get the potentials as negative-logs, like the values
    // in the FST.
    for (StateId s = 0; s < num_states_; s++) {
      occ_[s] = -kaldi::Log(occ_[s]);
      if (KALDI_ISNAN(occ_[s]) || KALDI_ISINF(occ_[s]))
        KALDI_WARN << "NaN or inf found: " << occ_[s];
    }
    for (StateId s = 0; s < num_states_; s++) {
      for (MutableArcIterator<VectorFst<StdArc> > aiter(fst_, s);
           !aiter.Done(); aiter.Next()) {
        Arc arc = aiter.Value();
        StateId t = arc.nextstate;
        arc.weight = Weight(arc.weight.Value() + occ_[t] - occ_[s]);
        aiter.SetValue(arc);
      }
      fst_->SetFinal(s, Times(fst_->Final(s).Value(),
                              Weight(occ_[initial_state_] - occ_[s])));
    }
  }

 private:
  StateId num_states_;
  StateId initial_state_;
  std::vector<double> occ_;  // the top eigenvector of (matrix of weights) transposed.
  double lambda_;  // our current estimate of the top eigenvalue.

  std::vector<std::vector<std::pair<StateId, double> > > pred_;  // List of transitions
  // into each state.  For the start state, this list consists of the list of
  // states with final-probs, each with their final prob.

  VectorFst<StdArc> *fst_;

};



void PushSpecial(VectorFst<StdArc> *fst, float delta) {
  if (fst->NumStates() > 0)
    PushSpecialClass c(fst, delta);  // all the work
                                     // gets done in the initializer.
}


}  // end namespace fst.

/*
  Note: in testing an earlier, simpler
  version of this method (without the 0.1 * old_occ) we had a problem with the following FST.
0  2  3  3  0
1  3  1  4  0.5
2  1  0  0  0.5
3  0.25

  Corresponds to the following matrix [or maybe its transpose, doesn't matter
  probably]

  a=exp(-0.5)
  b=exp(-0.25)
M = [ 0 1 0 0
      0 0 a 0
      0 0 0 a
      b 0 0 0 ]

eigs(M)

ans =

  -0.0000 - 0.7316i
  -0.0000 + 0.7316i
   0.7316
  -0.7316

  OK, so the issue appears to be that all the eigenvalues of this matrix
  have the same magnitude.  The solution is to work with the eigenvalues
  of M + alpha I, for some small alpha such as 0.1 (so as not to slow down
  convergence in the normal case).

*/