doc/determinize-lattice-pruned_8h_source.html

 // lat/determinize-lattice-pruned.h

 // Copyright 2009-2012  Microsoft Corporation
 //           2012-2013  Johns Hopkins University (Author: Daniel Povey)
 //                2014  Guoguo Chen

 // See ../../COPYING for clarification regarding multiple authors
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
 // You may obtain a copy of the License at
 //
 //  http://www.apache.org/licenses/LICENSE-2.0
 //
 // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
 // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
 // MERCHANTABLITY OR NON-INFRINGEMENT.
 // See the Apache 2 License for the specific language governing permissions and
 // limitations under the License.

 #ifndef KALDI_LAT_DETERMINIZE_LATTICE_PRUNED_H_
 #define KALDI_LAT_DETERMINIZE_LATTICE_PRUNED_H_
 #include <fst/fstlib.h>
 #include <fst/fst-decl.h>
 #include <algorithm>
 #include <map>
 #include <set>
 #include <vector>
 #include "fstext/lattice-weight.h"
 #include "hmm/transition-model.h"
 #include "itf/options-itf.h"
 #include "lat/kaldi-lattice.h"

 namespace fst {


 // For an example of usage, see test-determinize-lattice-pruned.cc

 /*
    DeterminizeLatticePruned implements a special form of determinization with
    epsilon removal, optimized for a phase of lattice generation.  This algorithm
    also does pruning at the same time-- the combination is more efficient as it
    sometimes prevents us from creating a lot of states that would later be pruned
    away.  This allows us to increase the lattice-beam and not have the algorithm
    blow up.  Also, because our algorithm processes states in order from those
    that appear on high-scoring paths down to those that appear on low-scoring
    paths, we can easily terminate the algorithm after a certain specified number
    of states or arcs.
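
    As a rough illustration of the best-first processing order described above,
    the following is a minimal sketch, not the actual implementation.  The names
    PrunedResult and ProcessBestFirst are hypothetical, and each "task" is
    reduced to a single cost (lower cost meaning a higher-scoring path).

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch: tasks are reduced to their best-path costs.
struct PrunedResult {
  std::vector<double> processed;  // costs in the order they were expanded
  bool reached_beam;              // false if max_states stopped us early
};

PrunedResult ProcessBestFirst(const std::vector<double> &task_costs,
                              double beam, int max_states) {
  // Min-heap: the lowest-cost (highest-scoring) task is popped first.
  std::priority_queue<double, std::vector<double>, std::greater<double> >
      queue(task_costs.begin(), task_costs.end());
  PrunedResult result;
  result.reached_beam = true;
  if (queue.empty()) return result;
  const double cutoff = queue.top() + beam;
  while (!queue.empty() && queue.top() <= cutoff) {
    if (max_states > 0 &&
        static_cast<int>(result.processed.size()) >= max_states) {
      result.reached_beam = false;  // stopped by the limit, not by the beam
      break;
    }
    result.processed.push_back(queue.top());
    queue.pop();
  }
  return result;
}
```

    Because the best tasks are expanded first, stopping at a state limit only
    drops the lowest-scoring part of the output; the boolean mirrors the idea of
    reporting that the requested beam was not reached.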

    The input is an FST with weight-type BaseWeightType (usually a pair of floats,
    with a lexicographical type of order, such as LatticeWeightTpl<float>).
    Typically this would be a state-level lattice, with input symbols equal to
    words, and output-symbols equal to p.d.f.'s (so like the inverse of HCLG).
    Imagine representing this as an
    acceptor of type CompactLatticeWeightTpl<float>, in which the input/output
    symbols are words, and the weights contain the original weights together with
    strings (with zero or one symbol in them) containing the original output labels
    (the p.d.f.'s).  We determinize this using acceptor determinization with
    epsilon removal.  Remember (from lattice-weight.h) that
    CompactLatticeWeightTpl has a special kind of semiring where we always take
    the string corresponding to the best cost (of type BaseWeightType), and
    discard the other.  This corresponds to taking the best output-label sequence
    (of p.d.f.'s) for each input-label sequence (of words).  We couldn't use the
    Gallic weight for this, or it would die as soon as it detected that the input
    FST was non-functional.  In our case, any acyclic FST (and many cyclic ones)
    can be determinized.
    We assume that there is a function
       Compare(const BaseWeightType &a, const BaseWeightType &b)
    that returns (-1, 0, 1) according to whether (a < b, a == b, a > b) in the
    total order on the BaseWeightType... this information should be the
    same as NaturalLess would give, but it's more efficient to do it this way.
    You can define this for things like TropicalWeight if you need to instantiate
    this class for that weight type.
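
    A minimal sketch of such a Compare function, together with the "keep the
    string of the best weight" behaviour of the compact semiring described
    earlier.  The types PairWeightSketch and CompactWeightSketch are
    hypothetical, and the particular order used (lower total cost ranks higher,
    ties broken on the first value) is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a lattice weight: a pair of float costs.
struct PairWeightSketch {
  float value1;
  float value2;
};

// Returns -1, 0 or 1 according to whether a < b, a == b or a > b.
// Assumed order for this sketch: lower total cost ranks higher; ties are
// broken on value1.
int Compare(const PairWeightSketch &a, const PairWeightSketch &b) {
  float fa = a.value1 + a.value2, fb = b.value1 + b.value2;
  if (fa < fb) return 1;
  if (fa > fb) return -1;
  if (a.value1 < b.value1) return 1;
  if (a.value1 > b.value1) return -1;
  return 0;
}

// Hypothetical compact weight: a base weight plus a label string.
struct CompactWeightSketch {
  PairWeightSketch weight;
  std::vector<int> string;
};

// Semiring "plus": keep the better weight together with its string and
// discard the other string entirely.
CompactWeightSketch Plus(const CompactWeightSketch &a,
                         const CompactWeightSketch &b) {
  return Compare(a.weight, b.weight) >= 0 ? a : b;
}
```

    A single three-way comparison lets Plus decide which operand wins without
    evaluating a "less than" predicate twice.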

    We implement this determinization in a special way to make it efficient for
    the types of FSTs that we will apply it to.  One issue is that if we
    explicitly represent the strings (in CompactLatticeWeightTpl) as vectors of
    type vector<IntType>, the algorithm takes time quadratic in the length of
    words (in states), because propagating each arc involves copying a whole
    vector (of integers representing p.d.f.'s).  Instead we use a hash structure
    where each string is a pointer (Entry*), and use a hash from (Entry*,
    IntType) to the successor string (and a way to get the latest IntType and the
    ancestor Entry*).  [this is the class LatticeStringRepository].
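
    The idea can be sketched as follows.  This is an illustrative toy, not the
    real LatticeStringRepository: the class name StringRepositorySketch, the
    use of std::unordered_map and the hash function are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch of a shared-prefix string store.
struct Entry {
  const Entry *parent;  // ancestor string; nullptr means the empty string
  int label;            // last symbol of this string
};

class StringRepositorySketch {
 private:
  typedef std::pair<const Entry*, int> Key;
  struct KeyHash {
    std::size_t operator()(const Key &k) const {
      return std::hash<const Entry*>()(k.first) * 31 +
             std::hash<int>()(k.second);
    }
  };

 public:
  // Returns the unique Entry for "parent followed by label", creating it on
  // first use.  Extending a string never copies its prefix.
  const Entry *Successor(const Entry *parent, int label) {
    Key key(parent, label);
    auto it = map_.find(key);
    if (it != map_.end()) return it->second;
    storage_.push_back(std::unique_ptr<Entry>(new Entry{parent, label}));
    const Entry *e = storage_.back().get();
    map_.emplace(key, e);
    return e;
  }

  // Expands an Entry into an explicit label sequence (linear in its length).
  static std::vector<int> ToVector(const Entry *e) {
    std::vector<int> reversed;
    for (; e != nullptr; e = e->parent) reversed.push_back(e->label);
    return std::vector<int>(reversed.rbegin(), reversed.rend());
  }

 private:
  std::unordered_map<Key, const Entry*, KeyHash> map_;
  std::vector<std::unique_ptr<Entry> > storage_;
};
```

    Since equal strings map to the same pointer, comparing two strings is a
    pointer comparison, and the full vector is only materialized when needed.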

    Another issue is that rather than representing a determinized-state as a
    collection of (state, weight), we represent it in a couple of reduced forms.
    Suppose a determinized-state is a collection of (state, weight) pairs; call
    this the "canonical representation".  Note: these collections are always
    normalized to remove any common weight and string part.  Define end-states as
    the subset of states that have an arc out of them with a label on, or are
    final.  If we represent a determinized-state as the set of just its (end-state,
    weight) pairs, this will be a valid and more compact representation, and will
    lead to a smaller set of determinized states (like early minimization).  Call
    this collection of (end-state, weight) pairs the "minimal representation".  As
    a mechanism to reduce compute, we can also consider another representation.
    In the determinization algorithm, we start off with a set of (begin-state,
    weight) pairs (where the "begin-states" are initial or have a label on the
    transition into them), and the "canonical representation" consists of the
    epsilon-closure of this set (i.e. follow epsilons).  Call this set of
    (begin-state, weight) pairs, appropriately normalized, the "initial
    representation".  If two initial representations are the same, the "canonical
    representation" and hence the "minimal representation" will be the same.  We
    can use this to reduce compute.  Note that if two initial representations are
    different, this does not preclude the other representations from being the same.
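
    The three representations can be illustrated with a toy example.  This is
    only a sketch under simplifying assumptions: weights are plain double costs,
    strings are ignored, label 0 stands for epsilon, there are no negative-cost
    epsilon loops, and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <vector>

// Hypothetical toy types; label 0 stands for epsilon.
struct ArcSketch { int label; double cost; int next; };
struct FstSketch {
  std::vector<std::vector<ArcSketch> > arcs;  // arcs[s] = arcs leaving s
  std::vector<bool> is_final;
};
typedef std::map<int, double> Subset;  // state -> best cost

// "Canonical representation": follow epsilon arcs from the initial subset,
// keeping the lowest cost seen for each state.
Subset EpsilonClosure(const FstSketch &fst, const Subset &initial) {
  Subset closure = initial;
  std::queue<int> todo;
  for (Subset::const_iterator it = initial.begin(); it != initial.end(); ++it)
    todo.push(it->first);
  while (!todo.empty()) {
    int s = todo.front();
    todo.pop();
    double cost_s = closure[s];
    for (std::size_t i = 0; i < fst.arcs[s].size(); ++i) {
      const ArcSketch &arc = fst.arcs[s][i];
      if (arc.label != 0) continue;  // only epsilon arcs are followed
      double c = cost_s + arc.cost;
      Subset::iterator it = closure.find(arc.next);
      if (it == closure.end() || c < it->second) {
        closure[arc.next] = c;
        todo.push(arc.next);
      }
    }
  }
  return closure;
}

// "Minimal representation": keep only end-states, i.e. states that are final
// or have at least one outgoing arc with a (non-epsilon) label.
Subset MinimalRepresentation(const FstSketch &fst, const Subset &closure) {
  Subset out;
  for (Subset::const_iterator it = closure.begin(); it != closure.end(); ++it) {
    int s = it->first;
    bool is_end = fst.is_final[s];
    for (std::size_t i = 0; i < fst.arcs[s].size(); ++i)
      if (fst.arcs[s][i].label != 0) is_end = true;
    if (is_end) out[s] = it->second;
  }
  return out;
}

// Removes the common weight by subtracting the lowest cost in the subset.
Subset Normalize(Subset subset) {
  if (subset.empty()) return subset;
  double best = subset.begin()->second;
  for (Subset::const_iterator it = subset.begin(); it != subset.end(); ++it)
    if (it->second < best) best = it->second;
  for (Subset::iterator it = subset.begin(); it != subset.end(); ++it)
    it->second -= best;
  return subset;
}
```

    Two different initial subsets whose epsilon closures reach the same
    end-states with the same relative costs yield the same normalized minimal
    representation, which is the situation the last sentence above refers to.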

 */


 struct DeterminizeLatticePrunedOptions {
   float delta; // A small offset used to measure equality of weights.
   int max_mem; // If >0, determinization will fail and return false
   // when the algorithm's (approximate) memory consumption crosses this threshold.
   int max_loop; // If >0, can be used to detect non-determinizable input
   // (a case that wouldn't be caught by max_mem).
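   int max_states; // If >0, maximum number of states in the output FST (total).
   int max_arcs; // If >0, maximum number of arcs in the output FST (total,
   // not per state).
   float retry_cutoff; // Controls pruning the un-determinized lattice and
   // retrying: if effective-beam < retry_cutoff * beam, the raw lattice is
   // pruned and determinization is retried.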
   DeterminizeLatticePrunedOptions(): delta(kDelta),
                                      max_mem(-1),
                                      max_loop(-1),
                                      max_states(-1),
                                      max_arcs(-1),
                                      retry_cutoff(0.5) { }
   void Register (kaldi::OptionsItf *opts) {
     opts->Register("delta", &delta, "Tolerance used in determinization");
     opts->Register("max-mem", &max_mem, "Maximum approximate memory usage in "
                    "determinization (real usage might be many times this)");
     opts->Register("max-arcs", &max_arcs, "Maximum number of arcs in "
                    "output FST (total, not per state)");
     opts->Register("max-states", &max_states, "Maximum number of states in "
                    "output FST (total)");
     opts->Register("max-loop", &max_loop, "Option used to detect a particular "
                    "type of determinization failure, typically due to invalid input "
                    "(e.g., negative-cost loops)");
     opts->Register("retry-cutoff", &retry_cutoff, "Controls pruning un-determinized "
                    "lattice and retrying determinization: if effective-beam < "
                    "retry-cutoff * beam, we prune the raw lattice and retry.  Avoids "
                    "ever getting empty output for long segments.");
   }
 };

 struct DeterminizeLatticePhonePrunedOptions {
   // delta: a small offset used to measure equality of weights.
   float delta;
   // max_mem: if > 0, determinization will fail and return false when the
   // algorithm's (approximate) memory consumption crosses this threshold.
   int max_mem;
   // phone_determinize: if true, do a first pass determinization on both phones
   // and words.
   bool phone_determinize;
   // word_determinize: if true, do a second pass determinization on words only.
   bool word_determinize;
   // minimize: if true, push and minimize after determinization.
   bool minimize;
   DeterminizeLatticePhonePrunedOptions(): delta(kDelta),
                                           max_mem(50000000),
                                           phone_determinize(true),
                                           word_determinize(true),
                                           minimize(false) {}
   void Register (kaldi::OptionsItf *opts) {
     opts->Register("delta", &delta, "Tolerance used in determinization");
     opts->Register("max-mem", &max_mem, "Maximum approximate memory usage in "
                    "determinization (real usage might be many times this).");
     opts->Register("phone-determinize", &phone_determinize, "If true, do an "
                    "initial pass of determinization on both phones and words (see"
                    " also --word-determinize)");
     opts->Register("word-determinize", &word_determinize, "If true, do a second "
                    "pass of determinization on words only (see also "
                    "--phone-determinize)");
     opts->Register("minimize", &minimize, "If true, push and minimize after "
                    "determinization.");
   }
 };

 template<class Weight>
 bool DeterminizeLatticePruned(
     const ExpandedFst<ArcTpl<Weight> > &ifst,
     double prune,
     MutableFst<ArcTpl<Weight> > *ofst,
     DeterminizeLatticePrunedOptions opts = DeterminizeLatticePrunedOptions());


 /*  This is a version of DeterminizeLattice with a slightly more "natural" output format,
     where the output sequences are encoded using the CompactLatticeArcTpl template
     (i.e. the sequences of output symbols are represented directly as strings).  The input
     FST must be topologically sorted in order for the algorithm to work. For efficiency
     it is recommended to sort the ilabel for the input FST as well.
     Returns true on normal success, and false if it had to terminate the determinization
     earlier than specified by the "prune" beam-- that is, if it terminated because
     of the max_mem, max_loop or max_arcs constraints in the options.
     CAUTION: if Lattice is the input, you need to Invert() before calling this,
     so words are on the input side.
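
     The two preconditions on the input (topologically sorted states, and
     preferably ilabel-sorted arcs) can be illustrated with the checks below.
     This is a sketch on a hypothetical toy arc type (ToyArc), not on OpenFst
     types; it assumes that "topologically sorted" means every arc leads to a
     higher-numbered state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy arc: only the fields these checks need.
struct ToyArc {
  int ilabel;
  int nextstate;
};

// True if every arc leads to a higher-numbered state, i.e. the state
// numbering is already a topological order.
bool IsTopSortedSketch(const std::vector<std::vector<ToyArc> > &arcs) {
  for (std::size_t s = 0; s < arcs.size(); ++s)
    for (std::size_t i = 0; i < arcs[s].size(); ++i)
      if (arcs[s][i].nextstate <= static_cast<int>(s)) return false;
  return true;
}

// True if, within each state, arcs appear in non-decreasing ilabel order.
bool IsILabelSortedSketch(const std::vector<std::vector<ToyArc> > &arcs) {
  for (std::size_t s = 0; s < arcs.size(); ++s)
    for (std::size_t i = 1; i < arcs[s].size(); ++i)
      if (arcs[s][i].ilabel < arcs[s][i - 1].ilabel) return false;
  return true;
}
```

     With OpenFst types, the corresponding operations are TopSort() and
     ArcSort() with ILabelCompare, applied before calling this function.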
 */
 template<class Weight, class IntType>
 bool DeterminizeLatticePruned(
     const ExpandedFst<ArcTpl<Weight> >&ifst,
     double prune,
     MutableFst<ArcTpl<CompactLatticeWeightTpl<Weight, IntType> > > *ofst,
     DeterminizeLatticePrunedOptions opts = DeterminizeLatticePrunedOptions());

 // This function takes in lattices and inserts phones at phone boundaries.
 template<class Weight>
 typename ArcTpl<Weight>::Label DeterminizeLatticeInsertPhones(
     const kaldi::TransitionModel &trans_model,
     MutableFst<ArcTpl<Weight> > *fst);

 // This function takes in lattices and deletes "phones" from them.
 template<class Weight>
 void DeterminizeLatticeDeletePhones(
     typename ArcTpl<Weight>::Label first_phone_label,
     MutableFst<ArcTpl<Weight> > *fst);

 template<class Weight, class IntType>
 bool DeterminizeLatticePhonePruned(
     const kaldi::TransitionModel &trans_model,
     const ExpandedFst<ArcTpl<Weight> > &ifst,
     double prune,
     MutableFst<ArcTpl<CompactLatticeWeightTpl<Weight, IntType> > > *ofst,
     DeterminizeLatticePhonePrunedOptions opts
       = DeterminizeLatticePhonePrunedOptions());

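 // "Destructive" version of DeterminizeLatticePhonePruned(), where the input
 // lattice might be changed.
 template<class Weight, class IntType>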
 bool DeterminizeLatticePhonePruned(
     const kaldi::TransitionModel &trans_model,
     MutableFst<ArcTpl<Weight> > *ifst,
     double prune,
     MutableFst<ArcTpl<CompactLatticeWeightTpl<Weight, IntType> > > *ofst,
     DeterminizeLatticePhonePrunedOptions opts
       = DeterminizeLatticePhonePrunedOptions());

 // Wrapper of DeterminizeLatticePhonePruned() that works for Lattice type FSTs.
 bool DeterminizeLatticePhonePrunedWrapper(
     const kaldi::TransitionModel &trans_model,
     MutableFst<kaldi::LatticeArc> *ifst,
     double prune,
     MutableFst<kaldi::CompactLatticeArc> *ofst,
     DeterminizeLatticePhonePrunedOptions opts
       = DeterminizeLatticePhonePrunedOptions());


 } // end namespace fst

 #endif