This class extracts some information from the lexicon and stores it in a suitable form for the word-alignment code to use.

#include <word-align-lattice-lexicon.h>

Public Member Functions
	WordAlignLatticeLexiconInfo (const std::vector< std::vector< int32 > > &lexicon)

bool	IsValidEntry (const std::vector< int32 > &entry) const
	Returns true if this lexicon-entry can appear, interpreted as (output-word phone1 phone2 ...).

int32	EquivalenceClassOf (int32 word) const
	Purely for the testing code, we map words into equivalence classes derived from the mappings in the first two fields of each line in the lexicon.

Protected Types
typedef unordered_map< std::vector< int32 >, std::vector< int32 >, VectorHasher< int32 > >	ViabilityMap
	The type ViabilityMap maps from a sequence of phones s (excluding the empty sequence) to the set of all word-labels [on the input lattice] that could correspond to phone sequences that start with s [but are longer than s].

typedef unordered_map< std::vector< int32 >, int32, VectorHasher< int32 > >	LexiconMap
	This is a map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol.

typedef unordered_map< int32, std::pair< int32, int32 > >	NumPhonesMap
	This is a map from the word-id (as present in the original lattice) to the minimum and maximum #phones of lexicon entries for that word.

typedef unordered_map< int32, int32 >	EquivalenceMap
	This is used only in testing code; it defines a mapping from a word to the primary member of that word's equivalence-class.

Protected Member Functions
void	UpdateViabilityMap (const std::vector< int32 > &lexicon_entry)

void	UpdateLexiconMap (const std::vector< int32 > &lexicon_entry)
	Update the map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol.

void	UpdateNumPhonesMap (const std::vector< int32 > &lexicon_entry)

void	UpdateEquivalenceMap (const std::vector< std::vector< int32 > > &lexicon)

void	FinalizeViabilityMap ()

Protected Attributes
LexiconMap	lexicon_map_

NumPhonesMap	num_phones_map_

ViabilityMap	viability_map_

LexiconMap	reverse_lexicon_map_

EquivalenceMap	equivalence_map_

Friends
class	LatticeLexiconWordAligner

Detailed Description

This class extracts some information from the lexicon and stores it in a suitable form for the word-alignment code to use.

Definition at line 56 of file word-align-lattice-lexicon.h.

Member Typedef Documentation

◆ EquivalenceMap

typedef unordered_map<int32, int32> EquivalenceMap

protected

This is used only in testing code; it defines a mapping from a word to the primary member of that word's equivalence-class.

Definition at line 101 of file word-align-lattice-lexicon.h.

◆ LexiconMap

typedef unordered_map<std::vector<int32>, int32, VectorHasher<int32> > LexiconMap

protected

This is a map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol. [todo: make sure the new word-symbol is always nonzero.]

Definition at line 92 of file word-align-lattice-lexicon.h.
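The key and value layout described above can be sketched in isolation as follows. This is only an illustration, not the Kaldi code: `MakeLexiconKey`, `MakeLexiconValue` and `kTemporaryEpsilonSketch` are hypothetical names, and the sketch assumes every entry has at least two fields, laid out as (orig-word new-word phone1 phone2 ...).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef int32_t int32;

// Stand-in for kTemporaryEpsilon (-2), which replaces a zero new-word.
const int32 kTemporaryEpsilonSketch = -2;

// Hypothetical helper: the key is (orig-word phone1 phone2 ...), i.e. the
// lexicon entry with its second field (the new word) dropped.
// Assumes entry.size() >= 2.
std::vector<int32> MakeLexiconKey(const std::vector<int32> &entry) {
  std::vector<int32> key;
  key.reserve(entry.size() - 1);
  key.push_back(entry[0]);                                // original word
  key.insert(key.end(), entry.begin() + 2, entry.end());  // phones
  return key;
}

// Hypothetical helper: the value stored under the key is the new word,
// with zero replaced by the temporary epsilon so it is never zero.
int32 MakeLexiconValue(const std::vector<int32> &entry) {
  return entry[1] == 0 ? kTemporaryEpsilonSketch : entry[1];
}
```

For a made-up entry (4 0 10 11), the key is (4 10 11) and the stored value is -2; for (4 4 10), the key is (4 10) and the stored value is 4.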

◆ NumPhonesMap

typedef unordered_map<int32, std::pair<int32, int32> > NumPhonesMap

protected

This is a map from the word-id (as present in the original lattice) to the minimum and maximum #phones of lexicon entries for that word.

It helps improve efficiency.

Definition at line 97 of file word-align-lattice-lexicon.h.
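A minimal sketch of the min/max bookkeeping, under the assumption that each lexicon entry is laid out as (orig-word new-word phone1 phone2 ...). `AddEntry` and `NumPhonesMapSketch` are hypothetical names used only for this illustration, and the error check on an empty pronunciation for the zero word is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

typedef int32_t int32;
typedef std::unordered_map<int32, std::pair<int32, int32> > NumPhonesMapSketch;

// Hypothetical helper: records the phone count of one lexicon entry under
// its original word-id, keeping (min, max) over all entries for that word.
// Assumes entry.size() >= 2.
void AddEntry(const std::vector<int32> &entry, NumPhonesMapSketch *map) {
  int32 num_phones = static_cast<int32>(entry.size()) - 2;
  int32 word = entry[0];
  NumPhonesMapSketch::iterator it = map->find(word);
  if (it == map->end()) {
    // First pronunciation seen for this word.
    (*map)[word] = std::make_pair(num_phones, num_phones);
  } else {
    it->second.first = std::min(it->second.first, num_phones);
    it->second.second = std::max(it->second.second, num_phones);
  }
}
```

With the made-up entries (5 5 1 2), (5 5 1 2 3 4) and (6 6 8), word 5 maps to the pair (2, 4) and word 6 maps to (1, 1).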

◆ ViabilityMap

typedef unordered_map<std::vector<int32>, std::vector<int32>, VectorHasher<int32> > ViabilityMap

protected

The type ViabilityMap maps from a sequence of phones s (excluding the empty sequence) to the set of all word-labels [on the input lattice] that could correspond to phone sequences that start with s [but are longer than s].

The sets of word-labels are represented as sorted vectors of int32. Note: the zero word-label is included here. This is used in a kind of co-accessibility test, to see whether it is worth extending this state by traversing arcs in the input lattice.

Definition at line 87 of file word-align-lattice-lexicon.h.
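As a rough illustration of this mapping, the standalone sketch below builds a viability map from a tiny lexicon. It is not the Kaldi implementation: `BuildViabilityMap` and `ViabilityMapSketch` are hypothetical names, `std::map` stands in for `unordered_map` with `VectorHasher` so that no custom hasher is needed, and the word and phone ids are made up. Entries are assumed to be laid out as (orig-word new-word phone1 phone2 ...).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef int32_t int32;
// Assumption: an ordered map replaces the hashed map used by the class.
typedef std::map<std::vector<int32>, std::vector<int32> > ViabilityMapSketch;

// Hypothetical helper. Assumes every entry has at least two fields.
ViabilityMapSketch BuildViabilityMap(
    const std::vector<std::vector<int32> > &lexicon) {
  ViabilityMapSketch viability;
  for (std::size_t i = 0; i < lexicon.size(); i++) {
    const std::vector<int32> &entry = lexicon[i];
    int32 word = entry[0];  // may be zero
    int32 num_phones = static_cast<int32>(entry.size()) - 2;
    std::vector<int32> prefix;
    // Visit every nonempty strict prefix of the phone sequence.
    for (int32 n = 0; n < num_phones - 1; n++) {
      prefix.push_back(entry[n + 2]);  // first phone is at position 2
      viability[prefix].push_back(word);
    }
  }
  // Turn each word list into a sorted vector without duplicates.
  for (ViabilityMapSketch::iterator it = viability.begin();
       it != viability.end(); ++it) {
    std::vector<int32> &words = it->second;
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
  }
  return viability;
}
```

With the made-up lexicon (7 7 1 2 3) and (9 9 1 2), the prefix (1) maps to {7, 9}, while the prefix (1 2) maps to {7} only, because (1 2) is the full pronunciation of word 9 rather than a strict prefix of it.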

Constructor & Destructor Documentation

◆ WordAlignLatticeLexiconInfo()

WordAlignLatticeLexiconInfo ( const std::vector< std::vector< int32 > > & lexicon )

Definition at line 896 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT.

                                                  {
   for (size_t i = 0; i < lexicon.size(); i++) {
     const std::vector<int32> &lexicon_entry = lexicon[i];
     KALDI_ASSERT(lexicon_entry.size() >= 2);
     UpdateViabilityMap(lexicon_entry);
     UpdateLexiconMap(lexicon_entry);
     UpdateNumPhonesMap(lexicon_entry);
   }
   FinalizeViabilityMap();
   UpdateEquivalenceMap(lexicon);
 }

Member Function Documentation

◆ EquivalenceClassOf()

int32 EquivalenceClassOf ( int32 word ) const

Purely for the testing code, we map words into equivalence classes derived from the mappings in the first two fields of each line in the lexicon.

This function maps from each word-id to the lowest member of its equivalence class.

Definition at line 866 of file word-align-lattice-lexicon.cc.

Referenced by kaldi::MapSymbols().

                                                                       {
   unordered_map<int32, int32>::const_iterator iter =
       equivalence_map_.find(word);
   if (iter == equivalence_map_.end()) return word; // not in map.
   else return iter->second;
 }

◆ FinalizeViabilityMap()

void FinalizeViabilityMap ( )

protected

Definition at line 791 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT and kaldi::SortAndUniq().

                                                        {
   for (ViabilityMap::iterator iter = viability_map_.begin();
        iter != viability_map_.end();
        ++iter) {
     std::vector<int32> &words = iter->second;
     SortAndUniq(&words);
     KALDI_ASSERT(words[0] >= 0 && "Error: negative labels in lexicon.");
   }
 }

◆ IsValidEntry()

bool IsValidEntry ( const std::vector< int32 > & entry ) const

Returns true if this lexicon-entry can appear, interpreted as (output-word phone1 phone2 ...).

The entry contains new-word-id phone1 phone2 ..., which is equivalent to all but the first field on a line of the input file. This is just used in testing code.

Definition at line 853 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT.

Referenced by kaldi::IsPlausibleWord().

                                                                                   {
   KALDI_ASSERT(!entry.empty());
   LexiconMap::const_iterator iter = lexicon_map_.find(entry);
   if (iter != lexicon_map_.end()) {
     int32 tgt_word = (iter->second == kTemporaryEpsilon ? 0 : iter->second);
     if (tgt_word == entry[0]) return true; // symmetric entry.
     // this means that there would be an output-word with this
     // value, and this sequence of phones.
   }
   // For entries that were not symmetric:
   return (reverse_lexicon_map_.count(entry) != 0);
 }

◆ UpdateEquivalenceMap()

void UpdateEquivalenceMap ( const std::vector< std::vector< int32 > > & lexicon )

protected

Definition at line 873 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT and kaldi::SortAndUniq().

                                                  {
   std::vector<std::pair<int32, int32> > equiv_pairs; // pairs of
   // (lower,higher) words that are equivalent.
   for (size_t i = 0; i < lexicon.size(); i++) {
     KALDI_ASSERT(lexicon[i].size() >= 2);
     int32 w1 = lexicon[i][0], w2 = lexicon[i][1];
     if (w1 == w2) continue; // They are the same; this provides no information
                             // about equivalence, since any word is equivalent
                             // to itself.
     if (w1 > w2) std::swap(w1, w2); // make sure w1 < w2.
     equiv_pairs.push_back(std::make_pair(w1, w2));
   }
   SortAndUniq(&equiv_pairs);
   equivalence_map_.clear();
   for (size_t i = 0; i < equiv_pairs.size(); i++) {
     int32 w1 = equiv_pairs[i].first, w2 = equiv_pairs[i].second,
         w1dash = EquivalenceClassOf(w1);
     equivalence_map_[w2] = w1dash;
   }
 }
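To make the effect of the second loop concrete, here is a standalone sketch of the same idea with a small worked example. `BuildEquivalenceMap`, `ClassOf` and `EquivalenceMapSketch` are hypothetical names, and `std::sort` plus `std::unique` stand in for `SortAndUniq`; the word ids are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

typedef int32_t int32;
typedef std::unordered_map<int32, int32> EquivalenceMapSketch;

// Hypothetical lookup: a word that is not in the map is its own class.
int32 ClassOf(const EquivalenceMapSketch &map, int32 word) {
  EquivalenceMapSketch::const_iterator it = map.find(word);
  return it == map.end() ? word : it->second;
}

// Hypothetical builder. Assumes every entry has at least two fields,
// (orig-word new-word ...), which are treated as equivalent words.
EquivalenceMapSketch BuildEquivalenceMap(
    const std::vector<std::vector<int32> > &lexicon) {
  std::vector<std::pair<int32, int32> > pairs;  // (lower, higher) words
  for (std::size_t i = 0; i < lexicon.size(); i++) {
    int32 w1 = lexicon[i][0], w2 = lexicon[i][1];
    if (w1 == w2) continue;  // a word is trivially equivalent to itself
    if (w1 > w2) std::swap(w1, w2);
    pairs.push_back(std::make_pair(w1, w2));
  }
  std::sort(pairs.begin(), pairs.end());
  pairs.erase(std::unique(pairs.begin(), pairs.end()), pairs.end());
  EquivalenceMapSketch map;
  // Pairs are visited in sorted order, so the class of the lower word is
  // looked up before the higher word is assigned to it.
  for (std::size_t i = 0; i < pairs.size(); i++)
    map[pairs[i].second] = ClassOf(map, pairs[i].first);
  return map;
}
```

For the made-up entries (3 5 1), (5 8 1) and (2 2 1), the sorted pairs are (3, 5) and (5, 8), so words 5 and 8 both map to 3, while words 2 and 3 are absent from the map and are returned unchanged by the lookup.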

◆ UpdateLexiconMap()

void UpdateLexiconMap ( const std::vector< int32 > & lexicon_entry )

protected

Update the map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol. The new word-symbol must always be nonzero; we'll replace it with kTemporaryEpsilon = -2, if it was zero.

Definition at line 804 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT, KALDI_ERR, KALDI_WARN, and kaldi::kTemporaryEpsilon.

                                            {
   KALDI_ASSERT(lexicon_entry.size() >= 2);
   std::vector<int32> key;
   key.reserve(lexicon_entry.size() - 1);
   // add the original word:
   key.push_back(lexicon_entry[0]);
   // add the phones:
   key.insert(key.end(), lexicon_entry.begin() + 2, lexicon_entry.end());
   int32 new_word = lexicon_entry[1]; // This will typically be the same as
   // the original word at lexicon_entry[0] but is allowed to differ.
   if (new_word == 0) new_word = kTemporaryEpsilon; // replace 0's with -2;
   // we'll revert the change at the end.
   if (lexicon_map_.count(key) != 0) {
     if (lexicon_map_[key] == new_word)
       KALDI_WARN << "Duplicate entry in lexicon map for word " << lexicon_entry[0];
     else
       KALDI_ERR << "Duplicate entry in lexicon map for word " << lexicon_entry[0]
                 << " with inconsistent to-word.";
   }
   lexicon_map_[key] = new_word;
 
   if (lexicon_entry[0] != lexicon_entry[1]) {
     // Add reverse lexicon entry, this time with no 0 -> -2 mapping.
     key[0] = lexicon_entry[1];
     // Note: we ignore the situation where there are conflicting
     // entries in reverse_lexicon_map_, as we never actually inspect
     // the contents so it won't matter.
     reverse_lexicon_map_[key] = lexicon_entry[0];
   }
 }

◆ UpdateNumPhonesMap()

void UpdateNumPhonesMap ( const std::vector< int32 > & lexicon_entry )

protected

Definition at line 836 of file word-align-lattice-lexicon.cc.

References KALDI_ERR.

                                            {
   int32 num_phones = static_cast<int32>(lexicon_entry.size()) - 2;
   int32 word = lexicon_entry[0];
   if (num_phones_map_.count(word) == 0)
     num_phones_map_[word] = std::make_pair(num_phones, num_phones);
   else {
     std::pair<int32, int32> &pr = num_phones_map_[word];
     pr.first = std::min(pr.first, num_phones); // update min-num-phones
     pr.second = std::max(pr.second, num_phones); // update max-num-phones
     if (pr.first == 0 && word == 0)
       KALDI_ERR << "Zero word with empty pronunciation is not allowed.";
   }
 }

◆ UpdateViabilityMap()

void UpdateViabilityMap ( const std::vector< int32 > & lexicon_entry )

protected

Definition at line 774 of file word-align-lattice-lexicon.cc.


                                            {
   int32 word = lexicon_entry[0];  // note: word may be zero.
   int32 num_phones = static_cast<int32>(lexicon_entry.size()) - 2;
   std::vector<int32> phones;
   if (num_phones > 0)
     phones.reserve(num_phones - 1);
   // for each nonempty sequence of phones that is a strict prefix of the phones
   // in the lexicon entry (i.e. lexicon_entry [2 ... ]), add the word to the set
   // in viability_map_[phones].
   for (int32 n = 0; n < num_phones - 1; n++) {
     phones.push_back(lexicon_entry[n + 2]); // first phone is at position 2.
     // n+1 is the length of the sequence of phones
     viability_map_[phones].push_back(word);
   }
 }