WordAlignLatticeLexiconInfo Class Reference

This class extracts some information from the lexicon and stores it in a suitable form for the word-alignment code to use.

#include <word-align-lattice-lexicon.h>


Public Member Functions

 WordAlignLatticeLexiconInfo (const std::vector< std::vector< int32 > > &lexicon)
 
bool IsValidEntry (const std::vector< int32 > &entry) const
 Returns true if this lexicon-entry can appear, interpreted as (output-word phone1 phone2 ...).
 
int32 EquivalenceClassOf (int32 word) const
 Purely for the testing code, we map words into equivalence classes derived from the mappings in the first two fields of each line in the lexicon.
 

Protected Types

typedef unordered_map< std::vector< int32 >, std::vector< int32 >, VectorHasher< int32 > > ViabilityMap
 The type ViabilityMap maps each nonempty sequence of phones s to the set of all word-labels [on the input lattice] that could correspond to phone sequences that start with s [but are longer than s].
 
typedef unordered_map< std::vector< int32 >, int32, VectorHasher< int32 > > LexiconMap
 This is a map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol.
 
typedef unordered_map< int32, std::pair< int32, int32 > > NumPhonesMap
 This is a map from the word-id (as present in the original lattice) to the minimum and maximum #phones of lexicon entries for that word.
 
typedef unordered_map< int32, int32 > EquivalenceMap
 This is used only in testing code; it defines a mapping from a word to the primary member of that word's equivalence-class.
 

Protected Member Functions

void UpdateViabilityMap (const std::vector< int32 > &lexicon_entry)
 
void UpdateLexiconMap (const std::vector< int32 > &lexicon_entry)
 Update the map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol.
 
void UpdateNumPhonesMap (const std::vector< int32 > &lexicon_entry)
 
void UpdateEquivalenceMap (const std::vector< std::vector< int32 > > &lexicon)
 
void FinalizeViabilityMap ()
 

Protected Attributes

LexiconMap lexicon_map_
 
NumPhonesMap num_phones_map_
 
ViabilityMap viability_map_
 
LexiconMap reverse_lexicon_map_
 
EquivalenceMap equivalence_map_
 

Friends

class LatticeLexiconWordAligner
 

Detailed Description

This class extracts some information from the lexicon and stores it in a suitable form for the word-alignment code to use.

Definition at line 56 of file word-align-lattice-lexicon.h.

Member Typedef Documentation

◆ EquivalenceMap

typedef unordered_map<int32, int32> EquivalenceMap
protected

This is used only in testing code; it defines a mapping from a word to the primary member of that word's equivalence-class.

Definition at line 101 of file word-align-lattice-lexicon.h.

◆ LexiconMap

typedef unordered_map<std::vector<int32>, int32, VectorHasher<int32> > LexiconMap
protected

This is a map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol. [todo: make sure the new word-symbol is always nonzero.]

Definition at line 92 of file word-align-lattice-lexicon.h.
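A minimal standalone sketch of how a LexiconMap key is built from one lexicon line (orig-word new-word phone1 phone2 ...). It assumes plain C++ standard-library types instead of Kaldi's int32 and VectorHasher<int32>. The names PhoneSeqHasher, LexiconMapSketch, AddLexiconEntry and kTempEps are hypothetical. kTempEps mirrors the kTemporaryEpsilon = -2 convention documented for UpdateLexiconMap(). The Kaldi code warns on a consistent duplicate and calls KALDI_ERR on an inconsistent one; this sketch simply returns false in the inconsistent case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for Kaldi's VectorHasher<int32>.
struct PhoneSeqHasher {
  std::size_t operator()(const std::vector<int32_t> &v) const {
    std::size_t h = 0;
    for (int32_t x : v) h = h * 7853 + static_cast<std::size_t>(x);
    return h;
  }
};

// Key: (orig-word phone1 phone2 ...); value: new word-symbol.
using LexiconMapSketch =
    std::unordered_map<std::vector<int32_t>, int32_t, PhoneSeqHasher>;

// Mirrors kTemporaryEpsilon: a zero new word is stored as -2.
const int32_t kTempEps = -2;

// Adds one lexicon line (orig-word new-word phone1 phone2 ...).
// Returns false if the key already maps to a different new word.
bool AddLexiconEntry(const std::vector<int32_t> &entry,
                     LexiconMapSketch *lex_map) {
  assert(entry.size() >= 2);
  std::vector<int32_t> key;
  key.reserve(entry.size() - 1);
  key.push_back(entry[0]);                                // original word
  key.insert(key.end(), entry.begin() + 2, entry.end());  // the phones
  int32_t new_word = (entry[1] == 0 ? kTempEps : entry[1]);
  auto it = lex_map->find(key);
  if (it != lex_map->end() && it->second != new_word) return false;
  (*lex_map)[key] = new_word;
  return true;
}
```

Note that the new word (field 1) is not part of the key. Two lines that differ only in their new word therefore collide, which is why the inconsistent case is treated as an error.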

◆ NumPhonesMap

typedef unordered_map<int32, std::pair<int32, int32> > NumPhonesMap
protected

This is a map from the word-id (as present in the original lattice) to the minimum and maximum #phones of lexicon entries for that word.

It helps improve efficiency.

Definition at line 97 of file word-align-lattice-lexicon.h.
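A sketch of how the per-word minimum and maximum phone counts can be accumulated. It assumes standard C++ types, and the names NumPhonesMapSketch and UpdateMinMaxPhones are hypothetical. It uses the same rule as UpdateNumPhonesMap(): the phone count is the entry length minus the two word fields. It omits the Kaldi check that rejects a zero word with an empty pronunciation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// word-id -> (min #phones, max #phones) over its lexicon entries.
using NumPhonesMapSketch =
    std::unordered_map<int32_t, std::pair<int32_t, int32_t>>;

// Records the phone count of one lexicon line (orig-word new-word phones...).
void UpdateMinMaxPhones(const std::vector<int32_t> &entry,
                        NumPhonesMapSketch *num_phones_map) {
  assert(entry.size() >= 2);
  int32_t num_phones = static_cast<int32_t>(entry.size()) - 2;
  int32_t word = entry[0];
  auto it = num_phones_map->find(word);
  if (it == num_phones_map->end()) {
    (*num_phones_map)[word] = std::make_pair(num_phones, num_phones);
  } else {
    it->second.first = std::min(it->second.first, num_phones);    // min
    it->second.second = std::max(it->second.second, num_phones);  // max
  }
}
```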

◆ ViabilityMap

typedef unordered_map<std::vector<int32>, std::vector<int32>, VectorHasher<int32> > ViabilityMap
protected

The type ViabilityMap maps each nonempty sequence of phones s to the set of all word-labels [on the input lattice] that could correspond to phone sequences that start with s [but are longer than s].

The sets of word-labels are represented as sorted vectors of int32. Note: the zero word-label is included here. This is used in a kind of co-accessibility test, to see whether it is worth extending a state by traversing further arcs in the input lattice.

Definition at line 87 of file word-align-lattice-lexicon.h.
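A standalone sketch of the viability map, assuming standard C++ types in place of Kaldi's. AddToViability records the word under every nonempty strict prefix of its pronunciation, as UpdateViabilityMap() does. FinalizeViability sorts and de-duplicates each set, which is the role of FinalizeViabilityMap(). IsViable is a hypothetical query that illustrates the co-accessibility idea; it is not taken from LatticeLexiconWordAligner. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for Kaldi's VectorHasher<int32>.
struct PhoneSeqHasher {
  std::size_t operator()(const std::vector<int32_t> &v) const {
    std::size_t h = 0;
    for (int32_t x : v) h = h * 7853 + static_cast<std::size_t>(x);
    return h;
  }
};

// Phone prefix s -> words whose pronunciation starts with s and is longer.
using ViabilityMapSketch = std::unordered_map<std::vector<int32_t>,
                                              std::vector<int32_t>,
                                              PhoneSeqHasher>;

// Adds the word of one lexicon line (orig-word new-word phones...) under
// each nonempty strict prefix of its phones.
void AddToViability(const std::vector<int32_t> &entry,
                    ViabilityMapSketch *vmap) {
  assert(entry.size() >= 2);
  int32_t word = entry[0];
  int32_t num_phones = static_cast<int32_t>(entry.size()) - 2;
  std::vector<int32_t> prefix;
  for (int32_t n = 0; n < num_phones - 1; n++) {
    prefix.push_back(entry[n + 2]);  // prefix now has n + 1 phones
    (*vmap)[prefix].push_back(word);
  }
}

// Sorts and removes duplicates from every word set.
void FinalizeViability(ViabilityMapSketch *vmap) {
  for (auto &kv : *vmap) {
    std::vector<int32_t> &words = kv.second;
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
  }
}

// Could `word` still be completed by more phones after prefix s?
bool IsViable(const ViabilityMapSketch &vmap,
              const std::vector<int32_t> &s, int32_t word) {
  auto it = vmap.find(s);
  if (it == vmap.end()) return false;
  return std::binary_search(it->second.begin(), it->second.end(), word);
}
```

A full pronunciation is never a key, because only strict prefixes are recorded. That is why the query for {10, 11, 12} in the test fails even though word 4 is pronounced that way.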

Constructor & Destructor Documentation

◆ WordAlignLatticeLexiconInfo()

WordAlignLatticeLexiconInfo ( const std::vector< std::vector< int32 > > &  lexicon)

Definition at line 896 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT.

{
  for (size_t i = 0; i < lexicon.size(); i++) {
    const std::vector<int32> &lexicon_entry = lexicon[i];
    KALDI_ASSERT(lexicon_entry.size() >= 2);
    UpdateViabilityMap(lexicon_entry);
    UpdateLexiconMap(lexicon_entry);
    UpdateNumPhonesMap(lexicon_entry);
  }
  FinalizeViabilityMap();
  UpdateEquivalenceMap(lexicon);
}
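The constructor expects each inner vector to hold (orig-word new-word phone1 phone2 ...), with at least the two word fields. As an illustration of that layout only, here is a hypothetical parser that turns whitespace-separated integer lexicon lines into the vector-of-vectors form. ParseLexiconLines is not a Kaldi function. Where the constructor asserts size() >= 2, this sketch returns false instead of aborting.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parses lines of the form
//   orig-word new-word phone1 phone2 ...
// Blank lines are skipped. Returns false on a non-integer token or on a
// line with fewer than two fields.
bool ParseLexiconLines(const std::string &text,
                       std::vector<std::vector<int32_t>> *lexicon) {
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::vector<int32_t> entry;
    int32_t value;
    while (fields >> value) entry.push_back(value);
    if (!fields.eof()) return false;     // stopped on a non-integer token
    if (entry.empty()) continue;         // blank line
    if (entry.size() < 2) return false;  // the constructor asserts size() >= 2
    lexicon->push_back(entry);
  }
  return true;
}
```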

Member Function Documentation

◆ EquivalenceClassOf()

int32 EquivalenceClassOf ( int32  word) const

Purely for the testing code, we map words into equivalence classes derived from the mappings in the first two fields of each line in the lexicon.

This function maps from each word-id to the lowest member of its equivalence class.

Definition at line 866 of file word-align-lattice-lexicon.cc.

Referenced by kaldi::MapSymbols().

{
  unordered_map<int32, int32>::const_iterator iter =
      equivalence_map_.find(word);
  if (iter == equivalence_map_.end()) return word;  // not in map.
  else return iter->second;
}

◆ FinalizeViabilityMap()

void FinalizeViabilityMap ( )
protected

Definition at line 791 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT and kaldi::SortAndUniq().

{
  for (ViabilityMap::iterator iter = viability_map_.begin();
       iter != viability_map_.end();
       ++iter) {
    std::vector<int32> &words = iter->second;
    SortAndUniq(&words);
    KALDI_ASSERT(words[0] >= 0 && "Error: negative labels in lexicon.");
  }
}

◆ IsValidEntry()

bool IsValidEntry ( const std::vector< int32 > &  entry) const

Returns true if this lexicon-entry can appear, interpreted as (output-word phone1 phone2 ...).

The entry contains (new-word-id phone1 phone2 ...), which is equivalent to all but the first field on a line of the input lexicon file.

This is only used in testing code.

Definition at line 853 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT.

Referenced by kaldi::IsPlausibleWord().

{
  KALDI_ASSERT(!entry.empty());
  LexiconMap::const_iterator iter = lexicon_map_.find(entry);
  if (iter != lexicon_map_.end()) {
    int32 tgt_word = (iter->second == kTemporaryEpsilon ? 0 : iter->second);
    if (tgt_word == entry[0]) return true;  // symmetric entry.
    // this means that there would be an output-word with this
    // value, and this sequence of phones.
  }
  // For entries that were not symmetric:
  return (reverse_lexicon_map_.count(entry) != 0);
}
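A standalone sketch of the two-table test performed by IsValidEntry(), assuming standard C++ types in place of Kaldi's. The forward table is keyed by (orig-word phones...) and stores the new word, with 0 mapped to -2. The reverse table is keyed by (new-word phones...) and is filled only when the two words differ. LexiconTablesSketch, SeqMap and kTempEps are hypothetical names. Conflict handling from UpdateLexiconMap() is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for Kaldi's VectorHasher<int32>.
struct PhoneSeqHasher {
  std::size_t operator()(const std::vector<int32_t> &v) const {
    std::size_t h = 0;
    for (int32_t x : v) h = h * 7853 + static_cast<std::size_t>(x);
    return h;
  }
};

using SeqMap = std::unordered_map<std::vector<int32_t>, int32_t, PhoneSeqHasher>;
const int32_t kTempEps = -2;  // mirrors kTemporaryEpsilon

struct LexiconTablesSketch {
  SeqMap forward;  // (orig-word phones...) -> new-word (0 stored as -2)
  SeqMap reverse;  // (new-word phones...) -> orig-word, only if they differ

  // Adds one lexicon line (orig-word new-word phone1 phone2 ...).
  void Add(const std::vector<int32_t> &line) {
    assert(line.size() >= 2);
    std::vector<int32_t> key(line.begin() + 1, line.end());  // new, phones
    key[0] = line[0];                                         // orig, phones
    forward[key] = (line[1] == 0 ? kTempEps : line[1]);
    if (line[0] != line[1]) {
      key[0] = line[1];
      reverse[key] = line[0];
    }
  }

  // `entry` is (new-word phone1 phone2 ...).
  bool IsValid(const std::vector<int32_t> &entry) const {
    assert(!entry.empty());
    auto it = forward.find(entry);
    if (it != forward.end()) {
      int32_t tgt = (it->second == kTempEps ? 0 : it->second);
      if (tgt == entry[0]) return true;  // symmetric entry
    }
    return reverse.count(entry) != 0;  // entry produced by a rewritten word
  }
};
```

In the test, word 11 is rewritten as 12, so (12 3) is valid and (11 3) is not.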

◆ UpdateEquivalenceMap()

void UpdateEquivalenceMap ( const std::vector< std::vector< int32 > > &  lexicon)
protected

Definition at line 873 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT, kaldi::SortAndUniq(), and std::swap().

{
  std::vector<std::pair<int32, int32> > equiv_pairs;  // pairs of
  // (lower,higher) words that are equivalent.
  for (size_t i = 0; i < lexicon.size(); i++) {
    KALDI_ASSERT(lexicon[i].size() >= 2);
    int32 w1 = lexicon[i][0], w2 = lexicon[i][1];
    if (w1 == w2) continue;  // They are the same; this provides no information
                             // about equivalence, since any word is equivalent
                             // to itself.
    if (w1 > w2) std::swap(w1, w2);  // make sure w1 < w2.
    equiv_pairs.push_back(std::make_pair(w1, w2));
  }
  SortAndUniq(&equiv_pairs);
  equivalence_map_.clear();
  for (size_t i = 0; i < equiv_pairs.size(); i++) {
    int32 w1 = equiv_pairs[i].first, w2 = equiv_pairs[i].second,
        w1dash = EquivalenceClassOf(w1);
    equivalence_map_[w2] = w1dash;
  }
}
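The single sorted pass above assigns each higher word the class of its lower partner. A common general-purpose alternative for grouping words into equivalence classes is union-find (disjoint sets). The sketch below uses that swapped-in technique, not the Kaldi algorithm, and WordEquivalenceSketch is a hypothetical name. It keeps the smallest word-id as the root of each set, so ClassOf() returns the lowest member of the class regardless of the order in which pairs are added. Words never seen map to themselves, as in EquivalenceClassOf().

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

class WordEquivalenceSketch {
 public:
  // Links the first two fields of a lexicon line (orig-word new-word ...).
  void AddLine(const std::vector<int32_t> &line) {
    assert(line.size() >= 2);
    Union(line[0], line[1]);
  }

  // Lowest word-id in the class of `word`; unseen words map to themselves.
  int32_t ClassOf(int32_t word) {
    if (parent_.count(word) == 0) return word;
    return Find(word);
  }

 private:
  // Returns the root of w's set, inserting w as a singleton if new.
  int32_t Find(int32_t w) {
    auto it = parent_.find(w);
    if (it == parent_.end()) {
      parent_[w] = w;
      return w;
    }
    if (it->second == w) return w;
    int32_t root = Find(it->second);
    parent_[w] = root;  // path compression
    return root;
  }

  // Merges two sets; the smaller root wins, so every root is its set's minimum.
  void Union(int32_t a, int32_t b) {
    int32_t ra = Find(a), rb = Find(b);
    if (ra == rb) return;
    if (ra < rb) parent_[rb] = ra;
    else parent_[ra] = rb;
  }

  std::unordered_map<int32_t, int32_t> parent_;
};
```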

◆ UpdateLexiconMap()

void UpdateLexiconMap ( const std::vector< int32 > &  lexicon_entry)
protected

Update the map from a vector (orig-word-symbol phone1 phone2 ...) to the new word-symbol. The new word-symbol must always be nonzero; if it was zero, we replace it with kTemporaryEpsilon (= -2).

Definition at line 804 of file word-align-lattice-lexicon.cc.

References KALDI_ASSERT, KALDI_ERR, KALDI_WARN, and kaldi::kTemporaryEpsilon.

{
  KALDI_ASSERT(lexicon_entry.size() >= 2);
  std::vector<int32> key;
  key.reserve(lexicon_entry.size() - 1);
  // add the original word:
  key.push_back(lexicon_entry[0]);
  // add the phones:
  key.insert(key.end(), lexicon_entry.begin() + 2, lexicon_entry.end());
  int32 new_word = lexicon_entry[1];  // This will typically be the same as
  // the original word at lexicon_entry[0] but is allowed to differ.
  if (new_word == 0) new_word = kTemporaryEpsilon;  // replace 0's with -2;
  // we'll revert the change at the end.
  if (lexicon_map_.count(key) != 0) {
    if (lexicon_map_[key] == new_word)
      KALDI_WARN << "Duplicate entry in lexicon map for word " << lexicon_entry[0];
    else
      KALDI_ERR << "Duplicate entry in lexicon map for word " << lexicon_entry[0]
                << " with inconsistent to-word.";
  }
  lexicon_map_[key] = new_word;

  if (lexicon_entry[0] != lexicon_entry[1]) {
    // Add reverse lexicon entry, this time with no 0 -> -2 mapping.
    key[0] = lexicon_entry[1];
    // Note: we ignore the situation where there are conflicting
    // entries in reverse_lexicon_map_, as we never actually inspect
    // the contents so it won't matter.
    reverse_lexicon_map_[key] = lexicon_entry[0];
  }
}

◆ UpdateNumPhonesMap()

void UpdateNumPhonesMap ( const std::vector< int32 > &  lexicon_entry)
protected

Definition at line 836 of file word-align-lattice-lexicon.cc.

References KALDI_ERR.

{
  int32 num_phones = static_cast<int32>(lexicon_entry.size()) - 2;
  int32 word = lexicon_entry[0];
  if (num_phones_map_.count(word) == 0)
    num_phones_map_[word] = std::make_pair(num_phones, num_phones);
  else {
    std::pair<int32, int32> &pr = num_phones_map_[word];
    pr.first = std::min(pr.first, num_phones);  // update min-num-phones
    pr.second = std::max(pr.second, num_phones);  // update max-num-phones
    if (pr.first == 0 && word == 0)
      KALDI_ERR << "Zero word with empty pronunciation is not allowed.";
  }
}

◆ UpdateViabilityMap()

void UpdateViabilityMap ( const std::vector< int32 > &  lexicon_entry)
protected

Definition at line 774 of file word-align-lattice-lexicon.cc.


{
  int32 word = lexicon_entry[0];  // note: word may be zero.
  int32 num_phones = static_cast<int32>(lexicon_entry.size()) - 2;
  std::vector<int32> phones;
  if (num_phones > 0)
    phones.reserve(num_phones - 1);
  // for each nonempty sequence of phones that is a strict prefix of the phones
  // in the lexicon entry (i.e. lexicon_entry [2 ... ]), add the word to the set
  // in viability_map_[phones].
  for (int32 n = 0; n < num_phones - 1; n++) {
    phones.push_back(lexicon_entry[n + 2]);  // first phone is at position 2.
    // n+1 is the length of the sequence of phones
    viability_map_[phones].push_back(word);
  }
}

Friends And Related Function Documentation

◆ LatticeLexiconWordAligner

friend class LatticeLexiconWordAligner
friend

Definition at line 69 of file word-align-lattice-lexicon.h.

Member Data Documentation

◆ equivalence_map_

EquivalenceMap equivalence_map_
protected

Definition at line 116 of file word-align-lattice-lexicon.h.

◆ lexicon_map_

LexiconMap lexicon_map_
protected

◆ num_phones_map_

NumPhonesMap num_phones_map_
protected

◆ reverse_lexicon_map_

LexiconMap reverse_lexicon_map_
protected

Definition at line 111 of file word-align-lattice-lexicon.h.

◆ viability_map_

ViabilityMap viability_map_
protected

The documentation for this class was generated from the following files:

word-align-lattice-lexicon.h
word-align-lattice-lexicon.cc