ArpaFileParser is an abstract base class for ARPA LM file conversion.
#include <arpa-file-parser.h>
Public Member Functions
ArpaFileParser (const ArpaParseOptions &options, fst::SymbolTable *symbols)
Constructs the parser with the given options and optional symbol table.
virtual | ~ArpaFileParser ()
void | Read (std::istream &is)
Read ARPA LM file from a stream.
const ArpaParseOptions & | Options () const
Parser options.
Protected Member Functions
virtual void | ReadStarted ()
Override called before reading starts.
virtual void | HeaderAvailable ()
Override function called to signal that the ARPA header with the expected number of n-grams has been read, and NgramCounts() is now valid.
virtual void | ConsumeNGram (const NGram &)=0
Pure virtual function that must be implemented to process the current n-gram.
virtual void | ReadComplete ()
Override function called after the last n-gram has been consumed.
const fst::SymbolTable * | Symbols () const
Read-only access to symbol table. Not owned, do not make public.
int32 | LineNumber () const
Inside ConsumeNGram(), provides the current line number.
std::string | LineReference () const
Inside ConsumeNGram(), returns a formatted reference to the line being compiled, to print out as part of diagnostics.
bool | ShouldWarn ()
Increments warning count, and returns true if a warning should be printed or false if the count has exceeded the set maximum.
const std::vector< int32 > & | NgramCounts () const
N-gram counts. Valid from the point when HeaderAvailable() is called.
Private Attributes
ArpaParseOptions | options_
fst::SymbolTable * | symbols_
int32 | line_number_
uint32 | warning_count_
std::string | current_line_
std::vector< int32 > | ngram_counts_
ArpaFileParser is an abstract base class for ARPA LM file conversion.
See ConstArpaLmBuilder and ArpaLmCompiler for usage examples.
Definition at line 81 of file arpa-file-parser.h.
ArpaFileParser (const ArpaParseOptions &options, fst::SymbolTable *symbols)
Constructs the parser with the given options and optional symbol table.
If a symbol table is provided, the file should contain text n-grams, and the words are mapped to symbols through the table. bos_symbol and eos_symbol in the options structure must be valid symbols in the table, as must unk_symbol if provided. The parser does not own the table, but may augment it if oov_handling is set to kAddToSymbols.
If the symbol table is a null pointer, the file should contain integer symbol values, and oov_handling has no effect. bos_symbol and eos_symbol must still be valid symbols.
Definition at line 32 of file arpa-file-parser.cc.
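The word-to-symbol mapping described above can be sketched as follows. This is a minimal illustration, not the parser's implementation: ToySymbolTable is a hypothetical stand-in for fst::SymbolTable, OovHandling and MapWord are hypothetical names, and the behaviour of kReplaceWithUnk and kSkipNGram is inferred from their names only.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the oov_handling values named in ArpaParseOptions.
enum class OovHandling { kAddToSymbols, kReplaceWithUnk, kSkipNGram };

// Hypothetical stand-in for fst::SymbolTable: a plain word-to-id map.
struct ToySymbolTable {
  std::unordered_map<std::string, int32_t> ids;
  int32_t next_id = 0;

  // Adds the word if it is new and returns its id.
  int32_t Add(const std::string &word) {
    auto it = ids.find(word);
    if (it != ids.end()) return it->second;
    ids.emplace(word, next_id);
    return next_id++;
  }

  // Returns the id of the word, or -1 when the word is not in the table.
  int32_t Find(const std::string &word) const {
    auto it = ids.find(word);
    return it == ids.end() ? -1 : it->second;
  }
};

// Maps one word to a symbol id. Returns false when the enclosing n-gram
// should be skipped; otherwise writes the id to *id and returns true.
bool MapWord(ToySymbolTable *table, const std::string &word,
             OovHandling handling, int32_t unk_symbol, int32_t *id) {
  const int32_t found = table->Find(word);
  if (found != -1) {
    *id = found;
    return true;
  }
  switch (handling) {
    case OovHandling::kAddToSymbols:
      *id = table->Add(word);  // The table is augmented, as noted above.
      return true;
    case OovHandling::kReplaceWithUnk:
      *id = unk_symbol;  // Assumed behaviour, read off the option name.
      return true;
    case OovHandling::kSkipNGram:
      return false;  // Assumed behaviour, read off the option name.
  }
  return false;
}
```

In this sketch only kAddToSymbols modifies the table, which matches the statement that the table may be augmented under that setting.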
virtual ~ArpaFileParser () | virtual
Definition at line 38 of file arpa-file-parser.cc.
virtual void ConsumeNGram (const NGram &)=0 | protected, pure virtual
Pure virtual function that must be implemented to process the current n-gram.
The n-grams are sent in the file order, which guarantees that all (k-1)-grams are processed before the first k-gram is.
Implemented in ConstArpaLmBuilder, and ArpaLmCompiler.
Referenced by ArpaFileParser::Read().
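A consumer can rely on the file-order guarantee stated above. The sketch below is illustrative only: ToyNGram is a hypothetical stand-in for NGram (its field names follow the members referenced by Read()), and OrderCheckingConsumer is a hypothetical class, not one of the implementing classes listed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for NGram.
struct ToyNGram {
  std::vector<int32_t> words;
  float logprob;
  float backoff;
};

// Counts n-grams per order and checks that orders never decrease, which is
// what "all (k-1)-grams before the first k-gram" implies.
class OrderCheckingConsumer {
 public:
  // Returns false if the n-gram is empty or arrives after a higher-order one.
  bool ConsumeNGram(const ToyNGram &ngram) {
    const std::size_t order = ngram.words.size();
    if (order == 0 || order < current_order_) return false;
    current_order_ = order;
    if (counts_.size() < order) counts_.resize(order, 0);
    ++counts_[order - 1];
    return true;
  }

  // Number of n-grams seen so far, indexed by order minus one.
  const std::vector<int32_t> &Counts() const { return counts_; }

 private:
  std::size_t current_order_ = 0;
  std::vector<int32_t> counts_;
};
```

Because lower orders arrive first, a derived class can finish any per-order state (for example, a table of unigram histories) before the first n-gram of the next order is seen.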
virtual void HeaderAvailable () | inline, protected, virtual
Override function called to signal that the ARPA header with the expected number of n-grams has been read, and NgramCounts() is now valid.
Reimplemented in ConstArpaLmBuilder, and ArpaLmCompiler.
Definition at line 108 of file arpa-file-parser.h.
Referenced by ArpaFileParser::Read().
int32 LineNumber () const | inline, protected
Inside ConsumeNGram(), provides the current line number.
Definition at line 122 of file arpa-file-parser.h.
std::string LineReference () const | protected
Inside ConsumeNGram(), returns a formatted reference to the line being compiled, to print out as part of diagnostics.
Definition at line 270 of file arpa-file-parser.cc.
References ArpaFileParser::current_line_, and ArpaFileParser::line_number_.
Referenced by ArpaLmCompilerImpl< HistKey >::ConsumeNGram(), and ArpaFileParser::Read().
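A formatted line reference of this kind can be sketched as below. FormatLineReference is a hypothetical free function, and the exact output layout is an assumption made for illustration; the inputs correspond to the two members listed under References.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Builds a diagnostic reference from a line number and the text of the line.
// The "line N [text]" layout is assumed, not taken from this page.
std::string FormatLineReference(int32_t line_number,
                                const std::string &current_line) {
  std::ostringstream ss;
  ss << "line " << line_number << " [" << current_line << "]";
  return ss.str();
}
```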
const std::vector< int32 > & NgramCounts () const | inline, protected
N-gram counts. Valid from the point when HeaderAvailable() is called.
Definition at line 133 of file arpa-file-parser.h.
const ArpaParseOptions & Options () const | inline
Parser options.
void Read (std::istream &is)
Read ARPA LM file from a stream.
Definition at line 45 of file arpa-file-parser.cc.
References NGram::backoff, ArpaParseOptions::bos_symbol, ArpaFileParser::ConsumeNGram(), kaldi::ConvertStringToInteger(), kaldi::ConvertStringToReal(), ArpaFileParser::current_line_, ArpaParseOptions::eos_symbol, ArpaFileParser::HeaderAvailable(), ArpaParseOptions::kAddToSymbols, KALDI_ERR, KALDI_LOG, KALDI_WARN, ArpaParseOptions::kReplaceWithUnk, ArpaParseOptions::kSkipNGram, ArpaFileParser::line_number_, ArpaFileParser::LineReference(), NGram::logprob, M_LN10, ArpaParseOptions::max_warnings, ArpaFileParser::ngram_counts_, ArpaParseOptions::oov_handling, ArpaFileParser::options_, PARSE_ERR, ArpaFileParser::ReadComplete(), ArpaFileParser::ReadStarted(), ArpaFileParser::ShouldWarn(), kaldi::SplitStringToVector(), ArpaFileParser::symbols_, kaldi::TrimTrailingWhitespace(), ArpaParseOptions::unk_symbol, ArpaFileParser::warning_count_, and NGram::words.
Referenced by kaldi::BuildConstArpaLm(), and kaldi::Compile().
virtual void ReadComplete () | inline, protected, virtual
Override function called after the last n-gram has been consumed.
Reimplemented in ConstArpaLmBuilder, and ArpaLmCompiler.
Definition at line 116 of file arpa-file-parser.h.
Referenced by ArpaFileParser::Read().
virtual void ReadStarted () | inline, protected, virtual
Override called before reading starts.
This is the point to prepare any state in the derived class.
Definition at line 104 of file arpa-file-parser.h.
Referenced by ArpaFileParser::Read().
bool ShouldWarn () | protected
Increments warning count, and returns true if a warning should be printed or false if the count has exceeded the set maximum.
Definition at line 276 of file arpa-file-parser.cc.
References ArpaParseOptions::max_warnings, ArpaFileParser::options_, and ArpaFileParser::warning_count_.
Referenced by ArpaLmCompilerImpl< HistKey >::ConsumeNGram(), and ArpaFileParser::Read().
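The warning limit described above amounts to a counter compared against a maximum. The sketch below is one plausible reading of that description, not the actual implementation: WarningLimiter is a hypothetical class, and any special values of max_warnings are not modelled.

```cpp
#include <cstdint>

// Hypothetical helper that mirrors the documented ShouldWarn() contract.
class WarningLimiter {
 public:
  explicit WarningLimiter(uint32_t max_warnings)
      : max_warnings_(max_warnings), warning_count_(0) {}

  // Increments the count, then reports whether it is still within the limit.
  // Counter wrap-around after 2^32 calls is not handled in this sketch.
  bool ShouldWarn() { return ++warning_count_ <= max_warnings_; }

 private:
  uint32_t max_warnings_;
  uint32_t warning_count_;
};
```

With a maximum of 2, the first two calls return true and every later call returns false, so a caller can guard each diagnostic with a single `if (limiter.ShouldWarn())` check.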
const fst::SymbolTable * Symbols () const | inline, protected
Read-only access to symbol table. Not owned, do not make public.
Definition at line 119 of file arpa-file-parser.h.
std::string current_line_ | private
Definition at line 140 of file arpa-file-parser.h.
Referenced by ArpaFileParser::LineReference(), and ArpaFileParser::Read().
int32 line_number_ | private
Definition at line 138 of file arpa-file-parser.h.
Referenced by ArpaFileParser::LineReference(), and ArpaFileParser::Read().
std::vector< int32 > ngram_counts_ | private
Definition at line 141 of file arpa-file-parser.h.
Referenced by ArpaFileParser::Read().
ArpaParseOptions options_ | private
Definition at line 136 of file arpa-file-parser.h.
Referenced by ArpaFileParser::Read(), and ArpaFileParser::ShouldWarn().
fst::SymbolTable * symbols_ | private
Definition at line 137 of file arpa-file-parser.h.
Referenced by ArpaFileParser::Read().
uint32 warning_count_ | private
Definition at line 139 of file arpa-file-parser.h.
Referenced by ArpaFileParser::Read(), and ArpaFileParser::ShouldWarn().