pdf-to-counts.cc File Reference

Functions

int main (int argc, char *argv[])
 Sums the pdf vectors to counts.
 

Function Documentation

int main(int argc, char *argv[])

Sums the pdf vectors to counts.

The counts, once normalized, give the pdf priors used in hybrid decoding, for example to turn network posteriors into scaled likelihoods.
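As an illustration of how such counts could become priors, here is a minimal sketch that normalizes a count vector and takes logarithms. The helper name `CountsToLogPriors` and the floor value are assumptions made for this example; neither is part of Kaldi or of this file.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Kaldi): converts raw pdf counts into
// log-priors, log(count_i / total). Priors below `floor` (an assumed value)
// are raised to `floor` so that the logarithm stays finite for unseen pdfs.
std::vector<double> CountsToLogPriors(const std::vector<double> &counts,
                                      double floor = 1e-10) {
  double total = 0.0;
  for (double c : counts) total += c;
  assert(total > 0.0 && "need at least one observation");

  std::vector<double> log_priors(counts.size());
  for (std::size_t i = 0; i < counts.size(); ++i) {
    double prior = counts[i] / total;
    if (prior < floor) prior = floor;
    log_priors[i] = std::log(prior);
  }
  return log_priors;
}
```

In a hybrid setup, log-priors of this kind are typically subtracted from the network's log-posteriors before decoding; how a given recipe applies them is outside the scope of this file.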

Definition at line 27 of file pdf-to-counts.cc.

References SequentialTableReader< Holder >::Done(), ParseOptions::GetArg(), KALDI_LOG, SequentialTableReader< Holder >::Next(), ParseOptions::NumArgs(), ParseOptions::PrintUsage(), ParseOptions::Read(), ParseOptions::Register(), Output::Stream(), and SequentialTableReader< Holder >::Value().

{
  using namespace kaldi;
  typedef kaldi::int32 int32;
  try {
    const char *usage =
        "Reads int32 vectors (same format as alignments, but typically\n"
        "actually representing pdfs, e.g. output by ali-to-pdf), and outputs\n"
        "counts for each index (typically one per pdf), as a Vector<float>.\n"
        "\n"
        "Usage: pdf-to-counts [options] <pdfs-rspecifier> <counts-wxfilename>\n"
        "e.g.: \n"
        " ali-to-pdf \"ark:gunzip -c 1.ali.gz|\" ark:- | \\\n"
        " pdf-to-counts --binary=false ark:- counts.txt\n";
    ParseOptions po(usage);

    bool binary_write = false;
    po.Register("binary", &binary_write, "Write in binary mode");

    po.Read(argc, argv);

    if (po.NumArgs() != 2) {
      po.PrintUsage();
      exit(1);
    }

    std::string pdfs_rspecifier = po.GetArg(1),
        counts_wxfilename = po.GetArg(2);

    SequentialInt32VectorReader pdfs_reader(pdfs_rspecifier);

    std::vector<int64> counts;  // will turn to Vector<BaseFloat> after counting.
    int32 num_done = 0;
    for (; !pdfs_reader.Done(); pdfs_reader.Next()) {
      // Bind by reference to avoid copying each vector.
      const std::vector<int32> &alignment = pdfs_reader.Value();

      for (size_t i = 0; i < alignment.size(); i++) {
        int32 value = alignment[i];
        // A negative index would otherwise be compared as a huge unsigned
        // value and lead to an out-of-range access.
        if (value < 0)
          KALDI_ERR << "Negative index " << value << " in input.";
        if (static_cast<size_t>(value) >= counts.size()) {
          counts.resize(value + 1, 0);  // grow to cover the largest index seen.
        }
        counts[value]++;  // accumulate counts
      }
      num_done++;
    }

    // Convert to BaseFloat and write.
    Vector<BaseFloat> counts_f(counts.size());
    for (int32 i = 0; i < static_cast<int32>(counts.size()); i++) {
      counts_f(i) = counts[i];
    }

    Output ko(counts_wxfilename, binary_write);
    counts_f.Write(ko.Stream(), binary_write);

    KALDI_LOG << "Summed " << num_done << " int32 vectors to counts, "
              << "total count is " << counts_f.Sum() << ", dim is "
              << counts_f.Dim();
    return (num_done == 0 ? 1 : 0);  // error exit status if processed nothing.
  } catch(const std::exception &e) {
    std::cerr << e.what();
    return -1;
  }
}
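The counting loop in the listing above can be tried without Kaldi. The following is a rough stand-alone sketch, not the file's actual code: the function name `CountPdfIds` is hypothetical, plain nested `std::vector`s stand in for the table reader, and rejecting negative indices with an exception is an assumption of this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-alone version of the counting loop (not Kaldi code).
// Each inner vector plays the role of one int32 vector read from the archive.
std::vector<std::int64_t> CountPdfIds(
    const std::vector<std::vector<std::int32_t>> &alignments) {
  std::vector<std::int64_t> counts;
  for (const auto &alignment : alignments) {
    for (std::int32_t value : alignment) {
      // Assumption of this sketch: negative indices are treated as an error.
      if (value < 0) throw std::invalid_argument("negative pdf-id");
      const std::size_t index = static_cast<std::size_t>(value);
      // Grow the histogram so that it covers the largest index seen so far.
      if (index >= counts.size()) counts.resize(index + 1, 0);
      ++counts[index];
    }
  }
  return counts;
}
```

For the input `{{0, 2, 2}, {1, 2}}` this returns the counts `{1, 1, 3}`: the output dimension is one more than the largest index seen, and any smaller index that never occurs keeps a count of zero.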