ExampleMergingStats Class Reference

This class is responsible for storing, and displaying in log messages, statistics about how examples of different sizes (cf. GetNnetExampleSize()) were merged into minibatches, and how many examples were left over and discarded.

#include <nnet-example-utils.h>


Classes

struct  StatsForExampleSize
 

Public Member Functions

void WroteExample (int32 example_size, size_t structure_hash, int32 minibatch_size)
 Users call this function to inform this class that one minibatch has been written aggregating 'minibatch_size' separate examples of original size 'example_size' (e.g. as determined by GetNnetExampleSize()).
 
void DiscardedExamples (int32 example_size, size_t structure_hash, int32 num_discarded)
 Users call this function to inform this class that after processing all the data, for examples of original size 'example_size', 'num_discarded' examples could not be put into a minibatch and were discarded.
 
void PrintStats () const
 Calling this will cause a log message with information about the examples to be printed.
 

Private Types

typedef unordered_map<std::pair<int32, size_t>, StatsForExampleSize, PairHasher<int32, size_t> > StatsType
 

Private Member Functions

void PrintAggregateStats () const
 
void PrintSpecificStats () const
 

Private Attributes

StatsType stats_
 

Detailed Description

This class is responsible for storing, and displaying in log messages, statistics about how examples of different sizes (cf. GetNnetExampleSize()) were merged into minibatches, and how many examples were left over and discarded.

Definition at line 427 of file nnet-example-utils.h.

Member Typedef Documentation

typedef unordered_map<std::pair<int32, size_t>, StatsForExampleSize, PairHasher<int32, size_t> > StatsType
private

Definition at line 464 of file nnet-example-utils.h.
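The standard library provides no std::hash specialization for std::pair, which is why StatsType supplies PairHasher<int32, size_t> as its hash functor. The sketch below shows the general shape of such a functor and a map with the same key layout as StatsType. PairHasherSketch, its prime multiplier, StatsPayload and CountDistinctKeysDemo are hypothetical names chosen for illustration; Kaldi's own PairHasher may combine the two values differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical hash functor for std::pair keys (not Kaldi's PairHasher);
// the multiplier 7853 is an arbitrary prime chosen for this sketch.
template <typename Int1, typename Int2 = Int1>
struct PairHasherSketch {
  std::size_t operator()(const std::pair<Int1, Int2> &x) const noexcept {
    return static_cast<std::size_t>(x.first) +
           static_cast<std::size_t>(x.second) * 7853u;
  }
};

// Hypothetical payload standing in for StatsForExampleSize.
struct StatsPayload {
  int32_t num_discarded = 0;
};

// Same shape as StatsType: (example_size, structure_hash) -> stats.
typedef std::unordered_map<std::pair<int32_t, std::size_t>, StatsPayload,
                           PairHasherSketch<int32_t, std::size_t> >
    StatsTypeSketch;

// Two eg types with the same size but different structure hashes end up in
// two separate entries; repeated keys reuse their existing entry.
std::size_t CountDistinctKeysDemo() {
  StatsTypeSketch stats;
  stats[std::make_pair(int32_t(100), std::size_t(42))].num_discarded += 3;
  stats[std::make_pair(int32_t(100), std::size_t(43))].num_discarded += 1;
  stats[std::make_pair(int32_t(100), std::size_t(42))].num_discarded += 2;
  return stats.size();
}
```

If two different structures ever produced the same structure_hash, their stats would share one entry; this is the misleading-stats case mentioned in the WroteExample() documentation.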

Member Function Documentation

void DiscardedExamples (int32 example_size, size_t structure_hash, int32 num_discarded)

Users call this function to inform this class that after processing all the data, for examples of original size 'example_size', 'num_discarded' examples could not be put into a minibatch and were discarded.

Definition at line 1047 of file nnet-example-utils.cc.

Referenced by DiscriminativeExampleMerger::Finish(), ChainExampleMerger::Finish(), and ExampleMerger::Finish().

{
  std::pair<int32, size_t> p(example_size, structure_hash);
  stats_[p].num_discarded += num_discarded;
}
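DiscardedExamples() relies on operator[] inserting a value-initialized StatsForExampleSize the first time a key is seen, so the += starts from zero. One consequence is that an (example_size, structure_hash) key that only ever had discards still gets an entry, and PrintAggregateStats() counts it as a distinct eg type. The standalone sketch below assumes that StatsForExampleSize starts num_discarded at zero; StatsForExampleSizeSketch, StatsMapSketch and DiscardedExamplesSketch are hypothetical names, and std::map replaces unordered_map only so that no pair hasher is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>

// Hypothetical mirror of ExampleMergingStats::StatsForExampleSize.
// Assumption: the real struct also starts num_discarded at zero.
struct StatsForExampleSizeSketch {
  int32_t num_discarded = 0;
  std::unordered_map<int32_t, int32_t> minibatch_to_num_written;
};

// Keyed by (example_size, structure_hash); std::map is used only so that
// this sketch needs no custom pair hasher.
typedef std::map<std::pair<int32_t, std::size_t>, StatsForExampleSizeSketch>
    StatsMapSketch;

// Same logic as the listing: operator[] inserts a value-initialized entry
// for a new key, so the += always starts from zero.
void DiscardedExamplesSketch(StatsMapSketch *stats, int32_t example_size,
                             std::size_t structure_hash,
                             int32_t num_discarded) {
  std::pair<int32_t, std::size_t> p(example_size, structure_hash);
  (*stats)[p].num_discarded += num_discarded;
}
```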
void PrintAggregateStats ( ) const
private

Definition at line 1060 of file nnet-example-utils.cc.

References KALDI_LOG, ExampleMergingStats::StatsForExampleSize::minibatch_to_num_written, and ExampleMergingStats::StatsForExampleSize::num_discarded.

{
  // First print some aggregate stats.
  int64 num_distinct_egs_types = 0,  // number of distinct types of input egs
                                     // (differing in size or structure).
      total_discarded_egs = 0,  // total number of discarded egs.
      total_discarded_egs_size = 0,  // total number of discarded egs, each
                                     // multiplied by the size of that eg.
      total_non_discarded_egs = 0,  // total over all minibatches written, of
                                    // minibatch-size; equals the number of
                                    // input egs that were not discarded.
      total_non_discarded_egs_size = 0,  // total over all minibatches of
                                         // size-of-eg * minibatch-size.
      num_minibatches = 0,  // total number of minibatches
      num_distinct_minibatch_types = 0;  // total number of distinct
                                         // (type-of-eg, minibatch-size)
                                         // combinations; reflects the number
                                         // of times we have to compile.

  StatsType::const_iterator eg_iter = stats_.begin(), eg_end = stats_.end();

  for (; eg_iter != eg_end; ++eg_iter) {
    int32 eg_size = eg_iter->first.first;
    const StatsForExampleSize &stats = eg_iter->second;
    num_distinct_egs_types++;
    total_discarded_egs += stats.num_discarded;
    total_discarded_egs_size += stats.num_discarded * eg_size;

    unordered_map<int32, int32>::const_iterator
        mb_iter = stats.minibatch_to_num_written.begin(),
        mb_end = stats.minibatch_to_num_written.end();
    for (; mb_iter != mb_end; ++mb_iter) {
      int32 mb_size = mb_iter->first,
          num_written = mb_iter->second;
      num_distinct_minibatch_types++;
      num_minibatches += num_written;
      total_non_discarded_egs += num_written * mb_size;
      total_non_discarded_egs_size += num_written * mb_size * eg_size;
    }
  }
  // the averages are printed with 4 significant digits (see setprecision(4)
  // below); we don't really need more precision than that.
  int64 total_input_egs = total_discarded_egs + total_non_discarded_egs,
      total_input_egs_size =
          total_discarded_egs_size + total_non_discarded_egs_size;

  float avg_input_egs_size = total_input_egs_size * 1.0 / total_input_egs;
  float percent_discarded = total_discarded_egs * 100.0 / total_input_egs;
  // note: by minibatch size we mean the number of egs per minibatch; it
  // does not take into account the size of the input egs.
  float avg_minibatch_size = total_non_discarded_egs * 1.0 / num_minibatches;

  std::ostringstream os;
  os << std::setprecision(4);
  os << "Processed " << total_input_egs
     << " egs of avg. size " << avg_input_egs_size
     << " into " << num_minibatches << " minibatches, discarding "
     << percent_discarded << "% of egs. Avg minibatch size was "
     << avg_minibatch_size << ", #distinct types of egs/minibatches "
     << "was " << num_distinct_egs_types << "/"
     << num_distinct_minibatch_types;
  KALDI_LOG << os.str();
}
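To make the arithmetic in PrintAggregateStats() concrete, the sketch below recomputes the same totals and averages over a plain vector of per-eg-type records. EgTypeStats, AggregateSummary and Summarize are hypothetical names, not Kaldi API. It departs from the listing in two deliberate ways: products are widened to int64_t before multiplying (the listing forms num_written * mb_size * eg_size in int32 before adding it to an int64 total), and the averages are guarded against division by zero, which the listing does not do.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical per-eg-type record: eg size, number discarded, and a map
// minibatch-size -> number of minibatches written.
struct EgTypeStats {
  int32_t eg_size;
  int32_t num_discarded;
  std::map<int32_t, int32_t> minibatch_to_num_written;
};

struct AggregateSummary {
  int64_t num_distinct_egs_types = 0;
  int64_t num_distinct_minibatch_types = 0;
  int64_t num_minibatches = 0;
  int64_t total_input_egs = 0;
  double avg_input_egs_size = 0.0;
  double percent_discarded = 0.0;
  double avg_minibatch_size = 0.0;
};

AggregateSummary Summarize(const std::vector<EgTypeStats> &types) {
  AggregateSummary s;
  int64_t total_discarded_egs = 0, total_discarded_egs_size = 0,
          total_non_discarded_egs = 0, total_non_discarded_egs_size = 0;
  for (const EgTypeStats &t : types) {
    s.num_distinct_egs_types++;
    total_discarded_egs += t.num_discarded;
    total_discarded_egs_size += static_cast<int64_t>(t.num_discarded) * t.eg_size;
    for (const auto &mb : t.minibatch_to_num_written) {
      // Widen to int64_t before multiplying to avoid int32 overflow.
      int64_t mb_size = mb.first, num_written = mb.second;
      s.num_distinct_minibatch_types++;
      s.num_minibatches += num_written;
      total_non_discarded_egs += num_written * mb_size;
      total_non_discarded_egs_size += num_written * mb_size * t.eg_size;
    }
  }
  s.total_input_egs = total_discarded_egs + total_non_discarded_egs;
  int64_t total_input_egs_size =
      total_discarded_egs_size + total_non_discarded_egs_size;
  if (s.total_input_egs > 0) {  // guard absent from the listing
    s.avg_input_egs_size = total_input_egs_size * 1.0 / s.total_input_egs;
    s.percent_discarded = total_discarded_egs * 100.0 / s.total_input_egs;
  }
  if (s.num_minibatches > 0)  // guard absent from the listing
    s.avg_minibatch_size = total_non_discarded_egs * 1.0 / s.num_minibatches;
  return s;
}
```

As a worked example, take an eg type of size 100 with ten minibatches of 64 and 20 discards, plus an eg type of size 200 with five minibatches of 32, one of 16, and 4 discards. That gives 816 + 24 = 840 input egs, 16 minibatches, an average minibatch size of 816 / 16 = 51, an average input size of 102000 / 840 (about 121.4), about 2.857% discarded, and 2/3 distinct types of egs/minibatches.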
void PrintSpecificStats ( ) const
private

Definition at line 1124 of file nnet-example-utils.cc.

References KALDI_LOG, ExampleMergingStats::StatsForExampleSize::minibatch_to_num_written, and ExampleMergingStats::StatsForExampleSize::num_discarded.

{
  KALDI_LOG << "Merged specific eg types as follows [format: <eg-size1>="
      "{<mb-size1>-><num-minibatches1>,<mbsize2>-><num-minibatches2>.../d=<num-discarded>}"
      ",<egs-size2>={...},... (note,egs-size == number of input "
      "frames including context).";
  std::ostringstream os;

  // copy from unordered map to map to get sorting, for consistent output.
  typedef std::map<std::pair<int32, size_t>, StatsForExampleSize> SortedMapType;

  SortedMapType stats;
  stats.insert(stats_.begin(), stats_.end());
  SortedMapType::const_iterator eg_iter = stats.begin(), eg_end = stats.end();
  for (; eg_iter != eg_end; ++eg_iter) {
    int32 eg_size = eg_iter->first.first;
    if (eg_iter != stats.begin())
      os << ",";
    os << eg_size << "={";
    const StatsForExampleSize &stats = eg_iter->second;
    unordered_map<int32, int32>::const_iterator
        mb_iter = stats.minibatch_to_num_written.begin(),
        mb_end = stats.minibatch_to_num_written.end();
    for (; mb_iter != mb_end; ++mb_iter) {
      int32 mb_size = mb_iter->first,
          num_written = mb_iter->second;
      if (mb_iter != stats.minibatch_to_num_written.begin())
        os << ",";
      os << mb_size << "->" << num_written;
    }
    os << ",d=" << stats.num_discarded << "}";
  }
  KALDI_LOG << os.str();
}
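The log line built by PrintSpecificStats() can be reproduced in isolation. The sketch below follows the same loop structure, but stores the inner minibatch-size counts in a std::map so that, unlike the original's unordered_map, the inner entries also come out sorted. SpecificStatsSketch, SortedStatsSketch and FormatSpecificStats are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical per-key stats; the inner map is sorted, unlike the original.
struct SpecificStatsSketch {
  int32_t num_discarded = 0;
  std::map<int32_t, int32_t> minibatch_to_num_written;
};

// Outer map sorted by (eg_size, structure_hash), as in SortedMapType.
typedef std::map<std::pair<int32_t, std::size_t>, SpecificStatsSketch>
    SortedStatsSketch;

// Builds "<eg-size>={<mb-size>-><count>,...,d=<num-discarded>}" groups,
// separated by commas, in the same way as the listing.
std::string FormatSpecificStats(const SortedStatsSketch &stats) {
  std::ostringstream os;
  for (auto eg_iter = stats.begin(); eg_iter != stats.end(); ++eg_iter) {
    if (eg_iter != stats.begin())
      os << ",";
    os << eg_iter->first.first << "={";
    const SpecificStatsSketch &s = eg_iter->second;
    for (auto mb_iter = s.minibatch_to_num_written.begin();
         mb_iter != s.minibatch_to_num_written.end(); ++mb_iter) {
      if (mb_iter != s.minibatch_to_num_written.begin())
        os << ",";
      os << mb_iter->first << "->" << mb_iter->second;
    }
    // Written unconditionally, just like the ",d=" suffix in the listing.
    os << ",d=" << s.num_discarded << "}";
  }
  return os.str();
}
```

Because the ",d=" suffix is always written, an eg type that was only ever discarded prints as <eg-size>={,d=<n>}, and two eg types with the same size but different structure hashes print as two separate groups with the same leading size.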
void PrintStats ( ) const

Calling this will cause a log message with information about the examples to be printed.

Definition at line 1055 of file nnet-example-utils.cc.

Referenced by DiscriminativeExampleMerger::Finish(), ChainExampleMerger::Finish(), and ExampleMerger::Finish().

1055  {
1058 }
void WroteExample (int32 example_size, size_t structure_hash, int32 minibatch_size)

Users call this function to inform this class that one minibatch has been written aggregating 'minibatch_size' separate examples of original size 'example_size' (e.g. as determined by GetNnetExampleSize(); the caller is responsible for computing it). The 'structure_hash' is provided so that this class can distinguish between egs that have the same size but different structure. In the extremely unlikely event of a hash collision, the printed stats will be misleading.

Definition at line 1033 of file nnet-example-utils.cc.

Referenced by DiscriminativeExampleMerger::WriteMinibatch(), ChainExampleMerger::WriteMinibatch(), and ExampleMerger::WriteMinibatch().

{
  std::pair<int32, size_t> p(example_size, structure_hash);

  unordered_map<int32, int32> &h = stats_[p].minibatch_to_num_written;
  unordered_map<int32, int32>::iterator iter = h.find(minibatch_size);
  if (iter == h.end())
    h[minibatch_size] = 1;
  else
    iter->second += 1;
}
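WroteExample() looks the minibatch size up with find() and stores 1 when it is absent. Because operator[] value-initializes a missing int32 value to 0, ++h[minibatch_size] has the same effect with a single lookup. The sketch below compares the two forms on a standalone map; CountWithFind and CountWithSubscript are hypothetical helpers, not Kaldi functions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Counting as in the listing: look up first, store 1 if the key is absent.
void CountWithFind(std::unordered_map<int32_t, int32_t> &h,
                   int32_t minibatch_size) {
  std::unordered_map<int32_t, int32_t>::iterator iter = h.find(minibatch_size);
  if (iter == h.end())
    h[minibatch_size] = 1;
  else
    iter->second += 1;
}

// Equivalent shorter form: operator[] inserts a zero-valued entry for a
// missing key, which is then incremented to 1.
void CountWithSubscript(std::unordered_map<int32_t, int32_t> &h,
                        int32_t minibatch_size) {
  ++h[minibatch_size];
}
```

The two forms produce identical maps; the explicit find() version arguably makes the "first occurrence" case easier to see when reading the code.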

Member Data Documentation

StatsType stats_
private

Definition at line 468 of file nnet-example-utils.h.


The documentation for this class was generated from the following files:
nnet-example-utils.h
nnet-example-utils.cc