ExampleMerger Class Reference

This class is responsible for arranging examples in groups that have the same structure (i.e. the same input and output indexes).

#include <nnet-example-utils.h>

Public Member Functions

 ExampleMerger (const ExampleMergingConfig &config, NnetExampleWriter *writer)
 
void AcceptExample (NnetExample *a)
 
void Finish ()
 
int32 ExitStatus ()
 
 ~ExampleMerger ()
 

Private Types

typedef unordered_map< NnetExample *, std::vector< NnetExample * >, NnetExampleStructureHasher, NnetExampleStructureCompare > MapType
 

Private Member Functions

void WriteMinibatch (const std::vector< NnetExample > &egs)
 

Private Attributes

bool finished_
 
int32 num_egs_written_
 
const ExampleMergingConfig & config_
 
NnetExampleWriter * writer_
 
ExampleMergingStats stats_
 
MapType eg_to_egs_
 

Detailed Description

This class is responsible for arranging examples in groups that have the same structure (i.e. the same input and output indexes), and outputting them in suitable minibatches as defined by ExampleMergingConfig.

Definition at line 480 of file nnet-example-utils.h.
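
The grouping idea described above can be illustrated with a small, self-contained sketch. It does not use Kaldi: ToyExample, its 'structure' string and GroupIntoMinibatches() are hypothetical stand-ins, and the fixed batch size replaces the decision that ExampleMergingConfig makes in the real class. Unlike the real class, which may discard leftovers, this sketch simply emits incomplete groups at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for NnetExample: 'structure' plays the role of the
// input and output indexes, 'id' identifies the example.
struct ToyExample {
  std::string structure;
  int id;
};

// Groups examples by structure and emits a minibatch (a vector of ids) each
// time a group reaches 'batch_size'.  Incomplete groups are emitted last.
std::vector<std::vector<int> > GroupIntoMinibatches(
    const std::vector<ToyExample> &examples, std::size_t batch_size) {
  assert(batch_size > 0);
  std::map<std::string, std::vector<int> > pending;
  std::vector<std::vector<int> > minibatches;
  for (const ToyExample &eg : examples) {
    std::vector<int> &group = pending[eg.structure];
    group.push_back(eg.id);
    if (group.size() == batch_size) {  // group is full: emit and reset it.
      minibatches.push_back(group);
      group.clear();
    }
  }
  // End of input: emit whatever is left in each group.
  for (const auto &entry : pending)
    if (!entry.second.empty())
      minibatches.push_back(entry.second);
  return minibatches;
}
```

With a batch size of 2 and inputs whose structures arrive in the order A, B, A, B, A, the sketch emits the two full groups first and the single leftover A last.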

Member Typedef Documentation

typedef unordered_map<NnetExample*, std::vector<NnetExample*>, NnetExampleStructureHasher, NnetExampleStructureCompare> MapType
private

Definition at line 515 of file nnet-example-utils.h.
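
A map of this shape is keyed by pointer but hashes and compares the pointed-to object, so two distinct pointers with the same structure land in the same entry. The sketch below shows that pattern under stated assumptions: ToyExample, ToyStructureHasher and ToyStructureCompare are hypothetical and only mimic the roles of NnetExample, NnetExampleStructureHasher and NnetExampleStructureCompare.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for NnetExample.
struct ToyExample {
  std::string structure;
  int id;
};

// Hashes the pointed-to structure, not the pointer value.
struct ToyStructureHasher {
  std::size_t operator()(const ToyExample *eg) const {
    assert(eg != nullptr);
    return std::hash<std::string>()(eg->structure);
  }
};

// Two keys are equal when the examples they point to have equal structure.
struct ToyStructureCompare {
  bool operator()(const ToyExample *a, const ToyExample *b) const {
    return a->structure == b->structure;
  }
};

typedef std::unordered_map<ToyExample*, std::vector<ToyExample*>,
                           ToyStructureHasher, ToyStructureCompare> ToyMapType;

// Returns the number of distinct structures among 'egs'.
std::size_t CountDistinctStructures(const std::vector<ToyExample*> &egs) {
  ToyMapType groups;
  for (ToyExample *eg : egs)
    groups[eg].push_back(eg);  // an existing equal key is not replaced.
  return groups.size();
}
```

Because operator[] never replaces an existing key, the first example inserted for a structure stays the key for that group.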

Constructor & Destructor Documentation

ExampleMerger ( const ExampleMergingConfig & config,
NnetExampleWriter * writer 
)

Definition at line 1170 of file nnet-example-utils.cc.

ExampleMerger::ExampleMerger(const ExampleMergingConfig &config,
                             NnetExampleWriter *writer):
    finished_(false), num_egs_written_(0),
    config_(config), writer_(writer) { }
~ExampleMerger ( )
inline

Definition at line 500 of file nnet-example-utils.h.

References ExampleMerger::Finish().

Member Function Documentation

void AcceptExample ( NnetExample * a )

Definition at line 1176 of file nnet-example-utils.cc.

References ExampleMerger::config_, ExampleMerger::eg_to_egs_, ExampleMerger::finished_, kaldi::nnet3::GetNnetExampleSize(), rnnlm::i, KALDI_ASSERT, ExampleMergingConfig::MinibatchSize(), and ExampleMerger::WriteMinibatch().

Referenced by main().

void ExampleMerger::AcceptExample(NnetExample *eg) {
  KALDI_ASSERT(!finished_);
  // If an eg with the same structure as 'eg' is already a key in the
  // map, it won't be replaced, but if it's new it will be made
  // the key.  Also we remove the key before making the vector empty.
  // This way we ensure that the eg in the key is always the first
  // element of the vector.
  std::vector<NnetExample*> &vec = eg_to_egs_[eg];
  vec.push_back(eg);
  int32 eg_size = GetNnetExampleSize(*eg),
      num_available = vec.size();
  bool input_ended = false;
  int32 minibatch_size = config_.MinibatchSize(eg_size, num_available,
                                               input_ended);
  if (minibatch_size != 0) {  // we need to write out a merged eg.
    KALDI_ASSERT(minibatch_size == num_available);

    std::vector<NnetExample*> vec_copy(vec);
    eg_to_egs_.erase(eg);

    // MergeExamples() expects a vector of NnetExample, not of pointers,
    // so use swap to create that without doing any real work.
    std::vector<NnetExample> egs_to_merge(minibatch_size);
    for (int32 i = 0; i < minibatch_size; i++) {
      egs_to_merge[i].Swap(vec_copy[i]);
      delete vec_copy[i];  // we owned those pointers.
    }
    WriteMinibatch(egs_to_merge);
  }
}
int32 GetNnetExampleSize(const NnetExample &a)
This function returns the 'size' of a nnet-example as defined for purposes of merging egs...
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
int32 MinibatchSize(int32 size_of_eg, int32 num_available_egs, bool input_ended) const
This function tells you what minibatch size should be used for this eg.
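
The step in AcceptExample() that turns a vector of owned pointers into a vector of values, using swap so that no example data is copied, can be sketched in isolation. This is a minimal illustration rather than Kaldi code: ToyExample and TakeOwnership() are hypothetical, and ToyExample::Swap() only imitates the pointer-taking Swap() call seen in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for NnetExample with a pointer-taking Swap().
struct ToyExample {
  std::vector<int> data;
  void Swap(ToyExample *other) { data.swap(other->data); }
};

// Moves the contents of each heap-allocated example into a by-value vector,
// then deletes the (now empty) heap objects and clears 'owned'.
// The caller must own every pointer in 'owned'.
std::vector<ToyExample> TakeOwnership(std::vector<ToyExample*> *owned) {
  assert(owned != nullptr);
  std::vector<ToyExample> values(owned->size());
  for (std::size_t i = 0; i < owned->size(); i++) {
    values[i].Swap((*owned)[i]);  // constant-time exchange of the payload.
    delete (*owned)[i];           // we owned those pointers.
  }
  owned->clear();
  return values;
}
```

The swap leaves each heap object holding an empty payload, so deleting it afterwards is cheap and the data survives in the returned vector.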
int32 ExitStatus ( )
inline

Definition at line 498 of file nnet-example-utils.h.

References ExampleMerger::Finish(), and ExampleMerger::num_egs_written_.

Referenced by main().
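
Since ExitStatus() and the destructor both reference Finish(), Finish() has to be safe to call more than once. The sketch below shows one way such an idempotent flush can be arranged with a guard flag. ToyMerger is hypothetical, and the rule that the exit status is zero only when at least one item was written is an assumption made for illustration, not something this page states.

```cpp
#include <cassert>

// Hypothetical class illustrating an idempotent Finish() that is reached
// from an explicit call, from ExitStatus() and from the destructor.
class ToyMerger {
 public:
  explicit ToyMerger(int *flush_count)
      : finished_(false), num_written_(0), flush_count_(flush_count) {
    assert(flush_count != nullptr);
  }
  void Write() {
    assert(!finished_);  // no more input once finished.
    num_written_++;
  }
  void Finish() {
    if (finished_) return;  // already finished.
    finished_ = true;
    (*flush_count_)++;      // stands in for the real flushing work.
  }
  // Assumption: success (0) only if something was written.
  int ExitStatus() {
    Finish();
    return num_written_ > 0 ? 0 : 1;
  }
  ~ToyMerger() { Finish(); }

 private:
  bool finished_;
  int num_written_;
  int *flush_count_;
};
```

However many of the three entry points are used, the flushing work runs exactly once per object.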

void Finish ( )

Definition at line 1221 of file nnet-example-utils.cc.

References ExampleMerger::config_, ExampleMergingStats::DiscardedExamples(), ExampleMerger::eg_to_egs_, ExampleMerger::finished_, kaldi::nnet3::GetNnetExampleSize(), rnnlm::i, KALDI_ASSERT, ExampleMergingConfig::MinibatchSize(), ExampleMergingStats::PrintStats(), ExampleMerger::stats_, and ExampleMerger::WriteMinibatch().

Referenced by ExampleMerger::ExitStatus(), main(), and ExampleMerger::~ExampleMerger().

void ExampleMerger::Finish() {
  if (finished_) return;  // already finished.
  finished_ = true;

  // we'll convert the map eg_to_egs_ to a vector of vectors to avoid
  // iterator invalidation problems.
  std::vector<std::vector<NnetExample*> > all_egs;
  all_egs.reserve(eg_to_egs_.size());

  MapType::iterator iter = eg_to_egs_.begin(), end = eg_to_egs_.end();
  for (; iter != end; ++iter)
    all_egs.push_back(iter->second);
  eg_to_egs_.clear();

  for (size_t i = 0; i < all_egs.size(); i++) {
    int32 minibatch_size;
    std::vector<NnetExample*> &vec = all_egs[i];
    KALDI_ASSERT(!vec.empty());
    int32 eg_size = GetNnetExampleSize(*(vec[0]));
    bool input_ended = true;
    while (!vec.empty() &&
           (minibatch_size = config_.MinibatchSize(eg_size, vec.size(),
                                                   input_ended)) != 0) {
      // MergeExamples() expects a vector of NnetExample, not of pointers,
      // so use swap to create that without doing any real work.
      std::vector<NnetExample> egs_to_merge(minibatch_size);
      for (int32 i = 0; i < minibatch_size; i++) {
        egs_to_merge[i].Swap(vec[i]);
        delete vec[i];  // we owned those pointers.
      }
      vec.erase(vec.begin(), vec.begin() + minibatch_size);
      WriteMinibatch(egs_to_merge);
    }
    if (!vec.empty()) {
      int32 eg_size = GetNnetExampleSize(*(vec[0]));
      NnetExampleStructureHasher eg_hasher;
      size_t structure_hash = eg_hasher(*(vec[0]));
      int32 num_discarded = vec.size();
      stats_.DiscardedExamples(eg_size, structure_hash, num_discarded);
      for (int32 i = 0; i < num_discarded; i++)
        delete vec[i];
      vec.clear();
    }
  }
  stats_.PrintStats();
}
void DiscardedExamples(int32 example_size, size_t structure_hash, int32 num_discarded)
Users call this function to inform this class that after processing all the data, for examples of ori...
void PrintStats() const
Calling this will cause a log message with information about the examples to be printed.
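
The per-group loop in Finish() keeps asking for a minibatch size with input_ended set to true, writes that many examples from the front of the group, and discards whatever remains once the answer is zero. A reduced sketch of that control flow follows. ToyMinibatchSize() is a hypothetical policy (batches of 4, or of 2 once the input has ended); it is not the rule implemented by ExampleMergingConfig::MinibatchSize(), and DrainGroup() and DrainResult are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy: a full batch has 4 items; after the input has ended
// a batch of 2 is also acceptable.  Returns 0 if no batch can be formed.
int ToyMinibatchSize(int num_available, bool input_ended) {
  assert(num_available >= 0);
  if (num_available >= 4) return 4;
  if (input_ended && num_available >= 2) return 2;
  return 0;
}

struct DrainResult {
  std::vector<int> batch_sizes;  // sizes of the batches that were formed.
  int num_discarded;             // items left over and dropped.
};

// Drains one group at end of input, mirroring the while-loop in Finish().
DrainResult DrainGroup(std::vector<int> *group) {
  assert(group != nullptr);
  DrainResult result;
  result.num_discarded = 0;
  int size;
  while (!group->empty() &&
         (size = ToyMinibatchSize(static_cast<int>(group->size()),
                                  true)) != 0) {
    result.batch_sizes.push_back(size);
    group->erase(group->begin(), group->begin() + size);
  }
  result.num_discarded = static_cast<int>(group->size());
  group->clear();  // the remainder is discarded.
  return result;
}
```

Under this toy policy a group of seven items yields a batch of 4, then a batch of 2, and one discarded item.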
void WriteMinibatch ( const std::vector< NnetExample > &  egs)
private

Definition at line 1207 of file nnet-example-utils.cc.

References ExampleMergingConfig::compress, ExampleMerger::config_, kaldi::nnet3::GetNnetExampleSize(), KALDI_ASSERT, kaldi::nnet3::MergeExamples(), ExampleMerger::num_egs_written_, ExampleMerger::stats_, TableWriter< Holder >::Write(), ExampleMerger::writer_, and ExampleMergingStats::WroteExample().

Referenced by ExampleMerger::AcceptExample(), and ExampleMerger::Finish().

void ExampleMerger::WriteMinibatch(const std::vector<NnetExample> &egs) {
  KALDI_ASSERT(!egs.empty());
  int32 eg_size = GetNnetExampleSize(egs[0]);
  NnetExampleStructureHasher eg_hasher;
  size_t structure_hash = eg_hasher(egs[0]);
  int32 minibatch_size = egs.size();
  stats_.WroteExample(eg_size, structure_hash, minibatch_size);
  NnetExample merged_eg;
  MergeExamples(egs, config_.compress, &merged_eg);
  std::ostringstream key;
  key << "merged-" << (num_egs_written_++) << "-" << minibatch_size;
  writer_->Write(key.str(), merged_eg);
}
void Write(const std::string &key, const T &value) const
void WroteExample(int32 example_size, size_t structure_hash, int32 minibatch_size)
Users call this function to inform this class that one minibatch has been written aggregating 'miniba...
void MergeExamples(const std::vector< NnetExample > &src, bool compress, NnetExample *merged_eg)
Merge a set of input examples into a single example (typically the size of "src" will be the minibatc...
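
The key written by WriteMinibatch() has the form merged-<index>-<minibatch size>, where the index is the value of the counter before it is incremented. The helper below reproduces only that formatting step; MakeMergedKey() is a hypothetical name and the counter is passed in explicitly instead of being a class member.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a key such as "merged-0-8" and advances the counter, using the
// same stream expression as the listing above.
std::string MakeMergedKey(int *num_written, int minibatch_size) {
  assert(num_written != nullptr);
  std::ostringstream key;
  key << "merged-" << ((*num_written)++) << "-" << minibatch_size;
  return key.str();
}
```

Starting from a counter of zero, successive calls number the keys 0, 1, 2 and so on, so the keys are unique within one run regardless of the minibatch sizes.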

Member Data Documentation

MapType eg_to_egs_
private

Definition at line 516 of file nnet-example-utils.h.

Referenced by ExampleMerger::AcceptExample(), and ExampleMerger::Finish().

bool finished_
private

Definition at line 506 of file nnet-example-utils.h.

Referenced by ExampleMerger::AcceptExample(), and ExampleMerger::Finish().

int32 num_egs_written_
private

Referenced by ExampleMerger::ExitStatus(), and ExampleMerger::WriteMinibatch().

ExampleMergingStats stats_
private

Definition at line 510 of file nnet-example-utils.h.

Referenced by ExampleMerger::Finish(), and ExampleMerger::WriteMinibatch().

NnetExampleWriter* writer_
private

Definition at line 509 of file nnet-example-utils.h.

Referenced by ExampleMerger::WriteMinibatch().


The documentation for this class was generated from the following files:

nnet-example-utils.h
nnet-example-utils.cc