ExampleMerger Class Reference

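This class is responsible for arranging examples in groups that have the same structure (i.e. the same input and output indexes), and outputting them in suitable minibatches as defined by ExampleMergingConfig.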

#include <nnet-example-utils.h>

Public Member Functions

 ExampleMerger (const ExampleMergingConfig &config, NnetExampleWriter *writer)
 
void AcceptExample (NnetExample *a)
 
void Finish ()
 
int32 ExitStatus ()
 
 ~ExampleMerger ()
 

Private Types

typedef unordered_map< NnetExample *, std::vector< NnetExample * >, NnetExampleStructureHasher, NnetExampleStructureCompare > MapType
 

Private Member Functions

void WriteMinibatch (const std::vector< NnetExample > &egs)
 

Private Attributes

bool finished_
 
int32 num_egs_written_
 
const ExampleMergingConfig & config_
 
NnetExampleWriter * writer_
 
ExampleMergingStats stats_
 
MapType eg_to_egs_
 

Detailed Description

This class is responsible for arranging examples in groups that have the same structure (i.e. the same input and output indexes), and outputting them in suitable minibatches as defined by ExampleMergingConfig.

Definition at line 480 of file nnet-example-utils.h.
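
The grouping idea described above can be illustrated with a much smaller, self-contained sketch. Everything in it is hypothetical: ToyExample and ToyMerger are stand-ins that do not exist in Kaldi, the structure is reduced to a string, and the minibatch size is a fixed number rather than something decided by ExampleMergingConfig.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for NnetExample: 'structure' plays the role of the
// input/output index layout, 'payload' the role of the actual data.
struct ToyExample {
  std::string structure;
  int payload;
};

// Groups examples by structure and emits a minibatch whenever a group
// reaches 'batch_size' (a fixed size, unlike ExampleMergingConfig).
class ToyMerger {
 public:
  explicit ToyMerger(std::size_t batch_size) : batch_size_(batch_size) {
    assert(batch_size > 0);
  }

  void Accept(const ToyExample &eg) {
    std::vector<ToyExample> &group = groups_[eg.structure];
    group.push_back(eg);
    if (group.size() == batch_size_) {
      minibatches_.push_back(group);  // copy the full group out...
      groups_.erase(eg.structure);    // ...then drop it from the map.
    }
  }

  // Flushes whatever is left as smaller minibatches. This is only one
  // possible end-of-input policy, chosen to keep the sketch short.
  void Finish() {
    for (const auto &kv : groups_) minibatches_.push_back(kv.second);
    groups_.clear();
  }

  const std::vector<std::vector<ToyExample> > &Minibatches() const {
    return minibatches_;
  }

 private:
  std::size_t batch_size_;
  std::map<std::string, std::vector<ToyExample> > groups_;
  std::vector<std::vector<ToyExample> > minibatches_;
};
```

With a batch size of 2, accepting examples with structures "A", "B", "A" yields one minibatch of the two "A" examples; the lone "B" example is only emitted once Finish() is called.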

Member Typedef Documentation

◆ MapType

typedef unordered_map<NnetExample*, std::vector<NnetExample*>, NnetExampleStructureHasher, NnetExampleStructureCompare> MapType
private

Definition at line 515 of file nnet-example-utils.h.
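
MapType is keyed on pointers, yet lookups are meant to match on structure rather than on address, which is what the custom hasher and comparator template arguments are for. The following sketch shows the general technique with hypothetical stand-in types (ToyExample, ToyStructureHasher, ToyStructureCompare, ToyMapType and AddToGroup are all invented for illustration and are not the Kaldi classes):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct ToyExample {  // hypothetical stand-in for NnetExample
  std::string structure;
  int payload;
};

// Hashes the pointed-to structure, not the pointer value.
struct ToyStructureHasher {
  std::size_t operator()(const ToyExample *eg) const {
    return std::hash<std::string>()(eg->structure);
  }
};

// Two keys compare equal when the pointed-to structures match.
struct ToyStructureCompare {
  bool operator()(const ToyExample *a, const ToyExample *b) const {
    return a->structure == b->structure;
  }
};

typedef std::unordered_map<ToyExample*, std::vector<ToyExample*>,
                           ToyStructureHasher, ToyStructureCompare> ToyMapType;

// Adds 'eg' to the group with the same structure and returns the group size.
// If no such group exists yet, 'eg' itself becomes the key.
std::size_t AddToGroup(ToyMapType *map, ToyExample *eg) {
  assert(map != nullptr && eg != nullptr);
  std::vector<ToyExample*> &vec = (*map)[eg];
  vec.push_back(eg);
  return vec.size();
}
```

Because operator[] never replaces an existing key, the key of each group stays the first example that was inserted with that structure.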

Constructor & Destructor Documentation

◆ ExampleMerger()

ExampleMerger ( const ExampleMergingConfig & config,
NnetExampleWriter * writer 
)

Definition at line 1188 of file nnet-example-utils.cc.

ExampleMerger::ExampleMerger(const ExampleMergingConfig &config,
                             NnetExampleWriter *writer):
    finished_(false), num_egs_written_(0),
    config_(config), writer_(writer) { }

◆ ~ExampleMerger()

~ExampleMerger ( )
inline

Definition at line 500 of file nnet-example-utils.h.

Member Function Documentation

◆ AcceptExample()

void AcceptExample ( NnetExample * a)

Definition at line 1194 of file nnet-example-utils.cc.

References ExampleMerger::config_, ExampleMerger::eg_to_egs_, ExampleMerger::finished_, kaldi::nnet3::GetNnetExampleSize(), KALDI_ASSERT, ExampleMergingConfig::MinibatchSize(), and ExampleMerger::WriteMinibatch().

Referenced by main().

void ExampleMerger::AcceptExample(NnetExample *eg) {
  KALDI_ASSERT(!finished_);
  // If an eg with the same structure as 'eg' is already a key in the
  // map, it won't be replaced, but if it's new it will be made
  // the key.  Also we remove the key before making the vector empty.
  // This way we ensure that the eg in the key is always the first
  // element of the vector.
  std::vector<NnetExample*> &vec = eg_to_egs_[eg];
  vec.push_back(eg);
  int32 eg_size = GetNnetExampleSize(*eg),
      num_available = vec.size();
  bool input_ended = false;
  int32 minibatch_size = config_.MinibatchSize(eg_size, num_available,
                                               input_ended);
  if (minibatch_size != 0) {  // we need to write out a merged eg.
    KALDI_ASSERT(minibatch_size == num_available);

    std::vector<NnetExample*> vec_copy(vec);
    eg_to_egs_.erase(eg);

    // MergeExamples() expects a vector of NnetExample, not of pointers,
    // so use swap to create that without doing any real work.
    std::vector<NnetExample> egs_to_merge(minibatch_size);
    for (int32 i = 0; i < minibatch_size; i++) {
      egs_to_merge[i].Swap(vec_copy[i]);
      delete vec_copy[i];  // we owned those pointers.
    }
    WriteMinibatch(egs_to_merge);
  }
}
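
The listing above turns a vector of owned pointers into a vector of values by swapping each object's contents and then deleting the emptied original. A minimal sketch of that idiom follows, assuming a hypothetical ToyExample whose Swap() exchanges an internal buffer; ToyExample and TakeOwnership are illustrative names, not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyExample {  // hypothetical stand-in for NnetExample
  std::vector<float> data;
  void Swap(ToyExample *other) { data.swap(other->data); }
};

// Moves the contents of heap-allocated examples into a vector of values and
// frees the (now empty) originals. Takes ownership of every pointer in
// 'ptrs' and leaves 'ptrs' empty so no dangling pointer can be reused.
std::vector<ToyExample> TakeOwnership(std::vector<ToyExample*> *ptrs) {
  assert(ptrs != nullptr);
  std::vector<ToyExample> out(ptrs->size());
  for (std::size_t i = 0; i < ptrs->size(); i++) {
    out[i].Swap((*ptrs)[i]);  // O(1): only internal buffers are exchanged.
    delete (*ptrs)[i];
  }
  ptrs->clear();
  return out;
}
```

The swap avoids copying the example data; only the small default-constructed objects in the output vector are created up front.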

◆ ExitStatus()

int32 ExitStatus ( )
inline

Definition at line 498 of file nnet-example-utils.h.

Referenced by main().

◆ Finish()

void Finish ( )

Definition at line 1239 of file nnet-example-utils.cc.

References ExampleMerger::config_, ExampleMergingStats::DiscardedExamples(), ExampleMerger::eg_to_egs_, ExampleMerger::finished_, kaldi::nnet3::GetNnetExampleSize(), KALDI_ASSERT, ExampleMergingConfig::MinibatchSize(), ExampleMergingStats::PrintStats(), ExampleMerger::stats_, and ExampleMerger::WriteMinibatch().

Referenced by main().

void ExampleMerger::Finish() {
  if (finished_) return;  // already finished.
  finished_ = true;

  // we'll convert the map eg_to_egs_ to a vector of vectors to avoid
  // iterator invalidation problems.
  std::vector<std::vector<NnetExample*> > all_egs;
  all_egs.reserve(eg_to_egs_.size());

  MapType::iterator iter = eg_to_egs_.begin(), end = eg_to_egs_.end();
  for (; iter != end; ++iter)
    all_egs.push_back(iter->second);
  eg_to_egs_.clear();

  for (size_t i = 0; i < all_egs.size(); i++) {
    int32 minibatch_size;
    std::vector<NnetExample*> &vec = all_egs[i];
    KALDI_ASSERT(!vec.empty());
    int32 eg_size = GetNnetExampleSize(*(vec[0]));
    bool input_ended = true;
    while (!vec.empty() &&
           (minibatch_size = config_.MinibatchSize(eg_size, vec.size(),
                                                   input_ended)) != 0) {
      // MergeExamples() expects a vector of NnetExample, not of pointers,
      // so use swap to create that without doing any real work.
      std::vector<NnetExample> egs_to_merge(minibatch_size);
      for (int32 i = 0; i < minibatch_size; i++) {
        egs_to_merge[i].Swap(vec[i]);
        delete vec[i];  // we owned those pointers.
      }
      vec.erase(vec.begin(), vec.begin() + minibatch_size);
      WriteMinibatch(egs_to_merge);
    }
    if (!vec.empty()) {
      int32 eg_size = GetNnetExampleSize(*(vec[0]));
      NnetExampleStructureHasher eg_hasher;
      size_t structure_hash = eg_hasher(*(vec[0]));
      int32 num_discarded = vec.size();
      stats_.DiscardedExamples(eg_size, structure_hash, num_discarded);
      for (int32 i = 0; i < num_discarded; i++)
        delete vec[i];
      vec.clear();
    }
  }
  stats_.PrintStats();
}
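
The inner while-loop of Finish() keeps asking the config for a minibatch size until either the group is empty or the answer is 0, and whatever remains is discarded. The sketch below mirrors only the shape of that loop, using counts instead of real examples. ToyMinibatchSize is a hypothetical policy invented for illustration (largest allowed size that still fits); it is not how ExampleMergingConfig::MinibatchSize() is implemented, and DrainResult and DrainGroup are likewise assumed names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy, NOT ExampleMergingConfig::MinibatchSize(): returns the
// largest allowed size that fits in 'num_available', or 0 if none fits.
// 'allowed' is assumed to be sorted in decreasing order.
int ToyMinibatchSize(const std::vector<int> &allowed, int num_available) {
  for (std::size_t i = 0; i < allowed.size(); i++)
    if (allowed[i] <= num_available) return allowed[i];
  return 0;
}

struct DrainResult {
  std::vector<int> batch_sizes;  // sizes of the minibatches emitted, in order
  int num_discarded;             // examples left over at the end
};

// Drains a single group holding 'num_egs' examples: emit minibatches while
// the policy returns a nonzero size, then count the remainder as discarded.
DrainResult DrainGroup(const std::vector<int> &allowed, int num_egs) {
  assert(num_egs >= 0);
  DrainResult result;
  int remaining = num_egs, size;
  while (remaining > 0 &&
         (size = ToyMinibatchSize(allowed, remaining)) != 0) {
    result.batch_sizes.push_back(size);
    remaining -= size;
  }
  result.num_discarded = remaining;
  return result;
}
```

Under this toy policy, with allowed sizes 4 and 2 and a group of 7 examples, the loop emits minibatches of 4 and 2 and discards the last example.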

◆ WriteMinibatch()

void WriteMinibatch ( const std::vector< NnetExample > &  egs)
private

Definition at line 1225 of file nnet-example-utils.cc.

References ExampleMergingConfig::compress, ExampleMerger::config_, kaldi::nnet3::GetNnetExampleSize(), KALDI_ASSERT, kaldi::nnet3::MergeExamples(), ExampleMerger::num_egs_written_, ExampleMerger::stats_, TableWriter< Holder >::Write(), ExampleMerger::writer_, and ExampleMergingStats::WroteExample().

Referenced by ExampleMerger::AcceptExample(), and ExampleMerger::Finish().

void ExampleMerger::WriteMinibatch(const std::vector<NnetExample> &egs) {
  KALDI_ASSERT(!egs.empty());
  int32 eg_size = GetNnetExampleSize(egs[0]);
  NnetExampleStructureHasher eg_hasher;
  size_t structure_hash = eg_hasher(egs[0]);
  int32 minibatch_size = egs.size();
  stats_.WroteExample(eg_size, structure_hash, minibatch_size);
  NnetExample merged_eg;
  MergeExamples(egs, config_.compress, &merged_eg);
  std::ostringstream key;
  key << "merged-" << (num_egs_written_++) << "-" << minibatch_size;
  writer_->Write(key.str(), merged_eg);
}
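
Each merged example is written under a key of the form "merged-<index>-<size>", where the index is the running count of minibatches written so far. The key construction can be isolated as below; NextMergedKey is a hypothetical helper name used only for this sketch, and plain int stands in for int32.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a key of the form "merged-<index>-<size>" and advances the counter,
// following the key-building statements in WriteMinibatch().
std::string NextMergedKey(int *num_written, int minibatch_size) {
  assert(num_written != nullptr && minibatch_size > 0);
  std::ostringstream key;
  key << "merged-" << ((*num_written)++) << "-" << minibatch_size;
  return key.str();
}
```

Because the counter is post-incremented, the first key carries index 0, and the counter ends up equal to the number of minibatches written.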

Member Data Documentation

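◆ config_

const ExampleMergingConfig & config_
private

Referenced by ExampleMerger::AcceptExample(), ExampleMerger::Finish(), and ExampleMerger::WriteMinibatch().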

◆ eg_to_egs_

MapType eg_to_egs_
private

Definition at line 516 of file nnet-example-utils.h.

Referenced by ExampleMerger::AcceptExample(), and ExampleMerger::Finish().

◆ finished_

bool finished_
private

Definition at line 506 of file nnet-example-utils.h.

Referenced by ExampleMerger::AcceptExample(), and ExampleMerger::Finish().

◆ num_egs_written_

int32 num_egs_written_
private

Definition at line 507 of file nnet-example-utils.h.

Referenced by ExampleMerger::WriteMinibatch().

◆ stats_

ExampleMergingStats stats_
private

Definition at line 510 of file nnet-example-utils.h.

Referenced by ExampleMerger::Finish(), and ExampleMerger::WriteMinibatch().

◆ writer_

NnetExampleWriter* writer_
private

Definition at line 509 of file nnet-example-utils.h.

Referenced by ExampleMerger::WriteMinibatch().


The documentation for this class was generated from the following files:

nnet-example-utils.h
nnet-example-utils.cc