ChainExampleMerger Class Reference

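This class is responsible for arranging examples in groups that have the same structure (i.e. the same input and output indexes), and outputting them in suitable minibatches as defined by ExampleMergingConfig.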

#include <nnet-chain-example.h>

Public Member Functions

 ChainExampleMerger (const ExampleMergingConfig &config, NnetChainExampleWriter *writer)
 
void AcceptExample (NnetChainExample *a)
 
void Finish ()
 
int32 ExitStatus ()
 
 ~ChainExampleMerger ()
 

Private Types

typedef unordered_map< NnetChainExample *, std::vector< NnetChainExample * >, NnetChainExampleStructureHasher, NnetChainExampleStructureCompare > MapType
 

Private Member Functions

void WriteMinibatch (std::vector< NnetChainExample > *egs)
 

Private Attributes

bool finished_
 
int32 num_egs_written_
 
const ExampleMergingConfig & config_
 
NnetChainExampleWriter * writer_
 
ExampleMergingStats stats_
 
MapType eg_to_egs_
 

Detailed Description

This class is responsible for arranging examples in groups that have the same structure (i.e. the same input and output indexes), and outputting them in suitable minibatches as defined by ExampleMergingConfig.
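
The grouping behaviour described above can be illustrated with a small, self-contained sketch. It does not use the Kaldi types: `ToyExample`, `ToyMerger` and the fixed `batch_size` are hypothetical stand-ins, and in the real class the minibatch size is decided by ExampleMergingConfig rather than by a constant.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an example: 'structure' plays the role of
// "the same input and output indexes"; 'id' is just a payload.
struct ToyExample {
  std::string structure;
  int id;
};

// Hypothetical merger: groups examples by structure and emits a batch as
// soon as 'batch_size' examples with the same structure are available.
class ToyMerger {
 public:
  explicit ToyMerger(std::size_t batch_size) : batch_size_(batch_size) {}

  void Accept(const ToyExample &eg) {
    std::vector<ToyExample> &group = groups_[eg.structure];
    group.push_back(eg);
    if (group.size() == batch_size_) {
      batches_.push_back(std::move(group));
      groups_.erase(eg.structure);  // the group starts again from empty.
    }
  }

  // Batches emitted so far; each one holds examples of a single structure.
  const std::vector<std::vector<ToyExample> > &Batches() const {
    return batches_;
  }

  // Number of examples still waiting for their group to fill up.
  std::size_t NumPending() const {
    std::size_t n = 0;
    for (const auto &kv : groups_) n += kv.second.size();
    return n;
  }

 private:
  std::size_t batch_size_;
  std::map<std::string, std::vector<ToyExample> > groups_;
  std::vector<std::vector<ToyExample> > batches_;
};
```

With a batch size of 2, accepting examples with structures "A", "B", "A" yields one batch containing the two "A" examples and leaves the "B" example pending.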

Definition at line 234 of file nnet-chain-example.h.

Member Typedef Documentation

◆ MapType

Definition at line 272 of file nnet-chain-example.h.
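
The MapType typedef keys an unordered_map on a pointer while supplying a hasher and a comparator of its own. A minimal sketch of that pattern follows, assuming that both functors look through the pointer at the example's structure. `ToyEg`, `ToyStructureHasher`, `ToyStructureCompare` and `AddToGroup` are hypothetical names used only for illustration, not the Kaldi implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical example type; 'structure' stands in for the index layout.
struct ToyEg {
  std::string structure;
  int id;
};

// Hashes the pointed-to structure, not the pointer value.
struct ToyStructureHasher {
  std::size_t operator()(const ToyEg *eg) const {
    return std::hash<std::string>()(eg->structure);
  }
};

// Two keys compare equal when the pointed-to structures are equal.
struct ToyStructureCompare {
  bool operator()(const ToyEg *a, const ToyEg *b) const {
    return a->structure == b->structure;
  }
};

typedef std::unordered_map<ToyEg*, std::vector<ToyEg*>,
                           ToyStructureHasher, ToyStructureCompare> ToyMapType;

// Adds 'eg' to the group sharing its structure and returns the group size.
// operator[] only inserts a new key when no equivalent key exists, so the
// key of a group is always the first example that was added to it.
inline std::size_t AddToGroup(ToyMapType *map, ToyEg *eg) {
  std::vector<ToyEg*> &vec = (*map)[eg];
  vec.push_back(eg);
  return vec.size();
}
```

Because equality is structural, two distinct pointers with the same structure land in the same bucket entry, which is what lets the map act as a "group by structure" index.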

Constructor & Destructor Documentation

◆ ChainExampleMerger()

Definition at line 453 of file nnet-chain-example.cc.

ChainExampleMerger::ChainExampleMerger(const ExampleMergingConfig &config,
                                       NnetChainExampleWriter *writer):
    finished_(false), num_egs_written_(0),
    config_(config), writer_(writer) { }

◆ ~ChainExampleMerger()

~ChainExampleMerger ( )
inline

Definition at line 254 of file nnet-chain-example.h.

Member Function Documentation

◆ AcceptExample()

void AcceptExample ( NnetChainExample *  a)

Definition at line 459 of file nnet-chain-example.cc.

References ChainExampleMerger::config_, ChainExampleMerger::eg_to_egs_, ChainExampleMerger::finished_, kaldi::nnet3::GetNnetChainExampleSize(), KALDI_ASSERT, ExampleMergingConfig::MinibatchSize(), and ChainExampleMerger::WriteMinibatch().

void ChainExampleMerger::AcceptExample(NnetChainExample *eg) {
  KALDI_ASSERT(!finished_);
  // If an eg with the same structure as 'eg' is already a key in the
  // map, it won't be replaced, but if it's new it will be made
  // the key. Also we remove the key before making the vector empty.
  // This way we ensure that the eg in the key is always the first
  // element of the vector.
  std::vector<NnetChainExample*> &vec = eg_to_egs_[eg];
  vec.push_back(eg);
  int32 eg_size = GetNnetChainExampleSize(*eg),
      num_available = vec.size();
  bool input_ended = false;
  int32 minibatch_size = config_.MinibatchSize(eg_size, num_available,
                                               input_ended);
  if (minibatch_size != 0) {  // we need to write out a merged eg.
    KALDI_ASSERT(minibatch_size == num_available);

    std::vector<NnetChainExample*> vec_copy(vec);
    eg_to_egs_.erase(eg);

    // MergeChainExamples() expects a vector of NnetChainExample, not of pointers,
    // so use swap to create that without doing any real work.
    std::vector<NnetChainExample> egs_to_merge(minibatch_size);
    for (int32 i = 0; i < minibatch_size; i++) {
      egs_to_merge[i].Swap(vec_copy[i]);
      delete vec_copy[i];  // we owned those pointers.
    }
    WriteMinibatch(&egs_to_merge);
  }
}
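
The listing converts a vector of owned pointers into a vector of values by swapping each object's contents and then deleting the emptied original. The sketch below shows that idiom in isolation, under the assumption that the example type offers a cheap Swap(). `ToyEg` and `TakeOwnership` are hypothetical names, not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an example type with a cheap Swap().
struct ToyEg {
  std::vector<float> data;
  void Swap(ToyEg *other) { data.swap(other->data); }
};

// Moves the contents of heap-allocated examples into a vector of values and
// frees the (now empty) originals. The caller must own the pointers.
inline std::vector<ToyEg> TakeOwnership(std::vector<ToyEg*> *ptrs) {
  std::vector<ToyEg> out(ptrs->size());
  for (std::size_t i = 0; i < ptrs->size(); i++) {
    out[i].Swap((*ptrs)[i]);  // exchanges internal buffers, no element copy.
    delete (*ptrs)[i];
  }
  ptrs->clear();  // do not leave dangling pointers behind.
  return out;
}
```

The design choice is about cost: swapping exchanges buffer pointers in constant time, whereas copying each example into the new vector would duplicate all of its data.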

◆ ExitStatus()

int32 ExitStatus ( )
inline

Definition at line 252 of file nnet-chain-example.h.

◆ Finish()

void Finish ( )

Definition at line 505 of file nnet-chain-example.cc.

References ChainExampleMerger::config_, ExampleMergingStats::DiscardedExamples(), ChainExampleMerger::eg_to_egs_, ChainExampleMerger::finished_, kaldi::nnet3::GetNnetChainExampleSize(), KALDI_ASSERT, ExampleMergingConfig::MinibatchSize(), ExampleMergingStats::PrintStats(), ChainExampleMerger::stats_, and ChainExampleMerger::WriteMinibatch().

void ChainExampleMerger::Finish() {
  if (finished_) return;  // already finished.
  finished_ = true;

  // we'll convert the map eg_to_egs_ to a vector of vectors to avoid
  // iterator invalidation problems.
  std::vector<std::vector<NnetChainExample*> > all_egs;
  all_egs.reserve(eg_to_egs_.size());

  MapType::iterator iter = eg_to_egs_.begin(), end = eg_to_egs_.end();
  for (; iter != end; ++iter)
    all_egs.push_back(iter->second);
  eg_to_egs_.clear();

  for (size_t i = 0; i < all_egs.size(); i++) {
    int32 minibatch_size;
    std::vector<NnetChainExample*> &vec = all_egs[i];
    KALDI_ASSERT(!vec.empty());
    int32 eg_size = GetNnetChainExampleSize(*(vec[0]));
    bool input_ended = true;
    while (!vec.empty() &&
           (minibatch_size = config_.MinibatchSize(eg_size, vec.size(),
                                                   input_ended)) != 0) {
      // MergeChainExamples() expects a vector of
      // NnetChainExample, not of pointers, so use swap to create that
      // without doing any real work.
      std::vector<NnetChainExample> egs_to_merge(minibatch_size);
      for (int32 i = 0; i < minibatch_size; i++) {
        egs_to_merge[i].Swap(vec[i]);
        delete vec[i];  // we owned those pointers.
      }
      vec.erase(vec.begin(), vec.begin() + minibatch_size);
      WriteMinibatch(&egs_to_merge);
    }
    if (!vec.empty()) {
      int32 eg_size = GetNnetChainExampleSize(*(vec[0]));
      NnetChainExampleStructureHasher eg_hasher;
      size_t structure_hash = eg_hasher(*(vec[0]));
      int32 num_discarded = vec.size();
      stats_.DiscardedExamples(eg_size, structure_hash, num_discarded);
      for (int32 i = 0; i < num_discarded; i++)
        delete vec[i];
      vec.clear();
    }
  }
  stats_.PrintStats();
}
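
The end-of-input loop in Finish() keeps asking for a minibatch size until the answer is 0, then discards whatever remains in the group. The sketch below reproduces only that control flow with plain counters. The policy in `ToyMinibatchSize` (full batches of 4, and batches of 2 once input has ended) is an assumption invented for illustration and is not the rule implemented by ExampleMergingConfig::MinibatchSize(). `FlushResult` and `FlushGroup` are likewise hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical policy, NOT the ExampleMergingConfig rule: full batches have
// 4 examples; once input has ended a smaller batch of 2 is also allowed.
inline int ToyMinibatchSize(int num_available, bool input_ended) {
  if (num_available >= 4) return 4;
  if (input_ended && num_available >= 2) return 2;
  return 0;  // not enough examples to form an allowed batch.
}

struct FlushResult {
  std::vector<int> batch_sizes;  // sizes of the batches written, in order.
  int num_discarded;             // examples that fit no allowed batch.
};

// Drains one group of 'num_pending' same-structure examples at end of input.
inline FlushResult FlushGroup(int num_pending) {
  FlushResult result;
  result.num_discarded = 0;
  int batch;
  while (num_pending > 0 &&
         (batch = ToyMinibatchSize(num_pending, true)) != 0) {
    result.batch_sizes.push_back(batch);
    num_pending -= batch;
  }
  result.num_discarded = num_pending;  // the remainder cannot be batched.
  return result;
}
```

Under this toy policy a group of 7 pending examples is written as a batch of 4 followed by a batch of 2, and the last example is discarded.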

◆ WriteMinibatch()

void WriteMinibatch ( std::vector< NnetChainExample > *  egs)
private

Definition at line 490 of file nnet-chain-example.cc.

References ExampleMergingConfig::compress, ChainExampleMerger::config_, kaldi::nnet3::GetNnetChainExampleSize(), KALDI_ASSERT, kaldi::nnet3::MergeChainExamples(), ChainExampleMerger::num_egs_written_, ChainExampleMerger::stats_, TableWriter< Holder >::Write(), ChainExampleMerger::writer_, and ExampleMergingStats::WroteExample().

Referenced by ChainExampleMerger::AcceptExample(), and ChainExampleMerger::Finish().

void ChainExampleMerger::WriteMinibatch(std::vector<NnetChainExample> *egs) {
  KALDI_ASSERT(!egs->empty());
  int32 eg_size = GetNnetChainExampleSize((*egs)[0]);
  NnetChainExampleStructureHasher eg_hasher;
  size_t structure_hash = eg_hasher((*egs)[0]);
  int32 minibatch_size = egs->size();
  stats_.WroteExample(eg_size, structure_hash, minibatch_size);
  NnetChainExample merged_eg;
  MergeChainExamples(config_.compress, egs, &merged_eg);
  std::ostringstream key;
  key << "merged-" << (num_egs_written_++) << "-" << minibatch_size;
  writer_->Write(key.str(), merged_eg);
}
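
The key under which each merged example is written has the form "merged-<index>-<minibatch size>", where the index is a running count of minibatches written so far. A minimal sketch of that key construction, using only the standard library, is shown here. `NextMergedKey` is a hypothetical helper name and the counter is passed in explicitly instead of being a class member.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a key of the form "merged-<index>-<minibatch size>" and advances
// the running counter, mirroring the post-increment in the listing.
inline std::string NextMergedKey(int *num_written, int minibatch_size) {
  std::ostringstream key;
  key << "merged-" << ((*num_written)++) << "-" << minibatch_size;
  return key.str();
}
```

Starting from a counter of 0, a minibatch of 64 examples gets the key "merged-0-64" and the next minibatch of 32 examples gets "merged-1-32".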

Member Data Documentation

◆ config_

const ExampleMergingConfig& config_
private

◆ eg_to_egs_

MapType eg_to_egs_
private

◆ finished_

bool finished_
private

◆ num_egs_written_

int32 num_egs_written_
private

Definition at line 263 of file nnet-chain-example.h.

Referenced by ChainExampleMerger::WriteMinibatch().

◆ stats_

ExampleMergingStats stats_
private

◆ writer_

NnetChainExampleWriter* writer_
private

Definition at line 265 of file nnet-chain-example.h.

Referenced by ChainExampleMerger::WriteMinibatch().


The documentation for this class was generated from the following files:

nnet-chain-example.h
nnet-chain-example.cc