ExampleMergingConfig Class Reference

#include <nnet-example-utils.h>

Classes

struct  IntSet
 

Public Member Functions

 ExampleMergingConfig (const char *default_minibatch_size="256")
 
void Register (OptionsItf *po)
 
void ComputeDerived ()
 
int32 MinibatchSize (int32 size_of_eg, int32 num_available_egs, bool input_ended) const
 This function tells you what minibatch size should be used for this eg.
 

Public Attributes

bool compress
 
std::string measure_output_frames
 
std::string minibatch_size
 
std::string discard_partial_minibatches
 

Static Private Member Functions

static bool ParseIntSet (const std::string &str, IntSet *int_set)
 

Private Attributes

std::vector< std::pair< int32, IntSet > > rules
 

Detailed Description

Definition at line 321 of file nnet-example-utils.h.

Constructor & Destructor Documentation

ExampleMergingConfig (const char *default_minibatch_size = "256")
inline

Definition at line 329 of file nnet-example-utils.h.

ExampleMergingConfig(const char *default_minibatch_size = "256"):
    compress(false),
    measure_output_frames("deprecated"),
    minibatch_size(default_minibatch_size),
    discard_partial_minibatches("deprecated") { }

Member Function Documentation

void ComputeDerived ( )

Definition at line 940 of file nnet-example-utils.cc.

References kaldi::ConvertStringToInteger(), ExampleMergingConfig::discard_partial_minibatches, kaldi::IsSortedAndUniq(), KALDI_ERR, KALDI_WARN, ExampleMergingConfig::measure_output_frames, ExampleMergingConfig::minibatch_size, ExampleMergingConfig::ParseIntSet(), ExampleMergingConfig::rules, and kaldi::SplitStringToVector().

Referenced by main().

void ExampleMergingConfig::ComputeDerived() {
  if (measure_output_frames != "deprecated") {
    KALDI_WARN << "The --measure-output-frames option is deprecated "
        "and will be ignored.";
  }
  if (discard_partial_minibatches != "deprecated") {
    KALDI_WARN << "The --discard-partial-minibatches option is deprecated "
        "and will be ignored.";
  }
  std::vector<std::string> minibatch_size_split;
  SplitStringToVector(minibatch_size, "/", false, &minibatch_size_split);
  if (minibatch_size_split.empty()) {
    KALDI_ERR << "Invalid option --minibatch-size=" << minibatch_size;
  }

  rules.resize(minibatch_size_split.size());
  for (size_t i = 0; i < minibatch_size_split.size(); i++) {
    int32 &eg_size = rules[i].first;
    IntSet &int_set = rules[i].second;
    // 'this_rule' will be either something like "256" or like "64:128,256"
    // (but these two only if minibatch_size_split.size() == 1), or something
    // with an example-size specified, like "256=64:128,256".
    std::string &this_rule = minibatch_size_split[i];
    if (this_rule.find('=') != std::string::npos) {
      std::vector<std::string> rule_split;  // split on '='
      SplitStringToVector(this_rule, "=", false, &rule_split);
      if (rule_split.size() != 2) {
        KALDI_ERR << "Could not parse option --minibatch-size="
                  << minibatch_size;
      }
      if (!ConvertStringToInteger(rule_split[0], &eg_size) ||
          !ParseIntSet(rule_split[1], &int_set))
        KALDI_ERR << "Could not parse option --minibatch-size="
                  << minibatch_size;

    } else {
      if (minibatch_size_split.size() != 1) {
        KALDI_ERR << "Could not parse option --minibatch-size="
                  << minibatch_size << " (all rules must have "
                  << "eg-size specified if >1 rule)";
      }
      if (!ParseIntSet(this_rule, &int_set))
        KALDI_ERR << "Could not parse option --minibatch-size="
                  << minibatch_size;
    }
  }
  {
    // check that no size is repeated.
    std::vector<int32> all_sizes(minibatch_size_split.size());
    for (size_t i = 0; i < minibatch_size_split.size(); i++)
      all_sizes[i] = rules[i].first;
    std::sort(all_sizes.begin(), all_sizes.end());
    if (!IsSortedAndUniq(all_sizes)) {
      KALDI_ERR << "Invalid --minibatch-size=" << minibatch_size
                << " (repeated example-sizes)";
    }
  }
}
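
The rule-splitting step above can be illustrated without Kaldi. The following is a minimal sketch, not the library's code: Split, ParseEgSize and SplitRulesSketch are hypothetical helpers, errors are reported by returning false rather than via KALDI_ERR, and the eg-size parser accepts plain decimal digits only (an assumption that is stricter than ConvertStringToInteger). It keeps the minibatch-size part of each rule as text rather than parsing it into an IntSet.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits on one delimiter character, keeping empty fields.
std::vector<std::string> Split(const std::string &s, char delim) {
  std::vector<std::string> out;
  std::string::size_type start = 0;
  while (true) {
    const auto found = s.find(delim, start);
    out.push_back(s.substr(start, found - start));
    if (found == std::string::npos) break;
    start = found + 1;
  }
  return out;
}

// Hypothetical helper: accepts 1 to 9 decimal digits (so no int32 overflow).
bool ParseEgSize(const std::string &s, int32_t *out) {
  if (s.empty() || s.size() > 9) return false;
  int32_t value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + (c - '0');
  }
  *out = value;
  return true;
}

// Splits "eg_size1=mb_sizes1/eg_size2=mb_sizes2" into (eg_size, mb_sizes)
// pairs.  A single rule may omit "eg_size="; its eg_size is then left at 0.
bool SplitRulesSketch(const std::string &minibatch_size,
                      std::vector<std::pair<int32_t, std::string>> *rules) {
  rules->clear();
  const std::vector<std::string> parts = Split(minibatch_size, '/');
  for (const std::string &part : parts) {
    int32_t eg_size = 0;
    std::string sizes = part;
    if (part.find('=') != std::string::npos) {
      const std::vector<std::string> fields = Split(part, '=');
      if (fields.size() != 2 || !ParseEgSize(fields[0], &eg_size))
        return false;
      sizes = fields[1];
    } else if (parts.size() != 1) {
      return false;  // with more than one rule, every rule needs an eg-size
    }
    if (sizes.empty()) return false;
    rules->emplace_back(eg_size, sizes);
  }
  // Reject repeated eg-sizes, mirroring the sort + uniqueness check above.
  std::vector<int32_t> eg_sizes;
  for (const auto &rule : *rules) eg_sizes.push_back(rule.first);
  std::sort(eg_sizes.begin(), eg_sizes.end());
  return std::adjacent_find(eg_sizes.begin(), eg_sizes.end()) == eg_sizes.end();
}
```

With this sketch, "128=64:128,256/256=32:64,128" yields two rules keyed by eg-sizes 128 and 256, while "128=64/256" is rejected because the second rule has no eg-size.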

int32 MinibatchSize (int32 size_of_eg, int32 num_available_egs, bool input_ended) const

This function tells you what minibatch size should be used for this eg.

Parameters
[in]  size_of_eg         The "size" of the eg, as obtained by GetNnetExampleSize() or a similar function (up to the caller).
[in]  num_available_egs  The number of egs of this size that are currently available; should be >0. The value returned will be <= this value, possibly zero.
[in]  input_ended        True if the input has ended, false otherwise. This is important because before the input has ended, we will only batch egs into the largest possible minibatch size among the range allowed for that size of eg.
Returns
Returns the minibatch size to use in this situation, as specified by the configuration.

Definition at line 999 of file nnet-example-utils.cc.

References KALDI_ASSERT, KALDI_ERR, and ExampleMergingConfig::rules.

Referenced by DiscriminativeExampleMerger::AcceptExample(), ChainExampleMerger::AcceptExample(), ExampleMerger::AcceptExample(), DiscriminativeExampleMerger::Finish(), ChainExampleMerger::Finish(), and ExampleMerger::Finish().

int32 ExampleMergingConfig::MinibatchSize(int32 size_of_eg,
                                          int32 num_available_egs,
                                          bool input_ended) const {
  KALDI_ASSERT(num_available_egs > 0 && size_of_eg > 0);
  int32 num_rules = rules.size();
  if (num_rules == 0)
    KALDI_ERR << "You need to call ComputeDerived() before calling "
        "MinibatchSize().";
  int32 min_distance = std::numeric_limits<int32>::max(),
      closest_rule_index = 0;
  for (int32 i = 0; i < num_rules; i++) {
    int32 distance = std::abs(size_of_eg - rules[i].first);
    if (distance < min_distance) {
      min_distance = distance;
      closest_rule_index = i;
    }
  }
  if (!input_ended) {
    // until the input ends, we can only use the largest available
    // minibatch-size (otherwise, we could expect more later).
    int32 largest_size = rules[closest_rule_index].second.largest_size;
    if (largest_size <= num_available_egs)
      return largest_size;
    else
      return 0;
  } else {
    int32 s = rules[closest_rule_index].second.LargestValueInRange(
        num_available_egs);
    KALDI_ASSERT(s <= num_available_egs);
    return s;
  }
}
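
The selection logic above can be tried in isolation. This is a minimal sketch under stated assumptions, not Kaldi's implementation: IntSetSketch, Rule and MinibatchSizeSketch are hypothetical names, and the body of LargestValueInRange() is not shown on this page, so the version below assumes it returns the largest allowed value that does not exceed its argument, or 0 if there is none.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for ExampleMergingConfig::IntSet.
struct IntSetSketch {
  int32_t largest_size = 0;
  std::vector<std::pair<int32_t, int32_t>> ranges;  // inclusive [first, second]

  // ASSUMED behavior: largest allowed value <= max_value, or 0 if none.
  int32_t LargestValueInRange(int32_t max_value) const {
    int32_t ans = 0;
    for (const auto &range : ranges) {
      if (range.second <= max_value)
        ans = std::max(ans, range.second);   // whole range fits
      else if (range.first <= max_value)
        ans = std::max(ans, max_value);      // max_value lies inside the range
    }
    return ans;
  }
};

using Rule = std::pair<int32_t, IntSetSketch>;  // (eg-size, allowed sizes)

int32_t MinibatchSizeSketch(const std::vector<Rule> &rules, int32_t size_of_eg,
                            int32_t num_available_egs, bool input_ended) {
  assert(!rules.empty() && num_available_egs > 0 && size_of_eg > 0);
  // Pick the rule whose eg-size is closest to the actual size of the eg.
  int32_t min_distance = std::numeric_limits<int32_t>::max();
  std::size_t closest = 0;
  for (std::size_t i = 0; i < rules.size(); i++) {
    const int32_t distance = std::abs(size_of_eg - rules[i].first);
    if (distance < min_distance) {
      min_distance = distance;
      closest = i;
    }
  }
  const IntSetSketch &allowed = rules[closest].second;
  if (!input_ended) {
    // More egs may still arrive, so only the largest size is acceptable.
    return allowed.largest_size <= num_available_egs ? allowed.largest_size : 0;
  }
  // Input has ended: fall back to the largest size that can still be filled.
  return allowed.LargestValueInRange(num_available_egs);
}
```

For rules equivalent to --minibatch-size=128=64:128,256/256=32:64,128, an eg of size 140 maps to the rule for eg-size 128. With 200 such egs available the sketch returns 0 while input is still arriving (256 cannot be filled yet) and 128 once the input has ended.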

bool ParseIntSet (const std::string &str, ExampleMergingConfig::IntSet *int_set)
static private

Definition at line 918 of file nnet-example-utils.cc.

References ExampleMergingConfig::IntSet::largest_size, ExampleMergingConfig::IntSet::ranges, kaldi::SplitStringToIntegers(), and kaldi::SplitStringToVector().

Referenced by ExampleMergingConfig::ComputeDerived().

bool ExampleMergingConfig::ParseIntSet(const std::string &str,
                                       ExampleMergingConfig::IntSet *int_set) {
  std::vector<std::string> split_str;
  SplitStringToVector(str, ",", false, &split_str);
  if (split_str.empty())
    return false;
  int_set->largest_size = 0;
  int_set->ranges.resize(split_str.size());
  for (size_t i = 0; i < split_str.size(); i++) {
    std::vector<int32> split_range;
    SplitStringToIntegers(split_str[i], ":", false, &split_range);
    if (split_range.size() < 1 || split_range.size() > 2 ||
        split_range[0] > split_range.back() || split_range[0] <= 0)
      return false;
    int_set->ranges[i].first = split_range[0];
    int_set->ranges[i].second = split_range.back();
    int_set->largest_size = std::max<int32>(int_set->largest_size,
                                            split_range.back());
  }
  return true;
}
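
To make the accepted syntax concrete, here is a minimal standalone sketch of the same parsing idea using only the standard library. It is not Kaldi's code: IntSetSketch, ParsePositiveInt and ParseIntSetSketch are hypothetical names, and the integer parser is an assumption that accepts plain decimal digits only (Kaldi's SplitStringToIntegers may be more permissive about signs or whitespace).

```cpp
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ExampleMergingConfig::IntSet.
struct IntSetSketch {
  int32_t largest_size = 0;
  std::vector<std::pair<int32_t, int32_t>> ranges;  // inclusive [first, second]
};

// Hypothetical helper: parses a strictly positive decimal integer of at most
// 9 digits (so it cannot overflow int32); returns false on anything else.
bool ParsePositiveInt(const std::string &s, int32_t *out) {
  if (s.empty() || s.size() > 9) return false;
  int32_t value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + (c - '0');
  }
  if (value <= 0) return false;
  *out = value;
  return true;
}

// Parses e.g. "16:32,64,128" into the ranges 16..32, 64..64 and 128..128.
bool ParseIntSetSketch(const std::string &str, IntSetSketch *int_set) {
  int_set->largest_size = 0;
  int_set->ranges.clear();
  // std::getline would silently drop a trailing empty field, so reject it here.
  if (str.empty() || str.back() == ',') return false;
  std::istringstream fields(str);
  std::string field;
  while (std::getline(fields, field, ',')) {
    const auto colon = field.find(':');
    int32_t lo = 0, hi = 0;
    if (colon == std::string::npos) {
      // A single value N is stored as the range N..N.
      if (!ParsePositiveInt(field, &lo)) return false;
      hi = lo;
    } else {
      // A range "lo:hi" must satisfy 0 < lo <= hi.
      if (!ParsePositiveInt(field.substr(0, colon), &lo) ||
          !ParsePositiveInt(field.substr(colon + 1), &hi) || lo > hi)
        return false;
    }
    int_set->ranges.emplace_back(lo, hi);
    int_set->largest_size = std::max(int_set->largest_size, hi);
  }
  return true;
}
```

As in the function above, a reversed range such as "32:16" and a non-positive value such as "0" are rejected, and largest_size ends up as the maximum upper bound over all ranges.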

void Register (OptionsItf *po)
inline

Definition at line 335 of file nnet-example-utils.h.

References OptionsItf::Register().

Referenced by main().

void Register(OptionsItf *po) {
  po->Register("compress", &compress, "If true, compress the output examples "
               "(not recommended unless you are writing to disk)");
  po->Register("measure-output-frames", &measure_output_frames, "This "
               "value will be ignored (included for back-compatibility)");
  po->Register("discard-partial-minibatches", &discard_partial_minibatches,
               "This value will be ignored (included for back-compatibility)");
  po->Register("minibatch-size", &minibatch_size,
               "String controlling the minibatch size.  May be just an integer, "
               "meaning a fixed minibatch size (e.g. --minibatch-size=128).  "
               "May be a list of ranges and values, e.g. --minibatch-size=32,64 "
               "or --minibatch-size=16:32,64,128.  All minibatches will be of "
               "the largest size until the end of the input is reached; "
               "then, increasingly smaller sizes will be allowed.  Only egs "
               "with the same structure (e.g num-frames) are merged.  You may "
               "specify different minibatch sizes for different sizes of eg "
               "(defined as the maximum number of Indexes on any input), in "
               "the format "
               "--minibatch-size='eg_size1=mb_sizes1/eg_size2=mb_sizes2', e.g. "
               "--minibatch-size=128=64:128,256/256=32:64,128.  Egs are given "
               "minibatch-sizes based on the specified eg-size closest to "
               "their actual size.");
}

Member Data Documentation

std::string discard_partial_minibatches

Definition at line 327 of file nnet-example-utils.h.

Referenced by ExampleMergingConfig::ComputeDerived().

std::string measure_output_frames

Definition at line 325 of file nnet-example-utils.h.

Referenced by ExampleMergingConfig::ComputeDerived().

std::string minibatch_size

Definition at line 326 of file nnet-example-utils.h.

Referenced by ExampleMergingConfig::ComputeDerived().

std::vector<std::pair<int32, IntSet> > rules
private

The documentation for this class was generated from the following files: