nnet3-egs-augment-image.cc File Reference

Classes

struct  ImageAugmentationConfig
 

Namespaces

 kaldi
 This code computes Goodness of Pronunciation (GOP) and extracts phone-level pronunciation features for mispronunciation detection tasks, the reference:
 
 kaldi::nnet3
 

Enumerations

enum  FillMode { kNearest, kReflect }
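
The two fill modes decide which source pixel is used when a transformed coordinate falls outside the image. The sketch below is only an illustration of that idea, not the code in this file: the enum and function names are hypothetical, and it assumes that "nearest" clamps to the closest edge while "reflect" mirrors about the edge without repeating the edge pixel.

```cpp
#include <cassert>

// Hypothetical stand-in for the FillMode enum listed above.
enum SketchFillMode { kSketchNearest, kSketchReflect };

// Maps a possibly out-of-range index onto the valid range [0, size - 1].
// Assumption: kSketchNearest clamps to the closest edge; kSketchReflect
// mirrors about the edge, so index -1 maps to 1 and index size maps to
// size - 2.
inline int ResolveIndex(int index, int size, SketchFillMode mode) {
  assert(size > 0);
  if (mode == kSketchNearest) {
    if (index < 0) return 0;
    if (index >= size) return size - 1;
    return index;
  }
  if (size == 1) return 0;        // nothing to mirror in a 1-pixel axis
  int period = 2 * (size - 1);    // length of one full mirror cycle
  int m = index % period;
  if (m < 0) m += period;         // make the remainder non-negative
  return m < size ? m : period - m;
}
```

With a 5-pixel axis, an index of -1 resolves to 0 under the clamping mode and to 1 under the mirroring mode.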
 

Functions

void ApplyAffineTransform (MatrixBase< BaseFloat > &transform, int32 num_channels, MatrixBase< BaseFloat > *image, FillMode fill_mode)
 This function applies a geometric transformation 'transform' to the image.
 
void PerturbImage (const ImageAugmentationConfig &config, MatrixBase< BaseFloat > *image)
 This function randomly modifies (perturbs) the image by applying different geometric transformations according to the options in 'config'.
 
void PerturbImageInNnetExample (const ImageAugmentationConfig &config, NnetExample *eg)
 This function does image perturbation as directed by 'config'. The example 'eg' is expected to contain a NnetIo member with the name 'input', representing an image.
 
int main (int argc, char *argv[])
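
Geometric perturbations such as shifts and flips can be combined into a single matrix and applied to pixel coordinates in one step. The following is a minimal sketch of that general technique using 3x3 homogeneous matrices. The type and function names are hypothetical, and the coordinate convention (forward mapping of an (x, y) pair) is an assumption; it may differ from what ApplyAffineTransform expects in its 'transform' argument.

```cpp
#include <cassert>

// Hypothetical 3x3 homogeneous transform, stored row-major.
struct Mat3 { double m[3][3]; };

// Translation by (dx, dy).
inline Mat3 Shift(double dx, double dy) {
  Mat3 t = {{{1, 0, dx}, {0, 1, dy}, {0, 0, 1}}};
  return t;
}

// Mirror image of x within [0, width - 1]: x' = (width - 1) - x.
inline Mat3 HorizontalFlip(double width) {
  Mat3 t = {{{-1, 0, width - 1}, {0, 1, 0}, {0, 0, 1}}};
  return t;
}

// Matrix product a * b; applying the result is "b first, then a".
inline Mat3 Multiply(const Mat3 &a, const Mat3 &b) {
  Mat3 c = {};
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
      for (int k = 0; k < 3; k++)
        c.m[i][j] += a.m[i][k] * b.m[k][j];
  return c;
}

// Applies the transform to the point (x, y, 1) and writes the result.
inline void Apply(const Mat3 &t, double x, double y,
                  double *x_out, double *y_out) {
  assert(x_out != nullptr && y_out != nullptr);
  *x_out = t.m[0][0] * x + t.m[0][1] * y + t.m[0][2];
  *y_out = t.m[1][0] * x + t.m[1][1] * y + t.m[1][2];
}
```

For example, Multiply(Shift(1, 0), HorizontalFlip(4)) first flips a 4-pixel-wide image and then shifts it by one pixel, so the point (0, 2) maps to (4, 2). A result outside the image is where a fill mode would come into play.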
 

Function Documentation

◆ main()

int main ( int  argc,
char *  argv[] 
)

Definition at line 331 of file nnet3-egs-augment-image.cc.

References SequentialTableReader< Holder >::Done(), ParseOptions::GetArg(), KALDI_LOG, SequentialTableReader< Holder >::Key(), SequentialTableReader< Holder >::Next(), ParseOptions::NumArgs(), kaldi::nnet3::PerturbImageInNnetExample(), ParseOptions::PrintUsage(), ParseOptions::Read(), ImageAugmentationConfig::Register(), ParseOptions::Register(), SequentialTableReader< Holder >::Value(), and TableWriter< Holder >::Write().

int main(int argc, char *argv[]) {
  try {
    using namespace kaldi;
    using namespace kaldi::nnet3;
    typedef kaldi::int32 int32;
    typedef kaldi::int64 int64;

    const char *usage =
        "Copy examples (single frames or fixed-size groups of frames) for neural\n"
        "network training, doing image augmentation inline (copies after possibly\n"
        "modifying of each image, randomly chosen according to configuration\n"
        "parameters).\n"
        "E.g.:\n"
        "  nnet3-egs-augment-image --horizontal-flip-prob=0.5 --horizontal-shift=0.1\\\n"
        "       --vertical-shift=0.1 --srand=103 --num-channels=3 --fill-mode=nearest ark:- ark:-\n"
        "\n"
        "Requires that each eg contain a NnetIo object 'input', with successive\n"
        "'t' values representing different x offsets , and the feature dimension\n"
        "representing the y offset and the channel (color), with the channel\n"
        "varying the fastest.\n"
        "See also: nnet3-copy-egs\n";

    int32 srand_seed = 0;

    ImageAugmentationConfig config;

    ParseOptions po(usage);
    po.Register("srand", &srand_seed, "Seed for the random number generator");

    // Registers the augmentation options with the option parser.
    config.Register(&po);

    po.Read(argc, argv);

    srand(srand_seed);

    if (po.NumArgs() < 2) {
      po.PrintUsage();
      exit(1);
    }

    std::string examples_rspecifier = po.GetArg(1),
        examples_wspecifier = po.GetArg(2);

    SequentialNnetExampleReader example_reader(examples_rspecifier);
    NnetExampleWriter example_writer(examples_wspecifier);

    // Copy each example, perturb the copy, and write it out under the same key.
    int64 num_done = 0;
    for (; !example_reader.Done(); example_reader.Next(), num_done++) {
      std::string key = example_reader.Key();
      NnetExample eg(example_reader.Value());
      PerturbImageInNnetExample(config, &eg);
      example_writer.Write(key, eg);
    }
    KALDI_LOG << "Perturbed " << num_done << " neural-network training images.";
    // Exit status is nonzero if no examples were processed.
    return (num_done == 0 ? 1 : 0);
  } catch(const std::exception &e) {
    std::cerr << e.what() << '\n';
    return -1;
  }
}
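
The usage message describes the image layout: successive 't' values are x offsets, and the feature dimension encodes the y offset and the channel, with the channel varying fastest. The sketch below shows the index arithmetic that description implies. It uses a plain std::vector rather than Kaldi's matrix types, and all function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Feature-dimension index for y offset 'y' and channel 'channel', with the
// channel varying fastest as stated in the usage message.
inline int FeatureIndex(int y, int channel, int num_channels) {
  assert(channel >= 0 && channel < num_channels);
  return y * num_channels + channel;
}

// Image height implied by the feature dimension and the channel count.
inline int ImageHeight(int feature_dim, int num_channels) {
  assert(num_channels > 0 && feature_dim % num_channels == 0);
  return feature_dim / num_channels;
}

// Reads one value from an image stored with one row per x offset ('t' value).
inline float PixelAt(const std::vector<std::vector<float> > &image,
                     int x, int y, int channel, int num_channels) {
  return image[x][FeatureIndex(y, channel, num_channels)];
}
```

Under this layout, an input with a feature dimension of 96 and --num-channels=3 would correspond to an image 32 pixels high, and channel 1 of the pixel at y offset 2 would sit at feature index 7.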