kaldi: This code computes Goodness of Pronunciation (GOP) and extracts phone-level pronunciation features for mispronunciation detection tasks, the reference:
kaldi::nnet3

◆ main()
int main(int argc, char *argv[])

Definition at line 331 of file nnet3-egs-augment-image.cc.
References SequentialTableReader< Holder >::Done(), ParseOptions::GetArg(), KALDI_LOG, SequentialTableReader< Holder >::Key(), SequentialTableReader< Holder >::Next(), ParseOptions::NumArgs(), kaldi::nnet3::PerturbImageInNnetExample(), ParseOptions::PrintUsage(), ParseOptions::Read(), ImageAugmentationConfig::Register(), ParseOptions::Register(), SequentialTableReader< Holder >::Value(), and TableWriter< Holder >::Write().
  333     using namespace kaldi;
  336     typedef kaldi::int64 int64;
  339         "Copy examples (single frames or fixed-size groups of frames) for neural\n"
  340         "network training, doing image augmentation inline (copies after possibly\n"
  341         "modifying of each image, randomly chosen according to configuration\n"
  344         "  nnet3-egs-augment-image --horizontal-flip-prob=0.5 --horizontal-shift=0.1\\\n"
  345         "       --vertical-shift=0.1 --srand=103 --num-channels=3 --fill-mode=nearest ark:- ark:-\n"
  347         "Requires that each eg contain a NnetIo object 'input', with successive\n"
  348         "'t' values representing different x offsets , and the feature dimension\n"
  349         "representing the y offset and the channel (color), with the channel\n"
  350         "varying the fastest.\n"
  351         "See also: nnet3-copy-egs\n";
  354     int32 srand_seed = 0;
  359     po.Register("srand", &srand_seed, "Seed for the random number generator");
  367     if (po.NumArgs() < 2) {
  373     std::string examples_rspecifier = po.GetArg(1),
  374         examples_wspecifier = po.GetArg(2);
  381     for (; !example_reader.Done(); example_reader.Next(), num_done++) {
  382       std::string key = example_reader.Key();
  385       example_writer.Write(key, eg);
  387     KALDI_LOG << "Perturbed " << num_done << " neural-network training images.";
  388     return (num_done == 0 ? 1 : 0);
  389   } catch(const std::exception &e) {
  390     std::cerr << e.what() << '\n';
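
The usage message in the listing describes the expected layout of the 'input' NnetIo: successive 't' values are x offsets, and the feature dimension packs the y offset and the channel, with the channel varying fastest. The following is a minimal, self-contained sketch of that indexing together with a horizontal flip. The `Image` struct and `HorizontalFlip` function are hypothetical names used only for illustration; they are not Kaldi types, and Kaldi's own implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container (not a Kaldi type). One "row" per x offset (the 't'
// value); within a row the feature index is y * num_channels + c.
struct Image {
  int width;                // number of x offsets ('t' values)
  int height;               // number of y offsets
  int num_channels;         // e.g. 3 for RGB
  std::vector<float> data;  // width * height * num_channels values

  Image(int w, int h, int c)
      : width(w), height(h), num_channels(c),
        data(static_cast<std::size_t>(w) * h * c, 0.0f) {}

  // Channel varies fastest, then y, then x.
  float &At(int x, int y, int c) {
    assert(x >= 0 && x < width && y >= 0 && y < height &&
           c >= 0 && c < num_channels);
    return data[(static_cast<std::size_t>(x) * height + y) * num_channels + c];
  }
};

// Mirror the image left-to-right by swapping whole rows (x offsets).
void HorizontalFlip(Image *img) {
  const std::size_t row =
      static_cast<std::size_t>(img->height) * img->num_channels;
  for (int x = 0; x < img->width / 2; x++) {
    const int mirror = img->width - 1 - x;
    for (std::size_t i = 0; i < row; i++)
      std::swap(img->data[x * row + i], img->data[mirror * row + i]);
  }
}
```

Because each x offset occupies one contiguous row in this sketch, a horizontal flip reduces to swapping rows, while a vertical flip would have to reorder groups of `num_channels` values inside every row.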
 NnetExample is the input data and corresponding label (or labels) for one or more frames of input...
 
A templated class for writing objects to an archive or script file; see The Table concept...
 
void PerturbImageInNnetExample(const ImageAugmentationConfig &config, NnetExample *eg)
This function does image perturbation as directed by 'config' The example 'eg' is expected to contain...
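
To illustrate what a perturbation with `--fill-mode=nearest` might look like, here is a deliberately simplified sketch of a horizontal shift by a whole number of x offsets, where positions that would read from outside the image copy the nearest edge row. This is an assumption-laden illustration, not the code of PerturbImageInNnetExample: the `Rows` alias and `ShiftNearest` function are hypothetical, and the conversion from a fractional option such as `--horizontal-shift=0.1` to pixels, as well as any interpolation, is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical representation: rows[x] holds all (y, channel) values for
// x offset x.
using Rows = std::vector<std::vector<float>>;

// Shift the image by 'shift' positions along x (positive moves content
// toward larger x). Out-of-range source positions are clamped to the
// nearest valid row, which is the "nearest" fill behaviour assumed here.
Rows ShiftNearest(const Rows &rows, int shift) {
  assert(!rows.empty());
  const int width = static_cast<int>(rows.size());
  Rows out(width);
  for (int x = 0; x < width; x++) {
    const int src = std::min(std::max(x - shift, 0), width - 1);
    out[x] = rows[src];
  }
  return out;
}
```

Clamping the source index keeps the output the same size as the input, so the example dimensions do not change after perturbation.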
 
The class ParseOptions is for parsing command-line options; see Parsing command-line options for more...
 
A templated class for reading objects sequentially from an archive or script file; see The Table conc...
 
void Register(ParseOptions *po)