| kaldi | This code computes Goodness of Pronunciation (GOP) and extracts phone-level pronunciation features for mispronunciation detection tasks; the reference: |
| kaldi::nnet3 | |
◆ main()
int main(int argc, char *argv[])
Definition at line 331 of file nnet3-egs-augment-image.cc.
References SequentialTableReader< Holder >::Done(), ParseOptions::GetArg(), KALDI_LOG, SequentialTableReader< Holder >::Key(), SequentialTableReader< Holder >::Next(), ParseOptions::NumArgs(), kaldi::nnet3::PerturbImageInNnetExample(), ParseOptions::PrintUsage(), ParseOptions::Read(), ImageAugmentationConfig::Register(), ParseOptions::Register(), SequentialTableReader< Holder >::Value(), and TableWriter< Holder >::Write().
333      using namespace kaldi;
336      typedef kaldi::int64 int64;
339          "Copy examples (single frames or fixed-size groups of frames) for neural\n"
340          "network training, doing image augmentation inline (copies after possibly\n"
341          "modifying of each image, randomly chosen according to configuration\n"
344          " nnet3-egs-augment-image --horizontal-flip-prob=0.5 --horizontal-shift=0.1\\\n"
345          " --vertical-shift=0.1 --srand=103 --num-channels=3 --fill-mode=nearest ark:- ark:-\n"
347          "Requires that each eg contain a NnetIo object 'input', with successive\n"
348          "'t' values representing different x offsets , and the feature dimension\n"
349          "representing the y offset and the channel (color), with the channel\n"
350          "varying the fastest.\n"
351          "See also: nnet3-copy-egs\n";
354      int32 srand_seed = 0;
359      po.Register("srand", &srand_seed, "Seed for the random number generator");
367      if (po.NumArgs() < 2) {
373      std::string examples_rspecifier = po.GetArg(1),
374          examples_wspecifier = po.GetArg(2);
381      for (; !example_reader.Done(); example_reader.Next(), num_done++) {
382        std::string key = example_reader.Key();
385        example_writer.Write(key, eg);
387      KALDI_LOG << "Perturbed " << num_done << " neural-network training images.";
388      return (num_done == 0 ? 1 : 0);
389    } catch(const std::exception &e) {
390      std::cerr << e.what() << '\n';
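The usage message above describes how an image is stored in the 'input' NnetIo: each 't' value is one x offset, and the feature dimension packs the y offset and the channel, with the channel varying the fastest. The following is a minimal sketch of that indexing in standalone C++, together with a left-to-right flip done by reversing the order of the rows. The names Image, Column, At, and FlipHorizontal are hypothetical and introduced only for illustration; they are not part of Kaldi, and the sketch makes no claim about how PerturbImageInNnetExample() is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 'input' matrix of an eg: one row per x
// offset ('t' value), one column per (y, channel) pair.
struct Image {
  int width;         // number of rows, i.e. distinct x offsets
  int height;        // number of distinct y offsets
  int num_channels;  // e.g. 3 for RGB
  std::vector<float> data;  // row-major, width * height * num_channels values

  Image(int w, int h, int c)
      : width(w), height(h), num_channels(c),
        data(static_cast<std::size_t>(w) * h * c, 0.0f) {}

  // Column index within a row: the channel varies the fastest.
  int Column(int y, int channel) const { return y * num_channels + channel; }

  // Reference to the value at x offset 'x', y offset 'y', channel 'channel'.
  float &At(int x, int y, int channel) {
    assert(x >= 0 && x < width && y >= 0 && y < height &&
           channel >= 0 && channel < num_channels);
    const std::size_t row_size =
        static_cast<std::size_t>(height) * num_channels;
    return data[x * row_size + Column(y, channel)];
  }
};

// Mirrors the image left-to-right by reversing the order of the rows
// (x offsets); the (y, channel) layout inside each row is unchanged.
void FlipHorizontal(Image *img) {
  const std::size_t row_size =
      static_cast<std::size_t>(img->height) * img->num_channels;
  for (int x = 0; x < img->width / 2; ++x) {
    float *a = img->data.data() + x * row_size;
    float *b = img->data.data() + (img->width - 1 - x) * row_size;
    std::swap_ranges(a, a + row_size, b);
  }
}
```

With 3 channels, for example, the value at y offset 1 and channel 2 sits in column 1 * 3 + 2 = 5 of its row, and flipping a 3-row image swaps rows 0 and 2 while leaving row 1 in place.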
NnetExample is the input data and corresponding label (or labels) for one or more frames of input...
A templated class for writing objects to an archive or script file; see The Table concept...
void PerturbImageInNnetExample(const ImageAugmentationConfig &config, NnetExample *eg)
This function does image perturbation as directed by 'config'. The example 'eg' is expected to contain...
The class ParseOptions is for parsing command-line options; see Parsing command-line options for more...
A templated class for reading objects sequentially from an archive or script file; see The Table conc...
void Register(ParseOptions *po)
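The read, perturb, write loop in main() follows a simple iteration protocol: Done() tests for the end of input, Key() and Value() access the current item, and Next() advances. The sketch below imitates that protocol with in-memory containers so the control flow can be followed without Kaldi. InMemoryReader, InMemoryWriter, CopyWithPerturbation, and ExitStatus are hypothetical stand-ins written for this illustration; they are not Kaldi's SequentialTableReader or TableWriter and share only the member names used in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a sequential table reader; it only
// mimics the Done()/Next()/Key()/Value() calls used in main().
template <typename T>
class InMemoryReader {
 public:
  explicit InMemoryReader(std::vector<std::pair<std::string, T> > items)
      : items_(std::move(items)), pos_(0) {}
  bool Done() const { return pos_ >= items_.size(); }
  void Next() { ++pos_; }
  const std::string &Key() const {
    assert(!Done());
    return items_[pos_].first;
  }
  const T &Value() const {
    assert(!Done());
    return items_[pos_].second;
  }

 private:
  std::vector<std::pair<std::string, T> > items_;
  std::size_t pos_;
};

// Hypothetical stand-in for a table writer: collects (key, value) pairs.
template <typename T>
class InMemoryWriter {
 public:
  void Write(const std::string &key, const T &value) {
    written.push_back(std::make_pair(key, value));
  }
  std::vector<std::pair<std::string, T> > written;
};

// Copies every item from 'reader' to 'writer' after applying 'perturb' to a
// local copy, and returns the number of items processed.
template <typename T, typename Perturb>
long long CopyWithPerturbation(InMemoryReader<T> *reader,
                               InMemoryWriter<T> *writer,
                               Perturb perturb) {
  long long num_done = 0;
  for (; !reader->Done(); reader->Next(), num_done++) {
    T value(reader->Value());  // copy, so the reader's object is untouched
    perturb(&value);
    writer->Write(reader->Key(), value);
  }
  return num_done;
}

// Exit status convention from main(): nonzero when nothing was processed.
int ExitStatus(long long num_done) { return num_done == 0 ? 1 : 0; }
```

Counting items in the loop increment, as main() does, means num_done reflects only items that completed the loop body, and the final status distinguishes an empty input (status 1) from a successful run (status 0).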