Kaldi Tools

This page lists all the Kaldi command-line tools, with brief descriptions and their usage messages.

align-equal
 Write equally spaced alignments of utterances (to get training started)
Usage:  align-equal <tree-in> <model-in> <lexicon-fst-in> <features-rspecifier> <transcriptions-rspecifier> <alignments-wspecifier>
e.g.: 
 align-equal 1.tree 1.mdl lex.fst scp:train.scp 'ark:sym2int.pl -f 2- words.txt text|' ark:equal.ali 
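The even-split idea behind align-equal can be sketched in a few lines of Python (the function name and interface here are illustrative, not part of Kaldi):

```python
def equal_spaced_alignment(num_frames, states):
    """Distribute num_frames over the given states as evenly as
    possible, preserving order: with no trained model yet, each
    state of the linear graph gets an (almost) equal share."""
    base, rem = divmod(num_frames, len(states))
    alignment = []
    for i, s in enumerate(states):
        # the first `rem` states receive one extra frame each
        alignment.extend([s] * (base + (1 if i < rem else 0)))
    return alignment
```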
align-equal-compiled
 Write an equally spaced alignment (for getting training started)
Usage:  align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
e.g.: 
 align-equal-compiled 1.fsts scp:train.scp ark:equal.ali 
acc-tree-stats
 Accumulate statistics for phonetic-context tree building.
Usage:  acc-tree-stats [options] <model-in> <features-rspecifier> <alignments-rspecifier> <tree-accs-out>
e.g.: 
 acc-tree-stats 1.mdl scp:train.scp ark:1.ali 1.tacc 
show-alignments
 Display alignments in human-readable form
Usage:  show-alignments  [options] <phone-syms> <model> <alignments-rspecifier>
e.g.: 
 show-alignments phones.txt 1.mdl ark:1.ali
See also: ali-to-phones, copy-int-vector 
compile-questions
 Compile questions
Usage:  compile-questions [options] <topo> <questions-text-file> <questions-out>
e.g.: 
 compile-questions topo questions.txt questions.qst 
cluster-phones
 Cluster phones (or sets of phones) into sets for various purposes
Usage:  cluster-phones [options] <tree-stats-in> <phone-sets-in> <clustered-phones-out>
e.g.: 
 cluster-phones 1.tacc phonesets.txt questions.txt 
compute-wer
 Compute WER by comparing different transcriptions
Takes two transcription files, in integer or text format,
and outputs overall WER statistics to standard output.
Usage: compute-wer [options] <ref-rspecifier> <hyp-rspecifier>
E.g.: compute-wer --text --mode=present ark:data/train/text ark:hyp_text
See also: align-text
Example scoring script: egs/wsj/s5/steps/score_kaldi.sh 
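The WER reported here is (substitutions + deletions + insertions) divided by the reference length, obtained from an edit-distance alignment. A minimal Python sketch of that calculation (illustrative only; compute-wer itself reports much more detail):

```python
def wer(ref, hyp):
    """Word error rate via dynamic-programming edit distance.
    Assumes a non-empty reference word list."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        dp[i][0] = i                      # i deletions
    for j in range(H + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[R][H] / R
```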
compute-wer-bootci
 Compute a bootstrapping of WER to extract the 95% confidence interval.
Takes a reference and a transcription file, in integer or text format,
and outputs overall WER statistics to standard output along with the
confidence interval, using the bootstrap method of Bisani and Ney.
If a second transcription file corresponding to the same reference is
provided, a bootstrap comparison of the two transcriptions is performed
to estimate the probability of improvement.
Usage: compute-wer-bootci [options] <ref-rspecifier> <hyp-rspecifier> [<hyp2-rspecifier>]
E.g.: compute-wer-bootci --mode=present ark:data/train/text ark:hyp_text
or compute-wer-bootci ark:data/train/text ark:hyp_text ark:hyp_text2
See also: compute-wer 
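The bootstrap interval comes from resampling utterances with replacement and recomputing the aggregate WER each time; the 2.5% and 97.5% quantiles of the resampled WERs give a 95% interval. A rough sketch, assuming per-utterance error and word counts have already been computed (not the tool's exact procedure):

```python
import random

def bootstrap_wer_ci(utt_errs, utt_words, reps=2000, seed=0):
    """95% bootstrap confidence interval for WER: resample utterances
    with replacement, recompute aggregate WER, take quantiles."""
    rng = random.Random(seed)
    n = len(utt_errs)
    samples = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        errs = sum(utt_errs[i] for i in idx)
        words = sum(utt_words[i] for i in idx)
        samples.append(errs / words)
    samples.sort()
    return samples[int(0.025 * reps)], samples[int(0.975 * reps)]
```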
make-h-transducer
 Make H transducer from transition-ids to context-dependent phones, 
 without self-loops [use add-self-loops to add them]
Usage:   make-h-transducer <ilabel-info-file> <tree-file> <transition-gmm/acoustic-model> [<H-fst-out>]
e.g.: 
 make-h-transducer ilabel_info  1.tree 1.mdl > H.fst 
add-self-loops
 Add self-loops and transition probabilities to transducer.  Input transducer
has transition-ids on the input side, but only the forward transitions, not the
self-loops.  Output transducer has transition-ids on the input side, but with
self-loops added.  The --reorder option controls whether the loop is added before
the forward transition (if false), or afterward (if true).  The default (true)
is recommended as the decoding will in that case be faster.
Usage:   add-self-loops [options] transition-gmm/acoustic-model [fst-in] [fst-out]
e.g.: 
 add-self-loops --self-loop-scale=0.1 1.mdl HCLGa.fst HCLG.fst
or:  add-self-loops --self-loop-scale=0.1 1.mdl <HCLGa.fst >HCLG.fst 
convert-ali
 Convert alignments from one decision-tree/model to another
Usage:  convert-ali  [options] <old-model> <new-model> <new-tree> <old-alignments-rspecifier> <new-alignments-wspecifier>
e.g.: 
 convert-ali old/final.mdl new/0.mdl new/tree ark:old/ali.1 ark:new/ali.1 
compile-train-graphs
 Creates training graphs (without transition-probabilities, by default)
Usage:   compile-train-graphs [options] <tree-in> <model-in> <lexicon-fst-in> <transcriptions-rspecifier> <graphs-wspecifier>
e.g.: 
 compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' ark:graphs.fsts 
compile-train-graphs-fsts
 Creates training graphs (without transition-probabilities, by default)
This version takes FSTs as inputs (e.g., representing a separate weighted
grammar for each utterance)
Note: the lexicon should contain disambiguation symbols and you should
supply the --read-disambig-syms option which is the filename of a list
of disambiguation symbols.
Warning: you probably want to set the --transition-scale and --self-loop-scale
options; the defaults (zero) are probably not appropriate.
Usage:   compile-train-graphs-fsts [options] <tree-in> <model-in> <lexicon-fst-in>  <graphs-rspecifier> <graphs-wspecifier>
e.g.: 
 compile-train-graphs-fsts --read-disambig-syms=disambig.list\
   tree 1.mdl lex.fst ark:train.fsts ark:graphs.fsts 
make-pdf-to-tid-transducer
 Make transducer from pdfs to transition-ids
Usage:   make-pdf-to-tid-transducer model-filename [fst-out]
e.g.: 
 make-pdf-to-tid-transducer 1.mdl > pdf2tid.fst 
make-ilabel-transducer
 Make transducer that de-duplicates context-dependent ilabels that map to the same state
Usage:   make-ilabel-transducer ilabel-info-right tree-file transition-gmm/model ilabel-info-left [mapping-fst-out]
e.g.: 
 make-ilabel-transducer old_ilabel_info 1.tree 1.mdl new_ilabel_info > convert.fst 
show-transitions
 Print debugging info from transition model, in human-readable form
Usage:  show-transitions <phones-symbol-table> <transition/model-file> [<occs-file>]
e.g.: 
 show-transitions phones.txt 1.mdl 1.occs 
ali-to-phones
 Convert model-level alignments to phone-sequences (in integer, not text, form)
Usage:  ali-to-phones  [options] <model> <alignments-rspecifier> <phone-transcript-wspecifier|ctm-wxfilename>
e.g.: 
 ali-to-phones 1.mdl ark:1.ali ark:-
or:
 ali-to-phones --ctm-output 1.mdl ark:1.ali 1.ctm
See also: show-alignments, lattice-align-phones, compare-int-vector 
ali-to-post
 Convert alignments to posteriors.  This is simply a format change
from integer vectors to Posteriors, which are vectors of lists of
pairs (int, float) where the float represents the posterior.  The
floats would all be 1.0 in this case.
The posteriors will still be in terms of whatever integer index
the input contained, which will be transition-ids if they came
directly from decoding, or pdf-ids if they were processed by
ali-to-pdf.
Usage:  ali-to-post [options] <alignments-rspecifier> <posteriors-wspecifier>
e.g.:
 ali-to-post ark:1.ali ark:1.post
See also: ali-to-pdf, ali-to-phones, show-alignments, post-to-weights 
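Since this is purely a format change, the whole operation is tiny; a Python sketch of the per-utterance conversion (illustrative structure, not Kaldi's I/O):

```python
def ali_to_post(alignment):
    """Each frame's integer id becomes a one-element list of
    (id, posterior) pairs, with the posterior fixed at 1.0."""
    return [[(i, 1.0)] for i in alignment]
```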
weight-silence-post
 Apply weight to silences in posts
Usage:  weight-silence-post [options] <silence-weight> <silence-phones> <model> <posteriors-rspecifier> <posteriors-wspecifier>
e.g.:
 weight-silence-post 0.0 1:2:3 1.mdl ark:1.post ark:nosil.post 
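Conceptually the tool scales every posterior entry whose phone is one of the given silence phones, dropping entries scaled to zero. A sketch, where tid_to_phone is a hypothetical stand-in for the transition-model lookup:

```python
def weight_silence_post(post, silence_phones, silence_weight, tid_to_phone):
    """Scale posterior entries on silence phones; drop zeroed entries."""
    out = []
    for frame in post:
        new_frame = []
        for tid, p in frame:
            if tid_to_phone[tid] in silence_phones:
                p *= silence_weight
            if p != 0.0:
                new_frame.append((tid, p))
        out.append(new_frame)
    return out
```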
acc-lda
 Accumulate LDA statistics based on pdf-ids.
Usage:  acc-lda [options] <transition-gmm/model> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>
Typical usage:
 ali-to-post ark:1.ali ark:- | acc-lda 1.mdl "ark:splice-feats scp:train.scp|"  ark:- ldaacc.1 
est-lda
 Estimate LDA transform using stats obtained with acc-lda.
Usage:  est-lda [options] <lda-matrix-out> <lda-acc-1> <lda-acc-2> ... 
ali-to-pdf
 Converts alignments (containing transition-ids) to pdf-ids, zero-based.
Usage:  ali-to-pdf  [options] <model> <alignments-rspecifier> <pdfs-wspecifier>
e.g.: 
 ali-to-pdf 1.mdl ark:1.ali ark,t:- 
est-mllt
 Do update for MLLT (also known as STC)
Usage:  est-mllt [options] <mllt-mat-out> <stats-in1> <stats-in2> ... 
e.g.: est-mllt 2.mat 1a.macc 1b.macc ... 
where the stats are obtained from gmm-acc-mllt
Note: use compose-transforms <mllt-mat-out> <prev-mllt-mat> to combine with previous
  MLLT or LDA transform, if any, and
  gmm-transform-means to apply <mllt-mat-out> to GMM means. 
build-tree
 Train decision tree
Usage:  build-tree [options] <tree-stats-in> <roots-file> <questions-file> <topo-file> <tree-out>
e.g.: 
 build-tree treeacc roots.txt 1.qst topo tree 
build-tree-two-level
 Trains two-level decision tree.  Outputs the larger tree, and a mapping from the
leaf-ids of the larger tree to those of the smaller tree.  Useful, for instance,
in tied-mixture systems with multiple codebooks.
Usage:  build-tree-two-level [options] <tree-stats-in> <roots-file> <questions-file> <topo-file> <tree-out> <mapping-out>
e.g.: 
 build-tree-two-level treeacc roots.txt 1.qst topo tree tree.map 
decode-faster
 Decode, reading log-likelihoods (of transition-ids or whatever symbol is on the graph)
as matrices.  Note: you'll usually want decode-faster-mapped rather than this program.
Usage:   decode-faster [options] <fst-in> <loglikes-rspecifier> <words-wspecifier> [<alignments-wspecifier>] 
decode-faster-mapped
 Decode, reading log-likelihoods as matrices
 (model is needed only for the integer mappings in its transition-model)
Usage:   decode-faster-mapped [options] <model-in> <fst-in> <loglikes-rspecifier> <words-wspecifier> [<alignments-wspecifier>] 
vector-scale
 Scale vectors, or archives of vectors (useful for speaker vectors and per-frame weights)
Usage: vector-scale [options] <vector-in-rspecifier> <vector-out-wspecifier>
   or: vector-scale [options] <vector-in-rxfilename> <vector-out-wxfilename>
 e.g.: vector-scale --scale=-1.0 1.vec -
       vector-scale --scale=-2.0 ark:vec.ark ark,t:-
See also: copy-vector, vector-sum 
copy-transition-model
 Copies a transition model (this can be used to separate transition 
 models from the acoustic models they are written with).
Usage:  copy-transition-model [options] <transition-model or model file> <transition-model-out>
e.g.: 
 copy-transition-model --binary=false 1.mdl 1.txt 
phones-to-prons
 Convert pairs of (phone-level, word-level) transcriptions to
output that indicates the phones assigned to each word.
Format is standard format for archives of vector<vector<int32> >
i.e. :
utt-id  600 4 7 19 ; 512 4 18 ; 0 1
where 600, 512 and 0 are the word-ids (0 for non-word phones, e.g.
optional-silence introduced by the lexicon), and the phone-ids
follow the word-ids.
Note: L_align.fst must have word-start and word-end symbols in it
Usage:  phones-to-prons [options] <L_align.fst> <word-start-sym> <word-end-sym> <phones-rspecifier> <words-rspecifier> <prons-wspecifier>
e.g.: 
 ali-to-phones 1.mdl ark:1.ali ark:- | \
  phones-to-prons L_align.fst 46 47 ark:- 'ark:sym2int.pl -f 2- words.txt text|' ark:1.prons 
prons-to-wordali
 Caution: this program relates to older scripts and is deprecated,
for modern scripts see egs/wsj/s5/steps/{get_ctm,get_train_ctm}.sh
Given per-utterance pronunciation information as output by 
words-to-prons, and per-utterance phone alignment information
as output by ali-to-phones --write-lengths, output word alignment
information that can be turned into the ctm format.
Output is pairs of (word, #frames), or if --per-frame is given,
just the word for each frame.
Note: zero word-id usually means optional silence.
Format is standard format for archives of vector<pair<int32, int32> >
i.e. :
utt-id  600 22 ; 1028 32 ; 0 41
where 600, 1028 and 0 are the word-ids, and 22, 32 and 41 are the
lengths.
Usage:  prons-to-wordali [options] <prons-rspecifier> <phone-lengths-rspecifier> <wordali-wspecifier>
e.g.: 
 ali-to-phones 1.mdl ark:1.ali ark:- | \
  phones-to-prons L_align.fst 46 47 ark:- 'ark:sym2int.pl -f 2- words.txt text|' \
  ark:- | prons-to-wordali ark:- \
    "ark:ali-to-phones --write-lengths 1.mdl ark:1.ali ark:-|" ark:1.wali 
copy-gselect
 Copy Gaussian indices for pruning, possibly making the
lists shorter (e.g. --n=10 limits each list to the 10 best indices).
See also: gmm-gselect, fgmm-gselect
Usage: copy-gselect [options] <gselect-rspecifier> <gselect-wspecifier> 
copy-tree
 Copy decision tree (possibly changing binary/text format)
Usage:  copy-tree [--binary=false] <tree-in> <tree-out> 
scale-post
 Scale posteriors with either a global scale, or a different scale for each utterance.
Usage: scale-post <post-rspecifier> (<scale-rspecifier>|<scale>) <post-wspecifier> 
post-to-weights
 Turn posteriors into per-frame weights (typically most useful after
weight-silence-post, to get silence weights)
See also: weight-silence-post, post-to-pdf-post, post-to-phone-post
post-to-feats, get-post-on-ali
Usage: post-to-weights <post-rspecifier> <weights-wspecifier> 
sum-tree-stats
 Sum statistics for phonetic-context tree building.
Usage:  sum-tree-stats [options] tree-accs-out tree-accs-in1 tree-accs-in2 ...
e.g.: 
 sum-tree-stats treeacc 1.treeacc 2.treeacc 3.treeacc 
weight-post
 Takes archives (typically per-utterance) of posteriors and per-frame weights,
and weights the posteriors by the per-frame weights
Usage: weight-post <post-rspecifier> <weights-rspecifier> <post-wspecifier> 
post-to-tacc
 From posteriors, compute transition-accumulators
The output is a vector of counts/soft-counts, indexed by transition-id.
Note: the model is only read in order to get the size of the vector.
Usage: post-to-tacc [options] <model> <post-rspecifier> <accs>
 e.g.: post-to-tacc --binary=false 1.mdl "ark:ali-to-post 1.ali|" 1.tacc
See also: get-post-on-ali 
copy-matrix
 Copy matrices, or archives of matrices (e.g. features or transforms)
Also see copy-feats which has other format options
Usage: copy-matrix [options] <matrix-in-rspecifier> <matrix-out-wspecifier>
  or: copy-matrix [options] <matrix-in-rxfilename> <matrix-out-wxfilename>
 e.g.: copy-matrix --binary=false 1.mat -
   copy-matrix ark:2.trans ark,t:-
See also: copy-feats, matrix-sum 
copy-vector
 Copy vectors, or archives of vectors (e.g. transition-accs; speaker vectors)
Usage: copy-vector [options] (<vector-in-rspecifier>|<vector-in-rxfilename>) (<vector-out-wspecifier>|<vector-out-wxfilename>)
 e.g.: copy-vector --binary=false 1.mat -
   copy-vector ark:2.trans ark,t:-
see also: dot-weights, append-vector-to-feats 
copy-int-vector
 Copy vectors of integers, or archives of vectors of integers 
(e.g. alignments)
Usage: copy-int-vector [options] (vector-in-rspecifier|vector-in-rxfilename) (vector-out-wspecifier|vector-out-wxfilename)
 e.g.: copy-int-vector --binary=false foo -
   copy-int-vector ark:1.ali ark,t:- 
sum-post
 Sum two sets of posteriors for each utterance, e.g. useful in fMMI.
To take the difference of posteriors, use e.g. --scale2=-1.0
Usage: sum-post <post-rspecifier1> <post-rspecifier2> <post-wspecifier> 
sum-matrices
 Sum matrices, e.g. stats for fMPE training
Usage:  sum-matrices [options] <mat-out> <mat-in1> <mat-in2> ...
e.g.:
 sum-matrices mat 1.mat 2.mat 3.mat 
draw-tree
 Outputs a decision tree description in GraphViz format
Usage: draw-tree [options] <phone-symbols> <tree>
e.g.: draw-tree phones.txt tree | dot -Gsize=8,10.5 -Tps | ps2pdf - tree.pdf 
align-mapped
 Generate alignments, reading log-likelihoods as matrices.
 (model is needed only for the integer mappings in its transition-model)
Usage:   align-mapped [options] <tree-in> <trans-model-in> <lexicon-fst-in> <feature-rspecifier> <transcriptions-rspecifier> <alignments-wspecifier>
e.g.: 
 align-mapped tree trans.mdl lex.fst scp:train.scp ark:train.tra ark:nnet.ali 
align-compiled-mapped
 Generate alignments, reading log-likelihoods as matrices.
 (model is needed only for the integer mappings in its transition-model)
Usage:   align-compiled-mapped [options] trans-model-in graphs-rspecifier feature-rspecifier alignments-wspecifier
e.g.: 
 align-compiled-mapped trans.mdl ark:graphs.fsts scp:train.scp ark:nnet.ali
or:
 compile-train-graphs tree trans.mdl lex.fst ark:train.tra ark:- | \
   align-compiled-mapped trans.mdl ark:- scp:loglikes.scp ark:nnet.ali 
latgen-faster-mapped
 Generate lattices, reading log-likelihoods as matrices
 (model is needed only for the integer mappings in its transition-model)
Usage: latgen-faster-mapped [options] trans-model-in (fst-in|fsts-rspecifier) loglikes-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
latgen-faster-mapped-parallel
 Generate lattices, reading log-likelihoods as matrices, using multiple decoding threads
 (model is needed only for the integer mappings in its transition-model)
Usage: latgen-faster-mapped-parallel [options] trans-model-in (fst-in|fsts-rspecifier) loglikes-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
hmm-info
 Write to standard output various properties of HMM-based transition model
Usage:  hmm-info [options] <model-in>
e.g.:
 hmm-info trans.mdl 
analyze-counts
 Computes element counts from integer vector table.
(e.g. get pdf-counts to estimate DNN-output priors for data analysis)
Verbosity: level 1 prints frequencies and a histogram.
Usage: analyze-counts [options] <alignments-rspecifier> <counts-wxfilename>
e.g.: 
 analyze-counts ark:1.ali prior.counts
 Show phone counts by:
 ali-to-phones --per-frame=true ark:1.ali ark:- | analyze-counts --verbose=1 ark:- - >/dev/null
Note: this is deprecated, see post-to-tacc. 
post-to-phone-post
 Convert posteriors (or pdf-level posteriors) to phone-level posteriors
See also: post-to-pdf-post, post-to-weights, get-post-on-ali
First, the usage when your posteriors are on transition-ids (the normal case):
Usage: post-to-phone-post [options] <model> <post-rspecifier> <phone-post-wspecifier>
 e.g.: post-to-phone-post --binary=false 1.mdl "ark:ali-to-post 1.ali|" ark,t:-
Next, the usage when your posteriors are on pdfs (e.g. if they are neural-net
posteriors)
post-to-phone-post --transition-id-counts=final.tacc 1.mdl ark:pdf_post.ark ark,t:-
See documentation of --transition-id-counts option for more details. 
post-to-pdf-post
 This program turns per-frame posteriors, which have transition-ids as
the integers, into pdf-level posteriors
See also: post-to-phone-post, post-to-weights, get-post-on-ali
Usage:  post-to-pdf-post [options] <model-file> <posteriors-rspecifier> <posteriors-wspecifier>
e.g.: post-to-pdf-post 1.mdl ark:- ark:- 
logprob-to-post
 Convert a matrix of log-probabilities (e.g. from nnet-logprob) to posteriors
Usage:  logprob-to-post [options] <logprob-matrix-rspecifier> <posteriors-wspecifier>
e.g.:
 nnet-logprob [args] | logprob-to-post ark:- ark:1.post
Caution: in this particular example, the output would be posteriors of pdf-ids,
rather than transition-ids (c.f. post-to-pdf-post) 
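The conversion exponentiates each row of log-probabilities and keeps the entries as sparse (index, posterior) pairs; in practice small posteriors are pruned. A sketch (the min_post threshold here is an assumed illustration, not the tool's documented default):

```python
import math

def logprob_to_post(logprob_matrix, min_post=0.01):
    """Turn each row of log-probabilities into a sparse list of
    (index, posterior) pairs, dropping entries below min_post."""
    post = []
    for row in logprob_matrix:
        frame = [(i, math.exp(x)) for i, x in enumerate(row)
                 if math.exp(x) >= min_post]
        post.append(frame)
    return post
```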
prob-to-post
 Convert a matrix of probabilities (e.g. from nnet-logprob2) to posteriors
Usage:  prob-to-post [options] <prob-matrix-rspecifier> <posteriors-wspecifier>
e.g.:
 nnet-logprob2 [args] | prob-to-post ark:- ark:1.post
Caution: in this particular example, the output would be posteriors of pdf-ids,
rather than transition-ids (c.f. post-to-pdf-post) 
copy-post
 Copy archives of posteriors, with optional scaling
Usage: copy-post <post-rspecifier> <post-wspecifier>
See also: post-to-weights, scale-post, sum-post, weight-post ... 
matrix-sum
 Add matrices (supports various forms)
Type one usage:
 matrix-sum [options] <matrix-in-rspecifier1> [<matrix-in-rspecifier2> <matrix-in-rspecifier3> ...] <matrix-out-wspecifier>
  e.g.: matrix-sum ark:1.weights ark:2.weights ark:combine.weights
  This usage supports the --scale1 and --scale2 options to scale the
  first two input tables.
Type two usage (sums a single table input to produce a single output):
 matrix-sum [options] <matrix-in-rspecifier> <matrix-out-wxfilename>
 e.g.: matrix-sum --binary=false mats.ark sum.mat
Type three usage (sums or averages single-file inputs to produce
a single output):
 matrix-sum [options] <matrix-in-rxfilename1> <matrix-in-rxfilename2> ... <matrix-out-wxfilename>
 e.g.: matrix-sum --binary=false 1.mat 2.mat 3.mat sum.mat
See also: matrix-sum-rows, copy-matrix 
build-pfile-from-ali
 Build pfiles for neural network training from alignment.
Usage:  build-pfile-from-ali [options] <model> <alignments-rspecifier> <feature-rspecifier> 
<pfile-wspecifier>
e.g.: 
 build-pfile-from-ali 1.mdl ark:1.ali features 
 "|pfile_create -i - -o pfile.1 -f 143 -l 1"  
get-post-on-ali
 Given input posteriors, e.g. derived from lattice-to-post, and an alignment
typically derived from the best path of a lattice, outputs the probability in
the posterior of the corresponding index in the alignment, or zero if it was
not there.  These are output as a vector of weights, one per utterance.
While, by default, lattice-to-post (as a source of posteriors) and sources of
alignments such as lattice-best-path will output transition-ids as the index,
it will generally make sense to either convert these to pdf-ids using
post-to-pdf-post and ali-to-pdf respectively, or to phones using post-to-phone-post
and (ali-to-phones --per-frame=true).  Since this program only sees the integer
indexes, it does not care what they represent-- but of course they should match
(e.g. don't input posteriors with transition-ids and alignments with pdf-ids).
See http://kaldi-asr.org/doc/hmm.html#transition_model_identifiers for an
explanation of these types of indexes.
See also: post-to-tacc, weight-post, post-to-weights, reverse-weights
Usage:  get-post-on-ali [options] <posteriors-rspecifier> <ali-rspecifier> <weights-wspecifier>
e.g.: get-post-on-ali ark:post.ark ark,s,cs:ali.ark ark:weights.ark 
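In essence, for each frame the program looks up the aligned index in that frame's posterior entries. A compact sketch (illustrative interface):

```python
def get_post_on_ali(post, alignment):
    """Per frame, return the posterior mass of the aligned index,
    or 0.0 if that index does not occur in the frame's posteriors."""
    return [dict(frame).get(idx, 0.0)
            for frame, idx in zip(post, alignment)]
```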
tree-info
 Print information about decision tree (mainly the number of pdfs), to stdout
Usage:  tree-info <tree-in> 
am-info
 Write to standard output various properties of a model, of any type
(reads only the transition model)
Usage:  am-info [options] <model-in>
e.g.:
 am-info 1.mdl 
vector-sum
 Add vectors (e.g. weights, transition-accs; speaker vectors)
If you need to scale the inputs, use vector-scale on the inputs
Type one usage:
 vector-sum [options] <vector-in-rspecifier1> [<vector-in-rspecifier2> <vector-in-rspecifier3> ...] <vector-out-wspecifier>
  e.g.: vector-sum ark:1.weights ark:2.weights ark:combine.weights
Type two usage (sums a single table input to produce a single output):
 vector-sum [options] <vector-in-rspecifier> <vector-out-wxfilename>
 e.g.: vector-sum --binary=false vecs.ark sum.vec
Type three usage (sums single-file inputs to produce a single output):
 vector-sum [options] <vector-in-rxfilename1> <vector-in-rxfilename2> ... <vector-out-wxfilename>
 e.g.: vector-sum --binary=false 1.vec 2.vec 3.vec sum.vec
See also: copy-vector, dot-weights 
matrix-sum-rows
 Sum the rows of an input table of matrices and output the corresponding
table of vectors
Usage: matrix-sum-rows [options] <matrix-rspecifier> <vector-wspecifier>
e.g.: matrix-sum-rows ark:- ark:- | vector-sum ark:- sum.vec
See also: matrix-sum, vector-sum 
est-pca
 Estimate PCA transform; dimension reduction is optional (if not specified,
we don't reduce the dimension).  If you specify --normalize-variance=true,
we normalize the (centered) covariance of the features, and if you specify
--normalize-mean=true the mean is also normalized.  So a variety of transform
types are supported.  Because this type of transform does not need too much
data to estimate robustly, we don't support separate accumulator files;
this program reads in the features directly.  For large datasets you may
want to subset the features (see example below)
By default the program reads in matrices (e.g. features), but with
--read-vectors=true, can read in vectors (e.g. iVectors).
Usage:  est-pca [options] (<feature-rspecifier>|<vector-rspecifier>) <pca-matrix-out>
e.g.:
utils/shuffle_list.pl data/train/feats.scp | head -n 5000 | sort | \
  est-pca --dim=50 scp:- some/dir/0.mat 
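The core of PCA estimation is an eigendecomposition of the centered covariance; the dominant direction can be illustrated with power iteration in pure Python (a sketch only; est-pca computes the full transform and supports the normalization options above):

```python
def pca_top_component(rows, iters=200):
    """Top PCA direction of a list of equal-length feature rows,
    via power iteration on the centered covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```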
sum-lda-accs
 Sum stats obtained with acc-lda.
Usage: sum-lda-accs [options] <stats-out> <stats-in1> <stats-in2> ... 
sum-mllt-accs
 Sum stats obtained with gmm-acc-mllt.
Usage: sum-mllt-accs [options] <stats-out> <stats-in1> <stats-in2> ... 
transform-vec
 This program applies a linear or affine transform to individual vectors, e.g.
iVectors.  It is transform-feats, except it works on vectors rather than matrices,
and expects a single transform matrix rather than possibly a table of matrices
Usage: transform-vec [options] <transform-rxfilename> <feats-rspecifier> <feats-wspecifier>
See also: transform-feats, est-pca 
align-text
 Computes alignment between two sentences with the same key in the
two given input text-rspecifiers. The current implementation uses
Levenshtein distance as the distance metric.
The input text file looks like the following:
  key1 a b c
  key2 d e
The output alignment file looks like the following:
  key1 a a ; b <eps> ; c c 
  key2 d f ; e e 
where the aligned pairs are separated by ";"
Usage: align-text [options] <text1-rspecifier> <text2-rspecifier> \
                              <alignment-wspecifier>
 e.g.: align-text ark:text1.txt ark:text2.txt ark,t:alignment.txt
See also: compute-wer
Example scoring script: egs/wsj/s5/steps/score_kaldi.sh 
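The alignment itself comes from a Levenshtein edit-distance backtrace, emitting <eps> for insertions and deletions. A Python sketch of that backtrace (illustrative, not the tool's implementation):

```python
def align_text(ref, hyp, eps="<eps>"):
    """Levenshtein alignment of two word sequences; returns a list of
    (ref_word, hyp_word) pairs, with eps marking insertions/deletions."""
    R, H = len(ref), len(hyp)
    dp = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        dp[i][0] = i
    for j in range(H + 1):
        dp[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # backtrace from the bottom-right corner
    pairs, i, j = [], R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], eps)); i -= 1      # deletion
        else:
            pairs.append((eps, hyp[j - 1])); j -= 1      # insertion
    return pairs[::-1]
```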
matrix-dim
 Print dimension info on an input matrix (rows then cols, separated by tab), to
standard output.  Output for single filename: rows[tab]cols.  Output per line for
archive of files: key[tab]rows[tab]cols
Usage: matrix-dim [options] <matrix-in>|<in-rspecifier>
e.g.: matrix-dim final.mat | cut -f 2
See also: feat-to-len, feat-to-dim 
post-to-smat
 This program turns an archive of per-frame posteriors, e.g. from
ali-to-post | post-to-pdf-post,
into an archive of SparseMatrix.  This is just a format transformation.
This may not make sense if the indexes in question are one-based (at least,
you'd have to increase the dimension by one).
See also: post-to-phone-post, ali-to-post, post-to-pdf-post
Usage:  post-to-smat [options] <posteriors-rspecifier> <sparse-matrix-wspecifier>
e.g.: post-to-smat --dim=1038 ark:- ark:- 
compile-graph
 Creates HCLG decoding graph.  Similar to mkgraph.sh but done in code.
Usage:   compile-graph [options] <tree-in> <model-in> <lexicon-fst-in> <grammar-rspecifier> <hclg-wspecifier>
e.g.: 
 compile-graph tree 1.mdl L_disambig.fst G.fst HCLG.fst 
compare-int-vector
 Compare vectors of integers (e.g. phone alignments)
Prints to stdout fields of the form:
<utterance-id>  <num-frames-in-utterance>  <num-frames-that-differ>
e.g.:
 SWB1_A_31410_32892 420 36
Usage:
compare-int-vector [options] <vector1-rspecifier> <vector2-rspecifier>
e.g. compare-int-vector scp:foo.scp scp:bar.scp > comparison
E.g. the inputs might come from ali-to-phones.
Warnings are printed if the vector lengths differ for a given utterance-id,
and in those cases, the number of frames printed will be the smaller of the
two lengths.
See also: ali-to-phones, copy-int-vector 
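The per-utterance comparison reduces to counting differing entries over the shorter of the two lengths; sketched in Python (illustrative interface):

```python
def compare_int_vectors(v1, v2):
    """Return (num_frames, num_differing); when lengths differ the
    comparison runs over the shorter vector, as the tool warns."""
    diff = sum(1 for a, b in zip(v1, v2) if a != b)
    return min(len(v1), len(v2)), diff
```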
latgen-incremental-mapped
 Generate lattices, reading log-likelihoods as matrices
 (model is needed only for the integer mappings in its transition-model)
The lattice determinization algorithm here can operate
incrementally.
Usage: latgen-incremental-mapped [options] trans-model-in (fst-in|fsts-rspecifier) loglikes-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
compute-gop
 Compute Goodness Of Pronunciation (GOP) from a matrix of probabilities (e.g. from nnet3-compute).
Usage:  compute-gop [options] <model> <alignments-rspecifier> <prob-matrix-rspecifier> <gop-wspecifier> [<phone-feature-wspecifier>]
e.g.:
 nnet3-compute [args] | compute-gop 1.mdl ark:ali-phone.1 ark:- ark:gop.1 ark:phone-feat.1 
chain-est-phone-lm
 Initialize un-smoothed phone language model for 'chain' training
Output in FST format (epsilon-free deterministic acceptor)
Usage:  chain-est-phone-lm [options] <phone-seqs-rspecifier> <phone-lm-fst-out>
The phone-sequences are used to train a language model.
e.g.:
gunzip -c input_dir/ali.*.gz | ali-to-phones input_dir/final.mdl ark:- ark:- | \
 chain-est-phone-lm --leftmost-context-questions=dir/leftmost_questions.txt ark:- dir/phone_G.fst 
chain-get-supervision
 Get a 'chain' supervision object for each file of training data.
This will normally be piped into nnet3-chain-get-egs, where it
will be split up into pieces and combined with the features.
Input can come in two formats: from alignments
(from ali-to-phones --write-lengths=true), or from lattices
(e.g. derived from aligning the data, see steps/align_fmllr_lats.sh)
that have been converted to phone-level lattices with
lattice-align-phones --replace-output-symbols=true.
Usage: chain-get-supervision [options] <tree> <transition-model> [<phones-with-lengths-rspecifier>|<phone-lattice-rspecifier>] <supervision-wspecifier>
See steps/nnet3/chain/get_egs.sh for example 
chain-make-den-fst
 Creates 'denominator' FST for 'chain' training.
Outputs in FST format.  <denominator-fst-out> is an epsilon-free acceptor;
<normalization-fst-out> is a modified version of <denominator-fst> (w.r.t.
initial and final probs) that is used in example generation.
Usage: chain-make-den-fst [options] <tree> <transition-model> <phone-lm-fst> <denominator-fst-out> <normalization-fst-out>
e.g.:
chain-make-den-fst dir/tree dir/0.trans_mdl dir/phone_lm.fst dir/den.fst dir/normalization.fst 
nnet3-chain-get-egs
 Get frame-by-frame examples of data for nnet3+chain neural network
training.  This involves breaking up utterances into pieces of a
fixed size.  Input will come from chain-get-supervision.
Note: if <normalization-fst> is not supplied the egs will not be
ready for training; in that case they should later be processed
with nnet3-chain-normalize-egs
Usage:  nnet3-chain-get-egs [options] [<normalization-fst>] <features-rspecifier> <chain-supervision-rspecifier> <egs-wspecifier>
An example [where $feats expands to the actual features]:
chain-get-supervision [args] | \
  nnet3-chain-get-egs --left-context=25 --right-context=9 --num-frames=150,100,90 dir/normalization.fst \
  "$feats" ark,s,cs:- ark:cegs.1.ark
Note: the --frame-subsampling-factor option must be the same as given to
chain-get-supervision. 
nnet3-chain-copy-egs
 Copy examples for nnet3+chain network training, possibly changing the binary mode.
Supports multiple wspecifiers, in which case it will write the examples
round-robin to the outputs.
Usage:  nnet3-chain-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...]
e.g.
nnet3-chain-copy-egs ark:train.cegs ark,t:text.cegs
or:
nnet3-chain-copy-egs ark:train.cegs ark:1.cegs ark:2.cegs 
nnet3-chain-merge-egs
 This copies nnet3+chain training examples from input to output, merging them
into composite examples.  The --minibatch-size option controls how many egs
are merged into a single output eg.
Usage:  nnet3-chain-merge-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.
nnet3-chain-merge-egs --minibatch-size=128 ark:1.cegs ark:- | nnet3-chain-train-simple ... 
See also nnet3-chain-copy-egs 
nnet3-chain-shuffle-egs
 Copy nnet3+chain examples for neural network training, from the input to output,
while randomly shuffling the order.  This program will keep all of the examples
in memory at once, unless you use the --buffer-size option
Usage:  nnet3-chain-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier>
nnet3-chain-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs 
nnet3-chain-subset-egs
 Creates a random subset of the input nnet3+chain examples, of a specified size.
Uses no more memory than the size of the subset.
Usage:  nnet3-chain-subset-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.
nnet3-chain-get-egs [args] ark:- | nnet3-chain-subset-egs --n=1000 ark:- ark:subset.cegs 
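Selecting a uniform random subset while holding only the subset in memory is classic reservoir sampling; a sketch of how such a memory bound is achievable (illustrative, not necessarily the tool's exact algorithm):

```python
import random

def reservoir_subset(stream, n, seed=0):
    """Uniform random subset of size n from an iterable, using
    memory proportional to n only (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for k, item in enumerate(stream):
        if k < n:
            reservoir.append(item)
        else:
            # item replaces a reservoir slot with probability n/(k+1)
            j = rng.randrange(k + 1)
            if j < n:
                reservoir[j] = item
    return reservoir
```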
nnet3-chain-acc-lda-stats
 Accumulate statistics in the same format as acc-lda (i.e. stats for
estimation of LDA and similar types of transform), starting from nnet+chain
training examples.  This program puts the features through the network,
and the network output will be the features; the supervision in the
training examples is used for the class labels.  Used in obtaining
feature transforms that help nnet training work better.
Note: the time boundaries it gets from the chain supervision will be
a little fuzzy (which is not ideal), but it should not matter much in
this situation
Usage:  nnet3-chain-acc-lda-stats [options] <raw-nnet-in> <training-examples-in> <lda-stats-out>
e.g.:
nnet3-chain-acc-lda-stats 0.raw ark:1.cegs 1.acc
See also: nnet-get-feature-transform 
nnet3-chain-train
 Train nnet3+chain neural network parameters with backprop and stochastic
gradient descent.  Minibatches are to be created by nnet3-chain-merge-egs in
the input pipeline.  This training program is single-threaded (best to
use it with a GPU).
Usage:  nnet3-chain-train [options] <raw-nnet-in> <denominator-fst-in> <chain-training-examples-in> <raw-nnet-out>
nnet3-chain-train 1.raw den.fst 'ark:nnet3-chain-merge-egs 1.cegs ark:-|' 2.raw 
nnet3-chain-compute-prob
 Computes and prints, in logging messages, the average log-prob per frame of
the given data with an nnet3+chain neural net.  The input of this is the output of
e.g. nnet3-chain-get-egs | nnet3-chain-merge-egs.
Usage:  nnet3-chain-compute-prob [options] <raw-nnet3-model-in> <denominator-fst> <training-examples-in>
e.g.: nnet3-chain-compute-prob 0.mdl den.fst ark:valid.egs 
nnet3-chain-combine
 Using a subset of training or held-out nnet3+chain examples, compute
the average over the first n nnet models, where n is chosen to maximize the
'chain' objective function.  Note that the order of models is reversed
before being fed into this binary, so it is actually the last n models
that are combined.
Inputs and outputs are nnet3 raw nnets.
Usage:  nnet3-chain-combine [options] <den-fst> <raw-nnet-in1> <raw-nnet-in2> ... <raw-nnet-inN> <chain-examples-in> <raw-nnet-out>
e.g.:
 nnet3-chain-combine den.fst 35.raw 36.raw 37.raw 38.raw ark:valid.cegs final.raw 
nnet3-chain-normalize-egs
 Add weights from 'normalization' FST to nnet3+chain examples.
Should be done if and only if the <normalization-fst> argument of
nnet3-chain-get-egs was not supplied when the original egs were
created.
Usage:  nnet3-chain-normalize-egs [options] <normalization-fst> <egs-rspecifier> <egs-wspecifier>
e.g.
nnet3-chain-normalize-egs dir/normalization.fst ark:train_in.cegs ark:train_out.cegs 
nnet3-chain-e2e-get-egs
 Get frame-by-frame examples of data for nnet3+chain end2end neural network
training.  Note: if <normalization-fst> is not supplied the egs will not be
ready for training; in that case they should later be processed
with nnet3-chain-normalize-egs
Usage:  nnet3-chain-e2e-get-egs [options] [<normalization-fst>] <features-rspecifier> <fst-rspecifier> <trans-model> <egs-wspecifier> 
nnet3-chain-compute-post
 Compute posteriors from 'denominator FST' of chain model and optionally map them to phones.
Usage: nnet3-chain-compute-post [options] <nnet-in> <den-fst> <features-rspecifier> <matrix-wspecifier>
 e.g.: nnet3-chain-compute-post --transform-mat=transform.mat final.raw den.fst scp:feats.scp ark:nnet_prediction.ark
See also: nnet3-compute
See steps/nnet3/chain/get_phone_post.sh for example of usage.
Note: this program makes *extremely inefficient* use of the GPU.
You are advised to run this on CPU until it's improved. 
batched-wav-nnet3-cuda
 Reads in wav file(s) and simulates online decoding with neural nets
(nnet3 setup), with optional iVector-based speaker adaptation and
optional endpointing.  Note: some configuration values and inputs are
set via config files whose filenames are passed as options
Usage: batched-wav-nnet3-cuda [options] <nnet3-in> <fst-in> <wav-rspecifier> <lattice-wspecifier> 
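A hypothetical invocation is sketched below; the model, graph, and archive names (final.mdl, HCLG.fst, wav.scp, lat.1.gz) and the config file are placeholders for your own files, with most decoding parameters assumed to come from that config.

```shell
# Sketch only: all filenames and the config path are placeholders.
batched-wav-nnet3-cuda --config=conf/online.conf \
  final.mdl HCLG.fst scp:wav.scp "ark:|gzip -c > lat.1.gz"
```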
add-deltas
 Add deltas (typically to raw mfcc or plp features)
Usage: add-deltas [options] in-rspecifier out-wspecifier 
add-deltas-sdc
 Add shifted delta cepstra (typically to raw mfcc or plp features)
Usage: add-deltas-sdc [options] in-rspecifier out-wspecifier 
append-post-to-feats
 Append posteriors to features
Usage: append-post-to-feats [options] <in-rspecifier1> <in-rspecifier2> <out-wspecifier>
 or: append-post-to-feats [options] <in-rxfilename1> <in-rxfilename2> <out-wxfilename>
e.g.: append-post-to-feats --post-dim=50 ark:input.ark scp:post.scp ark:output.ark
See also: paste-feats, concat-feats, append-vector-to-feats 
append-vector-to-feats
 Append a vector to each row of input feature files
Usage: append-vector-to-feats <in-rspecifier1> <in-rspecifier2> <out-wspecifier>
 or: append-vector-to-feats <in-rxfilename1> <in-rxfilename2> <out-wxfilename>
See also: paste-feats, concat-feats 
apply-cmvn
 Apply cepstral mean and (optionally) variance normalization
Per-utterance by default, or per-speaker if utt2spk option provided
Usage: apply-cmvn [options] (<cmvn-stats-rspecifier>|<cmvn-stats-rxfilename>) <feats-rspecifier> <feats-wspecifier>
e.g.: apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp scp:data/train/feats.scp ark:-
See also: modify-cmvn-stats, matrix-sum, compute-cmvn-stats 
apply-cmvn-sliding
 Apply sliding-window cepstral mean (and optionally variance)
normalization per utterance.  If center == true, window is centered
on frame being normalized; otherwise it precedes it in time.
Useful for speaker-id; see also apply-cmvn-online
Usage: apply-cmvn-sliding [options] <feats-rspecifier> <feats-wspecifier> 
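A speaker-id-style invocation might look like the following sketch; the archive names are placeholders, while --norm-vars, --center and --cmn-window are this program's standard options.

```shell
# Sliding-window cepstral mean normalization over a centered 300-frame
# window; input/output archives are placeholders.
apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 \
  scp:feats.scp ark:cmvn_feats.ark
```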
compare-feats
 Computes relative difference between two sets of features
per dimension and an average difference
Can be used to figure out how different two sets of features are.
Inputs must have same dimension.  Prints to stdout a similarity
metric vector that is 1.0 per dimension if the features are identical,
and <1.0 otherwise, and an average overall similarity value.
Usage: compare-feats [options] <in-rspecifier1> <in-rspecifier2>
e.g.: compare-feats ark:1.ark ark:2.ark 
compose-transforms
 Compose (affine or linear) feature transforms
Usage: compose-transforms [options] (<transform-A-rspecifier>|<transform-A-rxfilename>) (<transform-B-rspecifier>|<transform-B-rxfilename>) (<transform-out-wspecifier>|<transform-out-wxfilename>)
 Note: it does matrix multiplication (A B) so B is the transform that gets applied
  to the features first.  If --b-is-affine=true, then the last column of B is assumed to correspond to the offset
 e.g.: compose-transforms 1.mat 2.mat 3.mat
   compose-transforms 1.mat ark:2.trans ark:3.trans
   compose-transforms ark:1.trans ark:2.trans ark:3.trans
 See also: transform-feats, transform-vec, extend-transform-dim, est-lda, est-pca 
compute-and-process-kaldi-pitch-feats
 Apply Kaldi pitch extractor and pitch post-processor, starting from wav input.
Equivalent to compute-kaldi-pitch-feats | process-kaldi-pitch-feats, except
that it is able to simulate online pitch extraction; see options like
--frames-per-chunk, --simulate-first-pass-online, --recompute-frame.
Usage: compute-and-process-kaldi-pitch-feats [options...] <wav-rspecifier> <feats-wspecifier>
e.g.
compute-and-process-kaldi-pitch-feats --simulate-first-pass-online=true \
  --frames-per-chunk=10 --sample-frequency=8000 scp:wav.scp ark:- 
See also: compute-kaldi-pitch-feats, process-kaldi-pitch-feats 
compute-cmvn-stats
 Compute cepstral mean and variance normalization statistics
If wspecifier provided: per-utterance by default, or per-speaker if
spk2utt option provided; if wxfilename: global
Usage: compute-cmvn-stats  [options] <feats-rspecifier> (<stats-wspecifier>|<stats-wxfilename>)
e.g.: compute-cmvn-stats --spk2utt=ark:data/train/spk2utt scp:data/train/feats.scp ark,scp:/foo/bar/cmvn.ark,data/train/cmvn.scp
See also: apply-cmvn, modify-cmvn-stats 
compute-cmvn-stats-two-channel
 Compute cepstral mean and variance normalization statistics
Specialized for two-sided telephone data where we only accumulate
the louder of the two channels at each frame (and add it to that
side's stats).  Reads a 'reco2file_and_channel' file, normally like
sw02001-A sw02001 A
sw02001-B sw02001 B
sw02005-A sw02005 A
sw02005-B sw02005 B
interpreted as <utterance-id> <call-id> <side> and for each <call-id>
that has two sides, does the 'only-the-louder' computation, else does
per-utterance stats in the normal way.
Note: loudness is judged by the first feature component, either energy or c0;
only applicable to MFCCs or PLPs (this code could be modified to handle filterbanks).
Usage: compute-cmvn-stats-two-channel  [options] <reco2file-and-channel> <feats-rspecifier> <stats-wspecifier>
e.g.: compute-cmvn-stats-two-channel data/train_unseg/reco2file_and_channel scp:data/train_unseg/feats.scp ark,t:- 
compute-fbank-feats
 Create Mel-filter bank (FBANK) feature files.
Usage:  compute-fbank-feats [options...] <wav-rspecifier> <feats-wspecifier> 
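A typical invocation might look like the following sketch; the filenames are placeholders, and --num-mel-bins and --sample-frequency are standard options of this program.

```shell
# 40-dimensional filterbanks from 16 kHz wavs; writes both an archive
# and a script file indexing it.  Paths are placeholders.
compute-fbank-feats --num-mel-bins=40 --sample-frequency=16000 \
  scp:wav.scp ark,scp:fbank.ark,fbank.scp
```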
compute-kaldi-pitch-feats
 Apply Kaldi pitch extractor, starting from wav input.  Output is 2-dimensional
features consisting of (NCCF, pitch in Hz), where NCCF is between -1 and 1, and
higher for voiced frames.  You will typically pipe this into
process-kaldi-pitch-feats.
Usage: compute-kaldi-pitch-feats [options...] <wav-rspecifier> <feats-wspecifier>
e.g.
compute-kaldi-pitch-feats --sample-frequency=8000 scp:wav.scp ark:-
See also: process-kaldi-pitch-feats, compute-and-process-kaldi-pitch-feats 
compute-mfcc-feats
 Create MFCC feature files.
Usage:  compute-mfcc-feats [options...] <wav-rspecifier> <feats-wspecifier> 
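For example (filenames are placeholders; --use-energy is a standard option controlling whether C0 is replaced by energy):

```shell
# Standard 13-dimensional MFCCs with C0 rather than raw energy.
compute-mfcc-feats --use-energy=false scp:wav.scp ark,scp:mfcc.ark,mfcc.scp
```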
compute-plp-feats
 Create PLP feature files.
Usage:  compute-plp-feats [options...] <wav-rspecifier> <feats-wspecifier> 
compute-spectrogram-feats
 Create spectrogram feature files.
Usage:  compute-spectrogram-feats [options...] <wav-rspecifier> <feats-wspecifier> 
concat-feats
 Concatenate feature files (assuming they have the same dimensions),
so the output file has the sum of the num-frames of the inputs.
Usage: concat-feats <in-rxfilename1> <in-rxfilename2> [<in-rxfilename3> ...] <out-wxfilename>
 e.g. concat-feats mfcc/foo.ark:12343 mfcc/foo.ark:56789 -
See also: copy-feats, append-vector-to-feats, paste-feats 
copy-feats
 Copy features [and possibly change format]
Usage: copy-feats [options] <feature-rspecifier> <feature-wspecifier>
or:   copy-feats [options] <feats-rxfilename> <feats-wxfilename>
e.g.: copy-feats ark:- ark,scp:foo.ark,foo.scp
 or: copy-feats ark:foo.ark ark,t:txt.ark
See also: copy-matrix, copy-feats-to-htk, copy-feats-to-sphinx, select-feats,
extract-feature-segments, subset-feats, subsample-feats, splice-feats, paste-feats,
concat-feats 
copy-feats-to-htk
 Save features as HTK files:
Each utterance will be stored as a unique HTK file in a specified directory.
The HTK filename will correspond to the utterance-id (key) in the input table, with the specified extension.
Usage: copy-feats-to-htk [options] in-rspecifier
Example: copy-feats-to-htk --output-dir=/tmp/HTK-features --output-ext=fea  scp:feats.scp 
copy-feats-to-sphinx
 Save features as Sphinx files:
Each utterance will be stored as a unique Sphinx file in a specified directory.
The Sphinx filename will correspond to the utterance-id (key) in the input table, with the specified extension.
Usage: copy-feats-to-sphinx [options] in-rspecifier
Example: copy-feats-to-sphinx --output-dir=/tmp/sphinx-features --output-ext=fea  scp:feats.scp 
extend-transform-dim
 Read in transform from dimension d -> d (affine or linear), and output a transform
from dimension e -> e (with e >= d, and e controlled by option --new-dimension).
This new transform will leave the extra dimension unaffected, and transform the old
dimensions in the same way.
Usage: extend-transform-dim [options] (transform-A-rspecifier|transform-A-rxfilename) (transform-out-wspecifier|transform-out-wxfilename)
E.g.: extend-transform-dim --new-dimension=117 in.mat big.mat 
extract-feature-segments
 Create feature files by segmenting input files.
Note: this program should no longer be needed now that
'ranges' in scp files are supported; search for 'ranges' in
http://kaldi-asr.org/doc/io_tut.html, or see the script
utils/data/subsegment_data_dir.sh.
Usage:  extract-feature-segments [options...] <feats-rspecifier>  <segments-file> <feats-wspecifier>
 (segments-file has lines like: output-utterance-id input-utterance-or-spk-id 1.10 2.36) 
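A hypothetical invocation, assuming a segments file in the format shown above and placeholder archive names:

```shell
# Cut per-utterance feature segments out of longer feature streams.
extract-feature-segments scp:feats.scp segments ark:segmented_feats.ark
```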
extract-segments
 Extract segments from a large audio file in WAV format.
Usage:  extract-segments [options] <wav-rspecifier> <segments-file> <wav-wspecifier>
e.g. extract-segments scp:wav.scp segments ark:- | <some-other-program>
 segments-file format: each line is either
<segment-id> <recording-id> <start-time> <end-time>
e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5
or (less frequently, and not supported in scripts):
<segment-id> <wav-file-name> <start-time> <end-time> <channel>
where <channel> will normally be 0 (left) or 1 (right)
e.g. call-861225-A-0050-0065 call-861225 5.0 6.5 1
And <end-time> of -1 means the segment runs till the end of the WAV file
See also: extract-feature-segments, wav-copy, wav-to-duration 
feat-to-dim
 Reads an archive of features.  If second argument is wxfilename, writes
the feature dimension of the first feature file; if second argument is
wspecifier, writes an archive of the feature dimension, indexed by utterance
id.
Usage: feat-to-dim [options] <feat-rspecifier> (<dim-wspecifier>|<dim-wxfilename>)
e.g.: feat-to-dim scp:feats.scp - 
feat-to-len
 Reads an archive of features and writes a corresponding archive
that maps utterance-id to utterance length in frames, or (with
one argument) print to stdout the total number of frames in the
input archive.
Usage: feat-to-len [options] <in-rspecifier> [<out-wspecifier>]
e.g.: feat-to-len scp:feats.scp ark,t:feats.lengths
or: feat-to-len scp:feats.scp 
fmpe-acc-stats
 Compute statistics for fMPE training
Usage:  fmpe-acc-stats [options...] <fmpe-object> <feat-rspecifier> <feat-diff-rspecifier> <gselect-rspecifier> <stats-out>
Note: gmm-fmpe-acc-stats avoids computing the features an extra time 
fmpe-apply-transform
 Apply fMPE transform to features
Usage:  fmpe-apply-transform [options...] <fmpe-object> <feat-rspecifier> <gselect-rspecifier> <feat-wspecifier> 
fmpe-est
 Do one iteration of learning (modified gradient descent)
on fMPE transform
Usage: fmpe-est [options...] <fmpe-in> <stats-in> <fmpe-out>
E.g. fmpe-est 1.fmpe 1.accs 2.fmpe 
fmpe-init
 Initialize fMPE transform (to zero)
Usage: fmpe-init [options...] <diag-gmm-in> <fmpe-out>
E.g. fmpe-init 1.ubm 1.fmpe 
fmpe-sum-accs
 Sum fMPE stats
Usage: fmpe-sum-accs [options...] <accs-out> <stats-in1> <stats-in2> ... 
E.g. fmpe-sum-accs 1.accs 1.1.accs 1.2.accs 1.3.accs 1.4.accs 
get-full-lda-mat
 This is a special-purpose program to be used in "predictive SGMMs".
It takes in an LDA+MLLT matrix, and the original "full" LDA matrix
as output by the --write-full-matrix option of est-lda; and it writes
out a "full" LDA+MLLT matrix formed by the LDA+MLLT matrix plus the
remaining rows of the "full" LDA matrix; and also writes out its inverse
Usage: get-full-lda-mat [options] <lda-mllt-rxfilename> <full-lda-rxfilename> <full-lda-mllt-wxfilename> [<inv-full-lda-mllt-wxfilename>]
E.g.: get-full-lda-mat final.mat full.mat full_lda_mllt.mat full_lda_mllt_inv.mat 
interpolate-pitch
 This is a rather special-purpose program which processes 2-dimensional
features consisting of (prob-of-voicing, pitch).  By default we do model-based
pitch smoothing and interpolation (see code), or if --linear-interpolation=true,
just linear interpolation across gaps where pitch == 0 (not predicted).
Usage:  interpolate-pitch [options...] <feats-rspecifier> <feats-wspecifier> 
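For example, using the --linear-interpolation option mentioned above (archive names are placeholders):

```shell
# Fill gaps where pitch == 0 by linear interpolation instead of the
# default model-based smoothing.
interpolate-pitch --linear-interpolation=true ark:pitch.ark ark:pitch_smooth.ark
```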
modify-cmvn-stats
 Copy cepstral mean/variance stats so that some dimensions have 'fake' stats
that will skip normalization
Usage: modify-cmvn-stats [options] [<fake-dims>] <in-rspecifier> <out-wspecifier>
e.g.: modify-cmvn-stats 13:14:15 ark:- ark:-
or: modify-cmvn-stats --convert-to-mean-and-var=true ark:- ark:-
See also: compute-cmvn-stats 
paste-feats
 Paste feature files (assuming they have about the same durations,
see --length-tolerance), appending the features on each frame;
think of the unix command 'paste'.
Usage: paste-feats <in-rspecifier1> <in-rspecifier2> [<in-rspecifier3> ...] <out-wspecifier>
 or: paste-feats <in-rxfilename1> <in-rxfilename2> [<in-rxfilename3> ...] <out-wxfilename>
 e.g. paste-feats ark:feats1.ark "ark:select-feats 0-3 ark:feats2.ark ark:- |" ark:feats-out.ark
  or: paste-feats foo.mat bar.mat baz.mat
See also: copy-feats, copy-matrix, append-vector-to-feats, concat-feats 
post-to-feats
 Convert posteriors to features
Usage: post-to-feats [options] <in-rspecifier> <out-wspecifier>
 or: post-to-feats [options] <in-rxfilename> <out-wxfilename>
e.g.: post-to-feats --post-dim=50 ark:post.ark ark:feat.ark
See also: post-to-weights feat-to-post, append-vector-to-feats, append-post-to-feats 
process-kaldi-pitch-feats
 Post-process Kaldi pitch features, consisting of pitch and NCCF, into
features suitable for input to ASR system.  Default setup produces
3-dimensional features consisting of (pov-feature, pitch-feature,
delta-pitch-feature), where pov-feature is warped NCCF, pitch-feature
is log-pitch with POV-weighted mean subtraction over 1.5 second window,
and delta-pitch-feature is delta feature computed on raw log pitch.
In general, you can select from four features: (pov-feature, 
pitch-feature, delta-pitch-feature, raw-log-pitch), produced in that 
order, by setting the boolean options (--add-pov-feature, 
--add-normalized-log-pitch, --add-delta-pitch and --add-raw-log-pitch)
Usage: process-kaldi-pitch-feats [options...] <feat-rspecifier> <feats-wspecifier>
e.g.: compute-kaldi-pitch-feats [args] ark:- | process-kaldi-pitch-feats ark:- ark:feats.ark
See also: compute-kaldi-pitch-feats, compute-and-process-kaldi-pitch-feats 
process-pitch-feats
 This is a rather special-purpose program which processes 2-dimensional
features consisting of (prob-of-voicing, pitch) into something suitable
to put into a speech recognizer.  First use interpolate-pitch
Usage:  process-pitch-feats [options...] <feats-rspecifier> <feats-wspecifier> 
select-feats
 Select certain dimensions of the feature file;  think of it as the unix
command cut -f ...
Usage: select-feats <selection> <in-rspecifier> <out-wspecifier>
  e.g. select-feats 0,24-22,3-12 scp:feats.scp ark,scp:feat-red.ark,feat-red.scp
See also copy-feats, extract-feature-segments, subset-feats, subsample-feats 
shift-feats
 Copy features, and possibly shift them while maintaining the num-frames.
Usage: shift-feats [options] <feature-rspecifier> <feature-wspecifier>
or:  shift-feats [options] <feats-rxfilename> <feats-wxfilename>
e.g.: shift-feats --shift=-1 foo.scp bar.ark
or: shift-feats --shift=1 foo.mat bar.mat
See also: copy-feats, copy-matrix, select-feats, subset-feats,
subsample-feats, splice-feats, paste-feats, concat-feats, extract-feature-segments 
splice-feats
 Splice features with left and right context (e.g. prior to LDA)
Usage: splice-feats [options] <feature-rspecifier> <feature-wspecifier>
e.g.: splice-feats scp:feats.scp ark:- 
subsample-feats
 Sub-samples features by taking every n'th frame.
With negative values of n, will repeat each frame |n| times
(e.g. --n=-2 will repeat each frame twice).
Usage: subsample-feats [options] <in-rspecifier> <out-wspecifier>
  e.g. subsample-feats --n=2 ark:- ark:- 
subset-feats
 Copy a subset of features (by default, the first n feature files)
Usually used where only a small amount of data is needed
Note: if you want a specific subset, it's usually best to
filter the original .scp file with utils/filter_scp.pl
(possibly with the --exclude option).  The --include and --exclude
options of this program are intended for specialized uses.
The --include and --exclude options are mutually exclusive, 
and both cause the --n option to be ignored.
Usage: subset-feats [options] <in-rspecifier> <out-wspecifier>
e.g.: subset-feats --n=10 ark:- ark:-
or:  subset-feats --include=include_uttlist ark:- ark:-
or:  subset-feats --exclude=exclude_uttlist ark:- ark:-
See also extract-feature-segments, select-feats, subsample-feats 
transform-feats
 Apply transform (e.g. LDA; HLDA; fMLLR/CMLLR; MLLT/STC)
Linear transform if transform-num-cols == feature-dim, affine if
transform-num-cols == feature-dim+1 (->append 1.0 to features)
Per-utterance by default, or per-speaker if utt2spk option provided
Global if transform-rxfilename provided.
Usage: transform-feats [options] (<transform-rspecifier>|<transform-rxfilename>) <feats-rspecifier> <feats-wspecifier>
See also: transform-vec, copy-feats, compose-transforms 
wav-copy
 Copy wave file or archives of wave files
Usage: wav-copy [options] <wav-rspecifier> <wav-wspecifier>
  or:  wav-copy [options] <wav-rxfilename> <wav-wxfilename>
e.g. wav-copy scp:wav.scp ark:-
     wav-copy wav.ark:123456 -
See also: wav-to-duration extract-segments 
wav-reverberate
 Corrupts the wave files supplied via input pipe with the specified
room-impulse response (rir_matrix) and additive noise distortions
(specified by corresponding files).
Usage:  wav-reverberate [options...] <wav-in-rxfilename> <wav-out-wxfilename>
e.g.
wav-reverberate --duration=20.25 --impulse-response=rir.wav --additive-signals='noise1.wav,noise2.wav' --snrs='20.0,15.0' --start-times='0,17.8' input.wav output.wav 
wav-to-duration
 Read wav files and output an archive consisting of a single float:
the duration of each one in seconds.
Usage:  wav-to-duration [options...] <wav-rspecifier> <duration-wspecifier>
E.g.: wav-to-duration scp:wav.scp ark,t:-
See also: wav-copy extract-segments feat-to-len
Currently this program may output a lot of harmless warnings regarding
nonzero exit status of pipes 
fgmm-global-acc-stats
 Accumulate stats for training a full-covariance GMM.
Usage:  fgmm-global-acc-stats [options] <model-in> <feature-rspecifier> <stats-out>
e.g.: fgmm-global-acc-stats 1.mdl scp:train.scp 1.acc 
fgmm-global-sum-accs
 Sum multiple accumulated stats files for full-covariance GMM training.
Usage: fgmm-global-sum-accs [options] stats-out stats-in1 stats-in2 ... 
fgmm-global-est
 Estimate a full-covariance GMM from the accumulated stats.
Usage:  fgmm-global-est [options] <model-in> <stats-in> <model-out> 
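The three tools above are typically chained into one EM iteration; a sketch with placeholder filenames:

```shell
# Accumulate stats on two data subsets (e.g. in parallel jobs),
# sum the accumulators, then re-estimate the full-covariance GMM.
fgmm-global-acc-stats 1.ubm scp:train.1.scp 1.1.acc
fgmm-global-acc-stats 1.ubm scp:train.2.scp 1.2.acc
fgmm-global-sum-accs 1.acc 1.1.acc 1.2.acc
fgmm-global-est 1.ubm 1.acc 2.ubm
```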
fgmm-global-merge
 Combine a number of GMMs into a larger GMM, with #Gauss = 
  sum(individual #Gauss).  Output full GMM, and a text file with
  sizes of each individual GMM.
Usage: fgmm-global-merge [options] fgmm-out sizes-file-out fgmm-in1 fgmm-in2 ... 
fgmm-global-to-gmm
 Convert single full-covariance GMM to single diagonal-covariance GMM.
Usage: fgmm-global-to-gmm [options] 1.fgmm 1.gmm 
fgmm-gselect
 Precompute Gaussian indices for pruning
 (e.g. in training UBMs, SGMMs, tied-mixture systems)
 For each frame, gives a list of the n best Gaussian indices,
 sorted from best to worst.
See also: gmm-gselect, copy-gselect, fgmm-gselect-to-post
Usage: fgmm-gselect [options] <model-in> <feature-rspecifier> <gselect-wspecifier>
The --gselect option (which takes an rspecifier) limits selection to a subset
of indices:
e.g.: fgmm-gselect "--gselect=ark:gunzip -c bigger.gselect.gz|" --n=20 1.gmm "ark:feature-command |" "ark,t:|gzip -c >1.gselect.gz" 
fgmm-global-get-frame-likes
 Print out per-frame log-likelihoods for each utterance, as an archive
of vectors of floats.  If --average=true, prints out the average per-frame
log-likelihood for each utterance, as a single float.
Usage:  fgmm-global-get-frame-likes [options] <model-in> <feature-rspecifier> <likes-out-wspecifier>
e.g.: fgmm-global-get-frame-likes 1.mdl scp:train.scp ark:1.likes 
fgmm-global-copy
 Copy a full-covariance GMM
Usage:  fgmm-global-copy [options] <model-in> <model-out>
e.g.: fgmm-global-copy --binary=false 1.model - | less 
fgmm-global-gselect-to-post
 Given features and Gaussian-selection (gselect) information for
a full-covariance GMM, output per-frame posteriors for the selected
indices.  Also supports pruning the posteriors if they are below
a stated threshold (and renormalizing the rest to sum to one).
See also: gmm-gselect, fgmm-gselect, gmm-global-get-post,
 gmm-global-gselect-to-post
Usage:  fgmm-global-gselect-to-post [options] <model-in> <feature-rspecifier> <gselect-rspecifier> <post-wspecifier>
e.g.: fgmm-global-gselect-to-post 1.ubm ark:- 'ark:gunzip -c 1.gselect|' ark:- 
fgmm-global-info
 Write to standard output various properties of full-covariance GMM model
This is for a single mixture of Gaussians, e.g. as used for a UBM.
Usage:  fgmm-global-info [options] <gmm>
e.g.:
 fgmm-global-info 1.ubm 
fgmm-global-acc-stats-post
 Accumulate stats from posteriors and features for instantiating a full-covariance GMM. See also fgmm-global-acc-stats.
Usage:  fgmm-global-acc-stats-post [options] <posterior-rspecifier> <number-of-components> <feature-rspecifier> <stats-out>
e.g.: fgmm-global-acc-stats-post scp:post.scp 2048 scp:train.scp 1.acc 
fgmm-global-init-from-accs
 Initialize a full-covariance GMM from the accumulated stats.
This binary is similar to fgmm-global-est, but does not use a preexisting model.  See also fgmm-global-est.
Usage:  fgmm-global-init-from-accs [options] <stats-in> <number-of-components> <model-out> 
fstdeterminizestar
 Removes epsilons and determinizes in one step
Usage:  fstdeterminizestar [in.fst [out.fst] ]
See also: fstdeterminizelog, lattice-determinize 
fstrmsymbols
 With no options, replaces a subset of symbols with epsilon, wherever
they appear on the input side of an FST.  With --remove-arcs=true, will remove
arcs that contain these symbols on the input side.
With --penalty=<float>, will add the specified penalty to the
cost of any arc that has one of the given symbols on its input side.
In all cases, the option --apply-to-output=true (or for
back-compatibility, --remove-from-output=true) makes this apply
to the output side.
Usage:  fstrmsymbols [options] <in-disambig-list>  [<in.fst> [<out.fst>]]
E.g:  fstrmsymbols in.list  < in.fst > out.fst
<in-disambig-list> is an rxfilename specifying a file containing a list of integers
representing symbols, in text form, one per line. 
fstisstochastic
 Checks whether an FST is stochastic and exits with success if so.
Prints out maximum error (in log units).
Usage:  fstisstochastic [ in.fst ] 
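Since the result is reported through the exit status, it can be used directly in scripts; G.fst is a placeholder:

```shell
# Exit status is success iff the FST is stochastic (weights sum to one
# out of each state, in the log semiring).
if fstisstochastic G.fst; then echo "stochastic"; else echo "not stochastic"; fi
```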
fstminimizeencoded
 Minimizes FST after encoding [similar to fstminimize, but no weight-pushing]
Usage:  fstminimizeencoded [in.fst [out.fst] ] 
fstmakecontextfst
 Constructs a context FST with a specified context-width and context-position.
Outputs the context FST, and a file in Kaldi format that describes what the
input labels mean.  Note: this is very inefficient if there are a lot of phones;
better to use fstcomposecontext instead.
Usage:  fstmakecontextfst <phones-symbol-table> <subsequential-symbol> <ilabels-output-file> [<out-fst>]
E.g.:   fstmakecontextfst phones.txt 42 ilabels.sym > C.fst 
fstmakecontextsyms
 Create input symbols for CLG
Usage: fstmakecontextsyms phones-symtab ilabels_input_file [output-symtab.txt]
E.g.:  fstmakecontextsyms  phones.txt ilabels.sym > context_symbols.txt 
fstaddsubsequentialloop
 Adds a self-loop with the given subsequential symbol to the final states of an FST [used when compiling context FSTs]
Usage:  fstaddsubsequentialloop subseq_sym [in.fst [out.fst] ]
E.g.:   fstaddsubsequentialloop 52 < LG.fst > LG_sub.fst 
fstaddselfloops
 Adds self-loops to states of an FST to propagate disambiguation symbols through it
They are added on each final state and each state with non-epsilon output symbols
on at least one arc out of the state.  Useful in conjunction with predeterminize
Usage:  fstaddselfloops in-disambig-list out-disambig-list  [in.fst [out.fst] ]
E.g:  fstaddselfloops in.list out.list < in.fst > withloops.fst
in.list and out.list are lists of integers, one per line, of the
same length. 
fstrmepslocal
 Removes some (but not all) epsilons, using an algorithm that will always reduce the number of
arcs+states.  Option to preserve equivalence in tropical or log semiring, and,
if in tropical, stochasticity in either log or tropical semiring.
Usage:  fstrmepslocal  [in.fst [out.fst] ] 
fstcomposecontext
 Composes on the left with a dynamically created context FST
Usage:  fstcomposecontext <ilabels-output-file>  [<in.fst> [<out.fst>] ]
E.g:  fstcomposecontext ilabels.sym < LG.fst > CLG.fst 
fsttablecompose
 Composition algorithm [between two FSTs of standard type, in tropical
semiring] that is more efficient for certain cases-- in particular,
where one of the FSTs (the left one, if --match-side=left) has large
out-degree
Usage:  fsttablecompose (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)] 
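This is typically used during graph creation, e.g. composing the lexicon with the grammar (L.fst and G.fst are placeholders):

```shell
# L has large out-degree at each state, so table composition is fast;
# determinize the result in the log semiring.
fsttablecompose L.fst G.fst | fstdeterminizestar --use-log=true > LG.fst
```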
fstrand
 Generate random FST
Usage:  fstrand [out.fst] 
fstdeterminizelog
 Determinizes in the log semiring
Usage:  fstdeterminizelog [in.fst [out.fst] ]
See also fstdeterminizestar 
fstphicompose
 Composition, where the right FST has "failure" (phi) transitions
that are only taken where there was no match of a "real" label
You supply the label corresponding to phi.
Usage:  fstphicompose phi-label (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)]
E.g.: fstphicompose 54 a.fst b.fst c.fst
or: fstphicompose 11 ark:a.fsts G.fst ark:b.fsts 
fstcopy
 Copy tables/archives of FSTs, indexed by a string (e.g. utterance-id)
Usage: fstcopy <fst-rspecifier> <fst-wspecifier> 
fstpushspecial
 Pushes weights in an FST such that all the states
in the FST have arcs and final-probs with weights that
sum to the same amount (viewed as being in the log semiring).
Thus, the "extra weight" is distributed throughout the FST.
Tolerance parameter --delta controls how exact this is, and the
speed.
Usage:  fstpushspecial [options] [in.fst [out.fst] ] 
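A sketch of a typical use in the final stage of graph building (filenames are placeholders):

```shell
# Distribute the "extra weight" evenly through the graph; --delta
# trades exactness for speed.
fstpushspecial --delta=0.001 HCLGa.fst HCLG.fst
```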
fsts-to-transcripts
 Reads a table of FSTs; for each element, finds the best path and 
prints out the output-symbol sequence (if --output-side=true), or 
input-symbol sequence otherwise.
Usage:
 fsts-to-transcripts [options] <fsts-rspecifier> <transcriptions-wspecifier>
e.g.:
 fsts-to-transcripts ark:train.fsts ark,t:train.text 
fsts-project
 Reads kaldi archive of FSTs; for each element, performs the project
operation either on input (default) or on the output (if the option
--project-output is true).
Usage: fsts-project [options] <fsts-rspecifier> <fsts-wspecifier>
 e.g.: fsts-project ark:train.fsts ark,t:train.fsts
see also: fstproject (from the OpenFst toolkit) 
fsts-union
 Reads a kaldi archive of FSTs. Performs the FST operation union on
all fsts sharing the same key. Assumes the archive is sorted by key.
Usage: fsts-union [options] <fsts-rspecifier> <fsts-wspecifier>
 e.g.: fsts-union ark:keywords_tmp.fsts ark,t:keywords.fsts
see also: fstunion (from the OpenFst toolkit) 
fsts-concat
 Reads kaldi archives with FSTs. Concatenates the fsts from all the rspecifiers.
The fsts to concatenate must have the same key. The sequencing is given by the position of the arguments.
Usage: fsts-concat [options] <fsts-rspecifier1> <fsts-rspecifier2> ... <fsts-wspecifier>
 e.g.: fsts-concat scp:fsts1.scp scp:fsts2.scp ... ark:fsts_out.ark
see also: fstconcat (from the OpenFst toolkit) 
make-grammar-fst
 Construct GrammarFst and write it to disk (or convert it to ConstFst
and write that to disk instead).  Mostly intended for demonstration
and testing purposes (since it may be more convenient to construct
GrammarFst from code).  See kaldi-asr.org/doc/grammar.html
Can also be used to prepare FSTs for this use, by calling
PrepareForGrammarFst(), which does things like adding final-probs and
making small structural tweaks to the FST
Usage (1): make-grammar-fst [options] <top-level-fst> <symbol1> <fst1> \
                            [<symbol2> <fst2> ...] <fst-out>
<symbol1>, <symbol2> are the integer ids of the corresponding
 user-defined nonterminal symbols (e.g. #nonterm:contact_list) in the
 phones.txt file.
e.g.: make-grammar-fst --nonterm-phones-offset=317 HCLG.fst \
            320 HCLG1.fst HCLG_grammar.fst
Usage (2): make-grammar-fst <fst-in> <fst-out>
  Prepare individual FST for compilation into GrammarFst.
  E.g. make-grammar-fst HCLG.fst HCLGmod.fst.  The outputs of this
   will then become the arguments <top-level-fst>, <fst1>, ... for usage
   pattern (1).
The --nonterm-phones-offset option is required for both usage patterns. 
gmm-init-mono
 Initialize monophone GMM.
Usage:  gmm-init-mono <topology-in> <dim> <model-out> <tree-out> 
e.g.: 
 gmm-init-mono topo 39 mono.mdl mono.tree 
gmm-est
 Do Maximum Likelihood re-estimation of GMM-based acoustic model
Usage:  gmm-est [options] <model-in> <stats-in> <model-out>
e.g.: gmm-est 1.mdl 1.acc 2.mdl 
gmm-acc-stats-ali
 Accumulate stats for GMM training.
Usage:  gmm-acc-stats-ali [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <stats-out>
e.g.:
 gmm-acc-stats-ali 1.mdl scp:train.scp ark:1.ali 1.acc 
gmm-align
 Align features given [GMM-based] models.
Usage:   gmm-align [options] tree-in model-in lexicon-fst-in feature-rspecifier transcriptions-rspecifier alignments-wspecifier
e.g.: 
 gmm-align tree 1.mdl lex.fst scp:train.scp 'ark:sym2int.pl -f 2- words.txt text|' ark:1.ali 
gmm-decode-faster
 Decode features using GMM-based model.
Usage:  gmm-decode-faster [options] model-in fst-in features-rspecifier words-wspecifier [alignments-wspecifier [lattice-wspecifier]]
Note: lattices, if output, will just be linear sequences; use gmm-latgen-faster
  if you want "real" lattices. 
gmm-decode-simple
 Decode features using GMM-based model.
Viterbi decoding; only produces a linear sequence, so any lattice
produced is linear
Usage:   gmm-decode-simple [options] <model-in> <fst-in> <features-rspecifier> <words-wspecifier> [<alignments-wspecifier>] [<lattice-wspecifier>] 
gmm-align-compiled
 Align features given [GMM-based] models.
Usage:   gmm-align-compiled [options] <model-in> <graphs-rspecifier> <feature-rspecifier> <alignments-wspecifier> [scores-wspecifier]
e.g.: 
 gmm-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali
or:
 compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' \
   ark:- | gmm-align-compiled 1.mdl ark:- scp:train.scp ark:1.ali 
gmm-sum-accs
 Sum multiple accumulated stats files for GMM training.
Usage: gmm-sum-accs [options] <stats-out> <stats-in1> <stats-in2> ...
E.g.: gmm-sum-accs 1.acc 1.1.acc 1.2.acc 
gmm-est-regtree-fmllr
 Compute FMLLR transforms per-utterance (default) or per-speaker for the supplied set of speakers (spk2utt option).  Note: writes RegtreeFmllrDiagGmm objects
Usage: gmm-est-regtree-fmllr  [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <regression-tree> <transforms-wspecifier> 
gmm-acc-stats-twofeats
 Accumulate stats for GMM training, computing posteriors with one set of
features but accumulating statistics with another: the first features are
used to get posteriors, the second to accumulate stats.
Usage:  gmm-acc-stats-twofeats [options] <model-in> <feature1-rspecifier> <feature2-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.: 
 gmm-acc-stats-twofeats 1.mdl scp:train.scp scp:train_new.scp ark:1.post 1.acc 
gmm-acc-stats
 Accumulate stats for GMM training (reading in posteriors).
Usage:  gmm-acc-stats [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.: 
 gmm-acc-stats 1.mdl scp:train.scp ark:1.post 1.acc 
gmm-init-lvtln
 Initialize lvtln transforms
Usage:  gmm-init-lvtln [options] <lvtln-out>
e.g.: 
 gmm-init-lvtln --dim=13 --num-classes=21 --default-class=10 1.lvtln 
gmm-est-lvtln-trans
 Estimate linear-VTLN transforms, either per utterance or for the supplied set of speakers (spk2utt option).  Reads posteriors. 
Usage: gmm-est-lvtln-trans [options] <model-in> <lvtln-in> <feature-rspecifier> <gpost-rspecifier> <lvtln-trans-wspecifier> [<warp-wspecifier>] 
gmm-train-lvtln-special
 Set one of the transforms in lvtln to the minimum-squared-error solution
to mapping feats-untransformed to feats-transformed; posteriors may
optionally be used to downweight/remove silence.
Usage: gmm-train-lvtln-special [options] class-index <lvtln-in> <lvtln-out>  <feats-untransformed-rspecifier> <feats-transformed-rspecifier> [<posteriors-rspecifier>]
e.g.: 
 gmm-train-lvtln-special 5 5.lvtln 6.lvtln scp:train.scp scp:train_warp095.scp ark:nosil.post 
gmm-acc-mllt
 Accumulate MLLT (global STC) statistics
Usage:  gmm-acc-mllt [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.: 
 gmm-acc-mllt 1.mdl scp:train.scp ark:1.post 1.macc 
gmm-mixup
 Does GMM mixing up (and Gaussian merging)
Usage:  gmm-mixup [options] <model-in> <state-occs-in> <model-out>
e.g. of mixing up:
 gmm-mixup --mix-up=4000 1.mdl 1.occs 2.mdl
e.g. of merging:
 gmm-mixup --mix-down=2000 1.mdl 1.occs 2.mdl 
gmm-init-model
 Initialize GMM from decision tree and tree stats
Usage:  gmm-init-model [options] <tree-in> <tree-stats-in> <topo-file> <model-out> [<old-tree> <old-model>]
e.g.: 
  gmm-init-model tree treeacc topo 1.mdl
or (initializing GMMs with old model):
  gmm-init-model tree treeacc topo 1.mdl prev/tree prev/30.mdl 
gmm-transform-means
 Transform GMM means with linear or affine transform
Usage:  gmm-transform-means <transform-matrix> <model-in> <model-out>
e.g.: gmm-transform-means 2.mat 2.mdl 3.mdl 
gmm-make-regtree
 Build regression class tree.
Usage: gmm-make-regtree [options] <model-file> <regtree-out>
E.g.: gmm-make-regtree --silphones=1:2:3 --state-occs=1.occs 1.mdl 1.regtree
 [Note: state-occs come from --write-occs option of gmm-est] 
gmm-decode-faster-regtree-fmllr
 Decode features using GMM-based model.
Usage: gmm-decode-faster-regtree-fmllr [options] model-in fst-in regtree-in features-rspecifier transforms-rspecifier words-wspecifier [alignments-wspecifier] 
gmm-post-to-gpost
 Convert state-level posteriors to Gaussian-level posteriors
Usage:  gmm-post-to-gpost [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <gpost-wspecifier>
e.g.: 
 gmm-post-to-gpost 1.mdl scp:train.scp ark:1.post ark:1.gpost 
gmm-est-fmllr-gpost
 Estimate global fMLLR transforms, either per utterance or for the supplied
set of speakers (spk2utt option).  Reads Gaussian-level posteriors.  Writes
to a table of matrices.
Usage: gmm-est-fmllr-gpost [options] <model-in> <feature-rspecifier> <gpost-rspecifier> <transform-wspecifier> 
gmm-est-fmllr
 Estimate global fMLLR transforms, either per utterance or for the supplied
set of speakers (spk2utt option).  Reads posteriors (on transition-ids).  Writes
to a table of matrices.
Usage: gmm-est-fmllr [options] <model-in> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> 
gmm-est-regtree-fmllr-ali
 Compute FMLLR transforms per-utterance (default) or per-speaker for the supplied set of speakers (spk2utt option).  Note: writes RegtreeFmllrDiagGmm objects
Usage: gmm-est-regtree-fmllr-ali  [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <regression-tree> <transforms-wspecifier> 
gmm-est-regtree-mllr
 Compute MLLR transforms per-utterance (default) or per-speaker for the supplied set of speakers (spk2utt option).  Note: writes RegtreeMllrDiagGmm objects
Usage: gmm-est-regtree-mllr  [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <regression-tree> <transforms-wspecifier> 
gmm-compute-likes
 Compute log-likelihoods from GMM-based model
(outputs matrices of log-likelihoods indexed by (frame, pdf))
Usage: gmm-compute-likes [options] model-in features-rspecifier likes-wspecifier 
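As a point of reference, the quantity in each cell of those matrices is the log-likelihood of a frame under one pdf's GMM. A minimal Python sketch of that computation for a single diagonal-covariance GMM (illustrative values; not Kaldi code):

```python
import math

def diag_gmm_loglike(weights, means, variances, x):
    # log sum_i w_i * N(x; mu_i, diag(var_i)), computed stably.
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        comps.append(ll)
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))

# One standard-normal component evaluated at the origin.
ll = diag_gmm_loglike([1.0], [[0.0]], [[1.0]], [0.0])
```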
gmm-decode-faster-regtree-mllr
 Decode features using GMM-based model.
Usage: gmm-decode-faster-regtree-mllr [options] model-in fst-in regtree-in features-rspecifier transforms-rspecifier words-wspecifier [alignments-wspecifier] 
gmm-latgen-simple
 Generate lattices using GMM-based model.
Usage: gmm-latgen-simple [options] model-in fst-in features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
gmm-rescore-lattice
 Replace the acoustic scores on a lattice using a new model.
Usage: gmm-rescore-lattice [options] <model-in> <lattice-rspecifier> <feature-rspecifier> <lattice-wspecifier>
 e.g.: gmm-rescore-lattice 1.mdl ark:1.lats scp:trn.scp ark:2.lats 
gmm-decode-biglm-faster
 Decode features using GMM-based model.
User supplies LM used to generate decoding graph, and desired LM;
this decoder applies the difference during decoding
Usage:  gmm-decode-biglm-faster [options] model-in fst-in oldlm-fst-in newlm-fst-in features-rspecifier words-wspecifier [alignments-wspecifier [lattice-wspecifier]] 
gmm-est-gaussians-ebw
 Do EBW update for MMI, MPE or MCE discriminative training.
Numerator stats should already be I-smoothed (e.g. use gmm-ismooth-stats)
Usage:  gmm-est-gaussians-ebw [options] <model-in> <stats-num-in> <stats-den-in> <model-out>
e.g.: gmm-est-gaussians-ebw 1.mdl num.acc den.acc 2.mdl 
gmm-est-weights-ebw
 Do EBW update on weights for MMI, MPE or MCE discriminative training.
Numerator stats should not be I-smoothed
Usage:  gmm-est-weights-ebw [options] <model-in> <stats-num-in> <stats-den-in> <model-out>
e.g.: gmm-est-weights-ebw 1.mdl num.acc den.acc 2.mdl 
gmm-latgen-faster
 Generate lattices using GMM-based model.
Usage: gmm-latgen-faster [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
gmm-copy
 Copy GMM based model (and possibly change binary/text format)
Usage:  gmm-copy [options] <model-in> <model-out>
e.g.:
 gmm-copy --binary=false 1.mdl 1_txt.mdl 
gmm-global-acc-stats
 Accumulate stats for training a diagonal-covariance GMM.
Usage:  gmm-global-acc-stats [options] <model-in> <feature-rspecifier> <stats-out>
e.g.: gmm-global-acc-stats 1.mdl scp:train.scp 1.acc 
gmm-global-est
 Estimate a diagonal-covariance GMM from the accumulated stats.
Usage:  gmm-global-est [options] <model-in> <stats-in> <model-out> 
gmm-global-sum-accs
 Sum multiple accumulated stats files for diagonal-covariance GMM training.
Usage: gmm-global-sum-accs [options] stats-out stats-in1 stats-in2 ... 
gmm-gselect
 Precompute Gaussian indices for pruning
 (e.g. in training UBMs, SGMMs, tied-mixture systems)
 For each frame, gives a list of the n best Gaussian indices,
 sorted from best to worst.
See also: gmm-global-get-post, fgmm-global-gselect-to-post,
copy-gselect, fgmm-gselect
Usage: gmm-gselect [options] <model-in> <feature-rspecifier> <gselect-wspecifier>
The --gselect option (which takes an rspecifier) limits selection to a subset
of indices:
e.g.: gmm-gselect "--gselect=ark:gunzip -c bigger.gselect.gz|" --n=20 1.gmm "ark:feature-command |" "ark,t:|gzip -c >gselect.1.gz" 
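The selection logic itself is simple: per frame, rank the Gaussians by log-likelihood and keep the n best. A Python sketch (the log-likelihood values are made up for illustration):

```python
def gselect(loglikes_per_frame, n):
    # For each frame (a list of per-Gaussian log-likelihoods), return the
    # indices of the n best Gaussians, sorted from best to worst.
    out = []
    for ll in loglikes_per_frame:
        ranked = sorted(range(len(ll)), key=lambda i: ll[i], reverse=True)
        out.append(ranked[:n])
    return out

frames = [[-3.0, -1.0, -2.0, -5.0],
          [-0.5, -4.0, -0.7, -2.0]]
selected = gselect(frames, 2)
```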
gmm-latgen-biglm-faster
 Generate lattices using GMM-based model.
User supplies LM used to generate decoding graph, and desired LM;
this decoder applies the difference during decoding
Usage: gmm-latgen-biglm-faster [options] model-in (fst-in|fsts-rspecifier) oldlm-fst-in newlm-fst-in features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
gmm-ismooth-stats
 Apply I-smoothing to statistics, e.g. for discriminative training
Usage:  gmm-ismooth-stats [options] [--smooth-from-model] [<src-stats-in>|<src-model-in>] <dst-stats-in> <stats-out>
e.g.: gmm-ismooth-stats --tau=100 ml.acc num.acc smoothed.acc
or: gmm-ismooth-stats --tau=50 --smooth-from-model 1.mdl num.acc smoothed.acc
or: gmm-ismooth-stats --tau=100 num.acc num.acc smoothed.acc 
gmm-global-get-frame-likes
 Print out per-frame log-likelihoods for each utterance, as an archive
of vectors of floats.  If --average=true, prints out the average per-frame
log-likelihood for each utterance, as a single float.
Usage:  gmm-global-get-frame-likes [options] <model-in> <feature-rspecifier> <likes-out-wspecifier>
e.g.: gmm-global-get-frame-likes 1.mdl scp:train.scp ark:1.likes 
gmm-global-est-fmllr
 Estimate global fMLLR transforms, either per utterance or for the supplied
set of speakers (spk2utt option).  Reads features, and (with --weights option)
weights for each frame (also see --gselect option)
Usage: gmm-global-est-fmllr [options] <gmm-in> <feature-rspecifier> <transform-wspecifier> 
gmm-global-to-fgmm
 Convert single diagonal-covariance GMM to single full-covariance GMM.
Usage: gmm-global-to-fgmm [options] 1.gmm 1.fgmm 
gmm-global-acc-stats-twofeats
 Accumulate stats for training a diagonal-covariance GMM, two-feature version
The first features are used to get posteriors, the second to accumulate stats
Usage:  gmm-global-acc-stats-twofeats [options] <model-in> <feature1-rspecifier> <feature2-rspecifier> <stats-out>
e.g.: gmm-global-acc-stats-twofeats 1.mdl scp:train.scp scp:train2.scp 1.acc 
gmm-global-copy
 Copy a diagonal-covariance GMM
Usage:  gmm-global-copy [options] <model-in> <model-out>
e.g.: gmm-global-copy --binary=false 1.model - | less 
gmm-fmpe-acc-stats
 Accumulate stats for fMPE training, using GMM model.  Note: this could
be done using gmm-get-feat-deriv and fmpe-acc-stats (but you'd be computing
the features twice).  Features input should be pre-fMPE features.
Usage:  gmm-fmpe-acc-stats [options] <model-in> <fmpe-in> <feature-rspecifier> <gselect-rspecifier> <posteriors-rspecifier> <fmpe-stats-out>
e.g.: 
 gmm-fmpe-acc-stats --model-derivative 1.accs 1.mdl 1.fmpe "$feats" ark:1.gselect ark:1.post 1.fmpe_stats 
gmm-acc-stats2
 Accumulate stats for GMM training (from posteriors)
This version writes two accumulators (e.g. num and den),
and puts the positive accumulators in num, negative in den
Usage:  gmm-acc-stats2 [options] <model> <feature-rspecifier> <posteriors-rspecifier> <num-stats-out> <den-stats-out>
e.g.:
gmm-acc-stats2 1.mdl "$feats" ark:1.post 1.num_acc 1.den_acc 
gmm-init-model-flat
 Initialize GMM, with Gaussians initialized to the mean and variance
of some provided example data (or to 0 and 1 if none is provided; in
that case, supply the --dim option)
Usage:  gmm-init-model-flat [options] <tree-in> <topo-file> <model-out> [<features-rspecifier>]
e.g.: 
  gmm-init-model-flat tree topo 1.mdl scp:feats.scp 
gmm-info
 Write to standard output various properties of GMM-based model
Usage:  gmm-info [options] <model-in>
e.g.:
 gmm-info 1.mdl
See also: gmm-global-info, am-info 
gmm-get-stats-deriv
 Get statistics derivative for GMM models
(used in fMPE/fMMI feature-space discriminative training)
Usage:  gmm-get-stats-deriv [options] <model-in> <num-stats-in> <den-stats-in> <ml-stats-in> <deriv-out>
e.g. (for fMMI/fBMMI): gmm-get-stats-deriv 1.mdl num.acc den.acc ml.acc 1.deriv 
gmm-est-rescale
 Do "re-scaling" re-estimation of GMM-based model
 (this update changes the model as features change, but preserves
  the difference between the model and the features, to keep
  the effect of any prior discriminative training).  Used in fMPE.
  Does not update the transitions or weights.
Usage: gmm-est-rescale [options] <model-in> <old-stats-in> <new-stats-in> <model-out>
e.g.: gmm-est-rescale 1.mdl old.acc new.acc 2.mdl 
gmm-boost-silence
 Modify GMM-based model to boost (by a certain factor) all
probabilities associated with the specified phones (could be
all silence phones, or just the ones used for optional silence).
Note: this is done by modifying the GMM weights.  If the silence
model shares a GMM with other models, then it will modify the GMM
weights for all models that may correspond to silence.
Usage:  gmm-boost-silence [options] <silence-phones-list> <model-in> <model-out>
e.g.: gmm-boost-silence --boost=1.5 1:2:3 1.mdl 1_boostsil.mdl 
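The boost amounts to scaling the mixture weights of the silence pdfs. A Python sketch of the idea (assuming, as the description above implies, that the scaled weights are deliberately left unnormalized so that silence likelihoods rise; the weight values are illustrative):

```python
def boost_weights(weights, boost):
    # Scale every mixture weight of a (silence) pdf by `boost`; not
    # renormalizing afterwards is what raises the pdf's likelihoods.
    return [w * boost for w in weights]

boosted = boost_weights([0.5, 0.3, 0.2], 1.5)
```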
gmm-basis-fmllr-accs
 Accumulate gradient scatter from the training set, either per utterance or
for the supplied set of speakers (spk2utt option).  Reads posteriors to accumulate
fMLLR stats for each speaker/utterance.  Writes a gradient scatter matrix.
Usage: gmm-basis-fmllr-accs [options] <model-in> <feature-rspecifier> <post-rspecifier> <accs-wspecifier> 
gmm-basis-fmllr-training
 Estimate fMLLR basis representation. Reads a set of gradient scatter
accumulations. Outputs basis matrices.
Usage: gmm-basis-fmllr-training [options] <model-in> <basis-wspecifier> <accs-in1> <accs-in2> ... 
gmm-est-basis-fmllr
 Perform basis fMLLR adaptation in testing stage, either per utterance or
for the supplied set of speakers (spk2utt option). Reads posterior to
accumulate fMLLR stats for each speaker/utterance. Writes to a table of
matrices.
Usage: gmm-est-basis-fmllr [options] <model-in> <basis-rspecifier> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> 
gmm-est-map
 Do Maximum A Posteriori re-estimation of GMM-based acoustic model
Usage:  gmm-est-map [options] <model-in> <stats-in> <model-out>
e.g.: gmm-est-map 1.mdl 1.acc 2.mdl 
gmm-adapt-map
 Compute MAP estimates per-utterance (default) or per-speaker for
the supplied set of speakers (spk2utt option).  This will typically
be piped into gmm-latgen-map
Usage: gmm-adapt-map  [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <map-am-wspecifier> 
gmm-latgen-map
 Decode features using GMM-based model.  Note: the input
<gmms-rspecifier> will typically be piped in from gmm-est-map.
Note: <model-in> is only needed for the transition-model, which isn't
included in <gmms-rspecifier>.
Usage: gmm-latgen-map [options] <model-in> <gmms-rspecifier> <fsts-rxfilename|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [ <alignments-wspecifier> ] ] 
gmm-basis-fmllr-accs-gpost
 Accumulate gradient scatter from the training set, either per utterance or
for the supplied set of speakers (spk2utt option).  Reads Gaussian-level
posteriors to accumulate fMLLR stats for each speaker/utterance.  Writes a
gradient scatter matrix.
Usage: gmm-basis-fmllr-accs-gpost [options] <model-in> <feature-rspecifier> <post-rspecifier> <accs-wspecifier> 
gmm-est-basis-fmllr-gpost
 Perform basis fMLLR adaptation in testing stage, either per utterance or
for the supplied set of speakers (spk2utt option). Reads Gaussian-level
posterior to accumulate fMLLR stats for each speaker/utterance. Writes
to a table of matrices.
Usage: gmm-est-basis-fmllr-gpost [options] <model-in> <basis-rspecifier> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> 
gmm-latgen-faster-parallel
 Decode features using GMM-based model.  Uses multiple decoding threads,
but interface and behavior is otherwise the same as gmm-latgen-faster
Usage: gmm-latgen-faster-parallel [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
gmm-est-fmllr-raw
 Estimate fMLLR transforms in the space before splicing and linear transforms
such as LDA+MLLT, but using models in the space transformed by these transforms
Requires the original spliced features, and the full LDA+MLLT (or similar) matrix
including the 'rejected' rows (see the program get-full-lda-mat)
Usage: gmm-est-fmllr-raw [options] <model-in> <full-lda-mat-in> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> 
gmm-est-fmllr-raw-gpost
 Estimate fMLLR transforms in the space before splicing and linear transforms
such as LDA+MLLT, but using models in the space transformed by these transforms
Requires the original spliced features, and the full LDA+MLLT (or similar) matrix
including the 'rejected' rows (see the program get-full-lda-mat).  Reads in
Gaussian-level posteriors.
Usage: gmm-est-fmllr-raw-gpost [options] <model-in> <full-lda-mat-in> <feature-rspecifier> <gpost-rspecifier> <transform-wspecifier> 
gmm-global-init-from-feats
 This program initializes a single diagonal GMM and does multiple iterations of
training from features stored in memory.
Usage:  gmm-global-init-from-feats [options] <feature-rspecifier> <model-out>
e.g.: gmm-global-init-from-feats scp:train.scp 1.mdl 
gmm-global-info
 Write to standard output various properties of GMM model
This is for a single diagonal GMM, e.g. as used for a UBM.
Usage:  gmm-global-info [options] <gmm>
e.g.:
 gmm-global-info 1.dubm
See also: gmm-info, am-info 
gmm-latgen-faster-regtree-fmllr
 Generate lattices using GMM-based model and RegTree-FMLLR adaptation.
Usage: gmm-latgen-faster-regtree-fmllr [options] model-in regtree-in (fst-in|fsts-rspecifier) features-rspecifier transform-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] 
gmm-est-fmllr-global
 Estimate global fMLLR transforms, either per utterance or for the supplied
set of speakers (spk2utt option).  This version is for when you have a single
global GMM, e.g. a UBM.  Writes to a table of matrices.
Usage: gmm-est-fmllr-global [options] <gmm-in> <feature-rspecifier> <transform-wspecifier>
e.g.: gmm-est-fmllr-global 1.ubm scp:feats.scp ark:trans.1 
gmm-acc-mllt-global
 Accumulate MLLT (global STC) statistics: this version is for the case where
there is a single global GMM (e.g. a UBM)
Usage:  gmm-acc-mllt-global [options] <gmm-in> <feature-rspecifier> <stats-out>
e.g.: 
 gmm-acc-mllt-global 1.dubm scp:feats.scp 1.macc 
gmm-transform-means-global
 Transform GMM means with linear or affine transform
This version is for a single GMM, e.g. a UBM.
Useful when estimating MLLT/STC
Usage:  gmm-transform-means-global <transform-matrix> <gmm-in> <gmm-out>
e.g.: gmm-transform-means-global 2.mat 2.dubm 3.dubm 
gmm-global-get-post
 Precompute Gaussian indices and convert immediately to top-n
posteriors (useful in iVector extraction with diagonal UBMs,
 e.g. in training UBMs, SGMMs, tied-mixture systems).
 For each frame, gives a list of the n best Gaussian indices,
 sorted from best to worst.
See also: gmm-gselect, fgmm-gselect, fgmm-global-gselect-to-post
Usage: gmm-global-get-post [options] <model-in> <feature-rspecifier> <post-wspecifier>
e.g.: gmm-global-get-post --n=20 1.gmm "ark:feature-command |" "ark,t:|gzip -c >post.1.gz" 
gmm-global-gselect-to-post
 Given features and Gaussian-selection (gselect) information for
a diagonal-covariance GMM, output per-frame posteriors for the selected
indices.  Also supports pruning the posteriors if they are below
a stated threshold (and renormalizing the rest to sum to one)
See also: gmm-gselect, fgmm-gselect, gmm-global-get-post,
 fgmm-global-gselect-to-post
Usage:  gmm-global-gselect-to-post [options] <model-in> <feature-rspecifier> <gselect-rspecifier> <post-wspecifier>
e.g.: gmm-global-gselect-to-post 1.dubm ark:- 'ark:gunzip -c 1.gselect|' ark:- 
gmm-global-est-lvtln-trans
 Estimate linear-VTLN transforms, either per utterance or for the supplied set of speakers (spk2utt option); this version
is for a global diagonal GMM (also known as a UBM).  Reads posteriors
indicating Gaussian indexes in the UBM.
Usage: gmm-global-est-lvtln-trans [options] <gmm-in> <lvtln-in> <feature-rspecifier> <gpost-rspecifier> <lvtln-trans-wspecifier> [<warp-wspecifier>]
e.g.: gmm-global-est-lvtln-trans 0.ubm 0.lvtln '$feats' ark,s,cs:- ark:1.trans ark:1.warp
(where the <gpost-rspecifier> will likely come from gmm-global-get-post or
gmm-global-gselect-to-post) 
gmm-init-biphone
 Initialize a biphone context-dependency tree with all the
leaves (i.e. a full tree). Intended for end-to-end tree-free models.
Usage:  gmm-init-biphone <topology-in> <dim> <model-out> <tree-out> 
e.g.: 
 gmm-init-biphone topo 39 bi.mdl bi.tree 
ivector-extractor-init
 Initialize ivector-extractor
Usage:  ivector-extractor-init [options] <fgmm-in> <ivector-extractor-out>
e.g.:
 ivector-extractor-init 4.fgmm 0.ie 
ivector-extractor-copy
 Copy the i-vector extractor to a text file
Usage:  ivector-extractor-copy [options] <ivector-extractor-in> <ivector-extractor-out>
e.g.:
 ivector-extractor-copy --binary=false 0.ie 0_txt.ie 
ivector-extractor-acc-stats
 Accumulate stats for iVector extractor training
Reads in features and Gaussian-level posteriors (typically from a full GMM)
Supports multiple threads, but will not benefit from more than about 4
at a time.
Usage:  ivector-extractor-acc-stats [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.: 
 fgmm-global-gselect-to-post 1.fgmm '$feats' 'ark:gunzip -c gselect.1.gz|' ark:- | \
  ivector-extractor-acc-stats 2.ie '$feats' ark,s,cs:- 2.1.acc 
ivector-extractor-sum-accs
 Sum accumulators for training of iVector extractor
Usage: ivector-extractor-sum-accs [options] <stats-in1> <stats-in2> ... <stats-inN> <stats-out> 
ivector-extractor-est
 Do model re-estimation of iVector extractor (this is
the update phase of a single pass of E-M)
Usage: ivector-extractor-est [options] <model-in> <stats-in> <model-out> 
ivector-extract
 Extract iVectors for utterances, using a trained iVector extractor,
and features and Gaussian-level posteriors
Usage:  ivector-extract [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <ivector-wspecifier>
e.g.: 
 fgmm-global-gselect-to-post 1.ubm '$feats' 'ark:gunzip -c gselect.1.gz|' ark:- | \
  ivector-extract final.ie '$feats' ark,s,cs:- ark,t:ivectors.1.ark 
compute-vad
 This program reads input features and writes out, for each utterance,
a vector of floats that are 1.0 if we judge the frame voiced and 0.0
otherwise.  The algorithm is very simple and is based on thresholding
the log mel energy (and taking the consensus of threshold decisions
within a window centered on the current frame).  See the options for
more details, and egs/sid/s1/run.sh for examples; this program is
intended for use in speaker-ID.
Usage: compute-vad [options] <feats-rspecifier> <vad-wspecifier>
e.g.: compute-vad scp:feats.scp ark:vad.ark 
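The algorithm described above (energy thresholding plus a windowed consensus) can be sketched as follows; the window size and vote proportion here are illustrative stand-ins for the program's options:

```python
def simple_vad(log_energy, threshold, context=2, proportion=0.6):
    # 1.0 if enough frames in a window centered on the current frame
    # exceed the energy threshold, else 0.0.
    n = len(log_energy)
    out = []
    for t in range(n):
        lo, hi = max(0, t - context), min(n, t + context + 1)
        votes = sum(1 for e in log_energy[lo:hi] if e > threshold)
        out.append(1.0 if votes >= proportion * (hi - lo) else 0.0)
    return out

decisions = simple_vad([1.0, 6.0, 7.0, 6.5, 1.0, 0.5], threshold=5.0)
```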
select-voiced-frames
 Select a subset of frames of the input files, based on the output of
compute-vad or a similar program (a vector of length num-frames,
containing 1.0 for voiced, 0.0 for unvoiced).  Caution: this is
mainly useful in speaker identification applications.
Usage: select-voiced-frames [options] <feats-rspecifier>  <vad-rspecifier> <feats-wspecifier>
E.g.: select-voiced-frames [options] scp:feats.scp scp:vad.scp ark:- 
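Conceptually this is just a row filter on the feature matrix, keeping frames whose VAD value is 1.0; a minimal sketch:

```python
def select_voiced_frames(feats, vad):
    # feats: one feature row per frame; vad: 1.0/0.0 per frame.
    return [f for f, v in zip(feats, vad) if v == 1.0]

voiced = select_voiced_frames([[1.0], [2.0], [3.0]], [1.0, 0.0, 1.0])
```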
compute-vad-from-frame-likes
 This program computes frame-level voice activity decisions from a
set of input frame-level log-likelihoods.  Usually, these
log-likelihoods are the output of fgmm-global-get-frame-likes.
Frames are assigned labels according to the class for which the
log-likelihood (optionally weighted by a prior) is maximal.  The
class labels are determined by the order of inputs on the command
line.  See options for more details.
Usage: compute-vad-from-frame-likes [options] <likes-rspecifier-1>
    ... <likes-rspecifier-n> <vad-wspecifier>
e.g.: compute-vad-from-frame-likes --map=label_map.txt
    scp:likes1.scp scp:likes2.scp ark:vad.ark
See also: fgmm-global-get-frame-likes, compute-vad, merge-vads 
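The labeling rule described above, sketched in Python (class labels follow the input order; the optional priors are applied in the log domain):

```python
def vad_from_frame_likes(class_loglikes, log_priors=None):
    # class_loglikes: one list of per-frame log-likelihoods per class.
    num_classes = len(class_loglikes)
    if log_priors is None:
        log_priors = [0.0] * num_classes
    labels = []
    for t in range(len(class_loglikes[0])):
        scores = [class_loglikes[c][t] + log_priors[c] for c in range(num_classes)]
        labels.append(float(scores.index(max(scores))))
    return labels

# Class 0 scores highest on frame 0, class 1 on frame 1.
labels = vad_from_frame_likes([[-1.0, -5.0], [-3.0, -2.0]])
```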
merge-vads
 This program merges two archives of per-frame weights representing
voice activity decisions.  By default, the program assumes that the
input vectors consist of floats that are 0.0 if a frame is judged
as nonspeech and 1.0 if it is considered speech.  The default
behavior produces a frame-level decision of 1.0 if both input frames
are 1.0, and 0.0 otherwise.  Additional classes (e.g., 2.0 for music)
can be handled using the "map" option.
Usage: merge-vads [options] <vad-rspecifier-1> <vad-rspecifier-2>
    <vad-wspecifier>
e.g.: merge-vads [options] scp:vad_energy.scp scp:vad_gmm.scp
    ark:vad.ark
See also: compute-vad-from-frame-likes, compute-vad, ali-to-post,
post-to-weights 
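The default behavior (without the "map" option) is a frame-wise logical AND of the two inputs; a minimal sketch:

```python
def merge_vads(vad1, vad2):
    # 1.0 only where both inputs are 1.0, else 0.0.
    return [1.0 if a == 1.0 and b == 1.0 else 0.0 for a, b in zip(vad1, vad2)]

merged = merge_vads([1.0, 1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0])
```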
ivector-normalize-length
 Normalize length of iVectors to equal sqrt(feature-dimension)
Usage:  ivector-normalize-length [options] <ivector-rspecifier> <ivector-wspecifier>
e.g.: 
 ivector-normalize-length ark:ivectors.ark ark:normalized_ivectors.ark 
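The normalization is a single rescaling per vector: multiply by sqrt(dim) / ||v||. A Python sketch:

```python
import math

def normalize_length(ivector):
    # Scale so the resulting Euclidean norm equals sqrt(dim).
    norm = math.sqrt(sum(x * x for x in ivector))
    return [x * math.sqrt(len(ivector)) / norm for x in ivector]

v = normalize_length([3.0, 4.0])  # new norm is sqrt(2)
```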
ivector-transform
 Multiplies iVectors (on the left) by a supplied transformation matrix
Usage:  ivector-transform [options] <matrix-in> <ivector-rspecifier> <ivector-wspecifier>
e.g.: 
 ivector-transform transform.mat ark:ivectors.ark ark:transformed_ivectors.ark 
ivector-compute-dot-products
 Computes dot-products between iVectors; useful in application of an
iVector-based system.  The 'trials-file' has lines of the form
<key1> <key2>
and the output will have the form
<key1> <key2> [<dot-product>]
(if either key could not be found, the dot-product field in the output
will be absent, and this program will print a warning)
Usage:  ivector-compute-dot-products [options] <trials-in> <ivector1-rspecifier> <ivector2-rspecifier> <scores-out>
e.g.: 
 ivector-compute-dot-products trials ark:train_ivectors.scp ark:test_ivectors.scp trials.scored
See also: ivector-plda-scoring 
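Per trial line, the score is just the dot product of the two referenced iVectors, with missing keys skipped (Kaldi warns instead of scoring them). A sketch with made-up keys and values:

```python
def score_trials(trials, ivecs1, ivecs2):
    # trials: list of (key1, key2) pairs; ivecs*: dicts of key -> vector.
    scores = []
    for k1, k2 in trials:
        if k1 in ivecs1 and k2 in ivecs2:
            dot = sum(a * b for a, b in zip(ivecs1[k1], ivecs2[k2]))
            scores.append((k1, k2, dot))
    return scores

train_ivecs = {"spk1": [1.0, 0.0]}
test_ivecs = {"utt1": [0.5, 0.5]}
scored = score_trials([("spk1", "utt1")], train_ivecs, test_ivecs)
```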
ivector-mean
 With 3 or 4 arguments, averages iVectors over all the
utterances of each speaker using the spk2utt file.
The input is the spk2utt file and a set of iVectors indexed by
utterance; the output is iVectors indexed by speaker.  If 4
arguments are given, extra argument is a table for the number
of utterances per speaker (can be useful for PLDA).  If 2
arguments are given, computes the mean of all input files and
writes out the mean vector.
Usage: ivector-mean <spk2utt-rspecifier> <ivector-rspecifier> <ivector-wspecifier> [<num-utt-wspecifier>]
or: ivector-mean <ivector-rspecifier> <mean-wxfilename>
e.g.: ivector-mean data/spk2utt exp/ivectors.ark exp/spk_ivectors.ark exp/spk_num_utts.ark
or: ivector-mean exp/ivectors.ark exp/mean.vec
See also: ivector-subtract-global-mean 
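The 3/4-argument form described above is per-speaker averaging; a Python sketch with hypothetical speaker and utterance ids:

```python
def speaker_means(spk2utt, utt_ivectors):
    # Average each speaker's utterance iVectors; also count utterances
    # (the output of the optional fourth argument).
    means, num_utts = {}, {}
    for spk, utts in spk2utt.items():
        vecs = [utt_ivectors[u] for u in utts]
        dim = len(vecs[0])
        means[spk] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        num_utts[spk] = len(vecs)
    return means, num_utts

means, counts = speaker_means({"spk1": ["u1", "u2"]},
                              {"u1": [1.0, 2.0], "u2": [3.0, 4.0]})
```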
ivector-compute-lda
 Compute an LDA matrix for iVector system.  Reads in iVectors per utterance,
and an utt2spk file which it uses to help work out the within-speaker and
between-speaker covariance matrices.  Outputs an LDA projection to a
specified dimension.  By default it will normalize so that the projected
within-class covariance is unit, but if you set --normalize-total-covariance
to true, it will normalize the total covariance.
Note: the transform we produce is actually an affine transform which will
also set the global mean to zero.
Usage:  ivector-compute-lda [options] <ivector-rspecifier> <utt2spk-rspecifier> <lda-matrix-out>
e.g.: 
 ivector-compute-lda ark:ivectors.ark ark:utt2spk lda.mat 
ivector-compute-plda
 Computes a Plda object (for Probabilistic Linear Discriminant Analysis)
from a set of iVectors.  Uses speaker information from a spk2utt file
to compute within and between class variances.
Usage:  ivector-compute-plda [options] <spk2utt-rspecifier> <ivector-rspecifier> <plda-out>
e.g.: 
 ivector-compute-plda ark:spk2utt ark,s,cs:ivectors.ark plda 
ivector-copy-plda
 Copy a PLDA object, possibly applying smoothing to the within-class
covariance
Usage: ivector-copy-plda <plda-in> <plda-out>
e.g.: ivector-copy-plda --smoothing=0.1 plda plda.smooth0.1 
compute-eer
 Computes Equal Error Rate
Input is a series of lines, each with two fields.
The first field must be a numeric score, and the second
either the string 'target' or 'nontarget'. 
The EER will be printed to the standard output.
Usage: compute-eer <scores-in>
e.g.: compute-eer - 
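The EER is the operating point where the miss (false-reject) rate equals the false-alarm rate. A small self-contained sketch of that computation via a threshold sweep (not Kaldi's exact implementation):

```python
def compute_eer(scores):
    # scores: list of (score, is_target) pairs.
    targets = [s for s, t in scores if t]
    nontargets = [s for s, t in scores if not t]
    best = None
    for thr in sorted(s for s, _ in scores):
        miss = sum(1 for s in targets if s < thr) / len(targets)
        fa = sum(1 for s in nontargets if s >= thr) / len(nontargets)
        if best is None or abs(miss - fa) < best[0]:
            best = (abs(miss - fa), (miss + fa) / 2)
    return best[1]

eer = compute_eer([(0.9, True), (0.8, True), (0.85, False), (0.1, False)])
```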
ivector-subtract-global-mean
 Copies a table of iVectors but subtracts the global mean as
it does so.  The mean may be specified as the first argument; if not,
the mean of the input iVectors is used.
Usage: ivector-subtract-global-mean <ivector-rspecifier> <ivector-wspecifier>
or: ivector-subtract-global-mean <mean-rxfilename> <ivector-rspecifier> <ivector-wspecifier>
e.g.: ivector-subtract-global-mean scp:ivectors.scp ark:-
or: ivector-subtract-global-mean mean.vec scp:ivectors.scp ark:-
See also: ivector-mean 
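In the 2-argument form, the mean is computed from the inputs themselves; a Python sketch:

```python
def subtract_global_mean(ivectors):
    # Compute the mean of all input iVectors and subtract it from each.
    dim = len(next(iter(ivectors.values())))
    n = len(ivectors)
    mean = [sum(v[d] for v in ivectors.values()) / n for d in range(dim)]
    return {k: [x - m for x, m in zip(v, mean)] for k, v in ivectors.items()}

centered = subtract_global_mean({"u1": [1.0, 2.0], "u2": [3.0, 6.0]})
```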
ivector-plda-scoring
 Computes log-likelihood ratios for trials using PLDA model
Note: the 'trials-file' has lines of the form
<key1> <key2>
and the output will have the form
<key1> <key2> [<dot-product>]
(if either key could not be found, the dot-product field in the output
will be absent, and this program will print a warning)
For training examples, the input is the iVectors averaged over speakers;
a separate archive containing the number of utterances per speaker may be
optionally supplied using the --num-utts option; this affects the PLDA
scoring (if not supplied, it defaults to 1 per speaker).
Usage: ivector-plda-scoring <plda> <train-ivector-rspecifier> <test-ivector-rspecifier>
 <trials-rxfilename> <scores-wxfilename>
e.g.: ivector-plda-scoring --num-utts=ark:exp/train/num_utts.ark plda ark:exp/train/spk_ivectors.ark ark:exp/test/ivectors.ark trials scores
See also: ivector-compute-dot-products, ivector-compute-plda 
logistic-regression-train
 Trains a model using Logistic Regression with L-BFGS from
a set of vectors.  The class labels in <classes-rspecifier>
must be integers with no gaps in their range, and the
smallest label must be 0.
Usage: logistic-regression-train <vector-rspecifier>
<classes-rspecifier> <model-out> 
logistic-regression-eval
 Evaluates a model on input vectors and outputs either
log posterior probabilities or scores.
Usage1: logistic-regression-eval <model> <input-vectors-rspecifier>
                                <output-log-posteriors-wspecifier>
Usage2: logistic-regression-eval <model> <trials-file> <input-vectors-rspecifier>
                                <output-scores-file> 
logistic-regression-copy
 Copy a logistic-regression model, possibly changing the binary mode;
also supports the --scale-priors option which can scale the prior probabilities
the model assigns to different classes (e.g., you can remove the effect of
unbalanced training data by scaling by the inverse of the class priors in the
training data)
Usage: logistic-regression-copy [options] <model-in> <model-out>
e.g.: echo '[ 2.6 1.7 3.9 1.24 7.5 ]' | logistic-regression-copy --scale-priors=- \
  1.model scaled_priors.mdl 
ivector-extract-online
 Extract iVectors for utterances, using a trained iVector extractor,
and features and Gaussian-level posteriors.  This version extracts an
iVector every n frames (see the --ivector-period option), by including
all frames up to that point in the utterance.  This is designed to
correspond with what will happen in a streaming decoding scenario;
the iVectors would be used in neural net training.  The iVectors are
output as an archive of matrices, indexed by utterance-id; each row
corresponds to an iVector.
See also ivector-extract-online2
Usage:  ivector-extract-online [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <ivector-wspecifier>
e.g.: 
 gmm-global-get-post 1.dubm '$feats' ark:- | \
  ivector-extract-online --ivector-period=10 final.ie '$feats' ark,s,cs:- ark,t:ivectors.1.ark 
ivector-adapt-plda
 Adapt a PLDA object using unsupervised adaptation-data iVectors from a different
domain to the training data.
Usage: ivector-adapt-plda [options] <plda-in> <ivectors-rspecifier> <plda-out>
e.g.: ivector-adapt-plda plda ark:ivectors.ark plda.adapted 
ivector-plda-scoring-dense
 Perform PLDA scoring for speaker diarization.  The input reco2utt
should be of the form <recording-id> <seg1> <seg2> ... <segN> and
there should be one iVector for each segment.  PLDA scoring is
performed between all pairs of iVectors in a recording and outputs
an archive of score matrices, one for each recording-id.  The rows
and columns of the matrix correspond to the sorted order of the
segments.
Usage: ivector-plda-scoring-dense [options] <plda> <reco2utt> <ivectors-rspecifier> <scores-wspecifier>
e.g.: 
  ivector-plda-scoring-dense plda reco2utt scp:ivectors.scp ark:scores.ark 
agglomerative-cluster
 Cluster utterances by similarity score, used in diarization.
Takes a table of score matrices indexed by recording, with the
rows/columns corresponding to the utterances of that recording in
sorted order and a reco2utt file that contains the mapping from
recordings to utterances, and outputs a list of labels in the form
<utt> <label>.  Clustering is done using agglomerative hierarchical
clustering with a score threshold as stop criterion.  By default, the
program reads in similarity scores, but with --read-costs=true
the scores are interpreted as costs (i.e. a smaller value indicates
utterance similarity).
Usage: agglomerative-cluster [options] <scores-rspecifier> <reco2utt-rspecifier> <labels-wspecifier>
e.g.: 
 agglomerative-cluster ark:scores.ark ark:reco2utt 
   ark,t:labels.txt 
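The stopping rule described above (merge until the best score falls below a threshold) can be sketched as follows (a simplified Python illustration, not Kaldi code; average-linkage between clusters is an assumption, as the linkage choice is an implementation detail):

```python
def agglomerative_cluster(scores, threshold):
    """scores: NxN similarity matrix (list of lists).  Repeatedly merges
    the most similar pair of clusters; stops when the best pair's average
    similarity drops below threshold.  Returns a label per utterance."""
    clusters = [[i] for i in range(len(scores))]
    while len(clusters) > 1:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(scores[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if best is None or s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break  # threshold acts as the stop criterion
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(scores)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels
```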
lattice-to-kws-index
 Create an inverted index of the given lattices. The output index is 
in the T*T*T semiring. For details of the semiring, please refer to
Dogan Can and Murat Saraclar's paper "Lattice Indexing for Spoken Term Detection".
Usage: lattice-to-kws-index [options]   <utter-symtab-rspecifier> <lattice-rspecifier> <index-wspecifier>
e.g.: 
 lattice-to-kws-index ark:utter.symtab ark:1.lats ark:global.idx 
kws-index-union
 Take a union of the indexed lattices. The input index is in  the T*T*T semiring and
the output index is also in the T*T*T semiring. At the end of this program, encoded
epsilon removal, determinization and minimization will be applied.
Usage: kws-index-union [options]  index-rspecifier index-wspecifier
 e.g.: kws-index-union ark:input.idx ark:global.idx 
transcripts-to-fsts
 Build a linear acceptor for each transcription in the archive. Read
in the transcriptions in archive format and write out the linear
acceptors in archive format with the same key. The costs of the arcs
are set to zero. The cost of the acceptor can be changed
by supplying the costs archive. In that case, the first arc's cost
will be set to the value obtained from the archive, i.e. the total
cost will be equal to that value. The cost archive can be sparse, i.e.
it does not have to include zero-cost transcriptions. It is preferred
for the archive to be sorted (for efficiency).
Usage: 
 transcripts-to-fsts [options]  <transcriptions-rspecifier> [<costs-rspecifier>] <fsts-wspecifier>
e.g.: 
 transcripts-to-fsts ark:train.tra ark,s,cs,t:costs.txt  ark:train.fsts 
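The construction above (a linear acceptor whose total cost sits on the first arc) can be sketched like this (an illustrative Python representation of arcs as tuples, not Kaldi code):

```python
def linear_acceptor(words, total_cost=0.0):
    """Return the arcs (src_state, dst_state, word_label, cost) of a
    linear acceptor for the word sequence; the entire cost is placed on
    the first arc, so the total path cost equals total_cost."""
    arcs = []
    for i, w in enumerate(words):
        cost = total_cost if i == 0 else 0.0
        arcs.append((i, i + 1, w, cost))
    return arcs
```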
kws-search
 Search the keywords over the index. This program can be executed
in parallel, either on the index side or the keywords side; we use
a script to combine the final search results. Note that the index
archive has a single key "global".
Search has one or two outputs. The first one is mandatory and will
contain the search output, i.e. the list of all found keyword instances.
The file is in the following format:
kw_id utt_id beg_frame end_frame neg_logprob
 e.g.: 
KW105-0198 7 335 376 1.91254
The second parameter is optional and allows the user to gather more
statistics about the individual instances from the posting list.
Remember that a "keyword" is an FST and, as such, there can be multiple
matching paths in the keyword and in the lattice index in a given
time period. The stats output will provide all matching paths,
each with the appropriate score. 
The format is as follows:
kw_id utt_id beg_frame end_frame neg_logprob 0 w_id1 w_id2 ... 0
 e.g.: 
KW105-0198 7 335 376 16.01254 0 5766 5659 0
Usage: kws-search [options] <index-rspecifier> <keywords-rspecifier> <results-wspecifier> [<stats_wspecifier>]
 e.g.: kws-search ark:index.idx ark:keywords.fsts ark:results ark:stats 
generate-proxy-keywords
 Convert the keywords into in-vocabulary words using the given phone
level edit distance fst (E.fst). The large lexicon (L2.fst) and
inverted small lexicon (L1'.fst) are also expected to be present. We
actually use the composed FST L2xE.fst to be more efficient. Ideally
we should have used L2xExL1'.fst but this is quite computationally
expensive at command level. Keywords.int is in the transcription
format. If kwlist-wspecifier is given, the program also prints out
the proxy fst in a format where each line is "kwid weight proxy".
Usage: generate-proxy-keywords [options] <L2xE.fst> <L1'.fst> \
    <keyword-rspecifier> <proxy-wspecifier> [kwlist-wspecifier] 
 e.g.: generate-proxy-keywords L2xE.fst L1'.fst ark:keywords.int \
                           ark:proxy.fsts [ark,t:proxy.kwlist.txt] 
compute-atwv
 Computes the Actual Term-Weighted Value and prints it.
Usage: 
 compute-atwv [options] <nof-trials> <ref-rspecifier> <hyp-rspecifier> [alignment-csv-filename]
e.g.: 
 compute-atwv 32485.4 ark:ref.1 ark:hyp.1 ali.csv
or: 
 compute-atwv 32485.4 ark:ref.1 ark:hyp.1
NOTES: 
  a) the number of trials is usually equal to the size of the searched
     collection in seconds
  b) the ref-rspecifier/hyp-rspecifier are the Kaldi IO specifiers
     for the reference and the hypotheses (found hits), respectively.
     The format is the same for both of them. Each line
     is of the following format:
     <KW-ID> <utterance-id> <start-frame> <end-frame> <score>
     e.g.:
     KW106-189 348 459 560 0.8
  c) the alignment-csv-filename is an optional parameter.
     If present, the alignment, i.e. detailed information about which
     hypotheses match up with which reference entries, will be
     generated. The alignment file format is equivalent to
     the alignment file produced by the F4DE tool. However, we do
     not set some fields and the utterance identifiers are numeric.
     You can use the script utils/int2sym.pl and the utterance and
     keyword maps to convert the numerical ids into text form.
  d) the scores are expected to be probabilities. Please note that
     the output from kws-search is in -log(probability).
  e) compute-atwv does not perform any score normalization (it's just
     for scoring purposes). Without score normalization/calibration
     the performance of the search will be quite poor. 
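The per-keyword term-weighted value underlying ATWV can be sketched as follows (a hedged illustration of the standard NIST STD formula, not Kaldi's code; beta=999.9 is the usual evaluation default and an assumption here):

```python
def twv(n_corr, n_true, n_fa, n_trials, beta=999.9):
    """Term-weighted value for one keyword.  n_trials is roughly the
    size of the searched collection in seconds; ATWV is the average of
    this quantity over keywords."""
    p_miss = 1.0 - n_corr / n_true          # missed-detection rate
    p_fa = n_fa / (n_trials - n_true)       # false-alarm rate
    return 1.0 - p_miss - beta * p_fa
```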
print-proxy-keywords
 Reads in the proxy keywords FSTs and print them to a file where each
line is "kwid w1 w2 .. wn"
Usage: 
 print-proxy-keywords [options] <proxy-rspecifier>  <kwlist-wspecifier> [<cost-wspecifier>]
e.g.:
 print-proxy-keywords ark:proxy.fsts ark,t:kwlist.txt ark,t:costs.txt 
lattice-best-path
 Generate 1-best path through lattices; output as transcriptions and alignments
Note: if you want output as FSTs, use lattice-1best; if you want output
with acoustic and LM scores, use lattice-1best | nbest-to-linear
Usage: lattice-best-path [options]  <lattice-rspecifier> [ <transcriptions-wspecifier> [ <alignments-wspecifier>] ]
 e.g.: lattice-best-path --acoustic-scale=0.1 ark:1.lats 'ark,t:|int2sym.pl -f 2- words.txt > text' ark:1.ali 
lattice-prune
 Apply beam pruning to lattices
Usage: lattice-prune [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-prune --acoustic-scale=0.1 --beam=4.0 ark:1.lats ark:pruned.lats 
lattice-equivalent
 Test whether sets of lattices are equivalent (return with status 0 if
all were equivalent, 1 otherwise, -1 on error)
Usage: lattice-equivalent [options] lattice-rspecifier1 lattice-rspecifier2
 e.g.: lattice-equivalent ark:1.lats ark:2.lats 
lattice-to-nbest
 Work out N-best paths in lattices and write out as FSTs
Note: only guarantees distinct word sequences if distinct paths in
input lattices had distinct word-sequences (this will not be true if
you produced lattices with --determinize-lattice=false, i.e. state-level
lattices).
Usage: lattice-to-nbest [options] <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-to-nbest --acoustic-scale=0.1 --n=10 ark:1.lats ark:nbest.lats 
lattice-lmrescore
 Add lm_scale * [cost of best path through LM FST] to graph-cost of
paths through lattice.  Does this by composing with LM FST, then
lattice-determinizing (it has to negate weights first if lm_scale<0)
Usage: lattice-lmrescore [options] <lattice-rspecifier> <lm-fst-in> <lattice-wspecifier>
 e.g.: lattice-lmrescore --lm-scale=-1.0 ark:in.lats 'fstproject --project_output=true data/lang/G.fst|' ark:out.lats 
lattice-scale
 Apply scaling to lattice weights
Usage: lattice-scale [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-scale --lm-scale=0.0 ark:1.lats ark:scaled.lats 
lattice-union
 Takes two archives of lattices (indexed by utterances) and computes the union of the individual lattice pairs (one from each archive).
Usage: lattice-union [options] lattice-rspecifier1 lattice-rspecifier2 lattice-wspecifier
 e.g.: lattice-union ark:den.lats ark:num.lats ark:union.lats 
lattice-to-post
 Do forward-backward and collect posteriors over lattices.
Usage: lattice-to-post [options] lats-rspecifier posts-wspecifier [loglikes-wspecifier]
 e.g.: lattice-to-post --acoustic-scale=0.1 ark:1.lats ark:1.post
See also: lattice-to-ctm-conf, post-to-pdf-post, lattice-arc-post 
lattice-determinize
 This program is deprecated; please use lattice-determinize-pruned.
Determinize lattices (and apply a pruning beam)
 (see http://kaldi-asr.org/doc/lattices.html for more explanation)
 note: this program is typically only useful if you generated state-level
 lattices, e.g. called gmm-latgen-simple with --determinize=false
Usage: lattice-determinize [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-determinize --acoustic-scale=0.1 --beam=15.0 ark:1.lats ark:det.lats 
lattice-oracle
 Finds the path having the smallest edit-distance between a lattice
and a reference string.
Usage: lattice-oracle [options] <test-lattice-rspecifier> \
                                <reference-rspecifier> \
                                <transcriptions-wspecifier> \
                                [<edit-distance-wspecifier>]
 e.g.: lattice-oracle ark:lat.1 'ark:sym2int.pl -f 2- \
                       data/lang/words.txt <data/test/text|' ark,t:-
Note the --write-lattices option by which you can write out the
optimal path as a lattice.
Note: you can use this program to compute the n-best oracle WER by
first piping the input lattices through lattice-to-nbest and then
nbest-to-lattice. 
lattice-rmali
 Remove state-sequences from lattice weights
Usage: lattice-rmali [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-rmali  ark:1.lats ark:proj.lats 
lattice-compose
 Composes lattices (in transducer form, as type Lattice).  Depending
on the command-line arguments, either composes lattices with lattices,
or lattices with FSTs (rspecifiers are assumed to be lattices, and
rxfilenames are assumed to be FSTs, which have their weights interpreted
as "graph weights" when converted into the Lattice format).
Usage: lattice-compose [options] lattice-rspecifier1 (lattice-rspecifier2|fst-rxfilename2) lattice-wspecifier
 e.g.: lattice-compose ark:1.lats ark:2.lats ark:composed.lats
 or: lattice-compose ark:1.lats G.fst ark:composed.lats 
lattice-boost-ali
 Boost graph likelihoods (decrease graph costs) by b * #frame-phone-errors
on each arc in the lattice.  Useful for discriminative training, e.g.
boosted MMI.  Modifies input lattices.  This version takes the reference
in the form of alignments.  Needs the model (just the transitions) to
transform pdf-ids to phones.  Takes the --silence-phones option and these
phones appearing in the lattice are always assigned zero error, or with the
--max-silence-error option, at most this error-count per frame
(--max-silence-error=1 is equivalent to not specifying --silence-phones).
Usage: lattice-boost-ali [options] model lats-rspecifier ali-rspecifier lats-wspecifier
 e.g.: lattice-boost-ali --silence-phones=1:2:3 --b=0.05 1.mdl ark:1.lats ark:1.ali ark:boosted.lats 
lattice-copy
 Copy lattices (e.g. useful for changing to text mode or changing
format to standard from compact lattice.)
The --include and --exclude options can be used to copy only a subset
of lattices: the --include option specifies a whitelist of utterances
to copy, and the --exclude option specifies a blacklist of utterances
not to copy.  Only one of --include and --exclude may be supplied.
Usage: lattice-copy [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-copy --write-compact=false ark:1.lats ark,t:text.lats
See also: lattice-scale, lattice-to-fst, and
   the script egs/wsj/s5/utils/convert_slf.pl 
lattice-to-fst
 Turn lattices into normal FSTs, retaining only the word labels
By default, removes all weights and also epsilons (configure with
--acoustic-scale, --lm-scale and --rm-eps)
Usage: lattice-to-fst [options] lattice-rspecifier fsts-wspecifier
 e.g.: lattice-to-fst  ark:1.lats ark:1.fsts 
lattice-to-phone-lattice
 Convert the words or transition-ids into phones, which are worked out
from the transition-ids.  If --replace-words=true (true by default),
replaces the words with phones, otherwise replaces the transition-ids.
If --replace-words=false, it will preserve the alignment of transition-ids/phones
to words, so that if you do 
lattice-align-words | lattice-to-phone-lattice --replace-words=false,
you can get the phones corresponding to each word in the lattice.
Usage: lattice-to-phone-lattice [options] model lattice-rspecifier lattice-wspecifier
 e.g.: lattice-to-phone-lattice 1.mdl ark:1.lats ark:phones.lats
See also: lattice-align-words, lattice-align-phones 
lattice-interp
 Takes two archives of lattices (indexed by utterances) and combines
the individual lattice pairs (one from each archive).  Keeps the alignments
from the first lattice.  Equivalent to
projecting the second archive on words (lattice-project), then composing
the pairs of lattices (lattice-compose), then scaling graph and acoustic
costs by 0.5 (lattice-scale).  You can control the individual scales with
--alpha, which is the scale of the first lattices (the second is 1-alpha).
Usage: lattice-interp [options] lattice-rspecifier-a lattice-rspecifier-b lattice-wspecifier
 e.g.: lattice-interp ark:1.lats ark:2.lats ark:interp.lats 
lattice-project
 Project lattices (in their transducer form); by default makes them
word->word transducers (set --project-output=false for tid->tid).
Usage: lattice-project [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-project ark:1.lats ark:word2word.lats
or: lattice-project --project-output=false ark:1.lats ark:tid2tid.lats 
lattice-add-trans-probs
 Add transition probabilities into graph part of lattice scores,
controlled by options --transition-scale and --self-loop-scale, which,
for compatibility with the original graph, would normally be set to the same
values used in graph compilation
Usage: lattice-add-trans-probs [options] model lattice-rspecifier lattice-wspecifier
 e.g.: lattice-add-trans-probs --transition-scale=1.0 --self-loop-scale=0.1 1.mdl ark:in.lats ark:out.lats 
lattice-difference
 Compute FST difference on lattices (remove sequences in first lattice
 that appear in second lattice)
Useful for the denominator lattice for MCE.
Usage: lattice-difference [options] lattice1-rspecifier lattice2-rspecifier lattice-wspecifier
 e.g.: lattice-difference ark:den.lats ark:num.lats ark:den_mce.lats 
nbest-to-linear
 Takes as input lattices/n-bests which must be linear (single path);
convert from lattice to up to 4 archives containing transcriptions, alignments,
and acoustic and LM costs (note: use ark:/dev/null for unwanted outputs)
Usage: nbest-to-linear [options] <nbest-rspecifier> <alignments-wspecifier> [<transcriptions-wspecifier> [<lm-cost-wspecifier> [<ac-cost-wspecifier>]]]
 e.g.: lattice-to-nbest --n=10 ark:1.lats ark:- | \
   nbest-to-linear ark:- ark,t:1.ali 'ark,t:|int2sym.pl -f 2- words.txt > text' 
nbest-to-lattice
 Read in a Table containing N-best entries from lattices (i.e. individual
lattices with a linear structure, one for each N-best entry, indexed by
utt_id_a-1, utt_id_a-2, etc.), and take the union of them for each utterance
id (e.g. utt_id_a), outputting a lattice for each.
Usage:  nbest-to-lattice <nbest-rspecifier> <lattices-wspecifier>
 e.g.: nbest-to-lattice ark:1.nbest ark:1.lats 
lattice-1best
 Compute best path through lattices and write out as FSTs
Note: differs from lattice-to-nbest with --n=1 because we won't
append -1 to the utterance-ids.  Differs from lattice-best-path
because output is FST.
Usage: lattice-1best [options] <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-1best --acoustic-scale=0.1 ark:1.lats ark:1best.lats 
linear-to-nbest
 This does the opposite of nbest-to-linear.  It takes 4 archives,
containing alignments, word-sequences, and acoustic and LM costs,
and turns them into a single archive containing FSTs with a linear
structure.  The program is called linear-to-nbest because very often
the archives concerned will represent N-best lists
Usage:  linear-to-nbest [options] <alignments-rspecifier> <transcriptions-rspecifier> (<lm-cost-rspecifier>|'') (<ac-cost-rspecifier>|'') <nbest-wspecifier>
Note: if the rspecifiers for lm-cost or ac-cost are the empty string,
these values will default to zero.
 e.g.: linear-to-nbest ark:1.ali 'ark:sym2int.pl -f 2- words.txt text|' ark:1.lmscore ark:1.acscore ark:1.nbest 
lattice-mbr-decode
 Do Minimum Bayes Risk decoding (decoding that aims to minimize the 
expected word error rate).  Possible outputs include the 1-best path
(i.e. the word-sequence, as a sequence of ints per utterance), the
computed Bayes Risk for each utterance, and the sausage stats as
(for each utterance) std::vector<std::vector<std::pair<int32, float> > >
for which we use the same I/O routines as for posteriors (type Posterior).
times-wspecifier writes pairs of (start-time, end-time) in frames, for
each sausage position, or for each one-best entry if --one-best-times=true.
Note: use ark:/dev/null or the empty string for unwanted outputs.
Note: times will only be very meaningful if you first use lattice-align-words.
If you need ctm-format output, don't use this program but use lattice-to-ctm-conf
with --decode-mbr=true.
Usage: lattice-mbr-decode [options]  lattice-rspecifier transcriptions-wspecifier [ bayes-risk-wspecifier [ sausage-stats-wspecifier [ times-wspecifier] ] ] 
 e.g.: lattice-mbr-decode --acoustic-scale=0.1 ark:1.lats 'ark,t:|int2sym.pl -f 2- words.txt > text' ark:/dev/null ark:1.sau 
lattice-align-words
 Convert lattices so that the arcs in the CompactLattice format correspond with
words (i.e. aligned with word boundaries).  Note: it will generally be more
efficient if you apply 'lattice-push' before this program.
Usage: lattice-align-words [options] <word-boundary-file> <model> <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-align-words  --silence-label=4320 --partial-word-label=4324 \
   data/lang/phones/word_boundary.int final.mdl ark:1.lats ark:aligned.lats
Note: word-boundary file has format (on each line):
<integer-phone-id> [begin|end|singleton|internal|nonword]
See also: lattice-align-words-lexicon, for use in cases where phones
don't have word-position information. 
lattice-to-mpe-post
 Do forward-backward and collect frame level MPE posteriors over
lattices, which can be fed into gmm-acc-stats2 to do MPE training.
Caution: this is not really MPE, this is MPFE (minimum phone frame
error).  The posteriors may be positive or negative.
Usage: lattice-to-mpe-post [options] <model> <num-posteriors-rspecifier>
 <lats-rspecifier> <posteriors-wspecifier> 
e.g.: lattice-to-mpe-post --acoustic-scale=0.1 1.mdl ark:num.post
 ark:1.lats ark:1.post 
lattice-copy-backoff
 Copy a table of lattices (1st argument), but for any keys that appear
in the table from the 2nd argument, use the one from the 2nd argument.
If the sets of keys are identical, this is equivalent to copying the 2nd
table.  Note: the arguments are in this order due to the convention that
sequential access is always over the 1st argument.
Usage: lattice-copy-backoff [options] <lat-rspecifier1> <lat-rspecifier2> <lat-wspecifier>
 e.g.: lattice-copy-backoff ark:bad_but_complete.lat ark:good_but_incomplete.lat ark:out.lat 
nbest-to-ctm
 Takes as input lattices which must be linear (single path),
and must be in CompactLattice form where the transition-ids on the arcs
have been aligned with the word boundaries... typically the input will
be a lattice that has been piped through lattice-1best and then
lattice-align-words. On the other hand, whenever we directly pipe
the output of lattice-align-words-lexicon into nbest-to-ctm,
we need to put the command `lattice-1best ark:- ark:-` between them,
because even for linear lattices, lattice-align-words-lexicon can
in certain cases produce non-linear outputs (due to ambiguity
in the lexicon). It outputs ctm format (with integers in place of words),
assuming the frame length is 0.01 seconds by default (change this with the
--frame-length option).  Note: the output is in the form
<utterance-id> 1 <begin-time> <end-time> <word-id>
and you can post-process this to account for segmentation issues and to 
convert ints to words; note, the times are relative to start of the utterance.
Usage: nbest-to-ctm [options] <aligned-linear-lattice-rspecifier> <ctm-wxfilename>
e.g.: lattice-1best --acoustic-weight=0.08333 ark:1.lats | \
      lattice-align-words data/lang/phones/word_boundary.int exp/dir/final.mdl ark:- ark:- | \
      nbest-to-ctm ark:- 1.ctm
e.g.: lattice-align-words-lexicon data/lang/phones/align_lexicon.int exp/dir/final.mdl ark:1.lats ark:- | \
      lattice-1best ark:- ark:- | \
      nbest-to-ctm ark:- 1.ctm 
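The frame-to-time conversion described above can be sketched as follows (an illustrative Python snippet, not Kaldi code; the tuple-based input is an assumption):

```python
def to_ctm(utt_id, words, frame_length=0.01):
    """words: list of (word_id, begin_frame, end_frame).  Returns ctm
    lines '<utterance-id> 1 <begin-time> <end-time> <word-id>' with
    times relative to the start of the utterance."""
    return [f"{utt_id} 1 {b * frame_length:.2f} {e * frame_length:.2f} {w}"
            for w, b, e in words]
```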
lattice-determinize-pruned
 Determinize lattices, keeping only the best path (sequence of acoustic states)
for each input-symbol sequence.  This version does pruning as part of the
determinization algorithm, which is more efficient and prevents blowup.
See http://kaldi-asr.org/doc/lattices.html for more information on lattices.
Usage: lattice-determinize-pruned [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-determinize-pruned --acoustic-scale=0.1 --beam=6.0 ark:in.lats ark:det.lats 
lattice-to-ctm-conf
 This tool turns a lattice into a ctm with confidences, based on the
posterior probabilities in the lattice.  The word sequence in the
ctm is determined as follows.  Firstly we determine the initial word
sequence.  In the 3-argument form, we read it from the
<1best-rspecifier> input; otherwise it is the 1-best of the lattice.
Then, if --decode-mbr=true, we iteratively refine the hypothesis
using Minimum Bayes Risk decoding.  (Note that the default value of decode_mbr
is true.  If you provide <1best-rspecifier> from MAP decoding, the output ctm
from MBR decoding may be mismatched with the provided 1best hypothesis, i.e.
the starting point of optimization.)  If you don't need confidences,
you can do lattice-1best and pipe to nbest-to-ctm.  The ctm this
program produces will be relative to the utterance-id; a standard
ctm relative to the filename can be obtained using
utils/convert_ctm.pl.  The times produced by this program will only
be meaningful if you do lattice-align-words on the input.  The
<1-best-rspecifier> could be the output of utils/int2sym.pl or
nbest-to-linear.
Usage: lattice-to-ctm-conf [options]  <lattice-rspecifier> \
                                          <ctm-wxfilename>
Usage: lattice-to-ctm-conf [options]  <lattice-rspecifier> \
                     [<1best-rspecifier> [<times-rspecifier>]] <ctm-wxfilename>
 e.g.: lattice-to-ctm-conf --acoustic-scale=0.1 ark:1.lats 1.ctm
   or: lattice-to-ctm-conf --acoustic-scale=0.1 --decode-mbr=false\
                                      ark:1.lats ark:1.1best 1.ctm
See also: lattice-mbr-decode, nbest-to-ctm, lattice-arc-post,
 steps/get_ctm.sh, steps/get_train_ctm.sh and utils/convert_ctm.pl. 
lattice-combine
 Combine lattices generated by different systems by removing the total
cost of all paths (backward cost) from individual lattices and doing
a union of the reweighted lattices.  Note: the acoustic and LM scales
that this program applies are not removed before outputting the lattices.
Intended for use in system combination prior to MBR decoding, see comments
in code.
Usage: lattice-combine [options] <lattice-rspecifier1> <lattice-rspecifier2> [<lattice-rspecifier3> ... ] <lattice-wspecifier>
E.g.: lattice-combine 'ark:gunzip -c foo/lat.1.gz|' 'ark:gunzip -c bar/lat.1.gz|' ark:- | ... 
lattice-rescore-mapped
 Replace the acoustic scores on a lattice using log-likelihoods read in
as a matrix for each utterance, indexed (frame, pdf-id).  This does the same
as (e.g.) gmm-rescore-lattice, but from a matrix.  The "mapped" means that
the transition-model is used to map transition-ids to pdf-ids.  (c.f.
latgen-faster-mapped).  Note: <transition-model-in> can be any type of
model file, e.g. GMM-based or neural-net based; only the transition model is read.
Usage: lattice-rescore-mapped [options] <transition-model-in> <lattice-rspecifier> <loglikes-rspecifier> <lattice-wspecifier>
 e.g.: nnet-logprob [args] .. | lattice-rescore-mapped final.mdl ark:1.lats ark:- ark:2.lats 
lattice-depth
 Compute the lattice depths in terms of the average number of arcs that
cross a frame.  See also lattice-depth-per-frame
Usage: lattice-depth <lattice-rspecifier> [<depth-wspecifier>]
E.g.: lattice-depth ark:- ark,t:- 
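The depth measure described above reduces to a simple average (an illustrative Python sketch; per-frame arc counts are what lattice-depth-per-frame would output):

```python
def lattice_depth(per_frame_counts):
    """Average number of arcs crossing a frame, given the count of arcs
    crossing each frame of the lattice."""
    return sum(per_frame_counts) / len(per_frame_counts)
```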
lattice-align-phones
 Convert lattices so that the arcs in the CompactLattice format correspond with
phones.  The output symbols are still words, unless you specify --replace-output-symbols=true
Usage: lattice-align-phones [options] <model> <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-align-phones final.mdl ark:1.lats ark:phone_aligned.lats
See also: lattice-to-phone-lattice, lattice-align-words, lattice-align-words-lexicon
Note: if you just want the phone alignment from a lattice, the easiest path is
 lattice-1best | nbest-to-linear [keeping only alignment] | ali-to-phones
If you want the words and phones jointly (i.e. pronunciations of words, with word
alignment), try
 lattice-1best | nbest-to-prons 
lattice-to-smbr-post
 Do forward-backward and collect frame level posteriors for
the state-level minimum Bayes Risk criterion (SMBR), which
is like MPE with the criterion at a context-dependent state level.
The output may be fed into gmm-acc-stats2 or similar to train the
models discriminatively.  The posteriors may be positive or negative.
Usage: lattice-to-smbr-post [options] <model> <num-posteriors-rspecifier>
 <lats-rspecifier> <posteriors-wspecifier> 
e.g.: lattice-to-smbr-post --acoustic-scale=0.1 1.mdl ark:num.post
 ark:1.lats ark:1.post 
lattice-determinize-pruned-parallel
 Determinize lattices, keeping only the best path (sequence of acoustic states)
for each input-symbol sequence.  This is a version of lattice-determinize-pruned
that accepts the --num-threads option.  These programs do pruning as part of the
determinization algorithm, which is more efficient and prevents blowup.
See http://kaldi-asr.org/doc/lattices.html for more information on lattices.
Usage: lattice-determinize-pruned-parallel [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-determinize-pruned-parallel --acoustic-scale=0.1 --beam=6.0 ark:in.lats ark:det.lats 
lattice-add-penalty
 Add word insertion penalty to the lattice.
Note: penalties are negative log-probs, base e, and are added to the
'language model' part of the cost.
Usage: lattice-add-penalty [options] <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-add-penalty --word-ins-penalty=1.0 ark:- ark:- 
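The operation described above can be sketched like this (an illustrative Python snippet, not Kaldi code; representing arcs as tuples and using label 0 for epsilon are assumptions):

```python
def add_word_penalty(arcs, penalty):
    """arcs: (src, dst, word_label, graph_cost, acoustic_cost).  Adds the
    penalty (a negative log-prob, base e) to the graph cost of every arc
    carrying a real word label; epsilon arcs (label 0) are unchanged."""
    return [(s, d, w, g + (penalty if w != 0 else 0.0), a)
            for s, d, w, g, a in arcs]
```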
lattice-align-words-lexicon
 Convert lattices so that the arcs in the CompactLattice format correspond with
words (i.e. aligned with word boundaries).  This is the newest form, that
reads in a lexicon in integer format, where each line is (integer id of)
 word-in word-out phone1 phone2 ... phoneN
(note: word-in is word before alignment, word-out is after, e.g. for replacing
<eps> with SIL or vice versa)
This may be more efficient if you first apply 'lattice-push'.
Usage: lattice-align-words-lexicon [options] <lexicon-file> <model> <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-align-words-lexicon  --partial-word-label=4324 --max-expand 10.0 --test true \
   data/lang/phones/align_lexicon.int final.mdl ark:1.lats ark:aligned.lats
See also: lattice-align-words, which is only applicable if your phones have word-position
markers, i.e. each phone comes in 5 versions like AA_B, AA_I, AA_W, AA_S, AA. 
lattice-push
 Push lattices, in CompactLattice format, so that the strings are as
close to the start as possible, and the lowest cost weight for each
state except the start state is (0, 0).  This can be helpful prior to
word-alignment (in this case, only strings need to be pushed)
Usage: lattice-push [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-push ark:1.lats ark:2.lats 
lattice-minimize
 Minimize lattices, in CompactLattice format.  Should be applied to
determinized lattices (e.g. produced with --determinize-lattice=true)
Note: by default this program
pushes the strings and weights prior to minimization.
Usage: lattice-minimize [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-minimize ark:1.lats ark:2.lats 
lattice-limit-depth
 Limit the number of arcs crossing any frame, to a specified maximum.
Requires an acoustic scale, because forward-backward Viterbi probs are
needed, which will be affected by this.
Usage: lattice-limit-depth [options] <lattice-rspecifier> <lattice-wspecifier>
E.g.: lattice-limit-depth --max-arcs-per-frame=1000 --acoustic-scale=0.1 ark:- ark:- 
lattice-depth-per-frame
 For each lattice, compute a vector of length (num-frames) saying how
may arcs cross each frame.  See also lattice-depth
Usage: lattice-depth-per-frame <lattice-rspecifier> <depth-wspecifier> [<lattice-wspecifier>]
The final <lattice-wspecifier> allows you to write the input lattices out
in case you want to do something else with them as part of the same pipe.
E.g.: lattice-depth-per-frame ark:- ark,t:- 
lattice-confidence
 Compute sentence-level lattice confidence measures for each lattice.
The output is simply the difference between the total costs of the best and
second-best paths in the lattice (or a very large value if the lattice
had only one path).  Caution: this is not necessarily a very good confidence
measure.  You almost certainly want to specify the acoustic scale.
If the input is a state-level lattice, you need to specify
--read-compact-lattice=false, or the confidences will be very small
(and wrong).  You can get word-level confidence info from lattice-mbr-decode.
Usage: lattice-confidence <lattice-rspecifier> <confidence-wspecifier>
E.g.: lattice-confidence --acoustic-scale=0.08333 ark:- ark,t:- 
lattice-determinize-phone-pruned
 Determinize lattices, keeping only the best path (sequence of
acoustic states) for each input-symbol sequence. This version does
phone insertion when doing a first pass of determinization; it then
removes the inserted symbols and does a second pass of determinization.
It also does pruning as part of the determinization algorithm, which
is more efficient and prevents blowup.
Usage: lattice-determinize-phone-pruned [options] <model> \
                  <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-determinize-phone-pruned --acoustic-scale=0.1 \
                            final.mdl ark:in.lats ark:det.lats 
lattice-determinize-phone-pruned-parallel
 Determinize lattices, keeping only the best path (sequence of
acoustic states) for each input-symbol sequence. This is a version
of lattice-determinize-phone-pruned that accepts the --num-threads
option. The program does phone insertion when doing a first pass of
determinization; it then removes the inserted symbols and does a
second pass of determinization. It also does pruning as part of the
determinization algorithm, which is more efficient and prevents
blowup.
Usage: lattice-determinize-phone-pruned-parallel [options] \
                 <model> <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-determinize-phone-pruned-parallel \
           --acoustic-scale=0.1 final.mdl ark:in.lats ark:det.lats 
lattice-expand-ngram
 Expand lattices so that each arc has a unique n-label history, for
a specified n (defaults to 3).
Usage: lattice-expand-ngram [options] lattice-rspecifier lattice-wspecifier
e.g.: lattice-expand-ngram --n=3 ark:lat ark:expanded_lat 
lattice-lmrescore-const-arpa
 Rescores lattice with the ConstArpaLm format language model. The LM
will be wrapped into the DeterministicOnDemandFst interface and the
rescoring is done by composing with the wrapped LM using a special
type of composition algorithm. Determinization will be applied on
the composed lattice.
Usage: lattice-lmrescore-const-arpa [options] lattice-rspecifier \
                                   const-arpa-in lattice-wspecifier
 e.g.: lattice-lmrescore-const-arpa --lm-scale=-1.0 ark:in.lats \
                                   const_arpa ark:out.lats 
lattice-lmrescore-rnnlm
 Rescores lattice with rnnlm. The LM will be wrapped into the
DeterministicOnDemandFst interface and the rescoring is done by
composing with the wrapped LM using a special type of composition
algorithm. Determinization will be applied on the composed lattice.
Usage: lattice-lmrescore-rnnlm [options] [unk_prob_rspecifier] \
             <word-symbol-table-rxfilename> <lattice-rspecifier> \
             <rnnlm-rxfilename> <lattice-wspecifier>
 e.g.: lattice-lmrescore-rnnlm --lm-scale=-1.0 words.txt \
                     ark:in.lats rnnlm ark:out.lats 
nbest-to-prons
 Reads lattices which must be linear (single path), and must be in
CompactLattice form where the transition-ids on the arcs
have been aligned with the word boundaries (see lattice-align-words*)
and outputs a vaguely ctm-like format where each line is of the form:
<utterance-id> <begin-frame> <num-frames> <word> <phone1> <phone2> ... <phoneN>
where the words and phones will both be written as integers.  For arcs
in the input lattice that don't correspond to words, <word> may be zero; this
will typically be the case for the optional silences.
Usage: nbest-to-prons [options] <model> <aligned-linear-lattice-rspecifier> <output-wxfilename>
e.g.: lattice-1best --acoustic-scale=0.08333 ark:1.lats | \
      lattice-align-words data/lang/phones/word_boundary.int exp/dir/final.mdl ark:- ark:- | \
      nbest-to-prons exp/dir/final.mdl ark:- 1.prons
Note: the type of the model doesn't matter as only the transition-model is read. 
lattice-arc-post
 Print out information regarding posteriors of lattice arcs
This program computes posteriors from a lattice and prints out
information for each arc (the format is reminiscent of ctm, but
contains information from multiple paths).  Each line is:
 <utterance-id> <start-frame> <num-frames> <posterior> <word> [<ali>] [<phone1> <phone2>...]
for instance:
2013a04-bk42\t104\t26\t0.95,242,242,242,71,894,894,62,63,63,63,63 8 9
where the --print-alignment option determines whether the alignments (i.e. the
sequences of transition-ids) are printed, and the phones are printed only if the
<model> is supplied on the command line.  Note, there are tabs between the major
fields, but the phones are separated by spaces.
Usage: lattice-arc-post [<model>] <lattices-rspecifier> <output-wxfilename>
e.g.: lattice-arc-post --acoustic-scale=0.1 final.mdl 'ark:gunzip -c lat.1.gz|' post.txt
You will probably want to word-align the lattices (e.g. with lattice-align-words or
lattice-align-words-lexicon) before running this program, and to apply an acoustic
scale, either via the --acoustic-scale option or using lattice-scale.
See also: lattice-post, lattice-to-ctm-conf, nbest-to-ctm 
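Since the major fields are tab-separated (with the phones, when present, space-separated inside the final field), the output is straightforward to post-process. A minimal sketch, using a fabricated sample line rather than real lattice-arc-post output:

```shell
# Fabricated line in the lattice-arc-post style described above:
# <utterance-id> <start-frame> <num-frames> <posterior> <word>, tab-separated,
# followed here by a space-separated phone field.
printf 'utt-001\t104\t26\t0.95\t8\t12 9\n' |
  awk -F'\t' '{ printf "utt=%s start=%s frames=%s post=%s word=%s\n", $1, $2, $3, $4, $5 }'
# prints: utt=utt-001 start=104 frames=26 post=0.95 word=8
```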
lattice-determinize-non-compact
 lattice-determinize lattices (and apply a pruning beam)
 (see http://kaldi-asr.org/doc/lattices.html for more explanation)
This version of the program retains the original acoustic scores of arcs in the determinized lattice and writes it as a normal (non-compact) lattice. 
 note: this program is typically only useful if you generated state-level
 lattices, e.g. called gmm-latgen-simple with --determinize=false
Usage: lattice-determinize-non-compact [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-determinize-non-compact --acoustic-scale=0.1 --beam=15.0 ark:1.lats ark:det.lats 
lattice-lmrescore-kaldi-rnnlm
 Rescores lattice with kaldi-rnnlm. This script is called from 
scripts/rnnlm/lmrescore.sh. An example for rescoring 
lattices is at egs/swbd/s5c/local/rnnlm/run_lstm.sh
Usage: lattice-lmrescore-kaldi-rnnlm [options] \
             <embedding-file> <raw-rnnlm-rxfilename> \
             <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-lmrescore-kaldi-rnnlm --lm-scale=-1.0 \
              word_embedding.mat \
              --bos-symbol=1 --eos-symbol=2 \
              final.raw ark:in.lats ark:out.lats 
lattice-lmrescore-pruned
 This program can be used to subtract scores from one language model and
add scores from another one.  It uses an efficient rescoring algorithm that
avoids exploring the entire composed lattice.  The first (negative-weight)
language model is expected to be an FST, e.g. G.fst; the second one can
either be in FST or const-arpa format.  Any FST-format language models will
be projected on their output by this program, making it unnecessary for the
caller to remove disambiguation symbols.
Usage: lattice-lmrescore-pruned [options] <lm-to-subtract> <lm-to-add> <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-lmrescore-pruned --acoustic-scale=0.1 \
      data/lang/G.fst data/lang_fg/G.fst ark:in.lats ark:out.lats
 or: lattice-lmrescore-pruned --acoustic-scale=0.1 --add-const-arpa=true\
      data/lang/G.fst data/lang_fg/G.carpa ark:in.lats ark:out.lats 
lattice-lmrescore-kaldi-rnnlm-pruned
 Rescores lattice with kaldi-rnnlm. This script is called from 
scripts/rnnlm/lmrescore_pruned.sh. An example for rescoring 
lattices is at egs/swbd/s5c/local/rnnlm/run_lstm.sh
Usage: lattice-lmrescore-kaldi-rnnlm-pruned [options] \
             <old-lm-rxfilename> <embedding-file> \
             <raw-rnnlm-rxfilename> \
             <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-lmrescore-kaldi-rnnlm-pruned --lm-scale=-1.0 fst_words.txt \
              --bos-symbol=1 --eos-symbol=2 \
              data/lang_test/G.fst word_embedding.mat \
              final.raw ark:in.lats ark:out.lats
       lattice-lmrescore-kaldi-rnnlm-pruned --lm-scale=-1.0 fst_words.txt \
              --bos-symbol=1 --eos-symbol=2 \
              data/lang_test_fg/G.carpa word_embedding.mat \
              final.raw ark:in.lats ark:out.lats 
lattice-reverse
 Reverse a lattice in order to rescore it with an RNNLM trained on
reversed text. An example of its application is at
swbd/local/rnnlm/run_lstm_tdnn_back.sh
Usage: lattice-reverse lattice-rspecifier lattice-wspecifier
 e.g.: lattice-reverse ark:forward.lats ark:backward.lats 
arpa2fst
 Convert an ARPA format language model into an FST
Usage: arpa2fst [opts] <input-arpa> <output-fst>
 e.g.: arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt lm/input.arpa G.fst
Note: When called without switches, the output G.fst will contain
an embedded symbol table. This is compatible with the way a previous
version of arpa2fst worked. 
arpa-to-const-arpa
 Converts an Arpa format language model into ConstArpaLm format,
which is an in-memory representation of the pre-built Arpa language
model. The output language model can then be read in by a program
that wants to rescore lattices. We assume that the words in the
input Arpa language model have been converted to integers.
The program is used jointly with utils/map_arpa_lm.pl to build a
ConstArpaLm format language model. We first map the words in an Arpa
format language model to integers using utils/map_arpa_lm.pl, and
then use this program to build a ConstArpaLm format language model.
Usage: arpa-to-const-arpa [opts] <input-arpa> <const-arpa>
 e.g.: arpa-to-const-arpa --bos-symbol=1 --eos-symbol=2 \
                          arpa.txt const_arpa 
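The two-step procedure described above might be sketched as follows; the file names are illustrative, and the exact invocation of utils/map_arpa_lm.pl should be checked against the script itself:

```shell
# Sketch only: map the words of the ARPA LM to integers (check the
# actual usage of utils/map_arpa_lm.pl), then build the ConstArpaLm.
utils/map_arpa_lm.pl words.txt < lm/arpa.txt > lm/arpa_int.txt
arpa-to-const-arpa --bos-symbol=1 --eos-symbol=2 lm/arpa_int.txt lm/const_arpa
```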
nnet-am-info
 Print human-readable information about the neural network
acoustic model to the standard output
Usage:  nnet-am-info [options] <nnet-in>
e.g.:
 nnet-am-info 1.nnet 
nnet-init
 Initialize the nnet2 neural network from a config file with a line for each
component.  Note, this only outputs the neural net itself, not the associated
information such as the transition-model; you'll probably want to pipe
the output into something like nnet-am-init.
Usage:  nnet-init [options] <config-in> <raw-nnet-out>
e.g.:
 nnet-init nnet.config 1.raw 
nnet-train-simple
 Train the neural network parameters with backprop and stochastic
gradient descent using minibatches.  Training examples would be
produced by nnet-get-egs.
Usage:  nnet-train-simple [options] <model-in> <training-examples-in> <model-out>
e.g.:
nnet-train-simple 1.nnet ark:1.egs 2.nnet 
nnet-train-ensemble
 Train an ensemble of neural networks with backprop and stochastic
gradient descent using minibatches.  Modified version of nnet-train-simple.
Implements parallel gradient descent with a term that encourages the nnets to
produce similar outputs.
Usage:  nnet-train-ensemble [options] <model-in-1> <model-in-2> ... <model-in-n>  <training-examples-in> <model-out-1> <model-out-2> ... <model-out-n>
e.g.:
 nnet-train-ensemble 1.1.nnet 1.2.nnet ark:egs.ark 2.1.nnet 2.2.nnet  
nnet-train-transitions
 Train the transition probabilities of a neural network acoustic model
Usage:  nnet-train-transitions [options] <nnet-in> <alignments-rspecifier> <nnet-out>
e.g.:
 nnet-train-transitions 1.nnet "ark:gunzip -c ali.*.gz|" 2.nnet 
nnet-latgen-faster
 Generate lattices using neural net model.
Usage: nnet-latgen-faster [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] 
nnet-am-copy
 Copy a (nnet2) neural net and its associated transition model,
possibly changing the binary mode
Also supports multiplying all the learning rates by a factor
(the --learning-rate-factor option) and setting them all to a given
value (the --learning-rate option).
Usage:  nnet-am-copy [options] <nnet-in> <nnet-out>
e.g.:
 nnet-am-copy --binary=false 1.mdl text.mdl 
nnet-am-init
 Initialize the neural network acoustic model and its associated
transition-model, from a tree, a topology file, and a neural-net
without an associated acoustic model.
See example scripts to see how this works in practice.
Usage:  nnet-am-init [options] <tree-in> <topology-in> <raw-nnet-in> <nnet-am-out>
or:  nnet-am-init [options] <transition-model-in> <raw-nnet-in> <nnet-am-out>
e.g.:
 nnet-am-init tree topo "nnet-init nnet.config - |" 1.mdl 
nnet-insert
 Insert components into a neural network-based acoustic model.
This is mostly intended for adding new hidden layers to neural networks.
You can either specify the option --insert-at=n (specifying the index of
the component after which you want your neural network inserted), or by
default this program will insert it just before the component before the
softmax component.  CAUTION: It will also randomize the parameters of the
component before the softmax (typically AffineComponent), with stddev equal
to the --stddev-factor option (default 0.1), times the inverse square root
of the number of inputs to that component.
Set --randomize-next-component=false to turn this off.
Usage:  nnet-insert [options] <nnet-in> <raw-nnet-to-insert-in> <nnet-out>
e.g.:
 nnet-insert 1.nnet "nnet-init hidden_layer.config -|" 2.nnet 
nnet-align-compiled
 Align features given neural-net-based model
Usage:   nnet-align-compiled [options] <model-in> <graphs-rspecifier> <feature-rspecifier> <alignments-wspecifier>
e.g.: 
 nnet-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali
or:
 compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' \
   ark:- | nnet-align-compiled 1.mdl ark:- scp:train.scp ark:1.ali 
nnet-compute-prob
 Computes and prints the average log-prob per frame of the given data with a
neural net.  The input of this is the output of e.g. nnet-get-egs
Aside from the logging output, which goes to the standard error, this program
prints the average log-prob per frame to the standard output.
Also see nnet-logprob, which produces a matrix of log-probs for each utterance.
Usage:  nnet-compute-prob [options] <model-in> <training-examples-in>
e.g.: nnet-compute-prob 1.nnet ark:valid.egs 
nnet-copy-egs
 Copy examples (typically single frames) for neural network training,
possibly changing the binary mode.  Supports multiple wspecifiers, in
which case it will write the examples round-robin to the outputs.
Usage:  nnet-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...]
e.g.
nnet-copy-egs ark:train.egs ark,t:text.egs
or:
nnet-copy-egs ark:train.egs ark:1.egs ark:2.egs 
nnet-combine
 Using a validation set, compute an optimal combination of a number of
neural nets (the combination weights are separate for each layer and
do not have to sum to one).  The optimization is BFGS, which is initialized
from the best of the individual input neural nets (or as specified by
--initial-model)
Usage:  nnet-combine [options] <model-in1> <model-in2> ... <model-inN> <valid-examples-in> <model-out>
e.g.:
 nnet-combine 1.1.nnet 1.2.nnet 1.3.nnet ark:valid.egs 2.nnet
Caution: the first input neural net must not be a gradient. 
nnet-am-average
 This program averages (or sums, if --sum=true) the parameters over a
number of neural nets.  If you supply the option --skip-last-layer=true,
the parameters of the last updatable layer are copied from <model1> instead
of being averaged (useful in multi-language scenarios).
The --weights option can be used to weight each model differently.
Usage:  nnet-am-average [options] <model1> <model2> ... <modelN> <model-out>
e.g.:
 nnet-am-average 1.1.nnet 1.2.nnet 1.3.nnet 2.nnet 
nnet-am-compute
 Does the neural net computation for each file of input features, and
outputs the result as a matrix.  Used mostly for debugging.
Note: if you want it to apply a log (e.g. for log-likelihoods), use
--apply-log=true
Usage:  nnet-am-compute [options] <model-in> <feature-rspecifier> <feature-or-loglikes-wspecifier>
See also: nnet-compute, nnet-logprob 
nnet-am-mixup
 Add mixture-components to a neural net (comparable to mixtures in a Gaussian
mixture model).  Number of mixture components must be greater than the number
of pdfs
Usage:  nnet-am-mixup [options] <nnet-in> <nnet-out>
e.g.:
 nnet-am-mixup --power=0.3 --num-mixtures=5000 1.mdl 2.mdl 
nnet-get-egs
 Get frame-by-frame examples of data for neural network training.
Essentially this is a format change from features and posteriors
into a special frame-by-frame format.  To split randomly into
different subsets, do nnet-copy-egs with --random=true, but
note that this does not randomize the order of frames.
Usage:  nnet-get-egs [options] <features-rspecifier> <pdf-post-rspecifier> <training-examples-out>
An example [where $feats expands to the actual features]:
nnet-get-egs --left-context=8 --right-context=8 "$feats" \
  "ark:gunzip -c exp/nnet/ali.1.gz | ali-to-pdf exp/nnet/1.nnet ark:- ark:- | ali-to-post ark:- ark:- |" \
   ark:- 
Note: the --left-context and --right-context would be derived from
the output of nnet-info. 
nnet-train-parallel
 Train the neural network parameters with backprop and stochastic
gradient descent using minibatches.  As nnet-train-simple, but
uses multiple threads in a Hogwild type of update (for CPU, not GPU).
Usage:  nnet-train-parallel [options] <model-in> <training-examples-in> <model-out>
e.g.:
nnet-train-parallel --num-threads=8 1.nnet ark:1.1.egs 2.nnet 
nnet-combine-fast
 Using a validation set, compute an optimal combination of a number of
neural nets (the combination weights are separate for each layer and
do not have to sum to one).  The optimization is BFGS, which is initialized
from the best of the individual input neural nets (or as specified by
--initial-model)
Usage:  nnet-combine-fast [options] <model-in1> <model-in2> ... <model-inN> <valid-examples-in> <model-out>
e.g.:
 nnet-combine-fast 1.1.nnet 1.2.nnet 1.3.nnet ark:valid.egs 2.nnet
Caution: the first input neural net must not be a gradient. 
nnet-subset-egs
 Creates a random subset of the input examples, of a specified size.
Uses no more memory than the size of the subset.
Usage:  nnet-subset-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.
nnet-copy-egs [args] ark:- | nnet-subset-egs --n=1000 ark:- ark:subset.egs 
nnet-shuffle-egs
 Copy examples (typically single frames) for neural network training,
from the input to output, but randomly shuffle the order.  This program will keep
all of the examples in memory at once, unless you use the --buffer-size option
Usage:  nnet-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier>
nnet-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs 
nnet-am-fix
 Copy a (cpu-based) neural net and its associated transition model,
but modify it to remove certain pathologies.  We use the average
derivative statistics stored with the layers derived from
NonlinearComponent.  Note: some processes, such as nnet-combine-fast,
may not process these statistics correctly, and you may have to recover
them using the --stats-from option of nnet-am-copy before you use
this program.
Usage:  nnet-am-fix [options] <nnet-in> <nnet-out>
e.g.:
 nnet-am-fix 1.mdl 1_fixed.mdl
or:
 nnet-am-fix --get-counts-from=1.gradient 1.mdl 1_shrunk.mdl 
nnet-latgen-faster-parallel
 Generate lattices using neural net model.
Usage: nnet-latgen-faster-parallel [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] 
nnet-to-raw-nnet
 Copy a (cpu-based) neural net: reads the AmNnet with its transition model, but
writes just the Nnet with no transition model (i.e. the raw neural net.)
Usage:  nnet-to-raw-nnet [options] <nnet-in> <raw-nnet-out>
e.g.:
 nnet-to-raw-nnet --binary=false 1.mdl 1.raw 
nnet-compute
 Does the neural net computation for each file of input features, and
outputs the result as a matrix.  Used mostly for debugging.
Note: if you want it to apply a log (e.g. for log-likelihoods), use
--apply-log=true.  Unlike nnet-am-compute, this version reads a 'raw'
neural net
Usage:  nnet-compute [options] <raw-nnet-in> <feature-rspecifier> <feature-or-loglikes-wspecifier> 
raw-nnet-concat
 Concatenate two 'raw' neural nets, e.g. as output by nnet-init or
nnet-to-raw-nnet
Usage:  raw-nnet-concat [options] <raw-nnet-in1> <raw-nnet-in2> <raw-nnet-out>
e.g.:
 raw-nnet-concat nnet1 nnet2 nnet_concat 
raw-nnet-info
 Print human-readable information about the raw neural network
to the standard output
Usage:  raw-nnet-info [options] <nnet-in>
e.g.:
 raw-nnet-info 1.nnet 
nnet-get-feature-transform
 Get feature-projection transform using stats obtained with acc-lda.
See comments in the code of nnet2/get-feature-transform.h for more
information.
Usage:  nnet-get-feature-transform [options] <matrix-out> <lda-acc-1> <lda-acc-2> ... 
nnet-compute-from-egs
 Does the neural net computation, taking as input the nnet-training examples
(typically an archive with the extension .egs), ignoring the labels; it
outputs the result as a matrix.  Used mostly for debugging.
Usage:  nnet-compute-from-egs [options] <raw-nnet-in> <egs-rspecifier> <feature-wspecifier>
e.g.:  nnet-compute-from-egs 'nnet-to-raw-nnet final.mdl -|' egs.10.1.ark ark:- 
nnet-am-widen
 Copy a (cpu-based) neural net and its associated transition model,
increasing the dimension of the hidden layers to the value given by
the --hidden-layer-dim option.
Usage:  nnet-am-widen [options] <nnet-in> <nnet-out>
e.g.:
 nnet-am-widen --hidden-layer-dim=1024 1.mdl 2.mdl 
nnet-show-progress
 Given an old and a new model and some training examples (possibly held-out),
show the average objective function given the mean of the two models,
and the breakdown by component of why this happened (computed from
derivative information).  Also shows parameter differences per layer.
If training examples are not provided, it only shows the parameter differences
per layer.
Usage:  nnet-show-progress [options] <old-model-in> <new-model-in> [<training-examples-in>]
e.g.: nnet-show-progress 1.nnet 2.nnet ark:valid.egs 
nnet-get-feature-transform-multi
 Get feature-projection transform using stats obtained with acc-lda.
The file <index-list> contains a series of lines, each containing a list
of integer indexes.  For each line we create a transform of the same type
as nnet-get-feature-transform would produce, taking as input just the
listed feature dimensions.  The output transform will be the concatenation
of all these transforms.  The output-dim will be the number of integers in
the file <index-list> (the individual transforms are not dimension-reducing).
Do not set the --dim option.
Usage:  nnet-get-feature-transform-multi [options] <index-list> <lda-acc-1> <lda-acc-2> ... <lda-acc-n> <matrix-out> 
nnet-copy-egs-discriminative
 Copy examples for discriminative neural
network training.  Supports multiple wspecifiers, in
which case it will write the examples round-robin to the outputs.
Usage:  nnet-copy-egs-discriminative [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...]
e.g.
nnet-copy-egs-discriminative ark:train.degs ark,t:text.degs
or:
nnet-copy-egs-discriminative ark:train.degs ark:1.degs ark:2.degs 
nnet-get-egs-discriminative
 Get examples of data for discriminative neural network training;
each one corresponds to part of a file, of variable (and configurable)
length.
Usage:  nnet-get-egs-discriminative [options] <model> <features-rspecifier> <ali-rspecifier> <den-lat-rspecifier> <training-examples-out>
An example [where $feats expands to the actual features]:
nnet-get-egs-discriminative --acoustic-scale=0.1 \
  1.mdl '$feats' 'ark,s,cs:gunzip -c ali.1.gz|' 'ark,s,cs:gunzip -c lat.1.gz|' ark:1.degs 
nnet-shuffle-egs-discriminative
 Copy examples (typically single frames) for neural network training,
from the input to output, but randomly shuffle the order.  This program will keep
all of the examples in memory at once, so don't give it too many.
Usage:  nnet-shuffle-egs-discriminative [options] <egs-rspecifier> <egs-wspecifier>
nnet-shuffle-egs-discriminative --srand=1 ark:train.degs ark:shuffled.degs 
nnet-compare-hash-discriminative
 Compares two archives of discriminative training examples and checks
that they behave the same way for purposes of discriminative training.
This program was created as a way of testing nnet-get-egs-discriminative
The model is only needed for its transition-model.
Usage:  nnet-compare-hash-discriminative [options] <model-rxfilename> <egs-rspecifier1> <egs-rspecifier2>
Note: options --drop-frames and --criterion should be matched with the
command line of nnet-get-egs-discriminative used to get the examples, e.g.:
nnet-compare-hash-discriminative --drop-frames=true --criterion=mmi ark:1.degs ark:2.degs 
nnet-combine-egs-discriminative
 Copy examples for discriminative neural network training,
and combine successive examples if their combined length will
be less than --max-length.  This can help to improve efficiency
(--max-length corresponds to minibatch size)
Usage:  nnet-combine-egs-discriminative [options] <egs-rspecifier> <egs-wspecifier>
e.g.
nnet-combine-egs-discriminative --max-length=512 ark:temp.1.degs ark:1.degs 
nnet-train-discriminative-simple
 Train the neural network parameters with a discriminative objective
function (MMI, SMBR or MPFE).  This uses training examples prepared with
nnet-get-egs-discriminative
Usage:  nnet-train-discriminative-simple [options] <model-in> <training-examples-in> <model-out>
e.g.:
nnet-train-discriminative-simple 1.nnet ark:1.degs 2.nnet 
nnet-train-discriminative-parallel
 Train the neural network parameters with a discriminative objective
function (MMI, SMBR or MPFE).  This uses training examples prepared with
nnet-get-egs-discriminative
This version uses multiple threads (but no GPU)
Usage:  nnet-train-discriminative-parallel [options] <model-in> <training-examples-in> <model-out>
e.g.:
nnet-train-discriminative-parallel --num-threads=8 1.nnet ark:1.degs 2.nnet 
nnet-modify-learning-rates
 This program modifies the learning rates so as to equalize the
relative changes in parameters for each layer, while keeping their
geometric mean the same (or changing it to a value specified using
the --average-learning-rate option).
Usage: nnet-modify-learning-rates [options] <prev-model> \
                                  <cur-model> <modified-cur-model>
e.g.: nnet-modify-learning-rates --average-learning-rate=0.0002 \
                                 5.mdl 6.mdl 6.mdl 
nnet-normalize-stddev
 This program first identifies any affine or block-affine layers that
are followed by pnorm and then normalization layers. It then rescales
those layers such that the parameter stddev is 1.0 after scaling
(the target stddev is configurable by the --stddev option).
If you supply the option --stddev-from=<model-filename>, it rescales
those layers to match the standard deviations of corresponding layers
in the specified model.
Usage: nnet-normalize-stddev [options] <model-in> <model-out>
 e.g.: nnet-normalize-stddev final.mdl final.mdl 
nnet-get-weighted-egs
 Get frame-by-frame examples of data for neural network training.
Essentially this is a format change from features and posteriors
into a special frame-by-frame format.  To split randomly into
different subsets, do nnet-copy-egs with --random=true, but
note that this does not randomize the order of frames.
Usage:  nnet-get-weighted-egs [options] <features-rspecifier> <pdf-post-rspecifier> <weights-rspecifier> <training-examples-out>
An example [where $feats expands to the actual features]:
nnet-get-weighted-egs --left-context=8 --right-context=8 "$feats" \
  "ark:gunzip -c exp/nnet/ali.1.gz | ali-to-pdf exp/nnet/1.nnet ark:- ark:- | ali-to-post ark:- ark:- |" \
   ark:- 
Note: the --left-context and --right-context would be derived from
the output of nnet-info. 
nnet-adjust-priors
 Set the priors of the neural net to the computed posteriors from the net,
on typical data (e.g. training data). This is correct under more general
circumstances than using the priors of the class labels in the training data.
Typical usage of this program will involve computation of an average pdf-level
posterior with nnet-compute or nnet-compute-from-egs, piped into matrix-sum-rows
and then vector-sum, to compute the average posterior
Usage: nnet-adjust-priors [options] <nnet-in> <summed-posterior-vector-in> <nnet-out>
e.g.:
 nnet-adjust-priors final.mdl prior.vec final.mdl 
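The pipeline described above might look roughly like the following sketch (file names are illustrative, and this is not a tested recipe):

```shell
# Sketch: average the net's pdf-level posteriors over typical data,
# then set the model's priors from that average.
nnet-compute-from-egs 'nnet-to-raw-nnet final.mdl -|' ark:train.egs ark:- | \
  matrix-sum-rows ark:- ark:- | vector-sum ark:- prior.vec
nnet-adjust-priors final.mdl prior.vec final_adj.mdl
```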
nnet-replace-last-layers
 This program is for adding new layers to a neural-network acoustic model.
It removes the last --remove-layers layers, and adds the layers from the
supplied raw-nnet.  The typical use is to remove the last two layers
(the softmax, and the affine component before it), and add in replacements
for them newly initialized by nnet-init.  This program is a more flexible
way of adding layers than nnet-insert, but the inserted network needs to
contain replacements for the removed layers.
Usage:  nnet-replace-last-layers [options] <nnet-in> <raw-nnet-to-insert-in> <nnet-out>
e.g.:
 nnet-replace-last-layers 1.nnet "nnet-init hidden_layer.config -|" 2.nnet 
nnet-am-switch-preconditioning
 Copy a (cpu-based) neural net and its associated transition model,
and switch it to online preconditioning, i.e. change any components
derived from AffineComponent to components of type
AffineComponentPreconditionedOnline.
Usage:  nnet-am-switch-preconditioning [options] <nnet-in> <nnet-out>
e.g.:
 nnet-am-switch-preconditioning --binary=false 1.mdl text.mdl 
nnet1-to-raw-nnet
 Convert an nnet1 neural net to an nnet2 'raw' neural net
Usage:  nnet1-to-raw-nnet [options] <nnet1-in> <nnet2-out>
e.g.:
 nnet1-to-raw-nnet srcdir/final.nnet - | nnet-am-init dest/tree dest/topo - dest/0.mdl 
raw-nnet-copy
 Copy a raw neural net (this version works on raw nnet2 neural nets,
without the transition model).  Supports the 'truncate' option.
Usage:  raw-nnet-copy [options] <raw-nnet-in> <raw-nnet-out>
e.g.:
 raw-nnet-copy --binary=false 1.mdl text.mdl
See also: nnet-to-raw-nnet, nnet-am-copy 
nnet-relabel-egs
 Relabel neural network egs with the pdf-id alignments read in (zero-based).
Usage: nnet-relabel-egs [options] <pdf-aligment-rspecifier> <egs_rspecifier1> ... <egs_rspecifierN> <egs_wspecifier1> ... <egs_wspecifierN>
e.g.: 
 nnet-relabel-egs ark:1.ali egs_in/egs.1.ark egs_in/egs.2.ark egs_out/egs.1.ark egs_out/egs.2.ark
See also: nnet-get-egs, nnet-copy-egs, steps/nnet2/relabel_egs.sh 
nnet-am-reinitialize
 This program can be used when transferring a neural net from one language
to another (or one tree to another).  It takes a neural net and a
transition model from a different neural net, resizes the last layer
to match the new transition model, zeroes it, and writes out the new,
resized .mdl file.  If the original model had been 'mixed-up', the associated
SumGroupComponent will be removed.
Usage:  nnet-am-reinitialize [options] <nnet-in> <new-transition-model> <nnet-out>
e.g.:
 nnet-am-reinitialize 1.mdl exp/tri6/final.mdl 2.mdl 
nnet3-init
 Initialize nnet3 neural network from a config file; outputs 'raw' nnet
without associated information such as transition model and priors.
Search for examples in scripts in egs/wsj/s5/steps/nnet3/.
Can also be used to add layers to an existing model (provide the existing
model as the 1st arg).
Usage:  nnet3-init [options] [<existing-model-in>] <config-in> <raw-nnet-out>
e.g.:
 nnet3-init nnet.config 0.raw
or: nnet3-init 1.raw nnet.config 2.raw
See also: nnet3-copy, nnet3-info 
nnet3-info
 Print some text information about a 'raw' nnet3 neural network, to the
standard output
Usage:  nnet3-info [options] <raw-nnet>
e.g.:
 nnet3-info 0.raw
See also: nnet3-am-info 
nnet3-get-egs
 Get frame-by-frame examples of data for nnet3 neural network training.
Essentially this is a format change from features and posteriors
into a special frame-by-frame format.  This program handles the
common case where you have some input features, possibly some
iVectors, and one set of labels.  If people in future want to
do different things they may have to extend this program or create
different versions of it for different tasks (the egs format is quite
general).
Usage:  nnet3-get-egs [options] <features-rspecifier> <pdf-post-rspecifier> <egs-out>
An example [where $feats expands to the actual features]:
nnet3-get-egs --num-pdfs=2658 --left-context=12 --right-context=9 --num-frames=8 "$feats"\
"ark:gunzip -c exp/nnet/ali.1.gz | ali-to-pdf exp/nnet/1.nnet ark:- ark:- | ali-to-post ark:- ark:- |" \
   ark:- 
See also: nnet3-chain-get-egs, nnet3-get-egs-simple 
nnet3-copy-egs
 Copy examples (single frames or fixed-size groups of frames) for neural
network training, possibly changing the binary mode.  Supports multiple wspecifiers, in
which case it will write the examples round-robin to the outputs.
Usage:  nnet3-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...]
e.g.
nnet3-copy-egs ark:train.egs ark,t:text.egs
or:
nnet3-copy-egs ark:train.egs ark:1.egs ark:2.egs
See also: nnet3-subset-egs, nnet3-get-egs, nnet3-merge-egs, nnet3-shuffle-egs 
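The round-robin behaviour with multiple wspecifiers can be sketched as follows (an illustrative Python sketch of the distribution scheme, not Kaldi code; the function name is invented):

```python
# Illustrative sketch of round-robin distribution of examples across
# several outputs, as nnet3-copy-egs does when given multiple wspecifiers.
def round_robin(examples, num_outputs):
    """Distribute examples over num_outputs lists, round-robin."""
    outputs = [[] for _ in range(num_outputs)]
    for i, eg in enumerate(examples):
        outputs[i % num_outputs].append(eg)
    return outputs
```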
nnet3-subset-egs
 Creates a random subset of the input examples, of a specified size.
Uses no more memory than the size of the subset.
Usage:  nnet3-subset-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.:
nnet3-copy-egs [args] ark:egs.1.ark ark:- | nnet3-subset-egs --n=1000 ark:- ark:subset.egs 
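The stated memory bound ("no more memory than the size of the subset") is exactly what reservoir sampling provides; the sketch below shows that technique as a plausible reading of the behaviour, not code taken from the Kaldi source:

```python
import random

def reservoir_subset(stream, n, seed=0):
    """Uniform random subset of size n, holding only n items in memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, eg in enumerate(stream):
        if i < n:
            reservoir.append(eg)      # fill the reservoir first
        else:
            j = rng.randint(0, i)     # inclusive bound; keep w.p. n/(i+1)
            if j < n:
                reservoir[j] = eg
    return reservoir
```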
nnet3-shuffle-egs
 Copy examples (typically single frames or small groups of frames) for
neural network training, from the input to output, but randomly shuffle the order.
This program will keep all of the examples in memory at once, unless you
use the --buffer-size option.
Usage:  nnet3-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.:
nnet3-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs 
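The --buffer-size option suggests a bounded-buffer shuffle: keep a buffer of that many examples and, for each new example, emit a randomly chosen buffered one. A sketch of that idea (assumed behaviour, illustrative only):

```python
import random

def buffered_shuffle(stream, buffer_size, seed=0):
    """Approximately shuffle a stream, holding at most buffer_size items."""
    rng = random.Random(seed)
    buf = []
    for eg in stream:
        if len(buf) < buffer_size:
            buf.append(eg)
        else:
            j = rng.randrange(buffer_size)  # emit a random buffered item
            yield buf[j]
            buf[j] = eg                     # replace it with the new one
    rng.shuffle(buf)                        # flush the remainder, shuffled
    yield from buf
```

With an unbounded buffer this degenerates to the default behaviour of keeping everything in memory and shuffling once.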
nnet3-acc-lda-stats
 Accumulate statistics in the same format as acc-lda (i.e. stats for
estimation of LDA and similar types of transform), starting from nnet
training examples.  This program puts the features through the network,
and the network output will be the features; the supervision in the
training examples is used for the class labels.  Used in obtaining
feature transforms that help nnet training work better.
Usage:  nnet3-acc-lda-stats [options] <raw-nnet-in> <training-examples-in> <lda-stats-out>
e.g.:
nnet3-acc-lda-stats 0.raw ark:1.egs 1.acc
See also: nnet-get-feature-transform 
nnet3-merge-egs
 This copies nnet training examples from input to output, but while doing so it
merges many NnetExample objects into one, forming a minibatch consisting of a
single NnetExample.
Usage:  nnet3-merge-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.
nnet3-merge-egs --minibatch-size=512 ark:1.egs ark:- | nnet3-train-simple ... 
See also: nnet3-copy-egs 
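Merging many single examples into one minibatch example amounts to grouping the stream into fixed-size chunks; a minimal sketch (illustrative only, function name invented):

```python
def merge_into_minibatches(examples, minibatch_size):
    """Group a list of examples into minibatches of minibatch_size;
    the final minibatch may be smaller."""
    return [examples[i:i + minibatch_size]
            for i in range(0, len(examples), minibatch_size)]
```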
nnet3-compute-from-egs
 Read input nnet training examples, and compute the output for each one.
If --apply-exp=true, apply the Exp() function to the output before writing
it out.
Usage:  nnet3-compute-from-egs [options] <raw-nnet-in> <training-examples-in> <matrices-out>
e.g.:
nnet3-compute-from-egs --apply-exp=true 0.raw ark:1.egs ark:- | matrix-sum-rows ark:- ... 
See also: nnet3-compute 
nnet3-train
 Train nnet3 neural network parameters with backprop and stochastic
gradient descent.  Minibatches are to be created by nnet3-merge-egs in
the input pipeline.  This training program is single-threaded (best to
use it with a GPU); see nnet3-train-parallel for multi-threaded training
that is better suited to CPUs.
Usage:  nnet3-train [options] <raw-model-in> <training-examples-in> <raw-model-out>
e.g.:
nnet3-train 1.raw 'ark:nnet3-merge-egs 1.egs ark:-|' 2.raw 
nnet3-am-init
 Initialize nnet3 am-nnet (i.e. neural network-based acoustic model, with
associated transition model) from an existing transition model and nnet.
Search for examples in scripts in /egs/wsj/s5/steps/nnet3/
Set priors using nnet3-am-train-transitions or nnet3-am-adjust-priors
Usage:  nnet3-am-init [options] <tree-in> <topology-in> <input-raw-nnet> <output-am-nnet>
  or:  nnet3-am-init [options] <trans-model-in> <input-raw-nnet> <output-am-nnet>
e.g.:
 nnet3-am-init tree topo 0.raw 0.mdl
See also: nnet3-init, nnet3-am-copy, nnet3-am-info, nnet3-am-train-transitions,
 nnet3-am-adjust-priors 
nnet3-am-train-transitions
 Train the transition probabilities of an nnet3 neural network acoustic model
Usage:  nnet3-am-train-transitions [options] <nnet-in> <alignments-rspecifier> <nnet-out>
e.g.:
 nnet3-am-train-transitions 1.nnet "ark:gunzip -c ali.*.gz|" 2.nnet 
nnet3-am-adjust-priors
 Set the priors of the nnet3 neural net to the computed posteriors from the net,
on typical data (e.g. training data).  This is correct under more general
circumstances than using the priors of the class labels in the training data.
Typical usage of this program will involve computation of an average pdf-level
posterior with nnet3-compute or nnet3-compute-from-egs, piped into matrix-sum-rows
and then vector-sum, to compute the average posterior.
Usage: nnet3-am-adjust-priors [options] <nnet-in> <summed-posterior-vector-in> <nnet-out>
e.g.:
 nnet3-am-adjust-priors final.mdl counts.vec final.mdl 
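The "average pdf-level posterior" fed to this program is just the per-class mean of the network's posterior outputs over typical data; a sketch of that computation (illustrative Python, not Kaldi code):

```python
def average_posterior(posterior_rows):
    """Average per-frame posterior vectors into a single prior vector.
    Each row sums to 1, so the average does too."""
    n = len(posterior_rows)
    dim = len(posterior_rows[0])
    return [sum(row[k] for row in posterior_rows) / n for k in range(dim)]
```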
nnet3-am-copy
 Copy nnet3 neural-net acoustic model file; supports conversion
to raw model (--raw=true).
Also supports setting all learning rates to a supplied
value (the --learning-rate option),
and supports replacing the raw nnet in the model (the Nnet)
with a provided raw nnet (the --set-raw-nnet option)
Usage:  nnet3-am-copy [options] <nnet-in> <nnet-out>
e.g.:
 nnet3-am-copy --binary=false 1.mdl text.mdl
 nnet3-am-copy --raw=true 1.mdl 1.raw 
nnet3-compute-prob
 Computes and prints in logging messages the average log-prob per frame of
the given data with an nnet3 neural net.  The input of this is the output of
e.g. nnet3-get-egs | nnet3-merge-egs.
Usage:  nnet3-compute-prob [options] <raw-model-in> <training-examples-in>
e.g.: nnet3-compute-prob 0.raw ark:valid.egs 
nnet3-average
 This program averages the parameters over a number of 'raw' nnet3 neural nets.
Usage:  nnet3-average [options] <model1> <model2> ... <modelN> <model-out>
e.g.:
 nnet3-average 1.1.nnet 1.2.nnet 1.3.nnet 2.nnet 
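Averaging the parameters of several nets is element-wise; treating each model as a flat parameter vector, the operation looks like this (illustrative sketch, not Kaldi code):

```python
def average_models(param_vectors):
    """Element-wise mean of equally sized parameter vectors."""
    n = len(param_vectors)
    return [sum(v[k] for v in param_vectors) / n
            for k in range(len(param_vectors[0]))]
```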
nnet3-am-info
 Print some text information about an nnet3 neural network, to
standard output
Usage:  nnet3-am-info [options] <nnet>
e.g.:
 nnet3-am-info 0.mdl
See also: nnet3-info 
nnet3-combine
 Using a subset of training or held-out examples, compute the average
over the first n nnet3 models, where n is chosen to maximize the objective
function.  Note that the order of models is reversed before being fed into
this binary, so it actually combines the last n models.
Inputs and outputs are 'raw' nnets.
Usage:  nnet3-combine [options] <nnet-in1> <nnet-in2> ... <nnet-inN> <valid-examples-in> <nnet-out>
e.g.:
 nnet3-combine 1.1.raw 1.2.raw 1.3.raw ark:valid.egs 2.raw 
nnet3-latgen-faster
 Generate lattices using nnet3 neural net model.
Usage: nnet3-latgen-faster [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ]
See also: nnet3-latgen-faster-parallel, nnet3-latgen-faster-batch 
nnet3-latgen-faster-parallel
 Generate lattices using nnet3 neural net model.  This version supports
multiple decoding threads (using a shared decoding graph).
Usage: nnet3-latgen-faster-parallel [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ]
See also: nnet3-latgen-faster-batch (which supports GPUs) 
nnet3-show-progress
 Given an old and a new 'raw' nnet3 network and some training examples
(possibly held-out), show the average objective function given the
mean of the two networks, and the breakdown by component of why this
happened (computed from derivative information). Also shows parameter
differences per layer.  If training examples are not provided, it only shows
the parameter differences per layer.
Usage:  nnet3-show-progress [options] <old-net-in> <new-net-in> [<training-examples-in>]
e.g.: nnet3-show-progress 1.nnet 2.nnet ark:valid.egs 
nnet3-align-compiled
 Align features given nnet3 neural net model
Usage:   nnet3-align-compiled [options] <nnet-in> <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
e.g.: 
 nnet3-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali
or:
 compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' \
   ark:- | nnet3-align-compiled 1.mdl ark:- scp:train.scp ark:1.ali 
nnet3-copy
 Copy 'raw' nnet3 neural network to standard output
Also supports setting all the learning rates to a value
(the --learning-rate option)
Usage:  nnet3-copy [options] <nnet-in> <nnet-out>
e.g.:
 nnet3-copy --binary=false 0.raw text.raw 
nnet3-get-egs-dense-targets
 Get frame-by-frame examples of data for nnet3 neural network training.
This program is similar to nnet3-get-egs, but the targets here are dense
matrices instead of posteriors (sparse matrices).  This is useful when you
want the targets to be continuous real-valued, with the neural network
possibly trained with a quadratic objective.
Usage:  nnet3-get-egs-dense-targets --num-targets=<n> [options] <features-rspecifier> <targets-rspecifier> <egs-out>
An example [where $feats expands to the actual features]:
nnet3-get-egs-dense-targets --num-targets=26 --left-context=12 \
--right-context=9 --num-frames=8 "$feats" \
"ark:copy-matrix ark:exp/snrs/snr.1.ark ark:- |" \
   ark:-  
nnet3-compute
 Propagate the features through a raw neural network model and write the output.
If --apply-exp=true, apply the Exp() function to the output before writing it out.
Usage: nnet3-compute [options] <nnet-in> <features-rspecifier> <matrix-wspecifier>
 e.g.: nnet3-compute final.raw scp:feats.scp ark:nnet_prediction.ark
See also: nnet3-compute-from-egs, nnet3-chain-compute-post
Note: this program does not currently make very efficient use of the GPU. 
nnet3-discriminative-get-egs
 Get frame-by-frame examples of data for nnet3+sequence neural network
training.  This involves breaking up utterances into pieces of sizes
determined by the --num-frames option.
Usage:  nnet3-discriminative-get-egs [options] <model> <features-rspecifier> <denominator-lattice-rspecifier> <numerator-alignment-rspecifier> <egs-wspecifier>
An example [where $feats expands to the actual features]:
  nnet3-discriminative-get-egs --left-context=25 --right-context=9 --num-frames=150,100,90 \
  "$feats" "ark,s,cs:gunzip -c lat.1.gz|" scp:ali.scp ark:degs.1.ark 
nnet3-discriminative-copy-egs
 Copy examples for nnet3 discriminative training, possibly changing the binary mode.
Supports multiple wspecifiers, in which case it will write the examples
round-robin to the outputs.
Usage:  nnet3-discriminative-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...]
e.g.
nnet3-discriminative-copy-egs ark:train.degs ark,t:text.degs
or:
nnet3-discriminative-copy-egs ark:train.degs ark:1.degs ark:2.degs 
nnet3-discriminative-merge-egs
 This copies nnet3 discriminative training examples from input to output, merging them
into composite examples.  The --minibatch-size option controls how many egs
are merged into a single output eg.
Usage:  nnet3-discriminative-merge-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.
nnet3-discriminative-merge-egs --minibatch-size=128 ark:1.degs ark:- | nnet3-discriminative-train ... 
See also: nnet3-discriminative-copy-egs 
nnet3-discriminative-shuffle-egs
 Copy nnet3 discriminative training examples from the input to output,
while randomly shuffling the order.  This program will keep all of the examples
in memory at once, unless you use the --buffer-size option.
Usage:  nnet3-discriminative-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier>
e.g.:
nnet3-discriminative-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs 
nnet3-discriminative-compute-objf
 Computes and prints in logging messages the objective function per frame of
the given data with an nnet3 neural net.  The input of this is the output of
e.g. nnet3-discriminative-get-egs | nnet3-discriminative-merge-egs.
Usage:  nnet3-discriminative-compute-objf [options] <nnet3-model-in> <training-examples-in>
e.g.: nnet3-discriminative-compute-objf 0.mdl ark:valid.degs 
nnet3-discriminative-train
 Train nnet3 neural network parameters with a discriminative sequence objective,
using stochastic gradient descent.  Minibatches are to be created by nnet3-discriminative-merge-egs in
the input pipeline.  This training program is single-threaded (best to
use it with a GPU).
Usage:  nnet3-discriminative-train [options] <nnet-in> <discriminative-training-examples-in> <raw-nnet-out>
e.g.:
nnet3-discriminative-train 1.mdl 'ark:nnet3-discriminative-merge-egs 1.degs ark:-|' 2.raw 
nnet3-discriminative-subset-egs
 Creates a random subset of the input examples, of a specified size.
Uses no more memory than the size of the subset.
Usage:  nnet3-discriminative-subset-egs [options] <degs-rspecifier> <degs-wspecifier>
e.g.:
nnet3-discriminative-copy-egs [args] ark:degs.1.ark ark:- | nnet3-discriminative-subset-egs --n=1000 ark:- ark:subset.egs 
nnet3-get-egs-simple
 Get frame-by-frame examples of data for nnet3 neural network training.
This is like nnet3-get-egs, but does not split up its inputs into pieces
and allows more general generation of egs.  E.g. this is usable for image
recognition tasks.
Usage:  nnet3-get-egs-simple [options] <name1>=<rspecifier1> <name2>=<rspecifier2> ...
e.g.:
nnet3-get-egs-simple input=scp:images.scp \
output='ark,o:ali-to-post ark:labels.txt ark:- | post-to-smat --dim=10 ark:- ark:-' ark:egs.ark
See also: nnet3-get-egs 
nnet3-discriminative-compute-from-egs
 Read input nnet discriminative training examples, and compute the output for each one. This program is similar to nnet3-compute-from-egs, but works with discriminative egs. 
If --apply-exp=true, apply the Exp() function to the output before writing
it out.
Note: This program uses only the input; it does not do forward-backward
over the lattice. See nnet3-discriminative-compute-objf for that.
Usage:  nnet3-discriminative-compute-from-egs [options] <raw-nnet-in> <training-examples-in> <matrices-out>
e.g.:
nnet3-discriminative-compute-from-egs --apply-exp=true 0.raw ark:1.degs ark:- | matrix-sum-rows ark:- ... 
See also: nnet3-compute, nnet3-compute-from-egs 
nnet3-latgen-faster-looped
 Generate lattices using nnet3 neural net model.
[This version uses the 'looped' computation, which may be slightly faster for
many architectures, but should not be used for backwards-recurrent architectures
such as BLSTMs.]
Usage: nnet3-latgen-faster-looped [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] 
nnet3-egs-augment-image
 Copy examples (single frames or fixed-size groups of frames) for neural
network training, doing image augmentation inline (each image is copied
after possibly being modified, with the modifications chosen randomly
according to configuration parameters).
E.g.:
  nnet3-egs-augment-image --horizontal-flip-prob=0.5 --horizontal-shift=0.1\
       --vertical-shift=0.1 --srand=103 --num-channels=3 --fill-mode=nearest ark:- ark:-
Requires that each eg contain a NnetIo object 'input', with successive
't' values representing different x offsets, and the feature dimension
representing the y offset and the channel (color), with the channel
varying the fastest.
See also: nnet3-copy-egs 
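Since successive 't' values represent x offsets, a horizontal flip is just a reversal of the frame order; a sketch of the --horizontal-flip-prob behaviour (assumed semantics, illustrative only):

```python
import random

def maybe_hflip(frames, flip_prob, rng=None):
    """With probability flip_prob, reverse the frame (x-offset) order,
    which corresponds to a horizontal flip of the image."""
    rng = rng or random.Random(0)
    if rng.random() < flip_prob:
        return frames[::-1]
    return frames
```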
nnet3-xvector-get-egs
 Get examples for training an nnet3 neural network for the xvector
system.  Each output example contains a chunk of features from some
utterance along with a speaker label.  The location and length of
the feature chunks are specified in the 'ranges' file.  Each line
is interpreted as follows:
  <source-utterance> <relative-output-archive-index> <absolute-archive-index> <start-frame-index> <num-frames> <speaker-label>
where <relative-output-archive-index> is interpreted as a zero-based
index into the wspecifiers provided on the command line (<egs-0-out>
and so on), and <absolute-archive-index> is ignored by this program.
For example:
  utt1  3  13  65  300  3
  utt1  0  10  50  400  3
  utt2  ...
Usage:  nnet3-xvector-get-egs [options] <ranges-filename> <features-rspecifier> <egs-0-out> <egs-1-out> ... <egs-N-1-out>
For example:
nnet3-xvector-get-egs ranges.1 "$feats" ark:egs_temp.1.ark  ark:egs_temp.2.ark ark:egs_temp.3.ark 
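Each line of the 'ranges' file follows the six-field layout described above and can be parsed accordingly; a sketch (illustrative Python, field names invented):

```python
def parse_ranges_line(line):
    """Parse one line of the xvector 'ranges' file."""
    utt, rel_idx, abs_idx, start, num_frames, spk = line.split()
    return {
        "utt": utt,
        "output_archive": int(rel_idx),  # zero-based wspecifier index
        # abs_idx (<absolute-archive-index>) is ignored by the program
        "start_frame": int(start),
        "num_frames": int(num_frames),
        "speaker": int(spk),
    }
```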
nnet3-xvector-compute
 Propagate features through an xvector neural network model and write
the output vectors.  "Xvector" is our term for a vector or
embedding which is the output of a particular type of neural network
architecture found in speaker recognition.  This architecture
consists of several layers that operate on frames, a statistics
pooling layer that aggregates over the frame-level representations
and possibly additional layers that operate on segment-level
representations.  The xvectors are generally extracted from an
output layer after the statistics pooling layer.  By default, one
xvector is extracted directly from the set of features for each
utterance.  Optionally, xvectors are extracted from chunks of input
features and averaged, to produce a single vector.
Usage: nnet3-xvector-compute [options] <raw-nnet-in> <features-rspecifier> <vector-wspecifier>
e.g.: nnet3-xvector-compute final.raw scp:feats.scp ark:nnet_prediction.ark
See also: nnet3-compute 
nnet3-xvector-compute-batched
 Propagate features through an xvector neural network model and write
the output vectors.  "Xvector" is our term for a vector or
embedding which is the output of a particular type of neural network
architecture found in speaker recognition.  This architecture
consists of several layers that operate on frames, a statistics
pooling layer that aggregates over the frame-level representations
and possibly additional layers that operate on segment-level
representations.  The xvectors are generally extracted from an
output layer after the statistics pooling layer.  By default, one
xvector is extracted directly from the set of features for each
utterance.  Optionally, xvectors are extracted from chunks of input
features and averaged, to produce a single vector.
Usage: nnet3-xvector-compute-batched [options] <raw-nnet-in> <features-rspecifier> <vector-wspecifier>
e.g.: nnet3-xvector-compute-batched final.raw scp:feats.scp ark:nnet_prediction.ark
See also: nnet3-compute 
nnet3-latgen-grammar
 Generate lattices using an nnet3 neural net model and a GrammarFst-based graph;
see kaldi-asr.org/doc/grammar.html for more context.
Usage: nnet3-latgen-grammar [options] <nnet-in> <grammar-fst-in> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] 
nnet3-compute-batch
 Propagate the features through a raw neural network model and write the
output.  This version is optimized for GPU use.  If --apply-exp=true, apply
the Exp() function to the output before writing it out.
Usage: nnet3-compute-batch [options] <nnet-in> <features-rspecifier> <matrix-wspecifier>
 e.g.: nnet3-compute-batch final.raw scp:feats.scp ark:nnet_prediction.ark 
nnet3-latgen-faster-batch
 Generate lattices using nnet3 neural net model.  This version is optimized
for GPU-based inference.
Usage: nnet3-latgen-faster-batch [options] <nnet-in> <fst-in> <features-rspecifier> <lattice-wspecifier> 
cuda-gpu-available
 Test if there is a GPU available, and if the GPU setup is correct.
A GPU is acquired and a small computation is done
(generating a random matrix and computing softmax for its rows).
exit-code: 0 = success, 1 = compiled without GPU support, -1 = error
Usage:  cuda-gpu-available 
cuda-compiled
 This program returns exit status 0 (success) if the code
was compiled with CUDA support, and 1 otherwise.  To support CUDA, you
must run 'configure' on a machine that has the CUDA compiler 'nvcc'
available. 
nnet-train-frmshuff
 Perform one iteration (epoch) of Neural Network training with
mini-batch Stochastic Gradient Descent. The training targets
are usually pdf-posteriors, prepared by ali-to-post.
Usage:  nnet-train-frmshuff [options] <feature-rspecifier> <targets-rspecifier> <model-in> [<model-out>]
e.g.: nnet-train-frmshuff scp:feats.scp ark:posterior.ark nnet.init nnet.iter1 
nnet-train-perutt
 Perform one iteration of NN training by SGD with per-utterance updates.
The training targets are represented as pdf-posteriors, usually prepared by ali-to-post.
Usage: nnet-train-perutt [options] <feature-rspecifier> <targets-rspecifier> <model-in> [<model-out>]
e.g.: nnet-train-perutt scp:feature.scp ark:posterior.ark nnet.init nnet.iter1 
nnet-train-mmi-sequential
 Perform one iteration of MMI training using SGD with per-utterance updates.
Usage:  nnet-train-mmi-sequential [options] <model-in> <transition-model-in> <feature-rspecifier> <den-lat-rspecifier> <ali-rspecifier> [<model-out>]
e.g.: nnet-train-mmi-sequential nnet.init trans.mdl scp:feats.scp scp:denlats.scp ark:ali.ark nnet.iter1 
nnet-train-mpe-sequential
 Perform one iteration of MPE/sMBR training using SGD with per-utterance updates.
Usage:  nnet-train-mpe-sequential [options] <model-in> <transition-model-in> <feature-rspecifier> <den-lat-rspecifier> <ali-rspecifier> [<model-out>]
e.g.: nnet-train-mpe-sequential nnet.init trans.mdl scp:feats.scp scp:denlats.scp ark:ali.ark nnet.iter1 
nnet-train-multistream
 Perform one iteration of Multi-stream training, truncated BPTT for LSTMs.
The training targets are pdf-posteriors, usually prepared by ali-to-post.
The updates are per-utterance.
Usage: nnet-train-multistream [options] <feature-rspecifier> <targets-rspecifier> <model-in> [<model-out>]
e.g.: nnet-train-lstm-streams scp:feature.scp ark:posterior.ark nnet.init nnet.iter1 
nnet-train-multistream-perutt
 Perform one iteration of Multi-stream training, per-utterance BPTT for (B)LSTMs.
The updates are done per-utterance, while several utterances are 
processed at the same time.
Usage: nnet-train-multistream-perutt [options] <feature-rspecifier> <labels-rspecifier> <model-in> [<model-out>]
e.g.: nnet-train-blstm-streams scp:feats.scp ark:targets.ark nnet.init nnet.iter1 
rbm-train-cd1-frmshuff
 Train RBM by Contrastive Divergence alg. with 1 step of Markov Chain Monte-Carlo.
The tool can perform several iterations (--num-iters), or it can subsample the training dataset (--drop-data).
Usage: rbm-train-cd1-frmshuff [options] <model-in> <feature-rspecifier> <model-out>
e.g.: rbm-train-cd1-frmshuff 1.rbm.init scp:train.scp 1.rbm 
rbm-convert-to-nnet
 Convert RBM to <affinetransform> and <sigmoid>
Usage:  rbm-convert-to-nnet [options] <rbm-in> <nnet-out>
e.g.:
 rbm-convert-to-nnet --binary=false rbm.mdl nnet.mdl 
nnet-forward
 Perform forward pass through Neural Network.
Usage: nnet-forward [options] <nnet1-in> <feature-rspecifier> <feature-wspecifier>
e.g.: nnet-forward final.nnet ark:input.ark ark:output.ark 
nnet-copy
 Copy Neural Network model (and possibly change binary/text format)
Usage:  nnet-copy [options] <model-in> <model-out>
e.g.:
 nnet-copy --binary=false nnet.mdl nnet_txt.mdl 
nnet-info
 Print human-readable information about the neural network.
(topology, various weight statistics, etc.) It prints to stdout.
Usage:  nnet-info [options] <nnet-in>
e.g.:
 nnet-info 1.nnet 
nnet-concat
 Concatenate Neural Networks (and possibly change binary/text format)
Usage: nnet-concat [options] <nnet-in1> <...> <nnet-inN> <nnet-out>
e.g.:
 nnet-concat --binary=false nnet.1 nnet.2 nnet.1.2 
transf-to-nnet
 Convert transformation matrix to <affine-transform>
Usage:  transf-to-nnet [options] <transf-in> <nnet-out>
e.g.:
 transf-to-nnet --binary=false transf.mat nnet.mdl 
cmvn-to-nnet
 Convert cmvn-stats into <AddShift> and <Rescale> components.
Usage:  cmvn-to-nnet [options] <transf-in> <nnet-out>
e.g.:
 cmvn-to-nnet --binary=false transf.mat nnet.mdl 
nnet-initialize
 Initialize Neural Network parameters according to a prototype (nnet1).
Usage:  nnet-initialize [options] <nnet-prototype-in> <nnet-out>
e.g.: nnet-initialize --binary=false nnet.proto nnet.init 
feat-to-post
 Convert features into posterior format, which is the generic format 
of NN training targets in Karel's nnet1 tools.
(speed is not an issue for reasonably low NN-output dimensions)
Usage:  feat-to-post [options] feat-rspecifier posteriors-wspecifier
e.g.:
 feat-to-post scp:feats.scp ark:feats.post 
paste-post
 Combine 2 or more streams of NN-training targets into a single stream.
As the posterior streams are pasted, the output dimension is the sum
of the input dimensions.  This is used when training a NN with
multiple softmaxes on its output, as in multi-task, multi-lingual
or multi-database training.  Depending on the context, an utterance
is not required to be present in all the input streams.  For
multi-database training, only one output layer will be active at a time.
The utterance lengths are given as the 1st argument and the input-stream
dimensions as the 2nd argument; the input and output streams, which are
in 'posterior' format, follow.
Usage: paste-post <featlen-rspecifier> <dims-csl> <post1-rspecifier> ... <postN-rspecifier> <post-wspecifier>
e.g.: paste-post 'ark:feat-to-len $feats ark,t:-|' 1029:1124 ark:post1.ark ark:post2.ark ark:pasted.ark 
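Pasting posterior streams concatenates each frame's entries, with the class indices of later streams offset by the dimensions of the earlier ones, so the output dimension is the sum of the input dimensions. A sketch of one frame's pasting (illustrative, using (index, weight) pairs in the spirit of Kaldi's sparse posterior format):

```python
def paste_frame_posteriors(stream_rows, dims):
    """Paste one frame's posterior rows from several streams.
    Each row is a list of (class-index, weight) pairs; indices of
    stream k are shifted by the sum of dims[0..k-1]."""
    out, offset = [], 0
    for row, dim in zip(stream_rows, dims):
        out.extend((idx + offset, w) for idx, w in row)
        offset += dim
    return out
```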
train-transitions
 Train the transition probabilities in transition-model (used in nnet1 recipe).
Usage: train-transitions [options] <trans-model-in> <alignments-rspecifier> <trans-model-out>
e.g.: train-transitions 1.mdl "ark:gunzip -c ali.*.gz|" 2.mdl 
nnet-set-learnrate
 Sets the learning-rate coefficient inside a 'nnet1' model.
Usage: nnet-set-learnrate --components=<csl> --coef=<float> <nnet-in> <nnet-out>
e.g.: nnet-set-learnrate --components=1:3:5 --coef=0.5 --bias-coef=0.1 nnet-in nnet-out 
online2-wav-gmm-latgen-faster
 Reads in wav file(s) and simulates online decoding, including
basis-fMLLR adaptation and endpointing.  Writes lattices.
Models are specified via options.
Usage: online2-wav-gmm-latgen-faster [options] <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier>
Run egs/rm/s5/local/run_online_decoding.sh for example 
apply-cmvn-online
 Apply cepstral mean (and possibly variance) normalization online,
using the same code as used for online decoding in the 'new' setup in
online2/ and online2bin/.  If the --spk2utt option is used, it uses
prior utterances from the same speaker to back off to at the utterance
beginning.  See also apply-cmvn-sliding.
Usage: apply-cmvn-online [options] <global-cmvn-stats> <feature-rspecifier> <feature-wspecifier>
e.g. apply-cmvn-online 'matrix-sum scp:data/train/cmvn.scp -|' data/train/split8/1/feats.scp ark:-
or: apply-cmvn-online --spk2utt=ark:data/train/split8/1/spk2utt 'matrix-sum scp:data/train/cmvn.scp -|'  data/train/split8/1/feats.scp ark:- 
extend-wav-with-silence
 Extend wave data with a fairly long silence at the end (e.g. 5 seconds).
The input waveforms are assumed to have silences at the beginning/end, and
those segments are extracted and appended to the end of the utterance.
Note this is for use in testing endpointing in decoding.
Usage: extend-wav-with-silence [options] <wav-rspecifier> <wav-wspecifier>
       extend-wav-with-silence [options] <wav-rxfilename> <wav-wxfilename> 
compress-uncompress-speex
 Demonstrates how to use the Speex wrapper in Kaldi by compressing input waveforms 
chunk by chunk and then decompressing them.
Usage: compress-uncompress-speex [options] <wav-rspecifier> <wav-wspecifier> 
online2-wav-nnet2-latgen-faster
 Reads in wav file(s) and simulates online decoding with neural nets
(nnet2 setup), with optional iVector-based speaker adaptation and
optional endpointing.  Note: some configuration values and inputs are
set via config files whose filenames are passed as options
Usage: online2-wav-nnet2-latgen-faster [options] <nnet2-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if
you want to decode utterance by utterance.
See egs/rm/s5/local/run_online_decoding_nnet2.sh for example
See also online2-wav-nnet2-latgen-threaded 
ivector-extract-online2
 Extract iVectors for utterances every --ivector-period frames, using a trained
iVector extractor and features and Gaussian-level posteriors.  Similar to
ivector-extract-online but uses the actual online decoder code to do it,
and does everything in-memory instead of using multiple processes.
Note: the value of the --use-most-recent-ivector config variable is ignored;
it is set to false.  The <spk2utt-rspecifier> is mandatory, to simplify the code;
if you want to do it separately per utterance, just make it of the form
<utterance-id> <utterance-id>.
The iVectors are output as an archive of matrices, indexed by utterance-id;
each row corresponds to an iVector.  If --repeat=true, outputs the whole matrix
of iVectors, not just every (ivector-period)'th frame.
The input features are the raw, non-cepstral-mean-normalized features, e.g. MFCC.
Usage:  ivector-extract-online2 [options] <spk2utt-rspecifier> <feature-rspecifier> <ivector-wspecifier>
e.g.: 
  ivector-extract-online2 --config=exp/nnet2_online/nnet_online/conf/ivector_extractor.conf \
    ark:data/train/spk2utt scp:data/train/feats.scp ark,t:ivectors.1.ark 
online2-wav-dump-features
 Reads in wav file(s) and processes them as in online2-wav-nnet2-latgen-faster,
but instead of decoding, dumps the features.  Most of the parameters
are set via configuration variables.
Usage: online2-wav-dump-features [options] <spk2utt-rspecifier> <wav-rspecifier> <feature-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if
you want to generate features utterance by utterance.
Alternate usage: online2-wav-dump-features [options] --print-ivector-dim=true
See steps/online/nnet2/{dump_nnet_activations,get_egs.sh} for examples. 
ivector-randomize
 Copy matrices of online-estimated iVectors, but randomize them;
this is intended primarily for training the online nnet2 setup
with iVectors.  For each input matrix, each row with index t is,
with probability given by the option --randomize-prob, replaced
with the contents of an input row chosen randomly from the interval [t, T],
where T is the index of the last row of the matrix.
Usage: ivector-randomize [options] <ivector-rspecifier> <ivector-wspecifier>
 e.g.: ivector-randomize ark:- ark:-
See also: ivector-extract-online, ivector-extract-online2, subsample-feats 
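The row-replacement rule described above (replace row t, with probability --randomize-prob, by a row drawn uniformly from [t, T]) can be sketched as follows (illustrative Python, not Kaldi code):

```python
import random

def randomize_ivectors(rows, randomize_prob, seed=0):
    """With probability randomize_prob, replace row t with a row
    chosen uniformly from the interval [t, T] (T = last row index)."""
    rng = random.Random(seed)
    T = len(rows) - 1
    out = []
    for t in range(len(rows)):
        if rng.random() < randomize_prob:
            out.append(rows[rng.randint(t, T)])  # inclusive of both ends
        else:
            out.append(rows[t])
    return out
```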
online2-wav-nnet2-am-compute
 Simulates the online neural net computation for each file of input
features, and outputs as a matrix the result, with optional
iVector-based speaker adaptation. Note: some configuration values
and inputs are set via config files whose filenames are passed as
options.  Used mostly for debugging.
Note: if you want it to apply a log (e.g. for log-likelihoods), use
--apply-log=true.
Usage:  online2-wav-nnet2-am-compute [options] <nnet-in>
<spk2utt-rspecifier> <wav-rspecifier> <feature-or-loglikes-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if
you want to compute utterance by utterance. 
online2-wav-nnet2-latgen-threaded
 Reads in wav file(s) and simulates online decoding with neural nets
(nnet2 setup), with optional iVector-based speaker adaptation and
optional endpointing.  This version uses multiple threads for decoding.
Note: some configuration values and inputs are set via config files
whose filenames are passed as options
Usage: online2-wav-nnet2-latgen-threaded [options] <nnet2-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if
you want to decode utterance by utterance.
See egs/rm/s5/local/run_online_decoding_nnet2.sh for an example.
See also: online2-wav-nnet2-latgen-faster 
online2-wav-nnet3-latgen-faster
 Reads in wav file(s) and simulates online decoding with neural nets
(nnet3 setup), with optional iVector-based speaker adaptation and
optional endpointing.  Note: some configuration values and inputs are
set via config files whose filenames are passed as options.
Usage: online2-wav-nnet3-latgen-faster [options] <nnet3-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if
you want to decode utterance by utterance. 
online2-wav-nnet3-latgen-grammar
 Reads in wav file(s) and simulates online decoding with neural nets
(nnet3 setup), with optional iVector-based speaker adaptation and
optional endpointing.  Note: some configuration values and inputs are
set via config files whose filenames are passed as options.
This program is like online2-wav-nnet3-latgen-faster, but for the case
where the FST to be decoded is of type GrammarFst.
Usage: online2-wav-nnet3-latgen-grammar [options] <nnet3-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if
you want to decode utterance by utterance. 
online2-tcp-nnet3-decode-faster
 Reads in audio from a network socket and performs online
decoding with neural nets (nnet3 setup), with iVector-based
speaker adaptation and endpointing.
Note: some configuration values and inputs are set via config
files whose filenames are passed as options.
Usage: online2-tcp-nnet3-decode-faster [options] <nnet3-in> <fst-in> <word-symbol-table> 
online2-wav-nnet3-latgen-incremental
 Reads in wav file(s) and simulates online decoding with neural nets
(nnet3 setup), with optional iVector-based speaker adaptation and
optional endpointing.  Note: some configuration values and inputs are
set via config files whose filenames are passed as options.
The lattice determinization algorithm here can operate
incrementally.
Usage: online2-wav-nnet3-latgen-incremental [options] <nnet3-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier>
The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if
you want to decode utterance by utterance. 
online-net-client
 Takes input from a microphone (PortAudio), extracts features and sends them
to a speech recognition server over a network connection
Usage: online-net-client server-address server-port 
online-server-gmm-decode-faster
 Decode speech, using feature batches received over a network connection
Utterance segmentation is done on-the-fly.
The feature splicing/LDA transform is used if the optional (last) argument is given;
otherwise delta/delta-delta (2nd-order) features are produced.
Usage: online-server-gmm-decode-faster [options] model-in fst-in word-symbol-table silence-phones udp-port [lda-matrix-in]
Example: online-server-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' 1234 lda-matrix 
online-gmm-decode-faster
 Decode speech, using microphone input (PortAudio)
Utterance segmentation is done on-the-fly.
The feature splicing/LDA transform is used if the optional (last) argument is given;
otherwise delta/delta-delta (2nd-order) features are produced.
Usage: online-gmm-decode-faster [options] <model-in> <fst-in> <word-symbol-table> <silence-phones> [<lda-matrix-in>]
Example: online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' lda-matrix 
online-wav-gmm-decode-faster
 Reads in wav file(s) and simulates online decoding.
Writes integerized-text and .ali files for WER computation. Utterance segmentation is done on-the-fly.
The feature splicing/LDA transform is used if the optional (last) argument is given;
otherwise delta/delta-delta (i.e. 2nd-order) features are produced.
Caution: the last few frames of the wav file may not be decoded properly.
Hence, don't use one wav file per utterance, but rather use one wav file per show.
Usage: online-wav-gmm-decode-faster [options] wav-rspecifier model-in fst-in word-symbol-table silence-phones transcript-wspecifier alignments-wspecifier [lda-matrix-in]
Example: ./online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:wav.scp model HCLG.fst words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt 
online-audio-server-decode-faster
 Starts a TCP server that receives RAW audio and outputs aligned words.
A sample client can be found in: onlinebin/online-audio-client
Usage: online-audio-server-decode-faster [options] model-in fst-in word-symbol-table silence-phones word_boundary_file tcp-port [lda-matrix-in]
example: online-audio-server-decode-faster --verbose=1 --rt-min=0.5 --rt-max=3.0 --max-active=6000
--beam=72.0 --acoustic-scale=0.0769 final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5'
graph/word_boundary.int 5000 final.mat 
online-audio-client
 Sends an audio file to the Kaldi audio server (onlinebin/online-audio-server-decode-faster)
and prints the result, optionally saving it to an HTK label file or a WebVTT subtitle file.
e.g.: ./online-audio-client 192.168.50.12 9012 'scp:wav_files.scp' 
rnnlm-get-egs
 This program processes lines of text (typically sentences) with weights,
in a format like:
  1.0 67 5689 21 8940 6723
and turns them into examples (class RnnlmExample) for RNNLM training.
This involves splitting up the sentences to a maximum length,
importance sampling and other procedures.
Usage:
(1) no sampling:
 rnnlm-get-egs [options] <sentences-rxfilename> <rnnlm-egs-wspecifier>
(2) sampling, ARPA LM read:
 rnnlm-get-egs [options] <symbol-table> <ARPA-rxfilename> \
                         <sentences-rxfilename>  <rnnlm-egs-wspecifier>
(3) sampling, non-ARPA LM read:
    rnnlm-get-egs [options] <LM-rxfilename> <sentences-rxfilename>\
                            <rnnlm-egs-wspecifier>
E.g.:
 ... | rnnlm-get-egs --vocab-size=20002 - ark:- | rnnlm-train ...
or (with sampling, reading LM as ARPA):
 ... | rnnlm-get-egs words.txt foo.arpa - ark:- | rnnlm-train ...
or (with sampling, reading LM natively):
 ... | rnnlm-get-egs sampling.lm - ark:- | rnnlm-train ...
See also: rnnlm-train 
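The weighted, integerized input format shown above (a corpus weight followed by word IDs) is normally produced with sym2int.pl and a words.txt symbol table; a rough Python equivalent of that conversion (the helper name and the <unk> handling are illustrative):

```python
def integerize(weighted_text_lines, word2id, unk="<unk>"):
    """Convert lines of the form "<weight> w1 w2 ..." into the
    integerized form rnnlm-get-egs expects, e.g. "1.0 67 5689 21".
    word2id maps word strings to integer IDs; out-of-vocabulary
    words fall back to the <unk> ID."""
    out = []
    for line in weighted_text_lines:
        fields = line.split()
        weight, words = fields[0], fields[1:]
        ids = [str(word2id.get(w, word2id[unk])) for w in words]
        out.append(" ".join([weight] + ids))
    return out
```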
rnnlm-train
 Train an nnet3-based RNNLM (reads minibatches prepared
by rnnlm-get-egs).  Supports various modes depending on which
parameters we are training.
Usage:
 rnnlm-train [options] <egs-rspecifier>
e.g.:
 rnnlm-get-egs ... ark:- | \
 rnnlm-train --read-rnnlm=foo/0.raw --write-rnnlm=foo/1.raw --read-embedding=foo/0.embedding \
       --write-embedding=foo/1.embedding --read-sparse-word-features=foo/word_feats.txt ark:-
See also: rnnlm-get-egs 
rnnlm-get-sampling-lm
 Estimate highly-pruned backoff LM for use in importance sampling for
RNNLM training.  Reads integerized text.
Usage:
 rnnlm-get-sampling-lm [options] <input-integerized-weighted-text> \
            <sampling-lm-out>
 (this form writes a non-human-readable format that can be read by
 rnnlm-get-egs).
 e.g.:
  ... | rnnlm-get-sampling-lm --vocab-size=10002 - sampling.lm
The word symbol table is used to write the ARPA file, but is expected
to already have been used to convert the words into integer form.
Each line of integerized input text should have a corpus weight as
the first field, e.g.:
 1.0   782 1271 3841 82
and lines of input text should not be repeated (just increase the
weight).
See also: rnnlm-get-egs 
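The requirement that no input line be repeated can be met by merging duplicates and summing their corpus weights before piping the text in. A minimal sketch of that preprocessing step (the helper name is illustrative, not part of Kaldi; requires Python 3.7+ for ordered dicts):

```python
def merge_repeated_lines(weighted_lines):
    """rnnlm-get-sampling-lm expects each distinct integerized sentence
    to appear once; repeats should instead have their corpus weights
    summed.  Input lines look like "1.0 782 1271 3841"."""
    totals = {}  # sentence -> summed weight (insertion-ordered)
    for line in weighted_lines:
        fields = line.split()
        weight, sent = float(fields[0]), " ".join(fields[1:])
        totals[sent] = totals.get(sent, 0.0) + weight
    return [str(w) + " " + s for s, w in totals.items()]
```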
rnnlm-get-word-embedding
 This very simple program multiplies a sparse matrix by a
dense matrix to compute the word embedding (which is also a dense matrix).
The sparse matrix is in a text format specific to the RNNLM tools.
Usage:
 rnnlm-get-word-embedding [options] <sparse-word-features-rxfilename> \
   <feature-embedding-rxfilename> <word-embedding-wxfilename>
 e.g.:
 rnnlm-get-word-embedding word_features.txt feat_embedding.mat word_embedding.mat
See also: rnnlm-get-egs, rnnlm-train 
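The multiplication this tool performs is an ordinary sparse-times-dense matrix product: each word's sparse feature row, times the dense feature-embedding matrix, gives that word's dense embedding row. A small pure-Python sketch of the arithmetic (the data layout is illustrative; the real tool uses Kaldi's matrix code and a tool-specific text format):

```python
def sparse_times_dense(sparse_rows, dense):
    """Multiply a sparse word-feature matrix (one {column: value} dict
    per word) by a dense feature-embedding matrix, yielding one dense
    embedding row per word."""
    dim = len(dense[0])
    out = []
    for row in sparse_rows:
        emb = [0.0] * dim
        for col, val in row.items():
            for j in range(dim):
                emb[j] += val * dense[col][j]
        out.append(emb)
    return out
```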
rnnlm-compute-prob
 This program computes the probability per word of the provided training
data in 'egs' format as prepared by rnnlm-get-egs.  The interface is similar
to rnnlm-train, except that it doesn't train, and doesn't write the model;
it just prints the average probability to the standard output (in addition
to printing various diagnostics to the standard error).
Usage:
 rnnlm-compute-prob [options] <rnnlm> <word-embedding-matrix> <egs-rspecifier>
e.g.:
 rnnlm-get-egs ... ark:- | \
 rnnlm-compute-prob 0.raw 0.word_embedding ark:-
(note: use rnnlm-get-word-embedding to get the word embedding matrix if
you are using sparse word features.) 
rnnlm-sentence-probs
 This program takes as input a text corpus (with words represented by
symbol IDs) and an already-trained RNNLM model, and prints the
log-probability of each word in the corpus. The RNNLM resets its hidden
state for each new line. This is used in n-best rescoring with RNNLMs.
An example of the n-best rescoring usage is in egs/swbd/s5c/local/rnnlm/run_tdnn_lstm.sh
Usage:
 rnnlm-sentence-probs [options] <rnnlm> <word-embedding-matrix> <input-text-file> 
e.g.:
 rnnlm-sentence-probs rnnlm/final.raw rnnlm/final.word_embedding dev_corpus.txt > output_logprobs.txt 
sgmm2-init
 Initialize an SGMM from a trained full-covariance UBM and a specified model topology.
Usage: sgmm2-init [options] <topology> <tree> <init-model> <sgmm-out>
The <init-model> argument can be a UBM (the default case) or another
SGMM (if the --init-from-sgmm flag is used).
For systems with a two-level tree, use the --pdf-map argument. 
sgmm2-gselect
 Precompute Gaussian indices for SGMM training.
Usage: sgmm2-gselect [options] <model-in> <feature-rspecifier> <gselect-wspecifier>
e.g.: sgmm2-gselect 1.sgmm "ark:feature-command |" ark:1.gs
Note: you can do the same thing by combining the programs sgmm2-write-ubm, fgmm-global-to-gmm,
gmm-gselect and fgmm-gselect 
sgmm2-acc-stats
 Accumulate stats for SGMM training.
Usage: sgmm2-acc-stats [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.: sgmm2-acc-stats --gselect=ark:gselect.ark 1.mdl scp:train.scp 'ark:ali-to-post ark:1.ali ark:-|' 1.acc
(note: gselect option is mandatory) 
sgmm2-est
 Estimate SGMM model parameters from accumulated stats.
Usage: sgmm2-est [options] <model-in> <stats-in> <model-out> 
sgmm2-sum-accs
 Sum multiple accumulated stats files for SGMM training.
Usage: sgmm2-sum-accs [options] stats-out stats-in1 stats-in2 ... 
sgmm2-align-compiled
 Align features given [SGMM-based] models.
Usage: sgmm2-align-compiled [options] <model-in> <graphs-rspecifier> <feature-rspecifier> <alignments-wspecifier>
e.g.: sgmm2-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali 
sgmm2-est-spkvecs
 Estimate SGMM speaker vectors, either per utterance or for the supplied set of speakers (with spk2utt option).
Reads Gaussian-level posteriors. Writes to a table of vectors.
Usage: sgmm2-est-spkvecs [options] <model-in> <feature-rspecifier> <post-rspecifier> <vecs-wspecifier>
note: --gselect option is required. 
sgmm2-post-to-gpost
 Convert posteriors to Gaussian-level posteriors for SGMM training.
Usage: sgmm2-post-to-gpost [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <gpost-wspecifier>
e.g.: sgmm2-post-to-gpost 1.mdl scp:train.scp 'ark:ali-to-post ark:1.ali ark:-|' ark:- 
sgmm2-acc-stats-gpost
 Accumulate stats for SGMM training, given Gaussian-level posteriors
Usage: sgmm2-acc-stats-gpost [options] <model-in> <feature-rspecifier> <gpost-rspecifier> <stats-out>
e.g.: sgmm2-acc-stats-gpost 1.mdl scp:train.scp ark,s,cs:- 1.acc 
sgmm2-latgen-faster
 Decode features using SGMM-based model.
Usage:  sgmm2-latgen-faster [options] <model-in> (<fst-in>|<fsts-rspecifier>) <features-rspecifier> <lattices-wspecifier> [<words-wspecifier> [<alignments-wspecifier>] ] 
sgmm2-est-spkvecs-gpost
 Estimate SGMM speaker vectors, either per utterance or for the supplied set of speakers (with spk2utt option).
Reads Gaussian-level posteriors. Writes to a table of vectors.
Usage: sgmm2-est-spkvecs-gpost [options] <model-in> <feature-rspecifier> <gpost-rspecifier> <vecs-wspecifier> 
sgmm2-rescore-lattice
 Replace the acoustic scores on a lattice using a new model.
Usage: sgmm2-rescore-lattice [options] <model-in> <lattice-rspecifier> <feature-rspecifier> <lattice-wspecifier>
 e.g.: sgmm2-rescore-lattice 1.mdl ark:1.lats scp:trn.scp ark:2.lats 
sgmm2-copy
 Copy SGMM (possibly changing binary/text format)
Usage: sgmm2-copy [options] <model-in> <model-out>
e.g.: sgmm2-copy --binary=false 1.mdl 1_text.mdl 
sgmm2-info
 Print various information about an SGMM.
Usage: sgmm2-info [options] <model-in> [model-in2 ... ] 
sgmm2-est-ebw
 Estimate SGMM model parameters discriminatively using an Extended
Baum-Welch style of update.
Usage: sgmm2-est-ebw [options] <model-in> <num-stats-in> <den-stats-in> <model-out> 
sgmm2-acc-stats2
 Accumulate numerator and denominator stats for discriminative training
of SGMMs (input is posteriors of mixed sign)
Usage: sgmm2-acc-stats2 [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <num-stats-out> <den-stats-out>
e.g.: sgmm2-acc-stats2 1.mdl scp:train.scp ark:1.posts num.acc den.acc 
sgmm2-comp-prexform
 Compute "pre-transform" parameters required for estimating fMLLR with
SGMMs, and write to a model file, after the SGMM.
Usage: sgmm2-comp-prexform [options] <sgmm2-in> <occs-in> <sgmm-out> 
sgmm2-est-fmllr
 Estimate FMLLR transform for SGMMs, either per utterance or for the supplied set of speakers (with spk2utt option).
Reads state-level posteriors. Writes to a table of matrices.
--gselect option is mandatory.
Usage: sgmm2-est-fmllr [options] <model-in> <feature-rspecifier> <post-rspecifier> <mats-wspecifier> 
sgmm2-project
 Compute SGMM model projection that only models a part of a pre-LDA space.
Used in predictive SGMMs.  Takes as input an LDA+MLLT transform,
and outputs a transform from the pre-LDA+MLLT space to the space that
we want to model.
Usage: sgmm2-project [options] <model-in> <lda-mllt-mat-in> <model-out> <new-projection-out>
e.g.: sgmm2-project --start-dim=0 --end-dim=52 final.mdl final.inv_full_mat final_proj1.mdl proj1.mat 
sgmm2-latgen-faster-parallel
 Decode features using SGMM-based model.  This version accepts the --num-threads
option but otherwise behaves identically to sgmm2-latgen-faster
Usage:  sgmm2-latgen-faster-parallel [options] <model-in> (<fst-in>|<fsts-rspecifier>) <features-rspecifier> <lattices-wspecifier> [<words-wspecifier> [<alignments-wspecifier>] ] 
init-ubm
 Cluster the Gaussians in a diagonal-GMM acoustic model
to a single full-covariance or diagonal-covariance GMM.
Usage: init-ubm [options] <model-file> <state-occs> <gmm-out> 
lattice-lmrescore-tf-rnnlm
 Rescores a lattice with an RNNLM trained with TensorFlow.
An example script for training and rescoring with the TensorFlow
RNNLM is at egs/ami/s5/local/tfrnnlm/run_lstm_fast.sh
Usage: lattice-lmrescore-tf-rnnlm [options] [unk-file] <rnnlm-wordlist> \
             <word-symbol-table-rxfilename> <lattice-rspecifier> \
             <rnnlm-rxfilename> <lattice-wspecifier>
 e.g.: lattice-lmrescore-tf-rnnlm --lm-scale=0.5     data/tensorflow_lstm/unkcounts.txt data/tensorflow_lstm/rnnwords.txt \
    data/lang/words.txt ark:in.lats data/tensorflow_lstm/rnnlm ark:out.lats 
lattice-lmrescore-tf-rnnlm-pruned
 Rescores a lattice with an RNNLM trained with TensorFlow.
An example script for training and rescoring with the TensorFlow
RNNLM is at egs/ami/s5/local/tfrnnlm/run_lstm_fast.sh
Usage: lattice-lmrescore-tf-rnnlm-pruned [options] [unk-file] \
             <old-lm> <fst-wordlist> <rnnlm-wordlist> \
             <rnnlm-rxfilename> <lattice-rspecifier> <lattice-wspecifier>
 e.g.: lattice-lmrescore-tf-rnnlm-pruned --lm-scale=0.5 data/tensorflow_lstm/unkcounts.txt \
              data/test/G.fst data/lang/words.txt data/tensorflow_lstm/rnnwords.txt \
              data/tensorflow_lstm/rnnlm ark:in.lats ark:out.lats
 e.g.: lattice-lmrescore-tf-rnnlm-pruned --lm-scale=0.5 data/tensorflow_lstm/unkcounts.txt \
              data/test_fg/G.carpa data/lang/words.txt data/tensorflow_lstm/rnnwords.txt \
              data/tensorflow_lstm/rnnlm ark:in.lats ark:out.lats