This page contains a list of all the Kaldi tools, with their brief functions and usage messages.
Tools | Description |
----- | ----------- |
align-equal | Write equally spaced alignments of utterances (to get training started) Usage: align-equal <tree-in> <model-in> <lexicon-fst-in> <features-rspecifier> <transcriptions-rspecifier> <alignments-wspecifier> e.g.: align-equal 1.tree 1.mdl lex.fst scp:train.scp 'ark:sym2int.pl -f 2- words.txt text|' ark:equal.ali |
align-equal-compiled | Write an equally spaced alignment (for getting training started) Usage: align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier> e.g.: align-equal-compiled 1.fsts scp:train.scp ark:equal.ali |
acc-tree-stats | Accumulate statistics for phonetic-context tree building. Usage: acc-tree-stats [options] <model-in> <features-rspecifier> <alignments-rspecifier> <tree-accs-out> e.g.: acc-tree-stats 1.mdl scp:train.scp ark:1.ali 1.tacc |
show-alignments | Display alignments in human-readable form Usage: show-alignments [options] <phone-syms> <model> <alignments-rspecifier> e.g.: show-alignments phones.txt 1.mdl ark:1.ali See also: ali-to-phones, copy-int-vector |
compile-questions | Compile questions Usage: compile-questions [options] <topo> <questions-text-file> <questions-out> e.g.: compile-questions topo questions.txt questions.qst |
cluster-phones | Cluster phones (or sets of phones) into sets for various purposes Usage: cluster-phones [options] <tree-stats-in> <phone-sets-in> <clustered-phones-out> e.g.: cluster-phones 1.tacc phonesets.txt questions.txt |
compute-wer | Compute WER by comparing different transcriptions Takes two transcription files, in integer or text format, and outputs overall WER statistics to standard output. Usage: compute-wer [options] <ref-rspecifier> <hyp-rspecifier> E.g.: compute-wer --text --mode=present ark:data/train/text ark:hyp_text See also: align-text, Example scoring script: egs/wsj/s5/steps/score_kaldi.sh |
compute-wer-bootci | Compute a bootstrapping of WER to extract the 95% confidence interval. Take a reference and a transcription file, in integer or text format, and outputs overall WER statistics to standard output along with its confidence interval using the bootstrap method of Bisani and Ney. If a second transcription file corresponding to the same reference is provided, a bootstrap comparison of the two transcription is performed to estimate the probability of improvement. Usage: compute-wer-bootci [options] <ref-rspecifier> <hyp-rspecifier> [<hyp2-rspecifier>] E.g.: compute-wer-bootci --mode=present ark:data/train/text ark:hyp_text or compute-wer-bootci ark:data/train/text ark:hyp_text ark:hyp_text2 See also: compute-wer |
make-h-transducer | Make H transducer from transition-ids to context-dependent phones, without self-loops [use add-self-loops to add them] Usage: make-h-transducer <ilabel-info-file> <tree-file> <transition-gmm/acoustic-model> [<H-fst-out>] e.g.: make-h-transducer ilabel_info 1.tree 1.mdl > H.fst |
add-self-loops | Add self-loops and transition probabilities to transducer. Input transducer has transition-ids on the input side, but only the forward transitions, not the self-loops. Output transducer has transition-ids on the input side, but with self-loops added. The --reorder option controls whether the loop is added before the forward transition (if false), or afterward (if true). The default (true) is recommended as the decoding will in that case be faster. Usage: add-self-loops [options] transition-gmm/acoustic-model [fst-in] [fst-out] e.g.: add-self-loops --self-loop-scale=0.1 1.mdl HCLGa.fst HCLG.fst or: add-self-loops --self-loop-scale=0.1 1.mdl <HCLGa.fst >HCLG.fst |
convert-ali | Convert alignments from one decision-tree/model to another Usage: convert-ali [options] <old-model> <new-model> <new-tree> <old-alignments-rspecifier> <new-alignments-wspecifier> e.g.: convert-ali old/final.mdl new/0.mdl new/tree ark:old/ali.1 ark:new/ali.1 |
compile-train-graphs | Creates training graphs (without transition-probabilities, by default) Usage: compile-train-graphs [options] <tree-in> <model-in> <lexicon-fst-in> <transcriptions-rspecifier> <graphs-wspecifier> e.g.: compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' ark:graphs.fsts |
compile-train-graphs-fsts | Creates training graphs (without transition-probabilities, by default) This version takes FSTs as inputs (e.g., representing a separate weighted grammar for each utterance) Note: the lexicon should contain disambiguation symbols and you should supply the --read-disambig-syms option, which is the filename of a list of disambiguation symbols. Warning: you probably want to set the --transition-scale and --self-loop-scale options; the defaults (zero) are probably not appropriate. Usage: compile-train-graphs-fsts [options] <tree-in> <model-in> <lexicon-fst-in> <graphs-rspecifier> <graphs-wspecifier> e.g.: compile-train-graphs-fsts --read-disambig-syms=disambig.list tree 1.mdl lex.fst ark:train.fsts ark:graphs.fsts |
make-pdf-to-tid-transducer | Make transducer from pdfs to transition-ids Usage: make-pdf-to-tid-transducer model-filename [fst-out] e.g.: make-pdf-to-tid-transducer 1.mdl > pdf2tid.fst |
make-ilabel-transducer | Make transducer that de-duplicates context-dependent ilabels that map to the same state Usage: make-ilabel-transducer ilabel-info-right tree-file transition-gmm/model ilabel-info-left [mapping-fst-out] e.g.: make-ilabel-transducer old_ilabel_info 1.tree 1.mdl new_ilabel_info > convert.fst |
show-transitions | Print debugging info from transition model, in human-readable form Usage: show-transitions <phones-symbol-table> <transition/model-file> [<occs-file>] e.g.: show-transitions phones.txt 1.mdl 1.occs |
ali-to-phones | Convert model-level alignments to phone-sequences (in integer, not text, form) Usage: ali-to-phones [options] <model> <alignments-rspecifier> <phone-transcript-wspecifier|ctm-wxfilename> e.g.: ali-to-phones 1.mdl ark:1.ali ark:- or: ali-to-phones --ctm-output 1.mdl ark:1.ali 1.ctm See also: show-alignments lattice-align-phones, compare-int-vector |
ali-to-post | Convert alignments to posteriors. This is simply a format change from integer vectors to Posteriors, which are vectors of lists of pairs (int, float) where the float represents the posterior. The floats would all be 1.0 in this case. The posteriors will still be in terms of whatever integer index the input contained, which will be transition-ids if they came directly from decoding, or pdf-ids if they were processed by ali-to-pdf. Usage: ali-to-post [options] <alignments-rspecifier> <posteriors-wspecifier> e.g.: ali-to-post ark:1.ali ark:1.post See also: ali-to-pdf, ali-to-phones, show-alignments, post-to-weights |
weight-silence-post | Apply weight to silences in posts Usage: weight-silence-post [options] <silence-weight> <silence-phones> <model> <posteriors-rspecifier> <posteriors-wspecifier> e.g.: weight-silence-post 0.0 1:2:3 1.mdl ark:1.post ark:nosil.post |
acc-lda | Accumulate LDA statistics based on pdf-ids. Usage: acc-lda [options] <transition-gmm/model> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out> Typical usage: ali-to-post ark:1.ali ark:- | acc-lda 1.mdl "ark:splice-feats scp:train.scp|" ark:- ldaacc.1 |
est-lda | Estimate LDA transform using stats obtained with acc-lda. Usage: est-lda [options] <lda-matrix-out> <lda-acc-1> <lda-acc-2> ... |
ali-to-pdf | Converts alignments (containing transition-ids) to pdf-ids, zero-based. Usage: ali-to-pdf [options] <model> <alignments-rspecifier> <pdfs-wspecifier> e.g.: ali-to-pdf 1.mdl ark:1.ali ark,t:- |
est-mllt | Do update for MLLT (also known as STC) Usage: est-mllt [options] <mllt-mat-out> <stats-in1> <stats-in2> ... e.g.: est-mllt 2.mat 1a.macc 1b.macc ... where the stats are obtained from gmm-acc-mllt Note: use compose-transforms <mllt-mat-out> <prev-mllt-mat> to combine with previous MLLT or LDA transform, if any, and gmm-transform-means to apply <mllt-mat-out> to GMM means. |
build-tree | Train decision tree Usage: build-tree [options] <tree-stats-in> <roots-file> <questions-file> <topo-file> <tree-out> e.g.: build-tree treeacc roots.txt 1.qst topo tree |
build-tree-two-level | Trains two-level decision tree. Outputs the larger tree, and a mapping from the leaf-ids of the larger tree to those of the smaller tree. Useful, for instance, in tied-mixture systems with multiple codebooks. Usage: build-tree-two-level [options] <tree-stats-in> <roots-file> <questions-file> <topo-file> <tree-out> <mapping-out> e.g.: build-tree-two-level treeacc roots.txt 1.qst topo tree tree.map |
decode-faster | Decode, reading log-likelihoods (of transition-ids or whatever symbol is on the graph) as matrices. Note: you'll usually want decode-faster-mapped rather than this program. Usage: decode-faster [options] <fst-in> <loglikes-rspecifier> <words-wspecifier> [<alignments-wspecifier>] |
decode-faster-mapped | Decode, reading log-likelihoods as matrices (model is needed only for the integer mappings in its transition-model) Usage: decode-faster-mapped [options] <model-in> <fst-in> <loglikes-rspecifier> <words-wspecifier> [<alignments-wspecifier>] |
vector-scale | Scale vectors, or archives of vectors (useful for speaker vectors and per-frame weights) Usage: vector-scale [options] <vector-in-rspecifier> <vector-out-wspecifier> or: vector-scale [options] <vector-in-rxfilename> <vector-out-wxfilename> e.g.: vector-scale --scale=-1.0 1.vec - vector-scale --scale=-2.0 ark:vec.ark ark,t:- See also: copy-vector, vector-sum |
copy-transition-model | Copies a transition model (this can be used to separate transition models from the acoustic models they are written with). Usage: copy-transition-model [options] <transition-model or model file> <transition-model-out> e.g.: copy-transition-model --binary=false 1.mdl 1.txt |
phones-to-prons | Convert pairs of (phone-level, word-level) transcriptions to output that indicates the phones assigned to each word. Format is standard format for archives of vector<vector<int32> > i.e. : utt-id 600 4 7 19 ; 512 4 18 ; 0 1 where 600, 512 and 0 are the word-ids (0 for non-word phones, e.g. optional-silence introduced by the lexicon), and the phone-ids follow the word-ids. Note: L_align.fst must have word-start and word-end symbols in it Usage: phones-to-prons [options] <L_align.fst> <word-start-sym> <word-end-sym> <phones-rspecifier> <words-rspecifier> <prons-wspecifier> e.g.: ali-to-phones 1.mdl ark:1.ali ark:- | \ phones-to-prons L_align.fst 46 47 ark:- 'ark:sym2int.pl -f 2- words.txt text|' ark:1.prons |
prons-to-wordali | Caution: this program relates to older scripts and is deprecated, for modern scripts see egs/wsj/s5/steps/{get_ctm,get_train_ctm}.sh Given per-utterance pronunciation information as output by phones-to-prons, and per-utterance phone alignment information as output by ali-to-phones --write-lengths, output word alignment information that can be turned into the ctm format. Output is pairs of (word, #frames), or if --per-frame is given, just the word for each frame. Note: zero word-id usually means optional silence. Format is standard format for archives of vector<pair<int32, int32> > i.e. : utt-id 600 22 ; 1028 32 ; 0 41 where 600, 1028 and 0 are the word-ids, and 22, 32 and 41 are the lengths. Usage: prons-to-wordali [options] <prons-rspecifier> <phone-lengths-rspecifier> <wordali-wspecifier> e.g.: ali-to-phones 1.mdl ark:1.ali ark:- | \ phones-to-prons L_align.fst 46 47 ark:- 'ark:sym2int.pl -f 2- words.txt text|' \ ark:- | prons-to-wordali ark:- \ "ark:ali-to-phones --write-lengths 1.mdl ark:1.ali ark:-|" ark:1.wali |
copy-gselect | Copy Gaussian indices for pruning, possibly making the lists shorter (e.g. --n=10 limits each list to the 10 best indices). See also gmm-gselect, fgmm-gselect Usage: copy-gselect [options] <gselect-rspecifier> <gselect-wspecifier> |
copy-tree | Copy decision tree (possibly changing binary/text format) Usage: copy-tree [--binary=false] <tree-in> <tree-out> |
scale-post | Scale posteriors with either a global scale, or a different scale for each utterance. Usage: scale-post <post-rspecifier> (<scale-rspecifier>|<scale>) <post-wspecifier> |
post-to-weights | Turn posteriors into per-frame weights (typically most useful after weight-silence-post, to get silence weights) See also: weight-silence-post, post-to-pdf-post, post-to-phone-post post-to-feats, get-post-on-ali Usage: post-to-weights <post-rspecifier> <weights-wspecifier> |
sum-tree-stats | Sum statistics for phonetic-context tree building. Usage: sum-tree-stats [options] tree-accs-out tree-accs-in1 tree-accs-in2 ... e.g.: sum-tree-stats treeacc 1.treeacc 2.treeacc 3.treeacc |
weight-post | Takes archives (typically per-utterance) of posteriors and per-frame weights, and weights the posteriors by the per-frame weights Usage: weight-post <post-rspecifier> <weights-rspecifier> <post-wspecifier> |
post-to-tacc | From posteriors, compute transition-accumulators. The output is a vector of counts/soft-counts, indexed by transition-id. Note: the model is only read in order to get the size of the vector Usage: post-to-tacc [options] <model> <post-rspecifier> <accs> e.g.: post-to-tacc --binary=false 1.mdl "ark:ali-to-post 1.ali|" 1.tacc See also: get-post-on-ali |
copy-matrix | Copy matrices, or archives of matrices (e.g. features or transforms) Also see copy-feats which has other format options Usage: copy-matrix [options] <matrix-in-rspecifier> <matrix-out-wspecifier> or: copy-matrix [options] <matrix-in-rxfilename> <matrix-out-wxfilename> e.g.: copy-matrix --binary=false 1.mat - copy-matrix ark:2.trans ark,t:- See also: copy-feats, matrix-sum |
copy-vector | Copy vectors, or archives of vectors (e.g. transition-accs; speaker vectors) Usage: copy-vector [options] (<vector-in-rspecifier>|<vector-in-rxfilename>) (<vector-out-wspecifier>|<vector-out-wxfilename>) e.g.: copy-vector --binary=false 1.mat - copy-vector ark:2.trans ark,t:- see also: dot-weights, append-vector-to-feats |
copy-int-vector | Copy vectors of integers, or archives of vectors of integers (e.g. alignments) Usage: copy-int-vector [options] (vector-in-rspecifier|vector-in-rxfilename) (vector-out-wspecifier|vector-out-wxfilename) e.g.: copy-int-vector --binary=false foo - copy-int-vector ark:1.ali ark,t:- |
sum-post | Sum two sets of posteriors for each utterance, e.g. useful in fMMI. To take the difference of posteriors, use e.g. --scale2=-1.0 Usage: sum-post <post-rspecifier1> <post-rspecifier2> <post-wspecifier> |
sum-matrices | Sum matrices, e.g. stats for fMPE training Usage: sum-matrices [options] <mat-out> <mat-in1> <mat-in2> ... e.g.: sum-matrices mat 1.mat 2.mat 3.mat |
draw-tree | Outputs a decision tree description in GraphViz format Usage: draw-tree [options] <phone-symbols> <tree> e.g.: draw-tree phones.txt tree | dot -Gsize=8,10.5 -Tps | ps2pdf - tree.pdf |
align-mapped | Generate alignments, reading log-likelihoods as matrices. (model is needed only for the integer mappings in its transition-model) Usage: align-mapped [options] <tree-in> <trans-model-in> <lexicon-fst-in> <feature-rspecifier> <transcriptions-rspecifier> <alignments-wspecifier> e.g.: align-mapped tree trans.mdl lex.fst scp:train.scp ark:train.tra ark:nnet.ali |
align-compiled-mapped | Generate alignments, reading log-likelihoods as matrices. (model is needed only for the integer mappings in its transition-model) Usage: align-compiled-mapped [options] trans-model-in graphs-rspecifier feature-rspecifier alignments-wspecifier e.g.: align-compiled-mapped trans.mdl ark:graphs.fsts scp:train.scp ark:nnet.ali or: compile-train-graphs tree trans.mdl lex.fst ark:train.tra ark:- | \ align-compiled-mapped trans.mdl ark:- scp:loglikes.scp ark:nnet.ali |
latgen-faster-mapped | Generate lattices, reading log-likelihoods as matrices (model is needed only for the integer mappings in its transition-model) Usage: latgen-faster-mapped [options] trans-model-in (fst-in|fsts-rspecifier) loglikes-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
latgen-faster-mapped-parallel | Generate lattices, reading log-likelihoods as matrices, using multiple decoding threads (model is needed only for the integer mappings in its transition-model) Usage: latgen-faster-mapped-parallel [options] trans-model-in (fst-in|fsts-rspecifier) loglikes-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
hmm-info | Write to standard output various properties of HMM-based transition model Usage: hmm-info [options] <model-in> e.g.: hmm-info trans.mdl |
analyze-counts | Computes element counts from integer vector table. (e.g. get pdf-counts to estimate DNN-output priors for data analysis) Verbosity: level 1 => print frequencies and histogram Usage: analyze-counts [options] <alignments-rspecifier> <counts-wxfilename> e.g.: analyze-counts ark:1.ali prior.counts Show phone counts by: ali-to-phones --per-frame=true ark:1.ali ark:- | analyze-counts --verbose=1 ark:- - >/dev/null Note: this is deprecated, see post-to-tacc. |
post-to-phone-post | Convert posteriors (or pdf-level posteriors) to phone-level posteriors See also: post-to-pdf-post, post-to-weights, get-post-on-ali First, the usage when your posteriors are on transition-ids (the normal case): Usage: post-to-phone-post [options] <model> <post-rspecifier> <phone-post-wspecifier> e.g.: post-to-phone-post --binary=false 1.mdl "ark:ali-to-post 1.ali|" ark,t:- Next, the usage when your posteriors are on pdfs (e.g. if they are neural-net posteriors) post-to-phone-post --transition-id-counts=final.tacc 1.mdl ark:pdf_post.ark ark,t:- See documentation of --transition-id-counts option for more details. |
post-to-pdf-post | This program turns per-frame posteriors, which have transition-ids as the integers, into pdf-level posteriors See also: post-to-phone-post, post-to-weights, get-post-on-ali Usage: post-to-pdf-post [options] <model-file> <posteriors-rspecifier> <posteriors-wspecifier> e.g.: post-to-pdf-post 1.mdl ark:- ark:- |
logprob-to-post | Convert a matrix of log-probabilities (e.g. from nnet-logprob) to posteriors Usage: logprob-to-post [options] <logprob-matrix-rspecifier> <posteriors-wspecifier> e.g.: nnet-logprob [args] | logprob-to-post ark:- ark:1.post Caution: in this particular example, the output would be posteriors of pdf-ids, rather than transition-ids (c.f. post-to-pdf-post) |
prob-to-post | Convert a matrix of probabilities (e.g. from nnet-logprob2) to posteriors Usage: prob-to-post [options] <prob-matrix-rspecifier> <posteriors-wspecifier> e.g.: nnet-logprob2 [args] | prob-to-post ark:- ark:1.post Caution: in this particular example, the output would be posteriors of pdf-ids, rather than transition-ids (c.f. post-to-pdf-post) |
copy-post | Copy archives of posteriors, with optional scaling Usage: copy-post <post-rspecifier> <post-wspecifier> See also: post-to-weights, scale-post, sum-post, weight-post ... |
matrix-sum | Add matrices (supports various forms) Type one usage: matrix-sum [options] <matrix-in-rspecifier1> [<matrix-in-rspecifier2> <matrix-in-rspecifier3> ...] <matrix-out-wspecifier> e.g.: matrix-sum ark:1.weights ark:2.weights ark:combine.weights This usage supports the --scale1 and --scale2 options to scale the first two input tables. Type two usage (sums a single table input to produce a single output): matrix-sum [options] <matrix-in-rspecifier> <matrix-out-wxfilename> e.g.: matrix-sum --binary=false mats.ark sum.mat Type three usage (sums or averages single-file inputs to produce a single output): matrix-sum [options] <matrix-in-rxfilename1> <matrix-in-rxfilename2> ... <matrix-out-wxfilename> e.g.: matrix-sum --binary=false 1.mat 2.mat 3.mat sum.mat See also: matrix-sum-rows, copy-matrix |
build-pfile-from-ali | Build pfiles for neural network training from alignment. Usage: build-pfile-from-ali [options] <model> <alignments-rspecifier> <feature-rspecifier> <pfile-wspecifier> e.g.: build-pfile-from-ali 1.mdl ark:1.ali features "|pfile_create -i - -o pfile.1 -f 143 -l 1" |
get-post-on-ali | Given input posteriors, e.g. derived from lattice-to-post, and an alignment typically derived from the best path of a lattice, outputs the probability in the posterior of the corresponding index in the alignment, or zero if it was not there. These are output as a vector of weights, one per utterance. While, by default, lattice-to-post (as a source of posteriors) and sources of alignments such as lattice-best-path will output transition-ids as the index, it will generally make sense to either convert these to pdf-ids using post-to-pdf-post and ali-to-pdf respectively, or to phones using post-to-phone-post and (ali-to-phones --per-frame=true). Since this program only sees the integer indexes, it does not care what they represent-- but of course they should match (e.g. don't input posteriors with transition-ids and alignments with pdf-ids). See http://kaldi-asr.org/doc/hmm.html#transition_model_identifiers for an explanation of these types of indexes. See also: post-to-tacc, weight-post, post-to-weights, reverse-weights Usage: get-post-on-ali [options] <posteriors-rspecifier> <ali-rspecifier> <weights-wspecifier> e.g.: get-post-on-ali ark:post.ark ark,s,cs:ali.ark ark:weights.ark |
tree-info | Print information about decision tree (mainly the number of pdfs), to stdout Usage: tree-info <tree-in> |
am-info | Write to standard output various properties of a model, of any type (reads only the transition model) Usage: am-info [options] <model-in> e.g.: am-info 1.mdl |
vector-sum | Add vectors (e.g. weights, transition-accs; speaker vectors) If you need to scale the inputs, use vector-scale on the inputs Type one usage: vector-sum [options] <vector-in-rspecifier1> [<vector-in-rspecifier2> <vector-in-rspecifier3> ...] <vector-out-wspecifier> e.g.: vector-sum ark:1.weights ark:2.weights ark:combine.weights Type two usage (sums a single table input to produce a single output): vector-sum [options] <vector-in-rspecifier> <vector-out-wxfilename> e.g.: vector-sum --binary=false vecs.ark sum.vec Type three usage (sums single-file inputs to produce a single output): vector-sum [options] <vector-in-rxfilename1> <vector-in-rxfilename2> ... <vector-out-wxfilename> e.g.: vector-sum --binary=false 1.vec 2.vec 3.vec sum.vec See also: copy-vector, dot-weights |
matrix-sum-rows | Sum the rows of an input table of matrices and output the corresponding table of vectors Usage: matrix-sum-rows [options] <matrix-rspecifier> <vector-wspecifier> e.g.: matrix-sum-rows ark:- ark:- | vector-sum ark:- sum.vec See also: matrix-sum, vector-sum |
est-pca | Estimate PCA transform; dimension reduction is optional (if not specified we don't reduce the dimension; if you specify --normalize-variance=true, we normalize the (centered) covariance of the features, and if you specify --normalize-mean=true the mean is also normalized). So a variety of transform types are supported. Because this type of transform does not need too much data to estimate robustly, we don't support separate accumulator files; this program reads in the features directly. For large datasets you may want to subset the features (see example below) By default the program reads in matrices (e.g. features), but with --read-vectors=true, can read in vectors (e.g. iVectors). Usage: est-pca [options] (<feature-rspecifier>|<vector-rspecifier>) <pca-matrix-out> e.g.: utils/shuffle_list.pl data/train/feats.scp | head -n 5000 | sort | \ est-pca --dim=50 scp:- some/dir/0.mat |
sum-lda-accs | Sum stats obtained with acc-lda. Usage: sum-lda-accs [options] <stats-out> <stats-in1> <stats-in2> ... |
sum-mllt-accs | Sum stats obtained with gmm-acc-mllt. Usage: sum-mllt-accs [options] <stats-out> <stats-in1> <stats-in2> ... |
transform-vec | This program applies a linear or affine transform to individual vectors, e.g. iVectors. It is transform-feats, except it works on vectors rather than matrices, and expects a single transform matrix rather than possibly a table of matrices Usage: transform-vec [options] <transform-rxfilename> <feats-rspecifier> <feats-wspecifier> See also: transform-feats, est-pca |
align-text | Computes alignment between two sentences with the same key in the two given input text-rspecifiers. The current implementation uses Levenshtein distance as the distance metric. The input text file looks as follows: key1 a b c key2 d e The output alignment file looks as follows: key1 a a ; b <eps> ; c c key2 d f ; e e where the aligned pairs are separated by ";" Usage: align-text [options] <text1-rspecifier> <text2-rspecifier> \ <alignment-wspecifier> e.g.: align-text ark:text1.txt ark:text2.txt ark,t:alignment.txt See also: compute-wer, Example scoring script: egs/wsj/s5/steps/score_kaldi.sh |
matrix-dim | Print dimension info on an input matrix (rows then cols, separated by tab), to standard output. Output for single filename: rows[tab]cols. Output per line for archive of files: key[tab]rows[tab]cols Usage: matrix-dim [options] <matrix-in>|<in-rspecifier> e.g.: matrix-dim final.mat | cut -f 2 See also: feat-to-len, feat-to-dim |
post-to-smat | This program turns an archive of per-frame posteriors, e.g. from ali-to-post | post-to-pdf-post, into an archive of SparseMatrix. This is just a format transformation. This may not make sense if the indexes in question are one-based (at least, you'd have to increase the dimension by one). See also: post-to-phone-post, ali-to-post, post-to-pdf-post Usage: post-to-smat [options] <posteriors-rspecifier> <sparse-matrix-wspecifier> e.g.: post-to-smat --dim=1038 ark:- ark:- |
compile-graph | Creates HCLG decoding graph. Similar to mkgraph.sh but done in code. Usage: compile-graph [options] <tree-in> <model-in> <lexicon-fst-in> <grammar-rspecifier> <hclg-wspecifier> e.g.: compile-graph tree 1.mdl L_disambig.fst G.fst HCLG.fst |
compare-int-vector | Compare vectors of integers (e.g. phone alignments) Prints to stdout fields of the form: <utterance-id> <num-frames-in-utterance> <num-frames-that-differ> e.g.: SWB1_A_31410_32892 420 36 Usage: compare-int-vector [options] <vector1-rspecifier> <vector2-rspecifier> e.g. compare-int-vector scp:foo.scp scp:bar.scp > comparison E.g. the inputs might come from ali-to-phones. Warnings are printed if the vector lengths differ for a given utterance-id, and in those cases, the number of frames printed will be the smaller of the two lengths. See also: ali-to-phones, copy-int-vector |
latgen-incremental-mapped | Generate lattices, reading log-likelihoods as matrices (model is needed only for the integer mappings in its transition-model) The lattice determinization algorithm here can operate incrementally. Usage: latgen-incremental-mapped [options] trans-model-in (fst-in|fsts-rspecifier) loglikes-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
compute-gop | Compute Goodness Of Pronunciation (GOP) from a matrix of probabilities (e.g. from nnet3-compute). Usage: compute-gop [options] <model> <alignments-rspecifier> <prob-matrix-rspecifier> <gop-wspecifier> [<phone-feature-wspecifier>] e.g.: nnet3-compute [args] | compute-gop 1.mdl ark:ali-phone.1 ark:- ark:gop.1 ark:phone-feat.1 |
chain-est-phone-lm | Initialize un-smoothed phone language model for 'chain' training Output in FST format (epsilon-free deterministic acceptor) Usage: chain-est-phone-lm [options] <phone-seqs-rspecifier> <phone-lm-fst-out> The phone-sequences are used to train a language model. e.g.: gunzip -c input_dir/ali.*.gz | ali-to-phones input_dir/final.mdl ark:- ark:- | \ chain-est-phone-lm --leftmost-context-questions=dir/leftmost_questions.txt ark:- dir/phone_G.fst |
chain-get-supervision | Get a 'chain' supervision object for each file of training data. This will normally be piped into nnet3-chain-get-egs, where it will be split up into pieces and combined with the features. Input can come in two formats: from alignments (from ali-to-phones --write-lengths=true), or from lattices (e.g. derived from aligning the data, see steps/align_fmllr_lats.sh) that have been converted to phone-level lattices with lattice-align-phones --replace-output-symbols=true. Usage: chain-get-supervision [options] <tree> <transition-model> [<phones-with-lengths-rspecifier>|<phone-lattice-rspecifier>] <supervision-wspecifier> See steps/nnet3/chain/get_egs.sh for example |
chain-make-den-fst | Creates 'denominator' FST for 'chain' training Outputs in FST format. <denominator-fst-out> is an epsilon-free acceptor <normalization-fst-out> is a modified version of <denominator-fst> (w.r.t. initial and final probs) that is used in example generation. Usage: chain-make-den-fst [options] <tree> <transition-model> <phone-lm-fst> <denominator-fst-out> <normalization-fst-out> e.g.: chain-make-den-fst dir/tree dir/0.trans_mdl dir/phone_lm.fst dir/den.fst dir/normalization.fst |
nnet3-chain-get-egs | Get frame-by-frame examples of data for nnet3+chain neural network training. This involves breaking up utterances into pieces of a fixed size. Input will come from chain-get-supervision. Note: if <normalization-fst> is not supplied the egs will not be ready for training; in that case they should later be processed with nnet3-chain-normalize-egs Usage: nnet3-chain-get-egs [options] [<normalization-fst>] <features-rspecifier> <chain-supervision-rspecifier> <egs-wspecifier> An example [where $feats expands to the actual features]: chain-get-supervision [args] | \ nnet3-chain-get-egs --left-context=25 --right-context=9 --num-frames=150,100,90 dir/normalization.fst \ "$feats" ark,s,cs:- ark:cegs.1.ark Note: the --frame-subsampling-factor option must be the same as given to chain-get-supervision. |
nnet3-chain-copy-egs | Copy examples for nnet3+chain network training, possibly changing the binary mode. Supports multiple wspecifiers, in which case it will write the examples round-robin to the outputs. Usage: nnet3-chain-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...] e.g. nnet3-chain-copy-egs ark:train.cegs ark,t:text.cegs or: nnet3-chain-copy-egs ark:train.cegs ark:1.cegs ark:2.cegs |
nnet3-chain-merge-egs | This copies nnet3+chain training examples from input to output, merging them into composite examples. The --minibatch-size option controls how many egs are merged into a single output eg. Usage: nnet3-chain-merge-egs [options] <egs-rspecifier> <egs-wspecifier> e.g. nnet3-chain-merge-egs --minibatch-size=128 ark:1.cegs ark:- | nnet3-chain-train-simple ... See also nnet3-chain-copy-egs |
nnet3-chain-shuffle-egs | Copy nnet3+chain examples for neural network training, from the input to output, while randomly shuffling the order. This program will keep all of the examples in memory at once, unless you use the --buffer-size option Usage: nnet3-chain-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier> nnet3-chain-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs |
nnet3-chain-subset-egs | Creates a random subset of the input nnet3+chain examples, of a specified size. Uses no more memory than the size of the subset. Usage: nnet3-chain-subset-egs [options] <egs-rspecifier> <egs-wspecifier> e.g. nnet3-chain-get-egs [args] ark:- | nnet3-chain-subset-egs --n=1000 ark:- ark:subset.cegs |
nnet3-chain-acc-lda-stats | Accumulate statistics in the same format as acc-lda (i.e. stats for estimation of LDA and similar types of transform), starting from nnet+chain training examples. This program puts the features through the network, and the network output will be the features; the supervision in the training examples is used for the class labels. Used in obtaining feature transforms that help nnet training work better. Note: the time boundaries it gets from the chain supervision will be a little fuzzy (which is not ideal), but it should not matter much in this situation Usage: nnet3-chain-acc-lda-stats [options] <raw-nnet-in> <training-examples-in> <lda-stats-out> e.g.: nnet3-chain-acc-lda-stats 0.raw ark:1.cegs 1.acc See also: nnet-get-feature-transform |
nnet3-chain-train | Train nnet3+chain neural network parameters with backprop and stochastic gradient descent. Minibatches are to be created by nnet3-chain-merge-egs in the input pipeline. This training program is single-threaded (best to use it with a GPU). Usage: nnet3-chain-train [options] <raw-nnet-in> <denominator-fst-in> <chain-training-examples-in> <raw-nnet-out> e.g.: nnet3-chain-train 1.raw den.fst 'ark:nnet3-chain-merge-egs ark:1.cegs ark:-|' 2.raw |
nnet3-chain-compute-prob | Computes and prints in logging messages the average log-prob per frame of the given data with an nnet3+chain neural net. The input of this is the output of e.g. nnet3-chain-get-egs | nnet3-chain-merge-egs. Usage: nnet3-chain-compute-prob [options] <raw-nnet3-model-in> <denominator-fst> <training-examples-in> e.g.: nnet3-chain-compute-prob 0.mdl den.fst ark:valid.egs |
nnet3-chain-combine | Using a subset of training or held-out nnet3+chain examples, compute the average over the first n nnet models, where n is chosen to maximize the 'chain' objective function. Note that the order of models is reversed before being fed into this binary, so it is actually the last n models that are combined. Inputs and outputs are nnet3 raw nnets. Usage: nnet3-chain-combine [options] <den-fst> <raw-nnet-in1> <raw-nnet-in2> ... <raw-nnet-inN> <chain-examples-in> <raw-nnet-out> e.g.: nnet3-chain-combine den.fst 35.raw 36.raw 37.raw 38.raw ark:valid.cegs final.raw |
nnet3-chain-normalize-egs | Add weights from 'normalization' FST to nnet3+chain examples. Should be done if and only if the <normalization-fst> argument of nnet3-chain-get-egs was not supplied when the original egs were created. Usage: nnet3-chain-normalize-egs [options] <normalization-fst> <egs-rspecifier> <egs-wspecifier> e.g. nnet3-chain-normalize-egs dir/normalization.fst ark:train_in.cegs ark:train_out.cegs |
nnet3-chain-e2e-get-egs | Get frame-by-frame examples of data for nnet3+chain end-to-end neural network training. Note: if <normalization-fst> is not supplied, the egs will not be ready for training; in that case they should later be processed with nnet3-chain-normalize-egs. Usage: nnet3-chain-e2e-get-egs [options] [<normalization-fst>] <features-rspecifier> <fst-rspecifier> <trans-model> <egs-wspecifier> |
nnet3-chain-compute-post | Compute posteriors from the 'denominator FST' of a chain model and optionally map them to phones. Usage: nnet3-chain-compute-post [options] <nnet-in> <den-fst> <features-rspecifier> <matrix-wspecifier> e.g.: nnet3-chain-compute-post --transform-mat=transform.mat final.raw den.fst scp:feats.scp ark:nnet_prediction.ark See also: nnet3-compute. See steps/nnet3/chain/get_phone_post.sh for an example of usage. Note: this program makes *extremely inefficient* use of the GPU; you are advised to run it on CPU until that is improved. |
batched-wav-nnet3-cuda | Reads in wav file(s) and simulates online decoding with neural nets (nnet3 setup), with optional iVector-based speaker adaptation and optional endpointing. Note: some configuration values and inputs are set via config files whose filenames are passed as options Usage: batched-wav-nnet3-cuda [options] <nnet3-in> <fst-in> <wav-rspecifier> <lattice-wspecifier> |
add-deltas | Add deltas (typically to raw MFCC or PLP features). Usage: add-deltas [options] in-rspecifier out-wspecifier |
add-deltas-sdc | Add shifted delta cepstra (typically to raw MFCC or PLP features). Usage: add-deltas-sdc [options] in-rspecifier out-wspecifier |
append-post-to-feats | Append posteriors to features Usage: append-post-to-feats [options] <in-rspecifier1> <in-rspecifier2> <out-wspecifier> or: append-post-to-feats [options] <in-rxfilename1> <in-rxfilename2> <out-wxfilename> e.g.: append-post-to-feats --post-dim=50 ark:input.ark scp:post.scp ark:output.ark See also: paste-feats, concat-feats, append-vector-to-feats |
append-vector-to-feats | Append a vector to each row of input feature files Usage: append-vector-to-feats <in-rspecifier1> <in-rspecifier2> <out-wspecifier> or: append-vector-to-feats <in-rxfilename1> <in-rxfilename2> <out-wxfilename> See also: paste-feats, concat-feats |
apply-cmvn | Apply cepstral mean and (optionally) variance normalization Per-utterance by default, or per-speaker if utt2spk option provided Usage: apply-cmvn [options] (<cmvn-stats-rspecifier>|<cmvn-stats-rxfilename>) <feats-rspecifier> <feats-wspecifier> e.g.: apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp scp:data/train/feats.scp ark:- See also: modify-cmvn-stats, matrix-sum, compute-cmvn-stats |
apply-cmvn-sliding | Apply sliding-window cepstral mean (and optionally variance) normalization per utterance. If center == true, window is centered on frame being normalized; otherwise it precedes it in time. Useful for speaker-id; see also apply-cmvn-online Usage: apply-cmvn-sliding [options] <feats-rspecifier> <feats-wspecifier> |
compare-feats | Computes the relative difference between two sets of features per dimension, and an average difference. Can be used to figure out how different two sets of features are. Inputs must have the same dimension. Prints to stdout a similarity metric vector that is 1.0 per dimension if the features are identical, and <1.0 otherwise, plus an average overall similarity value. Usage: compare-feats [options] <in-rspecifier1> <in-rspecifier2> e.g.: compare-feats ark:1.ark ark:2.ark |
compose-transforms | Compose (affine or linear) feature transforms Usage: compose-transforms [options] (<transform-A-rspecifier>|<transform-A-rxfilename>) (<transform-B-rspecifier>|<transform-B-rxfilename>) (<transform-out-wspecifier>|<transform-out-wxfilename>) Note: it does matrix multiplication (A B) so B is the transform that gets applied to the features first. If b-is-affine = true, then assume last column of b corresponds to offset e.g.: compose-transforms 1.mat 2.mat 3.mat compose-transforms 1.mat ark:2.trans ark:3.trans compose-transforms ark:1.trans ark:2.trans ark:3.trans See also: transform-feats, transform-vec, extend-transform-dim, est-lda, est-pca |
compute-and-process-kaldi-pitch-feats | Apply Kaldi pitch extractor and pitch post-processor, starting from wav input. Equivalent to compute-kaldi-pitch-feats | process-kaldi-pitch-feats, except that it is able to simulate online pitch extraction; see options like --frames-per-chunk, --simulate-first-pass-online, --recompute-frame. Usage: compute-and-process-kaldi-pitch-feats [options...] <wav-rspecifier> <feats-wspecifier> e.g. compute-and-process-kaldi-pitch-feats --simulate-first-pass-online=true \ --frames-per-chunk=10 --sample-frequency=8000 scp:wav.scp ark:- See also: compute-kaldi-pitch-feats, process-kaldi-pitch-feats |
compute-cmvn-stats | Compute cepstral mean and variance normalization statistics If wspecifier provided: per-utterance by default, or per-speaker if spk2utt option provided; if wxfilename: global Usage: compute-cmvn-stats [options] <feats-rspecifier> (<stats-wspecifier>|<stats-wxfilename>) e.g.: compute-cmvn-stats --spk2utt=ark:data/train/spk2utt scp:data/train/feats.scp ark,scp:/foo/bar/cmvn.ark,data/train/cmvn.scp See also: apply-cmvn, modify-cmvn-stats |
compute-cmvn-stats-two-channel | Compute cepstral mean and variance normalization statistics. Specialized for two-sided telephone data, where we only accumulate the louder of the two channels at each frame (and add it to that side's stats). Reads a 'reco2file_and_channel' file, normally with lines like: sw02001-A sw02001 A sw02001-B sw02001 B sw02005-A sw02005 A sw02005-B sw02005 B interpreted as <utterance-id> <call-id> <side>; for each <call-id> that has two sides, does the 'only-the-louder' computation, and otherwise does per-utterance stats in the normal way. Note: loudness is judged by the first feature component, either energy or c0; only applicable to MFCCs or PLPs (this code could be modified to handle filterbanks). Usage: compute-cmvn-stats-two-channel [options] <reco2file-and-channel> <feats-rspecifier> <stats-wspecifier> e.g.: compute-cmvn-stats-two-channel data/train_unseg/reco2file_and_channel scp:data/train_unseg/feats.scp ark,t:- |
compute-fbank-feats | Create Mel-filter bank (FBANK) feature files. Usage: compute-fbank-feats [options...] <wav-rspecifier> <feats-wspecifier> |
compute-kaldi-pitch-feats | Apply Kaldi pitch extractor, starting from wav input. Output is 2-dimensional features consisting of (NCCF, pitch in Hz), where NCCF is between -1 and 1, and higher for voiced frames. You will typically pipe this into process-kaldi-pitch-feats. Usage: compute-kaldi-pitch-feats [options...] <wav-rspecifier> <feats-wspecifier> e.g. compute-kaldi-pitch-feats --sample-frequency=8000 scp:wav.scp ark:- See also: process-kaldi-pitch-feats, compute-and-process-kaldi-pitch-feats |
compute-mfcc-feats | Create MFCC feature files. Usage: compute-mfcc-feats [options...] <wav-rspecifier> <feats-wspecifier> |
compute-plp-feats | Create PLP feature files. Usage: compute-plp-feats [options...] <wav-rspecifier> <feats-wspecifier> |
compute-spectrogram-feats | Create spectrogram feature files. Usage: compute-spectrogram-feats [options...] <wav-rspecifier> <feats-wspecifier> |
concat-feats | Concatenate feature files (assuming they have the same dimensions), so the output file has the sum of the num-frames of the inputs. Usage: concat-feats <in-rxfilename1> <in-rxfilename2> [<in-rxfilename3> ...] <out-wxfilename> e.g. concat-feats mfcc/foo.ark:12343 mfcc/foo.ark:56789 - See also: copy-feats, append-vector-to-feats, paste-feats |
copy-feats | Copy features [and possibly change format] Usage: copy-feats [options] <feature-rspecifier> <feature-wspecifier> or: copy-feats [options] <feats-rxfilename> <feats-wxfilename> e.g.: copy-feats ark:- ark,scp:foo.ark,foo.scp or: copy-feats ark:foo.ark ark,t:txt.ark See also: copy-matrix, copy-feats-to-htk, copy-feats-to-sphinx, select-feats, extract-feature-segments, subset-feats, subsample-feats, splice-feats, paste-feats, concat-feats |
copy-feats-to-htk | Save features as HTK files: Each utterance will be stored as a unique HTK file in a specified directory. The HTK filename will correspond to the utterance-id (key) in the input table, with the specified extension. Usage: copy-feats-to-htk [options] in-rspecifier Example: copy-feats-to-htk --output-dir=/tmp/HTK-features --output-ext=fea scp:feats.scp |
copy-feats-to-sphinx | Save features as Sphinx files: Each utterance will be stored as a unique Sphinx file in a specified directory. The Sphinx filename will correspond to the utterance-id (key) in the input table, with the specified extension. Usage: copy-feats-to-sphinx [options] in-rspecifier Example: copy-feats-to-sphinx --output-dir=/tmp/sphinx-features --output-ext=fea scp:feats.scp |
extend-transform-dim | Read in transform from dimension d -> d (affine or linear), and output a transform from dimension e -> e (with e >= d, and e controlled by option --new-dimension). This new transform will leave the extra dimension unaffected, and transform the old dimensions in the same way. Usage: extend-transform-dim [options] (transform-A-rspecifier|transform-A-rxfilename) (transform-out-wspecifier|transform-out-wxfilename) E.g.: extend-transform-dim --new-dimension=117 in.mat big.mat |
extract-feature-segments | Create feature files by segmenting input files. Note: this program should no longer be needed now that 'ranges' in scp files are supported; search for 'ranges' in http://kaldi-asr.org/doc/io_tut.html, or see the script utils/data/subsegment_data_dir.sh. Usage: extract-feature-segments [options...] <feats-rspecifier> <segments-file> <feats-wspecifier> (segments-file has lines like: output-utterance-id input-utterance-or-spk-id 1.10 2.36) |
extract-segments | Extract segments from a large audio file in WAV format. Usage: extract-segments [options] <wav-rspecifier> <segments-file> <wav-wspecifier> e.g. extract-segments scp:wav.scp segments ark:- | <some-other-program> segments-file format: each line is either <segment-id> <recording-id> <start-time> <end-time> e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5 or (less frequently, and not supported in scripts): <segment-id> <wav-file-name> <start-time> <end-time> <channel> where <channel> will normally be 0 (left) or 1 (right) e.g. call-861225-A-0050-0065 call-861225 5.0 6.5 1 And <end-time> of -1 means the segment runs till the end of the WAV file See also: extract-feature-segments, wav-copy, wav-to-duration |
feat-to-dim | Reads an archive of features. If second argument is wxfilename, writes the feature dimension of the first feature file; if second argument is wspecifier, writes an archive of the feature dimension, indexed by utterance id. Usage: feat-to-dim [options] <feat-rspecifier> (<dim-wspecifier>|<dim-wxfilename>) e.g.: feat-to-dim scp:feats.scp - |
feat-to-len | Reads an archive of features and writes a corresponding archive that maps utterance-id to utterance length in frames, or (with one argument) print to stdout the total number of frames in the input archive. Usage: feat-to-len [options] <in-rspecifier> [<out-wspecifier>] e.g.: feat-to-len scp:feats.scp ark,t:feats.lengths or: feat-to-len scp:feats.scp |
fmpe-acc-stats | Compute statistics for fMPE training Usage: fmpe-acc-stats [options...] <fmpe-object> <feat-rspecifier> <feat-diff-rspecifier> <gselect-rspecifier> <stats-out> Note: gmm-fmpe-acc-stats avoids computing the features an extra time |
fmpe-apply-transform | Apply fMPE transform to features Usage: fmpe-apply-transform [options...] <fmpe-object> <feat-rspecifier> <gselect-rspecifier> <feat-wspecifier> |
fmpe-est | Do one iteration of learning (modified gradient descent) on fMPE transform Usage: fmpe-est [options...] <fmpe-in> <stats-in> <fmpe-out> E.g. fmpe-est 1.fmpe 1.accs 2.fmpe |
fmpe-init | Initialize fMPE transform (to zero) Usage: fmpe-init [options...] <diag-gmm-in> <fmpe-out> E.g. fmpe-init 1.ubm 1.fmpe |
fmpe-sum-accs | Sum fMPE stats Usage: fmpe-sum-accs [options...] <accs-out> <stats-in1> <stats-in2> ... E.g. fmpe-sum-accs 1.accs 1.1.accs 1.2.accs 1.3.accs 1.4.accs |
get-full-lda-mat | This is a special-purpose program to be used in "predictive SGMMs". It takes in an LDA+MLLT matrix, and the original "full" LDA matrix as output by the --write-full-matrix option of est-lda; and it writes out a "full" LDA+MLLT matrix formed by the LDA+MLLT matrix plus the remaining rows of the "full" LDA matrix; and also writes out its inverse Usage: get-full-lda-mat [options] <lda-mllt-rxfilename> <full-lda-rxfilename> <full-lda-mllt-wxfilename> [<inv-full-lda-mllt-wxfilename>] E.g.: get-full-lda-mat final.mat full.mat full_lda_mllt.mat full_lda_mllt_inv.mat |
interpolate-pitch | This is a rather special-purpose program which processes 2-dimensional features consisting of (prob-of-voicing, pitch). By default we do model-based pitch smoothing and interpolation (see code), or if --linear-interpolation=true, just linear interpolation across gaps where pitch == 0 (not predicted). Usage: interpolate-pitch [options...] <feats-rspecifier> <feats-wspecifier> |
modify-cmvn-stats | Copy cepstral mean/variance stats so that some dimensions have 'fake' stats that will skip normalization Usage: modify-cmvn-stats [options] [<fake-dims>] <in-rspecifier> <out-wspecifier> e.g.: modify-cmvn-stats 13:14:15 ark:- ark:- or: modify-cmvn-stats --convert-to-mean-and-var=true ark:- ark:- See also: compute-cmvn-stats |
paste-feats | Paste feature files (assuming they have about the same durations, see --length-tolerance), appending the features on each frame; think of the unix command 'paste'. Usage: paste-feats <in-rspecifier1> <in-rspecifier2> [<in-rspecifier3> ...] <out-wspecifier> or: paste-feats <in-rxfilename1> <in-rxfilename2> [<in-rxfilename3> ...] <out-wxfilename> e.g. paste-feats ark:feats1.ark "ark:select-feats 0-3 ark:feats2.ark ark:- |" ark:feats-out.ark or: paste-feats foo.mat bar.mat baz.mat See also: copy-feats, copy-matrix, append-vector-to-feats, concat-feats |
post-to-feats | Convert posteriors to features Usage: post-to-feats [options] <in-rspecifier> <out-wspecifier> or: post-to-feats [options] <in-rxfilename> <out-wxfilename> e.g.: post-to-feats --post-dim=50 ark:post.ark ark:feat.ark See also: post-to-weights, feat-to-post, append-vector-to-feats, append-post-to-feats |
process-kaldi-pitch-feats | Post-process Kaldi pitch features, consisting of pitch and NCCF, into features suitable for input to ASR system. Default setup produces 3-dimensional features consisting of (pov-feature, pitch-feature, delta-pitch-feature), where pov-feature is warped NCCF, pitch-feature is log-pitch with POV-weighted mean subtraction over 1.5 second window, and delta-pitch-feature is delta feature computed on raw log pitch. In general, you can select from four features: (pov-feature, pitch-feature, delta-pitch-feature, raw-log-pitch), produced in that order, by setting the boolean options (--add-pov-feature, --add-normalized-log-pitch, --add-delta-pitch and --add-raw-log-pitch) Usage: process-kaldi-pitch-feats [options...] <feat-rspecifier> <feats-wspecifier> e.g.: compute-kaldi-pitch-feats [args] ark:- | process-kaldi-pitch-feats ark:- ark:feats.ark See also: compute-kaldi-pitch-feats, compute-and-process-kaldi-pitch-feats |
process-pitch-feats | This is a rather special-purpose program which processes 2-dimensional features consisting of (prob-of-voicing, pitch) into something suitable to put into a speech recognizer; first use interpolate-pitch. Usage: process-pitch-feats [options...] <feats-rspecifier> <feats-wspecifier> |
select-feats | Select certain dimensions of the feature file; think of it as the unix command cut -f ... Usage: select-feats <selection> <in-rspecifier> <out-wspecifier> e.g. select-feats 0,24-22,3-12 scp:feats.scp ark,scp:feat-red.ark,feat-red.scp See also copy-feats, extract-feature-segments, subset-feats, subsample-feats |
shift-feats | Copy features, and possibly shift them while maintaining the num-frames. Usage: shift-feats [options] <feature-rspecifier> <feature-wspecifier> or: shift-feats [options] <feats-rxfilename> <feats-wxfilename> e.g.: shift-feats --shift=-1 foo.scp bar.ark or: shift-feats --shift=1 foo.mat bar.mat See also: copy-feats, copy-matrix, select-feats, subset-feats, subsample-feats, splice-feats, paste-feats, concat-feats, extract-feature-segments |
splice-feats | Splice features with left and right context (e.g. prior to LDA) Usage: splice-feats [options] <feature-rspecifier> <feature-wspecifier> e.g.: splice-feats scp:feats.scp ark:- |
subsample-feats | Sub-samples features by taking every n'th frame. With negative values of n, will repeat each frame n times (e.g. --n=-2 will repeat each frame twice) Usage: subsample-feats [options] <in-rspecifier> <out-wspecifier> e.g. subsample-feats --n=2 ark:- ark:- |
subset-feats | Copy a subset of features (by default, the first n feature files) Usually used where only a small amount of data is needed Note: if you want a specific subset, it's usually best to filter the original .scp file with utils/filter_scp.pl (possibly with the --exclude option). The --include and --exclude options of this program are intended for specialized uses. The --include and --exclude options are mutually exclusive, and both cause the --n option to be ignored. Usage: subset-feats [options] <in-rspecifier> <out-wspecifier> e.g.: subset-feats --n=10 ark:- ark:- or: subset-feats --include=include_uttlist ark:- ark:- or: subset-feats --exclude=exclude_uttlist ark:- ark:- See also extract-feature-segments, select-feats, subsample-feats |
transform-feats | Apply transform (e.g. LDA; HLDA; fMLLR/CMLLR; MLLT/STC) Linear transform if transform-num-cols == feature-dim, affine if transform-num-cols == feature-dim+1 (->append 1.0 to features) Per-utterance by default, or per-speaker if utt2spk option provided Global if transform-rxfilename provided. Usage: transform-feats [options] (<transform-rspecifier>|<transform-rxfilename>) <feats-rspecifier> <feats-wspecifier> See also: transform-vec, copy-feats, compose-transforms |
wav-copy | Copy wave file or archives of wave files Usage: wav-copy [options] <wav-rspecifier> <wav-wspecifier> or: wav-copy [options] <wav-rxfilename> <wav-wxfilename> e.g. wav-copy scp:wav.scp ark:- or: wav-copy wav.ark:123456 - See also: wav-to-duration, extract-segments |
wav-reverberate | Corrupts the wave files supplied via input pipe with the specified room-impulse response (rir_matrix) and additive noise distortions (specified by corresponding files). Usage: wav-reverberate [options...] <wav-in-rxfilename> <wav-out-wxfilename> e.g. wav-reverberate --duration=20.25 --impulse-response=rir.wav --additive-signals='noise1.wav,noise2.wav' --snrs='20.0,15.0' --start-times='0,17.8' input.wav output.wav |
wav-to-duration | Read wav files and output an archive consisting of a single float: the duration of each one in seconds. Usage: wav-to-duration [options...] <wav-rspecifier> <duration-wspecifier> E.g.: wav-to-duration scp:wav.scp ark,t:- See also: wav-copy, extract-segments, feat-to-len. Currently this program may output many harmless warnings regarding nonzero exit status of pipes. |
fgmm-global-acc-stats | Accumulate stats for training a full-covariance GMM. Usage: fgmm-global-acc-stats [options] <model-in> <feature-rspecifier> <stats-out> e.g.: fgmm-global-acc-stats 1.mdl scp:train.scp 1.acc |
fgmm-global-sum-accs | Sum multiple accumulated stats files for full-covariance GMM training. Usage: fgmm-global-sum-accs [options] stats-out stats-in1 stats-in2 ... |
fgmm-global-est | Estimate a full-covariance GMM from the accumulated stats. Usage: fgmm-global-est [options] <model-in> <stats-in> <model-out> |
fgmm-global-merge | Combine a number of GMMs into a larger GMM, with #Gauss = sum(individual #Gauss). Outputs the full GMM, and a text file with the sizes of each individual GMM. Usage: fgmm-global-merge [options] fgmm-out sizes-file-out fgmm-in1 fgmm-in2 ... |
fgmm-global-to-gmm | Convert single full-covariance GMM to single diagonal-covariance GMM. Usage: fgmm-global-to-gmm [options] 1.fgmm 1.gmm |
fgmm-gselect | Precompute Gaussian indices for pruning (e.g. in training UBMs, SGMMs, tied-mixture systems) For each frame, gives a list of the n best Gaussian indices, sorted from best to worst. See also: gmm-gselect, copy-gselect, fgmm-gselect-to-post Usage: fgmm-gselect [options] <model-in> <feature-rspecifier> <gselect-wspecifier> The --gselect option (which takes an rspecifier) limits selection to a subset of indices: e.g.: fgmm-gselect "--gselect=ark:gunzip -c bigger.gselect.gz|" --n=20 1.gmm "ark:feature-command |" "ark,t:|gzip -c >1.gselect.gz" |
fgmm-global-get-frame-likes | Print out per-frame log-likelihoods for each utterance, as an archive of vectors of floats. If --average=true, prints out the average per-frame log-likelihood for each utterance, as a single float. Usage: fgmm-global-get-frame-likes [options] <model-in> <feature-rspecifier> <likes-out-wspecifier> e.g.: fgmm-global-get-frame-likes 1.mdl scp:train.scp ark:1.likes |
fgmm-global-copy | Copy a full-covariance GMM Usage: fgmm-global-copy [options] <model-in> <model-out> e.g.: fgmm-global-copy --binary=false 1.model - | less |
fgmm-global-gselect-to-post | Given features and Gaussian-selection (gselect) information for a full-covariance GMM, output per-frame posteriors for the selected indices. Also supports pruning the posteriors if they are below a stated threshold (and renormalizing the rest to sum to one). See also: gmm-gselect, fgmm-gselect, gmm-global-get-post, gmm-global-gselect-to-post Usage: fgmm-global-gselect-to-post [options] <model-in> <feature-rspecifier> <gselect-rspecifier> <post-wspecifier> e.g.: fgmm-global-gselect-to-post 1.ubm ark:- 'ark:gunzip -c 1.gselect|' ark:- |
fgmm-global-info | Write to standard output various properties of full-covariance GMM model This is for a single mixture of Gaussians, e.g. as used for a UBM. Usage: fgmm-global-info [options] <gmm> e.g.: fgmm-global-info 1.ubm |
fgmm-global-acc-stats-post | Accumulate stats from posteriors and features for instantiating a full-covariance GMM. See also fgmm-global-acc-stats. Usage: fgmm-global-acc-stats-post [options] <posterior-rspecifier> <number-of-components> <feature-rspecifier> <stats-out> e.g.: fgmm-global-acc-stats-post scp:post.scp 2048 scp:train.scp 1.acc |
fgmm-global-init-from-accs | Initialize a full-covariance GMM from the accumulated stats. This binary is similar to fgmm-global-est, but does not use a preexisting model. See also fgmm-global-est. Usage: fgmm-global-init-from-accs [options] <stats-in> <number-of-components> <model-out> |
fstdeterminizestar | Removes epsilons and determinizes in one step Usage: fstdeterminizestar [in.fst [out.fst] ] See also: fstdeterminizelog, lattice-determinize |
fstrmsymbols | With no options, replaces a subset of symbols with epsilon, wherever they appear on the input side of an FST. With --remove-arcs=true, will remove arcs that contain these symbols on the input side. With --penalty=<float>, will add the specified penalty to the cost of any arc that has one of the given symbols on its input side. In all cases, the option --apply-to-output=true (or, for back-compatibility, --remove-from-output=true) makes this apply to the output side. Usage: fstrmsymbols [options] <in-disambig-list> [<in.fst> [<out.fst>]] E.g: fstrmsymbols in.list < in.fst > out.fst <in-disambig-list> is an rxfilename specifying a file containing a list of integers representing symbols, in text form, one per line. |
fstisstochastic | Checks whether an FST is stochastic and exits with success if so. Prints out maximum error (in log units). Usage: fstisstochastic [ in.fst ] |
fstminimizeencoded | Minimizes FST after encoding [similar to fstminimize, but no weight-pushing] Usage: fstminimizeencoded [in.fst [out.fst] ] |
fstmakecontextfst | Constructs a context FST with a specified context-width and context-position. Outputs the context FST, and a file in Kaldi format that describes what the input labels mean. Note: this is very inefficient if there are a lot of phones, better to use fstcomposecontext instead Usage: fstmakecontextfst <phones-symbol-table> <subsequential-symbol> <ilabels-output-file> [<out-fst>] E.g.: fstmakecontextfst phones.txt 42 ilabels.sym > C.fst |
fstmakecontextsyms | Create input symbols for CLG Usage: fstmakecontextsyms phones-symtab ilabels_input_file [output-symtab.txt] E.g.: fstmakecontextsyms phones.txt ilabels.sym > context_symbols.txt |
fstaddsubsequentialloop | Adds a self-loop with the given subsequential symbol to the final states of an FST, as needed before composing with a context FST (C). Usage: fstaddsubsequentialloop subseq_sym [in.fst [out.fst] ] E.g.: fstaddsubsequentialloop 52 < LG.fst > LG_sub.fst |
fstaddselfloops | Adds self-loops to states of an FST to propagate disambiguation symbols through it They are added on each final state and each state with non-epsilon output symbols on at least one arc out of the state. Useful in conjunction with predeterminize Usage: fstaddselfloops in-disambig-list out-disambig-list [in.fst [out.fst] ] E.g: fstaddselfloops in.list out.list < in.fst > withloops.fst in.list and out.list are lists of integers, one per line, of the same length. |
fstrmepslocal | Removes some (but not all) epsilons, using an algorithm that will always reduce the number of arcs plus states. Has an option to preserve equivalence in the tropical or log semiring, and, if in the tropical semiring, stochasticity in either the log or tropical semiring. Usage: fstrmepslocal [in.fst [out.fst] ] |
fstcomposecontext | Composes on the left with a dynamically created context FST Usage: fstcomposecontext <ilabels-output-file> [<in.fst> [<out.fst>] ] E.g: fstcomposecontext ilabels.sym < LG.fst > CLG.fst |
fsttablecompose | Composition algorithm [between two FSTs of standard type, in tropical semiring] that is more efficient for certain cases, in particular where one of the FSTs (the left one, if --match-side=left) has a large out-degree. Usage: fsttablecompose (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)] |
fstrand | Generate random FST Usage: fstrand [out.fst] |
fstdeterminizelog | Determinizes in the log semiring Usage: fstdeterminizelog [in.fst [out.fst] ] See also fstdeterminizestar |
fstphicompose | Composition, where the right FST has "failure" (phi) transitions that are only taken where there was no match of a "real" label You supply the label corresponding to phi. Usage: fstphicompose phi-label (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)] E.g.: fstphicompose 54 a.fst b.fst c.fst or: fstphicompose 11 ark:a.fsts G.fst ark:b.fsts |
fstcopy | Copy tables/archives of FSTs, indexed by a string (e.g. utterance-id) Usage: fstcopy <fst-rspecifier> <fst-wspecifier> |
fstpushspecial | Pushes weights in an FST such that all the states in the FST have arcs and final-probs with weights that sum to the same amount (viewed as being in the log semiring). Thus, the "extra weight" is distributed throughout the FST. Tolerance parameter --delta controls how exact this is, and the speed. Usage: fstpushspecial [options] [in.fst [out.fst] ] |
fsts-to-transcripts | Reads a table of FSTs; for each element, finds the best path and prints out the output-symbol sequence (if --output-side=true), or input-symbol sequence otherwise. Usage: fsts-to-transcripts [options] <fsts-rspecifier> <transcriptions-wspecifier> e.g.: fsts-to-transcripts ark:train.fsts ark,t:train.text |
fsts-project | Reads a kaldi archive of FSTs; for each element, performs the project operation, either on the input (default) or on the output (if the option --project-output is true). Usage: fsts-project [options] <fsts-rspecifier> <fsts-wspecifier> e.g.: fsts-project ark:train.fsts ark,t:train.fsts see also: fstproject (from the OpenFst toolkit) |
fsts-union | Reads a kaldi archive of FSTs. Performs the FST operation union on all fsts sharing the same key. Assumes the archive is sorted by key. Usage: fsts-union [options] <fsts-rspecifier> <fsts-wspecifier> e.g.: fsts-union ark:keywords_tmp.fsts ark,t:keywords.fsts see also: fstunion (from the OpenFst toolkit) |
fsts-concat | Reads kaldi archives of FSTs and concatenates the fsts from all the rspecifiers. The fsts to concatenate must share the same key; the sequencing is given by the position of the arguments. Usage: fsts-concat [options] <fsts-rspecifier1> <fsts-rspecifier2> ... <fsts-wspecifier> e.g.: fsts-concat scp:fsts1.scp scp:fsts2.scp ... ark:fsts_out.ark see also: fstconcat (from the OpenFst toolkit) |
make-grammar-fst | Construct GrammarFst and write it to disk (or convert it to ConstFst and write that to disk instead). Mostly intended for demonstration and testing purposes (since it may be more convenient to construct GrammarFst from code). See kaldi-asr.org/doc/grammar.html Can also be used to prepare FSTs for this use, by calling PrepareForGrammarFst(), which does things like adding final-probs and making small structural tweaks to the FST Usage (1): make-grammar-fst [options] <top-level-fst> <symbol1> <fst1> \ [<symbol2> <fst2> ...] <fst-out> <symbol1>, <symbol2> are the integer ids of the corresponding user-defined nonterminal symbols (e.g. #nonterm:contact_list) in the phones.txt file. e.g.: make-grammar-fst --nonterm-phones-offset=317 HCLG.fst \ 320 HCLG1.fst HCLG_grammar.fst Usage (2): make-grammar-fst <fst-in> <fst-out> Prepare individual FST for compilation into GrammarFst. E.g. make-grammar-fst HCLG.fst HCLGmod.fst. The outputs of this will then become the arguments <top-level-fst>, <fst1>, ... for usage pattern (1). The --nonterm-phones-offset option is required for both usage patterns. |
gmm-init-mono | Initialize monophone GMM. Usage: gmm-init-mono <topology-in> <dim> <model-out> <tree-out> e.g.: gmm-init-mono topo 39 mono.mdl mono.tree |
gmm-est | Do Maximum Likelihood re-estimation of GMM-based acoustic model Usage: gmm-est [options] <model-in> <stats-in> <model-out> e.g.: gmm-est 1.mdl 1.acc 2.mdl |
gmm-acc-stats-ali | Accumulate stats for GMM training. Usage: gmm-acc-stats-ali [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <stats-out> e.g.: gmm-acc-stats-ali 1.mdl scp:train.scp ark:1.ali 1.acc |
gmm-align | Align features given [GMM-based] models. Usage: gmm-align [options] tree-in model-in lexicon-fst-in feature-rspecifier transcriptions-rspecifier alignments-wspecifier e.g.: gmm-align tree 1.mdl lex.fst scp:train.scp 'ark:sym2int.pl -f 2- words.txt text|' ark:1.ali |
gmm-decode-faster | Decode features using GMM-based model. Usage: gmm-decode-faster [options] model-in fst-in features-rspecifier words-wspecifier [alignments-wspecifier [lattice-wspecifier]] Note: lattices, if output, will just be linear sequences; use gmm-latgen-faster if you want "real" lattices. |
gmm-decode-simple | Decode features using GMM-based model. Viterbi decoding; only produces a linear sequence, and any lattice produced is linear. Usage: gmm-decode-simple [options] <model-in> <fst-in> <features-rspecifier> <words-wspecifier> [<alignments-wspecifier>] [<lattice-wspecifier>] |
gmm-align-compiled | Align features given [GMM-based] models. Usage: gmm-align-compiled [options] <model-in> <graphs-rspecifier> <feature-rspecifier> <alignments-wspecifier> [scores-wspecifier] e.g.: gmm-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali or: compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' \ ark:- | gmm-align-compiled 1.mdl ark:- scp:train.scp ark:1.ali |
gmm-sum-accs | Sum multiple accumulated stats files for GMM training. Usage: gmm-sum-accs [options] <stats-out> <stats-in1> <stats-in2> ... E.g.: gmm-sum-accs 1.acc 1.1.acc 1.2.acc |
gmm-est-regtree-fmllr | Compute FMLLR transforms per-utterance (default) or per-speaker for the supplied set of speakers (spk2utt option). Note: writes RegtreeFmllrDiagGmm objects Usage: gmm-est-regtree-fmllr [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <regression-tree> <transforms-wspecifier> |
gmm-acc-stats-twofeats | Accumulate stats for GMM training, computing posteriors with one set of features but accumulating statistics with another: the first features are used to get posteriors, the second to accumulate stats. Usage: gmm-acc-stats-twofeats [options] <model-in> <feature1-rspecifier> <feature2-rspecifier> <posteriors-rspecifier> <stats-out> e.g.: gmm-acc-stats-twofeats 1.mdl scp:train.scp scp:train_new.scp ark:1.post 1.acc |
gmm-acc-stats | Accumulate stats for GMM training (reading in posteriors). Usage: gmm-acc-stats [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out> e.g.: gmm-acc-stats 1.mdl scp:train.scp ark:1.post 1.acc |
gmm-init-lvtln | Initialize lvtln transforms Usage: gmm-init-lvtln [options] <lvtln-out> e.g.: gmm-init-lvtln --dim=13 --num-classes=21 --default-class=10 1.lvtln |
gmm-est-lvtln-trans | Estimate linear-VTLN transforms, either per utterance or for the supplied set of speakers (spk2utt option). Reads posteriors. Usage: gmm-est-lvtln-trans [options] <model-in> <lvtln-in> <feature-rspecifier> <gpost-rspecifier> <lvtln-trans-wspecifier> [<warp-wspecifier>] |
gmm-train-lvtln-special | Set one of the transforms in lvtln to the minimum-squared-error solution to mapping feats-untransformed to feats-transformed; posteriors may optionally be used to downweight/remove silence. Usage: gmm-train-lvtln-special [options] <class-index> <lvtln-in> <lvtln-out> <feats-untransformed-rspecifier> <feats-transformed-rspecifier> [<posteriors-rspecifier>] e.g.: gmm-train-lvtln-special 5 5.lvtln 6.lvtln scp:train.scp scp:train_warp095.scp ark:nosil.post |
gmm-acc-mllt | Accumulate MLLT (global STC) statistics Usage: gmm-acc-mllt [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out> e.g.: gmm-acc-mllt 1.mdl scp:train.scp ark:1.post 1.macc |
gmm-mixup | Does GMM mixing up (and Gaussian merging) Usage: gmm-mixup [options] <model-in> <state-occs-in> <model-out> e.g. of mixing up: gmm-mixup --mix-up=4000 1.mdl 1.occs 2.mdl e.g. of merging: gmm-mixup --mix-down=2000 1.mdl 1.occs 2.mdl |
gmm-init-model | Initialize GMM from decision tree and tree stats Usage: gmm-init-model [options] <tree-in> <tree-stats-in> <topo-file> <model-out> [<old-tree> <old-model>] e.g.: gmm-init-model tree treeacc topo 1.mdl or (initializing GMMs with old model): gmm-init-model tree treeacc topo 1.mdl prev/tree prev/30.mdl |
gmm-transform-means | Transform GMM means with linear or affine transform Usage: gmm-transform-means <transform-matrix> <model-in> <model-out> e.g.: gmm-transform-means 2.mat 2.mdl 3.mdl |
gmm-make-regtree | Build regression class tree. Usage: gmm-make-regtree [options] <model-file> <regtree-out> E.g.: gmm-make-regtree --silphones=1:2:3 --state-occs=1.occs 1.mdl 1.regtree [Note: state-occs come from --write-occs option of gmm-est] |
gmm-decode-faster-regtree-fmllr | Decode features using GMM-based model. Usage: gmm-decode-faster-regtree-fmllr [options] model-in fst-in regtree-in features-rspecifier transforms-rspecifier words-wspecifier [alignments-wspecifier] |
gmm-post-to-gpost | Convert state-level posteriors to Gaussian-level posteriors Usage: gmm-post-to-gpost [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <gpost-wspecifier> e.g.: gmm-post-to-gpost 1.mdl scp:train.scp ark:1.post ark:1.gpost |
gmm-est-fmllr-gpost | Estimate global fMLLR transforms, either per utterance or for the supplied set of speakers (spk2utt option). Reads Gaussian-level posteriors. Writes to a table of matrices. Usage: gmm-est-fmllr-gpost [options] <model-in> <feature-rspecifier> <gpost-rspecifier> <transform-wspecifier> |
gmm-est-fmllr | Estimate global fMLLR transforms, either per utterance or for the supplied set of speakers (spk2utt option). Reads posteriors (on transition-ids). Writes to a table of matrices. Usage: gmm-est-fmllr [options] <model-in> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> |
gmm-est-regtree-fmllr-ali | Compute FMLLR transforms per-utterance (default) or per-speaker for the supplied set of speakers (spk2utt option). Note: writes RegtreeFmllrDiagGmm objects Usage: gmm-est-regtree-fmllr-ali [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <regression-tree> <transforms-wspecifier> |
gmm-est-regtree-mllr | Compute MLLR transforms per-utterance (default) or per-speaker for the supplied set of speakers (spk2utt option). Note: writes RegtreeMllrDiagGmm objects Usage: gmm-est-regtree-mllr [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <regression-tree> <transforms-wspecifier> |
gmm-compute-likes | Compute log-likelihoods from GMM-based model (outputs matrices of log-likelihoods indexed by (frame, pdf)). Usage: gmm-compute-likes [options] model-in features-rspecifier likes-wspecifier |
gmm-decode-faster-regtree-mllr | Decode features using GMM-based model. Usage: gmm-decode-faster-regtree-mllr [options] model-in fst-in regtree-in features-rspecifier transforms-rspecifier words-wspecifier [alignments-wspecifier] |
gmm-latgen-simple | Generate lattices using GMM-based model. Usage: gmm-latgen-simple [options] model-in fst-in features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
gmm-rescore-lattice | Replace the acoustic scores on a lattice using a new model. Usage: gmm-rescore-lattice [options] <model-in> <lattice-rspecifier> <feature-rspecifier> <lattice-wspecifier> e.g.: gmm-rescore-lattice 1.mdl ark:1.lats scp:trn.scp ark:2.lats |
gmm-decode-biglm-faster | Decode features using GMM-based model. User supplies LM used to generate decoding graph, and desired LM; this decoder applies the difference during decoding Usage: gmm-decode-biglm-faster [options] model-in fst-in oldlm-fst-in newlm-fst-in features-rspecifier words-wspecifier [alignments-wspecifier [lattice-wspecifier]] |
gmm-est-gaussians-ebw | Do EBW update for MMI, MPE or MCE discriminative training. Numerator stats should already be I-smoothed (e.g. use gmm-ismooth-stats) Usage: gmm-est-gaussians-ebw [options] <model-in> <stats-num-in> <stats-den-in> <model-out> e.g.: gmm-est-gaussians-ebw 1.mdl num.acc den.acc 2.mdl |
gmm-est-weights-ebw | Do EBW update on weights for MMI, MPE or MCE discriminative training. Numerator stats should not be I-smoothed Usage: gmm-est-weights-ebw [options] <model-in> <stats-num-in> <stats-den-in> <model-out> e.g.: gmm-est-weights-ebw 1.mdl num.acc den.acc 2.mdl |
gmm-latgen-faster | Generate lattices using GMM-based model. Usage: gmm-latgen-faster [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
gmm-copy | Copy GMM based model (and possibly change binary/text format) Usage: gmm-copy [options] <model-in> <model-out> e.g.: gmm-copy --binary=false 1.mdl 1_txt.mdl |
gmm-global-acc-stats | Accumulate stats for training a diagonal-covariance GMM. Usage: gmm-global-acc-stats [options] <model-in> <feature-rspecifier> <stats-out> e.g.: gmm-global-acc-stats 1.mdl scp:train.scp 1.acc |
gmm-global-est | Estimate a diagonal-covariance GMM from the accumulated stats. Usage: gmm-global-est [options] <model-in> <stats-in> <model-out> |
gmm-global-sum-accs | Sum multiple accumulated stats files for diagonal-covariance GMM training. Usage: gmm-global-sum-accs [options] stats-out stats-in1 stats-in2 ... |
gmm-gselect | Precompute Gaussian indices for pruning (e.g. in training UBMs, SGMMs, tied-mixture systems) For each frame, gives a list of the n best Gaussian indices, sorted from best to worst. See also: gmm-global-get-post, fgmm-global-gselect-to-post, copy-gselect, fgmm-gselect Usage: gmm-gselect [options] <model-in> <feature-rspecifier> <gselect-wspecifier> The --gselect option (which takes an rspecifier) limits selection to a subset of indices: e.g.: gmm-gselect "--gselect=ark:gunzip -c bigger.gselect.gz|" --n=20 1.gmm "ark:feature-command |" "ark,t:|gzip -c >gselect.1.gz" |
gmm-latgen-biglm-faster | Generate lattices using GMM-based model. User supplies LM used to generate decoding graph, and desired LM; this decoder applies the difference during decoding Usage: gmm-latgen-biglm-faster [options] model-in (fst-in|fsts-rspecifier) oldlm-fst-in newlm-fst-in features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
gmm-ismooth-stats | Apply I-smoothing to statistics, e.g. for discriminative training Usage: gmm-ismooth-stats [options] [--smooth-from-model] [<src-stats-in>|<src-model-in>] <dst-stats-in> <stats-out> e.g.: gmm-ismooth-stats --tau=100 ml.acc num.acc smoothed.acc or: gmm-ismooth-stats --tau=50 --smooth-from-model 1.mdl num.acc smoothed.acc or: gmm-ismooth-stats --tau=100 num.acc num.acc smoothed.acc |
gmm-global-get-frame-likes | Print out per-frame log-likelihoods for each utterance, as an archive of vectors of floats. If --average=true, prints out the average per-frame log-likelihood for each utterance, as a single float. Usage: gmm-global-get-frame-likes [options] <model-in> <feature-rspecifier> <likes-out-wspecifier> e.g.: gmm-global-get-frame-likes 1.mdl scp:train.scp ark:1.likes |
gmm-global-est-fmllr | Estimate global fMLLR transforms, either per utterance or for the supplied set of speakers (spk2utt option). Reads features, and (with --weights option) weights for each frame (also see --gselect option) Usage: gmm-global-est-fmllr [options] <gmm-in> <feature-rspecifier> <transform-wspecifier> |
gmm-global-to-fgmm | Convert single diagonal-covariance GMM to single full-covariance GMM. Usage: gmm-global-to-fgmm [options] 1.gmm 1.fgmm |
gmm-global-acc-stats-twofeats | Accumulate stats for training a diagonal-covariance GMM, two-feature version: the first features are used to get posteriors, the second to accumulate stats. Usage: gmm-global-acc-stats-twofeats [options] <model-in> <feature1-rspecifier> <feature2-rspecifier> <stats-out> e.g.: gmm-global-acc-stats-twofeats 1.mdl scp:train.scp scp:train2.scp 1.acc |
gmm-global-copy | Copy a diagonal-covariance GMM Usage: gmm-global-copy [options] <model-in> <model-out> e.g.: gmm-global-copy --binary=false 1.model - | less |
gmm-fmpe-acc-stats | Accumulate stats for fMPE training, using GMM model. Note: this could be done using gmm-get-feat-deriv and fmpe-acc-stats (but you'd be computing the features twice). Features input should be pre-fMPE features. Usage: gmm-fmpe-acc-stats [options] <model-in> <fmpe-in> <feature-rspecifier> <gselect-rspecifier> <posteriors-rspecifier> <fmpe-stats-out> e.g.: gmm-fmpe-acc-stats --model-derivative 1.accs 1.mdl 1.fmpe "$feats" ark:1.gselect ark:1.post 1.fmpe_stats |
gmm-acc-stats2 | Accumulate stats for GMM training (from posteriors). This version writes two accumulators (e.g. num and den), putting the positive accumulators in num and the negative ones in den. Usage: gmm-acc-stats2 [options] <model> <feature-rspecifier> <posteriors-rspecifier> <num-stats-out> <den-stats-out> e.g.: gmm-acc-stats2 1.mdl "$feats" ark:1.post 1.num_acc 1.den_acc |
gmm-init-model-flat | Initialize GMM, with Gaussians initialized to mean and variance of some provided example data (or to 0,1 if not provided: in that case, provide --dim option) Usage: gmm-init-model-flat [options] <tree-in> <topo-file> <model-out> [<features-rspecifier>] e.g.: gmm-init-model-flat tree topo 1.mdl ark:feats.scp |
gmm-info | Write to standard output various properties of GMM-based model Usage: gmm-info [options] <model-in> e.g.: gmm-info 1.mdl See also: gmm-global-info, am-info |
gmm-get-stats-deriv | Get statistics derivative for GMM models (used in fMPE/fMMI feature-space discriminative training) Usage: gmm-get-stats-deriv [options] <model-in> <num-stats-in> <den-stats-in> <ml-stats-in> <deriv-out> e.g. (for fMMI/fBMMI): gmm-get-stats-deriv 1.mdl num.acc den.acc ml.acc 1.deriv |
gmm-est-rescale | Do "re-scaling" re-estimation of GMM-based model (this update changes the model as features change, but preserves the difference between the model and the features, to keep the effect of any prior discriminative training). Used in fMPE. Does not update the transitions or weights. Usage: gmm-est-rescale [options] <model-in> <old-stats-in> <new-stats-in> <model-out> e.g.: gmm-est-rescale 1.mdl old.acc new.acc 2.mdl |
gmm-boost-silence | Modify GMM-based model to boost (by a certain factor) all probabilities associated with the specified phones (could be all silence phones, or just the ones used for optional silence). Note: this is done by modifying the GMM weights. If the silence model shares a GMM with other models, then it will modify the GMM weights for all models that may correspond to silence. Usage: gmm-boost-silence [options] <silence-phones-list> <model-in> <model-out> e.g.: gmm-boost-silence --boost=1.5 1:2:3 1.mdl 1_boostsil.mdl |
gmm-basis-fmllr-accs | Accumulate gradient scatter from training set, either per utterance or for the supplied set of speakers (spk2utt option). Reads posteriors to accumulate fMLLR stats for each speaker/utterance. Writes the gradient scatter matrix. Usage: gmm-basis-fmllr-accs [options] <model-in> <feature-rspecifier> <post-rspecifier> <accs-wspecifier> |
gmm-basis-fmllr-training | Estimate fMLLR basis representation. Reads a set of gradient scatter accumulations. Outputs basis matrices. Usage: gmm-basis-fmllr-training [options] <model-in> <basis-wspecifier> <accs-in1> <accs-in2> ... |
gmm-est-basis-fmllr | Perform basis fMLLR adaptation in the testing stage, either per utterance or for the supplied set of speakers (spk2utt option). Reads posteriors to accumulate fMLLR stats for each speaker/utterance. Writes to a table of matrices. Usage: gmm-est-basis-fmllr [options] <model-in> <basis-rspecifier> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> |
gmm-est-map | Do Maximum A Posteriori re-estimation of GMM-based acoustic model Usage: gmm-est-map [options] <model-in> <stats-in> <model-out> e.g.: gmm-est-map 1.mdl 1.acc 2.mdl |
gmm-adapt-map | Compute MAP estimates per-utterance (default) or per-speaker for the supplied set of speakers (spk2utt option). This will typically be piped into gmm-latgen-map Usage: gmm-adapt-map [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <map-am-wspecifier> |
gmm-latgen-map | Decode features using GMM-based model. Note: the input <gmms-rspecifier> will typically be piped in from gmm-est-map. Note: <model-in> is only needed for the transition-model, which isn't included in <gmms-rspecifier>. Usage: gmm-latgen-map [options] <model-in> <gmms-rspecifier> <fsts-rxfilename|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [ <alignments-wspecifier> ] ] |
gmm-basis-fmllr-accs-gpost | Accumulate gradient scatter from training set, either per utterance or for the supplied set of speakers (spk2utt option). Reads Gaussian-level posteriors to accumulate fMLLR stats for each speaker/utterance. Writes the gradient scatter matrix. Usage: gmm-basis-fmllr-accs-gpost [options] <model-in> <feature-rspecifier> <post-rspecifier> <accs-wspecifier> |
gmm-est-basis-fmllr-gpost | Perform basis fMLLR adaptation in the testing stage, either per utterance or for the supplied set of speakers (spk2utt option). Reads Gaussian-level posteriors to accumulate fMLLR stats for each speaker/utterance. Writes to a table of matrices. Usage: gmm-est-basis-fmllr-gpost [options] <model-in> <basis-rspecifier> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> |
gmm-latgen-faster-parallel | Decode features using GMM-based model. Uses multiple decoding threads, but the interface and behavior are otherwise the same as gmm-latgen-faster. Usage: gmm-latgen-faster-parallel [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
gmm-est-fmllr-raw | Estimate fMLLR transforms in the space before splicing and linear transforms such as LDA+MLLT, but using models in the space transformed by these transforms. Requires the original spliced features, and the full LDA+MLLT (or similar) matrix including the 'rejected' rows (see the program get-full-lda-mat). Usage: gmm-est-fmllr-raw [options] <model-in> <full-lda-mat-in> <feature-rspecifier> <post-rspecifier> <transform-wspecifier> |
gmm-est-fmllr-raw-gpost | Estimate fMLLR transforms in the space before splicing and linear transforms such as LDA+MLLT, but using models in the space transformed by these transforms. Requires the original spliced features, and the full LDA+MLLT (or similar) matrix including the 'rejected' rows (see the program get-full-lda-mat). Reads in Gaussian-level posteriors. Usage: gmm-est-fmllr-raw-gpost [options] <model-in> <full-lda-mat-in> <feature-rspecifier> <gpost-rspecifier> <transform-wspecifier> |
gmm-global-init-from-feats | This program initializes a single diagonal GMM and does multiple iterations of training from features stored in memory. Usage: gmm-global-init-from-feats [options] <feature-rspecifier> <model-out> e.g.: gmm-global-init-from-feats scp:train.scp 1.mdl |
gmm-global-info | Write to standard output various properties of GMM model This is for a single diagonal GMM, e.g. as used for a UBM. Usage: gmm-global-info [options] <gmm> e.g.: gmm-global-info 1.dubm See also: gmm-info, am-info |
gmm-latgen-faster-regtree-fmllr | Generate lattices using GMM-based model and RegTree-FMLLR adaptation. Usage: gmm-latgen-faster-regtree-fmllr [options] model-in regtree-in (fst-in|fsts-rspecifier) features-rspecifier transform-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ] |
gmm-est-fmllr-global | Estimate global fMLLR transforms, either per utterance or for the supplied set of speakers (spk2utt option). This version is for when you have a single global GMM, e.g. a UBM. Writes to a table of matrices. Usage: gmm-est-fmllr-global [options] <gmm-in> <feature-rspecifier> <transform-wspecifier> e.g.: gmm-est-fmllr-global 1.ubm scp:feats.scp ark:trans.1 |
gmm-acc-mllt-global | Accumulate MLLT (global STC) statistics: this version is for where there is one global GMM (e.g. a UBM) Usage: gmm-acc-mllt-global [options] <gmm-in> <feature-rspecifier> <stats-out> e.g.: gmm-acc-mllt-global 1.dubm scp:feats.scp 1.macc |
gmm-transform-means-global | Transform GMM means with linear or affine transform This version for a single GMM, e.g. a UBM. Useful when estimating MLLT/STC Usage: gmm-transform-means-global <transform-matrix> <gmm-in> <gmm-out> e.g.: gmm-transform-means-global 2.mat 2.dubm 3.dubm |
gmm-global-get-post | Precompute Gaussian indices and convert immediately to top-n posteriors (useful in iVector extraction with diagonal UBMs, and e.g. in training UBMs, SGMMs, tied-mixture systems). For each frame, gives a list of the n best Gaussian indices, sorted from best to worst. See also: gmm-gselect, fgmm-gselect, fgmm-global-gselect-to-post Usage: gmm-global-get-post [options] <model-in> <feature-rspecifier> <post-wspecifier> e.g.: gmm-global-get-post --n=20 1.gmm "ark:feature-command |" "ark,t:|gzip -c >post.1.gz" |
gmm-global-gselect-to-post | Given features and Gaussian-selection (gselect) information for a diagonal-covariance GMM, output per-frame posteriors for the selected indices. Also supports pruning the posteriors if they are below a stated threshold (and renormalizing the rest to sum to one). See also: gmm-gselect, fgmm-gselect, gmm-global-get-post, fgmm-global-gselect-to-post Usage: gmm-global-gselect-to-post [options] <model-in> <feature-rspecifier> <gselect-rspecifier> <post-wspecifier> e.g.: gmm-global-gselect-to-post 1.dubm ark:- 'ark:gunzip -c 1.gselect|' ark:- |
gmm-global-est-lvtln-trans | Estimate linear-VTLN transforms, either per utterance or for the supplied set of speakers (spk2utt option); this version is for a global diagonal GMM (also known as a UBM). Reads posteriors indicating Gaussian indexes in the UBM. Usage: gmm-global-est-lvtln-trans [options] <gmm-in> <lvtln-in> <feature-rspecifier> <gpost-rspecifier> <lvtln-trans-wspecifier> [<warp-wspecifier>] e.g.: gmm-global-est-lvtln-trans 0.ubm 0.lvtln '$feats' ark,s,cs:- ark:1.trans ark:1.warp (where the <gpost-rspecifier> will likely come from gmm-global-get-post or gmm-global-gselect-to-post) |
gmm-init-biphone | Initialize a biphone context-dependency tree with all the leaves (i.e. a full tree). Intended for end-to-end tree-free models. Usage: gmm-init-biphone <topology-in> <dim> <model-out> <tree-out> e.g.: gmm-init-biphone topo 39 bi.mdl bi.tree |
ivector-extractor-init | Initialize ivector-extractor Usage: ivector-extractor-init [options] <fgmm-in> <ivector-extractor-out> e.g.: ivector-extractor-init 4.fgmm 0.ie |
ivector-extractor-copy | Copy the i-vector extractor to a text file Usage: ivector-extractor-copy [options] <ivector-extractor-in> <ivector-extractor-out> e.g.: ivector-extractor-copy --binary=false 0.ie 0_txt.ie |
ivector-extractor-acc-stats | Accumulate stats for iVector extractor training. Reads in features and Gaussian-level posteriors (typically from a full GMM). Supports multiple threads, but won't be able to make use of too many at a time (e.g. more than about 4). Usage: ivector-extractor-acc-stats [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out> e.g.: fgmm-global-gselect-to-post 1.fgmm '$feats' 'ark:gunzip -c gselect.1.gz|' ark:- | \ ivector-extractor-acc-stats 2.ie '$feats' ark,s,cs:- 2.1.acc |
ivector-extractor-sum-accs | Sum accumulators for training of iVector extractor Usage: ivector-extractor-sum-accs [options] <stats-in1> <stats-in2> ... <stats-inN> <stats-out> |
ivector-extractor-est | Do model re-estimation of iVector extractor (this is the update phase of a single pass of E-M) Usage: ivector-extractor-est [options] <model-in> <stats-in> <model-out> |
ivector-extract | Extract iVectors for utterances, using a trained iVector extractor, and features and Gaussian-level posteriors Usage: ivector-extract [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <ivector-wspecifier> e.g.: fgmm-global-gselect-to-post 1.ubm '$feats' 'ark:gunzip -c gselect.1.gz|' ark:- | \ ivector-extract final.ie '$feats' ark,s,cs:- ark,t:ivectors.1.ark |
compute-vad | This program reads input features and writes out, for each utterance, a vector of floats that are 1.0 if we judge the frame voiced and 0.0 otherwise. The algorithm is very simple and is based on thresholding the log mel energy (and taking the consensus of threshold decisions within a window centered on the current frame). See the options for more details, and egs/sid/s1/run.sh for examples; this program is intended for use in speaker-ID. Usage: compute-vad [options] <feats-rspecifier> <vad-wspecifier> e.g.: compute-vad scp:feats.scp ark:vad.ark |
select-voiced-frames | Select a subset of frames of the input files, based on the output of compute-vad or a similar program (a vector of length num-frames, containing 1.0 for voiced, 0.0 for unvoiced). Caution: this is mostly useful only in speaker identification applications. Usage: select-voiced-frames [options] <feats-rspecifier> <vad-rspecifier> <feats-wspecifier> E.g.: select-voiced-frames scp:feats.scp scp:vad.scp ark:- |
compute-vad-from-frame-likes | This program computes frame-level voice activity decisions from a set of input frame-level log-likelihoods. Usually, these log-likelihoods are the output of fgmm-global-get-frame-likes. Frames are assigned labels according to the class for which the log-likelihood (optionally weighted by a prior) is maximal. The class labels are determined by the order of inputs on the command line. See options for more details. Usage: compute-vad-from-frame-likes [options] <likes-rspecifier-1> ... <likes-rspecifier-n> <vad-wspecifier> e.g.: compute-vad-from-frame-likes --map=label_map.txt scp:likes1.scp scp:likes2.scp ark:vad.ark See also: fgmm-global-get-frame-likes, compute-vad, merge-vads |
merge-vads | This program merges two archives of per-frame weights representing voice activity decisions. By default, the program assumes that the input vectors consist of floats that are 0.0 if a frame is judged as nonspeech and 1.0 if it is considered speech. The default behavior produces a frame-level decision of 1.0 if both input frames are 1.0, and 0.0 otherwise. Additional classes (e.g., 2.0 for music) can be handled using the "map" option. Usage: merge-vads [options] <vad-rspecifier-1> <vad-rspecifier-2> <vad-wspecifier> e.g.: merge-vads scp:vad_energy.scp scp:vad_gmm.scp ark:vad.ark See also: compute-vad-from-frame-likes, compute-vad, ali-to-post, post-to-weights |
ivector-normalize-length | Normalize length of iVectors to equal sqrt(feature-dimension) Usage: ivector-normalize-length [options] <ivector-rspecifier> <ivector-wspecifier> e.g.: ivector-normalize-length ark:ivectors.ark ark:normalized_ivectors.ark |
ivector-transform | Multiplies iVectors (on the left) by a supplied transformation matrix Usage: ivector-transform [options] <matrix-in> <ivector-rspecifier> <ivector-wspecifier> e.g.: ivector-transform transform.mat ark:ivectors.ark ark:transformed_ivectors.ark |
ivector-compute-dot-products | Computes dot-products between iVectors; useful in application of an iVector-based system. The 'trials-file' has lines of the form <key1> <key2> and the output will have the form <key1> <key2> [<dot-product>] (if either key could not be found, the dot-product field in the output will be absent, and this program will print a warning) Usage: ivector-compute-dot-products [options] <trials-in> <ivector1-rspecifier> <ivector2-rspecifier> <scores-out> e.g.: ivector-compute-dot-products trials ark:train_ivectors.scp ark:test_ivectors.scp trials.scored See also: ivector-plda-scoring |
ivector-mean | With 3 or 4 arguments, averages iVectors over all the utterances of each speaker using the spk2utt file. The inputs are the spk2utt file and a set of iVectors indexed by utterance; the output is iVectors indexed by speaker. If 4 arguments are given, the extra argument is a table for the number of utterances per speaker (can be useful for PLDA). If 2 arguments are given, computes the mean of all input iVectors and writes out the mean vector. Usage: ivector-mean <spk2utt-rspecifier> <ivector-rspecifier> <ivector-wspecifier> [<num-utt-wspecifier>] or: ivector-mean <ivector-rspecifier> <mean-wxfilename> e.g.: ivector-mean data/spk2utt exp/ivectors.ark exp/spk_ivectors.ark exp/spk_num_utts.ark or: ivector-mean exp/ivectors.ark exp/mean.vec See also: ivector-subtract-global-mean |
ivector-compute-lda | Compute an LDA matrix for iVector system. Reads in iVectors per utterance, and an utt2spk file which it uses to help work out the within-speaker and between-speaker covariance matrices. Outputs an LDA projection to a specified dimension. By default it will normalize so that the projected within-class covariance is unit, but if you set --normalize-total-covariance to true, it will normalize the total covariance. Note: the transform we produce is actually an affine transform which will also set the global mean to zero. Usage: ivector-compute-lda [options] <ivector-rspecifier> <utt2spk-rspecifier> <lda-matrix-out> e.g.: ivector-compute-lda ark:ivectors.ark ark:utt2spk lda.mat |
ivector-compute-plda | Computes a Plda object (for Probabilistic Linear Discriminant Analysis) from a set of iVectors. Uses speaker information from a spk2utt file to compute within and between class variances. Usage: ivector-compute-plda [options] <spk2utt-rspecifier> <ivector-rspecifier> <plda-out> e.g.: ivector-compute-plda ark:spk2utt ark,s,cs:ivectors.ark plda |
ivector-copy-plda | Copy a PLDA object, possibly applying smoothing to the within-class covariance Usage: ivector-copy-plda <plda-in> <plda-out> e.g.: ivector-copy-plda --smoothing=0.1 plda plda.smooth0.1 |
compute-eer | Computes Equal Error Rate Input is a series of lines, each with two fields. The first field must be a numeric score, and the second either the string 'target' or 'nontarget'. The EER will be printed to the standard output. Usage: compute-eer <scores-in> e.g.: compute-eer - |
ivector-subtract-global-mean | Copies a table of iVectors but subtracts the global mean as it does so. The mean may be specified as the first argument; if not, the sum of the input iVectors is used. Usage: ivector-subtract-global-mean <ivector-rspecifier> <ivector-wspecifier> or: ivector-subtract-global-mean <mean-rxfilename> <ivector-rspecifier> <ivector-wspecifier> e.g.: ivector-subtract-global-mean scp:ivectors.scp ark:- or: ivector-subtract-global-mean mean.vec scp:ivectors.scp ark:- See also: ivector-mean |
ivector-plda-scoring | Computes log-likelihood ratios for trials using PLDA model Note: the 'trials-file' has lines of the form <key1> <key2> and the output will have the form <key1> <key2> [<score>] (if either key could not be found, the score field in the output will be absent, and this program will print a warning) For training examples, the input is the iVectors averaged over speakers; a separate archive containing the number of utterances per speaker may be optionally supplied using the --num-utts option; this affects the PLDA scoring (if not supplied, it defaults to 1 per speaker). Usage: ivector-plda-scoring <plda> <train-ivector-rspecifier> <test-ivector-rspecifier> <trials-rxfilename> <scores-wxfilename> e.g.: ivector-plda-scoring --num-utts=ark:exp/train/num_utts.ark plda ark:exp/train/spk_ivectors.ark ark:exp/test/ivectors.ark trials scores See also: ivector-compute-dot-products, ivector-compute-plda |
logistic-regression-train | Trains a model using Logistic Regression with L-BFGS from a set of vectors. The class labels in <classes-rspecifier> must be integers with no gaps in their range, and the smallest label must be 0. Usage: logistic-regression-train <vector-rspecifier> <classes-rspecifier> <model-out> |
logistic-regression-eval | Evaluates a model on input vectors and outputs either log posterior probabilities or scores. Usage (1): logistic-regression-eval <model> <input-vectors-rspecifier> <output-log-posteriors-wspecifier> Usage (2): logistic-regression-eval <model> <trials-file> <input-vectors-rspecifier> <output-scores-file> |
logistic-regression-copy | Copy a logistic-regression model, possibly changing the binary mode; also supports the --scale-priors option which can scale the prior probabilities the model assigns to different classes (e.g., you can remove the effect of unbalanced training data by scaling by the inverse of the class priors in the training data) Usage: logistic-regression-copy [options] <model-in> <model-out> e.g.: echo '[ 2.6 1.7 3.9 1.24 7.5 ]' | logistic-regression-copy --scale-priors=- \ 1.model scaled_priors.mdl |
ivector-extract-online | Extract iVectors for utterances, using a trained iVector extractor, and features and Gaussian-level posteriors. This version extracts an iVector every n frames (see the --ivector-period option), by including all frames up to that point in the utterance. This is designed to correspond with what will happen in a streaming decoding scenario; the iVectors would be used in neural net training. The iVectors are output as an archive of matrices, indexed by utterance-id; each row corresponds to an iVector. See also ivector-extract-online2 Usage: ivector-extract-online [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <ivector-wspecifier> e.g.: gmm-global-get-post 1.dubm '$feats' ark:- | \ ivector-extract-online --ivector-period=10 final.ie '$feats' ark,s,cs:- ark,t:ivectors.1.ark |
ivector-adapt-plda | Adapt a PLDA object using unsupervised adaptation-data iVectors from a different domain to the training data. Usage: ivector-adapt-plda [options] <plda-in> <ivectors-rspecifier> <plda-out> e.g.: ivector-adapt-plda plda ark:ivectors.ark plda.adapted |
ivector-plda-scoring-dense | Perform PLDA scoring for speaker diarization. The input reco2utt should be of the form <recording-id> <seg1> <seg2> ... <segN> and there should be one iVector for each segment. PLDA scoring is performed between all pairs of iVectors in a recording and outputs an archive of score matrices, one for each recording-id. The rows and columns of the matrix correspond to the sorted order of the segments. Usage: ivector-plda-scoring-dense [options] <plda> <reco2utt> <ivectors-rspecifier> <scores-wspecifier> e.g.: ivector-plda-scoring-dense plda reco2utt scp:ivectors.scp ark:scores.ark |
agglomerative-cluster | Cluster utterances by similarity score, used in diarization. Takes a table of score matrices indexed by recording, with the rows/columns corresponding to the utterances of that recording in sorted order, and a reco2utt file that contains the mapping from recordings to utterances, and outputs a list of labels in the form <utt> <label>. Clustering is done using agglomerative hierarchical clustering with a score threshold as the stopping criterion. By default, the program reads in similarity scores, but with --read-costs=true the scores are interpreted as costs (i.e. a smaller value indicates greater utterance similarity). Usage: agglomerative-cluster [options] <scores-rspecifier> <reco2utt-rspecifier> <labels-wspecifier> e.g.: agglomerative-cluster ark:scores.ark ark:reco2utt ark,t:labels.txt |
lattice-to-kws-index | Create an inverted index of the given lattices. The output index is in the T*T*T semiring. For details of the semiring, please refer to Dogan Can and Murat Saraclar's paper "Lattice Indexing for Spoken Term Detection" Usage: lattice-to-kws-index [options] <utter-symtab-rspecifier> <lattice-rspecifier> <index-wspecifier> e.g.: lattice-to-kws-index ark:utter.symtab ark:1.lats ark:global.idx |
kws-index-union | Take a union of the indexed lattices. The input index is in the T*T*T semiring and the output index is also in the T*T*T semiring. At the end of this program, encoded epsilon removal, determinization and minimization will be applied. Usage: kws-index-union [options] index-rspecifier index-wspecifier e.g.: kws-index-union ark:input.idx ark:global.idx |
transcripts-to-fsts | Build a linear acceptor for each transcription in the archive. Read in the transcriptions in archive format and write out the linear acceptors in archive format with the same key. The costs of the arcs are set to zero. The total cost of an acceptor can be changed by supplying the costs archive; in that case, the first arc's cost will be set to the value obtained from the archive, i.e. the total cost will be equal to that cost. The cost archive can be sparse, i.e. it does not have to include zero-cost transcriptions. It is preferred for the archive to be sorted (for efficiency). Usage: transcripts-to-fsts [options] <transcriptions-rspecifier> [<costs-rspecifier>] <fsts-wspecifier> e.g.: transcripts-to-fsts ark:train.tra ark,s,cs,t:costs.txt ark:train.fsts |
kws-search | Search the keywords over the index. This program can be executed in parallel, either on the index side or the keywords side; we use a script to combine the final search results. Note that the index archive has a single key "global". Search has one or two outputs. The first one is mandatory and will contain the search output, i.e. the list of all found keyword instances. The file is in the following format: kw_id utt_id beg_frame end_frame neg_logprob e.g.: KW105-0198 7 335 376 1.91254 The second output is optional and allows the user to gather more statistics about the individual instances from the posting list. Remember that a "keyword" is an FST and, as such, there can be multiple paths matching in the keyword and in the lattice index in a given time period. The stats output will provide all matching paths, each with the appropriate score. The format is as follows: kw_id utt_id beg_frame end_frame neg_logprob 0 w_id1 w_id2 ... 0 e.g.: KW105-0198 7 335 376 16.01254 0 5766 5659 0 Usage: kws-search [options] <index-rspecifier> <keywords-rspecifier> <results-wspecifier> [<stats_wspecifier>] e.g.: kws-search ark:index.idx ark:keywords.fsts ark:results ark:stats |
generate-proxy-keywords | Convert the keywords into in-vocabulary words using the given phone-level edit distance fst (E.fst). The large lexicon (L2.fst) and inverted small lexicon (L1'.fst) are also expected to be present. We actually use the composed FST L2xE.fst to be more efficient. Ideally we should have used L2xExL1'.fst, but this is quite computationally expensive at the command level. Keywords.int is in the transcription format. If kwlist-wspecifier is given, the program also prints out the proxy fst in a format where each line is "kwid weight proxy". Usage: generate-proxy-keywords [options] <L2xE.fst> <L1'.fst> \ <keyword-rspecifier> <proxy-wspecifier> [kwlist-wspecifier] e.g.: generate-proxy-keywords L2xE.fst L1'.fst ark:keywords.int \ ark:proxy.fsts [ark,t:proxy.kwlist.txt] |
compute-atwv | Computes the Actual Term-Weighted Value and prints it. Usage: compute-atwv [options] <nof-trials> <ref-rspecifier> <hyp-rspecifier> [alignment-csv-filename] e.g.: compute-atwv 32485.4 ark:ref.1 ark:hyp.1 ali.csv or: compute-atwv 32485.4 ark:ref.1 ark:hyp.1 NOTES: a) the number of trials is usually equal to the size of the searched collection in seconds b) the ref-rspecifier/hyp-rspecifier are the kaldi IO specifiers for the reference and the hypotheses (found hits), respectively. The format is the same for both of them. Each line is of the following format: <KW-ID> <utterance-id> <start-frame> <end-frame> <score> e.g.: KW106-189 348 459 560 0.8 c) the alignment-csv-filename is an optional parameter. If present, the alignment, i.e. detailed information about which hypotheses match up with which reference entries, will be generated. The alignment file format is equivalent to the alignment file produced using the F4DE tool. However, we do not set some fields and the utterance identifiers are numeric. You can use the script utils/int2sym.pl and the utterance and keyword maps to convert the numerical ids into text form d) the scores are expected to be probabilities. Please note that the output from kws-search is in -log(probability). e) compute-atwv does not perform any score normalization (it's just for scoring purposes). Without score normalization/calibration the performance of the search will be quite poor. |
print-proxy-keywords | Reads in the proxy keyword FSTs and prints them to a file where each line is "kwid w1 w2 .. wn" Usage: print-proxy-keywords [options] <proxy-rspecifier> <kwlist-wspecifier> [<cost-wspecifier>] e.g.: print-proxy-keywords ark:proxy.fsts ark,t:kwlist.txt ark,t:costs.txt |
lattice-best-path | Generate 1-best path through lattices; output as transcriptions and alignments Note: if you want output as FSTs, use lattice-1best; if you want output with acoustic and LM scores, use lattice-1best | nbest-to-linear Usage: lattice-best-path [options] <lattice-rspecifier> [ <transcriptions-wspecifier> [ <alignments-wspecifier>] ] e.g.: lattice-best-path --acoustic-scale=0.1 ark:1.lats 'ark,t:|int2sym.pl -f 2- words.txt > text' ark:1.ali |
lattice-prune | Apply beam pruning to lattices Usage: lattice-prune [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-prune --acoustic-scale=0.1 --beam=4.0 ark:1.lats ark:pruned.lats |
lattice-equivalent | Test whether sets of lattices are equivalent (return with status 0 if all were equivalent, 1 otherwise, -1 on error) Usage: lattice-equivalent [options] lattice-rspecifier1 lattice-rspecifier2 e.g.: lattice-equivalent ark:1.lats ark:2.lats |
lattice-to-nbest | Work out N-best paths in lattices and write out as FSTs Note: only guarantees distinct word sequences if distinct paths in input lattices had distinct word-sequences (this will not be true if you produced lattices with --determinize-lattice=false, i.e. state-level lattices). Usage: lattice-to-nbest [options] <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-to-nbest --acoustic-scale=0.1 --n=10 ark:1.lats ark:nbest.lats |
lattice-lmrescore | Add lm_scale * [cost of best path through LM FST] to graph-cost of paths through lattice. Does this by composing with LM FST, then lattice-determinizing (it has to negate weights first if lm_scale<0) Usage: lattice-lmrescore [options] <lattice-rspecifier> <lm-fst-in> <lattice-wspecifier> e.g.: lattice-lmrescore --lm-scale=-1.0 ark:in.lats 'fstproject --project_output=true data/lang/G.fst|' ark:out.lats |
lattice-scale | Apply scaling to lattice weights Usage: lattice-scale [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-scale --lm-scale=0.0 ark:1.lats ark:scaled.lats |
lattice-union | Takes two archives of lattices (indexed by utterances) and computes the union of the individual lattice pairs (one from each archive). Usage: lattice-union [options] lattice-rspecifier1 lattice-rspecifier2 lattice-wspecifier e.g.: lattice-union ark:den.lats ark:num.lats ark:union.lats |
lattice-to-post | Do forward-backward and collect posteriors over lattices. Usage: lattice-to-post [options] lats-rspecifier posts-wspecifier [loglikes-wspecifier] e.g.: lattice-to-post --acoustic-scale=0.1 ark:1.lats ark:1.post See also: lattice-to-ctm-conf, post-to-pdf-post, lattice-arc-post |
lattice-determinize | This program is deprecated; please use lattice-determinize-pruned. Determinize lattices (and apply a pruning beam) (see http://kaldi-asr.org/doc/lattices.html for more explanation) note: this program is typically only useful if you generated state-level lattices, e.g. called gmm-latgen-simple with --determinize=false Usage: lattice-determinize [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-determinize --acoustic-scale=0.1 --beam=15.0 ark:1.lats ark:det.lats |
lattice-oracle | Finds the path having the smallest edit-distance between a lattice and a reference string. Usage: lattice-oracle [options] <test-lattice-rspecifier> \ <reference-rspecifier> \ <transcriptions-wspecifier> \ [<edit-distance-wspecifier>] e.g.: lattice-oracle ark:lat.1 'ark:sym2int.pl -f 2- \ data/lang/words.txt <data/test/text|' ark,t:- Note the --write-lattices option by which you can write out the optimal path as a lattice. Note: you can use this program to compute the n-best oracle WER by first piping the input lattices through lattice-to-nbest and then nbest-to-lattice. |
lattice-rmali | Remove state-sequences from lattice weights Usage: lattice-rmali [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-rmali ark:1.lats ark:proj.lats |
lattice-compose | Composes lattices (in transducer form, as type Lattice). Depending on the command-line arguments, either composes lattices with lattices, or lattices with FSTs (rspecifiers are assumed to be lattices, and rxfilenames are assumed to be FSTs, which have their weights interpreted as "graph weights" when converted into the Lattice format). Usage: lattice-compose [options] lattice-rspecifier1 (lattice-rspecifier2|fst-rxfilename2) lattice-wspecifier e.g.: lattice-compose ark:1.lats ark:2.lats ark:composed.lats or: lattice-compose ark:1.lats G.fst ark:composed.lats |
lattice-boost-ali | Boost graph likelihoods (decrease graph costs) by b * #frame-phone-errors on each arc in the lattice. Useful for discriminative training, e.g. boosted MMI. Modifies input lattices. This version takes the reference in the form of alignments. Needs the model (just the transitions) to transform pdf-ids to phones. Takes the --silence-phones option and these phones appearing in the lattice are always assigned zero error, or with the --max-silence-error option, at most this error-count per frame (--max-silence-error=1 is equivalent to not specifying --silence-phones). Usage: lattice-boost-ali [options] model lats-rspecifier ali-rspecifier lats-wspecifier e.g.: lattice-boost-ali --silence-phones=1:2:3 --b=0.05 1.mdl ark:1.lats ark:1.ali ark:boosted.lats |
lattice-copy | Copy lattices (e.g. useful for changing to text mode or changing format to standard from compact lattice). The --include and --exclude options can be used to copy only a subset of lattices: the --include option specifies the whitelisted utterances to be copied, and the --exclude option specifies the blacklisted utterances not to be copied. Only one of --include and --exclude can be supplied. Usage: lattice-copy [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-copy --write-compact=false ark:1.lats ark,t:text.lats See also: lattice-scale, lattice-to-fst, and the script egs/wsj/s5/utils/convert_slf.pl |
lattice-to-fst | Turn lattices into normal FSTs, retaining only the word labels. By default, removes all weights and also epsilons (configure with --acoustic-scale, --lm-scale and --rm-eps) Usage: lattice-to-fst [options] lattice-rspecifier fsts-wspecifier e.g.: lattice-to-fst ark:1.lats ark:1.fsts |
lattice-to-phone-lattice | Convert the words or transition-ids into phones, which are worked out from the transition-ids. If --replace-words=true (true by default), replaces the words with phones, otherwise replaces the transition-ids. If --replace-words=false, it will preserve the alignment of transition-ids/phones to words, so that if you do lattice-align-words | lattice-to-phone-lattice --replace-words=false, you can get the phones corresponding to each word in the lattice. Usage: lattice-to-phone-lattice [options] model lattice-rspecifier lattice-wspecifier e.g.: lattice-to-phone-lattice 1.mdl ark:1.lats ark:phones.lats See also: lattice-align-words, lattice-align-phones |
lattice-interp | Takes two archives of lattices (indexed by utterances) and combines the individual lattice pairs (one from each archive). Keeps the alignments from the first lattice. Equivalent to projecting the second archive on words (lattice-project), then composing the pairs of lattices (lattice-compose), then scaling graph and acoustic costs by 0.5 (lattice-scale). You can control the individual scales with --alpha, which is the scale of the first lattices (the second is 1-alpha). Usage: lattice-interp [options] lattice-rspecifier-a lattice-rspecifier-b lattice-wspecifier e.g.: lattice-interp ark:1.lats ark:2.lats ark:interp.lats |
lattice-project | Project lattices (in their transducer form); by default makes them word->word transducers (set --project-output=false for tid->tid). Usage: lattice-project [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-project ark:1.lats ark:word2word.lats or: lattice-project --project-output=false ark:1.lats ark:tid2tid.lats |
lattice-add-trans-probs | Add transition probabilities into the graph part of lattice scores, controlled by the options --transition-scale and --self-loop-scale, which, for compatibility with the original graph, would normally be set to the same values used in graph compilation Usage: lattice-add-trans-probs [options] model lattice-rspecifier lattice-wspecifier e.g.: lattice-add-trans-probs --transition-scale=1.0 --self-loop-scale=0.1 1.mdl ark:in.lats ark:out.lats |
lattice-difference | Compute FST difference on lattices (remove sequences in first lattice that appear in second lattice) Useful for the denominator lattice for MCE. Usage: lattice-difference [options] lattice1-rspecifier lattice2-rspecifier lattice-wspecifier e.g.: lattice-difference ark:den.lats ark:num.lats ark:den_mce.lats |
nbest-to-linear | Takes as input lattices/n-bests which must be linear (single path); converts them from lattices to up to 4 archives containing transcriptions, alignments, and acoustic and LM costs (note: use ark:/dev/null for unwanted outputs) Usage: nbest-to-linear [options] <nbest-rspecifier> <alignments-wspecifier> [<transcriptions-wspecifier> [<lm-cost-wspecifier> [<ac-cost-wspecifier>]]] e.g.: lattice-to-nbest --n=10 ark:1.lats ark:- | \ nbest-to-linear ark:- ark,t:1.ali 'ark,t:|int2sym.pl -f 2- words.txt > text' |
nbest-to-lattice | Read in a table containing N-best entries from lattices (i.e. individual lattices with a linear structure, one for each N-best entry, indexed by utt_id_a-1, utt_id_a-2, etc.), and take the union of them for each utterance id (e.g. utt_id_a), outputting a lattice for each. Usage: nbest-to-lattice <nbest-rspecifier> <lattices-wspecifier> e.g.: nbest-to-lattice ark:1.nbest ark:1.lats |
lattice-1best | Compute best path through lattices and write out as FSTs Note: differs from lattice-to-nbest with --n=1 because we won't append -1 to the utterance-ids. Differs from lattice-best-path because output is FST. Usage: lattice-1best [options] <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-1best --acoustic-scale=0.1 ark:1.lats ark:1best.lats |
linear-to-nbest | This does the opposite of nbest-to-linear. It takes 4 archives, containing alignments, word-sequences, and acoustic and LM costs, and turns them into a single archive containing FSTs with a linear structure. The program is called linear-to-nbest because very often the archives concerned will represent N-best lists Usage: linear-to-nbest [options] <alignments-rspecifier> <transcriptions-rspecifier> (<lm-cost-rspecifier>|'') (<ac-cost-rspecifier>|'') <nbest-wspecifier> Note: if the rspecifiers for lm-cost or ac-cost are the empty string, these values will default to zero. e.g.: linear-to-nbest ark:1.ali 'ark:sym2int.pl -f 2- words.txt text|' ark:1.lmscore ark:1.acscore ark:1.nbest |
lattice-mbr-decode | Do Minimum Bayes Risk decoding (decoding that aims to minimize the expected word error rate). Possible outputs include the 1-best path (i.e. the word-sequence, as a sequence of ints per utterance), the computed Bayes Risk for each utterance, and the sausage stats as (for each utterance) std::vector<std::vector<std::pair<int32, float> > > for which we use the same I/O routines as for posteriors (type Posterior). times-wspecifier writes pairs of (start-time, end-time) in frames, for each sausage position, or for each one-best entry if --one-best-times=true. Note: use ark:/dev/null or the empty string for unwanted outputs. Note: times will only be very meaningful if you first use lattice-align-words. If you need ctm-format output, don't use this program but use lattice-to-ctm-conf with --decode-mbr=true. Usage: lattice-mbr-decode [options] lattice-rspecifier transcriptions-wspecifier [ bayes-risk-wspecifier [ sausage-stats-wspecifier [ times-wspecifier] ] ] e.g.: lattice-mbr-decode --acoustic-scale=0.1 ark:1.lats 'ark,t:|int2sym.pl -f 2- words.txt > text' ark:/dev/null ark:1.sau |
lattice-align-words | Convert lattices so that the arcs in the CompactLattice format correspond with words (i.e. aligned with word boundaries). Note: it will generally be more efficient if you apply 'lattice-push' before this program. Usage: lattice-align-words [options] <word-boundary-file> <model> <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-align-words --silence-label=4320 --partial-word-label=4324 \ data/lang/phones/word_boundary.int final.mdl ark:1.lats ark:aligned.lats Note: word-boundary file has format (on each line): <integer-phone-id> [begin|end|singleton|internal|nonword] See also: lattice-align-words-lexicon, for use in cases where phones don't have word-position information. |
lattice-to-mpe-post | Do forward-backward and collect frame-level MPE posteriors over lattices, which can be fed into gmm-acc-stats2 to do MPE training. Caution: this is not really MPE, this is MPFE (minimum phone frame error). The posteriors may be positive or negative. Usage: lattice-to-mpe-post [options] <model> <num-posteriors-rspecifier> <lats-rspecifier> <posteriors-wspecifier> e.g.: lattice-to-mpe-post --acoustic-scale=0.1 1.mdl ark:num.post ark:1.lats ark:1.post |
lattice-copy-backoff | Copy a table of lattices (1st argument), but for any keys that appear in the table from the 2nd argument, use the one from the 2nd argument. If the sets of keys are identical, this is equivalent to copying the 2nd table. Note: the arguments are in this order due to the convention that sequential access is always over the 1st argument. Usage: lattice-copy-backoff [options] <lat-rspecifier1> <lat-rspecifier2> <lat-wspecifier> e.g.: lattice-copy-backoff ark:bad_but_complete.lat ark:good_but_incomplete.lat ark:out.lat |
nbest-to-ctm | Takes as input lattices which must be linear (single path), and must be in CompactLattice form where the transition-ids on the arcs have been aligned with the word boundaries... typically the input will be a lattice that has been piped through lattice-1best and then lattice-align-words. On the other hand, whenever we directly pipe the output of lattice-align-words-lexicon into nbest-to-ctm, we need to put the command `lattice-1best ark:- ark:-` between them, because even for linear lattices, lattice-align-words-lexicon can in certain cases produce non-linear outputs (due to ambiguity in the lexicon). It outputs ctm format (with integers in place of words), assuming the frame length is 0.01 seconds by default (change this with the --frame-length option). Note: the output is in the form <utterance-id> 1 <begin-time> <end-time> <word-id> and you can post-process this to account for segmentation issues and to convert ints to words; note, the times are relative to start of the utterance. Usage: nbest-to-ctm [options] <aligned-linear-lattice-rspecifier> <ctm-wxfilename> e.g.: lattice-1best --acoustic-scale=0.08333 ark:1.lats ark:- | \ lattice-align-words data/lang/phones/word_boundary.int exp/dir/final.mdl ark:- ark:- | \ nbest-to-ctm ark:- 1.ctm e.g.: lattice-align-words-lexicon data/lang/phones/align_lexicon.int exp/dir/final.mdl ark:1.lats ark:- | \ lattice-1best ark:- ark:- | \ nbest-to-ctm ark:- 1.ctm |
lattice-determinize-pruned | Determinize lattices, keeping only the best path (sequence of acoustic states) for each input-symbol sequence. This version does pruning as part of the determinization algorithm, which is more efficient and prevents blowup. See http://kaldi-asr.org/doc/lattices.html for more information on lattices. Usage: lattice-determinize-pruned [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-determinize-pruned --acoustic-scale=0.1 --beam=6.0 ark:in.lats ark:det.lats |
lattice-to-ctm-conf | This tool turns a lattice into a ctm with confidences, based on the posterior probabilities in the lattice. The word sequence in the ctm is determined as follows. Firstly we determine the initial word sequence. In the 3-argument form, we read it from the <1best-rspecifier> input; otherwise it is the 1-best of the lattice. Then, if --decode-mbr=true, we iteratively refine the hypothesis using Minimum Bayes Risk decoding. (Note that the default value of decode_mbr is true. If you provide <1best-rspecifier> from MAP decoding, the output ctm from MBR decoding may be mismatched with the provided 1best hypothesis, the starting point of optimization.) If you don't need confidences, you can do lattice-1best and pipe to nbest-to-ctm. The ctm this program produces will be relative to the utterance-id; a standard ctm relative to the filename can be obtained using utils/convert_ctm.pl. The times produced by this program will only be meaningful if you do lattice-align-words on the input. The <1best-rspecifier> could be the output of utils/int2sym.pl or nbest-to-linear. Usage: lattice-to-ctm-conf [options] <lattice-rspecifier> \ <ctm-wxfilename> Usage: lattice-to-ctm-conf [options] <lattice-rspecifier> \ [<1best-rspecifier> [<times-rspecifier>]] <ctm-wxfilename> e.g.: lattice-to-ctm-conf --acoustic-scale=0.1 ark:1.lats 1.ctm or: lattice-to-ctm-conf --acoustic-scale=0.1 --decode-mbr=false \ ark:1.lats ark:1.1best 1.ctm See also: lattice-mbr-decode, nbest-to-ctm, lattice-arc-post, steps/get_ctm.sh, steps/get_train_ctm.sh and utils/convert_ctm.pl. |
lattice-combine | Combine lattices generated by different systems by removing the total cost of all paths (backward cost) from individual lattices and doing a union of the reweighted lattices. Note: the acoustic and LM scales that this program applies are not removed before outputting the lattices. Intended for use in system combination prior to MBR decoding, see comments in code. Usage: lattice-combine [options] <lattice-rspecifier1> <lattice-rspecifier2> [<lattice-rspecifier3> ... ] <lattice-wspecifier> E.g.: lattice-combine 'ark:gunzip -c foo/lat.1.gz|' 'ark:gunzip -c bar/lat.1.gz|' ark:- | ... |
lattice-rescore-mapped | Replace the acoustic scores on a lattice using log-likelihoods read in as a matrix for each utterance, indexed (frame, pdf-id). This does the same as (e.g.) gmm-rescore-lattice, but from a matrix. The "mapped" means that the transition-model is used to map transition-ids to pdf-ids. (c.f. latgen-faster-mapped). Note: <transition-model-in> can be any type of model file, e.g. GMM-based or neural-net based; only the transition model is read. Usage: lattice-rescore-mapped [options] <transition-model-in> <lattice-rspecifier> <loglikes-rspecifier> <lattice-wspecifier> e.g.: nnet-logprob [args] .. | lattice-rescore-mapped final.mdl ark:1.lats ark:- ark:2.lats |
lattice-depth | Compute the lattice depths in terms of the average number of arcs that cross a frame. See also lattice-depth-per-frame Usage: lattice-depth <lattice-rspecifier> [<depth-wspecifier>] E.g.: lattice-depth ark:- ark,t:- |
lattice-align-phones | Convert lattices so that the arcs in the CompactLattice format correspond with phones. The output symbols are still words, unless you specify --replace-output-symbols=true Usage: lattice-align-phones [options] <model> <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-align-phones final.mdl ark:1.lats ark:phone_aligned.lats See also: lattice-to-phone-lattice, lattice-align-words, lattice-align-words-lexicon Note: if you just want the phone alignment from a lattice, the easiest path is lattice-1best | nbest-to-linear [keeping only alignment] | ali-to-phones If you want the words and phones jointly (i.e. pronunciations of words, with word alignment), try lattice-1best | nbest-to-prons |
lattice-to-smbr-post | Do forward-backward and collect frame level posteriors for the state-level minimum Bayes Risk criterion (SMBR), which is like MPE with the criterion at a context-dependent state level. The output may be fed into gmm-acc-stats2 or similar to train the models discriminatively. The posteriors may be positive or negative. Usage: lattice-to-smbr-post [options] <model> <num-posteriors-rspecifier> <lats-rspecifier> <posteriors-wspecifier> e.g.: lattice-to-smbr-post --acoustic-scale=0.1 1.mdl ark:num.post ark:1.lats ark:1.post |
lattice-determinize-pruned-parallel | Determinize lattices, keeping only the best path (sequence of acoustic states) for each input-symbol sequence. This is a version of lattice-determinize-pruned that accepts the --num-threads option. These programs do pruning as part of the determinization algorithm, which is more efficient and prevents blowup. See http://kaldi-asr.org/doc/lattices.html for more information on lattices. Usage: lattice-determinize-pruned-parallel [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-determinize-pruned-parallel --acoustic-scale=0.1 --beam=6.0 ark:in.lats ark:det.lats |
lattice-add-penalty | Add word insertion penalty to the lattice. Note: penalties are negative log-probs, base e, and are added to the 'language model' part of the cost. Usage: lattice-add-penalty [options] <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-add-penalty --word-ins-penalty=1.0 ark:- ark:- |
lattice-align-words-lexicon | Convert lattices so that the arcs in the CompactLattice format correspond with words (i.e. aligned with word boundaries). This is the newest form, which reads in a lexicon in integer format, where each line is (integer id of) word-in word-out phone1 phone2 ... phoneN (note: word-in is the word before alignment, word-out is after, e.g. for replacing <eps> with SIL or vice versa) This may be more efficient if you first apply 'lattice-push'. Usage: lattice-align-words-lexicon [options] <lexicon-file> <model> <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-align-words-lexicon --partial-word-label=4324 --max-expand=10.0 --test=true \ data/lang/phones/align_lexicon.int final.mdl ark:1.lats ark:aligned.lats See also: lattice-align-words, which is only applicable if your phones have word-position markers, i.e. each phone comes in 5 versions like AA_B, AA_I, AA_E, AA_S, AA. |
lattice-push | Push lattices, in CompactLattice format, so that the strings are as close to the start as possible, and the lowest cost weight for each state except the start state is (0, 0). This can be helpful prior to word-alignment (in this case, only strings need to be pushed) Usage: lattice-push [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-push ark:1.lats ark:2.lats |
lattice-minimize | Minimize lattices, in CompactLattice format. Should be applied to determinized lattices (e.g. produced with --determinize-lattice=true) Note: by default this program pushes the strings and weights prior to minimization. Usage: lattice-minimize [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-minimize ark:1.lats ark:2.lats |
lattice-limit-depth | Limit the number of arcs crossing any frame, to a specified maximum. Requires an acoustic scale, because forward-backward Viterbi probs are needed, which will be affected by this. Usage: lattice-limit-depth [options] <lattice-rspecifier> <lattice-wspecifier> E.g.: lattice-limit-depth --max-arcs-per-frame=1000 --acoustic-scale=0.1 ark:- ark:- |
lattice-depth-per-frame | For each lattice, compute a vector of length (num-frames) saying how may arcs cross each frame. See also lattice-depth Usage: lattice-depth-per-frame <lattice-rspecifier> <depth-wspecifier> [<lattice-wspecifier>] The final <lattice-wspecifier> allows you to write the input lattices out in case you want to do something else with them as part of the same pipe. E.g.: lattice-depth-per-frame ark:- ark,t:- |
lattice-confidence | Compute sentence-level lattice confidence measures for each lattice. The output is simply the difference between the total costs of the best and second-best paths in the lattice (or a very large value if the lattice had only one path). Caution: this is not necessarily a very good confidence measure. You almost certainly want to specify the acoustic scale. If the input is a state-level lattice, you need to specify --read-compact-lattice=false, or the confidences will be very small (and wrong). You can get word-level confidence info from lattice-mbr-decode. Usage: lattice-confidence <lattice-rspecifier> <confidence-wspecifier> E.g.: lattice-confidence --acoustic-scale=0.08333 ark:- ark,t:- |
lattice-determinize-phone-pruned | Determinize lattices, keeping only the best path (sequence of acoustic states) for each input-symbol sequence. This version does phone insertion when doing a first-pass determinization; it then removes the inserted symbols and does a second-pass determinization. It also does pruning as part of the determinization algorithm, which is more efficient and prevents blowup. Usage: lattice-determinize-phone-pruned [options] <model> \ <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-determinize-phone-pruned --acoustic-scale=0.1 \ final.mdl ark:in.lats ark:det.lats |
lattice-determinize-phone-pruned-parallel | Determinize lattices, keeping only the best path (sequence of acoustic states) for each input-symbol sequence. This is a version of lattice-determinize-phone-pruned that accepts the --num-threads option. The program does phone insertion when doing a first pass determinization; it then removes the inserted symbols and does a second pass determinization. It also does pruning as part of the determinization algorithm, which is more efficient and prevents blowup. Usage: lattice-determinize-phone-pruned-parallel [options] \ <model> <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-determinize-phone-pruned-parallel \ --acoustic-scale=0.1 final.mdl ark:in.lats ark:det.lats |
lattice-expand-ngram | Expand lattices so that each arc has a unique n-label history, for a specified n (defaults to 3). Usage: lattice-expand-ngram [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-expand-ngram --n=3 ark:lat ark:expanded_lat |
lattice-lmrescore-const-arpa | Rescores lattice with the ConstArpaLm format language model. The LM will be wrapped into the DeterministicOnDemandFst interface and the rescoring is done by composing with the wrapped LM using a special type of composition algorithm. Determinization will be applied on the composed lattice. Usage: lattice-lmrescore-const-arpa [options] lattice-rspecifier \ const-arpa-in lattice-wspecifier e.g.: lattice-lmrescore-const-arpa --lm-scale=-1.0 ark:in.lats \ const_arpa ark:out.lats |
lattice-lmrescore-rnnlm | Rescores lattice with rnnlm. The LM will be wrapped into the DeterministicOnDemandFst interface and the rescoring is done by composing with the wrapped LM using a special type of composition algorithm. Determinization will be applied on the composed lattice. Usage: lattice-lmrescore-rnnlm [options] [unk_prob_rspecifier] \ <word-symbol-table-rxfilename> <lattice-rspecifier> \ <rnnlm-rxfilename> <lattice-wspecifier> e.g.: lattice-lmrescore-rnnlm --lm-scale=-1.0 words.txt \ ark:in.lats rnnlm ark:out.lats |
nbest-to-prons | Reads lattices which must be linear (single path), and must be in CompactLattice form where the transition-ids on the arcs have been aligned with the word boundaries (see lattice-align-words*) and outputs a vaguely ctm-like format where each line is of the form: <utterance-id> <begin-frame> <num-frames> <word> <phone1> <phone2> ... <phoneN> where the words and phones will both be written as integers. For arcs in the input lattice that don't correspond to words, <word> may be zero; this will typically be the case for the optional silences. Usage: nbest-to-prons [options] <model> <aligned-linear-lattice-rspecifier> <output-wxfilename> e.g.: lattice-1best --acoustic-weight=0.08333 ark:1.lats | \ lattice-align-words data/lang/phones/word_boundary.int exp/dir/final.mdl ark:- ark:- | \ nbest-to-prons exp/dir/final.mdl ark:- 1.prons Note: the type of the model doesn't matter as only the transition-model is read. |
lattice-arc-post | Print out information regarding posteriors of lattice arcs This program computes posteriors from a lattice and prints out information for each arc (the format is reminiscent of ctm, but contains information from multiple paths). Each line is: <utterance-id> <start-frame> <num-frames> <posterior> <word> [<ali>] [<phone1> <phone2>...] for instance: 2013a04-bk42\t104\t26\t0.95,242,242,242,71,894,894,62,63,63,63,63 8 9 where the --print-alignment option determines whether the alignments (i.e. the sequences of transition-ids) are printed, and the phones are printed only if the <model> is supplied on the command line. Note, there are tabs between the major fields, but the phones are separated by spaces. Usage: lattice-arc-post [<model>] <lattices-rspecifier> <output-wxfilename> e.g.: lattice-arc-post --acoustic-scale=0.1 final.mdl 'ark:gunzip -c lat.1.gz|' post.txt You will probably want to word-align the lattices (e.g. lattice-align-words or lattice-align-words-lexicon) before running this program, and apply an acoustic scale either via the --acoustic-scale option or using lattice-scale. See also: lattice-post, lattice-to-ctm-conf, nbest-to-ctm |
lattice-determinize-non-compact | Determinize lattices (and apply a pruning beam) (see http://kaldi-asr.org/doc/lattices.html for more explanation) This version of the program retains the original acoustic scores of arcs in the determinized lattice and writes it as a normal (non-compact) lattice. Note: this program is typically only useful if you generated state-level lattices, e.g. called gmm-latgen-simple with --determinize=false Usage: lattice-determinize-non-compact [options] lattice-rspecifier lattice-wspecifier e.g.: lattice-determinize-non-compact --acoustic-scale=0.1 --beam=15.0 ark:1.lats ark:det.lats |
lattice-lmrescore-kaldi-rnnlm | Rescores lattice with kaldi-rnnlm. This script is called from scripts/rnnlm/lmrescore.sh. An example for rescoring lattices is at egs/swbd/s5c/local/rnnlm/run_lstm.sh Usage: lattice-lmrescore-kaldi-rnnlm [options] \ <embedding-file> <raw-rnnlm-rxfilename> \ <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-lmrescore-kaldi-rnnlm --lm-scale=-1.0 \ word_embedding.mat \ --bos-symbol=1 --eos-symbol=2 \ final.raw ark:in.lats ark:out.lats |
lattice-lmrescore-pruned | This program can be used to subtract scores from one language model and add scores from another one. It uses an efficient rescoring algorithm that avoids exploring the entire composed lattice. The first (negative-weight) language model is expected to be an FST, e.g. G.fst; the second one can either be in FST or const-arpa format. Any FST-format language models will be projected on their output by this program, making it unnecessary for the caller to remove disambiguation symbols. Usage: lattice-lmrescore-pruned [options] <lm-to-subtract> <lm-to-add> <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-lmrescore-pruned --acoustic-scale=0.1 \ data/lang/G.fst data/lang_fg/G.fst ark:in.lats ark:out.lats or: lattice-lmrescore-pruned --acoustic-scale=0.1 --add-const-arpa=true \ data/lang/G.fst data/lang_fg/G.carpa ark:in.lats ark:out.lats |
lattice-lmrescore-kaldi-rnnlm-pruned | Rescores lattice with kaldi-rnnlm. This script is called from scripts/rnnlm/lmrescore_pruned.sh. An example for rescoring lattices is at egs/swbd/s5c/local/rnnlm/run_lstm.sh Usage: lattice-lmrescore-kaldi-rnnlm-pruned [options] \ <old-lm-rxfilename> <embedding-file> \ <raw-rnnlm-rxfilename> \ <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-lmrescore-kaldi-rnnlm-pruned --lm-scale=-1.0 fst_words.txt \ --bos-symbol=1 --eos-symbol=2 \ data/lang_test/G.fst word_embedding.mat \ final.raw ark:in.lats ark:out.lats lattice-lmrescore-kaldi-rnnlm-pruned --lm-scale=-1.0 fst_words.txt \ --bos-symbol=1 --eos-symbol=2 \ data/lang_test_fg/G.carpa word_embedding.mat \ final.raw ark:in.lats ark:out.lats |
lattice-reverse | Reverse a lattice in order to rescore the lattice with an RNNLM trained on reversed text. An example for its application is at swbd/local/rnnlm/run_lstm_tdnn_back.sh Usage: lattice-reverse lattice-rspecifier lattice-wspecifier e.g.: lattice-reverse ark:forward.lats ark:backward.lats |
arpa2fst | Convert an ARPA format language model into an FST Usage: arpa2fst [opts] <input-arpa> <output-fst> e.g.: arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt lm/input.arpa G.fst Note: When called without switches, the output G.fst will contain an embedded symbol table. This is compatible with the way a previous version of arpa2fst worked. |
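A plausible first step of any ARPA-to-FST conversion is reading the per-order n-gram counts from the "\data\" header. The sketch below illustrates that header format; it is not Kaldi's parser, and the helper name is made up for illustration.

```python
# Hypothetical sketch: read the n-gram counts from the "\data\" section
# of an ARPA-format language model, e.g. lines like "ngram 1=4".
def arpa_ngram_counts(lines):
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
        elif in_data and line.startswith("ngram "):
            order, n = line[len("ngram "):].split("=")
            counts[int(order)] = int(n)
        elif in_data and line.startswith("\\"):  # next section, e.g. \1-grams:
            break
    return counts

arpa_header = ["\\data\\", "ngram 1=4", "ngram 2=6", "", "\\1-grams:"]
print(arpa_ngram_counts(arpa_header))  # {1: 4, 2: 6}
```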
arpa-to-const-arpa | Converts an ARPA format language model into ConstArpaLm format, which is an in-memory representation of the pre-built ARPA language model. The output language model can then be read in by a program that wants to rescore lattices. We assume that the words in the input ARPA language model have been converted to integers. The program is used jointly with utils/map_arpa_lm.pl to build ConstArpaLm format language model. We first map the words in an ARPA format language model to integers using utils/map_arpa_lm.pl, and then use this program to build a ConstArpaLm format language model. Usage: arpa-to-const-arpa [opts] <input-arpa> <const-arpa> e.g.: arpa-to-const-arpa --bos-symbol=1 --eos-symbol=2 \ arpa.txt const_arpa |
nnet-am-info | Print human-readable information about the neural network acoustic model to the standard output Usage: nnet-am-info [options] <nnet-in> e.g.: nnet-am-info 1.nnet |
nnet-init | Initialize the nnet2 neural network from a config file with a line for each component. Note, this only outputs the neural net itself, not the associated information such as the transition-model; you'll probably want to pipe the output into something like nnet-am-init. Usage: nnet-init [options] <config-in> <raw-nnet-out> e.g.: nnet-init nnet.config 1.raw |
nnet-train-simple | Train the neural network parameters with backprop and stochastic gradient descent using minibatches. Training examples would be produced by nnet-get-egs. Usage: nnet-train-simple [options] <model-in> <training-examples-in> <model-out> e.g.: nnet-train-simple 1.nnet ark:1.egs 2.nnet |
nnet-train-ensemble | Train an ensemble of neural networks with backprop and stochastic gradient descent using minibatches. Modified version of nnet-train-simple. Implements parallel gradient descent with a term that encourages the nnets to produce similar outputs. Usage: nnet-train-ensemble [options] <model-in-1> <model-in-2> ... <model-in-n> <training-examples-in> <model-out-1> <model-out-2> ... <model-out-n> e.g.: nnet-train-ensemble 1.1.nnet 1.2.nnet ark:egs.ark 2.1.nnet 2.2.nnet |
nnet-train-transitions | Train the transition probabilities of a neural network acoustic model Usage: nnet-train-transitions [options] <nnet-in> <alignments-rspecifier> <nnet-out> e.g.: nnet-train-transitions 1.nnet "ark:gunzip -c ali.*.gz|" 2.nnet |
nnet-latgen-faster | Generate lattices using neural net model. Usage: nnet-latgen-faster [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] |
nnet-am-copy | Copy a (nnet2) neural net and its associated transition model, possibly changing the binary mode Also supports multiplying all the learning rates by a factor (the --learning-rate-factor option) and setting them all to a given value (the --learning-rate options) Usage: nnet-am-copy [options] <nnet-in> <nnet-out> e.g.: nnet-am-copy --binary=false 1.mdl text.mdl |
nnet-am-init | Initialize the neural network acoustic model and its associated transition-model, from a tree, a topology file, and a neural-net without an associated acoustic model. See example scripts to see how this works in practice. Usage: nnet-am-init [options] <tree-in> <topology-in> <raw-nnet-in> <nnet-am-out> or: nnet-am-init [options] <transition-model-in> <raw-nnet-in> <nnet-am-out> e.g.: nnet-am-init tree topo "nnet-init nnet.config - |" 1.mdl |
nnet-insert | Insert components into a neural network-based acoustic model. This is mostly intended for adding new hidden layers to neural networks. You can either specify the option --insert-at=n (specifying the index of the component after which you want your neural network inserted), or by default this program will insert it just before the component before the softmax component. CAUTION: It will also randomize the parameters of the component before the softmax (typically AffineComponent), with stddev equal to the --stddev-factor option (default 0.1), times the inverse square root of the number of inputs to that component. Set --randomize-next-component=false to turn this off. Usage: nnet-insert [options] <nnet-in> <raw-nnet-to-insert-in> <nnet-out> e.g.: nnet-insert 1.nnet "nnet-init hidden_layer.config -|" 2.nnet |
nnet-align-compiled | Align features given neural-net-based model Usage: nnet-align-compiled [options] <model-in> <graphs-rspecifier> <feature-rspecifier> <alignments-wspecifier> e.g.: nnet-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali or: compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' \ ark:- | nnet-align-compiled 1.mdl ark:- scp:train.scp ark:1.ali |
nnet-compute-prob | Computes and prints the average log-prob per frame of the given data with a neural net. The input of this is the output of e.g. nnet-get-egs. Aside from the logging output, which goes to the standard error, this program prints the average log-prob per frame to the standard output. Also see nnet-logprob, which produces a matrix of log-probs for each utterance. Usage: nnet-compute-prob [options] <model-in> <training-examples-in> e.g.: nnet-compute-prob 1.nnet ark:valid.egs |
nnet-copy-egs | Copy examples (typically single frames) for neural network training, possibly changing the binary mode. Supports multiple wspecifiers, in which case it will write the examples round-robin to the outputs. Usage: nnet-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...] e.g. nnet-copy-egs ark:train.egs ark,t:text.egs or: nnet-copy-egs ark:train.egs ark:1.egs ark:2.egs |
nnet-combine | Using a validation set, compute an optimal combination of a number of neural nets (the combination weights are separate for each layer and do not have to sum to one). The optimization is BFGS, which is initialized from the best of the individual input neural nets (or as specified by --initial-model) Usage: nnet-combine [options] <model-in1> <model-in2> ... <model-inN> <valid-examples-in> <model-out> e.g.: nnet-combine 1.1.nnet 1.2.nnet 1.3.nnet ark:valid.egs 2.nnet Caution: the first input neural net must not be a gradient. |
nnet-am-average | This program averages (or sums, if --sum=true) the parameters over a number of neural nets. If you supply the option --skip-last-layer=true, the parameters of the last updatable layer are copied from <model1> instead of being averaged (useful in multi-language scenarios). The --weights option can be used to weight each model differently. Usage: nnet-am-average [options] <model1> <model2> ... <modelN> <model-out> e.g.: nnet-am-average 1.1.nnet 1.2.nnet 1.3.nnet 2.nnet |
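The (weighted) parameter averaging described above can be sketched with plain lists standing in for each model's parameter vector. This is an illustration of the arithmetic only, not Kaldi's implementation; per-model weights default to uniform, as with the --weights option.

```python
# Sketch only: weighted average of several models' parameter vectors.
def average_params(models, weights=None):
    n = len(models)
    weights = weights or [1.0 / n] * n  # uniform weights by default
    return [sum(w * m[i] for w, m in zip(weights, models))
            for i in range(len(models[0]))]

m1, m2 = [1.0, 2.0], [3.0, 6.0]
print(average_params([m1, m2]))                # [2.0, 4.0]
print(average_params([m1, m2], [0.75, 0.25]))  # [1.5, 3.0]
```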
nnet-am-compute | Does the neural net computation for each file of input features, and outputs as a matrix the result. Used mostly for debugging. Note: if you want it to apply a log (e.g. for log-likelihoods), use --apply-log=true Usage: nnet-am-compute [options] <model-in> <feature-rspecifier> <feature-or-loglikes-wspecifier> See also: nnet-compute, nnet-logprob |
nnet-am-mixup | Add mixture-components to a neural net (comparable to mixtures in a Gaussian mixture model). Number of mixture components must be greater than the number of pdfs Usage: nnet-am-mixup [options] <nnet-in> <nnet-out> e.g.: nnet-am-mixup --power=0.3 --num-mixtures=5000 1.mdl 2.mdl |
nnet-get-egs | Get frame-by-frame examples of data for neural network training. Essentially this is a format change from features and posteriors into a special frame-by-frame format. To split randomly into different subsets, do nnet-copy-egs with --random=true, but note that this does not randomize the order of frames. Usage: nnet-get-egs [options] <features-rspecifier> <pdf-post-rspecifier> <training-examples-out> An example [where $feats expands to the actual features]: nnet-get-egs --left-context=8 --right-context=8 "$feats" \ "ark:gunzip -c exp/nnet/ali.1.gz | ali-to-pdf exp/nnet/1.nnet ark:- ark:- | ali-to-post ark:- ark:- |" \ ark:- Note: the --left-context and --right-context would be derived from the output of nnet-info. |
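The effect of --left-context and --right-context can be illustrated with a small splicing sketch: each example for frame t packs the feature rows from t-left to t+right, with the edge frames duplicated at utterance boundaries. This is an illustration of the windowing idea, not Kaldi's egs format or code.

```python
# Sketch only: splice a window of feature frames around frame t,
# clamping indices at the utterance edges (frames are lists of floats).
def spliced_frames(feats, t, left, right):
    n = len(feats)
    window = []
    for offset in range(-left, right + 1):
        idx = min(max(t + offset, 0), n - 1)  # duplicate edge frames
        window.append(feats[idx])
    return window

feats = [[0.0], [1.0], [2.0]]
print(spliced_frames(feats, 0, 2, 1))  # [[0.0], [0.0], [0.0], [1.0]]
```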
nnet-train-parallel | Train the neural network parameters with backprop and stochastic gradient descent using minibatches. As nnet-train-simple, but uses multiple threads in a Hogwild type of update (for CPU, not GPU). Usage: nnet-train-parallel [options] <model-in> <training-examples-in> <model-out> e.g.: nnet-train-parallel --num-threads=8 1.nnet ark:1.1.egs 2.nnet |
nnet-combine-fast | Using a validation set, compute an optimal combination of a number of neural nets (the combination weights are separate for each layer and do not have to sum to one). The optimization is BFGS, which is initialized from the best of the individual input neural nets (or as specified by --initial-model) Usage: nnet-combine-fast [options] <model-in1> <model-in2> ... <model-inN> <valid-examples-in> <model-out> e.g.: nnet-combine-fast 1.1.nnet 1.2.nnet 1.3.nnet ark:valid.egs 2.nnet Caution: the first input neural net must not be a gradient. |
nnet-subset-egs | Creates a random subset of the input examples, of a specified size. Uses no more memory than the size of the subset. Usage: nnet-subset-egs [options] <egs-rspecifier> [<egs-wspecifier2> ...] e.g. nnet-copy-egs [args] ark:- | nnet-subset-egs --n=1000 ark:- ark:subset.egs |
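The "uses no more memory than the size of the subset" behaviour suggests reservoir sampling; the following is a standard sketch of that technique under that assumption, not Kaldi's code.

```python
# Reservoir sampling sketch: draw a uniform random subset of size n from
# a stream, keeping only n items in memory at any time.
import random

def reservoir_sample(stream, n, seed=0):
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < n:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # item survives with probability n/(i+1)
            if j < n:
                reservoir[j] = item
    return reservoir

print(len(reservoir_sample(range(10000), 5)))  # 5
```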
nnet-shuffle-egs | Copy examples (typically single frames) for neural network training, from the input to output, but randomly shuffle the order. This program will keep all of the examples in memory at once, unless you use the --buffer-size option Usage: nnet-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier> nnet-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs |
nnet-am-fix | Copy a (cpu-based) neural net and its associated transition model, but modify it to remove certain pathologies. We use the average derivative statistics stored with the layers derived from NonlinearComponent. Note: some processes, such as nnet-combine-fast, may not process these statistics correctly, and you may have to recover them using the --stats-from option of nnet-am-copy before you use this program. Usage: nnet-am-fix [options] <nnet-in> <nnet-out> e.g.: nnet-am-fix 1.mdl 1_fixed.mdl or: nnet-am-fix --get-counts-from=1.gradient 1.mdl 1_shrunk.mdl |
nnet-latgen-faster-parallel | Generate lattices using neural net model. Usage: nnet-latgen-faster-parallel [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] |
nnet-to-raw-nnet | Copy a (cpu-based) neural net: reads the AmNnet with its transition model, but writes just the Nnet with no transition model (i.e. the raw neural net.) Usage: nnet-to-raw-nnet [options] <nnet-in> <raw-nnet-out> e.g.: nnet-to-raw-nnet --binary=false 1.mdl 1.raw |
nnet-compute | Does the neural net computation for each file of input features, and outputs as a matrix the result. Used mostly for debugging. Note: if you want it to apply a log (e.g. for log-likelihoods), use --apply-log=true. Unlike nnet-am-compute, this version reads a 'raw' neural net Usage: nnet-compute [options] <raw-nnet-in> <feature-rspecifier> <feature-or-loglikes-wspecifier> |
raw-nnet-concat | Concatenate two 'raw' neural nets, e.g. as output by nnet-init or nnet-to-raw-nnet Usage: raw-nnet-concat [options] <raw-nnet-in1> <raw-nnet-in2> <raw-nnet-out> e.g.: raw-nnet-concat nnet1 nnet2 nnet_concat |
raw-nnet-info | Print human-readable information about the raw neural network to the standard output Usage: raw-nnet-info [options] <nnet-in> e.g.: raw-nnet-info 1.nnet |
nnet-get-feature-transform | Get feature-projection transform using stats obtained with acc-lda. See comments in the code of nnet2/get-feature-transform.h for more information. Usage: nnet-get-feature-transform [options] <matrix-out> <lda-acc-1> <lda-acc-2> ... |
nnet-compute-from-egs | Does the neural net computation, taking as input the nnet-training examples (typically an archive with the extension .egs), ignoring the labels; it outputs as a matrix the result. Used mostly for debugging. Usage: nnet-compute-from-egs [options] <raw-nnet-in> <egs-rspecifier> <feature-wspecifier> e.g.: nnet-compute-from-egs 'nnet-to-raw-nnet final.mdl -|' egs.10.1.ark ark:- |
nnet-am-widen | Copy a (cpu-based) neural net and its associated transition model, widening its hidden layers to the dimension given by the --hidden-layer-dim option. Usage: nnet-am-widen [options] <nnet-in> <nnet-out> e.g.: nnet-am-widen --hidden-layer-dim=1024 1.mdl 2.mdl |
nnet-show-progress | Given an old and a new model and some training examples (possibly held-out), show the average objective function given the mean of the two models, and the breakdown by component of why this happened (computed from derivative information). Also shows parameter differences per layer. If training examples are not provided, only shows parameter differences per layer. Usage: nnet-show-progress [options] <old-model-in> <new-model-in> [<training-examples-in>] e.g.: nnet-show-progress 1.nnet 2.nnet ark:valid.egs |
nnet-get-feature-transform-multi | Get feature-projection transform using stats obtained with acc-lda. The file <index-list> contains a series of lines, each containing a list of integer indexes. For each line we create a transform of the same type as nnet-get-feature-transform would produce, taking as input just the listed feature dimensions. The output transform will be the concatenation of all these transforms. The output-dim will be the number of integers in the file <index-list> (the individual transforms are not dimension-reducing). Do not set the --dim option. Usage: nnet-get-feature-transform-multi [options] <index-list> <lda-acc-1> <lda-acc-2> ... <lda-acc-n> <matrix-out> |
nnet-copy-egs-discriminative | Copy examples for discriminative neural network training. Supports multiple wspecifiers, in which case it will write the examples round-robin to the outputs. Usage: nnet-copy-egs-discriminative [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...] e.g. nnet-copy-egs-discriminative ark:train.degs ark,t:text.degs or: nnet-copy-egs-discriminative ark:train.degs ark:1.degs ark:2.degs |
nnet-get-egs-discriminative | Get examples of data for discriminative neural network training; each one corresponds to part of a file, of variable (and configurable) length. Usage: nnet-get-egs-discriminative [options] <model> <features-rspecifier> <ali-rspecifier> <den-lat-rspecifier> <training-examples-out> An example [where $feats expands to the actual features]: nnet-get-egs-discriminative --acoustic-scale=0.1 \ 1.mdl '$feats' 'ark,s,cs:gunzip -c ali.1.gz|' 'ark,s,cs:gunzip -c lat.1.gz|' ark:1.degs |
nnet-shuffle-egs-discriminative | Copy examples (typically single frames) for neural network training, from the input to output, but randomly shuffle the order. This program will keep all of the examples in memory at once, so don't give it too many. Usage: nnet-shuffle-egs-discriminative [options] <egs-rspecifier> <egs-wspecifier> nnet-shuffle-egs-discriminative --srand=1 ark:train.degs ark:shuffled.degs |
nnet-compare-hash-discriminative | Compares two archives of discriminative training examples and checks that they behave the same way for purposes of discriminative training. This program was created as a way of testing nnet-get-egs-discriminative. The model is only needed for its transition-model. Usage: nnet-compare-hash-discriminative [options] <model-rxfilename> <egs-rspecifier1> <egs-rspecifier2> Note: options --drop-frames and --criterion should be matched with the command line of nnet-get-egs-discriminative used to get the examples. e.g.: nnet-compare-hash-discriminative --drop-frames=true --criterion=mmi ark:1.degs ark:2.degs |
nnet-combine-egs-discriminative | Copy examples for discriminative neural network training, and combine successive examples if their combined length will be less than --max-length. This can help to improve efficiency (--max-length corresponds to minibatch size) Usage: nnet-combine-egs-discriminative [options] <egs-rspecifier> <egs-wspecifier> e.g. nnet-combine-egs-discriminative --max-length=512 ark:temp.1.degs ark:1.degs |
nnet-train-discriminative-simple | Train the neural network parameters with a discriminative objective function (MMI, SMBR or MPFE). This uses training examples prepared with nnet-get-egs-discriminative Usage: nnet-train-discriminative-simple [options] <model-in> <training-examples-in> <model-out> e.g.: nnet-train-discriminative-simple 1.nnet ark:1.degs 2.nnet |
nnet-train-discriminative-parallel | Train the neural network parameters with a discriminative objective function (MMI, SMBR or MPFE). This uses training examples prepared with nnet-get-egs-discriminative This version uses multiple threads (but no GPU) Usage: nnet-train-discriminative-parallel [options] <model-in> <training-examples-in> <model-out> e.g.: nnet-train-discriminative-parallel --num-threads=8 1.nnet ark:1.degs 2.nnet |
nnet-modify-learning-rates | This program modifies the learning rates so as to equalize the relative changes in parameters for each layer, while keeping their geometric mean the same (or changing it to a value specified using the --average-learning-rate option). Usage: nnet-modify-learning-rates [options] <prev-model> \ <cur-model> <modified-cur-model> e.g.: nnet-modify-learning-rates --average-learning-rate=0.0002 \ 5.mdl 6.mdl 6.mdl |
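The stated rule (equalize relative parameter changes while preserving the geometric mean of the learning rates) can be sketched as follows. This is a hypothetical reading of the description, not Kaldi's implementation: each rate is scaled inversely to its layer's observed relative change, then the whole vector is renormalized to the target geometric mean.

```python
# Sketch only: scale each layer's learning rate by 1/(relative change),
# then renormalize so the geometric mean of the rates is unchanged
# (or equals a target, cf. the --average-learning-rate option).
import math

def modify_learning_rates(lrs, rel_changes, target_geomean=None):
    if target_geomean is None:
        target_geomean = math.exp(sum(math.log(l) for l in lrs) / len(lrs))
    raw = [l / c for l, c in zip(lrs, rel_changes)]
    geomean = math.exp(sum(math.log(r) for r in raw) / len(raw))
    return [r * target_geomean / geomean for r in raw]

# The layer that changed more (0.2 vs 0.05) ends up with the smaller rate.
print(modify_learning_rates([0.01, 0.01], [0.2, 0.05]))
```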
nnet-normalize-stddev | This program first identifies any affine or block affine layers that are followed by pnorm and then renormalize layers. Then it rescales those layers such that the parameter stddev is 1.0 after scaling (the target stddev is configurable by the --stddev option). If you supply the option --stddev-from=<model-filename>, it rescales those layers to match the standard deviations of corresponding layers in the specified model. Usage: nnet-normalize-stddev [options] <model-in> <model-out> e.g.: nnet-normalize-stddev final.mdl final.mdl |
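The rescaling step above amounts to multiplying a layer's parameters by target/stddev. A plain-Python stand-in for that arithmetic (one parameter vector instead of a Kaldi layer; not Kaldi's implementation):

```python
# Sketch only: rescale parameters so their standard deviation equals
# the target (1.0 by default, cf. the --stddev option).
import math

def normalize_stddev(params, target=1.0):
    n = len(params)
    mean = sum(params) / n
    stddev = math.sqrt(sum((p - mean) ** 2 for p in params) / n)
    return [p * target / stddev for p in params]

print(normalize_stddev([2.0, 6.0]))  # [1.0, 3.0]  (stddev was 2.0)
```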
nnet-get-weighted-egs | Get frame-by-frame examples of data for neural network training. Essentially this is a format change from features and posteriors into a special frame-by-frame format. To split randomly into different subsets, do nnet-copy-egs with --random=true, but note that this does not randomize the order of frames. Usage: nnet-get-weighted-egs [options] <features-rspecifier> <pdf-post-rspecifier> <weights-rspecifier> <training-examples-out> An example [where $feats expands to the actual features]: nnet-get-weighted-egs --left-context=8 --right-context=8 "$feats" \ "ark:gunzip -c exp/nnet/ali.1.gz | ali-to-pdf exp/nnet/1.nnet ark:- ark:- | ali-to-post ark:- ark:- |" \ ark:- Note: the --left-context and --right-context would be derived from the output of nnet-info. |
nnet-adjust-priors | Set the priors of the neural net to the computed posteriors from the net, on typical data (e.g. training data). This is correct under more general circumstances than using the priors of the class labels in the training data. Typical usage of this program will involve computation of an average pdf-level posterior with nnet-compute or nnet-compute-from-egs, piped into matrix-sum-rows and then vector-sum, to compute the average posterior Usage: nnet-adjust-priors [options] <nnet-in> <summed-posterior-vector-in> <nnet-out> e.g.: nnet-adjust-priors final.mdl prior.vec final.mdl |
nnet-replace-last-layers | This program is for adding new layers to a neural-network acoustic model. It removes the last --remove-layers layers, and adds the layers from the supplied raw-nnet. The typical use is to remove the last two layers (the softmax, and the affine component before it), and add in replacements for them newly initialized by nnet-init. This program is a more flexible way of adding layers than nnet-insert, but the inserted network needs to contain replacements for the removed layers. Usage: nnet-replace-last-layers [options] <nnet-in> <raw-nnet-to-insert-in> <nnet-out> e.g.: nnet-replace-last-layers 1.nnet "nnet-init hidden_layer.config -|" 2.nnet |
nnet-am-switch-preconditioning | Copy a (cpu-based) neural net and its associated transition model, and switch it to online preconditioning, i.e. change any components derived from AffineComponent to components of type AffineComponentPreconditionedOnline. Usage: nnet-am-switch-preconditioning [options] <nnet-in> <nnet-out> e.g.: nnet-am-switch-preconditioning --binary=false 1.mdl text.mdl |
nnet1-to-raw-nnet | Convert nnet1 neural net to nnet2 'raw' neural net Usage: nnet1-to-raw-nnet [options] <nnet1-in> <nnet2-out> e.g.: nnet1-to-raw-nnet srcdir/final.nnet - | nnet-am-init dest/tree dest/topo - dest/0.mdl |
raw-nnet-copy | Copy a raw neural net (this version works on raw nnet2 neural nets, without the transition model). Supports the 'truncate' option. Usage: raw-nnet-copy [options] <raw-nnet-in> <raw-nnet-out> e.g.: raw-nnet-copy --binary=false 1.mdl text.mdl See also: nnet-to-raw-nnet, nnet-am-copy |
nnet-relabel-egs | Relabel neural network egs with the read pdf-id alignments, zero-based. Usage: nnet-relabel-egs [options] <pdf-aligment-rspecifier> <egs_rspecifier1> ... <egs_rspecifierN> <egs_wspecifier1> ... <egs_wspecifierN> e.g.: nnet-relabel-egs ark:1.ali egs_in/egs.1.ark egs_in/egs.2.ark egs_out/egs.1.ark egs_out/egs.2.ark See also: nnet-get-egs, nnet-copy-egs, steps/nnet2/relabel_egs.sh |
nnet-am-reinitialize | This program can be used when transferring a neural net from one language to another (or one tree to another). It takes a neural net and a transition model from a different neural net, resizes the last layer to match the new transition model, zeroes it, and writes out the new, resized .mdl file. If the original model had been 'mixed-up', the associated SumGroupComponent will be removed. Usage: nnet-am-reinitialize [options] <nnet-in> <new-transition-model> <nnet-out> e.g.: nnet-am-reinitialize 1.mdl exp/tri6/final.mdl 2.mdl |
nnet3-init | Initialize nnet3 neural network from a config file; outputs 'raw' nnet without associated information such as transition model and priors. Search for examples in scripts in /egs/wsj/s5/steps/nnet3/ Can also be used to add layers to existing model (provide existing model as 1st arg) Usage: nnet3-init [options] [<existing-model-in>] <config-in> <raw-nnet-out> e.g.: nnet3-init nnet.config 0.raw or: nnet3-init 1.raw nnet.config 2.raw See also: nnet3-copy, nnet3-info |
nnet3-info | Print some text information about 'raw' nnet3 neural network, to standard output Usage: nnet3-info [options] <raw-nnet> e.g.: nnet3-info 0.raw See also: nnet3-am-info |
nnet3-get-egs | Get frame-by-frame examples of data for nnet3 neural network training. Essentially this is a format change from features and posteriors into a special frame-by-frame format. This program handles the common case where you have some input features, possibly some iVectors, and one set of labels. If people in future want to do different things they may have to extend this program or create different versions of it for different tasks (the egs format is quite general) Usage: nnet3-get-egs [options] <features-rspecifier> <pdf-post-rspecifier> <egs-out> An example [where $feats expands to the actual features]: nnet3-get-egs --num-pdfs=2658 --left-context=12 --right-context=9 --num-frames=8 "$feats"\ "ark:gunzip -c exp/nnet/ali.1.gz | ali-to-pdf exp/nnet/1.nnet ark:- ark:- | ali-to-post ark:- ark:- |" \ ark:- See also: nnet3-chain-get-egs, nnet3-get-egs-simple |
nnet3-copy-egs | Copy examples (single frames or fixed-size groups of frames) for neural network training, possibly changing the binary mode. Supports multiple wspecifiers, in which case it will write the examples round-robin to the outputs. Usage: nnet3-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...] e.g. nnet3-copy-egs ark:train.egs ark,t:text.egs or: nnet3-copy-egs ark:train.egs ark:1.egs ark:2.egs See also: nnet3-subset-egs, nnet3-get-egs, nnet3-merge-egs, nnet3-shuffle-egs |
nnet3-subset-egs | Creates a random subset of the input examples, of a specified size. Uses no more memory than the size of the subset. Usage: nnet3-subset-egs [options] <egs-rspecifier> [<egs-wspecifier2> ...] e.g. nnet3-copy-egs [args] ark:egs.1.ark ark:- | nnet3-subset-egs --n=1000 ark:- ark:subset.egs |
nnet3-shuffle-egs | Copy examples (typically single frames or small groups of frames) for neural network training, from the input to output, but randomly shuffle the order. This program will keep all of the examples in memory at once, unless you use the --buffer-size option Usage: nnet3-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier> nnet3-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs |
nnet3-acc-lda-stats | Accumulate statistics in the same format as acc-lda (i.e. stats for estimation of LDA and similar types of transform), starting from nnet training examples. This program puts the features through the network and uses the network's output as the features; the supervision in the training examples is used for the class labels. Used in obtaining feature transforms that help nnet training work better. Usage: nnet3-acc-lda-stats [options] <raw-nnet-in> <training-examples-in> <lda-stats-out> e.g.: nnet3-acc-lda-stats 0.raw ark:1.egs 1.acc See also: nnet-get-feature-transform |
nnet3-merge-egs | This copies nnet training examples from input to output, but while doing so it merges many NnetExample objects into one, forming a minibatch consisting of a single NnetExample. Usage: nnet3-merge-egs [options] <egs-rspecifier> <egs-wspecifier> e.g. nnet3-merge-egs --minibatch-size=512 ark:1.egs ark:- | nnet3-train-simple ... See also: nnet3-copy-egs |
nnet3-compute-from-egs | Read input nnet training examples, and compute the output for each one. If --apply-exp=true, apply the Exp() function to the output before writing it out. Usage: nnet3-compute-from-egs [options] <raw-nnet-in> <training-examples-in> <matrices-out> e.g.: nnet3-compute-from-egs --apply-exp=true 0.raw ark:1.egs ark:- | matrix-sum-rows ark:- ... See also: nnet3-compute |
nnet3-train | Train nnet3 neural network parameters with backprop and stochastic gradient descent. Minibatches are to be created by nnet3-merge-egs in the input pipeline. This training program is single-threaded (best to use it with a GPU); see nnet3-train-parallel for multi-threaded training that is better suited to CPUs. Usage: nnet3-train [options] <raw-model-in> <training-examples-in> <raw-model-out> e.g.: nnet3-train 1.raw 'ark:nnet3-merge-egs 1.egs ark:-|' 2.raw |
nnet3-am-init | Initialize nnet3 am-nnet (i.e. neural network-based acoustic model, with associated transition model) from an existing transition model and nnet. Search for examples in scripts under egs/wsj/s5/steps/nnet3/. Set priors using nnet3-am-train-transitions or nnet3-am-adjust-priors Usage: nnet3-am-init [options] <tree-in> <topology-in> <input-raw-nnet> <output-am-nnet> or: nnet3-am-init [options] <trans-model-in> <input-raw-nnet> <output-am-nnet> e.g.: nnet3-am-init tree topo 0.raw 0.mdl See also: nnet3-init, nnet3-am-copy, nnet3-am-info, nnet3-am-train-transitions, nnet3-am-adjust-priors |
nnet3-am-train-transitions | Train the transition probabilities of an nnet3 neural network acoustic model Usage: nnet3-am-train-transitions [options] <nnet-in> <alignments-rspecifier> <nnet-out> e.g.: nnet3-am-train-transitions 1.nnet "ark:gunzip -c ali.*.gz|" 2.nnet |
nnet3-am-adjust-priors | Set the priors of the nnet3 neural net to the computed posteriors from the net, on typical data (e.g. training data). This is correct under more general circumstances than using the priors of the class labels in the training data. Typical usage of this program will involve computation of an average pdf-level posterior with nnet3-compute or nnet3-compute-from-egs, piped into matrix-sum-rows and then vector-sum, to compute the average posterior Usage: nnet3-am-adjust-priors [options] <nnet-in> <summed-posterior-vector-in> <nnet-out> e.g.: nnet3-am-adjust-priors final.mdl counts.vec final.mdl |
nnet3-am-copy | Copy nnet3 neural-net acoustic model file; supports conversion to raw model (--raw=true). Also supports setting all learning rates to a supplied value (the --learning-rate option), and supports replacing the raw nnet in the model (the Nnet) with a provided raw nnet (the --set-raw-nnet option) Usage: nnet3-am-copy [options] <nnet-in> <nnet-out> e.g.: nnet3-am-copy --binary=false 1.mdl text.mdl or: nnet3-am-copy --raw=true 1.mdl 1.raw |
nnet3-compute-prob | Computes and prints in logging messages the average log-prob per frame of the given data with an nnet3 neural net. The input of this is the output of e.g. nnet3-get-egs | nnet3-merge-egs. Usage: nnet3-compute-prob [options] <raw-model-in> <training-examples-in> e.g.: nnet3-compute-prob 0.raw ark:valid.egs |
nnet3-average | This program averages the parameters over a number of 'raw' nnet3 neural nets. Usage: nnet3-average [options] <model1> <model2> ... <modelN> <model-out> e.g.: nnet3-average 1.1.nnet 1.2.nnet 1.3.nnet 2.nnet |
nnet3-am-info | Print some text information about an nnet3 neural network, to standard output Usage: nnet3-am-info [options] <nnet> e.g.: nnet3-am-info 0.mdl See also: nnet3-info |
nnet3-combine | Using a subset of training or held-out examples, compute the average over the first n nnet3 models, where n is chosen to maximize the objective function. Note that the order of the models is reversed before being fed into this binary, so we are actually combining the last n models. Inputs and outputs are 'raw' nnets. Usage: nnet3-combine [options] <nnet-in1> <nnet-in2> ... <nnet-inN> <valid-examples-in> <nnet-out> e.g.: nnet3-combine 1.1.raw 1.2.raw 1.3.raw ark:valid.egs 2.raw |
nnet3-latgen-faster | Generate lattices using nnet3 neural net model. Usage: nnet3-latgen-faster [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] See also: nnet3-latgen-faster-parallel, nnet3-latgen-faster-batch |
nnet3-latgen-faster-parallel | Generate lattices using nnet3 neural net model. This version supports multiple decoding threads (using a shared decoding graph.) Usage: nnet3-latgen-faster-parallel [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] See also: nnet3-latgen-faster-batch (which supports GPUs) |
nnet3-show-progress | Given an old and a new 'raw' nnet3 network and some training examples (possibly held-out), show the average objective function given the mean of the two networks, and the breakdown by component of why this happened (computed from derivative information). Also shows parameter differences per layer. If training examples are not provided, only shows parameter differences per layer. Usage: nnet3-show-progress [options] <old-net-in> <new-net-in> [<training-examples-in>] e.g.: nnet3-show-progress 1.nnet 2.nnet ark:valid.egs |
nnet3-align-compiled | Align features given nnet3 neural net model Usage: nnet3-align-compiled [options] <nnet-in> <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier> e.g.: nnet3-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali or: compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' \ ark:- | nnet3-align-compiled 1.mdl ark:- scp:train.scp ark:1.ali |
nnet3-copy | Copy a 'raw' nnet3 neural network. Also supports setting all the learning rates to a value (the --learning-rate option) Usage: nnet3-copy [options] <nnet-in> <nnet-out> e.g.: nnet3-copy --binary=false 0.raw text.raw |
nnet3-get-egs-dense-targets | Get frame-by-frame examples of data for nnet3 neural network training. This program is similar to nnet3-get-egs, but the targets here are dense matrices instead of posteriors (sparse matrices). This is useful when you want the targets to be continuous and real-valued, with the neural network possibly trained with a quadratic objective Usage: nnet3-get-egs-dense-targets --num-targets=<n> [options] <features-rspecifier> <targets-rspecifier> <egs-out> An example [where $feats expands to the actual features]: nnet3-get-egs-dense-targets --num-targets=26 --left-context=12 \ --right-context=9 --num-frames=8 "$feats" \ "ark:copy-matrix ark:exp/snrs/snr.1.ark ark:- |" ark:- |
nnet3-compute | Propagate the features through raw neural network model and write the output. If --apply-exp=true, apply the Exp() function to the output before writing it out. Usage: nnet3-compute [options] <nnet-in> <features-rspecifier> <matrix-wspecifier> e.g.: nnet3-compute final.raw scp:feats.scp ark:nnet_prediction.ark See also: nnet3-compute-from-egs, nnet3-chain-compute-post Note: this program does not currently make very efficient use of the GPU. |
nnet3-discriminative-get-egs | Get frame-by-frame examples of data for nnet3+sequence neural network training. This involves breaking up utterances into pieces of sizes determined by the --num-frames option. Usage: nnet3-discriminative-get-egs [options] <model> <features-rspecifier> <denominator-lattice-rspecifier> <numerator-alignment-rspecifier> <egs-wspecifier> An example [where $feats expands to the actual features]: nnet3-discriminative-get-egs --left-context=25 --right-context=9 --num-frames=150,100,90 \ "$feats" "ark,s,cs:gunzip -c lat.1.gz |" scp:ali.scp ark:degs.1.ark |
nnet3-discriminative-copy-egs | Copy examples for nnet3 discriminative training, possibly changing the binary mode. Supports multiple wspecifiers, in which case it will write the examples round-robin to the outputs. Usage: nnet3-discriminative-copy-egs [options] <egs-rspecifier> <egs-wspecifier1> [<egs-wspecifier2> ...] e.g. nnet3-discriminative-copy-egs ark:train.degs ark,t:text.degs or: nnet3-discriminative-copy-egs ark:train.degs ark:1.degs ark:2.degs |
nnet3-discriminative-merge-egs | This copies nnet3 discriminative training examples from input to output, merging them into composite examples. The --minibatch-size option controls how many egs are merged into a single output eg. Usage: nnet3-discriminative-merge-egs [options] <egs-rspecifier> <egs-wspecifier> e.g. nnet3-discriminative-merge-egs --minibatch-size=128 ark:1.degs ark:- | nnet3-discriminative-train ... See also: nnet3-discriminative-copy-egs |
nnet3-discriminative-shuffle-egs | Copy nnet3 discriminative training examples from the input to output, while randomly shuffling the order. This program will keep all of the examples in memory at once, unless you use the --buffer-size option. Usage: nnet3-discriminative-shuffle-egs [options] <egs-rspecifier> <egs-wspecifier> e.g.: nnet3-discriminative-shuffle-egs --srand=1 ark:train.egs ark:shuffled.egs |
nnet3-discriminative-compute-objf | Computes and prints in logging messages the objective function per frame of the given data with an nnet3 neural net. The input of this is the output of e.g. nnet3-discriminative-get-egs | nnet3-discriminative-merge-egs. Usage: nnet3-discriminative-compute-objf [options] <nnet3-model-in> <training-examples-in> e.g.: nnet3-discriminative-compute-objf 0.mdl ark:valid.degs |
nnet3-discriminative-train | Train nnet3 neural network parameters with discriminative sequence objective gradient descent. Minibatches are to be created by nnet3-discriminative-merge-egs in the input pipeline. This training program is single-threaded (best to use it with a GPU). Usage: nnet3-discriminative-train [options] <nnet-in> <discriminative-training-examples-in> <raw-nnet-out> e.g.: nnet3-discriminative-train 1.mdl 'ark:nnet3-discriminative-merge-egs 1.degs ark:-|' 2.raw |
nnet3-discriminative-subset-egs | Creates a random subset of the input examples, of a specified size. Uses no more memory than the size of the subset. Usage: nnet3-discriminative-subset-egs [options] <degs-rspecifier> <degs-wspecifier> e.g. nnet3-discriminative-copy-egs [args] ark:degs.1.ark ark:- | nnet3-discriminative-subset-egs --n=1000 ark:- ark:subset.degs |
nnet3-get-egs-simple | Get frame-by-frame examples of data for nnet3 neural network training. This is like nnet3-get-egs, but does not split up its inputs into pieces and allows more general generation of egs. E.g. this is usable for image recognition tasks. Usage: nnet3-get-egs-simple [options] <name1>=<rspecifier1> <name2>=<rspecifier2> ... e.g.: nnet3-get-egs-simple input=scp:images.scp \ output='ark,o:ali-to-post ark:labels.txt ark:- | post-to-smat --dim=10 ark:- ark:-' ark:egs.ark See also: nnet3-get-egs |
nnet3-discriminative-compute-from-egs | Read input nnet discriminative training examples, and compute the output for each one. This program is similar to nnet3-compute-from-egs, but works with discriminative egs. If --apply-exp=true, apply the Exp() function to the output before writing it out. Note: This program uses only the input; it does not do forward-backward over the lattice. See nnet3-discriminative-compute-objf for that. Usage: nnet3-discriminative-compute-from-egs [options] <raw-nnet-in> <training-examples-in> <matrices-out> e.g.: nnet3-discriminative-compute-from-egs --apply-exp=true 0.raw ark:1.degs ark:- | matrix-sum-rows ark:- ... See also: nnet3-compute, nnet3-compute-from-egs |
nnet3-latgen-faster-looped | Generate lattices using nnet3 neural net model. This version uses the 'looped' computation, which may be slightly faster for many architectures, but should not be used for backwards-recurrent architectures such as BLSTMs. Usage: nnet3-latgen-faster-looped [options] <nnet-in> <fst-in|fsts-rspecifier> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] |
nnet3-egs-augment-image | Copy examples (single frames or fixed-size groups of frames) for neural network training, doing image augmentation inline (copies each image after possibly modifying it, randomly chosen according to configuration parameters). E.g.: nnet3-egs-augment-image --horizontal-flip-prob=0.5 --horizontal-shift=0.1 \ --vertical-shift=0.1 --srand=103 --num-channels=3 --fill-mode=nearest ark:- ark:- Requires that each eg contain a NnetIo object 'input', with successive 't' values representing different x offsets, and the feature dimension representing the y offset and the channel (color), with the channel varying the fastest. See also: nnet3-copy-egs |
nnet3-xvector-get-egs | Get examples for training an nnet3 neural network for the xvector system. Each output example contains a chunk of features from some utterance along with a speaker label. The location and length of the feature chunks are specified in the 'ranges' file. Each line is interpreted as follows: <source-utterance> <relative-output-archive-index> <absolute-archive-index> <start-frame-index> <num-frames> <speaker-label> where <relative-output-archive-index> is interpreted as a zero-based index into the wspecifiers provided on the command line (<egs-0-out> and so on), and <absolute-archive-index> is ignored by this program. For example: utt1 3 13 65 300 3 utt1 0 10 50 400 3 utt2 ... Usage: nnet3-xvector-get-egs [options] <ranges-filename> <features-rspecifier> <egs-0-out> <egs-1-out> ... <egs-N-1-out> For example: nnet3-xvector-get-egs ranges.1 "$feats" ark:egs_temp.1.ark ark:egs_temp.2.ark ark:egs_temp.3.ark |
nnet3-xvector-compute | Propagate features through an xvector neural network model and write the output vectors. "Xvector" is our term for a vector or embedding which is the output of a particular type of neural network architecture found in speaker recognition. This architecture consists of several layers that operate on frames, a statistics pooling layer that aggregates over the frame-level representations and possibly additional layers that operate on segment-level representations. The xvectors are generally extracted from an output layer after the statistics pooling layer. By default, one xvector is extracted directly from the set of features for each utterance. Optionally, xvectors are extracted from chunks of input features and averaged, to produce a single vector. Usage: nnet3-xvector-compute [options] <raw-nnet-in> <features-rspecifier> <vector-wspecifier> e.g.: nnet3-xvector-compute final.raw scp:feats.scp ark:nnet_prediction.ark See also: nnet3-compute |
nnet3-xvector-compute-batched | Propagate features through an xvector neural network model and write the output vectors. "Xvector" is our term for a vector or embedding which is the output of a particular type of neural network architecture found in speaker recognition. This architecture consists of several layers that operate on frames, a statistics pooling layer that aggregates over the frame-level representations and possibly additional layers that operate on segment-level representations. The xvectors are generally extracted from an output layer after the statistics pooling layer. By default, one xvector is extracted directly from the set of features for each utterance. Optionally, xvectors are extracted from chunks of input features and averaged, to produce a single vector. Usage: nnet3-xvector-compute-batched [options] <raw-nnet-in> <features-rspecifier> <vector-wspecifier> e.g.: nnet3-xvector-compute-batched final.raw scp:feats.scp ark:nnet_prediction.ark See also: nnet3-xvector-compute, nnet3-compute |
nnet3-latgen-grammar | Generate lattices using an nnet3 neural net model and a GrammarFst-based graph; see kaldi-asr.org/doc/grammar.html for more context. Usage: nnet3-latgen-grammar [options] <nnet-in> <grammar-fst-in> <features-rspecifier> <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] ] |
nnet3-compute-batch | Propagate the features through raw neural network model and write the output. This version is optimized for GPU use. If --apply-exp=true, apply the Exp() function to the output before writing it out. Usage: nnet3-compute-batch [options] <nnet-in> <features-rspecifier> <matrix-wspecifier> e.g.: nnet3-compute-batch final.raw scp:feats.scp ark:nnet_prediction.ark |
nnet3-latgen-faster-batch | Generate lattices using nnet3 neural net model. This version is optimized for GPU-based inference. Usage: nnet3-latgen-faster-batch [options] <nnet-in> <fst-in> <features-rspecifier> <lattice-wspecifier> |
cuda-gpu-available | Test if there is a GPU available, and if the GPU setup is correct. A GPU is acquired and a small computation is done (generating a random matrix and computing softmax for its rows). exit-code: 0 = success, 1 = compiled without GPU support, -1 = error Usage: cuda-gpu-available |
cuda-compiled | This program returns exit status 0 (success) if the code was compiled with CUDA support, and 1 otherwise. To support CUDA, you must run 'configure' on a machine that has the CUDA compiler 'nvcc' available. |
nnet-train-frmshuff | Perform one iteration (epoch) of Neural Network training with mini-batch Stochastic Gradient Descent. The training targets are usually pdf-posteriors, prepared by ali-to-post. Usage: nnet-train-frmshuff [options] <feature-rspecifier> <targets-rspecifier> <model-in> [<model-out>] e.g.: nnet-train-frmshuff scp:feats.scp ark:posterior.ark nnet.init nnet.iter1 |
nnet-train-perutt | Perform one iteration of NN training by SGD with per-utterance updates. The training targets are represented as pdf-posteriors, usually prepared by ali-to-post. Usage: nnet-train-perutt [options] <feature-rspecifier> <targets-rspecifier> <model-in> [<model-out>] e.g.: nnet-train-perutt scp:feature.scp ark:posterior.ark nnet.init nnet.iter1 |
nnet-train-mmi-sequential | Perform one iteration of MMI training using SGD with per-utterance updates Usage: nnet-train-mmi-sequential [options] <model-in> <transition-model-in> <feature-rspecifier> <den-lat-rspecifier> <ali-rspecifier> [<model-out>] e.g.: nnet-train-mmi-sequential nnet.init trans.mdl scp:feats.scp scp:denlats.scp ark:ali.ark nnet.iter1 |
nnet-train-mpe-sequential | Perform one iteration of MPE/sMBR training using SGD with per-utterance updates. Usage: nnet-train-mpe-sequential [options] <model-in> <transition-model-in> <feature-rspecifier> <den-lat-rspecifier> <ali-rspecifier> [<model-out>] e.g.: nnet-train-mpe-sequential nnet.init trans.mdl scp:feats.scp scp:denlats.scp ark:ali.ark nnet.iter1 |
nnet-train-multistream | Perform one iteration of Multi-stream training, truncated BPTT for LSTMs. The training targets are pdf-posteriors, usually prepared by ali-to-post. The updates are per-utterance. Usage: nnet-train-multistream [options] <feature-rspecifier> <targets-rspecifier> <model-in> [<model-out>] e.g.: nnet-train-multistream scp:feature.scp ark:posterior.ark nnet.init nnet.iter1 |
nnet-train-multistream-perutt | Perform one iteration of Multi-stream training, per-utterance BPTT for (B)LSTMs. The updates are done per-utterance, while several utterances are processed at the same time. Usage: nnet-train-multistream-perutt [options] <feature-rspecifier> <labels-rspecifier> <model-in> [<model-out>] e.g.: nnet-train-multistream-perutt scp:feats.scp ark:targets.ark nnet.init nnet.iter1 |
rbm-train-cd1-frmshuff | Train RBM by the Contrastive Divergence algorithm with 1 step of Markov chain Monte Carlo. The tool can perform several iterations (--num-iters) or it can subsample the training dataset (--drop-data) Usage: rbm-train-cd1-frmshuff [options] <model-in> <feature-rspecifier> <model-out> e.g.: rbm-train-cd1-frmshuff 1.rbm.init scp:train.scp 1.rbm |
rbm-convert-to-nnet | Convert RBM to <affinetransform> and <sigmoid> Usage: rbm-convert-to-nnet [options] <rbm-in> <nnet-out> e.g.: rbm-convert-to-nnet --binary=false rbm.mdl nnet.mdl |
nnet-forward | Perform forward pass through Neural Network. Usage: nnet-forward [options] <nnet1-in> <feature-rspecifier> <feature-wspecifier> e.g.: nnet-forward final.nnet ark:input.ark ark:output.ark |
nnet-copy | Copy Neural Network model (and possibly change binary/text format) Usage: nnet-copy [options] <model-in> <model-out> e.g.: nnet-copy --binary=false nnet.mdl nnet_txt.mdl |
nnet-info | Print human-readable information about the neural network. (topology, various weight statistics, etc.) It prints to stdout. Usage: nnet-info [options] <nnet-in> e.g.: nnet-info 1.nnet |
nnet-concat | Concatenate Neural Networks (and possibly change binary/text format) Usage: nnet-concat [options] <nnet-in1> <...> <nnet-inN> <nnet-out> e.g.: nnet-concat --binary=false nnet.1 nnet.2 nnet.1.2 |
transf-to-nnet | Convert transformation matrix to <affine-transform> Usage: transf-to-nnet [options] <transf-in> <nnet-out> e.g.: transf-to-nnet --binary=false transf.mat nnet.mdl |
cmvn-to-nnet | Convert cmvn-stats into <AddShift> and <Rescale> components. Usage: cmvn-to-nnet [options] <transf-in> <nnet-out> e.g.: cmvn-to-nnet --binary=false transf.mat nnet.mdl |
nnet-initialize | Initialize Neural Network parameters according to a prototype (nnet1). Usage: nnet-initialize [options] <nnet-prototype-in> <nnet-out> e.g.: nnet-initialize --binary=false nnet.proto nnet.init |
feat-to-post | Convert features into posterior format, which is the generic format of NN training targets in Karel's nnet1 tools. (speed is not an issue for reasonably low NN-output dimensions) Usage: feat-to-post [options] feat-rspecifier posteriors-wspecifier e.g.: feat-to-post scp:feats.scp ark:feats.post |
paste-post | Combine 2 or more streams with NN-training targets into a single stream. As the posterior streams are pasted, the output dimension is the sum of the input dimensions. This is used when training a NN with multiple softmaxes on its output, as in multi-task, multi-lingual or multi-database training. Depending on the context, an utterance is not required to be in all the input streams; for multi-database training only 1 output layer will be active. The utterance lengths are given by the 1st argument, and the input-stream dimensions by the 2nd argument; these are followed by the input streams and the output stream, all in 'posterior' format. Usage: paste-post <featlen-rspecifier> <dims-csl> <post1-rspecifier> ... <postN-rspecifier> <post-wspecifier> e.g.: paste-post 'ark:feat-to-len $feats ark,t:-|' 1029:1124 ark:post1.ark ark:post2.ark ark:pasted.ark |
train-transitions | Train the transition probabilities in transition-model (used in nnet1 recipe). Usage: train-transitions [options] <trans-model-in> <alignments-rspecifier> <trans-model-out> e.g.: train-transitions 1.mdl "ark:gunzip -c ali.*.gz|" 2.mdl |
nnet-set-learnrate | Sets learning rate coefficient inside of 'nnet1' model Usage: nnet-set-learnrate --components=<csl> --coef=<float> <nnet-in> <nnet-out> e.g.: nnet-set-learnrate --components=1:3:5 --coef=0.5 --bias-coef=0.1 nnet-in nnet-out |
online2-wav-gmm-latgen-faster | Reads in wav file(s) and simulates online decoding, including basis-fMLLR adaptation and endpointing. Writes lattices. Models are specified via options. Usage: online2-wav-gmm-latgen-faster [options] <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier> Run egs/rm/s5/local/run_online_decoding.sh for example |
apply-cmvn-online | Apply cepstral mean (and possibly variance) normalization online, using the same code as used for online decoding in the 'new' setup in online2/ and online2bin/. If the --spk2utt option is used, it uses prior utterances from the same speaker to back off to at the utterance beginning. See also apply-cmvn-sliding. Usage: apply-cmvn-online [options] <global-cmvn-stats> <feature-rspecifier> <feature-wspecifier> e.g. apply-cmvn-online 'matrix-sum scp:data/train/cmvn.scp -|' data/train/split8/1/feats.scp ark:- or: apply-cmvn-online --spk2utt=ark:data/train/split8/1/spk2utt 'matrix-sum scp:data/train/cmvn.scp -|' data/train/split8/1/feats.scp ark:- |
extend-wav-with-silence | Extend wave data with a fairly long silence at the end (e.g. 5 seconds). The input waveforms are assumed to have silences at the beginning/end; those segments are extracted and appended to the end of the utterance. Note this is for use in testing endpointing in decoding. Usage: extend-wav-with-silence [options] <wav-rspecifier> <wav-wspecifier> or: extend-wav-with-silence [options] <wav-rxfilename> <wav-wxfilename> |
compress-uncompress-speex | Demonstrates how to use the Speex wrapper in Kaldi, by compressing input waveforms chunk by chunk and then decompressing them. Usage: compress-uncompress-speex [options] <wav-rspecifier> <wav-wspecifier> |
online2-wav-nnet2-latgen-faster | Reads in wav file(s) and simulates online decoding with neural nets (nnet2 setup), with optional iVector-based speaker adaptation and optional endpointing. Note: some configuration values and inputs are set via config files whose filenames are passed as options Usage: online2-wav-nnet2-latgen-faster [options] <nnet2-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier> The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to decode utterance by utterance. See egs/rm/s5/local/run_online_decoding_nnet2.sh for example See also online2-wav-nnet2-latgen-threaded |
ivector-extract-online2 | Extract iVectors for utterances every --ivector-period frames, using a trained iVector extractor and features and Gaussian-level posteriors. Similar to ivector-extract-online but uses the actual online decoder code to do it, and does everything in-memory instead of using multiple processes. Note: the value of the --use-most-recent-ivector config variable is ignored; it's set to false. The <spk2utt-rspecifier> is mandatory, to simplify the code; if you want to do it separately per utterance, just make it of the form <utterance-id> <utterance-id>. The iVectors are output as an archive of matrices, indexed by utterance-id; each row corresponds to an iVector. If --repeat=true, outputs the whole matrix of iVectors, not just every (ivector-period)'th frame. The input features are the raw, non-cepstral-mean-normalized features, e.g. MFCC. Usage: ivector-extract-online2 [options] <spk2utt-rspecifier> <feature-rspecifier> <ivector-wspecifier> e.g.: ivector-extract-online2 --config=exp/nnet2_online/nnet_online/conf/ivector_extractor.conf \ ark:data/train/spk2utt scp:data/train/feats.scp ark,t:ivectors.1.ark |
online2-wav-dump-features | Reads in wav file(s) and processes them as in online2-wav-nnet2-latgen-faster, but instead of decoding, dumps the features. Most of the parameters are set via configuration variables. Usage: online2-wav-dump-features [options] <spk2utt-rspecifier> <wav-rspecifier> <feature-wspecifier> The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to generate features utterance by utterance. Alternate usage: online2-wav-dump-features [options] --print-ivector-dim=true See steps/online/nnet2/{dump_nnet_activations,get_egs.sh} for examples. |
ivector-randomize | Copy matrices of online-estimated iVectors, but randomize them; this is intended primarily for training the online nnet2 setup with iVectors. For each input matrix, each row with index t is, with probability given by the option --randomize-prob, replaced with the contents of an input row chosen randomly from the interval [t, T], where T is the index of the last row of the matrix. Usage: ivector-randomize [options] <ivector-rspecifier> <ivector-wspecifier> e.g.: ivector-randomize ark:- ark:- See also: ivector-extract-online, ivector-extract-online2, subsample-feats |
online2-wav-nnet2-am-compute | Simulates the online neural net computation for each file of input features, and outputs the result as a matrix, with optional iVector-based speaker adaptation. Note: some configuration values and inputs are set via config files whose filenames are passed as options. Used mostly for debugging. Note: if you want it to apply a log (e.g. for log-likelihoods), use --apply-log=true. Usage: online2-wav-nnet2-am-compute [options] <nnet-in> <spk2utt-rspecifier> <wav-rspecifier> <feature-or-loglikes-wspecifier> The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to compute utterance by utterance. |
online2-wav-nnet2-latgen-threaded | Reads in wav file(s) and simulates online decoding with neural nets (nnet2 setup), with optional iVector-based speaker adaptation and optional endpointing. This version uses multiple threads for decoding. Note: some configuration values and inputs are set via config files whose filenames are passed as options Usage: online2-wav-nnet2-latgen-threaded [options] <nnet2-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier> The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to decode utterance by utterance. See egs/rm/s5/local/run_online_decoding_nnet2.sh for example See also online2-wav-nnet2-latgen-faster |
online2-wav-nnet3-latgen-faster | Reads in wav file(s) and simulates online decoding with neural nets (nnet3 setup), with optional iVector-based speaker adaptation and optional endpointing. Note: some configuration values and inputs are set via config files whose filenames are passed as options Usage: online2-wav-nnet3-latgen-faster [options] <nnet3-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier> The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to decode utterance by utterance. |
online2-wav-nnet3-latgen-grammar | Reads in wav file(s) and simulates online decoding with neural nets (nnet3 setup), with optional iVector-based speaker adaptation and optional endpointing. Note: some configuration values and inputs are set via config files whose filenames are passed as options. This program is like online2-wav-nnet3-latgen-faster, but for the case where the FST to be decoded is of type GrammarFst. Usage: online2-wav-nnet3-latgen-grammar [options] <nnet3-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier> The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to decode utterance by utterance. |
online2-tcp-nnet3-decode-faster | Reads in audio from a network socket and performs online decoding with neural nets (nnet3 setup), with iVector-based speaker adaptation and endpointing. Note: some configuration values and inputs are set via config files whose filenames are passed as options Usage: online2-tcp-nnet3-decode-faster [options] <nnet3-in> <fst-in> <word-symbol-table> |
online2-wav-nnet3-latgen-incremental | Reads in wav file(s) and simulates online decoding with neural nets (nnet3 setup), with optional iVector-based speaker adaptation and optional endpointing. Note: some configuration values and inputs are set via config files whose filenames are passed as options The lattice determinization algorithm here can operate incrementally. Usage: online2-wav-nnet3-latgen-incremental [options] <nnet3-in> <fst-in> <spk2utt-rspecifier> <wav-rspecifier> <lattice-wspecifier> The spk2utt-rspecifier can just be <utterance-id> <utterance-id> if you want to decode utterance by utterance. |
online-net-client | Takes input from a microphone (PortAudio), extracts features and sends them to a speech recognition server over a network connection Usage: online-net-client server-address server-port |
online-server-gmm-decode-faster | Decode speech, using feature batches received over a network connection. Utterance segmentation is done on-the-fly. A feature splicing/LDA transform is used if the optional (last) argument is given; otherwise delta/delta-delta (2nd-order) features are produced. Usage: online-server-gmm-decode-faster [options] model-in fst-in word-symbol-table silence-phones udp-port [lda-matrix-in] Example: online-server-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' 1234 lda-matrix |
online-gmm-decode-faster | Decode speech, using microphone input (PortAudio). Utterance segmentation is done on-the-fly. A feature splicing/LDA transform is used if the optional (last) argument is given; otherwise delta/delta-delta (2nd-order) features are produced. Usage: online-gmm-decode-faster [options] <model-in> <fst-in> <word-symbol-table> <silence-phones> [<lda-matrix-in>] Example: online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' lda-matrix |
online-wav-gmm-decode-faster | Reads in wav file(s) and simulates online decoding. Writes integerized-text and .ali files for WER computation. Utterance segmentation is done on-the-fly. A feature splicing/LDA transform is used if the optional (last) argument is given; otherwise delta/delta-delta (i.e., 2nd-order) features are produced. Caution: the last few frames of the wav file may not be decoded properly. Hence, don't use one wav file per utterance, but rather use one wav file per show. Usage: online-wav-gmm-decode-faster [options] wav-rspecifier model-in fst-in word-symbol-table silence-phones transcript-wspecifier alignments-wspecifier [lda-matrix-in] Example: ./online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:wav.scp model HCLG.fst words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt |
online-audio-server-decode-faster | Starts a TCP server that receives RAW audio and outputs aligned words. A sample client can be found in: onlinebin/online-audio-client Usage: online-audio-server-decode-faster [options] model-in fst-in word-symbol-table silence-phones word_boundary_file tcp-port [lda-matrix-in] example: online-audio-server-decode-faster --verbose=1 --rt-min=0.5 --rt-max=3.0 --max-active=6000 --beam=72.0 --acoustic-scale=0.0769 final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' graph/word_boundary.int 5000 final.mat |
online-audio-client | Sends an audio file to the KALDI audio server (onlinebin/online-audio-server-decode-faster) and prints the result, optionally saving it to an HTK label file or a WebVTT subtitle file. e.g.: ./online-audio-client 192.168.50.12 9012 'scp:wav_files.scp' |
rnnlm-get-egs | This program processes lines of text (typically sentences) with weights, in a format like: 1.0 67 5689 21 8940 6723 and turns them into examples (class RnnlmExample) for RNNLM training. This involves splitting up the sentences to a maximum length, importance sampling and other procedures. Usage: (1) no sampling: rnnlm-get-egs [options] <sentences-rxfilename> <rnnlm-egs-wspecifier> (2) sampling, ARPA LM read: rnnlm-get-egs [options] <symbol-table> <ARPA-rxfilename> \ <sentences-rxfilename> <rnnlm-egs-wspecifier> (3) sampling, non-ARPA LM read: rnnlm-get-egs [options] <LM-rxfilename> <sentences-rxfilename>\ <rnnlm-egs-wspecifier> E.g.: ... | rnnlm-get-egs --vocab-size=20002 - ark:- | rnnlm-train ... or (with sampling, reading LM as ARPA): ... | rnnlm-get-egs words.txt foo.arpa - ark:- | rnnlm-train ... or (with sampling, reading LM natively): ... | rnnlm-get-egs sampling.lm - ark:- | rnnlm-train ... See also: rnnlm-train |
rnnlm-train | Train nnet3-based RNNLM language model (reads minibatches prepared by rnnlm-get-egs). Supports various modes depending on which parameters we are training. Usage: rnnlm-train [options] <egs-rspecifier> e.g.: rnnlm-get-egs ... ark:- | \ rnnlm-train --read-rnnlm=foo/0.raw --write-rnnlm=foo/1.raw --read-embedding=foo/0.embedding \ --write-embedding=foo/1.embedding --read-sparse-word-features=foo/word_feats.txt ark:- See also: rnnlm-get-egs |
rnnlm-get-sampling-lm | Estimate highly-pruned backoff LM for use in importance sampling for RNNLM training. Reads integerized text. Usage: rnnlm-get-sampling-lm [options] <input-integerized-weighted-text> \ <sampling-lm-out> (this form writes a non-human-readable format that can be read by rnnlm-get-egs). e.g.: ... | rnnlm-get-sampling-lm --vocab-size=10002 - sampling.lm The word symbol table is used to write the ARPA file, but is expected to already have been used to convert the words into integer form. Each line of integerized input text should have a corpus weight as the first field, e.g.: 1.0 782 1271 3841 82 and lines of input text should not be repeated (just increase the weight). See also: rnnlm-get-egs |
rnnlm-get-word-embedding | This very simple program multiplies a sparse matrix by a dense matrix to compute the word embedding (which is also a dense matrix). The sparse matrix is in a text format specific to the RNNLM tools. Usage: rnnlm-get-word-embedding [options] <sparse-word-features-rxfilename> \ <feature-embedding-rxfilename> <word-embedding-wxfilename> e.g.: rnnlm-get-word-embedding word_features.txt feat_embedding.mat word_embedding.mat See also: rnnlm-get-egs, rnnlm-train |
rnnlm-compute-prob | This program computes the probability per word of the provided training data in 'egs' format as prepared by rnnlm-get-egs. The interface is similar to rnnlm-train, except that it doesn't train, and doesn't write the model; it just prints the average probability to the standard output (in addition to printing various diagnostics to the standard error). Usage: rnnlm-compute-prob [options] <rnnlm> <word-embedding-matrix> <egs-rspecifier> e.g.: rnnlm-get-egs ... ark:- | \ rnnlm-compute-prob 0.raw 0.word_embedding ark:- (note: use rnnlm-get-word-embedding to get the word embedding matrix if you are using sparse word features.) |
rnnlm-sentence-probs | This program takes input of a text corpus (with words represented by symbol-id's), and an already trained RNNLM model, and prints the log-probabilities of each word in the corpus. The RNNLM resets its hidden state for each new line. This is used in n-best rescoring with RNNLMs. An example of the n-best rescoring usage is at egs/swbd/s5c/local/rnnlm/run_tdnn_lstm.sh Usage: rnnlm-sentence-probs [options] <rnnlm> <word-embedding-matrix> <input-text-file> e.g.: rnnlm-sentence-probs rnnlm/final.raw rnnlm/final.word_embedding dev_corpus.txt > output_logprobs.txt |
sgmm2-init | Initialize an SGMM from a trained full-covariance UBM and a specified model topology. Usage: sgmm2-init [options] <topology> <tree> <init-model> <sgmm-out> The <init-model> argument can be a UBM (the default case) or another SGMM (if the --init-from-sgmm flag is used). For systems with a two-level tree, use the --pdf-map argument. |
sgmm2-gselect | Precompute Gaussian indices for SGMM training Usage: sgmm2-gselect [options] <model-in> <feature-rspecifier> <gselect-wspecifier> e.g.: sgmm2-gselect 1.sgmm "ark:feature-command |" ark:1.gs Note: you can do the same thing by combining the programs sgmm2-write-ubm, fgmm-global-to-gmm, gmm-gselect and fgmm-gselect |
sgmm2-acc-stats | Accumulate stats for SGMM training. Usage: sgmm2-acc-stats [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out> e.g.: sgmm2-acc-stats --gselect=ark:gselect.ark 1.mdl scp:train.scp 'ark:ali-to-post ark:1.ali ark:-|' 1.acc (note: the --gselect option is mandatory) |
sgmm2-est | Estimate SGMM model parameters from accumulated stats. Usage: sgmm2-est [options] <model-in> <stats-in> <model-out> |
sgmm2-sum-accs | Sum multiple accumulated stats files for SGMM training. Usage: sgmm2-sum-accs [options] stats-out stats-in1 stats-in2 ... |
sgmm2-align-compiled | Align features given [SGMM-based] models. Usage: sgmm2-align-compiled [options] <model-in> <graphs-rspecifier> <feature-rspecifier> <alignments-wspecifier> e.g.: sgmm2-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali |
sgmm2-est-spkvecs | Estimate SGMM speaker vectors, either per utterance or for the supplied set of speakers (with spk2utt option). Reads Gaussian-level posteriors. Writes to a table of vectors. Usage: sgmm2-est-spkvecs [options] <model-in> <feature-rspecifier> <post-rspecifier> <vecs-wspecifier> note: --gselect option is required. |
sgmm2-post-to-gpost | Convert posteriors to Gaussian-level posteriors for SGMM training. Usage: sgmm2-post-to-gpost [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <gpost-wspecifier> e.g.: sgmm2-post-to-gpost 1.mdl scp:train.scp 'ark:ali-to-post ark:1.ali ark:-|' ark:- |
sgmm2-acc-stats-gpost | Accumulate stats for SGMM training, given Gaussian-level posteriors. Usage: sgmm2-acc-stats-gpost [options] <model-in> <feature-rspecifier> <gpost-rspecifier> <stats-out> e.g.: sgmm2-acc-stats-gpost 1.mdl scp:train.scp ark,s,cs:- 1.acc |
sgmm2-latgen-faster | Decode features using SGMM-based model. Usage: sgmm2-latgen-faster [options] <model-in> (<fst-in>|<fsts-rspecifier>) <features-rspecifier> <lattices-wspecifier> [<words-wspecifier> [<alignments-wspecifier>] ] |
sgmm2-est-spkvecs-gpost | Estimate SGMM speaker vectors, either per utterance or for the supplied set of speakers (with spk2utt option). Reads Gaussian-level posteriors. Writes to a table of vectors. Usage: sgmm2-est-spkvecs-gpost [options] <model-in> <feature-rspecifier> <gpost-rspecifier> <vecs-wspecifier> |
sgmm2-rescore-lattice | Replace the acoustic scores on a lattice using a new model. Usage: sgmm2-rescore-lattice [options] <model-in> <lattice-rspecifier> <feature-rspecifier> <lattice-wspecifier> e.g.: sgmm2-rescore-lattice 1.mdl ark:1.lats scp:trn.scp ark:2.lats |
sgmm2-copy | Copy SGMM (possibly changing binary/text format) Usage: sgmm2-copy [options] <model-in> <model-out> e.g.: sgmm2-copy --binary=false 1.mdl 1_text.mdl |
sgmm2-info | Print various information about an SGMM. Usage: sgmm2-info [options] <model-in> [model-in2 ... ] |
sgmm2-est-ebw | Estimate SGMM model parameters discriminatively using an Extended Baum-Welch style of update. Usage: sgmm2-est-ebw [options] <model-in> <num-stats-in> <den-stats-in> <model-out> |
sgmm2-acc-stats2 | Accumulate numerator and denominator stats for discriminative training of SGMMs (input is posteriors of mixed sign). Usage: sgmm2-acc-stats2 [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <num-stats-out> <den-stats-out> e.g.: sgmm2-acc-stats2 1.mdl scp:train.scp ark:1.posts num.acc den.acc |
sgmm2-comp-prexform | Compute "pre-transform" parameters required for estimating fMLLR with SGMMs, and write to a model file, after the SGMM. Usage: sgmm2-comp-prexform [options] <sgmm2-in> <occs-in> <sgmm-out> |
sgmm2-est-fmllr | Estimate FMLLR transform for SGMMs, either per utterance or for the supplied set of speakers (with spk2utt option). Reads state-level posteriors. Writes to a table of matrices. --gselect option is mandatory. Usage: sgmm2-est-fmllr [options] <model-in> <feature-rspecifier> <post-rspecifier> <mats-wspecifier> |
sgmm2-project | Compute SGMM model projection that only models a part of a pre-LDA space. Used in predictive SGMMs. Takes as input an LDA+MLLT transform, and outputs a transform from the pre-LDA+MLLT space to the space that we want to model. Usage: sgmm2-project [options] <model-in> <lda-mllt-mat-in> <model-out> <new-projection-out> e.g.: sgmm2-project --start-dim=0 --end-dim=52 final.mdl final.inv_full_mat final_proj1.mdl proj1.mat |
sgmm2-latgen-faster-parallel | Decode features using SGMM-based model. This version accepts the --num-threads option but otherwise behaves identically to sgmm2-latgen-faster Usage: sgmm2-latgen-faster-parallel [options] <model-in> (<fst-in>|<fsts-rspecifier>) <features-rspecifier> <lattices-wspecifier> [<words-wspecifier> [<alignments-wspecifier>] ] |
init-ubm | Cluster the Gaussians in a diagonal-GMM acoustic model to a single full-covariance or diagonal-covariance GMM. Usage: init-ubm [options] <model-file> <state-occs> <gmm-out> |
lattice-lmrescore-tf-rnnlm | Rescores a lattice with an RNNLM that is trained with TensorFlow. An example script for training and rescoring with the TensorFlow RNNLM is at egs/ami/s5/local/tfrnnlm/run_lstm_fast.sh Usage: lattice-lmrescore-tf-rnnlm [options] [unk-file] <rnnlm-wordlist> \ <word-symbol-table-rxfilename> <lattice-rspecifier> \ <rnnlm-rxfilename> <lattice-wspecifier> e.g.: lattice-lmrescore-tf-rnnlm --lm-scale=0.5 data/tensorflow_lstm/unkcounts.txt data/tensorflow_lstm/rnnwords.txt \ data/lang/words.txt ark:in.lats data/tensorflow_lstm/rnnlm ark:out.lats |
lattice-lmrescore-tf-rnnlm-pruned | Rescores a lattice with an RNNLM that is trained with TensorFlow. An example script for training and rescoring with the TensorFlow RNNLM is at egs/ami/s5/local/tfrnnlm/run_lstm_fast.sh Usage: lattice-lmrescore-tf-rnnlm-pruned [options] [unk-file] \ <old-lm> <fst-wordlist> <rnnlm-wordlist> \ <rnnlm-rxfilename> <lattice-rspecifier> <lattice-wspecifier> e.g.: lattice-lmrescore-tf-rnnlm-pruned --lm-scale=0.5 data/tensorflow_lstm/unkcounts.txt \ data/test/G.fst data/lang/words.txt data/tensorflow_lstm/rnnwords.txt \ data/tensorflow_lstm/rnnlm ark:in.lats ark:out.lats e.g.: lattice-lmrescore-tf-rnnlm-pruned --lm-scale=0.5 data/tensorflow_lstm/unkcounts.txt \ data/test_fg/G.carpa data/lang/words.txt data/tensorflow_lstm/rnnwords.txt \ data/tensorflow_lstm/rnnlm ark:in.lats ark:out.lats |
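As a concrete illustration of the arithmetic behind the rnnlm-get-word-embedding entry above (word embedding = sparse word-feature matrix times dense feature-embedding matrix), here is a minimal pure-Python sketch. The (feature-index, value) row representation used below is illustrative only and is not Kaldi's actual sparse-matrix text format.

```python
# Sketch of the sparse-by-dense product that rnnlm-get-word-embedding
# performs. Each word is a sparse row of (feature-index, value) pairs;
# the dense matrix maps each feature index to an embedding vector.
# (Illustrative representation, not Kaldi's on-disk format.)

def sparse_times_dense(sparse_rows, dense):
    """sparse_rows: one list of (col, value) pairs per word.
    dense: dense feature-embedding matrix, rows indexed by col.
    Returns the dense word-embedding matrix, one row per word."""
    embed_dim = len(dense[0])
    out = []
    for row in sparse_rows:
        acc = [0.0] * embed_dim
        for col, val in row:
            for d in range(embed_dim):
                acc[d] += val * dense[col][d]
        out.append(acc)
    return out

# Two words over three features, embedded into two dimensions.
word_feats = [[(0, 1.0), (2, 0.5)],   # word 0 uses features 0 and 2
              [(1, 2.0)]]             # word 1 uses feature 1
feat_embedding = [[1.0, 0.0],
                  [0.0, 1.0],
                  [2.0, 2.0]]
print(sparse_times_dense(word_feats, feat_embedding))
# → [[2.0, 1.0], [0.0, 2.0]]
```

In Kaldi itself the sparse matrix would come from a file like word_features.txt and the dense matrix from feat_embedding.mat; this sketch only mirrors the computation, not the I/O.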