This page describes the I/O mechanisms in Kaldi from the perspective of a user of the command line tools.
See Kaldi I/O mechanisms for a more code-level overview.
We first describe "non-table" I/O. This refers to files or streams containing just one or two objects (e.g. acoustic model files; transformation matrices), rather than a collection of objects indexed by strings.
To illustrate the concepts above, make sure $KALDI_ROOT/src/bin is on your path, where $KALDI_ROOT is the top of the repository, and type the following:
echo '[ 0 1 ]' | copy-matrix - -
It will print out a log message and some binary data corresponding to that matrix. Now try:
echo '[ 0 1 ]' | copy-matrix --binary=false - -
The output will look like this:
# copy-matrix --binary=false - -
copy-matrix --binary=false - -
 [ 0 1 ]
LOG (copy-matrix:main():copy-matrix.cc:68) Copied matrix to -
Although the matrix and the log messages appear to be mixed together, the log messages are written to the standard error and would not be passed into a pipe. To avoid seeing the log messages, you could redirect stderr to /dev/null by adding 2>/dev/null to the command line.
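The separation of the two streams can be checked with a small sketch. It assumes a stand-in child process (a short Python one-liner, not copy-matrix itself) that writes a matrix-like line to stdout and a log line to stderr; the names CHILD_CODE and run_child are hypothetical and only serve the illustration.

```python
import subprocess
import sys

# Stand-in for a Kaldi tool (hypothetical, NOT copy-matrix): it writes its
# "data" to standard output and a log line to standard error.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.write('[ 0 1 ]\\n')\n"
    "sys.stderr.write('LOG Copied matrix to -\\n')\n"
)


def run_child():
    """Run the stand-in child and capture stdout and stderr separately."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout, result.stderr


data, log = run_child()
print("stdout (what a pipe would receive):", repr(data))
print("stderr (log messages only):", repr(log))
```

Only the first stream would reach the next program in a pipeline; the log line travels on a different file descriptor, which is why it appears on the terminal but not in the piped data.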
Kaldi programs may be connected using pipes or by using the stream-as-a-file mechanism of Kaldi I/O. Here is a pipe example:
echo '[ 0 1 ]' | copy-matrix - - | copy-matrix --binary=false - -
This outputs the matrix in text form (the first copy-matrix command converts to binary form and the second to text form, which is of course pointless). You could accomplish the same thing in a more convoluted way by doing this:
copy-matrix 'echo [ 0 1 ]|' '|copy-matrix --binary=false - -'
There is no reason to do this here, but it can sometimes be useful when programs have multiple inputs or outputs so the stdin or stdout is already being used. It is particularly useful with tables (see next section).
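The filename conventions used in these examples can be summarized in a short sketch. It is a simplified illustration, not Kaldi's own parsing code: it only covers the three forms that appear above ("-", a command with a trailing "|" for input, and a command with a leading "|" for output), and the function names classify_input and classify_output are hypothetical.

```python
def classify_input(name):
    """Classify an input filename as stdin, a command to read from, or a file."""
    if name == "-":
        return ("stdin", None)
    if name.endswith("|"):
        # Trailing pipe: run the command and read its standard output.
        return ("pipe", name[:-1].strip())
    return ("file", name)


def classify_output(name):
    """Classify an output filename as stdout, a command to write to, or a file."""
    if name == "-":
        return ("stdout", None)
    if name.startswith("|"):
        # Leading pipe: run the command and write to its standard input.
        return ("pipe", name[1:].strip())
    return ("file", name)


print(classify_input("echo [ 0 1 ]|"))
print(classify_output("|copy-matrix --binary=false - -"))
```

The two arguments of the convoluted copy-matrix command above fall into the "pipe" case on the input and output side respectively.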
Kaldi has special I/O mechanisms for dealing with collections of objects indexed by strings. Examples of this are feature matrices indexed by utterance-ids, or speaker-adaptation transformation matrices indexed by speaker-ids. The strings that index the collection must be nonempty and whitespace free. See The Table concept for a more in-depth discussion.
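The constraint on keys is easy to check before writing a Table. The following is a minimal sketch; the function name is_valid_key is hypothetical and is not part of Kaldi.

```python
def is_valid_key(key):
    """Return True if key is nonempty and contains no whitespace characters."""
    return len(key) > 0 and not any(ch.isspace() for ch in key)


for candidate in ["utt_001", "", "utt 001"]:
    print(repr(candidate), is_valid_key(candidate))
```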
A Table may exist in two forms: an "archive" or a "script file". The difference is that the archive actually contains the data, while a script file points to the location of the data.
Programs that read from Tables expect a string we call an "rspecifier" that says how to read the indexed data, and programs that write to Tables expect a string we call a "wspecifier" that says how to write it. These strings specify whether to expect a script file or an archive, and the file location, along with various options. Common types of rspecifiers include "ark:-", meaning read the data as an archive from the standard input, or "scp:foo.scp", meaning the script file foo.scp says where to read the data from. Points to bear in mind are:
"cat a/b/*.ark" if you need the sorted order.
echo '[ 0 1 ]' | copy-matrix 'scp:echo foo -|' 'scp,t:echo foo -|'
This deserves a little explanation. Firstly, the rspecifier "scp:echo foo -|" is equivalent to scp:bar.scp if the file bar.scp contained just the line "foo -". This tells it to read the object indexed by "foo" from the standard input. Similarly, for the wspecifier "scp,t:echo foo -|", it writes the data for "foo" to the standard output. This trick should not be overused. In this particular case, it is unnecessary because we have made the copy-matrix program support non-table I/O directly, so you could have written just "copy-matrix - -". If you have to use this trick too much, it's better to modify the program concerned.
copy-matrix 'ark:some_archive.ark' 'scp,t,p:echo foo_bar -|'
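The rspecifiers and wspecifiers shown above share one shape: a comma-separated list of tokens (the table type plus any options), then a colon, then the file location. The sketch below splits such a string into its parts. It is a simplified illustration rather than Kaldi's own parser: the function name split_specifier is hypothetical, it only handles specifiers that name exactly one of "ark" or "scp", and it does not interpret the option tokens.

```python
def split_specifier(spec):
    """Split 'type,opt1,opt2:location' into (type, options, location)."""
    # Everything before the first colon is the type and its options;
    # everything after it is the location, which may itself contain colons.
    head, sep, location = spec.partition(":")
    if not sep:
        raise ValueError("no ':' found in specifier: %r" % spec)
    tokens = head.split(",")
    kinds = [t for t in tokens if t in ("ark", "scp")]
    if len(kinds) != 1:
        raise ValueError("expected exactly one of 'ark' or 'scp' in: %r" % spec)
    options = [t for t in tokens if t not in ("ark", "scp")]
    return kinds[0], options, location


for spec in ["ark:-", "scp:foo.scp", "scp,t,p:echo foo_bar -|"]:
    print(spec, "->", split_specifier(spec))
```

Splitting at the first colon only matters when the location contains further colons; the location is passed on unchanged as the place to read from or write to.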
Many Kaldi programs take utterance-to-speaker and speaker-to-utterances maps: files called "utt2spk" or "spk2utt". These are generally specified by the command-line options --utt2spk and --spk2utt. The utt2spk map has the format
utt1 spk_of_utt1
utt2 spk_of_utt2
...
and the spk2utt map has the format
spk1 utt1_of_spk1 utt2_of_spk1 utt3_of_spk1
spk2 utt1_of_spk2 utt2_of_spk2
...
These files are used for speaker adaptation, e.g. for finding which speaker corresponds to an utterance, or for iterating over speakers. For reasons that relate mostly to the way the Kaldi example scripts are set up and the way we split data up into multiple pieces, it's important to ensure that the speakers in the utterance-to-speaker map are in sorted order (see Data preparation). These files are actually treated as archives, which is why you will see command-line options like --utt2spk=ark:data/train/utt2spk. They fit the generic archive format of "<key1> <data> <newline> <key2> <data> <newline>", where in this case the data is in text form. At the code level, the utt2spk file is treated as a table containing a string, and the spk2utt file is treated as a table containing a list of strings.
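The relationship between the two maps can be sketched in a few lines. The code below reads utt2spk lines in the text format shown above and regroups them into spk2utt lines. It is an illustration under assumptions: the function names and the sample utterance and speaker ids are hypothetical, and it relies on Python dictionaries preserving insertion order (Python 3.7 or later) so that speakers come out in the order they first appear in the input.

```python
def parse_utt2spk(lines):
    """Parse '<utterance-id> <speaker-id>' lines into a dict."""
    utt2spk = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected 2 fields, got: %r" % line)
        utt, spk = fields
        utt2spk[utt] = spk
    return utt2spk


def to_spk2utt(utt2spk):
    """Group utterance ids by speaker, keeping the input order."""
    spk2utt = {}
    for utt, spk in utt2spk.items():
        spk2utt.setdefault(spk, []).append(utt)
    return spk2utt


def format_spk2utt(spk2utt):
    """Render the map as '<speaker-id> <utt1> <utt2> ...' lines."""
    return "".join(
        "%s %s\n" % (spk, " ".join(utts)) for spk, utts in spk2utt.items()
    )


sample = ["spk1_utt1 spk1", "spk1_utt2 spk1", "spk2_utt1 spk2"]
print(format_spk2utt(to_spk2utt(parse_utt2spk(sample))), end="")
```

Each output line is one key followed by its data, which is the same key-then-data layout as the text archive format described above.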