Contact
dpovey@gmail.com
Phone: 425 247 4129
(Daniel Povey)

How to upload your data

Before uploading your builds, please talk with Dan Povey: dpovey@gmail.com. When you are ready to submit, the usage is as follows (ssh will prompt for a password, which Dan will give you):
ssh uploads@kaldi-asr.org accept_data.pl --revision <kaldi-svn-revision> --branch <branch-name> --name '"<your name>"' \
    --note '"<your note>"' --root <archive-root> < (your data)
The double quoting is necessary: the outer quotes are interpreted by your shell, and the inner ones by the shell on the remote machine, kaldi-asr.org. A concrete example is as follows:
  cd ~/kaldi-trunk/egs/wsj/s5
  tar cz data exp | ssh uploads@kaldi-asr.org accept_data.pl --revision 4131 --branch trunk \
     --name '"Daniel Povey"' --root egs/wsj/s5  --note '"Building the standard parts of WSJ script"'
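To see why both layers of quotes are needed, the following is a minimal local sketch. It assumes only that ssh joins its arguments with spaces and hands the resulting string to the remote shell; `sh -c` stands in for that remote shell, and `simulate_remote` is a hypothetical helper name, not part of the upload tooling.

```shell
# Hypothetical helper: "$*" joins the arguments with single spaces, as ssh
# does, and `sh -c` re-parses the string the way the remote shell would.
simulate_remote() {
  sh -c "$*"
}

# `set --` loads the re-parsed words as arguments; `echo $#` prints how many.
quoted_count=$(simulate_remote 'set --' --name '"Daniel Povey"' '; echo $#')
bare_count=$(simulate_remote 'set --' --name 'Daniel Povey' '; echo $#')

echo "with inner quotes:    $quoted_count arguments"   # 2: --name, "Daniel Povey"
echo "without inner quotes: $bare_count arguments"     # 3: --name, Daniel, Povey
```

Without the inner quotes, the name is split into two words on the remote side, so the option would receive only the first word.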
For larger builds it makes sense to clean up your output before submitting, e.g. remove intermediate neural net model builds and egs/ directories, lattices (if they are large) and compiled graphs (fsts.*.gz); otherwise storing the data will incur more server fees. The command "du -k" is useful for figuring out where most of the space is taken up. Please tell Dan the total size of the data that you intend to upload.
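As a rough sketch of how "du -k" can be used here, the two helpers below list the largest directories and print a total in kilobytes. The function names `largest_dirs` and `total_kb` are made up for illustration, and the default of 20 entries is arbitrary.

```shell
# Print the N largest directories under a root (default 20), sizes in
# kilobytes, smallest first so the biggest end up at the bottom.
largest_dirs() {
  du -k "$1" | sort -n | tail -n "${2:-20}"
}

# Print the total size of a directory in kilobytes; this is the kind of
# figure to report before uploading.
total_kb() {
  du -sk "$1" | cut -f1
}

# Example, run from the build directory (e.g. egs/wsj/s5):
#   largest_dirs exp 20
#   total_kb .
```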

For non-free data you may also have to do some cleanup as required by the copyright. If your dataset comes from the LDC or a similar non-free provider of data, most likely the transcripts should not be released, so before uploading your build you should probably do something like:

 for x in data/*/text; do
   # Overwrite each transcript file with a placeholder notice.
   echo "This file cannot be provided at kaldi-asr.org, for copyright reasons" > "$x"
 done
If you want to avoid ruining your existing build directory by doing this type of thing, it probably makes sense to copy it to a different location. For example:
 cd ~/kaldi-trunk/egs/wsj
 cp -r s5 s5.upload
 cd s5.upload
 # <do any cleanup you need to do>
 tar cz data exp | ssh uploads@kaldi-asr.org accept_data.pl --revision 4131 --branch trunk \
      --name '"Daniel Povey"' --root egs/wsj/s5  --note '"Building the standard parts of WSJ script"'
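For the cleanup step in the copied directory, something like the following might serve as a starting point. It is only a sketch: `prune_build` is a hypothetical helper, and the lat.*.gz pattern for lattices is an assumption about how your build names them. Run it on the copy (e.g. s5.upload/exp), never on the original build, and point it at exp rather than at a parent directory.

```shell
# Hypothetical cleanup helper; pass it the exp directory of the COPY.
prune_build() {
  # Remove compiled graphs (fsts.*.gz) and lattices (assumed to be lat.*.gz).
  find "$1" -type f \( -name 'fsts.*.gz' -o -name 'lat.*.gz' \) -exec rm -f {} +
  # Remove directories named egs without descending into them first.
  find "$1" -type d -name egs -prune -exec rm -rf {} +
}

# Example, run inside s5.upload:
#   prune_build exp
```

Models and other files are left in place, so check the result with "du -k" before uploading.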

After you upload the data, it will not show up on the website automatically; you need to ask Dan to rebuild the site.