Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers. For a more detailed history and a list of contributors, see History of the Kaldi project.
According to legend, Kaldi was the Ethiopian goatherder who discovered the coffee plant.
Kaldi is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend. Important features include:
The goal of releasing complete recipes is an important aspect of Kaldi. Since the code is publicly available under a license that permits modifications and re-release, we would like to encourage people to release their code, along with their script directories, in a similar format to Kaldi's own example script.
We have tried to make Kaldi's documentation as complete as possible given time constraints, but in the short term we cannot hope to generate documentation that is as thorough as HTK's. In particular there is a lot of introductory material in the HTKBook, explaining statistical speech recognition for the uninitiated, that will probably never appear in Kaldi's documentation. Much of Kaldi's documentation is written in such a way that it will only be accessible to an expert. In the future we hope to make it somewhat more accessible, bearing in mind that our intended audience is speech recognition researchers or researchers-in-training. In general, Kaldi is not a speech recognition toolkit "for dummies." It will allow you to do many kinds of operations that don't make sense.
In this section we attempt to summarize some of the more generic qualities of the Kaldi toolkit. To some extent this describes the goals of the current developers as much as it describes the current status of the project. It is not meant to exclude contributions from researchers whose work has a different flavor.
Currently, we have code and scripts for most standard techniques, including all standard linear transforms, MMI, boosted MMI and MCE discriminative training, and also feature-space discriminative training (like fMPE, but based on boosted MMI). We have working recipes for Wall Street Journal and Resource Management, and also for Switchboard. The Switchboard recipe is not yet giving state-of-the-art results because of vocabulary and language model issues: we don't use any external data sources for this.
Note: after an early phase in which we intended to use version numbers for major releases of Kaldi ("v1" and so on), we realized that this type of release does not mesh well with the natural style of development, which is very continuous. Currently we maintain only the "master" development branch, and this is the version you should use. Run "git pull" frequently to keep it up to date; see Downloading and installing Kaldi for more details.
You can use the following reference if you want to cite Kaldi in papers.
@INPROCEEDINGS{Povey_ASRU2011,
  author    = {Povey, Daniel and Ghoshal, Arnab and Boulianne, Gilles and Burget, Lukas and Glembek, Ondrej and Goel, Nagendra and Hannemann, Mirko and Motlicek, Petr and Qian, Yanmin and Schwarz, Petr and Silovsky, Jan and Stemmer, Georg and Vesely, Karel},
  keywords  = {ASR, Automatic Speech Recognition, GMM, HTK, SGMM},
  month     = dec,
  title     = {The Kaldi Speech Recognition Toolkit},
  booktitle = {IEEE 2011 Workshop on Automatic Speech Recognition and Understanding},
  year      = {2011},
  publisher = {IEEE Signal Processing Society},
  location  = {Hilton Waikoloa Village, Big Island, Hawaii, US},
  note      = {IEEE Catalog No.: CFP11SRW-USB},
}
The paper can be found here.