Librispeech ASR model
The following models are provided: (i) a TDNN-F based chain model from the tdnn_1d_sp recipe, trained on 960h of Librispeech data with 3x speed perturbation; (ii) language models (a pruned 3-gram LM and an RNNLM) trained on Librispeech training transcriptions; and (iii) an i-vector extractor trained on a 200h subset of the data. We do not additionally include the large 4-gram LM used for rescoring, since it can be prepared easily from the package available on OpenSLR. A tutorial on using the pre-trained model on your own data (with WSJ as an example) can be found here.
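For readers who just want to try the models without running the full recipe, the sketch below shows one way to fetch and unpack a model archive with plain Python. The URL is a placeholder, not a real download link; substitute the actual archive link from this page (the tarballs typically hold final.mdl, the tree, and config files).

```python
import tarfile
import urllib.request
from pathlib import Path

def fetch_model(url: str, dest: str = "exp/pretrained") -> Path:
    """Download a pre-trained model tarball and unpack it under `dest`."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)   # fetch the .tar.gz archive
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)               # unpack model files (e.g. final.mdl, tree, conf/)
    return dest_dir

if __name__ == "__main__":
    # Placeholder URL: replace with the actual archive link from this page.
    fetch_model("https://kaldi-asr.org/models/<archive>.tar.gz")
```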
Librispeech ASR Chain 1d
- Date: 2020-02-03
- Uploader: Desh Raj
- Recipe: egs/librispeech/s5
- Kaldi Version: ea6e1b7
- Model Type: Speech Recognition, Factored TDNN, Chain
- Error Rate: WER 3.76% on test-clean, 8.92% on test-other
- Notes: Reported WER is after rescoring with the large 4-gram LM (fglarge); a minimal WER sketch follows this list.
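The WER figures above follow the usual definition: the number of word substitutions, deletions, and insertions in the best alignment between hypothesis and reference, divided by the number of reference words. The snippet below is a minimal, self-contained illustration of that metric (not Kaldi's own scoring tool); the example sentences are made up.

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    r, h = ref.split(), hyp.split()
    # Standard word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                              # delete all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j                              # insert all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(f"{wer('the cat sat on the mat', 'the cat sat on mat') * 100:.2f}%")  # 16.67%
```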
Librispeech language models
- Date: 2020-12-27
- Uploader: Ke Li
- Recipe: egs/librispeech/s5
- Model Type: Pruned 3-gram, RNNLM
- Error Rate: Perplexity 110.7 on dev (for the RNNLM); a sketch of the perplexity computation follows this list.
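The reported perplexity follows the standard definition: the exponential of the average negative log-probability that the language model assigns to each token of the dev set. A minimal sketch, with made-up token probabilities:

```python
import math

def perplexity(log_probs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities assigned by a language model."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy example: three tokens with hypothetical LM probabilities 0.2, 0.1, and 0.05.
print(perplexity([math.log(0.2), math.log(0.1), math.log(0.05)]))  # ≈ 10.0
```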
Librispeech i-vector extractor
- Date: 2020-02-03
- Uploader: Desh Raj
- Recipe: egs/librispeech/s5
- Kaldi Version: ea6e1b7
- Model Type: Speaker ID, i-vector