Librispeech ASR model
The following models are provided: (i) a TDNN-F based chain model from the tdnn_1d_sp recipe, trained on 960h of Librispeech data with 3x speed perturbation; (ii) language models (a pruned 3-gram LM and an RNNLM) trained on Librispeech training transcriptions; and (iii) an i-vector extractor trained on a 200h subset of the data. We do not additionally include the large 4-gram LM used for rescoring, since it can be prepared easily from the package available on OpenSLR. A tutorial on using the pre-trained model on your own data (with WSJ as an example) can be found here.
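For readers who just want to try the models without running the full recipe, the sketch below shows one way to fetch and unpack a model archive with plain Python. The URL is a placeholder, not a real download link; substitute the actual archive link from this page (the tarballs typically hold final.mdl, the tree, and config files).

```python
import tarfile
import urllib.request
from pathlib import Path

def fetch_model(url: str, dest: str = "exp/pretrained") -> Path:
    """Download a pre-trained model tarball and unpack it under `dest`."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)   # fetch the .tar.gz archive
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)               # unpack model files (e.g. final.mdl, tree, conf/)
    return dest_dir

if __name__ == "__main__":
    # Placeholder URL: replace with the actual archive link from this page.
    fetch_model("https://kaldi-asr.org/models/<archive>.tar.gz")
```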
Librispeech ASR Chain 1d
- Date: 2020-02-03
- Uploader: Desh Raj
- Recipe: egs/librispeech/s5
- Kaldi Version: ea6e1b7
- Model Type: Speech Recognition, Factored TDNN, Chain
- Error Rate: WER 3.76% on test-clean, 8.92% on test-other
- Notes: Reported WER is after rescoring with the large 4-gram LM (fglarge); a minimal WER sketch follows this list.
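The WER figures above follow the usual definition: the number of word substitutions, deletions, and insertions in the best alignment between hypothesis and reference, divided by the number of reference words. The snippet below is a minimal, self-contained illustration of that metric (not Kaldi's own scoring tool); the example sentences are made up.

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    r, h = ref.split(), hyp.split()
    # Standard word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                              # delete all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j                              # insert all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(f"{wer('the cat sat on the mat', 'the cat sat on mat') * 100:.2f}%")  # 16.67%
```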
Librispeech language models
- Date: 2020-12-27
- Uploader: Ke Li
- Recipe: egs/librispeech/s5
- Model Type: Pruned 3-gram, RNNLM
- Error Rate: Perplexity 110.7 on dev (for the RNNLM); a sketch of the perplexity computation follows this list.
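The reported perplexity follows the standard definition: the exponential of the average negative log-probability that the language model assigns to each token of the dev set. A minimal sketch, with made-up token probabilities:

```python
import math

def perplexity(log_probs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities assigned by a language model."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy example: three tokens with hypothetical LM probabilities 0.2, 0.1, and 0.05.
print(perplexity([math.log(0.2), math.log(0.1), math.log(0.05)]))  # ≈ 10.0
```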
Librispeech i-vector extractor
- Date: 2020-02-03
- Uploader: Desh Raj
- Recipe: egs/librispeech/s5
- Kaldi Version: ea6e1b7
- Model Type: Speaker ID, i-vector