Frequently Asked Questions

Introduction

This page answers some miscellaneous frequently asked questions from the mailing lists. It should not be your primary way of finding such answers: the mailing lists and GitHub contain many more discussions, and a web search may be the easiest way to find answers.

Below are FAQ candidates (with some TODOs) from the mailing lists; we will update these candidates to make them more readable.

1. How to interpret the final.mdl

2. WFST

3. Feature-related tools

4. Lattice

5. CTM

6. Randomness in feature extraction

7. Copy-related tools (copy-feats, copy-matrix, etc.)

8. Usage of common tools (in data preparation?)

9. Resume training

10. GOP (goodness of pronunciation) and confidence score in Kaldi

11. Decision tree

12. --transition-scale, --self-loop-scale, --acoustic-scale, lm-weight

13. Interaction between Kaldi and HTK

  • Feature level (copy-feats-to-htk, etc.)
  • Model level (the models are fundamentally different; further explanation needed)
  • related questions:

14. The effect of Beam

15. nnet-align-compiled uses too much memory?

16. How do I check the Kaldi version?

17. Getting acoustic scores on state level in decoding

18. Mandarin: Pitch vs No Pitch

19. Is it possible to run Kaldi on AMD GPUs? Is an OpenCL port available?

20. Rescore

21. Free training data

22. Thread safety in Kaldi

23. Kaldi logo

24. Lexicon-free text recognition

25. Decoding .wav files

26. How to remove silence modeling during training and testing

27. Python wrapper for Kaldi

28. Examples for different tasks

29. Model (update) in Kaldi

30. The use of !SIL word in the lexicon

31. Problems when doing alignment

32. Why is there a disambiguation symbol in L_disambig.fst after optional silence?

33. Kaldi Book for beginners

34. Docker for Kaldi

35. RNNLM

36. Run nnet3 without ivectors

37. What is the best starting point for learning online decoding?

38. How to print partial results in online2-wav-nnet3-latgen-faster

39. Data preprocessing and augmentation

40. Speaker diarization

41. How to specify GPU for chain model training

42. How to do latency-controlled training in Kaldi?

43. What is the meaning of the contents of an nnet3 config?

44. Keyword spotting

45. GPUs supported by Kaldi

46. Windows ASR toolkit based on Kaldi

47. Optimizing model load time?

48. DNN input feature

49. Are there any tricks to accelerate nnet-compute?

50. Reading *.ark files from Bash or Python
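
As a partial answer to the question above: binary .ark files are easiest to read with a third-party package such as kaldi_io, but the text form (produced with a `,t` wspecifier, e.g. `copy-feats ark:feats.ark ark,t:feats.txt`) can be parsed directly. The sketch below is illustrative code written for this FAQ, not part of Kaldi, and assumes an archive of float matrices:

```python
# Minimal parser for *text-format* Kaldi archives of float matrices
# (a sketch for illustration; not Kaldi code).

def read_text_ark(lines):
    """Yield (utterance_id, matrix) pairs from text-format ark lines."""
    key, rows = None, []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if key is None:
            # A header line looks like: "utt_id  ["
            key = tokens[0]
            tokens = tokens[1:]
        if tokens and tokens[0] == "[":
            tokens = tokens[1:]
        closing = tokens and tokens[-1] == "]"
        if closing:
            tokens = tokens[:-1]   # "]" ends the matrix
        if tokens:
            rows.append([float(t) for t in tokens])
        if closing:
            yield key, rows
            key, rows = None, []

example = """utt1  [
  1.0 2.0
  3.0 4.0 ]
utt2  [
  5.0 6.0 ]
"""
arks = dict(read_text_ark(example.splitlines()))
```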

51. What is meant by WER and SER?
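
To sketch an answer to the question above: WER (word error rate) is the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words, and SER (sentence error rate) is the fraction of utterances containing at least one error. The code below is an illustrative re-implementation, not Kaldi's own scoring (Kaldi uses compute-wer and related tools):

```python
# WER via word-level Levenshtein distance (illustrative sketch).

def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the word list `ref` into `hyp`."""
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[j] = min(d[j] + 1,      # deletion of a reference word
                       d[j - 1] + 1,  # insertion of a hypothesis word
                       prev + cost)   # substitution (or match)
            prev = cur
    return d[-1]

ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()   # one substitution, one deletion
errors = edit_distance(ref, hyp)
wer = errors / len(ref)              # errors per reference word
```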

52. Training DNN over LDA+MLLT system

53. End-to-end speech recognition

54. Kaldi already supports SVD. Can you give me an example of how to use SVD in an LSTMP network?

55. Decoding a built graph without grammar

56. Why is MFCC used in TDNNs, but not fbank?

57. What is the maximum amount of data used with Kaldi for training acoustic models?

58. Ivector

59. CMVN, VTLN, FMLLR adaptation

60. What causes too many word deletions?

61. Linear model combination or model merging in Kaldi

62. OCR

63. Robustness of ASR

64. Python3 vs. Python2.7 in Kaldi scripts

65. Kaldi for Android

66. Why is it named Kaldi?

67. Adapt speaker recognition model

68. Teacher-student model in Kaldi

69. Language model

70. Real-time decoding: forcing decoding of the last audio data

71. QR Decomposition within Kaldi

72. Is word_boundary.int necessary for online-audio-server-decode-faster?

73. Is WER a lexicon error or a character error when training a Kaldi Mandarin speech recognition model?

74. What is the word_boundary file and how can I create it?

75. Different results from lattice-align-words and lattice-mbr-decode