am-sgmm2.h
1 // sgmm2/am-sgmm2.h
2 
3 // Copyright 2009-2011 Microsoft Corporation; Lukas Burget;
4 // Saarland University (Author: Arnab Ghoshal);
5 // Ondrej Glembek; Yanmin Qian;
6 // Copyright 2012-2013 Johns Hopkins University (author: Daniel Povey)
7 // Liang Lu; Arnab Ghoshal
8 
9 // See ../../COPYING for clarification regarding multiple authors
10 //
11 // Licensed under the Apache License, Version 2.0 (the "License");
12 // you may not use this file except in compliance with the License.
13 // You may obtain a copy of the License at
14 //
15 // http://www.apache.org/licenses/LICENSE-2.0
16 //
17 // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
18 // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
19 // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
20 // MERCHANTABLITY OR NON-INFRINGEMENT.
21 // See the Apache 2 License for the specific language governing permissions and
22 // limitations under the License.
23 
24 #ifndef KALDI_SGMM2_AM_SGMM2_H_
25 #define KALDI_SGMM2_AM_SGMM2_H_
26 
27 #include <vector>
28 
29 #include "base/kaldi-common.h"
30 #include "matrix/matrix-lib.h"
31 #include "gmm/model-common.h"
32 #include "gmm/diag-gmm.h"
33 #include "gmm/full-gmm.h"
34 #include "itf/options-itf.h"
35 #include "util/table-types.h"
36 #include "util/kaldi-thread.h"
37 
38 namespace kaldi {
39 /*
40  When reading this file, keep in mind two references: the paper
41  "The Subspace Gaussian Mixture Model-- a Structured Model for Speech Recognition", by D. Povey,
42  L. Burget et al. (Computer Speech and Language, 2011), and
43  "The Symmetric Subspace Gaussian Mixture Model": Microsoft Research technical report MSR-TR-2010-138.
44  We will refer to these as "the paper" [or "the CSL paper"] and "the techreport".
45 
46  (1) SSGMM
47 
48  We'll use the acronym SSGMM to refer to the Symmetric SGMM, and we'll mark in
49  the code with "[SSGMM]" things that relate to it. The technical report
50  describes an extension to the originally described model where we have
51  speaker-dependent mixture weights. These are implemented here. Note: we only
52  implement the "more efficient" version of the update for the speaker
53  projection vectors \u_i. There is also an ICASSP paper that describes the
54  stuff in the techreport (more briefly), with results, but we don't refer to
55  any equation numbers in that.
56 
57  (2) SCTM
58 
59  What we implement here has another extension that was not in the CSL paper: an
60  extension to the "state-clustered tied mixture" [SCTM] system-- a bit like BBN's
61  style of system, except for SGMMs not Gaussians, at the sub-state not Gaussian level.
62  We build a first
63  tree, at which level the phonetic sub-state vectors are defined, and then a
64  "more detailed" tree, at which level we share the sub-state mixture weights.
65  In this class, NumPdfs() returns the real number of pdf's (i.e. the #leaves
66  of the more detailed tree), and NumGroups() returns the number of groups of
67  pdf's that share the sub-state vectors.
68  We use the index j2 for indexing 0...NumPdfs()-1 [as it's the "2nd level" of the tree],
69  and j1 for indexing 0...NumGroups()-1 [as it's the "1st level" of the tree].
70  The weights are stored as c[j2][m]. There is a mapping Pdf2Group(j2) which returns
71  the corresponding j1 for a given j2, and Group2PdfList(j1) which returns a vector<int32>
72  consisting of the list of j2 indices for that j1.
73 
74  The count quantities we store during the accumulation phase could most simply
75  be stored as gamma[j2][m][i] (where m is the sub-state index), but this is
76  inefficient. Instead we store them separately as gamma1[j1][m][i] and gamma2[j2][m],
77  so each count gets stored in two separate places; this makes the stats more compact.
78 
79  In this implementation, the normalizers n_{jmi} are now stored as n[j1][m][i],
80  without including the log-weight term log c[j2][m]. In the computation of
81  state likelihoods, we first compute the log-prob of the data given each of the
82  sub-state vectors; and we compute the log-sum of this and the posteriors over
83  each of the vectors [treating the weights as 1.0]. Call these
84  "pseudo-posteriors". Then to take into account the contribution of the
85  weights in a state j2, we take the dot product of the weight-vector c[j2][...]
86  with this vector of pseudo-posteriors. The log of this dot-product gets added to the
87  original log-sum.
88 */
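
The likelihood computation described at the end of the comment above can be sketched in a few lines. This is a minimal illustration, not the code in am-sgmm2.cc: it uses plain std::vector instead of the Kaldi matrix types, the function names PseudoPosteriors and PdfLogLike are hypothetical, and the per-sub-state log-likelihoods are assumed to have been computed already (with the weights treated as 1.0).

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper. Input: log-likelihood of the frame under each
// sub-state vector of a group j1, with all weights treated as 1.0
// (must be non-empty). Output: the log-sum over sub-states; *pseudo_post
// is filled with the normalized "pseudo-posteriors".
double PseudoPosteriors(const std::vector<double> &substate_loglikes,
                        std::vector<double> *pseudo_post) {
  // Subtract the max before exponentiating to stay in numerical range.
  double max_ll = *std::max_element(substate_loglikes.begin(),
                                    substate_loglikes.end());
  double sum = 0.0;
  pseudo_post->resize(substate_loglikes.size());
  for (size_t m = 0; m < substate_loglikes.size(); m++) {
    (*pseudo_post)[m] = std::exp(substate_loglikes[m] - max_ll);
    sum += (*pseudo_post)[m];
  }
  for (size_t m = 0; m < pseudo_post->size(); m++)
    (*pseudo_post)[m] /= sum;
  return max_ll + std::log(sum);
}

// Hypothetical helper. Adds the contribution of the weights c[j2][...]
// of one pdf j2: the log of the dot product between the weight vector
// and the pseudo-posteriors is added to the shared log-sum.
double PdfLogLike(double log_sum,
                  const std::vector<double> &pseudo_post,
                  const std::vector<double> &c_j2) {
  double dot = 0.0;
  for (size_t m = 0; m < c_j2.size(); m++)
    dot += c_j2[m] * pseudo_post[m];
  return log_sum + std::log(dot);
}
```

The point of the split is that PseudoPosteriors depends only on the group j1, so its result can be shared by every pdf j2 in that group; only the cheap dot product in PdfLogLike is per-pdf.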
89 
90 
97  Sgmm2SplitSubstatesConfig(): split_substates(0),
98  perturb_factor(0.01),
99  power(0.2),
100  max_cond(100.0),
101  min_count(40.0) { }
102  void Register(OptionsItf *opts) {
103  opts->Register("split-substates", &split_substates, "Increase number of "
104  "substates to this overall target.");
105  opts->Register("max-cond-split", &max_cond, "Max condition number of smoothing "
106  "matrix used in substate splitting.");
107  opts->Register("perturb-factor", &perturb_factor, "Perturbation factor for "
108  "state vectors while splitting substates.");
109  opts->Register("power", &power, "Exponent for substate occupancies used while "
110  "splitting substates.");
111  opts->Register("min-count", &min_count, "Minimum allowed count, used in allocating "
112  "sub-states to state in mixture splitting.");
113  }
114 };
115 
116 // Caution: this config is probably not used in most of the setups, we generally do the Gaussian
117 // selection using separate programs
123 
125  full_gmm_nbest = 15;
126  diag_gmm_nbest = 50;
127  }
128 
129  void Register(OptionsItf *opts) {
130  opts->Register("full-gmm-nbest", &full_gmm_nbest, "Number of highest-scoring"
131  " full-covariance Gaussians selected per frame.");
132  opts->Register("diag-gmm-nbest", &diag_gmm_nbest, "Number of highest-scoring"
133  " diagonal-covariance Gaussians selected per frame.");
134  }
135 };
136 
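142 struct Sgmm2PerFrameDerivedVars {
143  std::vector<int32> gselect;
144  Vector<BaseFloat> xt; // x'(t), FMLLR-adapted, dim = [D], eq.(33)
145  Matrix<BaseFloat> xti; // x_{i}(t) = x'(t) - o_i(s): dim = [I][D], eq.(34)
146  Matrix<BaseFloat> zti; // z_{i}(t), dim = [I][S], eq.(35)
147  Vector<BaseFloat> nti; // n_{i}(t), dim = [I]
148 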
151  void Resize(int32 ngauss, int32 feat_dim, int32 phn_dim) { // resizes but does
152  // not necessarily zero things.
153  if (xt.Dim() != feat_dim) xt.Resize(feat_dim);
154  if (xti.NumRows() != ngauss || xti.NumCols() != feat_dim)
155  xti.Resize(ngauss, feat_dim);
156  if (zti.NumRows() != ngauss || zti.NumCols() != phn_dim)
157  zti.Resize(ngauss, phn_dim);
158  if (nti.Dim() != ngauss)
159  nti.Resize(ngauss);
160  }
161 };
162 
163 class AmSgmm2;
164 
166  // To set this up, call ComputePerSpkDerivedVars from the sgmm object.
167  public:
168  void Clear() {
169  v_s.Resize(0);
170  o_s.Resize(0, 0);
171  b_is.Resize(0);
172  log_b_is.Resize(0);
173  log_d_jms.resize(0);
174  }
175  bool Empty() { return v_s.Dim() == 0; }
176  // caution: after SetSpeakerVector you typically want to
177  // use the function AmSgmm2::ComputePerSpkDerivedVars
178  const Vector<BaseFloat> &GetSpeakerVector() { return v_s; }
179 
180  void SetSpeakerVector(const Vector<BaseFloat> &v_s_in) {
181  v_s.Resize(v_s_in.Dim());
182  v_s.CopyFromVec(v_s_in);
183  }
184  protected:
185  friend class AmSgmm2;
186  friend class MleAmSgmm2Accs;
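187  Vector<BaseFloat> v_s; // Speaker adaptation vector v_^{(s)}. Dim is [T].
188  Matrix<BaseFloat> o_s; // Per-speaker offsets o_{i}. Dimension is [I][D].
189  Vector<BaseFloat> b_is; // [SSGMM]: Eq. (22) in techreport, b_i^{(s)} = \exp(\u_i^T \v^{(s)})
190  Vector<BaseFloat> log_b_is; // [SSGMM] log of the above (more efficient to store both).
191  std::vector<Vector<BaseFloat> > log_d_jms;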
192 };
194 
200  public:
201  // you'll typically initialize with (sgmm.NumGroups(), sgmm.NumPdfs()).
202  Sgmm2LikelihoodCache(int32 num_groups, int32 num_pdfs):
203  substate_cache(num_groups), pdf_cache(num_pdfs), t(1) { }
204 
205  struct SubstateCacheElement { // indexed by j1.
207  // The "likes" and "remaining_log_like" quantities store the
208  // log-like of the data given each substate vector, in a redundant
209  // way, so the likelihood is likes(i) * exp(remaining_log_like).
210  // This is to get around problems with numerical range.
213  int32 t; // used in detecting "freshness."
214  };
215  struct PdfCacheElement { // indexed by j2.
216  PdfCacheElement(): t(0) { }
218  int32 t; // used in detecting "freshness."
219  };
220 
221  void NextFrame(); // increments t.
222  std::vector<SubstateCacheElement> substate_cache; // indexed by j1.
223  std::vector<PdfCacheElement> pdf_cache; // indexed by j2.
225 };
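
The "freshness" counter used by the cache above avoids clearing the cache on every frame: each element remembers the frame stamp at which it was filled, and bumping the cache-wide stamp invalidates everything at once. The following is a minimal sketch of that idea, assuming an element counts as fresh exactly when its stamp equals the cache's current stamp; the names CacheElement, FrameCache, IsFresh and Store are hypothetical, and a single double stands in for the real likelihood fields.

```cpp
#include <vector>

// Simplified stand-in for one cache slot. "value" replaces the real
// likelihood fields, which are not reproduced here.
struct CacheElement {
  CacheElement(): value(0.0), t(0) { }
  double value;
  int t;  // frame stamp at which "value" was computed.
};

struct FrameCache {
  // The cache-wide stamp starts at 1 while every slot starts at 0,
  // so no slot is considered fresh before its first Store().
  explicit FrameCache(int size): elems(size), t(1) { }
  void NextFrame() { ++t; }  // invalidates all slots in O(1).
  bool IsFresh(int j) const { return elems[j].t == t; }
  void Store(int j, double value) {
    elems[j].value = value;
    elems[j].t = t;
  }
  std::vector<CacheElement> elems;
  int t;  // current frame stamp.
};
```

A caller would check IsFresh(j) before recomputing a likelihood, and call NextFrame() once per frame; forgetting that call would return stale values, which is what the warning on the cache argument of LogLikelihood() refers to.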
226 
227 
231 class AmSgmm2 {
232  public:
233  AmSgmm2() {}
234  void Read(std::istream &is, bool binary);
235  void Write(std::ostream &os, bool binary,
236  SgmmWriteFlagsType write_params) const;
237 
241  void Check(bool show_properties = true);
242 
247  void InitializeFromFullGmm(const FullGmm &gmm,
248  const std::vector<int32> &pdf2group,
249  int32 phn_subspace_dim,
250  int32 spk_subspace_dim,
251  bool speaker_dependent_weights,
252  BaseFloat self_weight); // self_weight relates to
253  // initialization of the weights. if self_weight == 1.0 it means we
254  // just have 1 sub-state per group, otherwise we have one per pdf,
255  // and each pdf has "self_weight" as its "own" weight.
256 
259  void CopyGlobalsInitVecs(const AmSgmm2 &other,
260  const std::vector<int32> &pdf2group,
261  BaseFloat self_weight);
262 
264  void CopyFromSgmm2(const AmSgmm2 &other,
265  bool copy_normalizers,
266  bool copy_weights); // copy_weights is to copy w_{jmi} [which are
267  // stored, in the symmetric SSGMM.]
268 
272  BaseFloat GaussianSelection(const Sgmm2GselectConfig &config,
273  const VectorBase<BaseFloat> &data,
274  std::vector<int32> *gselect) const;
275 
278  void ComputePerFrameVars(const VectorBase<BaseFloat> &data,
279  const std::vector<int32> &gselect,
280  const Sgmm2PerSpkDerivedVars &spk_vars,
281  Sgmm2PerFrameDerivedVars *per_frame_vars) const;
282 
283 
286  void ComputePerSpkDerivedVars(Sgmm2PerSpkDerivedVars *vars) const;
287 
294  BaseFloat LogLikelihood(const Sgmm2PerFrameDerivedVars &per_frame_vars,
295  int32 j2, // pdf_id
296  Sgmm2LikelihoodCache *cache, // be careful to call NextFrame() when needed!
297  Sgmm2PerSpkDerivedVars *spk_vars,
298  BaseFloat log_prune = 0.0) const;
299 
305  BaseFloat ComponentPosteriors(const Sgmm2PerFrameDerivedVars &per_frame_vars,
306  int32 j2,
307  Sgmm2PerSpkDerivedVars *spk_vars,
308  Matrix<BaseFloat> *post) const;
309 
311  void SplitSubstates(const Vector<BaseFloat> &state_occupancies, // [indexed by pdf-id j2]
312  const Sgmm2SplitSubstatesConfig &config);
313 
317  void IncreasePhoneSpaceDim(int32 target_dim,
318  const Matrix<BaseFloat> &norm_xform);
319 
324  void IncreaseSpkSpaceDim(int32 target_dim,
325  const Matrix<BaseFloat> &norm_xform,
326  bool speaker_dependent_weights);
327 
332  void ComputeDerivedVars();
333 
336  void ComputeNormalizers();
337 
340  void ComputeWeights();
341 
344  void ComputeFmllrPreXform(const Vector<BaseFloat> &pdf_occs,
345  Matrix<BaseFloat> *xform,
346  Matrix<BaseFloat> *inv_xform,
347  Vector<BaseFloat> *diag_mean_scatter) const;
348 
350  int32 NumPdfs() const { return pdf2group_.size(); }
351  int32 NumGroups() const { return group2pdf_.size(); } // relates to SCTM. # pdf groups,
352  // <= NumPdfs().
353  int32 Pdf2Group(int32 j2) const; // relates to SCTM.
354  int32 NumSubstatesForPdf(int32 j2) const {
355  KALDI_ASSERT(j2 < NumPdfs()); return c_[j2].Dim();
356  }
357  int32 NumSubstatesForGroup(int32 j1) const {
358  KALDI_ASSERT(j1 < NumGroups()); return v_[j1].NumRows();
359  }
360  int32 NumGauss() const { return M_.size(); }
361  int32 PhoneSpaceDim() const { return w_.NumCols(); }
362  int32 SpkSpaceDim() const { return (N_.size() > 0) ? N_[0].NumCols() : 0; }
363  int32 FeatureDim() const { return M_[0].NumRows(); }
364 
366  bool HasSpeakerDependentWeights() const { return (u_.NumRows() != 0); }
367 
368  bool HasSpeakerSpace() const { return (!N_.empty()); }
369 
370  void RemoveSpeakerSpace() { N_.clear(); u_.Resize(0, 0); w_jmi_.clear(); }
371 
372  // [SSGMM] get the quantity d_{jm}^{(s)} and cache it with
373  // spk vars if necessary. Called in accumulation code.
374  BaseFloat GetDjms(int32 j1, int32 m,
375  Sgmm2PerSpkDerivedVars *spk_vars) const;
376 
378  const FullGmm & full_ubm() const { return full_ubm_; }
379  const DiagGmm & diag_ubm() const { return diag_ubm_; }
380 
381 
383  template<typename Real>
384  void GetInvCovars(int32 gauss_index, SpMatrix<Real> *out) const;
385 
386  template<typename Real>
387  void GetSubstateMean(int32 j1, int32 m, int32 i,
388  VectorBase<Real> *mean_out) const;
389 
390  template<typename Real>
391  void GetNtransSigmaInv(std::vector< Matrix<Real> > *out) const;
392 
393  template<typename Real>
394  void GetSubstateSpeakerMean(int32 j1, int32 substate, int32 gauss,
395  const Sgmm2PerSpkDerivedVars &spk,
396  VectorBase<Real> *mean_out) const;
397 
398  template<typename Real>
399  void GetVarScaledSubstateSpeakerMean(int32 j1, int32 substate,
400  int32 gauss,
401  const Sgmm2PerSpkDerivedVars &spk,
402  VectorBase<Real> *mean_out) const;
403 
405  template<class Real>
406  void ComputeH(std::vector< SpMatrix<Real> > *H_i) const;
407 
408  protected:
409  std::vector<int32> pdf2group_;
410  std::vector<std::vector<int32> > group2pdf_; // the reverse map.
411 
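413  DiagGmm diag_ubm_; // These contain the "background" model associated with the subspace GMM.
414  FullGmm full_ubm_;
415 
421 
423  std::vector< SpMatrix<BaseFloat> > SigmaInv_;
425  std::vector< Matrix<BaseFloat> > M_; // Phonetic-subspace projections. Dimension is [I][D][S].
427  std::vector< Matrix<BaseFloat> > N_; // Speaker-subspace projections. Dimension is [I][D][T].
429  Matrix<BaseFloat> w_; // Phonetic-subspace weight projection vectors. Dimension is [I][S].
431  Matrix<BaseFloat> u_; // [SSGMM] Speaker-subspace weight projection vectors. Dimension is [I][T].
432 
434 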
436  std::vector< Matrix<BaseFloat> > v_;
438  std::vector< Vector<BaseFloat> > c_;
440  std::vector< Matrix<BaseFloat> > n_;
442  std::vector< Matrix<BaseFloat> > w_jmi_;
443 
444  // Priors for MAP adaptation of M -- keeping them here for now but they may
445  // be moved somewhere else eventually
446  // These are parameters of a matrix-variate normal distribution. The means are
447  // the unadapted M_i, and we have 2 separate covariance matrices for the rows
448  // and columns of M.
449  std::vector< Matrix<BaseFloat> > M_prior_; // Matrix-variate Gaussian mean
450  SpMatrix<BaseFloat> row_cov_inv_;
451  SpMatrix<BaseFloat> col_cov_inv_;
452 
453  private:
456  void ComputeGammaI(const Vector<BaseFloat> &state_occupancies,
457  Vector<BaseFloat> *gamma_i) const;
458 
460  void SplitSubstatesInGroup(const Vector<BaseFloat> &pdf_occupancies,
461  const Sgmm2SplitSubstatesConfig &opts,
462  const SpMatrix<BaseFloat> &sqrt_H_sm,
463  int32 j1, int32 M);
464 
466  void ComputeNormalizersInternal(int32 num_threads, int32 thread,
467  int32 *entropy_count, double *entropy_sum);
468 
473  inline void ComponentLogLikes(const Sgmm2PerFrameDerivedVars &per_frame_vars,
474  int32 j1,
475  Sgmm2PerSpkDerivedVars *spk_vars,
476  Matrix<BaseFloat> *loglikes) const;
477 
478 
480  void InitializeMw(int32 phn_subspace_dim,
481  const Matrix<BaseFloat> &norm_xform);
483  void InitializeNu(int32 spk_subspace_dim,
484  const Matrix<BaseFloat> &norm_xform,
485  bool speaker_dependent_weights);
486  void InitializeVecsAndSubstateWeights(BaseFloat self_weight);
487  void InitializeCovars();
488 
489  void ComputeHsmFromModel(
490  const std::vector< SpMatrix<BaseFloat> > &H,
491  const Vector<BaseFloat> &state_occupancies,
492  SpMatrix<BaseFloat> *H_sm,
493  BaseFloat max_cond) const;
494 
495  void ComputePdfMappings(); // sets up group2pdf_ from pdf2group_.
498 
501  friend class Sgmm2Project;
502  friend class EbwAmSgmm2Updater;
503  friend class MleAmSgmm2Accs;
504  friend class MleAmSgmm2Updater;
505  friend class MleSgmm2SpeakerAccs;
506  friend class AmSgmm2Functions; // misc functions that need access.
507  friend class Sgmm2Feature;
508 };
509 
510 template<typename Real>
511 inline void AmSgmm2::GetInvCovars(int32 gauss_index,
512  SpMatrix<Real> *out) const {
513  out->Resize(SigmaInv_[gauss_index].NumRows(), kUndefined);
514  out->CopyFromSp(SigmaInv_[gauss_index]);
515 }
516 
517 
518 template<typename Real>
519 inline void AmSgmm2::GetSubstateMean(int32 j1, int32 m, int32 i,
520  VectorBase<Real> *mean_out) const {
521  KALDI_ASSERT(mean_out != NULL);
522  KALDI_ASSERT(j1 < NumGroups() && m < NumSubstatesForGroup(j1)
523  && i < NumGauss());
524  KALDI_ASSERT(mean_out->Dim() == FeatureDim());
525  Vector<BaseFloat> mean_tmp(FeatureDim());
526  mean_tmp.AddMatVec(1.0, M_[i], kNoTrans, v_[j1].Row(m), 0.0);
527  mean_out->CopyFromVec(mean_tmp);
528 }
529 
530 
531 template<typename Real>
532 inline void AmSgmm2::GetSubstateSpeakerMean(int32 j1, int32 m, int32 i,
533  const Sgmm2PerSpkDerivedVars &spk,
534  VectorBase<Real> *mean_out) const {
535  GetSubstateMean(j1, m, i, mean_out);
536  if (spk.v_s.Dim() != 0) // have speaker adaptation...
537  mean_out->AddVec(1.0, spk.o_s.Row(i));
538 }
539 
540 template<typename Real>
541 inline void AmSgmm2::GetVarScaledSubstateSpeakerMean(int32 j1, int32 m, int32 i,
542  const Sgmm2PerSpkDerivedVars &spk,
543  VectorBase<Real> *mean_out) const {
544  Vector<BaseFloat> tmp_mean(mean_out->Dim()), tmp_mean2(mean_out->Dim());
545  GetSubstateSpeakerMean(j1, m, i, spk, &tmp_mean);
546  tmp_mean2.AddSpVec(1.0, SigmaInv_[i], tmp_mean, 0.0);
547  mean_out->CopyFromVec(tmp_mean2);
548 }
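
The three templated mean accessors above differ only in how much they add to the basic sub-state mean. Reading off the function bodies, with i the Gaussian index, m the sub-state and j1 the group, they compute the quantities below. The relation between the speaker offset and the speaker vector, o_i^{(s)} = N_i v^{(s)}, is taken from the SGMM paper and is an assumption here, since this header only shows o_s being read, not computed.

```latex
\begin{align*}
\mu_{j_1 m i} &= M_i \, v_{j_1 m}
  && \text{(GetSubstateMean)} \\
\mu_{j_1 m i}^{(s)} &= M_i \, v_{j_1 m} + o_i^{(s)},
  \qquad o_i^{(s)} = N_i \, v^{(s)}
  && \text{(GetSubstateSpeakerMean)} \\
\tilde{\mu}_{j_1 m i}^{(s)} &= \Sigma_i^{-1} \left( M_i \, v_{j_1 m} + o_i^{(s)} \right)
  && \text{(GetVarScaledSubstateSpeakerMean)}
\end{align*}
```

When no speaker vector is set (v_s has dimension zero), the offset term is skipped and the second line reduces to the first.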
549 
550 
555 
556 
558 struct Sgmm2GauPostElement { // This is the entry for a single time.
559  // Need gselect info here, since "posteriors" is relative to this set of
560  // selected Gaussians.
561  std::vector<int32> gselect;
562  std::vector<int32> tids; // transition-ids for each entry in "posteriors"
563  std::vector<Matrix<BaseFloat> > posteriors;
564 };
565 
566 
568 class Sgmm2GauPost: public std::vector<Sgmm2GauPostElement> {
569  public:
570  // Add the standard Kaldi Read and Write routines so
571  // we can use KaldiObjectHolder with this type.
572  explicit Sgmm2GauPost(size_t i) : std::vector<Sgmm2GauPostElement>(i) {}
574  void Write(std::ostream &os, bool binary) const;
575  void Read(std::istream &is, bool binary);
576 };
577 
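578 typedef KaldiObjectHolder<Sgmm2GauPost> Sgmm2GauPostHolder;
579 typedef RandomAccessTableReader<Sgmm2GauPostHolder> RandomAccessSgmm2GauPostReader;
580 typedef SequentialTableReader<Sgmm2GauPostHolder> SequentialSgmm2GauPostReader;
581 typedef TableWriter<Sgmm2GauPostHolder> Sgmm2GauPostWriter;
582 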
583 } // namespace kaldi
584 
585 
586 #endif // KALDI_SGMM2_AM_SGMM2_H_