Class for the accumulators required to update the speaker vectors v_s.

#include <estimate-am-sgmm2.h>

Public Member Functions
	MleSgmm2SpeakerAccs (const AmSgmm2 &model, BaseFloat rand_prune_=1.0e-05)
	Initialize the object. Error if speaker subspace not set up.

void	Clear ()
	Clear the statistics.

BaseFloat	Accumulate (const AmSgmm2 &model, const Sgmm2PerFrameDerivedVars &frame_vars, int32 pdf_index, BaseFloat weight, Sgmm2PerSpkDerivedVars *spk_vars)
	Accumulate statistics. Returns per-frame log-likelihood.

BaseFloat	AccumulateFromPosteriors (const AmSgmm2 &model, const Sgmm2PerFrameDerivedVars &frame_vars, const Matrix< BaseFloat > &posteriors, int32 pdf_index, Sgmm2PerSpkDerivedVars *spk_vars)
	Accumulate statistics, given posteriors.

void	Update (const AmSgmm2 &model, BaseFloat min_count, Vector< BaseFloat > *v_s, BaseFloat *objf_impr_out, BaseFloat *count_out)
	Update speaker vector.

Private Member Functions
void	UpdateNoU (Vector< BaseFloat > *v_s, BaseFloat *objf_impr_out, BaseFloat *count_out)

void	UpdateWithU (const AmSgmm2 &model, Vector< BaseFloat > *v_s, BaseFloat *objf_impr_out, BaseFloat *count_out)

Private Attributes
Vector< double >	y_s_
	Statistics for speaker adaptation (vectors), stored per-speaker.

Vector< double >	gamma_s_
	gamma_{i}^{(s)}. Per-speaker counts for each Gaussian. Dimension is [I]

Vector< double >	a_s_
	a_i^{(s)}. For SSGMM.

std::vector< SpMatrix< double > >	H_spk_
	This variable does not change per speaker; it relates only to the speaker subspace.

std::vector< Matrix< double > >	NtransSigmaInv_
	N_i^T \Sigma_{i}^{-1}. Needed for y^{(s)}.

BaseFloat	rand_prune_
	small constant to randomly prune tiny posteriors

Detailed Description

Class for the accumulators required to update the speaker vectors v_s.

Note: if you have multiple speakers you will want to initialize this just once and call Clear() after you're done with each speaker, rather than creating a new object for each speaker, since the initialization function does nontrivial work.
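
The reuse pattern described in the note can be sketched without Kaldi. The block below is a minimal illustration only: `ToySpeakerAccs`, `Frames` and `TotalsPerSpeaker` are hypothetical stand-ins, not part of the Kaldi API. The accumulator is constructed once, and Clear() is called after each speaker.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an accumulator whose construction is costly
// but whose Clear() is cheap. Not the Kaldi class.
struct ToySpeakerAccs {
  std::vector<double> gamma_s;  // per-Gaussian counts for the current speaker

  explicit ToySpeakerAccs(std::size_t num_gauss) : gamma_s(num_gauss, 0.0) {}

  void Accumulate(std::size_t i, double count) {
    assert(i < gamma_s.size());
    gamma_s[i] += count;
  }
  void Clear() { gamma_s.assign(gamma_s.size(), 0.0); }
  double TotalCount() const {
    double sum = 0.0;
    for (double g : gamma_s) sum += g;
    return sum;
  }
};

// One speaker's data as (Gaussian index, count) pairs.
using Frames = std::vector<std::pair<std::size_t, double>>;

// Processes all speakers with ONE accumulator and returns the total count
// seen for each speaker.
std::vector<double> TotalsPerSpeaker(const std::vector<Frames> &speakers,
                                     std::size_t num_gauss) {
  ToySpeakerAccs accs(num_gauss);  // constructed once, outside the loop
  std::vector<double> totals;
  for (const Frames &frames : speakers) {
    for (const auto &f : frames) accs.Accumulate(f.first, f.second);
    totals.push_back(accs.TotalCount());
    accs.Clear();  // reset the statistics before the next speaker
  }
  return totals;
}
```

Because Clear() zeroes the statistics, the second speaker's total is unaffected by the first speaker's frames.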

Definition at line 354 of file estimate-am-sgmm2.h.

Constructor & Destructor Documentation

◆ MleSgmm2SpeakerAccs()

MleSgmm2SpeakerAccs	(	const AmSgmm2 &	model,
		BaseFloat	rand_prune_ = 1.0e-05
	)

Initialize the object. Error if speaker subspace not set up.

Definition at line 1713 of file estimate-am-sgmm2.cc.

References MleSgmm2SpeakerAccs::a_s_, MleSgmm2SpeakerAccs::gamma_s_, AmSgmm2::GetNtransSigmaInv(), MleSgmm2SpeakerAccs::H_spk_, AmSgmm2::HasSpeakerDependentWeights(), KALDI_ASSERT, kaldi::kTrans, AmSgmm2::N_, MleSgmm2SpeakerAccs::NtransSigmaInv_, AmSgmm2::NumGauss(), Vector< Real >::Resize(), AmSgmm2::SigmaInv_, AmSgmm2::SpkSpaceDim(), and MleSgmm2SpeakerAccs::y_s_.

 MleSgmm2SpeakerAccs::MleSgmm2SpeakerAccs(const AmSgmm2 &model, BaseFloat prune)
     : rand_prune_(prune) {
   KALDI_ASSERT(model.SpkSpaceDim() != 0);
   H_spk_.resize(model.NumGauss());
   for (int32 i = 0; i < model.NumGauss(); i++) {
     // Eq. (82): H_{i}^{spk} = N_{i}^T \Sigma_{i}^{-1} N_{i}
     H_spk_[i].Resize(model.SpkSpaceDim());
     H_spk_[i].AddMat2Sp(1.0, Matrix<double>(model.N_[i]),
                         kTrans, SpMatrix<double>(model.SigmaInv_[i]), 0.0);
   }
 
   model.GetNtransSigmaInv(&NtransSigmaInv_);
 
   gamma_s_.Resize(model.NumGauss());
   y_s_.Resize(model.SpkSpaceDim());
   if (model.HasSpeakerDependentWeights())
     a_s_.Resize(model.NumGauss());
 }
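
As an illustration of Eq. (82), the following sketch computes N^T \Sigma^{-1} N with plain nested vectors instead of Kaldi's matrix classes. `Mat` and `ProjectedPrecision` are names invented for this example; Kaldi itself uses `SpMatrix::AddMat2Sp()` as shown in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // row-major dense matrix

// Returns H = N^T * SigmaInv * N, where N is D x T and SigmaInv is D x D.
// The result is T x T (T plays the role of the speaker-subspace dimension).
Mat ProjectedPrecision(const Mat &N, const Mat &SigmaInv) {
  const std::size_t D = N.size();
  assert(D > 0 && SigmaInv.size() == D);
  const std::size_t T = N[0].size();
  Mat H(T, std::vector<double>(T, 0.0));
  for (std::size_t a = 0; a < T; ++a)
    for (std::size_t b = 0; b < T; ++b)
      for (std::size_t r = 0; r < D; ++r)
        for (std::size_t c = 0; c < D; ++c)
          H[a][b] += N[r][a] * SigmaInv[r][c] * N[c][b];
  return H;
}
```

Since this product depends only on the model and not on the speaker, computing it once in the constructor and reusing it across speakers avoids repeating the work.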

Member Function Documentation

◆ Accumulate()

BaseFloat Accumulate	(	const AmSgmm2 &	model,
		const Sgmm2PerFrameDerivedVars &	frame_vars,
		int32	pdf_index,
		BaseFloat	weight,
		Sgmm2PerSpkDerivedVars *	spk_vars
	)

Accumulate statistics. Returns per-frame log-likelihood.

Definition at line 1740 of file estimate-am-sgmm2.cc.

References MleSgmm2SpeakerAccs::AccumulateFromPosteriors(), AmSgmm2::ComponentPosteriors(), and MatrixBase< Real >::Scale().

Referenced by kaldi::AccumulateForUtterance().

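 BaseFloat MleSgmm2SpeakerAccs::Accumulate(
     const AmSgmm2 &model,
     const Sgmm2PerFrameDerivedVars &frame_vars,
     int32 j2,  // pdf index
     BaseFloat weight,
     Sgmm2PerSpkDerivedVars *spk_vars) {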
   // Calculate Gaussian posteriors and collect statistics
   Matrix<BaseFloat> posteriors;
   BaseFloat log_like = model.ComponentPosteriors(frame_vars, j2, spk_vars,
                                                  &posteriors);
   posteriors.Scale(weight);
   AccumulateFromPosteriors(model, frame_vars, posteriors, j2, spk_vars);
   return log_like;
 }

◆ AccumulateFromPosteriors()

BaseFloat AccumulateFromPosteriors	(	const AmSgmm2 &	model,
		const Sgmm2PerFrameDerivedVars &	frame_vars,
		const Matrix< BaseFloat > &	posteriors,
		int32	pdf_index,
		Sgmm2PerSpkDerivedVars *	spk_vars
	)

Accumulate statistics, given posteriors.

Returns total count accumulated, which may differ from posteriors.Sum() due to randomized pruning.
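
To see why the accumulated count can differ from posteriors.Sum(), consider one common form of randomized pruning, sketched below. This is an assumption about how such pruning can work, written for illustration; `RandPruneSketch` is a hypothetical name and is not a copy of kaldi::RandPrune(). Values below the threshold are replaced by either the threshold or zero, with probabilities chosen so that the expected value is unchanged.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Randomized pruning sketch. Zero values and values whose magnitude is at
// least `thresh` pass through unchanged. A smaller nonzero value becomes
// +/- thresh with probability |post| / thresh and 0 otherwise, so its
// expectation is still `post`.
double RandPruneSketch(double post, double thresh, std::mt19937 &rng) {
  assert(thresh >= 0.0);
  if (post == 0.0 || std::fabs(post) >= thresh) return post;
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  const double sign = (post >= 0.0) ? 1.0 : -1.0;
  return sign * (unif(rng) <= std::fabs(post) / thresh ? thresh : 0.0);
}
```

Any single call may round a tiny posterior up or down, so the sum over one frame is only equal to the unpruned sum on average.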

Definition at line 1755 of file estimate-am-sgmm2.cc.

References MleSgmm2SpeakerAccs::a_s_, VectorBase< Real >::AddMatVec(), VectorBase< Real >::AddVec(), VectorBase< Real >::Dim(), AmSgmm2::FeatureDim(), MleSgmm2SpeakerAccs::gamma_s_, AmSgmm2::GetDjms(), AmSgmm2::GetSubstateMean(), Sgmm2PerFrameDerivedVars::gselect, KALDI_ASSERT, kaldi::kNoTrans, MleSgmm2SpeakerAccs::NtransSigmaInv_, AmSgmm2::NumSubstatesForPdf(), AmSgmm2::Pdf2Group(), MleSgmm2SpeakerAccs::rand_prune_, kaldi::RandPrune(), AmSgmm2::SpkSpaceDim(), AmSgmm2::w_jmi_, Sgmm2PerFrameDerivedVars::xt, and MleSgmm2SpeakerAccs::y_s_.

Referenced by MleSgmm2SpeakerAccs::Accumulate(), and kaldi::AccumulateForUtterance().

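 BaseFloat MleSgmm2SpeakerAccs::AccumulateFromPosteriors(
     const AmSgmm2 &model,
     const Sgmm2PerFrameDerivedVars &frame_vars,
     const Matrix<BaseFloat> &posteriors,
     int32 j2,  // pdf index
     Sgmm2PerSpkDerivedVars *spk_vars) {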
   double tot_count = 0.0;
   int32 feature_dim = model.FeatureDim(),
       spk_space_dim = model.SpkSpaceDim();
   KALDI_ASSERT(spk_space_dim != 0);
   const vector<int32> &gselect = frame_vars.gselect;
 
   // Intermediate variables
   Vector<double> xt_jmi(feature_dim), mu_jmi(feature_dim),
       zt_jmi(spk_space_dim);
   int32 num_substates = model.NumSubstatesForPdf(j2),
       j1 = model.Pdf2Group(j2);
   bool have_spk_dep_weights = (a_s_.Dim() != 0);
 
   for (int32 m = 0; m < num_substates; m++) {
     BaseFloat gammat_jm = 0.0;
     for (int32 ki = 0; ki < static_cast<int32>(gselect.size()); ki++) {
       int32 i = gselect[ki];
       // Eq. (39): gamma_{jmi}(t) = p (j, m, i|t)
       BaseFloat gammat_jmi = RandPrune(posteriors(ki, m), rand_prune_);
       if (gammat_jmi != 0.0) {
         gammat_jm += gammat_jmi;
         tot_count += gammat_jmi;
         model.GetSubstateMean(j1, m, i, &mu_jmi);
         xt_jmi.CopyFromVec(frame_vars.xt);
         xt_jmi.AddVec(-1.0, mu_jmi);
         // Eq. (48): z_{jmi}(t) = N_{i}^{T} \Sigma_{i}^{-1} x_{jmi}(t)
         zt_jmi.AddMatVec(1.0, NtransSigmaInv_[i], kNoTrans, xt_jmi, 0.0);
         // Eq. (49): \gamma_{i}^{(s)} = \sum_{t\in\Tau(s), j, m} gamma_{jmi}
         gamma_s_(i) += gammat_jmi;
         // Eq. (50): y^{(s)} = \sum_{t, j, m, i} gamma_{jmi}(t) z_{jmi}(t)
         y_s_.AddVec(gammat_jmi, zt_jmi);
       }
     }
     if (have_spk_dep_weights) {
       KALDI_ASSERT(!model.w_jmi_.empty());
       BaseFloat d_jms = model.GetDjms(j1, m, spk_vars);
       if (d_jms == -1.0) d_jms = 1.0; // Explanation: d_jms is set to -1 when we didn't have
       // speaker vectors in training.  We treat this the same as the speaker vector being
       // zero, and d_jms becomes 1 in this case.
       a_s_.AddVec(gammat_jm/d_jms, model.w_jmi_[j1].Row(m));
     }
   }
   return tot_count;
 }
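
The core of the inner loop (Eqs. 48 to 50) can be sketched for a single Gaussian with plain vectors. The names `ToySpeakerStats` and `AccumulateOne` are hypothetical, and the matrix argument stands in for one entry of NtransSigmaInv_. Speaker-dependent weights and pruning are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major dense matrix

// Hypothetical container for the two statistics updated in the loop.
struct ToySpeakerStats {
  Vec gamma_s;  // per-Gaussian counts gamma_i^{(s)}, size I
  Vec y_s;      // linear statistics y^{(s)}, size T
};

// Adds the contribution of one (frame, substate, Gaussian i) triple with
// posterior `gamma`. `nt_sigma_inv` is the T x D matrix N_i^T Sigma_i^{-1}.
void AccumulateOne(const Mat &nt_sigma_inv, const Vec &x, const Vec &mu,
                   std::size_t i, double gamma, ToySpeakerStats *stats) {
  assert(i < stats->gamma_s.size());
  assert(nt_sigma_inv.size() == stats->y_s.size());
  assert(x.size() == mu.size());
  stats->gamma_s[i] += gamma;  // Eq. (49)
  for (std::size_t t = 0; t < nt_sigma_inv.size(); ++t) {
    assert(nt_sigma_inv[t].size() == x.size());
    double z_t = 0.0;  // t-th element of z = N^T Sigma^{-1} (x - mu), Eq. (48)
    for (std::size_t d = 0; d < x.size(); ++d)
      z_t += nt_sigma_inv[t][d] * (x[d] - mu[d]);
    stats->y_s[t] += gamma * z_t;  // Eq. (50)
  }
}
```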

◆ Clear()

void Clear ( )

Clear the statistics.

Definition at line 1733 of file estimate-am-sgmm2.cc.

References MleSgmm2SpeakerAccs::a_s_, VectorBase< Real >::Dim(), MleSgmm2SpeakerAccs::gamma_s_, VectorBase< Real >::SetZero(), and MleSgmm2SpeakerAccs::y_s_.

Referenced by main().

 void MleSgmm2SpeakerAccs::Clear() {
   y_s_.SetZero();
   gamma_s_.SetZero();
   if (a_s_.Dim() != 0) a_s_.SetZero();
 }

◆ Update()

void Update	(	const AmSgmm2 &	model,
		BaseFloat	min_count,
		Vector< BaseFloat > *	v_s,
		BaseFloat *	objf_impr_out,
		BaseFloat *	count_out
	)

Update speaker vector.

If v_s was empty, will assume it started as zero and will resize it to the speaker-subspace size.

Definition at line 1805 of file estimate-am-sgmm2.cc.

References MleSgmm2SpeakerAccs::a_s_, VectorBase< Real >::Dim(), MleSgmm2SpeakerAccs::gamma_s_, KALDI_WARN, VectorBase< Real >::Sum(), MleSgmm2SpeakerAccs::UpdateNoU(), and MleSgmm2SpeakerAccs::UpdateWithU().

Referenced by main().

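 void MleSgmm2SpeakerAccs::Update(const AmSgmm2 &model,
                                  BaseFloat min_count,
                                  Vector<BaseFloat> *v_s,
                                  BaseFloat *objf_impr_out,
                                  BaseFloat *count_out) {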
   double tot_gamma = gamma_s_.Sum();
   if (tot_gamma < min_count) {
     KALDI_WARN << "Updating speaker vectors, count is " << tot_gamma
                << " < " << min_count << ", not updating.";
     if (objf_impr_out) *objf_impr_out = 0.0;
     if (count_out) *count_out = 0.0;
     return;
   }
   if (a_s_.Dim() == 0) // No speaker-dependent weights...
     UpdateNoU(v_s, objf_impr_out, count_out);
   else
     UpdateWithU(model, v_s, objf_impr_out, count_out);
 }

◆ UpdateNoU()

void UpdateNoU	(	Vector< BaseFloat > *	v_s,
		BaseFloat *	objf_impr_out,
		BaseFloat *	count_out
	)

private

Definition at line 1826 of file estimate-am-sgmm2.cc.

References SpMatrix< Real >::AddSp(), VectorBase< Real >::CopyFromVec(), VectorBase< Real >::Dim(), MleSgmm2SpeakerAccs::gamma_s_, MleSgmm2SpeakerAccs::H_spk_, KALDI_ASSERT, KALDI_LOG, Vector< Real >::Resize(), kaldi::SolveQuadraticProblem(), VectorBase< Real >::Sum(), and MleSgmm2SpeakerAccs::y_s_.

Referenced by MleSgmm2SpeakerAccs::Update().

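 void MleSgmm2SpeakerAccs::UpdateNoU(Vector<BaseFloat> *v_s,
                                     BaseFloat *objf_impr_out,
                                     BaseFloat *count_out) {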
   double tot_gamma = gamma_s_.Sum();
   KALDI_ASSERT(y_s_.Dim() != 0);
   int32 T = y_s_.Dim();  // speaker-subspace dim.
   int32 num_gauss = gamma_s_.Dim();
   if (v_s->Dim() != T) v_s->Resize(T);  // will set it to zero.
 
   // Eq. (84): H^{(s)} = \sum_{i} \gamma_{i}^{(s)} H_{i}^{spk}
   SpMatrix<double> H_s(T);
 
   for (int32 i = 0; i < num_gauss; i++)
     H_s.AddSp(gamma_s_(i), H_spk_[i]);
 
   // Don't make these options to SolveQuadraticProblem configurable...
   // they really don't make a difference at all unless the matrix in
   // question is singular, which wouldn't happen in this case.
   Vector<double> v_s_dbl(*v_s);
   double tot_objf_impr =
       SolveQuadraticProblem(H_s, y_s_, SolverOptions("v_s"), &v_s_dbl);
 
   v_s->CopyFromVec(v_s_dbl);
 
   KALDI_LOG << "*Objf impr for speaker vector is " << (tot_objf_impr / tot_gamma)
             << " over " << tot_gamma << " frames.";
 
   if (objf_impr_out) *objf_impr_out = tot_objf_impr;
   if (count_out) *count_out = tot_gamma;
 }
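
The update without speaker-dependent weights amounts to maximizing the quadratic auxiliary function Q(v) = v \cdot y^{(s)} - 0.5 v^T H^{(s)} v, whose maximum satisfies H^{(s)} v = y^{(s)}. The sketch below illustrates this with plain vectors. It assumes H^{(s)} is symmetric positive definite and uses simple Gaussian elimination, whereas the listing above calls kaldi::SolveQuadraticProblem(). `WeightedSum`, `SolveSpd` and `Auxf` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major dense matrix

// Eq. (84): H^{(s)} = sum_i gamma_i * H_i^{spk}.
Mat WeightedSum(const Vec &gamma, const std::vector<Mat> &H_spk) {
  assert(!H_spk.empty() && gamma.size() == H_spk.size());
  const std::size_t T = H_spk[0].size();
  Mat H(T, Vec(T, 0.0));
  for (std::size_t i = 0; i < H_spk.size(); ++i)
    for (std::size_t r = 0; r < T; ++r)
      for (std::size_t c = 0; c < T; ++c)
        H[r][c] += gamma[i] * H_spk[i][r][c];
  return H;
}

// Solves H v = y by Gaussian elimination without pivoting.
// Assumes H is symmetric positive definite, so every pivot is positive.
Vec SolveSpd(Mat H, Vec y) {
  const std::size_t T = y.size();
  assert(H.size() == T);
  for (std::size_t k = 0; k < T; ++k) {
    assert(H[k][k] > 0.0);
    for (std::size_t r = k + 1; r < T; ++r) {
      const double f = H[r][k] / H[k][k];
      for (std::size_t c = k; c < T; ++c) H[r][c] -= f * H[k][c];
      y[r] -= f * y[k];
    }
  }
  Vec v(T, 0.0);
  for (std::size_t k = T; k-- > 0;) {  // back substitution
    double s = y[k];
    for (std::size_t c = k + 1; c < T; ++c) s -= H[k][c] * v[c];
    v[k] = s / H[k][k];
  }
  return v;
}

// Q(v) = v . y - 0.5 * v^T H v.
double Auxf(const Mat &H, const Vec &y, const Vec &v) {
  double lin = 0.0, quad = 0.0;
  for (std::size_t r = 0; r < v.size(); ++r) {
    lin += v[r] * y[r];
    for (std::size_t c = 0; c < v.size(); ++c) quad += v[r] * H[r][c] * v[c];
  }
  return lin - 0.5 * quad;
}
```

Starting from v = 0, where Q is zero, the objective improvement is simply Q evaluated at the solution.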

◆ UpdateWithU()

void UpdateWithU	(	const AmSgmm2 &	model,
		Vector< BaseFloat > *	v_s,
		BaseFloat *	objf_impr_out,
		BaseFloat *	count_out
	)

private

Definition at line 1858 of file estimate-am-sgmm2.cc.

Referenced by MleSgmm2SpeakerAccs::Update().

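 void MleSgmm2SpeakerAccs::UpdateWithU(const AmSgmm2 &model,
                                       Vector<BaseFloat> *v_s_ptr,
                                       BaseFloat *objf_impr_out,
                                       BaseFloat *count_out) {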
   double tot_gamma = gamma_s_.Sum();
   KALDI_ASSERT(y_s_.Dim() != 0);
   int32 T = y_s_.Dim();  // speaker-subspace dim.
   int32 num_gauss = gamma_s_.Dim();
   if (v_s_ptr->Dim() != T) v_s_ptr->Resize(T);  // will set it to zero.
 
   // Eq. (84): H^{(s)} = \sum_{i} \gamma_{i}^{(s)} H_{i}^{spk}
   SpMatrix<double> H_s(T);
 
   for (int32 i = 0; i < num_gauss; i++)
     H_s.AddSp(gamma_s_(i), H_spk_[i]);
 
   Vector<double> v_s(*v_s_ptr);
   int32 num_iters = 5, // don't set this to 1, as we discard last iter.
       num_backtracks = 0,
       max_backtracks = 10;
   Vector<double> auxf(num_iters);
   Matrix<double> v_s_per_iter(num_iters, T);
   // The update for v^{(s)} is the one described in the technical report
   // section 5.1 (eq. 33 and below).
 
   for (int32 iter = 0; iter < num_iters; iter++) { // converges very fast,
     // and each iteration is fast, so don't need to make this configurable.
     v_s_per_iter.Row(iter).CopyFromVec(v_s);
 
     SpMatrix<double> F(H_s); // the 2nd-order quadratic term on this iteration...
     // F^{(p)} in the techreport.
     Vector<double> g(y_s_); // g^{(p)} in the techreport.
     g.AddSpVec(-1.0, H_s, v_s, 1.0);
     Vector<double> log_b_is(num_gauss); // b_i^{(s)}, indexed by i.
     log_b_is.AddMatVec(1.0, Matrix<double>(model.u_), kNoTrans, v_s, 0.0);
     Vector<double> tilde_w_is(log_b_is);
     Vector<double> log_a_s_(a_s_);
     log_a_s_.ApplyLog();
     tilde_w_is.AddVec(1.0, log_a_s_);
     tilde_w_is.Add(-1.0 * tilde_w_is.LogSumExp()); // normalize.
     // currently tilde_w_is is in log form.
     auxf(iter) = VecVec(v_s, y_s_) - 0.5 * VecSpVec(v_s, H_s, v_s)
         + VecVec(gamma_s_, tilde_w_is); // "new" term (weights)
 
     if (iter > 0 && auxf(iter) < auxf(iter-1) &&
         !ApproxEqual(auxf(iter), auxf(iter-1))) { // auxf did not improve.
       // backtrack halfway, and do this iteration again.
       KALDI_WARN << "Backtracking in speaker vector update, on iter "
                  << iter << ", auxfs are " << auxf(iter-1) << " -> "
                  << auxf(iter);
       v_s.Scale(0.5);
       v_s.AddVec(0.5, v_s_per_iter.Row(iter-1));
       if (++num_backtracks >= max_backtracks) {
         KALDI_WARN << "Backtracked " << max_backtracks
                    << " times in speaker-vector update.";
         // backtrack all the way, and terminate:
         v_s_per_iter.Row(num_iters-1).CopyFromVec(v_s_per_iter.Row(iter-1));
         // the following statement ensures we will get
         // the appropriate auxiliary-function.
         auxf(num_iters-1) = auxf(iter-1);
         break;
       }
       iter--;
     }
     tilde_w_is.ApplyExp();
     for (int32 i = 0; i < num_gauss; i++) {
       g.AddVec(gamma_s_(i) - tot_gamma * tilde_w_is(i), model.u_.Row(i));
       F.AddVec2(tot_gamma * tilde_w_is(i), model.u_.Row(i));
     }
     Vector<double> delta(v_s.Dim());
     SolveQuadraticProblem(F, g, SolverOptions("v_s"), &delta);
     v_s.AddVec(1.0, delta);
   }
   // so that we only accept things where the auxf has been checked, we
   // actually take the penultimate speaker-vector. --> don't set
   // num-iters = 1.
   v_s_ptr->CopyFromVec(v_s_per_iter.Row(num_iters-1));
 
   double auxf_change = auxf(num_iters-1) - auxf(0);
   KALDI_LOG << "*Objf impr for speaker vector is " << (auxf_change / tot_gamma)
             << " per frame, over " << tot_gamma << " frames.";
 
   if (objf_impr_out) *objf_impr_out = auxf_change;
   if (count_out) *count_out = tot_gamma;
 }
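
Two steps of the loop above are easy to isolate: the log-domain weight normalization (log a_i + u_i \cdot v, minus the log-sum-exp) and the halfway backtrack toward the previous iterate. The sketch below shows both with plain vectors. `LogNormalizedWeights` and `BacktrackHalfway` are hypothetical names, and the max-subtraction inside the log-sum-exp is a standard numerical precaution rather than something taken from the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns log w~_i = log a_i + log_b_i - log sum_k exp(log a_k + log_b_k).
// `log_b` holds the per-Gaussian terms u_i . v; all a_i must be positive.
std::vector<double> LogNormalizedWeights(const std::vector<double> &a,
                                         const std::vector<double> &log_b) {
  assert(!a.empty() && a.size() == log_b.size());
  std::vector<double> log_w(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(a[i] > 0.0);
    log_w[i] = std::log(a[i]) + log_b[i];
  }
  // Log-sum-exp, subtracting the maximum first to avoid overflow.
  const double max_val = *std::max_element(log_w.begin(), log_w.end());
  double sum = 0.0;
  for (double lw : log_w) sum += std::exp(lw - max_val);
  const double log_sum = max_val + std::log(sum);
  for (double &lw : log_w) lw -= log_sum;
  return log_w;
}

// Moves v halfway back toward the previous iterate: v <- 0.5 v + 0.5 v_prev.
void BacktrackHalfway(const std::vector<double> &v_prev,
                      std::vector<double> *v) {
  assert(v_prev.size() == v->size());
  for (std::size_t i = 0; i < v->size(); ++i)
    (*v)[i] = 0.5 * (*v)[i] + 0.5 * v_prev[i];
}
```

After normalization the exponentiated weights sum to one, which is what lets the loop use them as the distribution over Gaussians in the gradient and Hessian terms.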

Member Data Documentation

◆ a_s_

Vector<double> a_s_

private

a_i^{(s)}. For SSGMM.

Definition at line 406 of file estimate-am-sgmm2.h.

Referenced by MleSgmm2SpeakerAccs::AccumulateFromPosteriors(), MleSgmm2SpeakerAccs::Clear(), MleSgmm2SpeakerAccs::MleSgmm2SpeakerAccs(), MleSgmm2SpeakerAccs::Update(), and MleSgmm2SpeakerAccs::UpdateWithU().

◆ gamma_s_

Vector<double> gamma_s_

private

gamma_{i}^{(s)}. Per-speaker counts for each Gaussian. Dimension is [I]

Definition at line 404 of file estimate-am-sgmm2.h.

Referenced by MleSgmm2SpeakerAccs::AccumulateFromPosteriors(), MleSgmm2SpeakerAccs::Clear(), MleSgmm2SpeakerAccs::MleSgmm2SpeakerAccs(), MleSgmm2SpeakerAccs::Update(), MleSgmm2SpeakerAccs::UpdateNoU(), MleSgmm2SpeakerAccs::UpdateWithU(), and MleAmSgmm2Accs::~MleAmSgmm2Accs().

◆ H_spk_

std::vector< SpMatrix<double> > H_spk_

private

This variable does not change per speaker; it relates only to the speaker subspace.

Eq. (82): H_{i}^{spk} = N_{i}^T \Sigma_{i}^{-1} N_{i}

Definition at line 411 of file estimate-am-sgmm2.h.

Referenced by MleSgmm2SpeakerAccs::MleSgmm2SpeakerAccs(), MleSgmm2SpeakerAccs::UpdateNoU(), and MleSgmm2SpeakerAccs::UpdateWithU().

◆ NtransSigmaInv_

std::vector< Matrix<double> > NtransSigmaInv_

private

N_i^T \Sigma_{i}^{-1}. Needed for y^{(s)}.

Definition at line 414 of file estimate-am-sgmm2.h.

Referenced by MleSgmm2SpeakerAccs::AccumulateFromPosteriors(), and MleSgmm2SpeakerAccs::MleSgmm2SpeakerAccs().

◆ rand_prune_

BaseFloat rand_prune_

private

small constant to randomly prune tiny posteriors

Definition at line 417 of file estimate-am-sgmm2.h.

Referenced by MleSgmm2SpeakerAccs::AccumulateFromPosteriors().

◆ y_s_

Vector<double> y_s_

private

Statistics for speaker adaptation (vectors), stored per-speaker.

Per-speaker stats for vectors, y^{(s)}. Dimension [T].

Definition at line 402 of file estimate-am-sgmm2.h.

Referenced by MleSgmm2SpeakerAccs::AccumulateFromPosteriors(), MleSgmm2SpeakerAccs::Clear(), MleSgmm2SpeakerAccs::MleSgmm2SpeakerAccs(), MleSgmm2SpeakerAccs::UpdateNoU(), and MleSgmm2SpeakerAccs::UpdateWithU().

The documentation for this class was generated from the following files:

sgmm2/estimate-am-sgmm2.h
sgmm2/estimate-am-sgmm2.cc
