Definition for Gaussian Mixture Model with diagonal covariances.

#include <diag-gmm.h>

Public Member Functions
	DiagGmm ()
	Empty constructor.

	DiagGmm (const DiagGmm &gmm)

	DiagGmm (const GaussClusterable &gc, BaseFloat var_floor)
	Initializes the DiagGmm as a single Gaussian from the tree stats held in a GaussClusterable.

void	CopyFromNormal (const DiagGmmNormal &diag_gmm_normal)
	Copies from DiagGmmNormal; does not resize.

	DiagGmm (int32 nMix, int32 dim)

	DiagGmm (const std::vector< std::pair< BaseFloat, const DiagGmm *> > &gmms)
	Constructor that allows us to merge GMMs with weights.

void	Resize (int32 nMix, int32 dim)
	Resizes arrays to this dim. Does not initialize data.

int32	NumGauss () const
	Returns the number of mixture components in the GMM.

int32	Dim () const
	Returns the dimensionality of the Gaussian mean vectors.

void	CopyFromDiagGmm (const DiagGmm &diaggmm)
	Copies from given DiagGmm.

void	CopyFromFullGmm (const FullGmm &fullgmm)
	Copies from given FullGmm.

BaseFloat	LogLikelihood (const VectorBase< BaseFloat > &data) const
	Returns the log-likelihood of a data point (vector) given the GMM.

void	LogLikelihoods (const VectorBase< BaseFloat > &data, Vector< BaseFloat > *loglikes) const
	Outputs the per-component log-likelihoods.

void	LogLikelihoods (const MatrixBase< BaseFloat > &data, Matrix< BaseFloat > *loglikes) const
	This version of the LogLikelihoods function operates on a sequence of frames simultaneously; the row index of both "data" and "loglikes" is the frame index.

void	LogLikelihoodsPreselect (const VectorBase< BaseFloat > &data, const std::vector< int32 > &indices, Vector< BaseFloat > *loglikes) const
	Outputs the per-component log-likelihoods of a subset of mixture components.

BaseFloat	GaussianSelection (const VectorBase< BaseFloat > &data, int32 num_gselect, std::vector< int32 > *output) const
	Get Gaussian selection information for one frame.

BaseFloat	GaussianSelection (const MatrixBase< BaseFloat > &data, int32 num_gselect, std::vector< std::vector< int32 > > *output) const
	This version of the Gaussian selection function works for a sequence of frames rather than just a single frame.

BaseFloat	GaussianSelectionPreselect (const VectorBase< BaseFloat > &data, const std::vector< int32 > &preselect, int32 num_gselect, std::vector< int32 > *output) const
	Get Gaussian selection information for one frame.

BaseFloat	ComponentPosteriors (const VectorBase< BaseFloat > &data, Vector< BaseFloat > *posteriors) const
	Computes the posterior probabilities of all Gaussian components given a data point.

BaseFloat	ComponentLogLikelihood (const VectorBase< BaseFloat > &data, int32 comp_id) const
	Computes the log-likelihood of a data point given a single Gaussian component.

int32	ComputeGconsts ()
	Sets the gconsts.

void	Generate (VectorBase< BaseFloat > *output)
	Generates a random data-point from this distribution.

void	Split (int32 target_components, float perturb_factor, std::vector< int32 > *history=NULL)
	Split the components and remember the order in which the components were split.

void	Perturb (float perturb_factor)
	Perturbs the component means with a random vector multiplied by the perturb factor.

void	Merge (int32 target_components, std::vector< int32 > *history=NULL)
	Merge the components and remember the order in which the components were merged (flat list of pairs).

void	MergeKmeans (int32 target_components, ClusterKMeansOptions cfg=ClusterKMeansOptions())

void	Write (std::ostream &os, bool binary) const

void	Read (std::istream &in, bool binary)

void	Interpolate (BaseFloat rho, const DiagGmm &source, GmmFlagsType flags=kGmmAll)
	this = rho * source + (1 - rho) * this

void	Interpolate (BaseFloat rho, const FullGmm &source, GmmFlagsType flags=kGmmAll)
	this = rho * source + (1 - rho) * this

const Vector< BaseFloat > &	gconsts () const
	Const accessors.

const Vector< BaseFloat > &	weights () const

const Matrix< BaseFloat > &	means_invvars () const

const Matrix< BaseFloat > &	inv_vars () const

bool	valid_gconsts () const

void	RemoveComponent (int32 gauss, bool renorm_weights)
	Removes single component from model.

void	RemoveComponents (const std::vector< int32 > &gauss, bool renorm_weights)
	Removes multiple components from model; "gauss" must not have dups.

template<class Real >
void	SetWeights (const VectorBase< Real > &w)
	Mutators for both float and double.

template<class Real >
void	SetMeans (const MatrixBase< Real > &m)
	Use SetMeans to update only the Gaussian means (and not variances).

template<class Real >
void	SetInvVarsAndMeans (const MatrixBase< Real > &invvars, const MatrixBase< Real > &means)
	Use SetInvVarsAndMeans if updating both means and (inverse) variances.

template<class Real >
void	SetInvVars (const MatrixBase< Real > &v)
	Set the (inverse) variances and recompute means_invvars_.

template<class Real >
void	GetVars (Matrix< Real > *v) const
	Accessor for covariances.

template<class Real >
void	GetMeans (Matrix< Real > *m) const
	Accessor for means.

template<class Real >
void	SetComponentMean (int32 gauss, const VectorBase< Real > &in)
	Mutators for a single component; these support float or double. Sets the mean for a single component; internally multiplies by inv(var).

template<class Real >
void	SetComponentInvVar (int32 gauss, const VectorBase< Real > &in)
	Sets the inv-var for a single component (recommended to do this before setting the mean, if doing both, for numerical reasons).

void	SetComponentWeight (int32 gauss, BaseFloat weight)
	Set weight for single component.

template<class Real >
void	GetComponentMean (int32 gauss, VectorBase< Real > *out) const
	Accessor for single component mean.

template<class Real >
void	GetComponentVariance (int32 gauss, VectorBase< Real > *out) const
	Accessor for single component variance.

Private Member Functions
BaseFloat	merged_components_logdet (BaseFloat w1, BaseFloat w2, const VectorBase< BaseFloat > &f1, const VectorBase< BaseFloat > &f2, const VectorBase< BaseFloat > &s1, const VectorBase< BaseFloat > &s2) const

const DiagGmm &	operator= (const DiagGmm &other)

Private Attributes
Vector< BaseFloat >	gconsts_
	Equals log(weight) - 0.5 * (log det(var) + mean * mean * inv(var)); ComputeGconsts() also adds the constant offset -0.5 * M_LOG_2PI * dim.

bool	valid_gconsts_
	Recompute gconsts_ if false.

Vector< BaseFloat >	weights_
	Weights (not log).

Matrix< BaseFloat >	inv_vars_
	Inverted (diagonal) variances.

Matrix< BaseFloat >	means_invvars_
	Means times inverted variance.

Friends
class	DiagGmmNormal
	This makes it a little easier to modify the internals.

Detailed Description

Definition for Gaussian Mixture Model with diagonal covariances.

Definition at line 42 of file diag-gmm.h.

Constructor & Destructor Documentation

◆ DiagGmm() [1/5]

DiagGmm ( )

inline

Empty constructor.

Definition at line 48 of file diag-gmm.h.

Referenced by DiagGmm::DiagGmm(), and DiagGmm::Split().

   : valid_gconsts_(false) { }

kaldi::DiagGmm::valid_gconsts_

bool valid_gconsts_

Recompute gconsts_ if false.

Definition: diag-gmm.h:233

◆ DiagGmm() [2/5]

DiagGmm ( const DiagGmm & gmm )

inline explicit

Definition at line 50 of file diag-gmm.h.

References DiagGmm::CopyFromDiagGmm(), DiagGmm::CopyFromNormal(), and DiagGmm::DiagGmm().

                                       : valid_gconsts_(false) {
     CopyFromDiagGmm(gmm);
   }

◆ DiagGmm() [3/5]

DiagGmm	(	const GaussClusterable &	gc,
		BaseFloat	var_floor
	)

Initializes the DiagGmm as a single Gaussian from the tree stats held in a GaussClusterable.

Definition at line 944 of file diag-gmm.cc.

References DiagGmm::ComputeGconsts(), count, GaussClusterable::count(), KALDI_ASSERT, DiagGmm::Resize(), MatrixBase< Real >::Row(), DiagGmm::SetInvVarsAndMeans(), DiagGmm::SetWeights(), DiagGmm::weights(), GaussClusterable::x2_stats(), and GaussClusterable::x_stats().

                                      : valid_gconsts_(false) {
   Vector<BaseFloat> x (gc.x_stats());
   Vector<BaseFloat> x2 (gc.x2_stats());
   BaseFloat count =  gc.count();
   KALDI_ASSERT(count > 0.0);
   this->Resize(1, x.Dim());
   x.Scale(1.0/count);
   x2.Scale(1.0/count);
   x2.AddVec2(-1.0, x);  // subtract mean^2.
   x2.ApplyFloor(var_floor);
   x2.InvertElements();  // get inv-var.
   KALDI_ASSERT(x2.Min() > 0);
   Matrix<BaseFloat> mean(1, x.Dim());
   mean.Row(0).CopyFromVec(x);
   Matrix<BaseFloat> inv_var(1, x.Dim());
   inv_var.Row(0).CopyFromVec(x2);
   this->SetInvVarsAndMeans(inv_var, mean);
   Vector<BaseFloat> weights(1);
   weights(0) = 1.0;
   this->SetWeights(weights);
   this->ComputeGconsts();
 }
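
The arithmetic in this constructor (normalize the statistics by the count, subtract the squared mean, floor, then invert) can be sketched without any Kaldi types. This is only an illustration under stated assumptions: `SingleGaussian` and `FromStats` are hypothetical names, and plain `std::vector<double>` stands in for Kaldi's `Vector<BaseFloat>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one Gaussian's parameters; not a Kaldi type.
struct SingleGaussian {
  std::vector<double> mean;
  std::vector<double> inv_var;
};

// Turns accumulated statistics (count, sum of x, sum of x^2) into a mean and
// a floored inverse variance, in the same order as the constructor above.
SingleGaussian FromStats(double count,
                         const std::vector<double> &x_sum,
                         const std::vector<double> &x2_sum,
                         double var_floor) {
  assert(count > 0.0 && x_sum.size() == x2_sum.size());
  SingleGaussian g;
  g.mean.resize(x_sum.size());
  g.inv_var.resize(x_sum.size());
  for (std::size_t d = 0; d < x_sum.size(); ++d) {
    double mean = x_sum[d] / count;
    double var = x2_sum[d] / count - mean * mean;  // E[x^2] - E[x]^2
    var = std::max(var, var_floor);                // floor before inverting
    assert(var > 0.0);
    g.mean[d] = mean;
    g.inv_var[d] = 1.0 / var;
  }
  return g;
}
```

Flooring before inversion is what keeps a dimension with zero observed variance from producing an infinite inverse variance.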

◆ DiagGmm() [4/5]

DiagGmm	(	int32	nMix,
		int32	dim
	)

inline

Definition at line 61 of file diag-gmm.h.

References DiagGmm::DiagGmm(), and DiagGmm::Resize().

   : valid_gconsts_(false) { Resize(nMix, dim); }

kaldi::DiagGmm::Resize

void Resize(int32 nMix, int32 dim)

Resizes arrays to this dim. Does not initialize data.

Definition: diag-gmm.cc:66

◆ DiagGmm() [5/5]

DiagGmm ( const std::vector< std::pair< BaseFloat, const DiagGmm *> > & gmms )

explicit

Constructor that allows us to merge GMMs with weights.

Weights must sum to one, or this GMM will not be properly normalized (we don't check this). Weights must be positive (we check this).

Definition at line 39 of file diag-gmm.cc.

References DiagGmm::ComputeGconsts(), rnnlm::i, DiagGmm::inv_vars(), DiagGmm::inv_vars_, KALDI_ASSERT, DiagGmm::means_invvars(), DiagGmm::means_invvars_, DiagGmm::NumGauss(), DiagGmm::Resize(), MatrixBase< Real >::Row(), DiagGmm::weights(), and DiagGmm::weights_.

   : valid_gconsts_(false) {
   if (gmms.empty()) {
     return;  // GMM will be empty.
   } else {
     int32 num_gauss = 0, dim = gmms[0].second->Dim();
     for (size_t i = 0; i < gmms.size(); i++)
       num_gauss += gmms[i].second->NumGauss();
     Resize(num_gauss, dim);
     int32 cur_gauss = 0;
     for (size_t i = 0; i < gmms.size(); i++) {
       BaseFloat weight = gmms[i].first;
       KALDI_ASSERT(weight > 0.0);
       const DiagGmm &gmm = *(gmms[i].second);
       for (int32 g = 0; g < gmm.NumGauss(); g++, cur_gauss++) {
         means_invvars_.Row(cur_gauss).CopyFromVec(gmm.means_invvars().Row(g));
         inv_vars_.Row(cur_gauss).CopyFromVec(gmm.inv_vars().Row(g));
         weights_(cur_gauss) = weight * gmm.weights()(g);
       }
     }
     KALDI_ASSERT(cur_gauss == NumGauss());
     ComputeGconsts();
   }
 }
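
The note that the outer weights must sum to one follows from how the component weights are scaled. The sketch below isolates just that step; `TinyMixture` and `MergeWeights` are hypothetical names introduced for illustration and are not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal mixture: only the weights matter for this sketch.
struct TinyMixture {
  std::vector<double> weights;
};

// Concatenates the component weights of several mixtures, scaling each
// mixture's weights by its outer weight, as the loop above does.
std::vector<double> MergeWeights(
    const std::vector<std::pair<double, const TinyMixture *> > &parts) {
  std::vector<double> merged;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    assert(parts[i].first > 0.0);  // outer weights must be positive
    const TinyMixture &m = *parts[i].second;
    for (std::size_t g = 0; g < m.weights.size(); ++g)
      merged.push_back(parts[i].first * m.weights[g]);
  }
  return merged;
}
```

If each input mixture is normalized, the merged weights sum to the sum of the outer weights, so the result is normalized only when the outer weights sum to one.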

Member Function Documentation

◆ ComponentLogLikelihood()

BaseFloat ComponentLogLikelihood	(	const VectorBase< BaseFloat > &	data,
		int32	comp_id
	)		const

Computes the log-likelihood of a data point given a single Gaussian component.

NOTE: Currently we make no guarantees about what happens if one of the variances is zero.

Definition at line 497 of file diag-gmm.cc.

References VectorBase< Real >::ApplyPow(), VectorBase< Real >::Dim(), DiagGmm::Dim(), DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ERR, DiagGmm::means_invvars_, MatrixBase< Real >::Row(), DiagGmm::valid_gconsts_, and kaldi::VecVec().

Referenced by DiagGmm::Dim().

                                                                {
   if (!valid_gconsts_)
     KALDI_ERR << "Must call ComputeGconsts() before computing likelihood";
   if (static_cast<int32>(data.Dim()) != Dim()) {
     KALDI_ERR << "DiagGmm::ComponentLogLikelihood, dimension "
         << "mismatch " << (data.Dim()) << " vs. "<< (Dim());
   }
   BaseFloat loglike;
   Vector<BaseFloat> data_sq(data);
   data_sq.ApplyPow(2.0);
 
   // loglike =  means * inv(vars) * data.
   loglike = VecVec(means_invvars_.Row(comp_id), data);
   // loglike += -0.5 * inv(vars) * data_sq.
   loglike -= 0.5 * VecVec(inv_vars_.Row(comp_id), data_sq);
   return loglike + gconsts_(comp_id);
 }
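
To see why the expression above is a log-likelihood, it helps to compare it with the textbook diagonal-Gaussian density. The following standalone sketch computes both forms; the function names are hypothetical, `double` replaces `BaseFloat`, and the weight is folded into the constant term as the gconst definition on this page describes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook log-density of a diagonal-covariance Gaussian.
double DirectLogPdf(const std::vector<double> &x,
                    const std::vector<double> &mean,
                    const std::vector<double> &var) {
  assert(x.size() == mean.size() && x.size() == var.size());
  const double kLog2Pi = std::log(2.0 * std::acos(-1.0));
  double ans = 0.0;
  for (std::size_t d = 0; d < x.size(); ++d) {
    double diff = x[d] - mean[d];
    ans -= 0.5 * (kLog2Pi + std::log(var[d]) + diff * diff / var[d]);
  }
  return ans;
}

// log(weight * N(x; mean, var)) in expanded form: the data-independent terms
// go into gconst, leaving one term linear in x and one quadratic in x.
double ExpandedLogLike(const std::vector<double> &x, double weight,
                       const std::vector<double> &mean,
                       const std::vector<double> &var) {
  assert(x.size() == mean.size() && x.size() == var.size());
  const double kLog2Pi = std::log(2.0 * std::acos(-1.0));
  double gconst = std::log(weight);
  double linear = 0.0, quadratic = 0.0;
  for (std::size_t d = 0; d < x.size(); ++d) {
    double inv_var = 1.0 / var[d];
    double mean_invvar = mean[d] * inv_var;
    gconst += -0.5 * kLog2Pi + 0.5 * std::log(inv_var)
              - 0.5 * mean_invvar * mean_invvar / inv_var;
    linear += mean_invvar * x[d];
    quadratic += inv_var * x[d] * x[d];
  }
  return gconst + linear - 0.5 * quadratic;
}
```

The two functions agree up to rounding once `log(weight)` is added to the direct form, which is why the class can cache `gconsts_`, `means_invvars_` and `inv_vars_` and evaluate each frame with two dot products.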

◆ ComponentPosteriors()

BaseFloat ComponentPosteriors	(	const VectorBase< BaseFloat > &	data,
		Vector< BaseFloat > *	posteriors
	)		const

Computes the posterior probabilities of all Gaussian components given a data point.

Returns the log-likelihood of the data given the GMM.

Definition at line 601 of file diag-gmm.cc.

References VectorBase< Real >::ApplySoftMax(), VectorBase< Real >::CopyFromVec(), VectorBase< Real >::Dim(), KALDI_ERR, KALDI_ISINF, KALDI_ISNAN, DiagGmm::LogLikelihoods(), Vector< Real >::Resize(), and DiagGmm::valid_gconsts_.

Referenced by FmllrDiagGmmAccs::AccumulateForGmm(), FmllrRawAccs::AccumulateForGmm(), RegtreeMllrDiagGmmAccs::AccumulateForGmm(), RegtreeFmllrDiagGmmAccs::AccumulateForGmm(), AccumFullGmm::AccumulateFromDiag(), AccumDiagGmm::AccumulateFromDiag(), MlltAccs::AccumulateFromGmm(), kaldi::ComputeAmGmmFeatureDeriv(), DiagGmm::Dim(), kaldi::GetFeatDeriv(), SingleUtteranceGmmDecoder::GetGaussianPosteriors(), main(), TestComponentAcc(), kaldi::UnitTestDiagGmm(), kaldi::UnitTestFmllrDiagGmm(), kaldi::UnitTestFmllrDiagGmmDiagonal(), and kaldi::UnitTestFmllrDiagGmmOffset().

                                                                            {
   if (!valid_gconsts_)
     KALDI_ERR << "Must call ComputeGconsts() before computing likelihood";
   if (posterior == NULL) KALDI_ERR << "NULL pointer passed as return argument.";
   Vector<BaseFloat> loglikes;
   LogLikelihoods(data, &loglikes);
   BaseFloat log_sum = loglikes.ApplySoftMax();
   if (KALDI_ISNAN(log_sum) || KALDI_ISINF(log_sum))
     KALDI_ERR << "Invalid answer (overflow or invalid variances/features?)";
   if (posterior->Dim() != loglikes.Dim())
     posterior->Resize(loglikes.Dim());
   posterior->CopyFromVec(loglikes);
   return log_sum;
 }
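
The posterior computation amounts to a softmax over the per-component log-likelihoods, with the log of the normalizer returned as the total log-likelihood. The sketch below shows one numerically stable way to do that; it assumes `ApplySoftMax()` behaves like a standard softmax that returns the log-sum-exp, which is what the surrounding code implies, and `SoftMaxInPlace` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Replaces log-likelihoods with normalized posteriors in place and returns
// log(sum_i exp(loglikes[i])). Subtracting the maximum first avoids overflow.
double SoftMaxInPlace(std::vector<double> *loglikes) {
  assert(loglikes != nullptr && !loglikes->empty());
  double max = *std::max_element(loglikes->begin(), loglikes->end());
  double sum = 0.0;
  for (std::size_t i = 0; i < loglikes->size(); ++i) {
    (*loglikes)[i] = std::exp((*loglikes)[i] - max);
    sum += (*loglikes)[i];
  }
  for (std::size_t i = 0; i < loglikes->size(); ++i)
    (*loglikes)[i] /= sum;
  return max + std::log(sum);
}
```

With inputs `log(1)` and `log(3)` the posteriors come out as 0.25 and 0.75 and the return value is `log(4)`, up to rounding.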

◆ ComputeGconsts()

int32 ComputeGconsts ( )

Sets the gconsts.

Returns the number of components whose gconst is "invalid" (infinite), e.g. because of zero weights or variances.

Definition at line 114 of file diag-gmm.cc.

References rnnlm::d, DiagGmm::Dim(), DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ASSERT, KALDI_ERR, KALDI_ISINF, KALDI_ISNAN, kaldi::Log(), M_LOG_2PI, DiagGmm::means_invvars_, DiagGmm::NumGauss(), DiagGmm::valid_gconsts_, and DiagGmm::weights_.

Referenced by Sgmm2Project::ApplyProjection(), DiagGmm::CopyFromFullGmm(), DiagGmm::DiagGmm(), DiagGmm::Dim(), init_rand_diag_gmm(), kaldi::InitGmmFromRandomFrames(), kaldi::unittest::InitRandDiagGmm(), kaldi::InitRandomGmm(), DiagGmm::Interpolate(), main(), kaldi::MapDiagGmmUpdate(), DiagGmm::Merge(), DiagGmm::MergeKmeans(), kaldi::MleDiagGmmUpdate(), DiagGmm::Perturb(), rand_diag_gmm(), DiagGmm::Read(), kaldi::ResizeModel(), DiagGmm::Split(), test_flags_driven_update(), TestXformMean(), kaldi::UnitTestDiagGmm(), UnitTestEstimateDiagGmm(), kaldi::UnitTestEstimateMmieDiagGmm(), kaldi::UnitTestRegtreeFmllrDiagGmm(), kaldi::UpdateEbwDiagGmm(), and kaldi::UpdateEbwWeightsDiagGmm().

                               {
   int32 num_mix = NumGauss();
   int32 dim = Dim();
   BaseFloat offset = -0.5 * M_LOG_2PI * dim;  // constant term in gconst.
   int32 num_bad = 0;
 
   // Resize if Gaussians have been removed during Update()
   if (num_mix != static_cast<int32>(gconsts_.Dim()))
     gconsts_.Resize(num_mix);
 
   for (int32 mix = 0; mix < num_mix; mix++) {
     KALDI_ASSERT(weights_(mix) >= 0);  // Cannot have negative weights.
     BaseFloat gc = Log(weights_(mix)) + offset;  // May be -inf if weights == 0
     for (int32 d = 0; d < dim; d++) {
       gc += 0.5 * Log(inv_vars_(mix, d)) - 0.5 * means_invvars_(mix, d)
         * means_invvars_(mix, d) / inv_vars_(mix, d);
     }
     // Change sign for logdet because var is inverted. Also, note that
     // mean_invvars(mix, d)*mean_invvars(mix, d)/inv_vars(mix, d) is the
     // mean-squared times inverse variance, since mean_invvars(mix, d) contains
     // the mean times inverse variance.
     // So gc is the likelihood at zero feature value.
 
     if (KALDI_ISNAN(gc)) {  // negative infinity is OK but NaN is not acceptable
       KALDI_ERR << "At component "  << mix
                 << ", not a number in gconst computation";
     }
     if (KALDI_ISINF(gc)) {
       num_bad++;
       // If positive infinity, make it negative infinity.
       // Want to make sure the answer becomes -inf in the end, not NaN.
       if (gc > 0) gc = -gc;
     }
     gconsts_(mix) = gc;
   }
 
   valid_gconsts_ = true;
   return num_bad;
 }
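
Written out, the loop above computes the following per-component constant. The notation is introduced here for illustration and is not from the source: $w_m$ is the weight, $\mu_{m,d}$ the mean and $\sigma^{2}_{m,d}$ the variance of component $m$ in dimension $d$, $D$ is `Dim()`, and `M_LOG_2PI` is assumed to equal $\log(2\pi)$.

```latex
g_m = \log w_m - \frac{D}{2}\log(2\pi)
      + \frac{1}{2}\sum_{d=1}^{D} \log \sigma_{m,d}^{-2}
      - \frac{1}{2}\sum_{d=1}^{D}
        \frac{\left(\mu_{m,d}\,\sigma_{m,d}^{-2}\right)^{2}}{\sigma_{m,d}^{-2}}

\log\bigl(w_m\,\mathcal{N}(x;\mu_m,\Sigma_m)\bigr)
    = g_m + \sum_{d=1}^{D} \left(\mu_{m,d}\,\sigma_{m,d}^{-2}\right) x_d
      - \frac{1}{2}\sum_{d=1}^{D} \sigma_{m,d}^{-2}\, x_d^{2}
```

Setting $x = 0$ in the second line leaves only $g_m$, which matches the code comment that the gconst is the (log-)likelihood at zero feature value.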

◆ CopyFromDiagGmm()

void CopyFromDiagGmm ( const DiagGmm & diaggmm )

Copies from given DiagGmm.

Definition at line 83 of file diag-gmm.cc.

References MatrixBase< Real >::CopyFromMat(), DiagGmm::gconsts_, DiagGmm::inv_vars_, DiagGmm::means_invvars_, MatrixBase< Real >::NumCols(), DiagGmm::Resize(), DiagGmm::valid_gconsts_, and DiagGmm::weights_.

Referenced by AmDiagGmm::AddPdf(), kaldi::ClusterGaussiansToUbm(), DiagGmm::DiagGmm(), DiagGmm::Dim(), DiagGmm::Split(), test_flags_driven_update(), test_io(), TestXformMean(), kaldi::UnitTestDiagGmm(), and UnitTestRegressionTree().

                                                     {
   Resize(diaggmm.weights_.Dim(), diaggmm.means_invvars_.NumCols());
   gconsts_.CopyFromVec(diaggmm.gconsts_);
   weights_.CopyFromVec(diaggmm.weights_);
   inv_vars_.CopyFromMat(diaggmm.inv_vars_);
   means_invvars_.CopyFromMat(diaggmm.means_invvars_);
   valid_gconsts_ = diaggmm.valid_gconsts_;
 }

◆ CopyFromFullGmm()

void CopyFromFullGmm ( const FullGmm & fullgmm )

Copies from given FullGmm.

Definition at line 92 of file diag-gmm.cc.

References DiagGmm::ComputeGconsts(), VectorBase< Real >::CopyDiagFromPacked(), MatrixBase< Real >::CopyFromMat(), SpMatrix< Real >::CopyFromSp(), kaldi::diag, FullGmm::Dim(), FullGmm::gconsts(), DiagGmm::gconsts_, FullGmm::GetMeans(), FullGmm::inv_covars(), DiagGmm::inv_vars_, SpMatrix< Real >::Invert(), VectorBase< Real >::InvertElements(), DiagGmm::means_invvars_, MatrixBase< Real >::MulElements(), FullGmm::NumGauss(), DiagGmm::NumGauss(), DiagGmm::Resize(), MatrixBase< Real >::Row(), FullGmm::weights(), and DiagGmm::weights_.

Referenced by Sgmm2Project::ApplyProjection(), DiagGmm::Dim(), main(), and UnitTestFullGmm().

                                                     {
   int32 num_comp = fullgmm.NumGauss(), dim = fullgmm.Dim();
   Resize(num_comp, dim);
   gconsts_.CopyFromVec(fullgmm.gconsts());
   weights_.CopyFromVec(fullgmm.weights());
   Matrix<BaseFloat> means(num_comp, dim);
   fullgmm.GetMeans(&means);
   int32 ncomp = NumGauss();
   for (int32 mix = 0; mix < ncomp; mix++) {
     SpMatrix<double> covar(dim);
     covar.CopyFromSp(fullgmm.inv_covars()[mix]);
     covar.Invert();
     Vector<double> diag(dim);
     diag.CopyDiagFromPacked(covar);
     diag.InvertElements();
     inv_vars_.Row(mix).CopyFromVec(diag);
   }
   means_invvars_.CopyFromMat(means);
   means_invvars_.MulElements(inv_vars_);
   ComputeGconsts();
 }
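
The loop above inverts each full inverse covariance, keeps the diagonal of the resulting covariance, and inverts those diagonal entries. That is not the same as taking the diagonal of the inverse covariance directly. A 2x2 sketch makes the difference concrete; `DiagInvVarsFromPrecision` is a hypothetical helper, restricted to two dimensions so that the matrix inverse can be written in closed form.

```cpp
#include <array>
#include <cassert>

// Inverse variances of the diagonal approximation to a 2x2 Gaussian whose
// symmetric inverse covariance is [[p00, p01], [p01, p11]].
std::array<double, 2> DiagInvVarsFromPrecision(double p00, double p01,
                                               double p11) {
  double det = p00 * p11 - p01 * p01;
  assert(p00 > 0.0 && det > 0.0);  // positive definite
  // Covariance = inverse of the precision matrix; keep only its diagonal.
  double var0 = p11 / det;
  double var1 = p00 / det;
  std::array<double, 2> inv_vars = {{1.0 / var0, 1.0 / var1}};
  return inv_vars;
}
```

For the precision matrix [[4, 2], [2, 2]] this gives inverse variances (2, 1) rather than the precision diagonal (4, 2); the two coincide only when the off-diagonal entry is zero.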

◆ CopyFromNormal()

void CopyFromNormal ( const DiagGmmNormal & diag_gmm_normal )

Copies from DiagGmmNormal; does not resize.

Definition at line 918 of file diag-gmm.cc.

References DiagGmmNormal::CopyToDiagGmm().

Referenced by DiagGmm::DiagGmm(), and kaldi::InitGmmFromRandomFrames().

                                                                  {
   diag_gmm_normal.CopyToDiagGmm(this);
 }

◆ Dim()

int32 Dim ( ) const

inline

Returns the dimensionality of the Gaussian mean vectors.

Definition at line 74 of file diag-gmm.h.

References DiagGmm::ComponentLogLikelihood(), DiagGmm::ComponentPosteriors(), DiagGmm::ComputeGconsts(), DiagGmm::CopyFromDiagGmm(), DiagGmm::CopyFromFullGmm(), DiagGmm::GaussianSelection(), DiagGmm::GaussianSelectionPreselect(), DiagGmm::Generate(), DiagGmm::Interpolate(), kaldi::kGmmAll, DiagGmm::LogLikelihood(), DiagGmm::LogLikelihoods(), DiagGmm::LogLikelihoodsPreselect(), DiagGmm::means_invvars_, DiagGmm::Merge(), DiagGmm::MergeKmeans(), MatrixBase< Real >::NumCols(), DiagGmm::Perturb(), DiagGmm::Read(), DiagGmm::Split(), and DiagGmm::Write().

Referenced by AccumFullGmm::AccumulateFromDiag(), AccumDiagGmm::AccumulateFromDiag(), MlltAccs::AccumulateFromPosteriors(), AmDiagGmm::AddPdf(), OnlineIvectorExtractionInfo::Check(), DiagGmm::ComponentLogLikelihood(), Fmpe::ComputeC(), DiagGmm::ComputeGconsts(), DiagGmmNormal::CopyFromDiagGmm(), FullGmm::CopyFromDiagGmm(), DiagGmmNormal::CopyToDiagGmm(), kaldi::DiagGmmToStats(), kaldi::DoRescalingUpdate(), FmllrDiagGmmAccs::FmllrDiagGmmAccs(), DiagGmm::Generate(), DiagGmm::GetComponentMean(), DiagGmm::GetComponentVariance(), DiagGmm::GetMeans(), kaldi::GetStatsDerivative(), DiagGmm::GetVars(), init_rand_diag_gmm(), DiagGmm::Interpolate(), DiagGmm::LogLikelihoods(), DiagGmm::LogLikelihoodsPreselect(), DecodableAmDiagGmmRegtreeFmllr::LogLikelihoodZeroBased(), DecodableAmDiagGmmUnmapped::LogLikelihoodZeroBased(), DecodableAmDiagGmmRegtreeMllr::LogLikelihoodZeroBased(), main(), kaldi::MapDiagGmmUpdate(), DiagGmm::Merge(), DiagGmm::MergeKmeans(), kaldi::MleDiagGmmUpdate(), DiagGmm::Perturb(), AccumDiagGmm::Resize(), DiagGmm::SetComponentInvVar(), DiagGmm::SetComponentMean(), DiagGmm::SetInvVars(), AccumDiagGmm::SmoothWithModel(), DiagGmm::Split(), test_flags_driven_update(), TestComponentAcc(), kaldi::UnitTestDiagGmm(), kaldi::UnitTestDiagGmmGenerate(), UnitTestEstimateDiagGmm(), kaldi::UnitTestEstimateMmieDiagGmm(), kaldi::UnitTestFmllrDiagGmm(), kaldi::UnitTestFmllrDiagGmmDiagonal(), kaldi::UnitTestFmllrDiagGmmOffset(), kaldi::UnitTestFmllrRaw(), and kaldi::UpdateEbwDiagGmm().

   { return means_invvars_.NumCols(); }

kaldi::MatrixBase::NumCols

MatrixIndexT NumCols() const

Returns number of columns (or zero for empty matrix).

Definition: kaldi-matrix.h:67

kaldi::DiagGmm::means_invvars_

Matrix< BaseFloat > means_invvars_

Means times inverted variance.

Definition: diag-gmm.h:236

◆ GaussianSelection() [1/2]

BaseFloat GaussianSelection	(	const VectorBase< BaseFloat > &	data,
		int32	num_gselect,
		std::vector< int32 > *	output
	)		const

Get gaussian selection information for one frame.

Returns the log-likelihood for this frame. Output is the best "num_gselect" indices, sorted from best to worst likelihood. If "num_gselect" > NumGauss(), sets it to NumGauss().

Definition at line 765 of file diag-gmm.cc.

References VectorBase< Real >::Data(), rnnlm::j, KALDI_ASSERT, kaldi::kUndefined, kaldi::LogAdd(), DiagGmm::LogLikelihoods(), and DiagGmm::NumGauss().

Referenced by DiagGmm::Dim(), DiagGmm::GaussianSelection(), and main().

                                                                      {
   int32 num_gauss = NumGauss();
   Vector<BaseFloat> loglikes(num_gauss, kUndefined);
   output->clear();
   this->LogLikelihoods(data, &loglikes);
 
   BaseFloat thresh;
   if (num_gselect < num_gauss) {
     Vector<BaseFloat> loglikes_copy(loglikes);
     BaseFloat *ptr = loglikes_copy.Data();
     std::nth_element(ptr, ptr+num_gauss-num_gselect, ptr+num_gauss);
     thresh = ptr[num_gauss-num_gselect];
   } else {
     thresh = -std::numeric_limits<BaseFloat>::infinity();
   }
   BaseFloat tot_loglike = -std::numeric_limits<BaseFloat>::infinity();
   std::vector<std::pair<BaseFloat, int32> > pairs;
   for (int32 p = 0; p < num_gauss; p++) {
     if (loglikes(p) >= thresh) {
       pairs.push_back(std::make_pair(loglikes(p), p));
     }
   }
   std::sort(pairs.begin(), pairs.end(),
             std::greater<std::pair<BaseFloat, int32> >());
   for (int32 j = 0;
        j < num_gselect && j < static_cast<int32>(pairs.size());
        j++) {
     output->push_back(pairs[j].second);
     tot_loglike = LogAdd(tot_loglike, pairs[j].first);
   }
   KALDI_ASSERT(!output->empty());
   return tot_loglike;
 }
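
The selection logic above is a threshold-then-sort pattern: `std::nth_element` finds the score of the n-th best component in linear average time, and only the survivors are sorted. A standalone sketch of that pattern, with the hypothetical name `TopN` and `std::vector<float>` in place of Kaldi vectors:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Returns the indices of the n largest scores, best first.
std::vector<int> TopN(const std::vector<float> &scores, int n) {
  int size = static_cast<int>(scores.size());
  assert(size > 0 && n > 0);
  float thresh = -std::numeric_limits<float>::infinity();
  if (n < size) {
    std::vector<float> copy(scores);
    // After nth_element, copy[size - n] holds the n-th largest value.
    std::nth_element(copy.begin(), copy.begin() + (size - n), copy.end());
    thresh = copy[size - n];
  }
  std::vector<std::pair<float, int> > pairs;
  for (int i = 0; i < size; ++i)
    if (scores[i] >= thresh) pairs.push_back(std::make_pair(scores[i], i));
  std::sort(pairs.begin(), pairs.end(),
            std::greater<std::pair<float, int> >());
  std::vector<int> out;
  for (int j = 0; j < n && j < static_cast<int>(pairs.size()); ++j)
    out.push_back(pairs[j].second);
  return out;
}
```

The final loop is capped at `n` because ties at the threshold can let more than `n` entries through the filter.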

◆ GaussianSelection() [2/2]

BaseFloat GaussianSelection	(	const MatrixBase< BaseFloat > &	data,
		int32	num_gselect,
		std::vector< std::vector< int32 > > *	output
	)		const

This version of the Gaussian selection function works for a sequence of frames rather than just a single frame.

Returns the sum of the log-likelihoods over all frames.

Definition at line 801 of file diag-gmm.cc.

References VectorBase< Real >::Data(), DiagGmm::GaussianSelection(), rnnlm::i, rnnlm::j, KALDI_ASSERT, kaldi::kUndefined, kaldi::LogAdd(), DiagGmm::LogLikelihoods(), MatrixBase< Real >::NumCols(), DiagGmm::NumGauss(), and MatrixBase< Real >::NumRows().

                                                                                  {
   double ans = 0.0;
   int32 num_frames = data.NumRows(), num_gauss = NumGauss();
 
   int32 max_mem = 10000000; // Don't devote more than 10Mb to loglikes_mat;
                             // break up the utterance if needed.
   int32 mem_needed = num_frames * num_gauss * sizeof(BaseFloat);
   if (mem_needed > max_mem) {
     // Break into parts and recurse, we don't want to consume too
     // much memory.
     int32 num_parts = (mem_needed + max_mem - 1) / max_mem;
     int32 part_frames = (data.NumRows() + num_parts - 1) / num_parts;
     double tot_ans = 0.0;
     std::vector<std::vector<int32> > part_output;
     output->clear();
     output->resize(num_frames);
     for (int32 p = 0; p < num_parts; p++) {
       int32 start_frame = p * part_frames,
           this_num_frames = std::min(num_frames - start_frame, part_frames);
       SubMatrix<BaseFloat> data_part(data, start_frame, this_num_frames,
                                      0, data.NumCols());
       tot_ans += GaussianSelection(data_part, num_gselect, &part_output);
       for (int32 t = 0; t < this_num_frames; t++)
         (*output)[start_frame + t].swap(part_output[t]);
     }
     KALDI_ASSERT(!output->back().empty());
     return tot_ans;
   }
   
   KALDI_ASSERT(num_frames != 0);
   Matrix<BaseFloat> loglikes_mat(num_frames, num_gauss, kUndefined);
   this->LogLikelihoods(data, &loglikes_mat);
   
   output->clear();
   output->resize(num_frames);
 
   for (int32 i = 0; i < num_frames; i++) {
     SubVector<BaseFloat> loglikes(loglikes_mat, i);
 
     BaseFloat thresh;
     if (num_gselect < num_gauss) {
       Vector<BaseFloat> loglikes_copy(loglikes);
       BaseFloat *ptr = loglikes_copy.Data();
       std::nth_element(ptr, ptr+num_gauss-num_gselect, ptr+num_gauss);
       thresh = ptr[num_gauss-num_gselect];
     } else {
       thresh = -std::numeric_limits<BaseFloat>::infinity();
     }
     BaseFloat tot_loglike = -std::numeric_limits<BaseFloat>::infinity();
     std::vector<std::pair<BaseFloat, int32> > pairs;
     for (int32 p = 0; p < num_gauss; p++) {
       if (loglikes(p) >= thresh) {
         pairs.push_back(std::make_pair(loglikes(p), p));
       }
     }
     std::sort(pairs.begin(), pairs.end(),
               std::greater<std::pair<BaseFloat, int32> >());
     std::vector<int32> &this_output = (*output)[i];
     for (int32 j = 0;
          j < num_gselect && j < static_cast<int32>(pairs.size());
          j++) {
       this_output.push_back(pairs[j].second);
       tot_loglike = LogAdd(tot_loglike, pairs[j].first);
     }
     KALDI_ASSERT(!this_output.empty());
     ans += tot_loglike;
   }
   return ans;
 }
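
The memory-limiting step at the top of this function is two ceiling divisions: one to find how many parts are needed, one to find how many frames go in each part. The sketch below reproduces only that arithmetic. `FrameRanges` is a hypothetical helper; unlike the listing above, it uses a 64-bit intermediate for the byte count and advances by start frame, both of which are choices of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Splits num_frames rows into (start, length) ranges so that each range needs
// approximately max_bytes or less for a frames-by-num_gauss float matrix.
std::vector<std::pair<int, int> > FrameRanges(int num_frames, int num_gauss,
                                              int max_bytes) {
  assert(num_frames > 0 && num_gauss > 0 && max_bytes > 0);
  long long needed = static_cast<long long>(num_frames) * num_gauss *
                     static_cast<long long>(sizeof(float));
  long long num_parts = (needed + max_bytes - 1) / max_bytes;  // ceiling
  int part_frames =
      static_cast<int>((num_frames + num_parts - 1) / num_parts);  // ceiling
  std::vector<std::pair<int, int> > ranges;
  for (int start = 0; start < num_frames; start += part_frames)
    ranges.push_back(
        std::make_pair(start, std::min(part_frames, num_frames - start)));
  return ranges;
}
```

Because the frames-per-part value is rounded up, a part can slightly exceed the byte budget; the limit is a target rather than a hard cap.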

◆ GaussianSelectionPreselect()

BaseFloat GaussianSelectionPreselect	(	const VectorBase< BaseFloat > &	data,
		const std::vector< int32 > &	preselect,
		int32	num_gselect,
		std::vector< int32 > *	output
	)		const

Get gaussian selection information for one frame.

Returns the log-likelihood for this frame. Output is the best "num_gselect" indices that were preselected, sorted from best to worst likelihood. If "num_gselect" exceeds the number of preselected indices, the number of preselected indices is used instead.

Definition at line 875 of file diag-gmm.cc.

References VectorBase< Real >::Data(), rnnlm::j, KALDI_ASSERT, KALDI_WARN, kaldi::LogAdd(), and DiagGmm::LogLikelihoodsPreselect().

Referenced by DiagGmm::Dim(), and main().

                                     {
   static bool warned_size = false;
   int32 preselect_sz = preselect.size();
   int32 this_num_gselect = std::min(num_gselect, preselect_sz);
   if (preselect_sz <= num_gselect && !warned_size) {
     warned_size = true;
     KALDI_WARN << "Preselect size is less or equal to than final size, "
                << "doing nothing: " << preselect_sz << " < " <<  num_gselect
                << " [won't warn again]";
   }
   Vector<BaseFloat> loglikes(preselect_sz);
   LogLikelihoodsPreselect(data, preselect, &loglikes);
   
   Vector<BaseFloat> loglikes_copy(loglikes);
   BaseFloat *ptr = loglikes_copy.Data();
   std::nth_element(ptr, ptr+preselect_sz-this_num_gselect,
                    ptr+preselect_sz);
   BaseFloat thresh = ptr[preselect_sz-this_num_gselect];
 
   BaseFloat tot_loglike = -std::numeric_limits<BaseFloat>::infinity();
   // we want the output sorted from best likelihood to worse
   // (so we can prune further without the model)...
   std::vector<std::pair<BaseFloat, int32> > pairs;
   for (int32 p = 0; p < preselect_sz; p++)
     if (loglikes(p) >= thresh)
       pairs.push_back(std::make_pair(loglikes(p), preselect[p]));
   std::sort(pairs.begin(), pairs.end(),
             std::greater<std::pair<BaseFloat, int32> >());
   output->clear();
   for (int32 j = 0;
        j < this_num_gselect && j < static_cast<int32>(pairs.size());
        j++) {
     output->push_back(pairs[j].second);
     tot_loglike = LogAdd(tot_loglike, pairs[j].first);
   }
   KALDI_ASSERT(!output->empty());
   return tot_loglike;
 }

◆ gconsts()

const Vector<BaseFloat>& gconsts ( ) const

inline

Const accessors.

Definition at line 174 of file diag-gmm.h.

References DiagGmm::gconsts_, KALDI_ASSERT, and DiagGmm::valid_gconsts_.

Referenced by FullGmm::CopyFromDiagGmm(), DecodableAmDiagGmmRegtreeFmllr::LogLikelihoodZeroBased(), DecodableAmDiagGmmUnmapped::LogLikelihoodZeroBased(), and kaldi::MlObjective().

                                            {
     KALDI_ASSERT(valid_gconsts_);
     return gconsts_;
   }

◆ Generate()

void Generate ( VectorBase< BaseFloat > * output )

Generates a random data-point from this distribution.

Definition at line 922 of file diag-gmm.cc.

References rnnlm::d, VectorBase< Real >::Dim(), DiagGmm::Dim(), rnnlm::i, DiagGmm::inv_vars_, KALDI_ASSERT, DiagGmm::means_invvars_, kaldi::RandGauss(), kaldi::RandUniform(), and DiagGmm::weights_.

Referenced by DiagGmm::Dim(), TestMllrAccsIO(), kaldi::UnitTestDiagGmmGenerate(), kaldi::UnitTestFmllrDiagGmm(), kaldi::UnitTestFmllrDiagGmmDiagonal(), kaldi::UnitTestFmllrDiagGmmOffset(), and UnitTestRegtreeMllrDiagGmm().

                                                     {
   KALDI_ASSERT(static_cast<int32>(output->Dim()) == Dim());
   BaseFloat tot = weights_.Sum();
   KALDI_ASSERT(tot > 0.0);
   double r = tot * RandUniform() * 0.99999;
   int32 i = 0;
   double sum = 0.0;
   while (sum + weights_(i) < r) {
     sum += weights_(i);
     i++;
     KALDI_ASSERT(i < static_cast<int32>(weights_.Dim()));
   }
   // now i is the index of the Gaussian we chose.
   SubVector<BaseFloat> inv_var(inv_vars_, i),
       mean_invvar(means_invvars_, i);
   for (int32 d = 0; d < inv_var.Dim(); d++) {
     BaseFloat stddev = 1.0 / sqrt(inv_var(d)),
         mean = mean_invvar(d) / inv_var(d);
     (*output)(d) = mean + RandGauss() * stddev;
   }
 }
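
Sampling here has two steps: pick a component by walking the cumulative weights, then draw each coordinate from that component's Gaussian. The sketch below separates the two and takes the random draws as arguments so that it stays deterministic; `PickComponent` and `SampleCoordinate` are hypothetical names, and in real use `r` and `z` would come from a uniform and a standard-normal generator respectively.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Walks the cumulative weights until they reach r, where r is a draw from
// [0, total weight). Returns the index of the chosen component.
std::size_t PickComponent(const std::vector<double> &weights, double r) {
  assert(!weights.empty());
  std::size_t i = 0;
  double sum = 0.0;
  while (sum + weights[i] < r) {
    sum += weights[i];
    ++i;
    assert(i < weights.size());
  }
  return i;
}

// Converts the stored (mean * inv_var, inv_var) pair back to a mean and a
// standard deviation, then shifts and scales a standard-normal draw z.
double SampleCoordinate(double mean_invvar, double inv_var, double z) {
  double stddev = 1.0 / std::sqrt(inv_var);
  double mean = mean_invvar / inv_var;
  return mean + z * stddev;
}
```

The listing above scales its uniform draw by 0.99999, which keeps `r` strictly below the total weight so the walk cannot run past the last component.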

◆ GetComponentMean()

void GetComponentMean	(	int32	gauss,
		VectorBase< Real > *	out
	)		const

Accessor for single component mean.

Definition at line 135 of file diag-gmm-inl.h.

References VectorBase< Real >::CopyRowFromMat(), VectorBase< Real >::Dim(), DiagGmm::Dim(), VectorBase< Real >::DivElements(), DiagGmm::inv_vars_, KALDI_ASSERT, DiagGmm::means_invvars_, and DiagGmm::NumGauss().

Referenced by RegtreeMllrDiagGmmAccs::AccumulateForGaussian(), RegtreeMllrDiagGmmAccs::AccumulateForGmm(), kaldi::ClusterGaussiansToUbm(), kaldi::UnitTestDiagGmm(), UnitTestRegressionTree(), and DiagGmm::valid_gconsts().

                                                                        {
   KALDI_ASSERT(gauss < NumGauss());
   KALDI_ASSERT(static_cast<int32>(out->Dim()) == Dim());
   Vector<Real> tmp(Dim());
   tmp.CopyRowFromMat(inv_vars_, gauss);
   out->CopyRowFromMat(means_invvars_, gauss);
   out->DivElements(tmp);
 }

◆ GetComponentVariance()

void GetComponentVariance	(	int32	gauss,
		VectorBase< Real > *	out
	)		const

Accessor for single component variance.

Definition at line 145 of file diag-gmm-inl.h.

References VectorBase< Real >::CopyRowFromMat(), VectorBase< Real >::Dim(), DiagGmm::Dim(), DiagGmm::inv_vars_, VectorBase< Real >::InvertElements(), KALDI_ASSERT, and DiagGmm::NumGauss().

Referenced by kaldi::ClusterGaussiansToUbm(), and DiagGmm::valid_gconsts().

                                                                            {
   KALDI_ASSERT(gauss < NumGauss());
   KALDI_ASSERT(static_cast<int32>(out->Dim()) == Dim());
   out->CopyRowFromMat(inv_vars_, gauss);
   out->InvertElements();
 }

◆ GetMeans()

void GetMeans ( Matrix< Real > * m ) const

Accessor for means.

Definition at line 123 of file diag-gmm-inl.h.

References MatrixBase< Real >::CopyFromMat(), DiagGmm::Dim(), DiagGmm::inv_vars_, MatrixBase< Real >::InvertElements(), KALDI_ASSERT, DiagGmm::means_invvars_, MatrixBase< Real >::MulElements(), DiagGmm::NumGauss(), and Matrix< Real >::Resize().

Referenced by BasisFmllrEstimate::ComputeAmDiagPrecond(), main(), AccumDiagGmm::SmoothWithModel(), test_flags_driven_update(), kaldi::UnitTestDiagGmm(), UnitTestRegressionTree(), and DiagGmm::valid_gconsts().

                                             {
   KALDI_ASSERT(m != NULL);
   m->Resize(NumGauss(), Dim());
   Matrix<Real> vars(NumGauss(), Dim());
   vars.CopyFromMat(inv_vars_);
   vars.InvertElements();
   m->CopyFromMat(means_invvars_);
   m->MulElements(vars);
 }

◆ GetVars()

void GetVars ( Matrix< Real > * v ) const

Accessor for covariances.

Definition at line 115 of file diag-gmm-inl.h.

References MatrixBase< Real >::CopyFromMat(), DiagGmm::Dim(), DiagGmm::inv_vars_, MatrixBase< Real >::InvertElements(), KALDI_ASSERT, DiagGmm::NumGauss(), and Matrix< Real >::Resize().

Referenced by BasisFmllrEstimate::ComputeAmDiagPrecond(), AccumDiagGmm::SmoothWithModel(), test_flags_driven_update(), kaldi::UnitTestDiagGmm(), and DiagGmm::valid_gconsts().

                                            {
   KALDI_ASSERT(v != NULL);
   v->Resize(NumGauss(), Dim());
   v->CopyFromMat(inv_vars_);
   v->InvertElements();
 }
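
The four accessors above all undo the same internal parametrization: the class stores inverse variances and means multiplied by inverse variances, so a mean is recovered by dividing and a variance by inverting. A small sketch of the round trip for one component, assuming plain `std::vector` storage; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DiagGaussian {         // conventional form
  std::vector<double> mean, var;
};
struct NaturalDiagGaussian {  // form stored by DiagGmm
  std::vector<double> mean_invvar, inv_var;
};

// Conventional -> stored form.
NaturalDiagGaussian ToNatural(const DiagGaussian &g) {
  assert(g.mean.size() == g.var.size());
  NaturalDiagGaussian n;
  n.mean_invvar.resize(g.mean.size());
  n.inv_var.resize(g.var.size());
  for (std::size_t d = 0; d < g.mean.size(); ++d) {
    n.inv_var[d] = 1.0 / g.var[d];
    n.mean_invvar[d] = g.mean[d] * n.inv_var[d];
  }
  return n;
}

// Stored -> conventional form (what the Get* accessors compute).
DiagGaussian FromNatural(const NaturalDiagGaussian &n) {
  assert(n.mean_invvar.size() == n.inv_var.size());
  DiagGaussian g;
  g.mean.resize(n.inv_var.size());
  g.var.resize(n.inv_var.size());
  for (std::size_t d = 0; d < n.inv_var.size(); ++d) {
    g.var[d] = 1.0 / n.inv_var[d];
    g.mean[d] = n.mean_invvar[d] / n.inv_var[d];
  }
  return g;
}
```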

◆ Interpolate() [1/2]

void Interpolate	(	BaseFloat	rho,
		const DiagGmm &	source,
		GmmFlagsType	flags = `kGmmAll`
	)

this = rho x source + (1-rho) x this

Definition at line 645 of file diag-gmm.cc.

References DiagGmm::ComputeGconsts(), DiagGmm::Dim(), KALDI_ASSERT, kaldi::kGmmMeans, kaldi::kGmmVariances, kaldi::kGmmWeights, DiagGmmNormal::means_, DiagGmm::NumGauss(), DiagGmmNormal::vars_, and DiagGmmNormal::weights_.

Referenced by DiagGmm::Dim().

                                               {
   KALDI_ASSERT(NumGauss() == source.NumGauss());
   KALDI_ASSERT(Dim() == source.Dim());
 
   DiagGmmNormal us(*this);
   DiagGmmNormal them(source);
 
   if (flags & kGmmWeights) {
     us.weights_.Scale(1.0 - rho);
     us.weights_.AddVec(rho, them.weights_);
     us.weights_.Scale(1.0 / us.weights_.Sum());
   }
 
   if (flags & kGmmMeans) {
     us.means_.Scale(1.0 - rho);
     us.means_.AddMat(rho, them.means_);
   }
 
   if (flags & kGmmVariances) {
     us.vars_.Scale(1.0 - rho);
     us.vars_.AddMat(rho, them.vars_);
   }
 
   us.CopyToDiagGmm(this);
   ComputeGconsts();
 }
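
The interpolation above is a convex blend applied to the conventional parameters (weights, means, variances), with the weights renormalized afterwards. A minimal sketch of that arithmetic on plain vectors, assuming nothing about Kaldi's types; `InterpolateInPlace` and `InterpolateWeights` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// target = rho * source + (1 - rho) * target, element by element.
void InterpolateInPlace(double rho, const std::vector<double> &source,
                        std::vector<double> *target) {
  assert(source.size() == target->size());
  for (std::size_t i = 0; i < source.size(); ++i)
    (*target)[i] = rho * source[i] + (1.0 - rho) * (*target)[i];
}

// Same blend, followed by renormalization so the weights sum to one.
void InterpolateWeights(double rho, const std::vector<double> &source,
                        std::vector<double> *target) {
  InterpolateInPlace(rho, source, target);
  double sum = std::accumulate(target->begin(), target->end(), 0.0);
  assert(sum > 0.0);
  for (double &w : *target) w /= sum;
}
```

With `rho = 0` the target is left unchanged, and with `rho = 1` it becomes a copy of the source.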

◆ Interpolate() [2/2]

void Interpolate	(	BaseFloat	rho,
		const FullGmm &	source,
		GmmFlagsType	flags = `kGmmAll`
	)

this = rho x source + (1-rho) x this

Definition at line 673 of file diag-gmm.cc.

References DiagGmm::ComputeGconsts(), kaldi::diag, FullGmm::Dim(), DiagGmm::Dim(), rnnlm::i, rnnlm::j, KALDI_ASSERT, kaldi::kGmmMeans, kaldi::kGmmVariances, kaldi::kGmmWeights, FullGmmNormal::means_, FullGmm::NumGauss(), DiagGmm::NumGauss(), FullGmmNormal::vars_, and FullGmmNormal::weights_.

                                               {
   KALDI_ASSERT(NumGauss() == source.NumGauss());
   KALDI_ASSERT(Dim() == source.Dim());
   DiagGmmNormal us(*this);
   FullGmmNormal them(source);
 
   if (flags & kGmmWeights) {
     us.weights_.Scale(1.0 - rho);
     us.weights_.AddVec(rho, them.weights_);
     us.weights_.Scale(1.0 / us.weights_.Sum());
   }
 
   if (flags & kGmmMeans) {
     us.means_.Scale(1.0 - rho);
     us.means_.AddMat(rho, them.means_);
   }
 
   if (flags & kGmmVariances) {
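     for (int32 i = 0; i < NumGauss(); i++) {
       // Scale only row i; scaling the whole matrix inside this loop would
       // apply the (1 - rho) factor once per component.
       us.vars_.Row(i).Scale(1. - rho);
       Vector<double> diag(Dim());
       for (int32 j = 0; j < Dim(); j++)
         diag(j) = them.vars_[i](j, j);
       us.vars_.Row(i).AddVec(rho, diag);
     }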
   }
 
   us.CopyToDiagGmm(this);
   ComputeGconsts();
 }

◆ inv_vars()

const Matrix<BaseFloat>& inv_vars ( ) const

inline

Definition at line 180 of file diag-gmm.h.

References DiagGmm::inv_vars_.

Referenced by RegtreeMllrDiagGmmAccs::AccumulateForGaussian(), RegtreeFmllrDiagGmmAccs::AccumulateForGaussian(), RegtreeMllrDiagGmmAccs::AccumulateForGmm(), RegtreeFmllrDiagGmmAccs::AccumulateForGmm(), MlltAccs::AccumulateFromPosteriors(), FmllrDiagGmmAccs::AccumulateFromPosteriors(), FmllrRawAccs::AccumulateFromPosteriors(), FmllrDiagGmmAccs::AccumulateFromPosteriorsPreselect(), kaldi::ComputeAmGmmFeatureDeriv(), Fmpe::ComputeStddevs(), FullGmm::CopyFromDiagGmm(), DiagGmm::DiagGmm(), FmllrDiagGmmAccs::FmllrDiagGmmAccs(), kaldi::GetFeatDeriv(), DecodableAmDiagGmmRegtreeMllr::GetXformedMeanInvVars(), DecodableAmDiagGmmRegtreeFmllr::LogLikelihoodZeroBased(), DecodableAmDiagGmmUnmapped::LogLikelihoodZeroBased(), DecodableAmDiagGmmRegtreeMllr::LogLikelihoodZeroBased(), kaldi::MlObjective(), TestXformMean(), and UnitTestEstimateDiagGmm().

{ return inv_vars_; }

◆ LogLikelihood()

BaseFloat LogLikelihood ( const VectorBase< BaseFloat > & data ) const

Returns the log-likelihood of a data point (vector) given the GMM.

Definition at line 517 of file diag-gmm.cc.

References KALDI_ERR, KALDI_ISINF, KALDI_ISNAN, DiagGmm::LogLikelihoods(), VectorBase< Real >::LogSumExp(), and DiagGmm::valid_gconsts_.

Referenced by DiagGmm::Dim(), kaldi::GetGmmLike(), main(), test_flags_driven_update(), test_io(), TestComponentAcc(), TestXformMean(), kaldi::UnitTestDiagGmm(), and UnitTestFullGmm().

                                                                         {
   if (!valid_gconsts_)
     KALDI_ERR << "Must call ComputeGconsts() before computing likelihood";
   Vector<BaseFloat> loglikes;
   LogLikelihoods(data, &loglikes);
   BaseFloat log_sum = loglikes.LogSumExp();
   if (KALDI_ISNAN(log_sum) || KALDI_ISINF(log_sum))
     KALDI_ERR << "Invalid answer (overflow or invalid variances/features?)";
   return log_sum;
 }
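
The total log-likelihood is the log of a sum of exponentials of the per-component values, which is usually computed by factoring out the maximum to avoid underflow. The sketch below shows that generic technique on a `std::vector`; it is not a copy of Kaldi's `VectorBase::LogSumExp()`, and the function name here is only illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Returns log(sum_i exp(x[i])), computed relative to the largest element
// so that very negative log-likelihoods do not underflow to zero.
double LogSumExp(const std::vector<double> &x) {
  assert(!x.empty());
  double max_elem = *std::max_element(x.begin(), x.end());
  if (!std::isfinite(max_elem)) return max_elem;  // all -inf, or some +inf
  double sum = 0.0;
  for (double v : x) sum += std::exp(v - max_elem);
  return max_elem + std::log(sum);
}
```

For example, two components at log-likelihood -1000 each would underflow if exponentiated directly, but give -1000 + log(2) with this form.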

◆ LogLikelihoods() [1/2]

void LogLikelihoods	(	const VectorBase< BaseFloat > &	data,
		Vector< BaseFloat > *	loglikes
	)		const

Outputs the per-component log-likelihoods.

Definition at line 528 of file diag-gmm.cc.

References VectorBase< Real >::AddMatVec(), VectorBase< Real >::ApplyPow(), VectorBase< Real >::CopyFromVec(), VectorBase< Real >::Dim(), DiagGmm::Dim(), DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ERR, kaldi::kNoTrans, kaldi::kUndefined, DiagGmm::means_invvars_, and Vector< Real >::Resize().

Referenced by DiagGmm::ComponentPosteriors(), DiagGmm::Dim(), DiagGmm::GaussianSelection(), DiagGmm::LogLikelihood(), main(), and kaldi::UnitTestDiagGmm().

                                                                 {
   loglikes->Resize(gconsts_.Dim(), kUndefined);
   loglikes->CopyFromVec(gconsts_);
   if (data.Dim() != Dim()) {
     KALDI_ERR << "DiagGmm::LogLikelihoods, dimension "
               << "mismatch " << data.Dim() << " vs. "<< Dim();
   }
   Vector<BaseFloat> data_sq(data);
   data_sq.ApplyPow(2.0);
 
   // loglikes +=  means * inv(vars) * data.
   loglikes->AddMatVec(1.0, means_invvars_, kNoTrans, data, 1.0);
   // loglikes += -0.5 * inv(vars) * data_sq.
   loglikes->AddMatVec(-0.5, inv_vars_, kNoTrans, data_sq, 1.0);
 }
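
The two `AddMatVec` calls above work because the log-density of a weighted diagonal Gaussian splits into a data-independent constant plus a term linear in the data and a term linear in the squared data. The sketch below checks that identity numerically for one component. The constant is derived here from the standard Gaussian density and is assumed, not confirmed by this page, to match what `ComputeGconsts()` stores; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Data-independent part: log weight plus every term of the diagonal
// Gaussian log-density that does not involve the data point.
double Gconst(double weight, const std::vector<double> &mean_invvar,
              const std::vector<double> &inv_var) {
  const double kLog2Pi = std::log(2.0 * std::acos(-1.0));
  double g = std::log(weight) -
             0.5 * kLog2Pi * static_cast<double>(inv_var.size());
  for (std::size_t d = 0; d < inv_var.size(); ++d)
    g += 0.5 * std::log(inv_var[d]) -
         0.5 * mean_invvar[d] * mean_invvar[d] / inv_var[d];
  return g;
}

// gconst + (mean * inv_var) . x - 0.5 * inv_var . x^2
double ComponentLogLike(double gconst, const std::vector<double> &mean_invvar,
                        const std::vector<double> &inv_var,
                        const std::vector<double> &x) {
  double ll = gconst;
  for (std::size_t d = 0; d < x.size(); ++d)
    ll += mean_invvar[d] * x[d] - 0.5 * inv_var[d] * x[d] * x[d];
  return ll;
}

// Reference value computed directly from mean and variance.
double DirectLogLike(double weight, const std::vector<double> &mean,
                     const std::vector<double> &var,
                     const std::vector<double> &x) {
  const double kTwoPi = 2.0 * std::acos(-1.0);
  double ll = std::log(weight);
  for (std::size_t d = 0; d < x.size(); ++d) {
    double diff = x[d] - mean[d];
    ll += -0.5 * (std::log(kTwoPi * var[d]) + diff * diff / var[d]);
  }
  return ll;
}
```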

◆ LogLikelihoods() [2/2]

void LogLikelihoods	(	const MatrixBase< BaseFloat > &	data,
		Matrix< BaseFloat > *	loglikes
	)		const

This version of the LogLikelihoods function operates on a sequence of frames simultaneously; the row index of both "data" and "loglikes" is the frame index.

Definition at line 546 of file diag-gmm.cc.

References MatrixBase< Real >::AddMatMat(), MatrixBase< Real >::ApplyPow(), MatrixBase< Real >::CopyRowsFromVec(), DiagGmm::Dim(), DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ASSERT, KALDI_ERR, kaldi::kNoTrans, kaldi::kTrans, kaldi::kUndefined, DiagGmm::means_invvars_, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and Matrix< Real >::Resize().

                                                                 {
   KALDI_ASSERT(data.NumRows() != 0);
   loglikes->Resize(data.NumRows(), gconsts_.Dim(), kUndefined);
   loglikes->CopyRowsFromVec(gconsts_);
   if (data.NumCols() != Dim()) {
     KALDI_ERR << "DiagGmm::LogLikelihoods, dimension "
               << "mismatch " << data.NumCols() << " vs. "<< Dim();
   }
   Matrix<BaseFloat> data_sq(data);
   data_sq.ApplyPow(2.0);
 
   // loglikes +=  means * inv(vars) * data.
   loglikes->AddMatMat(1.0, data, kNoTrans, means_invvars_, kTrans, 1.0);
   // loglikes += -0.5 * inv(vars) * data_sq.
   loglikes->AddMatMat(-0.5, data_sq, kNoTrans, inv_vars_, kTrans, 1.0);
 }

◆ LogLikelihoodsPreselect()

void LogLikelihoodsPreselect	(	const VectorBase< BaseFloat > &	data,
		const std::vector< int32 > &	indices,
		Vector< BaseFloat > *	loglikes
	)		const

Outputs the per-component log-likelihoods of a subset of mixture components.

Note: at output, loglikes->Dim() will equal indices.size(). loglikes[i] will correspond to the log-likelihood of the Gaussian indexed indices[i], including the mixture weight.

Definition at line 566 of file diag-gmm.cc.

References VectorBase< Real >::AddMatVec(), VectorBase< Real >::CopyFromVec(), VectorBase< Real >::Dim(), DiagGmm::Dim(), DiagGmm::gconsts_, rnnlm::i, DiagGmm::inv_vars_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kUndefined, DiagGmm::means_invvars_, Vector< Real >::Resize(), MatrixBase< Real >::Row(), and kaldi::VecVec().

Referenced by FmllrDiagGmmAccs::AccumulateForGmmPreselect(), kaldi::AccumulateForUtterance(), MlltAccs::AccumulateFromGmmPreselect(), Fmpe::ApplyProjection(), Fmpe::ApplyProjectionReverse(), DiagGmm::Dim(), DiagGmm::GaussianSelectionPreselect(), main(), and kaldi::UnitTestDiagGmm().

                                                                          {
   KALDI_ASSERT(data.Dim() == Dim());
   Vector<BaseFloat> data_sq(data);
   data_sq.ApplyPow(2.0);
 
   int32 num_indices = static_cast<int32>(indices.size());
   loglikes->Resize(num_indices, kUndefined);
   if (indices.back() + 1 - indices.front() == num_indices) {
     // A special (but common) case when the indices form a contiguous range.
     int32 start_idx = indices.front();
     loglikes->CopyFromVec(SubVector<BaseFloat>(gconsts_, start_idx, num_indices));
     // loglikes +=  means * inv(vars) * data.
     SubMatrix<BaseFloat> means_invvars_sub(means_invvars_, start_idx, num_indices,
                                            0, Dim());
     loglikes->AddMatVec(1.0, means_invvars_sub, kNoTrans, data, 1.0);
     SubMatrix<BaseFloat> inv_vars_sub(inv_vars_, start_idx, num_indices,
                                       0, Dim());
     // loglikes += -0.5 * inv(vars) * data_sq.
     loglikes->AddMatVec(-0.5, inv_vars_sub, kNoTrans, data_sq, 1.0);
   } else {
     for (int32 i = 0; i < num_indices; i++) {
       int32 idx = indices[i];  // The Gaussian index.
       BaseFloat this_loglike =
           gconsts_(idx) + VecVec(means_invvars_.Row(idx), data)
           - 0.5*VecVec(inv_vars_.Row(idx), data_sq);
       (*loglikes)(i) = this_loglike;
     }
   }
 }
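
The contiguous-range shortcut above compares only the first and last index against the count, which appears to rely on the indices being sorted and free of duplicates (and on the list being non-empty, since it calls `front()` and `back()`). The sketch below contrasts that endpoint test with an explicit check; both function names are hypothetical and neither is part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the endpoint test used in the listing above.
bool EndpointsLookContiguous(const std::vector<int> &indices) {
  if (indices.empty()) return false;
  return indices.back() + 1 - indices.front() ==
         static_cast<int>(indices.size());
}

// Explicit check: every index is exactly one more than its predecessor.
bool IsContiguousRange(const std::vector<int> &indices) {
  if (indices.empty()) return false;
  for (std::size_t i = 1; i < indices.size(); ++i)
    if (indices[i] != indices[i - 1] + 1) return false;
  return true;
}
```

For sorted, duplicate-free input the two functions agree; an unsorted list such as {0, 5, 2} passes the endpoint test without being a contiguous range.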

◆ means_invvars()

const Matrix<BaseFloat>& means_invvars ( ) const

inline

Definition at line 179 of file diag-gmm.h.

References DiagGmm::means_invvars_.

Referenced by RegtreeFmllrDiagGmmAccs::AccumulateForGaussian(), RegtreeFmllrDiagGmmAccs::AccumulateForGmm(), MlltAccs::AccumulateFromPosteriors(), FmllrDiagGmmAccs::AccumulateFromPosteriors(), FmllrRawAccs::AccumulateFromPosteriors(), FmllrDiagGmmAccs::AccumulateFromPosteriorsPreselect(), Fmpe::ApplyProjection(), Fmpe::ApplyProjectionReverse(), kaldi::ComputeAmGmmFeatureDeriv(), FullGmm::CopyFromDiagGmm(), DiagGmm::DiagGmm(), FmllrDiagGmmAccs::FmllrDiagGmmAccs(), kaldi::GetFeatDeriv(), DecodableAmDiagGmmRegtreeFmllr::LogLikelihoodZeroBased(), DecodableAmDiagGmmUnmapped::LogLikelihoodZeroBased(), kaldi::MlObjective(), and UnitTestEstimateDiagGmm().

{ return means_invvars_; }

◆ Merge()

void Merge	(	int32	target_components,
		std::vector< int32 > *	history = `NULL`
	)

Merges the components and records the order in which they were merged (as a flat list of index pairs).

Definition at line 295 of file diag-gmm.cc.

References kaldi::ApproxEqual(), DiagGmm::ComputeGconsts(), rnnlm::d, DiagGmm::Dim(), DiagGmm::gconsts_, rnnlm::i, DiagGmm::inv_vars_, MatrixBase< Real >::InvertElements(), rnnlm::j, KALDI_ASSERT, KALDI_ERR, KALDI_VLOG, KALDI_WARN, kaldi::Log(), DiagGmm::means_invvars_, DiagGmm::merged_components_logdet(), MatrixBase< Real >::MulElements(), DiagGmm::NumGauss(), Matrix< Real >::RemoveRow(), Matrix< Real >::Resize(), MatrixBase< Real >::Row(), MatrixBase< Real >::Scale(), DiagGmm::weights(), and DiagGmm::weights_.

Referenced by kaldi::ClusterGaussiansToUbm(), DiagGmm::Dim(), and kaldi::UnitTestDiagGmm().

                                                                       {
   if (target_components <= 0 || NumGauss() < target_components) {
     KALDI_ERR << "Invalid argument for target number of Gaussians (="
               << target_components << "), #Gauss = " << NumGauss();
   }
   if (NumGauss() == target_components) {
     KALDI_VLOG(2) << "No components merged, as target (" << target_components
                   << ") = total.";
     return;  // Nothing to do.
   }
 
   int32 num_comp = NumGauss(), dim = Dim();
 
   if (target_components == 1) {  // global mean and variance
     Vector<BaseFloat> weights(weights_);
     // Undo variance inversion and multiplication of mean by inv var.
     Matrix<BaseFloat> vars(inv_vars_);
     Matrix<BaseFloat> means(means_invvars_);
     vars.InvertElements();
     means.MulElements(vars);
     // add means square to variances; get second-order stats
     for (int32 i = 0; i < num_comp; i++) {
       vars.Row(i).AddVec2(1.0, means.Row(i));
     }
 
     // Slightly more efficient than calling this->Resize(1, dim)
     gconsts_.Resize(1);
     weights_.Resize(1);
     means_invvars_.Resize(1, dim);
     inv_vars_.Resize(1, dim);
 
     for (int32 i = 0; i < num_comp; i++) {
       weights_(0) += weights(i);
       means_invvars_.Row(0).AddVec(weights(i), means.Row(i));
       inv_vars_.Row(0).AddVec(weights(i), vars.Row(i));
     }
     if (!ApproxEqual(weights_(0), 1.0, 1e-6)) {
       KALDI_WARN << "Weights sum to " << weights_(0) << ": rescaling.";
       means_invvars_.Scale(weights_(0));
       inv_vars_.Scale(weights_(0));
       weights_(0) = 1.0;
     }
     inv_vars_.Row(0).AddVec2(-1.0, means_invvars_.Row(0));
     inv_vars_.InvertElements();
     means_invvars_.MulElements(inv_vars_);
     ComputeGconsts();
     return;
   }
 
   // If more than 1 merged component is required, use the hierarchical
   // clustering of components that lead to the smallest decrease in likelihood.
   std::vector<bool> discarded_component(num_comp);
   Vector<BaseFloat> logdet(num_comp);   // logdet for each component
   for (int32 i = 0; i < num_comp; i++) {
     discarded_component[i] = false;
     for (int32 d = 0; d < dim; d++) {
       logdet(i) += 0.5 * Log(inv_vars_(i, d));  // +0.5 because var is inverted
     }
   }
 
   // Undo variance inversion and multiplication of mean by this
   // Makes copy of means and vars for all components - memory inefficient?
   Matrix<BaseFloat> vars(inv_vars_);
   Matrix<BaseFloat> means(means_invvars_);
   vars.InvertElements();
   means.MulElements(vars);
 
   // add means square to variances; get second-order stats
   // (normalized by zero-order stats)
   for (int32 i = 0; i < num_comp; i++) {
     vars.Row(i).AddVec2(1.0, means.Row(i));
   }
 
   // compute change of likelihood for all combinations of components
   SpMatrix<BaseFloat> delta_like(num_comp);
   for (int32 i = 0; i < num_comp; i++) {
     for (int32 j = 0; j < i; j++) {
       BaseFloat w1 = weights_(i), w2 = weights_(j), w_sum = w1 + w2;
       BaseFloat merged_logdet = merged_components_logdet(w1, w2,
         means.Row(i), means.Row(j), vars.Row(i), vars.Row(j));
       delta_like(i, j) = w_sum * merged_logdet
         - w1 * logdet(i) - w2 * logdet(j);
     }
   }
 
   // Merge components with smallest impact on the loglike
   for (int32 removed = 0; removed < num_comp - target_components; removed++) {
     // Search for the least significant change in likelihood
     // (maximum of negative delta_likes)
     BaseFloat max_delta_like = -std::numeric_limits<BaseFloat>::max();
     int32 max_i = -1, max_j = -1;
     for (int32 i = 0; i < NumGauss(); i++) {
       if (discarded_component[i]) continue;
       for (int32 j = 0; j < i; j++) {
         if (discarded_component[j]) continue;
         if (delta_like(i, j) > max_delta_like) {
           max_delta_like = delta_like(i, j);
           max_i = i;
           max_j = j;
         }
       }
     }
 
     // make sure that different components will be merged
     KALDI_ASSERT(max_i != max_j && max_i != -1 && max_j != -1);
 
     // remember the merge candidates
     if (history != NULL) {
       history->push_back(max_i);
       history->push_back(max_j);
     }
 
     // Merge components
     BaseFloat w1 = weights_(max_i), w2 = weights_(max_j);
     BaseFloat w_sum = w1 + w2;
     // merge means
     means.Row(max_i).AddVec(w2/w1, means.Row(max_j));
     means.Row(max_i).Scale(w1/w_sum);
     // merge vars
     vars.Row(max_i).AddVec(w2/w1, vars.Row(max_j));
     vars.Row(max_i).Scale(w1/w_sum);
     // merge weights
     weights_(max_i) = w_sum;
 
     // Update gmm for merged component
     // copy second-order stats (normalized by zero-order stats)
     inv_vars_.Row(max_i).CopyFromVec(vars.Row(max_i));
     // centralize
     inv_vars_.Row(max_i).AddVec2(-1.0, means.Row(max_i));
     // invert
     inv_vars_.Row(max_i).InvertElements();
     // copy first-order stats (normalized by zero-order stats)
     means_invvars_.Row(max_i).CopyFromVec(means.Row(max_i));
     // multiply by inv_vars
     means_invvars_.Row(max_i).MulElements(inv_vars_.Row(max_i));
 
     // Update logdet for merged component
     logdet(max_i) = 0.0;
     for (int32 d = 0; d < dim; d++) {
       logdet(max_i) += 0.5 * Log(inv_vars_(max_i, d));
       // +0.5 because var is inverted
     }
 
     // Label the removed component as discarded
     discarded_component[max_j] = true;
 
     // Update delta_like for merged component
     for (int32 j = 0; j < num_comp; j++) {
       if ((j == max_i) || (discarded_component[j])) continue;
       BaseFloat w1 = weights_(max_i),
                 w2 = weights_(j),
                 w_sum = w1 + w2;
       BaseFloat merged_logdet = merged_components_logdet(w1, w2,
           means.Row(max_i), means.Row(j), vars.Row(max_i), vars.Row(j));
       delta_like(max_i, j) = w_sum * merged_logdet - w1 * logdet(max_i)
           - w2 * logdet(j);
       // doesn't respect lower triangular indices,
       // relies on implicitly performed swap of coordinates if necessary
     }
   }
 
   // Remove the consumed components
   int32 m = 0;
   for (int32 i = 0; i < num_comp; i++) {
     if (discarded_component[i]) {
       weights_.RemoveElement(m);
       means_invvars_.RemoveRow(m);
       inv_vars_.RemoveRow(m);
     } else {
       ++m;
     }
   }
 
   ComputeGconsts();
 }

◆ merged_components_logdet()

BaseFloat merged_components_logdet	(	BaseFloat	w1,
		BaseFloat	w2,
		const VectorBase< BaseFloat > &	f1,
		const VectorBase< BaseFloat > &	f2,
		const VectorBase< BaseFloat > &	s1,
		const VectorBase< BaseFloat > &	s2
	)		const

private

Definition at line 471 of file diag-gmm.cc.

References VectorBase< Real >::AddVec(), VectorBase< Real >::AddVec2(), VectorBase< Real >::CopyFromVec(), rnnlm::d, VectorBase< Real >::Dim(), kaldi::Log(), and VectorBase< Real >::Scale().

Referenced by DiagGmm::Merge().

                                                   {
   int32 dim = f1.Dim();
   Vector<BaseFloat> tmp_mean(dim);
   Vector<BaseFloat> tmp_var(dim);
 
   BaseFloat w_sum = w1 + w2;
   tmp_mean.CopyFromVec(f1);
   tmp_mean.AddVec(w2/w1, f2);
   tmp_mean.Scale(w1/w_sum);
   tmp_var.CopyFromVec(s1);
   tmp_var.AddVec(w2/w1, s2);
   tmp_var.Scale(w1/w_sum);
   tmp_var.AddVec2(-1.0, tmp_mean);
   BaseFloat merged_logdet = 0.0;
   for (int32 d = 0; d < dim; d++) {
     merged_logdet -= 0.5 * Log(tmp_var(d));
     // -0.5 because var is not inverted
   }
   return merged_logdet;
 }
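
In the function above, `f1`/`f2` are component means and `s1`/`s2` are second-order statistics (variance plus squared mean); the merged Gaussian is obtained by weight-averaging both and then subtracting the squared merged mean. A minimal sketch of that moment-matching merge, taking means and variances directly and assuming plain `std::vector` storage; `DiagStats`, `MergeTwo`, and `NegHalfLogDet` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct DiagStats {
  std::vector<double> mean, var;
};

// Moment-matching merge of two weighted diagonal Gaussians.
DiagStats MergeTwo(double w1, const DiagStats &a, double w2,
                   const DiagStats &b) {
  assert(a.mean.size() == b.mean.size());
  double w_sum = w1 + w2;
  assert(w_sum > 0.0);
  DiagStats m;
  m.mean.resize(a.mean.size());
  m.var.resize(a.mean.size());
  for (std::size_t d = 0; d < a.mean.size(); ++d) {
    double mean = (w1 * a.mean[d] + w2 * b.mean[d]) / w_sum;
    // Weighted average of second-order stats E[x^2] = var + mean^2.
    double second = (w1 * (a.var[d] + a.mean[d] * a.mean[d]) +
                     w2 * (b.var[d] + b.mean[d] * b.mean[d])) / w_sum;
    m.mean[d] = mean;
    m.var[d] = second - mean * mean;
  }
  return m;
}

// -0.5 * log|Sigma| for a diagonal covariance.
double NegHalfLogDet(const DiagStats &g) {
  double logdet = 0.0;
  for (double v : g.var) logdet -= 0.5 * std::log(v);
  return logdet;
}
```

Merging two equally weighted unit-variance components with means 0 and 2 gives mean 1 and variance 2: the spread between the means is absorbed into the merged variance.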

◆ MergeKmeans()

void MergeKmeans	(	int32	target_components,
		ClusterKMeansOptions	cfg = `ClusterKMeansOptions()`
	)

Definition at line 231 of file diag-gmm.cc.

References VectorBase< Real >::AddVec2(), kaldi::ClusterKMeans(), DiagGmm::ComputeGconsts(), VectorBase< Real >::CopyFromVec(), count, GaussClusterable::count(), kaldi::DeletePointers(), DiagGmm::Dim(), DiagGmm::inv_vars_, VectorBase< Real >::InvertElements(), KALDI_ERR, KALDI_VLOG, KALDI_WARN, DiagGmm::means_invvars_, VectorBase< Real >::MulElements(), DiagGmm::NumGauss(), DiagGmm::Resize(), VectorBase< Real >::Scale(), DiagGmm::weights_, GaussClusterable::x2_stats(), and GaussClusterable::x_stats().

Referenced by DiagGmm::Dim(), and kaldi::UnitTestDiagGmm().

                                                     {
   if (target_components <= 0 || NumGauss() < target_components) {
     KALDI_ERR << "Invalid argument for target number of Gaussians (="
               << target_components << "), #Gauss = " << NumGauss();
   }
   if (NumGauss() == target_components) {
     KALDI_VLOG(2) << "No components merged, as target (" << target_components
                   << ") = total.";
     return;  // Nothing to do.
   }
   double min_var = 1.0e-10;
   std::vector<Clusterable*> clusterable_vec;
   for (int32 g = 0; g < NumGauss(); g++) {
     if (weights_(g) == 0) {
       KALDI_WARN << "Not using zero-weight Gaussians in clustering.";
       continue;
     }
     Vector<BaseFloat> x_stats(Dim()),
         x2_stats(Dim());
     BaseFloat count = weights_(g);
 
     SubVector<BaseFloat> inv_var(inv_vars_, g),
         mean_invvar(means_invvars_, g);
     x_stats.AddVecDivVec(1.0, mean_invvar, inv_var, count);  // x_stats is now mean.
     x2_stats.CopyFromVec(inv_var);
     x2_stats.InvertElements();  // x2_stats is now var.
     x2_stats.AddVec2(1.0, x_stats);  // x2_stats is now var + mean^2
     x_stats.Scale(count);  // x_stats is now scaled by count.
     x2_stats.Scale(count);  // x2_stats is now scaled by count.
     clusterable_vec.push_back(new GaussClusterable(x_stats, x2_stats, min_var,
                                                    count));
   }
   if (clusterable_vec.size() <= target_components) {
     KALDI_WARN << "Not doing clustering phase since lost too many Gaussians "
                << "due to zero weight. Warning: zero-weight Gaussians are "
                << "still there.";
     DeletePointers(&clusterable_vec);
     return;
   } else {
     std::vector<Clusterable*> clusters;
     ClusterKMeans(clusterable_vec,
                   target_components,
                   &clusters, NULL, cfg);
     Resize(clusters.size(), Dim());
     for (int32 g = 0; g < static_cast<int32>(clusters.size()); g++) {
       GaussClusterable *gc = static_cast<GaussClusterable*>(clusters[g]);
       weights_(g) = gc->count();
       SubVector<BaseFloat> inv_var(inv_vars_, g),
           mean_invvar(means_invvars_, g);
       inv_var.CopyFromVec(gc->x2_stats());
       inv_var.Scale(1.0 / gc->count());  // inv_var is now the var + mean^2
       mean_invvar.CopyFromVec(gc->x_stats());
       mean_invvar.Scale(1.0 / gc->count());  // mean_invvar is now the mean.
       inv_var.AddVec2(-1.0, mean_invvar);  // subtract mean^2; inv_var is now the var
       inv_var.InvertElements();  // inv_var is now the inverse var.
       mean_invvar.MulElements(inv_var);  // mean_invvar is now mean * inverse var.
     }
     ComputeGconsts();
     DeletePointers(&clusterable_vec);
     DeletePointers(&clusters);
   }
 }

◆ NumGauss()

int32 NumGauss ( ) const

inline

Returns the number of mixture components in the GMM.

Definition at line 72 of file diag-gmm.h.

References DiagGmm::weights_.

Referenced by AccumAmDiagGmm::AccumulateForGaussian(), FmllrDiagGmmAccs::AccumulateForGmm(), FmllrRawAccs::AccumulateForGmm(), RegtreeMllrDiagGmmAccs::AccumulateForGmm(), RegtreeFmllrDiagGmmAccs::AccumulateForGmm(), AccumFullGmm::AccumulateFromDiag(), AccumDiagGmm::AccumulateFromDiag(), MlltAccs::AccumulateFromGmm(), MlltAccs::AccumulateFromGmmPreselect(), MlltAccs::AccumulateFromPosteriors(), FmllrRawAccs::AccumulateFromPosteriors(), RegressionTree::BuildTree(), kaldi::ClusterGaussiansToUbm(), BasisFmllrEstimate::ComputeAmDiagPrecond(), Fmpe::ComputeC(), DiagGmm::ComputeGconsts(), FullGmm::CopyFromDiagGmm(), DiagGmm::CopyFromFullGmm(), DiagGmm::DiagGmm(), kaldi::DiagGmmToStats(), kaldi::DoRescalingUpdate(), FmllrDiagGmmAccs::FmllrDiagGmmAccs(), DiagGmm::GaussianSelection(), DiagGmm::GetComponentMean(), DiagGmm::GetComponentVariance(), DiagGmm::GetMeans(), kaldi::GetStatsDerivative(), RegtreeMllrDiagGmm::GetTransformedMeans(), DiagGmm::GetVars(), DecodableAmDiagGmmRegtreeMllr::GetXformedMeanInvVars(), AccumAmDiagGmm::Init(), init_rand_diag_gmm(), kaldi::InitGmmFromRandomFrames(), DiagGmm::Interpolate(), DecodableAmDiagGmmRegtreeFmllr::LogLikelihoodZeroBased(), main(), kaldi::MapDiagGmmUpdate(), DiagGmm::Merge(), DiagGmm::MergeKmeans(), kaldi::MleDiagGmmUpdate(), DiagGmm::Perturb(), DiagGmm::RemoveComponent(), AccumDiagGmm::Resize(), kaldi::ResizeModel(), DiagGmm::SetComponentInvVar(), DiagGmm::SetComponentMean(), DiagGmm::SetComponentWeight(), DiagGmm::SetInvVars(), AccumDiagGmm::SmoothWithModel(), DiagGmm::Split(), test_flags_driven_update(), TestComponentAcc(), kaldi::TestFmpe(), TestXformMean(), kaldi::UnitTestDiagGmm(), kaldi::UnitTestDiagGmmGenerate(), UnitTestEstimateDiagGmm(), kaldi::UnitTestEstimateMmieDiagGmm(), kaldi::UnitTestFmllrDiagGmm(), kaldi::UnitTestFmllrDiagGmmDiagonal(), kaldi::UnitTestFmllrDiagGmmOffset(), UnitTestRegressionTree(), kaldi::UpdateEbwDiagGmm(), and Fmpe::Write().

{ return weights_.Dim(); }

◆ operator=()

const DiagGmm& operator= ( const DiagGmm & other )

private

◆ Perturb()

void Perturb ( float perturb_factor )

Perturbs the component means with a random vector multiplied by the perturb factor.

Definition at line 215 of file diag-gmm.cc.

References MatrixBase< Real >::AddMat(), DiagGmm::ComputeGconsts(), rnnlm::d, DiagGmm::Dim(), rnnlm::i, DiagGmm::inv_vars_, kaldi::kNoTrans, DiagGmm::means_invvars_, DiagGmm::NumGauss(), and kaldi::RandGauss().

Referenced by DiagGmm::Dim(), init_rand_diag_gmm(), kaldi::InitRandomGmm(), and main().

                                           {
   int32 num_comps = NumGauss(),
       dim = Dim();
   Matrix<BaseFloat> rand_mat(num_comps, dim);
   for (int32 i = 0; i < num_comps; i++) {
     for (int32 d = 0; d < dim; d++) {
       rand_mat(i, d) = RandGauss() * std::sqrt(inv_vars_(i, d));
       // as in DiagGmm::Split, we perturb the means_invvars using a random
       // fraction of inv_vars_
     }
   }
   means_invvars_.AddMat(perturb_factor, rand_mat, kNoTrans);
   ComputeGconsts();
 }
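
Adding `perturb_factor * z * sqrt(inv_var)` to the stored mean-times-inverse-variance is the same as shifting the actual mean by `perturb_factor * z` standard deviations, because dividing the added term by `inv_var` leaves `perturb_factor * z / sqrt(inv_var)`. The sketch below shows this for one component with the random draws `z` passed in explicitly so the result is deterministic; `PerturbMeanInvVar` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Applies the perturbation in the stored (mean * inv_var) domain.
// z holds the standard-normal draws, one per dimension.
void PerturbMeanInvVar(double perturb_factor, const std::vector<double> &z,
                       const std::vector<double> &inv_var,
                       std::vector<double> *mean_invvar) {
  assert(z.size() == inv_var.size());
  assert(mean_invvar->size() == inv_var.size());
  for (std::size_t d = 0; d < inv_var.size(); ++d)
    (*mean_invvar)[d] += perturb_factor * z[d] * std::sqrt(inv_var[d]);
}
```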

◆ Read()

void Read	(	std::istream &	in,
		bool	binary
	)

Definition at line 728 of file diag-gmm.cc.

References DiagGmm::ComputeGconsts(), kaldi::ExpectToken(), DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ERR, DiagGmm::means_invvars_, Matrix< Real >::Read(), kaldi::ReadToken(), and DiagGmm::weights_.

Referenced by DiagGmm::Dim(), main(), kaldi::operator>>(), Fmpe::Read(), and kaldi::UnitTestDiagGmm().

                                               {
 //  ExpectToken(is, binary, "<DiagGMMBegin>");
   std::string token;
   ReadToken(is, binary, &token);
   // <DiagGMMBegin> is for compatibility. Will be deleted later
   if (token != "<DiagGMMBegin>" && token != "<DiagGMM>")
     KALDI_ERR << "Expected <DiagGMM>, got " << token;
   ReadToken(is, binary, &token);
   if (token == "<GCONSTS>") {  // The gconsts are optional.
     gconsts_.Read(is, binary);
     ExpectToken(is, binary, "<WEIGHTS>");
   } else {
     if (token != "<WEIGHTS>")
       KALDI_ERR << "DiagGmm::Read, expected <WEIGHTS> or <GCONSTS>, got "
                 << token;
   }
   weights_.Read(is, binary);
   ExpectToken(is, binary, "<MEANS_INVVARS>");
   means_invvars_.Read(is, binary);
   ExpectToken(is, binary, "<INV_VARS>");
   inv_vars_.Read(is, binary);
 //  ExpectToken(is, binary, "<DiagGMMEnd>");
   ReadToken(is, binary, &token);
   // <DiagGMMEnd> is for compatibility. Will be deleted later
   if (token != "<DiagGMMEnd>" && token != "</DiagGMM>")
     KALDI_ERR << "Expected </DiagGMM>, got " << token;
 
   ComputeGconsts();  // safer option than trusting the read gconsts
 }

◆ RemoveComponent()

void RemoveComponent	(	int32	gauss,
		bool	renorm_weights
	)

Removes single component from model.

Definition at line 617 of file diag-gmm.cc.

References DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ASSERT, KALDI_ERR, DiagGmm::means_invvars_, DiagGmm::NumGauss(), Matrix< Real >::RemoveRow(), DiagGmm::valid_gconsts_, and DiagGmm::weights_.

Referenced by DiagGmm::RemoveComponents(), and DiagGmm::valid_gconsts().

                                                               {
   KALDI_ASSERT(gauss < NumGauss());
   if (NumGauss() == 1)
     KALDI_ERR << "Attempting to remove the only remaining component.";
   weights_.RemoveElement(gauss);
   gconsts_.RemoveElement(gauss);
   means_invvars_.RemoveRow(gauss);
   inv_vars_.RemoveRow(gauss);
   BaseFloat sum_weights = weights_.Sum();
   if (renorm_weights) {
     weights_.Scale(1.0/sum_weights);
     valid_gconsts_ = false;
   }
 }

◆ RemoveComponents()

void RemoveComponents	(	const std::vector< int32 > &	gauss,
		bool	renorm_weights
	)

Removes multiple components from the model; "gauss" must not contain duplicates.

Definition at line 632 of file diag-gmm.cc.

References rnnlm::i, kaldi::IsSortedAndUniq(), rnnlm::j, KALDI_ASSERT, and DiagGmm::RemoveComponent().

Referenced by kaldi::MleDiagGmmUpdate(), and DiagGmm::valid_gconsts().

                                                     {
   std::vector<int32> gauss(gauss_in);
   std::sort(gauss.begin(), gauss.end());
   KALDI_ASSERT(IsSortedAndUniq(gauss));
   // If efficiency is later an issue, will code this specially (unlikely).
   for (size_t i = 0; i < gauss.size(); i++) {
     RemoveComponent(gauss[i], renorm_weights);
     for (size_t j = i + 1; j < gauss.size(); j++)
       gauss[j]--;
   }
 }
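
The loop above removes components one at a time and decrements every remaining index, since each removal shifts the later components down by one position. A small sketch of the same bookkeeping on a plain `std::vector<int>`, with `RemoveAt` as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Removes the elements of `items` at the given positions. The positions
// are sorted first, and after each erase the remaining positions are
// decremented to account for the shift.
std::vector<int> RemoveAt(std::vector<int> items, std::vector<int> to_remove) {
  std::sort(to_remove.begin(), to_remove.end());
  // Duplicates would remove the wrong element on the second hit.
  assert(std::adjacent_find(to_remove.begin(), to_remove.end()) ==
         to_remove.end());
  for (std::size_t i = 0; i < to_remove.size(); ++i) {
    assert(to_remove[i] >= 0 &&
           static_cast<std::size_t>(to_remove[i]) < items.size());
    items.erase(items.begin() + to_remove[i]);
    for (std::size_t j = i + 1; j < to_remove.size(); ++j) --to_remove[j];
  }
  return items;
}
```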

◆ Resize()

void Resize	(	int32	nMix,
		int32	dim
	)

Resizes arrays to this dim. Does not initialize data.

Definition at line 66 of file diag-gmm.cc.

References DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ASSERT, DiagGmm::means_invvars_, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), Matrix< Real >::Resize(), MatrixBase< Real >::Set(), DiagGmm::valid_gconsts_, and DiagGmm::weights_.

Referenced by Sgmm2Project::ApplyProjection(), kaldi::ClusterGaussiansToUbm(), DiagGmm::CopyFromDiagGmm(), DiagGmm::CopyFromFullGmm(), DiagGmm::DiagGmm(), kaldi::unittest::InitRandDiagGmm(), kaldi::InitRandomGmm(), main(), DiagGmm::MergeKmeans(), kaldi::ResizeModel(), TestComponentAcc(), kaldi::UnitTestDiagGmm(), UnitTestEstimateDiagGmm(), kaldi::UnitTestEstimateMmieDiagGmm(), UnitTestFullGmm(), UnitTestRegressionTree(), and kaldi::UnitTestRegtreeFmllrDiagGmm().

                                           {
   KALDI_ASSERT(nmix > 0 && dim > 0);
   if (gconsts_.Dim() != nmix) gconsts_.Resize(nmix);
   if (weights_.Dim() != nmix) weights_.Resize(nmix);
   if (inv_vars_.NumRows() != nmix ||
       inv_vars_.NumCols() != dim) {
     inv_vars_.Resize(nmix, dim);
     inv_vars_.Set(1.0);
     // must be initialized to unit for case of calling SetMeans while having
     // covars/invcovars that are not set yet (i.e. zero)
   }
   if (means_invvars_.NumRows() != nmix ||
       means_invvars_.NumCols() != dim)
     means_invvars_.Resize(nmix, dim);
   valid_gconsts_ = false;
 }

◆ SetComponentInvVar()

void SetComponentInvVar	(	int32	gauss,
		const VectorBase< Real > &	in
	)

Sets the inverse variance for a single component. If you are setting both, do this before setting the mean, for numerical reasons.

Definition at line 97 of file diag-gmm-inl.h.

References VectorBase< Real >::CopyFromVec(), VectorBase< Real >::Dim(), DiagGmm::Dim(), DiagGmm::inv_vars_, VectorBase< Real >::InvertElements(), KALDI_ASSERT, DiagGmm::means_invvars_, DiagGmm::NumGauss(), MatrixBase< Real >::Row(), and DiagGmm::valid_gconsts_.

Referenced by DiagGmm::valid_gconsts().

                                                                    {
   KALDI_ASSERT(g < NumGauss() && v.Dim() == Dim());
 
   int32 dim = Dim();
   Vector<Real> mean(dim), var(dim);
 
   var.CopyFromVec(inv_vars_.Row(g));
   var.InvertElements();  // This inversion happens in double if Real == double
   mean.CopyFromVec(means_invvars_.Row(g));
   mean.MulElements(var);  // This is a real mean now.
   mean.MulElements(v);  // currently, v is inverted (in double if Real == double)
   means_invvars_.Row(g).CopyFromVec(mean);  // Mean times new inverse variance
   inv_vars_.Row(g).CopyFromVec(v);
   valid_gconsts_ = false;
 }
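
Because the model stores mean times inverse variance rather than the mean itself, changing the inverse variance means recovering the mean and rescaling it. The following is a minimal sketch of that update for one Gaussian, assuming plain `std::vector<double>` storage; `NaturalGauss`, `SetMean` and `SetInvVar` are hypothetical names, not Kaldi API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One diagonal Gaussian stored as (mean * inv_var, inv_var) per dimension.
struct NaturalGauss {
  std::vector<double> mean_invvar;
  std::vector<double> inv_var;
};

// Stores mean * inv_var, using whatever inverse variance is currently held.
inline void SetMean(NaturalGauss *g, const std::vector<double> &mean) {
  assert(mean.size() == g->inv_var.size());
  assert(g->mean_invvar.size() == g->inv_var.size());
  for (std::size_t d = 0; d < mean.size(); d++)
    g->mean_invvar[d] = mean[d] * g->inv_var[d];
}

// Replaces the inverse variance while keeping the implied mean unchanged.
inline void SetInvVar(NaturalGauss *g, const std::vector<double> &new_inv_var) {
  assert(new_inv_var.size() == g->inv_var.size());
  assert(g->mean_invvar.size() == g->inv_var.size());
  for (std::size_t d = 0; d < new_inv_var.size(); d++) {
    double mean = g->mean_invvar[d] / g->inv_var[d];  // recover the mean
    g->mean_invvar[d] = mean * new_inv_var[d];        // rescale it
    g->inv_var[d] = new_inv_var[d];
  }
}
```

Setting the inverse variance first and the mean second avoids the divide-then-multiply round trip on the mean, which is one plausible reading of the "numerical reasons" mentioned in the description.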

◆ SetComponentMean()

void SetComponentMean	(	int32	gauss,
		const VectorBase< Real > &	in
	)

Mutators for a single component; these support float or double.

Sets the mean for a single component; internally multiplies by inv(var).

Definition at line 52 of file diag-gmm-inl.h.

References MatrixBase< Real >::CopyRowFromVec(), VectorBase< Real >::Dim(), DiagGmm::Dim(), DiagGmm::inv_vars_, KALDI_ASSERT, DiagGmm::means_invvars_, DiagGmm::NumGauss(), and DiagGmm::valid_gconsts_.

Referenced by DiagGmm::valid_gconsts().

                                                                   {
   KALDI_ASSERT(g < NumGauss() && Dim() == in.Dim());
   Vector<Real> tmp(Dim());
   tmp.CopyRowFromMat(inv_vars_, g);
   tmp.MulElements(in);
   means_invvars_.CopyRowFromVec(tmp, g);
   valid_gconsts_ = false;
 }

◆ SetComponentWeight()

void SetComponentWeight	(	int32	gauss,
		BaseFloat	weight
	)

inline

Set weight for single component.

Definition at line 34 of file diag-gmm-inl.h.

References KALDI_ASSERT, DiagGmm::NumGauss(), DiagGmm::valid_gconsts_, and DiagGmm::weights_.

Referenced by DiagGmm::valid_gconsts().

                                                             {
   KALDI_ASSERT(w > 0.0);
   KALDI_ASSERT(g < NumGauss());
   weights_(g) = w;
   valid_gconsts_ = false;
 }

◆ SetInvVars()

void SetInvVars ( const MatrixBase< Real > & v )

Set the (inverse) variances and recompute means_invvars_.

Definition at line 78 of file diag-gmm-inl.h.

References MatrixBase< Real >::CopyFromMat(), DiagGmm::Dim(), DiagGmm::inv_vars_, MatrixBase< Real >::InvertElements(), KALDI_ASSERT, DiagGmm::means_invvars_, MatrixBase< Real >::MulElements(), MatrixBase< Real >::NumCols(), DiagGmm::NumGauss(), MatrixBase< Real >::NumRows(), and DiagGmm::valid_gconsts_.

Referenced by kaldi::ResizeModel(), test_flags_driven_update(), kaldi::UnitTestDiagGmm(), and DiagGmm::valid_gconsts().

                                                   {
   KALDI_ASSERT(inv_vars_.NumRows() == v.NumRows()
     && inv_vars_.NumCols() == v.NumCols());
 
   int32 num_comp = NumGauss(), dim = Dim();
   Matrix<Real> means(num_comp, dim);
   Matrix<Real> vars(num_comp, dim);
 
   vars.CopyFromMat(inv_vars_);
   vars.InvertElements();  // This inversion happens in double if Real == double
   means.CopyFromMat(means_invvars_);
   means.MulElements(vars);  // These are real means now
   means.MulElements(v);  // v is inverted (in double if Real == double)
   means_invvars_.CopyFromMat(means);  // Means times new inverse variance
   inv_vars_.CopyFromMat(v);
   valid_gconsts_ = false;
 }

◆ SetInvVarsAndMeans()

void SetInvVarsAndMeans	(	const MatrixBase< Real > &	invvars,
		const MatrixBase< Real > &	means
	)

Use SetInvVarsAndMeans if updating both means and (inverse) variances.

Definition at line 63 of file diag-gmm-inl.h.

References MatrixBase< Real >::CopyFromMat(), DiagGmm::inv_vars_, KALDI_ASSERT, DiagGmm::means_invvars_, MatrixBase< Real >::MulElements(), MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and DiagGmm::valid_gconsts_.

Referenced by kaldi::ClusterGaussiansToUbm(), DiagGmm::DiagGmm(), init_rand_diag_gmm(), kaldi::unittest::InitRandDiagGmm(), kaldi::InitRandomGmm(), main(), rand_diag_gmm(), TestXformMean(), kaldi::UnitTestDiagGmm(), UnitTestEstimateDiagGmm(), kaldi::UnitTestEstimateMmieDiagGmm(), kaldi::UnitTestRegtreeFmllrDiagGmm(), and DiagGmm::valid_gconsts().

                                                                 {
   KALDI_ASSERT(means_invvars_.NumRows() == means.NumRows()
     && means_invvars_.NumCols() == means.NumCols()
     && inv_vars_.NumRows() == invvars.NumRows()
     && inv_vars_.NumCols() == invvars.NumCols());
 
   inv_vars_.CopyFromMat(invvars);
   Matrix<Real> new_means_invvars(means);
   new_means_invvars.MulElements(invvars);
   means_invvars_.CopyFromMat(new_means_invvars);
   valid_gconsts_ = false;
 }

◆ SetMeans()

void SetMeans ( const MatrixBase< Real > & m )

Use SetMeans to update only the Gaussian means (and not the variances).

Definition at line 43 of file diag-gmm-inl.h.

References MatrixBase< Real >::CopyFromMat(), DiagGmm::inv_vars_, KALDI_ASSERT, DiagGmm::means_invvars_, MatrixBase< Real >::MulElements(), MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and DiagGmm::valid_gconsts_.

Referenced by main(), test_flags_driven_update(), kaldi::UnitTestDiagGmm(), UnitTestRegressionTree(), and DiagGmm::valid_gconsts().

                                                 {
   KALDI_ASSERT(means_invvars_.NumRows() == m.NumRows()
     && means_invvars_.NumCols() == m.NumCols());
   means_invvars_.CopyFromMat(m);
   means_invvars_.MulElements(inv_vars_);
   valid_gconsts_ = false;
 }

◆ SetWeights()

void SetWeights ( const VectorBase< Real > & w )

Mutators for both float and double.

Sets the mixture weights.

Definition at line 28 of file diag-gmm-inl.h.

References VectorBase< Real >::Dim(), KALDI_ASSERT, DiagGmm::valid_gconsts_, and DiagGmm::weights_.

Referenced by kaldi::ClusterGaussiansToUbm(), DiagGmm::DiagGmm(), init_rand_diag_gmm(), kaldi::unittest::InitRandDiagGmm(), kaldi::InitRandomGmm(), main(), rand_diag_gmm(), test_flags_driven_update(), kaldi::UnitTestDiagGmm(), UnitTestEstimateDiagGmm(), kaldi::UnitTestEstimateMmieDiagGmm(), kaldi::UnitTestRegtreeFmllrDiagGmm(), and DiagGmm::valid_gconsts().

                                                   {
   KALDI_ASSERT(weights_.Dim() == w.Dim());
   weights_.CopyFromVec(w);
   valid_gconsts_ = false;
 }

◆ Split()

void Split	(	int32	target_components,
		float	perturb_factor,
		std::vector< int32 > *	history = NULL
	)

Split the components and remember the order in which the components were split.

Definition at line 154 of file diag-gmm.cc.

References DiagGmm::ComputeGconsts(), DiagGmm::CopyFromDiagGmm(), DiagGmm::DiagGmm(), DiagGmm::Dim(), DiagGmm::gconsts_, rnnlm::i, DiagGmm::inv_vars_, KALDI_ERR, KALDI_WARN, DiagGmm::means_invvars_, DiagGmm::NumGauss(), kaldi::RandGauss(), MatrixBase< Real >::Range(), Matrix< Real >::Resize(), MatrixBase< Real >::Row(), and DiagGmm::weights_.

Referenced by DiagGmm::Dim(), main(), kaldi::UnitTestDiagGmm(), UnitTestEstimateDiagGmm(), and kaldi::UnitTestEstimateMmieDiagGmm().

                                                {
   if (target_components < NumGauss() || NumGauss() == 0) {
     KALDI_ERR << "Cannot split from "  << NumGauss() << " to "
               << target_components  << " components";
   }
   if (target_components == NumGauss()) {
     KALDI_WARN << "Already have the target # of Gaussians. Doing nothing.";
     return;
   }
 
   int32 current_components = NumGauss(), dim = Dim();
   DiagGmm *tmp = new DiagGmm;
   tmp->CopyFromDiagGmm(*this);  // so we have copies of matrices
   // First do the resize:
   weights_.Resize(target_components);
   weights_.Range(0, current_components).CopyFromVec(tmp->weights_);
   means_invvars_.Resize(target_components, dim);
   means_invvars_.Range(0, current_components, 0, dim).CopyFromMat(
       tmp->means_invvars_);
   inv_vars_.Resize(target_components, dim);
   inv_vars_.Range(0, current_components, 0, dim).CopyFromMat(tmp->inv_vars_);
   gconsts_.Resize(target_components);
 
   delete tmp;
 
   // future work(arnab): Use a priority queue instead?
   while (current_components < target_components) {
     BaseFloat max_weight = weights_(0);
     int32 max_idx = 0;
     for (int32 i = 1; i < current_components; i++) {
       if (weights_(i) > max_weight) {
         max_weight = weights_(i);
         max_idx = i;
       }
     }
 
     // remember what component was split
     if (history != NULL)
       history->push_back(max_idx);
 
     weights_(max_idx) /= 2;
     weights_(current_components) = weights_(max_idx);
     Vector<BaseFloat> rand_vec(dim);
     for (int32 i = 0; i < dim; i++) {
       rand_vec(i) = RandGauss() * std::sqrt(inv_vars_(max_idx, i));
       // note, this looks wrong but is really right because it's the
       // means_invvars we're multiplying and they have the dimension
       // of an inverse standard deviation. [dan]
     }
     inv_vars_.Row(current_components).CopyFromVec(inv_vars_.Row(max_idx));
     means_invvars_.Row(current_components).CopyFromVec(means_invvars_.Row(
         max_idx));
     means_invvars_.Row(current_components).AddVec(perturb_factor, rand_vec);
     means_invvars_.Row(max_idx).AddVec(-perturb_factor, rand_vec);
     current_components++;
   }
   ComputeGconsts();
 }
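
The splitting loop can be illustrated outside Kaldi. The following is a minimal sketch, assuming nested `std::vector` storage and `std::mt19937` as the random source; `ToyGmm` and `SplitLargest` are hypothetical names. The perturbation is scaled by sqrt(inv_var) because a step of k standard deviations in mean space becomes k / sigma in mean-times-inverse-variance space.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Toy diagonal GMM (not Kaldi code): one row per component.
struct ToyGmm {
  std::vector<float> weights;
  std::vector<std::vector<float>> means_invvars;  // mean * inv_var
  std::vector<std::vector<float>> inv_vars;
};

// Repeatedly splits the heaviest component until `target` components exist.
inline void SplitLargest(ToyGmm *gmm, std::size_t target, float perturb_factor,
                         std::mt19937 *rng,
                         std::vector<int> *history = nullptr) {
  assert(!gmm->weights.empty() && target >= gmm->weights.size());
  std::normal_distribution<float> gauss(0.0f, 1.0f);
  while (gmm->weights.size() < target) {
    // Find the heaviest component; the first one wins ties.
    std::size_t max_idx = 0;
    for (std::size_t i = 1; i < gmm->weights.size(); i++)
      if (gmm->weights[i] > gmm->weights[max_idx]) max_idx = i;
    if (history != nullptr) history->push_back(static_cast<int>(max_idx));

    // Halve the parent's weight and give the other half to the child.
    float half = gmm->weights[max_idx] / 2;
    gmm->weights[max_idx] = half;
    gmm->weights.push_back(half);

    // Copy the parent, then push the two copies apart in opposite directions.
    std::vector<float> child_mean = gmm->means_invvars[max_idx];
    std::vector<float> child_inv_var = gmm->inv_vars[max_idx];
    for (std::size_t d = 0; d < child_mean.size(); d++) {
      float step = perturb_factor * gauss(*rng) * std::sqrt(child_inv_var[d]);
      child_mean[d] += step;
      gmm->means_invvars[max_idx][d] -= step;
    }
    gmm->means_invvars.push_back(child_mean);
    gmm->inv_vars.push_back(child_inv_var);
  }
}
```

Each split leaves the total weight unchanged, and the optional history vector records the parent index of every split in order, so a caller can replay or undo the sequence.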

◆ valid_gconsts()

bool valid_gconsts ( ) const

inline

Definition at line 181 of file diag-gmm.h.

References DiagGmm::GetComponentMean(), DiagGmm::GetComponentVariance(), DiagGmm::GetMeans(), DiagGmm::GetVars(), DiagGmm::RemoveComponent(), DiagGmm::RemoveComponents(), DiagGmm::SetComponentInvVar(), DiagGmm::SetComponentMean(), DiagGmm::SetComponentWeight(), DiagGmm::SetInvVars(), DiagGmm::SetInvVarsAndMeans(), DiagGmm::SetMeans(), DiagGmm::SetWeights(), and DiagGmm::valid_gconsts_.

Referenced by DecodableAmDiagGmmRegtreeFmllr::LogLikelihoodZeroBased(), and DecodableAmDiagGmmUnmapped::LogLikelihoodZeroBased().

{ return valid_gconsts_; }

kaldi::DiagGmm::valid_gconsts_

bool valid_gconsts_

Recompute gconsts_ if false.

Definition: diag-gmm.h:233

◆ weights()

const Vector<BaseFloat>& weights ( ) const

inline

Definition at line 178 of file diag-gmm.h.

References DiagGmm::weights_.

Referenced by RegressionTree::BuildTree(), kaldi::ClusterGaussiansToUbm(), BasisFmllrEstimate::ComputeAmDiagPrecond(), FullGmm::CopyFromDiagGmm(), DiagGmm::DiagGmm(), DecodableAmDiagGmmRegtreeMllr::GetXformedMeanInvVars(), main(), DiagGmm::Merge(), test_flags_driven_update(), kaldi::UnitTestDiagGmm(), UnitTestEstimateDiagGmm(), and kaldi::UnitTestEstimateMmieDiagGmm().

{ return weights_; }

◆ Write()

void Write	(	std::ostream &	os,
		bool	binary
	)		const

Definition at line 705 of file diag-gmm.cc.

References DiagGmm::gconsts_, DiagGmm::inv_vars_, KALDI_ERR, DiagGmm::means_invvars_, DiagGmm::valid_gconsts_, DiagGmm::weights_, MatrixBase< Real >::Write(), and kaldi::WriteToken().

Referenced by DiagGmm::Dim(), main(), kaldi::operator<<(), kaldi::UnitTestDiagGmm(), and Fmpe::Write().

                                                              {
   if (!valid_gconsts_)
     KALDI_ERR << "Must call ComputeGconsts() before writing the model.";
   WriteToken(out_stream, binary, "<DiagGMM>");
   if (!binary) out_stream << "\n";
   WriteToken(out_stream, binary, "<GCONSTS>");
   gconsts_.Write(out_stream, binary);
   WriteToken(out_stream, binary, "<WEIGHTS>");
   weights_.Write(out_stream, binary);
   WriteToken(out_stream, binary, "<MEANS_INVVARS>");
   means_invvars_.Write(out_stream, binary);
   WriteToken(out_stream, binary, "<INV_VARS>");
   inv_vars_.Write(out_stream, binary);
   WriteToken(out_stream, binary, "</DiagGMM>");
   if (!binary) out_stream << "\n";
 }
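
The guard at the top of Write() is an instance of a validity flag protecting cached values: every mutator marks the cache stale, one routine recomputes it, and serialization refuses to run on stale data. The following is a minimal sketch of that pattern only; `CachedModel` and `WriteToString` are hypothetical names, the cached quantity (log weights) is a stand-in for gconsts_, and the output layout is not Kaldi's file format.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model (not Kaldi code): weights plus a cache of their logarithms.
class CachedModel {
 public:
  explicit CachedModel(std::size_t n)
      : weights_(n, n > 0 ? 1.0f / static_cast<float>(n) : 0.0f),
        log_weights_(n, 0.0f),
        valid_(false) {}

  void SetWeight(std::size_t i, float w) {
    assert(i < weights_.size() && w > 0.0f);
    weights_[i] = w;
    valid_ = false;  // the cached logs are now stale
  }

  // Recomputes the cache and marks it valid.
  void Compute() {
    for (std::size_t i = 0; i < weights_.size(); i++)
      log_weights_[i] = std::log(weights_[i]);
    valid_ = true;
  }

  // Refuses to serialize stale derived values.
  void Write(std::ostream &os) const {
    if (!valid_) throw std::runtime_error("Call Compute() before Write().");
    for (float lw : log_weights_) os << lw << ' ';
    os << '\n';
  }

  bool valid() const { return valid_; }

 private:
  std::vector<float> weights_;
  std::vector<float> log_weights_;
  bool valid_;
};

// Convenience wrapper: serializes the model into a string.
inline std::string WriteToString(const CachedModel &model) {
  std::ostringstream os;
  model.Write(os);
  return os.str();
}
```

Failing loudly at write time, rather than silently recomputing, keeps Write() const and makes a forgotten recomputation visible to the caller.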

Friends And Related Function Documentation

◆ DiagGmmNormal

friend class DiagGmmNormal

friend

This makes it a little easier to modify the internals.

Definition at line 44 of file diag-gmm.h.

Member Data Documentation

◆ gconsts_

Vector<BaseFloat> gconsts_

private

Equals log(weight) - 0.5 * (log det(var) + mean*mean*inv(var))

Definition at line 232 of file diag-gmm.h.

Referenced by DiagGmm::ComponentLogLikelihood(), DiagGmm::ComputeGconsts(), DiagGmm::CopyFromDiagGmm(), DiagGmm::CopyFromFullGmm(), DiagGmm::gconsts(), DiagGmm::LogLikelihoods(), DiagGmm::LogLikelihoodsPreselect(), DiagGmm::Merge(), DiagGmm::Read(), DiagGmm::RemoveComponent(), DiagGmm::Resize(), DiagGmm::Split(), and DiagGmm::Write().
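
Written out term by term for a diagonal covariance, the one-line description above reads as follows. This is a sketch in assumed notation: component m has weight w_m, mean mu_md and variance sigma^2_md in dimension d, and D is the feature dimension. A full Gaussian log-density also carries the constant -(D/2) log(2 pi), which the one-line description does not mention; whether gconsts_ folds it in should be checked against ComputeGconsts().

```latex
g_m = \log w_m
      - \frac{1}{2}\left(
          \sum_{d=1}^{D} \log \sigma_{md}^{2}
          + \sum_{d=1}^{D} \frac{\mu_{md}^{2}}{\sigma_{md}^{2}}
        \right)

\log\bigl(w_m\,\mathcal{N}(x;\mu_m,\sigma_m^{2})\bigr)
    = g_m - \frac{D}{2}\log 2\pi
      + \sum_{d=1}^{D} \frac{\mu_{md}}{\sigma_{md}^{2}}\,x_d
      - \frac{1}{2}\sum_{d=1}^{D} \frac{1}{\sigma_{md}^{2}}\,x_d^{2}
```

The second line shows why the class stores means_invvars_ and inv_vars_: once g_m is cached, the data-dependent part of the log-likelihood is linear in x and in the elementwise square of x.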

◆ inv_vars_

Matrix<BaseFloat> inv_vars_

private

◆ means_invvars_

Matrix<BaseFloat> means_invvars_

private

◆ valid_gconsts_

bool valid_gconsts_

private

◆ weights_

Vector<BaseFloat> weights_

private

Mixture weights (not in the log domain).

Definition at line 234 of file diag-gmm.h.

Referenced by DiagGmm::ComputeGconsts(), DiagGmmNormal::CopyFromDiagGmm(), DiagGmm::CopyFromDiagGmm(), DiagGmm::CopyFromFullGmm(), DiagGmmNormal::CopyToDiagGmm(), DiagGmm::DiagGmm(), DiagGmm::Generate(), DiagGmm::Merge(), DiagGmm::MergeKmeans(), DiagGmm::NumGauss(), DiagGmm::Read(), DiagGmm::RemoveComponent(), DiagGmm::Resize(), DiagGmm::SetComponentWeight(), DiagGmm::SetWeights(), DiagGmm::Split(), DiagGmm::weights(), and DiagGmm::Write().

The documentation for this class was generated from the following files:
