See Clustering mechanisms in Kaldi for context.
Functions
BaseFloat SumClusterableObjf (const std::vector< Clusterable * > &vec)
    Returns the total objective function after adding up all the statistics in the vector (pointers may be NULL).
BaseFloat SumClusterableNormalizer (const std::vector< Clusterable * > &vec)
    Returns the total normalizer (usually count) of the cluster (pointers may be NULL).
Clusterable * SumClusterable (const std::vector< Clusterable * > &vec)
    Sums stats (pointers may be NULL). Returns NULL if no non-NULL stats are present.
void EnsureClusterableVectorNotNull (std::vector< Clusterable * > *stats)
    Fills in any NULL holes in the "stats" vector with empty stats, because certain algorithms require non-NULL stats.
void AddToClusters (const std::vector< Clusterable * > &stats, const std::vector< int32 > &assignments, std::vector< Clusterable * > *clusters)
    Given stats and a vector "assignments" of the same size (that maps each entry to a cluster index), sums the stats up into "clusters". It adds to any stats already present in "clusters" (although typically "clusters" will be empty when called), and it extends "clusters" with NULL pointers for any unseen indices.
void AddToClustersOptimized (const std::vector< Clusterable * > &stats, const std::vector< int32 > &assignments, const Clusterable &total, std::vector< Clusterable * > *clusters)
    AddToClustersOptimized does the same as AddToClusters (it sums up the stats within each cluster), except that it uses the sum of all the stats ("total") to speed up the computation where possible.
void AddToClusters (const std::vector< Clusterable * > &stats, const std::vector< int32 > &assignments, std::vector< Clusterable * > *clusters)
Given stats and a vector "assignments" of the same size (that maps each entry to a cluster index), sums the stats up into "clusters". It adds to any stats already present in "clusters" (although typically "clusters" will be empty when called), and it extends "clusters" with NULL pointers for any unseen indices.
Call EnsureClusterableVectorNotNull afterwards if you need every pointer in "clusters" to be non-NULL. Pointers in "clusters" are owned by the caller. Pointers in "stats" do not have to be non-NULL.
Definition at line 108 of file cluster-utils.cc.
References rnnlm::i, and KALDI_ASSERT.
Referenced by kaldi::FindBestSplitForKey(), kaldi::TestAddToClusters(), and kaldi::TestAddToClustersOptimized().
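The behaviour described above can be illustrated with a minimal sketch. It does not use the Kaldi headers: ToyStats is a hypothetical stand-in for Clusterable, and AddToClustersSketch and DemoClusterCounts are illustrative names, not Kaldi functions. The sketch follows the documented semantics only (NULL stats are skipped, existing clusters are added to, unseen indices become NULL).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for kaldi::Clusterable: a count plus a sum of values.
struct ToyStats {
  double count;
  double sum;
  ToyStats(double c, double s) : count(c), sum(s) {}
  ToyStats *Copy() const { return new ToyStats(*this); }
  void Add(const ToyStats &other) { count += other.count; sum += other.sum; }
};

// Sketch of the documented behaviour, not the Kaldi source.
void AddToClustersSketch(const std::vector<ToyStats *> &stats,
                         const std::vector<int> &assignments,
                         std::vector<ToyStats *> *clusters) {
  assert(clusters != nullptr);
  assert(stats.size() == assignments.size());
  if (stats.empty()) return;
  assert(*std::min_element(assignments.begin(), assignments.end()) >= 0);
  // Extend "clusters" with NULL pointers so that every assigned index exists.
  int max_index = *std::max_element(assignments.begin(), assignments.end());
  if (static_cast<int>(clusters->size()) <= max_index)
    clusters->resize(max_index + 1, nullptr);
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (stats[i] == nullptr) continue;  // NULL stats contribute nothing.
    ToyStats *&slot = (*clusters)[assignments[i]];
    if (slot == nullptr)
      slot = stats[i]->Copy();  // First stats seen for this cluster.
    else
      slot->Add(*stats[i]);     // Accumulate into the existing cluster.
  }
}

// Returns the count of each cluster, or -1 for clusters left NULL.
std::vector<double> DemoClusterCounts() {
  ToyStats a(1.0, 2.0), b(2.0, 5.0), c(1.0, 1.0);
  std::vector<ToyStats *> stats = {&a, nullptr, &b, &c};
  std::vector<int> assignments = {0, 3, 0, 2};
  std::vector<ToyStats *> clusters;  // Empty when called, the typical case.
  AddToClustersSketch(stats, assignments, &clusters);
  std::vector<double> counts;
  for (ToyStats *cl : clusters) {
    counts.push_back(cl != nullptr ? cl->count : -1.0);
    delete cl;  // The caller owns the pointers in "clusters".
  }
  return counts;
}
```

In the demo, index 1 is never assigned and index 3 is assigned only a NULL stat, so both clusters stay NULL. This is the situation in which a follow-up call to EnsureClusterableVectorNotNull would be useful.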
void AddToClustersOptimized (const std::vector< Clusterable * > &stats, const std::vector< int32 > &assignments, const Clusterable &total, std::vector< Clusterable * > *clusters)
AddToClustersOptimized does the same as AddToClusters (it sums up the stats within each cluster), except that it uses the sum of all the stats ("total") to speed up the computation where possible.
This will generally only be a significant speedup when there are just two clusters, which can happen in algorithms that do binary splits. The idea is to sum up all the stats in one cluster (the one with the fewest points in it) and then subtract that sum from the total to obtain the stats of the other cluster.
Definition at line 130 of file cluster-utils.cc.
References Clusterable::Add(), kaldi::AssertEqual(), Clusterable::Copy(), rnnlm::i, KALDI_ASSERT, Clusterable::Normalizer(), and kaldi::SumClusterableNormalizer().
Referenced by kaldi::ComputeInitialSplit(), and kaldi::TestAddToClustersOptimized().
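The subtract-from-total idea for a binary split can be sketched as follows. This is a simplified illustration under stated assumptions, not the Kaldi implementation: ToyStats is a hypothetical stand-in for Clusterable, the names TwoWaySplitSketch, SplitResult and DemoTwoWaySplit are made up for the example, and the sketch handles exactly two clusters with no NULL stats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for kaldi::Clusterable: a count plus a sum of values.
struct ToyStats {
  double count;
  double sum;
  void Add(const ToyStats &o) { count += o.count; sum += o.sum; }
  void Sub(const ToyStats &o) { count -= o.count; sum -= o.sum; }
};

struct SplitResult {
  ToyStats cluster0;
  ToyStats cluster1;
  int num_adds;  // Number of Add() calls actually performed.
};

// Sums only the smaller cluster directly, then derives the larger one as
// total minus the smaller. Assumes every assignment is 0 or 1 and that
// "total" really is the sum of all entries of "stats".
SplitResult TwoWaySplitSketch(const std::vector<ToyStats> &stats,
                              const std::vector<int> &assignments,
                              const ToyStats &total) {
  assert(stats.size() == assignments.size());
  std::size_t num_in_1 = 0;
  for (int a : assignments) {
    assert(a == 0 || a == 1);
    if (a == 1) ++num_in_1;
  }
  // Pick the cluster with the fewest points as the one to sum explicitly.
  const int small = (num_in_1 <= stats.size() - num_in_1) ? 1 : 0;
  ToyStats small_sum = {0.0, 0.0};
  int num_adds = 0;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (assignments[i] == small) {
      small_sum.Add(stats[i]);
      ++num_adds;
    }
  }
  ToyStats large_sum = total;  // Start from the total ...
  large_sum.Sub(small_sum);    // ... and remove the smaller cluster.
  SplitResult result;
  if (small == 1) {
    result = SplitResult{large_sum, small_sum, num_adds};
  } else {
    result = SplitResult{small_sum, large_sum, num_adds};
  }
  return result;
}

SplitResult DemoTwoWaySplit() {
  std::vector<ToyStats> stats = {
      {1.0, 1.0}, {1.0, 2.0}, {1.0, 3.0}, {1.0, 10.0}};
  std::vector<int> assignments = {0, 0, 0, 1};
  ToyStats total = {4.0, 16.0};  // Precomputed sum of all four stats.
  return TwoWaySplitSketch(stats, assignments, total);
}
```

In the demo, three of the four points belong to cluster 0, so only one Add() and one Sub() are needed instead of four Add() calls. With more than two clusters the saving shrinks, which matches the remark above that the speedup is mainly significant for binary splits.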
void EnsureClusterableVectorNotNull (std::vector< Clusterable * > *stats)
Fills in any NULL holes in the "stats" vector with empty stats, because certain algorithms require non-NULL stats.
If "stats" is nonempty, it must contain at least one non-NULL pointer that we can call Copy() on.
Definition at line 82 of file cluster-utils.cc.
References Clusterable::Copy(), KALDI_ASSERT, KALDI_ERR, and Clusterable::SetZero().
Referenced by kaldi::AutomaticallyObtainQuestions(), kaldi::FindBestSplitForKey(), kaldi::KMeansClusterPhones(), and kaldi::TestEnsureClusterableVectorNotNull().
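A minimal sketch of the hole-filling behaviour, assuming a hypothetical ToyStats type in place of Clusterable. The function names are illustrative, and throwing std::runtime_error stands in for the error raised through KALDI_ERR; both are assumptions made for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for kaldi::Clusterable: a count plus a sum of values.
struct ToyStats {
  double count;
  double sum;
  ToyStats *Copy() const { return new ToyStats(*this); }
  void SetZero() { count = 0.0; sum = 0.0; }
};

// Replaces every NULL entry with an empty (zeroed) copy of a non-NULL entry.
void EnsureNotNullSketch(std::vector<ToyStats *> *stats) {
  assert(stats != nullptr);
  if (stats->empty()) return;  // Nothing to do.
  // Find one non-NULL entry to use as a template for Copy().
  const ToyStats *example = nullptr;
  for (const ToyStats *s : *stats) {
    if (s != nullptr) {
      example = s;
      break;
    }
  }
  if (example == nullptr) throw std::runtime_error("All stats are NULL.");
  for (ToyStats *&s : *stats) {
    if (s == nullptr) {
      s = example->Copy();  // Same concrete type as the template ...
      s->SetZero();         // ... but holding no statistics.
    }
  }
}

// Returns the counts after filling holes in {NULL, (2, 7), NULL}.
std::vector<double> DemoFilledCounts() {
  std::vector<ToyStats *> stats = {nullptr, new ToyStats{2.0, 7.0}, nullptr};
  EnsureNotNullSketch(&stats);
  std::vector<double> counts;
  for (ToyStats *s : stats) {
    counts.push_back(s->count);
    delete s;
  }
  return counts;
}

// Returns true if a nonempty, all-NULL vector is rejected.
bool DemoAllNullIsRejected() {
  std::vector<ToyStats *> stats = {nullptr, nullptr};
  try {
    EnsureNotNullSketch(&stats);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

Copying an existing entry and zeroing it is what makes the non-NULL requirement necessary: without a template object there is no way to know which concrete statistics type to create.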
Clusterable * SumClusterable (const std::vector< Clusterable * > &vec)
Sums stats (pointers may be NULL). Returns NULL if no non-NULL stats are present.
Definition at line 69 of file cluster-utils.cc.
References Clusterable::Add(), Clusterable::Copy(), and rnnlm::i.
Referenced by kaldi::ClusterKMeansOnce(), kaldi::ComputeInitialSplit(), TreeClusterer::Init(), kaldi::InitAmGmm(), kaldi::TestAddToClustersOptimized(), and kaldi::TestSum().
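The copy-then-add pattern suggested by the references to Copy() and Add() can be sketched as below. ToyStats is a hypothetical stand-in for Clusterable, and SumSketch and the demo functions are illustrative names. The sketch assumes the returned object is newly allocated and owned by the caller.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for kaldi::Clusterable: a count plus a sum of values.
struct ToyStats {
  double count;
  double sum;
  ToyStats *Copy() const { return new ToyStats(*this); }
  void Add(const ToyStats &o) { count += o.count; sum += o.sum; }
};

// Returns a newly allocated sum of all non-NULL entries, or NULL if there
// are none. In this sketch the caller owns the returned pointer.
ToyStats *SumSketch(const std::vector<ToyStats *> &vec) {
  ToyStats *ans = nullptr;
  for (const ToyStats *s : vec) {
    if (s == nullptr) continue;  // NULL entries are skipped.
    if (ans == nullptr)
      ans = s->Copy();           // First non-NULL entry seeds the sum.
    else
      ans->Add(*s);
  }
  return ans;
}

// Sums {(1, 2), NULL, (2, 5)} and returns the resulting count.
double DemoSumCount() {
  ToyStats a = {1.0, 2.0}, b = {2.0, 5.0};
  std::vector<ToyStats *> vec = {&a, nullptr, &b};
  ToyStats *total = SumSketch(vec);
  assert(total != nullptr);
  double count = total->count;
  delete total;
  return count;
}

// Returns true if summing only NULL entries yields NULL.
bool DemoAllNullGivesNull() {
  std::vector<ToyStats *> vec = {nullptr, nullptr};
  return SumSketch(vec) == nullptr;
}
```

Because the result may be NULL, callers of a function with this contract need to check the returned pointer before dereferencing it.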
BaseFloat SumClusterableNormalizer (const std::vector< Clusterable * > &vec)
Returns the total normalizer (usually count) of the cluster (pointers may be NULL).
Definition at line 54 of file cluster-utils.cc.
References rnnlm::i, KALDI_ISNAN, and KALDI_WARN.
Referenced by kaldi::AddToClustersOptimized(), kaldi::ClusterEventMapGetMapping(), kaldi::ClusterEventMapToNClustersRestrictedByMap(), kaldi::ClusterKMeansOnce(), kaldi::KMeansClusterPhones(), kaldi::TestAddToClustersOptimized(), and kaldi::TestSumObjfAndSumNormalizer().
BaseFloat SumClusterableObjf (const std::vector< Clusterable * > &vec)
Returns the total objective function after adding up all the statistics in the vector (pointers may be NULL).
Definition at line 39 of file cluster-utils.cc.
References rnnlm::i, KALDI_ISNAN, and KALDI_WARN.
Referenced by kaldi::ClusterKMeansOnce(), kaldi::ComputeInitialSplit(), kaldi::ObjfGivenMap(), kaldi::TestClusterKMeans(), kaldi::TestClusterKMeansVector(), kaldi::TestClusterTopDown(), kaldi::TestRefineClusters(), kaldi::TestSumObjfAndSumNormalizer(), and kaldi::TestTreeCluster().
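SumClusterableObjf and SumClusterableNormalizer share the same shape: walk the vector, skip NULL pointers, and accumulate one scalar per entry. The sketch below illustrates that shape with a hypothetical ToyStats type whose Objf() and Normalizer() simply return stored values. Skipping NaN values is an assumption suggested by the references to KALDI_ISNAN and KALDI_WARN; the sketch omits the warning itself.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for kaldi::Clusterable that stores its objective
// function value and normalizer (count) directly.
struct ToyStats {
  double objf;
  double count;
  double Objf() const { return objf; }
  double Normalizer() const { return count; }
};

// Sums Objf() over non-NULL entries, skipping NaN values (assumed behaviour).
double SumObjfSketch(const std::vector<const ToyStats *> &vec) {
  double ans = 0.0;
  for (const ToyStats *s : vec) {
    if (s == nullptr) continue;
    double objf = s->Objf();
    if (std::isnan(objf)) continue;  // Kaldi would presumably warn here.
    ans += objf;
  }
  assert(!std::isnan(ans));
  return ans;
}

// Sums Normalizer() over non-NULL entries, skipping NaN values likewise.
double SumNormalizerSketch(const std::vector<const ToyStats *> &vec) {
  double ans = 0.0;
  for (const ToyStats *s : vec) {
    if (s == nullptr) continue;
    double count = s->Normalizer();
    if (std::isnan(count)) continue;
    ans += count;
  }
  assert(!std::isnan(ans));
  return ans;
}

// Demo inputs: one NULL entry and one entry whose objf is NaN.
double DemoSumObjf() {
  ToyStats a = {-1.5, 2.0}, bad = {std::nan(""), 1.0}, c = {-2.5, 3.0};
  std::vector<const ToyStats *> vec = {&a, nullptr, &bad, &c};
  return SumObjfSketch(vec);
}

double DemoSumNormalizer() {
  ToyStats a = {-1.5, 2.0}, bad = {std::nan(""), 1.0}, c = {-2.5, 3.0};
  std::vector<const ToyStats *> vec = {&a, nullptr, &bad, &c};
  return SumNormalizerSketch(vec);
}
```

In the demo the NaN objective is dropped from the objective sum, while the same entry still contributes its finite count of 1 to the normalizer sum.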