See Decision tree internals and specifically Classes and functions involved in tree-building for context.
Functions
void | DeleteBuildTreeStats (BuildTreeStatsType *stats)
This frees the Clusterable* pointers in "stats", where non-NULL, and sets them to NULL.
void | WriteBuildTreeStats (std::ostream &os, bool binary, const BuildTreeStatsType &stats)
Writes BuildTreeStats object. This works even if pointers are NULL.
void | ReadBuildTreeStats (std::istream &is, bool binary, const Clusterable &example, BuildTreeStatsType *stats)
Reads BuildTreeStats object.
bool | PossibleValues (EventKeyType key, const BuildTreeStatsType &stats, std::vector< EventValueType > *ans)
Convenience function, e.g. to work out possible values of the phones from just the stats.
void | SplitStatsByMap (const BuildTreeStatsType &stats_in, const EventMap &e, std::vector< BuildTreeStatsType > *stats_out)
Splits stats according to the EventMap, indexing them at output by the leaf type.
void | SplitStatsByKey (const BuildTreeStatsType &stats_in, EventKeyType key, std::vector< BuildTreeStatsType > *stats_out)
SplitStatsByKey splits stats up according to the value of a particular key, which must always be defined and nonnegative.
bool | ConvertStats (int32 oldN, int32 oldP, int32 newN, int32 newP, BuildTreeStatsType *stats)
Converts stats from a given context-window (N) and central-position (P) to a different N and P, by possibly reducing context.
void | FilterStatsByKey (const BuildTreeStatsType &stats_in, EventKeyType key, std::vector< EventValueType > &values, bool include_if_present, BuildTreeStatsType *stats_out)
FilterStatsByKey filters the stats according to the value of a specified key.
Clusterable * | SumStats (const BuildTreeStatsType &stats_in)
Sums stats, or returns NULL if stats_in has no non-NULL stats.
BaseFloat | SumNormalizer (const BuildTreeStatsType &stats_in)
Sums the normalizer [typically, data-count] over the stats.
BaseFloat | SumObjf (const BuildTreeStatsType &stats_in)
Sums the objective function over the stats.
void | SumStatsVec (const std::vector< BuildTreeStatsType > &stats_in, std::vector< Clusterable * > *stats_out)
Sums a vector of stats.
BaseFloat | ObjfGivenMap (const BuildTreeStatsType &stats_in, const EventMap &e)
Clusters the stats given the event map and returns the total objf given those clusters.
void | FindAllKeys (const BuildTreeStatsType &stats, AllKeysType keys_type, std::vector< EventKeyType > *keys)
FindAllKeys puts in *keys the (sorted, unique) list of all key identities in the stats.
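bool ConvertStats | ( | int32 | oldN, |
int32 | oldP, | ||
int32 | newN, | ||
int32 | newP, | ||
BuildTreeStatsType * | stats | ||
) |
Converts stats from a given context-window (N) and central-position (P) to a different N and P, by possibly reducing context.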
This function does a job that's quite specific to the "normal" stats format we use. See Phonetic context windows for background. This function may delete some keys and change others, depending on the N and P values. It expects that at input, all keys will either be -1 or lie between 0 and oldN-1. At output, keys will be either -1 or between 0 and newN-1. Returns false if we could not convert the stats (e.g. because newN is larger than oldN).
Definition at line 1077 of file build-tree-utils.cc.
References KALDI_ASSERT, and KALDI_WARN.
Referenced by kaldi::InitAmGmmFromOld(), and kaldi::TestConvertStats().
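The re-indexing described for ConvertStats can be sketched as follows. This is a minimal illustration under stated assumptions, not the Kaldi implementation: `Event` and `ConvertEventsSketch` are hypothetical stand-ins, the Clusterable pointers are left out, and the two extra rejection rules (the new window needing more left or right context than the old one) are assumptions that generalize the "newN is larger than oldN" example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in: an event is a list of (key, value) pairs.
using Event = std::vector<std::pair<int, int>>;

// Re-indexes context keys from window (old_n, old_p) to (new_n, new_p).
// Returns false when the new window would need context the old one lacks
// (assumed rule; the documentation only names the new_n > old_n case).
bool ConvertEventsSketch(int old_n, int old_p, int new_n, int new_p,
                         std::vector<Event> *events) {
  assert(events != nullptr);
  if (new_n > old_n) return false;                          // wider window
  if (new_p > old_p) return false;                          // more left context
  if (new_n - new_p - 1 > old_n - old_p - 1) return false;  // more right context
  const int shift = new_p - old_p;  // always <= 0 at this point
  for (Event &ev : *events) {
    Event converted;
    for (const auto &kv : ev) {
      int key = kv.first;
      if (key >= 0 && key < old_n) {
        key += shift;  // move the position into the new window's numbering
        if (key >= 0 && key < new_n) converted.emplace_back(key, kv.second);
      } else {
        converted.push_back(kv);  // keys such as -1 are kept unchanged
      }
    }
    ev = converted;
  }
  return true;
}
```

For example, going from a window with N=3, P=1 to N=1, P=0 keeps only the old central position (renumbered to 0) and the -1 key.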
void DeleteBuildTreeStats | ( | BuildTreeStatsType * | stats | ) |
This frees the Clusterable* pointers in "stats", where non-NULL, and sets them to NULL.
Does not delete the pointer "stats" itself.
Definition at line 754 of file build-tree-utils.cc.
References KALDI_ASSERT.
Referenced by kaldi::GenRandContextDependency(), kaldi::GenRandContextDependencyLarge(), main(), kaldi::TestBuildTree(), kaldi::TestClusterEventMap(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap2(), kaldi::TestClusterEventMapRestricted(), kaldi::TestGenRandStats(), kaldi::TestShareEventMapLeaves(), and kaldi::TestSplitDecisionTree().
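The ownership rule for DeleteBuildTreeStats (free each non-NULL pointer, null it, leave the container alone) can be illustrated with a small sketch. The `Clusterable`, `Event` and `Stats` types below are simplified, hypothetical stand-ins, and `live_count` exists only so the effect can be observed.

```cpp
#include <cassert>
#include <utility>
#include <vector>

int live_count = 0;  // illustration only: counts objects currently alive

// Hypothetical stand-in for the real Clusterable interface.
struct Clusterable {
  Clusterable() { ++live_count; }
  virtual ~Clusterable() { --live_count; }
};

using Event = std::vector<std::pair<int, int>>;
using Stats = std::vector<std::pair<Event, Clusterable *>>;

// Frees every non-null pointer and sets it to null.
// The vector itself (and the events in it) are left untouched.
void DeleteStatsSketch(Stats *stats) {
  assert(stats != nullptr);
  for (auto &entry : *stats) {
    delete entry.second;  // deleting a null pointer is a no-op
    entry.second = nullptr;
  }
}
```

Because the pointers are reset to null, calling the function a second time on the same container is harmless.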
void FilterStatsByKey | ( | const BuildTreeStatsType & | stats_in, |
EventKeyType | key, | ||
std::vector< EventValueType > & | values, | ||
bool | include_if_present, | ||
BuildTreeStatsType * | stats_out | ||
) |
FilterStatsByKey filters the stats according to the value of a specified key.
If include_if_present == true, it only outputs the stats whose value for "key" is in "values"; otherwise it only outputs the stats whose value for "key" is not in "values". At input, "values" must be sorted and unique, and all stats in "stats_in" must have "key" defined. At output, the Clusterable* pointers in stats_out are not newly allocated; they are the same as the ones in stats_in.
Definition at line 222 of file build-tree-utils.cc.
References kaldi::EventTypeToString(), kaldi::IsSortedAndUniq(), KALDI_ASSERT, KALDI_ERR, and EventMap::Lookup().
Referenced by kaldi::AutomaticallyObtainQuestions(), kaldi::BuildTree(), kaldi::BuildTreeTwoLevel(), and kaldi::KMeansClusterPhones().
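A minimal sketch of the filtering rule, assuming simplified stand-in types (`Clusterable`, `Event`, `Stats`) and hypothetical helper names (`LookupSketch`, `FilterSketch`). Clearing the output first is an assumption of this sketch. Because "values" is sorted and unique, membership can be tested with a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Clusterable {};  // hypothetical placeholder for the real interface
using Event = std::vector<std::pair<int, int>>;
using Stats = std::vector<std::pair<Event, Clusterable *>>;

// Returns true and sets *value if the event defines the key.
bool LookupSketch(const Event &ev, int key, int *value) {
  for (const auto &kv : ev)
    if (kv.first == key) { *value = kv.second; return true; }
  return false;
}

void FilterSketch(const Stats &in, int key, const std::vector<int> &values,
                  bool include_if_present, Stats *out) {
  assert(out != nullptr);
  assert(std::is_sorted(values.begin(), values.end()));
  out->clear();  // assumption: output starts empty
  for (const auto &entry : in) {
    int v = 0;
    const bool defined = LookupSketch(entry.first, key, &v);
    assert(defined);  // every stat must define the key
    (void)defined;
    const bool present = std::binary_search(values.begin(), values.end(), v);
    // Copies the pair, so the Clusterable pointer is shared, not cloned.
    if (present == include_if_present) out->push_back(entry);
  }
}
```

Since the output shares pointers with the input, only one of the two containers should be treated as the owner when the stats are eventually freed.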
void FindAllKeys | ( | const BuildTreeStatsType & | stats, |
AllKeysType | keys_type, | ||
std::vector< EventKeyType > * | keys | ||
) |
FindAllKeys puts in *keys the (sorted, unique) list of all key identities in the stats.
If keys_type == kAllKeysInsistIdentical, it insists that this set of keys is the same for all the stats (otherwise an exception is thrown). If keys_type == kAllKeysIntersection, it returns the smallest common set of keys present in the set of stats. If keys_type == kAllKeysUnion (currently probably not so useful, since maps will return "undefined" if a key is not present), it returns the union of all the keys present in the stats.
Definition at line 92 of file build-tree-utils.cc.
References kaldi::GetEventKeys(), KALDI_ASSERT, KALDI_ERR, kaldi::kAllKeysInsistIdentical, kaldi::kAllKeysIntersection, and kaldi::kAllKeysUnion.
Referenced by Questions::InitRand(), and kaldi::TestFindAllKeys().
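The three modes can be compared side by side in a short sketch. It assumes a simplified `Event` type and a local stand-in enum that reuses the documented constant names; `FindKeysSketch` is a hypothetical name, and throwing `std::runtime_error` is only one way to model "exception is thrown".

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

using Event = std::vector<std::pair<int, int>>;
// Local stand-in for the documented enum values.
enum AllKeysType { kAllKeysInsistIdentical, kAllKeysIntersection, kAllKeysUnion };

std::vector<int> FindKeysSketch(const std::vector<Event> &events,
                                AllKeysType keys_type) {
  std::set<int> acc;
  bool first = true;
  for (const Event &ev : events) {
    std::set<int> keys;
    for (const auto &kv : ev) keys.insert(kv.first);
    if (first) { acc = keys; first = false; continue; }
    if (keys_type == kAllKeysUnion) {
      acc.insert(keys.begin(), keys.end());
    } else if (keys_type == kAllKeysIntersection) {
      std::set<int> common;
      std::set_intersection(acc.begin(), acc.end(), keys.begin(), keys.end(),
                            std::inserter(common, common.begin()));
      acc = common;
    } else if (keys != acc) {
      throw std::runtime_error("key sets differ between stats");
    }
  }
  assert(std::is_sorted(acc.begin(), acc.end()));
  return std::vector<int>(acc.begin(), acc.end());  // sorted and unique
}
```

With one event defining keys {-1, 0, 1} and another defining {-1, 1}, the union is {-1, 0, 1}, the intersection is {-1, 1}, and the insist-identical mode throws.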
BaseFloat ObjfGivenMap | ( | const BuildTreeStatsType & | stats_in, |
const EventMap & | e | ||
) |
Clusters the stats given the event map and returns the total objf given those clusters.
Definition at line 285 of file build-tree-utils.cc.
References kaldi::DeletePointers(), kaldi::SplitStatsByMap(), kaldi::SumClusterableObjf(), and kaldi::SumStatsVec().
Referenced by kaldi::BuildTree(), kaldi::BuildTreeTwoLevel(), and kaldi::TestSplitDecisionTree().
bool PossibleValues | ( | EventKeyType | key, |
const BuildTreeStatsType & | stats, | ||
std::vector< EventValueType > * | ans | ||
) |
Convenience function, e.g. to work out possible values of the phones from just the stats.
Returns true if key was always defined inside the stats. May be used with ans == NULL to find out if key was always defined.
Definition at line 63 of file build-tree-utils.cc.
References kaldi::CopySetToVector(), and EventMap::Lookup().
Referenced by kaldi::DoTableSplit(), kaldi::FindBestSplitForKey(), Questions::InitRand(), main(), kaldi::TestPossibleValues(), and kaldi::TestShareEventMapLeaves().
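A short sketch of this behaviour, assuming a simplified `Event` type and a hypothetical function name `PossibleValuesSketch`: collect the distinct values seen for the key, and report whether every event defined it. Returning the values in sorted order is an assumption of this sketch.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Event = std::vector<std::pair<int, int>>;

// Collects the distinct values taken by `key` across all events.
// Returns true only if every event defines the key; `ans` may be null
// when the caller just wants that yes/no answer.
bool PossibleValuesSketch(int key, const std::vector<Event> &events,
                          std::vector<int> *ans) {
  bool always_defined = true;
  std::set<int> seen;
  for (const Event &ev : events) {
    bool found = false;
    for (const auto &kv : ev) {
      if (kv.first == key) {
        seen.insert(kv.second);
        found = true;
        break;
      }
    }
    if (!found) always_defined = false;
  }
  if (ans != nullptr) {
    ans->assign(seen.begin(), seen.end());  // sorted, no duplicates
    assert(ans->size() == seen.size());
  }
  return always_defined;
}
```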
void ReadBuildTreeStats | ( | std::istream & | is, |
bool | binary, | ||
const Clusterable & | example, | ||
BuildTreeStatsType * | stats | ||
) |
Reads BuildTreeStats object.
The "example" argument must be of the same type as the stats on disk, and is needed for access to the correct "Read" function. It was organized this way for easier extensibility (so that adding new Clusterable derived classes isn't painful).
Definition at line 46 of file build-tree-utils.cc.
References kaldi::ExpectToken(), KALDI_ASSERT, kaldi::ReadBasicType(), kaldi::ReadEventType(), and Clusterable::ReadNew().
Referenced by main(), and kaldi::TestBuildTreeStatsIo().
void SplitStatsByKey | ( | const BuildTreeStatsType & | stats_in, |
EventKeyType | key, | ||
std::vector< BuildTreeStatsType > * | stats_out | ||
) |
SplitStatsByKey splits stats up according to the value of a particular key, which must be always defined and nonnegative.
Like MapStats. The Clusterable* pointers in stats_out are not newly allocated; they are the same as the ones in stats_in. Generally they will still be owned by stats_in (the user can decide where to assign ownership).
Definition at line 198 of file build-tree-utils.cc.
References kaldi::EventTypeToString(), KALDI_ASSERT, KALDI_ERR, and EventMap::Lookup().
Referenced by kaldi::AutomaticallyObtainQuestions(), kaldi::ClusterEventMapRestrictedHelper(), kaldi::FindBestSplitForKey(), kaldi::GetToLengthMap(), kaldi::KMeansClusterPhones(), and kaldi::TestSplitStatsByKey().
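A minimal sketch of splitting by key value, assuming simplified stand-in types and a hypothetical name `SplitByKeySketch`. The output is indexed by the key's value, which is why the value has to be defined and nonnegative. Clearing the output first and reporting violations with `std::invalid_argument` are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

struct Clusterable {};  // hypothetical placeholder for the real interface
using Event = std::vector<std::pair<int, int>>;
using Stats = std::vector<std::pair<Event, Clusterable *>>;

// Groups stats by the value of `key`: (*out)[v] holds the stats whose
// value for `key` is v.
void SplitByKeySketch(const Stats &in, int key, std::vector<Stats> *out) {
  assert(out != nullptr);
  out->clear();  // assumption: output starts empty
  for (const auto &entry : in) {
    int value = -1;
    for (const auto &kv : entry.first)
      if (kv.first == key) value = kv.second;
    if (value < 0) throw std::invalid_argument("key undefined or negative");
    if (out->size() <= static_cast<std::size_t>(value)) out->resize(value + 1);
    (*out)[value].push_back(entry);  // pointer is shared with `in`
  }
}
```

Values that never occur simply leave an empty group at that index.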
void SplitStatsByMap | ( | const BuildTreeStatsType & | stats_in, |
const EventMap & | e, | ||
std::vector< BuildTreeStatsType > * | stats_out | ||
) |
Splits stats according to the EventMap, indexing them at output by the leaf type.
A utility function. Note that pointers in stats_out point to the same memory locations as those in stats_in; no copying of Clusterable* objects happens. If stats_out is non-empty at input, the stats are added to what is already there. This function may increase the size of the vector stats_out as necessary to accommodate the stats, but will never decrease it.
Definition at line 172 of file build-tree-utils.cc.
References kaldi::EventTypeToString(), KALDI_ASSERT, KALDI_ERR, and EventMap::Map().
Referenced by kaldi::ClusterEventMapGetMapping(), kaldi::ClusterEventMapRestrictedByMap(), kaldi::ClusterEventMapToNClustersRestrictedByMap(), kaldi::ComputeTreeMapping(), kaldi::DoTableSplit(), kaldi::GetOccs(), kaldi::InitAmGmm(), kaldi::InitAmGmmFromOld(), main(), kaldi::ObjfGivenMap(), kaldi::SplitDecisionTree(), and kaldi::TestSplitDecisionTree().
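The append-and-grow behaviour can be sketched as follows, with a plain callable standing in for the event map. `LeafFn` and `SplitByMapSketch` are hypothetical names, the types are simplified stand-ins, and throwing when an event cannot be mapped is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

struct Clusterable {};  // hypothetical placeholder for the real interface
using Event = std::vector<std::pair<int, int>>;
using Stats = std::vector<std::pair<Event, Clusterable *>>;
// Hypothetical stand-in for an event map: returns a leaf index,
// or a negative number if the event cannot be mapped.
using LeafFn = std::function<int(const Event &)>;

// Appends each stat to (*out)[leaf]. Existing contents of *out are kept,
// and the vector only ever grows.
void SplitByMapSketch(const Stats &in, const LeafFn &leaf_of,
                      std::vector<Stats> *out) {
  assert(out != nullptr);
  for (const auto &entry : in) {
    const int leaf = leaf_of(entry.first);
    if (leaf < 0) throw std::runtime_error("event could not be mapped");
    if (out->size() <= static_cast<std::size_t>(leaf)) out->resize(leaf + 1);
    (*out)[leaf].push_back(entry);  // pointer is shared with `in`
  }
}
```

A caller who wants a fresh split should therefore pass an empty vector.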
BaseFloat SumNormalizer | ( | const BuildTreeStatsType & | stats_in | ) |
Sums the normalizer [typically, data-count] over the stats.
Definition at line 258 of file build-tree-utils.cc.
References Clusterable::Normalizer().
Referenced by kaldi::BuildTree(), kaldi::BuildTreeTwoLevel(), kaldi::GetOccs(), and main().
BaseFloat SumObjf | ( | const BuildTreeStatsType & | stats_in | ) |
Sums the objective function over the stats.
Definition at line 268 of file build-tree-utils.cc.
References Clusterable::Objf().
Clusterable * SumStats | ( | const BuildTreeStatsType & | stats_in | ) |
Sums stats, or returns NULL if stats_in has no non-NULL stats.
Stats are newly allocated, owned by caller.
Definition at line 245 of file build-tree-utils.cc.
References Clusterable::Add(), and Clusterable::Copy().
Referenced by DecisionTreeSplitter::DoSplitInternal(), kaldi::InitAmGmmFromOld(), and kaldi::SumStatsVec().
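The "newly allocated, owned by caller" contract can be sketched with a copy-then-add loop. The `Clusterable` interface below is a reduced, hypothetical stand-in (only Copy, Add and Normalizer), and `CountStats` is an invented toy subclass that stores a single count.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Reduced stand-in for the Clusterable interface.
struct Clusterable {
  virtual ~Clusterable() {}
  virtual Clusterable *Copy() const = 0;
  virtual void Add(const Clusterable &other) = 0;
  virtual double Normalizer() const = 0;
};

// Toy statistics object: just a data count.
struct CountStats : Clusterable {
  double count;
  explicit CountStats(double c) : count(c) {}
  Clusterable *Copy() const override { return new CountStats(count); }
  void Add(const Clusterable &other) override { count += other.Normalizer(); }
  double Normalizer() const override { return count; }
};

using Event = std::vector<std::pair<int, int>>;
using Stats = std::vector<std::pair<Event, Clusterable *>>;

// Returns a new object holding the sum of all non-null stats, or null if
// there are none. The inputs are never modified; the caller must delete
// the result.
Clusterable *SumStatsSketch(const Stats &in) {
  Clusterable *ans = nullptr;
  for (const auto &entry : in) {
    const Clusterable *c = entry.second;
    if (c == nullptr) continue;
    if (ans == nullptr) ans = c->Copy();
    else ans->Add(*c);
  }
  assert(in.empty() ? ans == nullptr : true);
  return ans;
}
```

Copying the first non-null entry before accumulating is what keeps the inputs untouched and makes the result safe to delete independently.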
void SumStatsVec | ( | const std::vector< BuildTreeStatsType > & | stats_in, |
std::vector< Clusterable * > * | stats_out | ||
) |
Sums a vector of stats.
Leaves NULL as pointer if no stats available. The pointers in stats_out are owned by caller. At output, there may be NULLs in the vector stats_out.
Definition at line 279 of file build-tree-utils.cc.
References KALDI_ASSERT, and kaldi::SumStats().
Referenced by kaldi::AutomaticallyObtainQuestions(), kaldi::ClusterEventMapGetMapping(), kaldi::ClusterEventMapToNClustersRestrictedByMap(), kaldi::FindBestSplitForKey(), kaldi::InitAmGmm(), kaldi::KMeansClusterPhones(), and kaldi::ObjfGivenMap().
void WriteBuildTreeStats | ( | std::ostream & | os, |
bool | binary, | ||
const BuildTreeStatsType & | stats | ||
) |
Writes BuildTreeStats object. This works even if pointers are NULL.
Definition at line 29 of file build-tree-utils.cc.
References KALDI_ERR, kaldi::WriteBasicType(), kaldi::WriteEventType(), and kaldi::WriteToken().
Referenced by main(), kaldi::TestBuildTree(), kaldi::TestBuildTreeStatsIo(), and kaldi::TestGenRandStats().