These functions are used in top-level tree-building code (Top-level tree-building functions); see Decision tree internals for documentation.
Functions | |
EventMap * | TrivialTree (int32 *num_leaves) |
Returns a tree with just one node.
EventMap * | DoTableSplit (const EventMap &orig, EventKeyType key, const BuildTreeStatsType &stats, int32 *num_leaves) |
DoTableSplit does a complete split on this key (e.g. the central phone or the HMM-state position).
EventMap * | DoTableSplitMultiple (const EventMap &orig, const std::vector< EventKeyType > &keys, const BuildTreeStatsType &stats, int32 *num_leaves) |
DoTableSplitMultiple does a complete split on all the keys, in order from keys[0], keys[1] and so on.
int | ClusterEventMapGetMapping (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, std::vector< EventMap * > *mapping) |
"ClusterEventMapGetMapping" clusters the leaves of the EventMap, with "thresh" a delta-likelihood threshold to control how many leaves we combine (this might be the same as the delta-likelihood threshold used in splitting).
EventMap * | ClusterEventMap (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, int32 *num_removed) |
This is like ClusterEventMapGetMapping, but with a more convenient interface that exposes less of the internals.
EventMap * | ClusterEventMapRestrictedByKeys (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, const std::vector< EventKeyType > &keys, int32 *num_removed) |
This is like ClusterEventMap, but first splits the stats on the keys specified in "keys" (typically keys = [ -1, P ]) and only clusters within the resulting classes.
EventMap * | ClusterEventMapRestrictedByMap (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, const EventMap &e_restrict, int32 *num_removed) |
This version of restricted clustering only allows leaves that "e_restrict" maps to the same value to be clustered together.
EventMap * | ClusterEventMapToNClustersRestrictedByMap (const EventMap &e_in, const BuildTreeStatsType &stats, int32 num_clusters, const EventMap &e_restrict, int32 *num_removed) |
This version of ClusterEventMapRestrictedByMap clusters to get a specific number of clusters, as specified by 'num_clusters'.
EventMap * | RenumberEventMap (const EventMap &e_in, int32 *num_leaves) |
RenumberEventMap (intended to be used after calling ClusterEventMap) renumbers an EventMap so that its leaves are consecutive.
EventMap * | MapEventMapLeaves (const EventMap &e_in, const std::vector< int32 > &mapping) |
This function remaps the event-map leaves using "mapping", which is indexed by the number at each leaf.
EventMap * | ShareEventMapLeaves (const EventMap &e_in, EventKeyType key, std::vector< std::vector< EventValueType > > &values, int32 *num_leaves) |
ShareEventMapLeaves performs a fairly specific function: it generates trees where, for a certain list of phones, all the states of those phones share the same pdf.
EventMap * | SplitDecisionTree (const EventMap &orig, const BuildTreeStatsType &stats, Questions &qcfg, BaseFloat thresh, int32 max_leaves, int32 *num_leaves, BaseFloat *objf_impr_out, BaseFloat *smallest_split_change_out) |
Does a decision-tree split at the leaves of an EventMap.
void | CreateRandomQuestions (const BuildTreeStatsType &stats, int32 num_quest, Questions *cfg_out) |
CreateRandomQuestions initializes a Questions object randomly, in a reasonable way (for testing purposes, or when hand-designed questions are not available).
BaseFloat | FindBestSplitForKey (const BuildTreeStatsType &stats, const Questions &qcfg, EventKeyType key, std::vector< EventValueType > *yes_set) |
FindBestSplitForKey finds the best split for a single key; it is called by DecisionTreeSplitter::FindBestSplit() during decision-tree splitting.
EventMap * | GetStubMap (int32 P, const std::vector< std::vector< int32 > > &phone_sets, const std::vector< int32 > &phone2num_pdf_classes, const std::vector< bool > &share_roots, int32 *num_leaves) |
GetStubMap is used in tree-building functions to get the initial to-states map, before the decision-tree-building process.
EventMap * ClusterEventMap(const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, int32 *num_removed)
This is like ClusterEventMapGetMapping, but with a more convenient interface that exposes less of the internals.
It uses bottom-up clustering to combine the leaves, stopping when the log-likelihood decrease from combining two leaves would exceed the threshold.
Definition at line 699 of file build-tree-utils.cc.
References kaldi::ClusterEventMapGetMapping(), EventMap::Copy(), and kaldi::DeletePointers().
Referenced by kaldi::TestClusterEventMap(), and kaldi::TestClusterEventMapRestricted().
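As a rough illustration of the bottom-up clustering described above, the following C++ sketch repeatedly merges the pair of leaves whose merge costs least, stopping once the cheapest merge would cost at least the threshold. It is not Kaldi's implementation: ScalarStats, MergeCost() and ClusterLeavesBottomUp() are hypothetical stand-ins for Kaldi's clusterable statistics and kaldi::ClusterBottomUp(), and the objective is a simple sum-of-squares criterion chosen to keep the arithmetic easy to follow.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for a Kaldi Clusterable: statistics of scalar data.
struct ScalarStats {
  double count = 0.0, sum = 0.0, sumsq = 0.0;
  static ScalarStats Point(double x) {
    ScalarStats s;
    s.count = 1.0;
    s.sum = x;
    s.sumsq = x * x;
    return s;
  }
  void Add(const ScalarStats &o) {
    count += o.count;
    sum += o.sum;
    sumsq += o.sumsq;
  }
  // Minus the sum of squared deviations from the mean (higher is better).
  double Objf() const { return count > 0.0 ? sum * sum / count - sumsq : 0.0; }
};

// Objective decrease caused by merging two clusters (never negative).
double MergeCost(const ScalarStats &a, const ScalarStats &b) {
  ScalarStats merged = a;
  merged.Add(b);
  return a.Objf() + b.Objf() - merged.Objf();
}

// Greedily merges the cheapest pair of clusters while that merge costs less
// than `thresh`. (*mapping)[leaf] is set to the lowest-numbered leaf in that
// leaf's cluster. Returns the number of leaves removed by merging.
int ClusterLeavesBottomUp(const std::vector<ScalarStats> &leaf_stats,
                          double thresh, std::vector<int> *mapping) {
  assert(mapping != nullptr);
  const int n = static_cast<int>(leaf_stats.size());
  std::vector<ScalarStats> stats(leaf_stats);
  std::vector<bool> active(n, true);
  mapping->resize(n);
  for (int i = 0; i < n; i++) (*mapping)[i] = i;  // one cluster per leaf
  int num_removed = 0;
  while (true) {
    double best_cost = std::numeric_limits<double>::infinity();
    int best_i = -1, best_j = -1;
    for (int i = 0; i < n; i++) {
      if (!active[i]) continue;
      for (int j = i + 1; j < n; j++) {
        if (!active[j]) continue;
        double cost = MergeCost(stats[i], stats[j]);
        if (cost < best_cost) {
          best_cost = cost;
          best_i = i;
          best_j = j;
        }
      }
    }
    if (best_i < 0 || best_cost >= thresh) break;  // cheapest merge too costly
    stats[best_i].Add(stats[best_j]);  // cluster best_i absorbs best_j
    active[best_j] = false;
    for (int k = 0; k < n; k++)
      if ((*mapping)[k] == best_j) (*mapping)[k] = best_i;
    num_removed++;
  }
  return num_removed;
}
```

With thresh = 1.0 and leaves holding single points at 0.0, 0.1 and 10.0, the first two leaves merge (cost 0.005) and the third stays separate (merging it would cost about 66), so the call returns 1 and the mapping is {0, 0, 2}. The surviving leaf ids are not consecutive, which is what RenumberEventMap is for.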
int ClusterEventMapGetMapping(const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, std::vector< EventMap * > *mapping)
"ClusterEventMapGetMapping" clusters the leaves of the EventMap, with "thresh" a delta-likelihood threshold to control how many leaves we combine (this might be the same as the delta-likelihood threshold used in splitting).
Definition at line 599 of file build-tree-utils.cc.
References kaldi::ClusterBottomUp(), kaldi::DeletePointers(), KALDI_ASSERT, KALDI_VLOG, KALDI_WARN, kaldi::SplitStatsByMap(), kaldi::SumClusterableNormalizer(), and kaldi::SumStatsVec().
Referenced by kaldi::ClusterEventMap(), kaldi::ClusterEventMapRestrictedByMap(), kaldi::ClusterEventMapRestrictedHelper(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap(), and kaldi::TestClusterEventMapGetMappingAndRenumberEventMap2().
EventMap * ClusterEventMapRestrictedByKeys(const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, const std::vector< EventKeyType > &keys, int32 *num_removed)
This is like ClusterEventMap, but first splits the stats on the keys specified in "keys" (typically keys = [ -1, P ]), and only clusters within the classes defined by that splitting.
Note: the leaves will be non-consecutive at output; use RenumberEventMap to make them consecutive.
Definition at line 822 of file build-tree-utils.cc.
References kaldi::ClusterEventMapRestrictedHelper(), EventMap::Copy(), and kaldi::DeletePointers().
Referenced by kaldi::TestClusterEventMapRestricted().
EventMap * ClusterEventMapRestrictedByMap(const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, const EventMap &e_restrict, int32 *num_removed)
This version of restricted clustering only allows leaves that "e_restrict" maps to the same value to be clustered together.
Definition at line 838 of file build-tree-utils.cc.
References kaldi::ClusterEventMapGetMapping(), EventMap::Copy(), kaldi::DeletePointers(), and kaldi::SplitStatsByMap().
Referenced by kaldi::BuildTree(), kaldi::BuildTreeTwoLevel(), and kaldi::TestClusterEventMapRestricted().
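The restriction can be pictured as one extra condition in a greedy bottom-up loop: two leaves are only candidates for merging when e_restrict gives them the same answer. The real function references kaldi::SplitStatsByMap() and kaldi::ClusterEventMapGetMapping(), so it presumably splits the stats by e_restrict and clusters each part; this hypothetical sketch folds that into a single loop, passes e_restrict's answers in directly as group[], and uses a sum-of-squares ScalarStats as a stand-in for Kaldi's clusterable statistics.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical scalar statistics with a sum-of-squares objective.
struct ScalarStats {
  double count, sum, sumsq;
  double Objf() const { return count > 0.0 ? sum * sum / count - sumsq : 0.0; }
};

ScalarStats Merge(const ScalarStats &a, const ScalarStats &b) {
  return ScalarStats{a.count + b.count, a.sum + b.sum, a.sumsq + b.sumsq};
}

// Bottom-up clustering in which leaf i may only join leaf j when
// group[i] == group[j] (group[] plays the role of e_restrict's answers).
// Returns the number of merges; (*mapping)[leaf] is its cluster's lowest leaf.
int ClusterRestrictedSketch(const std::vector<ScalarStats> &leaf_stats,
                            const std::vector<int> &group, double thresh,
                            std::vector<int> *mapping) {
  assert(leaf_stats.size() == group.size());
  const int n = static_cast<int>(leaf_stats.size());
  std::vector<ScalarStats> stats(leaf_stats);
  std::vector<bool> active(n, true);
  mapping->resize(n);
  for (int i = 0; i < n; i++) (*mapping)[i] = i;
  int merges = 0;
  while (true) {
    double best = std::numeric_limits<double>::infinity();
    int bi = -1, bj = -1;
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        // The restriction: never consider pairs from different groups.
        if (!active[i] || !active[j] || group[i] != group[j]) continue;
        double cost = stats[i].Objf() + stats[j].Objf() -
                      Merge(stats[i], stats[j]).Objf();
        if (cost < best) {
          best = cost;
          bi = i;
          bj = j;
        }
      }
    }
    if (bi < 0 || best >= thresh) break;
    stats[bi] = Merge(stats[bi], stats[bj]);
    active[bj] = false;
    for (int &m : *mapping)
      if (m == bj) m = bi;
    merges++;
  }
  return merges;
}
```

With single points at 0.0, 0.05 and 1.0, groups {0, 1, 1} and thresh = 1.0, leaf 0 cannot join the others, so only leaves 1 and 2 merge and the mapping is {0, 1, 1}; if all three leaves are in one group, all of them merge.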
EventMap * ClusterEventMapToNClustersRestrictedByMap(const EventMap &e_in, const BuildTreeStatsType &stats, int32 num_clusters_required, const EventMap &e_restrict, int32 *num_removed_ptr)
This version of ClusterEventMapRestrictedByMap clusters to get a specific number of clusters, as specified by 'num_clusters_required'.
Definition at line 861 of file build-tree-utils.cc.
References kaldi::ClusterBottomUpCompartmentalized(), EventMap::Copy(), kaldi::DeletePointers(), KALDI_ASSERT, KALDI_VLOG, KALDI_WARN, kaldi::SplitStatsByMap(), kaldi::SumClusterableNormalizer(), and kaldi::SumStatsVec().
Referenced by kaldi::BuildTree().
void kaldi::CreateRandomQuestions(const BuildTreeStatsType &stats, int32 num_quest, Questions *cfg_out)
CreateRandomQuestions initializes a Questions object randomly, in a reasonable way (for testing purposes, or when hand-designed questions are not available).
e.g. num_quest = 5 might be a reasonable value if num_iters > 0, or num_quest = 20 otherwise.
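A minimal sketch of random question generation, assuming that a question for a key is simply a non-empty, proper subset of the values that key takes in the stats. QuestionsSketch, RandomQuestionsSketch() and the explicit seed are hypothetical: Kaldi's Questions class also carries per-key refinement options (QuestionsForKey::refine_opts), which are omitted here, and the real function reads the values from the stats rather than taking them as an argument.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <set>
#include <vector>

// Hypothetical question store: key -> list of "yes" sets of values.
using QuestionsSketch = std::map<int, std::vector<std::set<int>>>;

// For every key that takes at least two values, draws num_quest random
// questions, each a non-empty proper subset of that key's values.
QuestionsSketch RandomQuestionsSketch(
    const std::map<int, std::set<int>> &values_per_key, int num_quest,
    unsigned seed) {
  assert(num_quest > 0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution coin(0.5);
  QuestionsSketch ans;
  for (const auto &kv : values_per_key) {
    const std::set<int> &vals = kv.second;
    if (vals.size() < 2) continue;  // a single value cannot be split
    for (int q = 0; q < num_quest; q++) {
      std::set<int> yes;
      do {  // retry until the subset is neither empty nor everything
        yes.clear();
        for (int v : vals)
          if (coin(rng)) yes.insert(v);
      } while (yes.empty() || yes.size() == vals.size());
      ans[kv.first].push_back(yes);
    }
  }
  return ans;
}
```

Keys with a single observed value get no questions, since no subset of one value can split anything.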
EventMap * DoTableSplit(const EventMap &orig, EventKeyType key, const BuildTreeStatsType &stats, int32 *num_leaves)
DoTableSplit does a complete split on this key (e.g. the key might correspond to the central phone (key = P, the central-phone position), or to the HMM-state position (key == kPdfClass == -1)).
The stats are used to work out the possible values of the key. "num_leaves" is used to allocate new leaves. All stats must have this key defined, or this function will crash.
Definition at line 126 of file build-tree-utils.cc.
References EventMap::Copy(), kaldi::DeletePointers(), KALDI_ASSERT, kaldi::PossibleValues(), and kaldi::SplitStatsByMap().
Referenced by kaldi::DoTableSplitMultiple(), kaldi::TestClusterEventMap(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap2(), and kaldi::TestDoTableSplit().
EventMap * DoTableSplitMultiple(const EventMap &orig, const std::vector< EventKeyType > &keys, const BuildTreeStatsType &stats, int32 *num_leaves)
DoTableSplitMultiple does a complete split on all the keys, in order from keys[0], keys[1] and so on.
The stats are used to work out possible values corresponding to the key. "num_leaves" is used to allocate new leaves. All stats must have the keys defined, or this function will crash. Returns a newly allocated event map.
Definition at line 156 of file build-tree-utils.cc.
References EventMap::Copy() and kaldi::DoTableSplit().
Referenced by kaldi::TestClusterEventMapRestricted(), and kaldi::TestShareEventMapLeaves().
BaseFloat FindBestSplitForKey(const BuildTreeStatsType &stats, const Questions &qcfg, EventKeyType key, std::vector< EventValueType > *yes_set)
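FindBestSplitForKey is used during decision-tree splitting (it is called by DecisionTreeSplitter::FindBestSplit()).
It finds the best split for this key, given these stats, and puts the values on the "yes" side of that split in *yes_set. It returns 0 if the key was not defined for all of the stats.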
Definition at line 348 of file build-tree-utils.cc.
References kaldi::AddToClusters(), kaldi::ApproxEqual(), kaldi::ComputeInitialSplit(), kaldi::DeletePointers(), kaldi::EnsureClusterableVectorNotNull(), Questions::GetQuestionsOf(), KALDI_ASSERT, KALDI_WARN, RefineClustersOptions::num_iters, kaldi::PossibleValues(), QuestionsForKey::refine_opts, kaldi::RefineClusters(), kaldi::SplitStatsByKey(), and kaldi::SumStatsVec().
Referenced by DecisionTreeSplitter::FindBestSplit().
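The core comparison can be sketched as follows: for each candidate question (a set of values of the key), pool the stats on the "yes" and "no" sides and keep the question with the largest objective improvement. This is a simplified, hypothetical sketch: it takes the stats already grouped by the key's value, uses a sum-of-squares objective, and omits the cluster refinement that Kaldi performs (the references above include kaldi::RefineClusters()).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical scalar statistics with a sum-of-squares objective.
struct ScalarStats {
  double count, sum, sumsq;
  void Add(const ScalarStats &o) {
    count += o.count;
    sum += o.sum;
    sumsq += o.sumsq;
  }
  double Objf() const { return count > 0.0 ? sum * sum / count - sumsq : 0.0; }
};

// Returns the best improvement (0 if no question splits the data) and puts
// the winning question's "yes" values in *yes_set.
double BestSplitForKeySketch(const std::map<int, ScalarStats> &stats_by_value,
                             const std::vector<std::vector<int>> &questions,
                             std::vector<int> *yes_set) {
  ScalarStats total{0.0, 0.0, 0.0};
  for (const auto &kv : stats_by_value) total.Add(kv.second);
  double best_impr = 0.0;
  yes_set->clear();
  for (const std::vector<int> &q : questions) {
    std::set<int> in_yes(q.begin(), q.end());
    ScalarStats yes{0.0, 0.0, 0.0}, no{0.0, 0.0, 0.0};
    for (const auto &kv : stats_by_value)
      (in_yes.count(kv.first) ? yes : no).Add(kv.second);
    if (yes.count == 0.0 || no.count == 0.0) continue;  // not a real split
    double impr = yes.Objf() + no.Objf() - total.Objf();
    if (impr > best_impr) {
      best_impr = impr;
      *yes_set = q;
    }
  }
  return best_impr;
}
```

With values 1, 2 and 3 holding single points at 0.0, 0.1 and 10.0, the question {1, 2} wins (improvement about 66.0) over {1} (about 17.0), because it isolates the outlying value.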
EventMap * GetStubMap(int32 P, const std::vector< std::vector< int32 > > &phone_sets, const std::vector< int32 > &phone2num_pdf_classes, const std::vector< bool > &share_roots, int32 *num_leaves)
GetStubMap is used in tree-building functions to get the initial to-states map, before the decision-tree-building process.
It creates a simple map that splits on groups of phones. For the set of phones in phone_sets[i] it creates either a single leaf node, if share_roots[i] == true, or separate root nodes for each HMM position, if share_roots[i] == false (it goes up to the highest position for any phone in the set; it will warn if you share roots between phones with different numbers of states, which is a weird thing to do but should still work). If any phone is present in "phone_sets" but "phone2num_pdf_classes" does not map it to a length, it is an error. Note that the behaviour of the resulting map is undefined for phones not present in "phone_sets". At entry, this function should be called with (*num_leaves == 0); it numbers the leaves starting from (*num_leaves).
Definition at line 975 of file build-tree-utils.cc.
References kaldi::IsSortedAndUniq(), KALDI_ASSERT, KALDI_WARN, and kaldi::kPdfClass.
Referenced by kaldi::BuildTree(), kaldi::MonophoneContextDependency(), and kaldi::MonophoneContextDependencyShared().
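The leaf allocation described above can be sketched with a flat table from (phone, pdf-class) to leaf. StubMapSketch() is hypothetical: it builds no EventMap and omits the warning about phones with different numbers of states, but it follows the rules stated above (one leaf per set when share_roots[i] is true, otherwise one leaf per HMM position, up to the largest number of pdf classes in the set).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Returns a hypothetical flat stub map: (phone, pdf-class) -> leaf.
std::map<std::pair<int, int>, int> StubMapSketch(
    const std::vector<std::vector<int>> &phone_sets,
    const std::vector<int> &phone2num_pdf_classes,
    const std::vector<bool> &share_roots, int *num_leaves) {
  assert(phone_sets.size() == share_roots.size());
  std::map<std::pair<int, int>, int> ans;
  for (size_t i = 0; i < phone_sets.size(); i++) {
    int max_classes = 0;  // highest number of pdf classes of any phone in set
    for (int p : phone_sets[i]) {
      // Every phone in a set must have a known, positive length.
      assert(p >= 0 && p < static_cast<int>(phone2num_pdf_classes.size()));
      assert(phone2num_pdf_classes[p] > 0);
      max_classes = std::max(max_classes, phone2num_pdf_classes[p]);
    }
    const int first_leaf = *num_leaves;
    *num_leaves += share_roots[i] ? 1 : max_classes;
    for (int p : phone_sets[i])
      for (int c = 0; c < phone2num_pdf_classes[p]; c++)
        ans[{p, c}] = share_roots[i] ? first_leaf : first_leaf + c;
  }
  return ans;
}
```

For phone_sets = {{1}, {2, 3}}, three pdf classes per phone and share_roots = {false, true}, phone 1 gets leaves 0 to 2 while phones 2 and 3 share leaf 3, so *num_leaves ends at 4.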
EventMap * MapEventMapLeaves(const EventMap &e_in, const std::vector< int32 > &mapping)
This function remaps the event-map leaves using "mapping", which is indexed by the number at each leaf.
Definition at line 689 of file build-tree-utils.cc.
References EventMap::Copy() and kaldi::DeletePointers().
Referenced by kaldi::BuildTreeTwoLevel().
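A minimal sketch of leaf remapping on a hypothetical tree type (Node, not Kaldi's EventMap): it copies the tree and replaces each leaf id l by mapping[l]. It assumes C++17, which permits a std::vector of the still-incomplete Node type as a member.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tree: a node is either a leaf (no children) or has children.
struct Node {
  int leaf = -1;               // leaf id; only meaningful when children is empty
  std::vector<Node> children;  // vector of an incomplete type: OK since C++17
};

Node Leaf(int id) {
  Node n;
  n.leaf = id;
  return n;
}

// Returns a copy of `in` in which every leaf id l is replaced by mapping[l].
Node MapLeavesSketch(const Node &in, const std::vector<int> &mapping) {
  Node out;
  if (in.children.empty()) {
    assert(in.leaf >= 0 && in.leaf < static_cast<int>(mapping.size()));
    out.leaf = mapping[in.leaf];
  } else {
    for (const Node &child : in.children)
      out.children.push_back(MapLeavesSketch(child, mapping));
  }
  return out;
}

// Appends the leaf ids of `n` in left-to-right order.
void CollectLeaves(const Node &n, std::vector<int> *out) {
  if (n.children.empty()) {
    out->push_back(n.leaf);
    return;
  }
  for (const Node &child : n.children) CollectLeaves(child, out);
}
```

Mapping a tree with leaves {0, 1, 2} through {0, 0, 1} yields leaves {0, 0, 1}: the first two leaves now share an id.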
EventMap * RenumberEventMap(const EventMap &e_in, int32 *num_leaves)
RenumberEventMap (intended to be used after calling ClusterEventMap) renumbers an EventMap so that its leaves are consecutive.
It puts the number of leaves in *num_leaves. If you later need the mapping of the leaves, modify the function and add a new argument.
Definition at line 664 of file build-tree-utils.cc.
References EventMap::Copy(), kaldi::DeletePointers(), KALDI_ASSERT, EventMap::MultiMap(), and kaldi::SortAndUniq().
Referenced by kaldi::BuildTree(), kaldi::BuildTreeTwoLevel(), kaldi::ShareEventMapLeaves(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap(), and kaldi::TestClusterEventMapGetMappingAndRenumberEventMap2().
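Renumbering amounts to sorting the distinct leaf ids and replacing each id by its rank. The sketch below is a hypothetical simplification that works on a plain list of leaf ids; the references above suggest Kaldi gathers the leaves with EventMap::MultiMap() and deduplicates them with kaldi::SortAndUniq().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Replaces each leaf id by its rank among the distinct ids, so the ids become
// 0, 1, ..., (*num_leaves - 1) while preserving their relative order.
std::vector<int> RenumberLeavesSketch(const std::vector<int> &leaves,
                                      int *num_leaves) {
  assert(num_leaves != nullptr);
  std::vector<int> uniq(leaves);
  std::sort(uniq.begin(), uniq.end());
  uniq.erase(std::unique(uniq.begin(), uniq.end()), uniq.end());
  std::vector<int> ans;
  ans.reserve(leaves.size());
  for (int l : leaves) {
    auto pos = std::lower_bound(uniq.begin(), uniq.end(), l);
    ans.push_back(static_cast<int>(pos - uniq.begin()));
  }
  *num_leaves = static_cast<int>(uniq.size());
  return ans;
}
```

For example, the non-consecutive leaves {0, 0, 2} left behind by clustering become {0, 0, 1}, with *num_leaves = 2.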
EventMap * ShareEventMapLeaves(const EventMap &e_in, EventKeyType key, std::vector< std::vector< EventValueType > > &values, int32 *num_leaves)
ShareEventMapLeaves performs a fairly specific function that allows us to generate trees where, for a certain list of phones, all the states of those phones share the same pdf.
Each element of "values" contains a list of phones (possibly just one phone) all of whose states we want shared together. Typically at input, "key" will equal P, the central-phone position, and "values" will contain just one list containing the silence phone. This function renumbers the event-map leaves after doing the sharing, to make them contiguous.
Definition at line 710 of file build-tree-utils.cc.
References EventMap::Copy(), kaldi::DeletePointers(), KALDI_ASSERT, KALDI_WARN, kaldi::MakeEventPair(), EventMap::MultiMap(), kaldi::RenumberEventMap(), and kaldi::SortAndUniq().
Referenced by kaldi::TestShareEventMapLeaves().
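The sharing step can be pictured on a flat (phone, pdf-class) to leaf table: every leaf reached by a phone in one of the lists is redirected to the lowest such leaf, and the leaves are then renumbered to be contiguous. LeafTable and ShareLeavesSketch() are hypothetical, and choosing the lowest leaf as the shared one is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical flattened event map: (phone, pdf-class) -> leaf.
using LeafTable = std::map<std::pair<int, int>, int>;

LeafTable ShareLeavesSketch(const LeafTable &in,
                            const std::vector<std::vector<int>> &values,
                            int *num_leaves) {
  std::map<int, int> remap;  // old leaf -> shared leaf
  for (const std::vector<int> &phone_list : values) {
    std::set<int> phones(phone_list.begin(), phone_list.end());
    std::set<int> leaves;  // all leaves reached by these phones
    for (const auto &kv : in)
      if (phones.count(kv.first.first)) leaves.insert(kv.second);
    if (leaves.empty()) continue;  // nothing to share for this list
    for (int l : leaves) remap[l] = *leaves.begin();  // share the lowest leaf
  }
  LeafTable out;
  for (const auto &kv : in) {
    auto it = remap.find(kv.second);
    out[kv.first] = (it == remap.end()) ? kv.second : it->second;
  }
  // Renumber so the surviving leaves are 0, 1, 2, ... in increasing order.
  std::map<int, int> new_id;
  for (const auto &kv : out) new_id[kv.second] = 0;
  int next = 0;
  for (auto &kv : new_id) kv.second = next++;
  for (auto &kv : out) kv.second = new_id[kv.second];
  *num_leaves = next;
  return out;
}
```

With a three-state silence phone 1 on leaves 0 to 2 and phone 2 on leaves 3 and 4, sharing the list {1} puts all silence states on leaf 0 and renumbers phone 2's leaves to 1 and 2.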
EventMap * SplitDecisionTree(const EventMap &orig, const BuildTreeStatsType &stats, Questions &qcfg, BaseFloat thresh, int32 max_leaves, int32 *num_leaves, BaseFloat *objf_impr_out, BaseFloat *smallest_split_change_out)
Does a decision-tree split at the leaves of an EventMap.
orig | [in] The EventMap whose leaves we want to split. [may be either a trivial or a non-trivial one]. |
stats | [in] The statistics for splitting the tree; if you do not want a particular subset of leaves to be split, make sure the stats corresponding to those leaves are not present in "stats". |
qcfg | [in] Configuration class that contains initial questions (e.g. sets of phones) for each key and says whether to refine these questions during tree building. |
thresh | [in] A log-likelihood threshold (e.g. 300) that can be used to limit the number of leaves; you can use zero and set max_leaves instead. |
max_leaves | [in] Will stop leaves being split after they reach this number. |
num_leaves | [in,out] A pointer used to allocate leaves; always corresponds to the current number of leaves (is incremented when this is increased). |
objf_impr_out | [out] If non-NULL, will be set to the objective improvement due to splitting (not normalized by the number of frames). |
smallest_split_change_out | [out] If non-NULL, will be set to the smallest objective-function improvement that we got from splitting any leaf; useful to provide a threshold for ClusterEventMap. |
Definition at line 532 of file build-tree-utils.cc.
References DecisionTreeSplitter::BestSplit(), EventMap::Copy(), DecisionTreeSplitter::DecisionTreeSplitter(), DecisionTreeSplitter::GetMap(), KALDI_ASSERT, KALDI_LOG, and kaldi::SplitStatsByMap().
Referenced by kaldi::BuildTree(), kaldi::BuildTreeTwoLevel(), kaldi::TestClusterEventMapRestricted(), kaldi::TestShareEventMapLeaves(), and kaldi::TestSplitDecisionTree().
EventMap * TrivialTree(int32 *num_leaves)
Returns a tree with just one node.
Used at the start of the tree-building process. Not really used in current recipes.
Definition at line 152 of file build-tree-utils.h.
Referenced by kaldi::TestClusterEventMap(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap(), kaldi::TestClusterEventMapGetMappingAndRenumberEventMap2(), kaldi::TestClusterEventMapRestricted(), kaldi::TestDoTableSplit(), kaldi::TestShareEventMapLeaves(), kaldi::TestSplitDecisionTree(), and kaldi::TestTrivialTree().