build-tree-utils.cc File Reference
#include <set>
#include <queue>
#include "util/stl-utils.h"
#include "tree/build-tree-utils.h"

Classes

class  DecisionTreeSplitter
 

Namespaces

 kaldi
 This code computes Goodness of Pronunciation (GOP) and extracts phone-level pronunciation features for mispronunciation detection tasks; the reference:
 

Functions

void WriteBuildTreeStats (std::ostream &os, bool binary, const BuildTreeStatsType &stats)
 Writes a BuildTreeStats object. This works even if pointers are NULL.
 
void ReadBuildTreeStats (std::istream &is, bool binary, const Clusterable &example, BuildTreeStatsType *stats)
 Reads a BuildTreeStats object.
 
bool PossibleValues (EventKeyType key, const BuildTreeStatsType &stats, std::vector< EventValueType > *ans)
 Convenience function, e.g. to work out the possible values of the phones from just the stats.
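As an illustration only, the sketch below shows what PossibleValues computes under stated assumptions rather than Kaldi's actual code: events are modeled as vectors of (key, value) pairs, the stats are reduced to their events, and the return value is assumed to report whether the key was defined in every event. All names live in a hypothetical `sketch` namespace.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

namespace sketch {
// Simplified stand-ins (assumed): an event is a list of (key, value) pairs.
using EventKeyType = int;
using EventValueType = int;
using EventType = std::vector<std::pair<EventKeyType, EventValueType>>;

// Writes the sorted, unique values that `key` takes across `events` to *ans
// (if ans is non-null). Returns true only if `key` is defined in every event.
bool PossibleValues(EventKeyType key, const std::vector<EventType> &events,
                    std::vector<EventValueType> *ans) {
  std::set<EventValueType> values;
  bool always_defined = true;
  for (const EventType &ev : events) {
    bool found = false;
    for (const auto &kv : ev) {
      if (kv.first == key) {
        values.insert(kv.second);
        found = true;
        break;
      }
    }
    if (!found) always_defined = false;
  }
  if (ans != nullptr) ans->assign(values.begin(), values.end());
  return always_defined;
}
}  // namespace sketch
```

Passing a null `ans` turns the call into a pure "is this key always defined?" check in this sketch.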
 
static void GetEventKeys (const EventType &vec, std::vector< EventKeyType > *keys)
 
void FindAllKeys (const BuildTreeStatsType &stats, AllKeysType keys_type, std::vector< EventKeyType > *keys)
 FindAllKeys puts in *keys the (sorted, unique) list of all key identities in the stats.
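A minimal sketch of collecting the sorted, unique key list, assuming events are vectors of (key, value) pairs. The hypothetical `KeysMode` enum models only a union and an intersection behavior in place of AllKeysType; it is not that type, and the function is not Kaldi's code.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

namespace sketch {
// Simplified stand-ins (assumed): an event is a list of (key, value) pairs.
using EventKeyType = int;
using EventValueType = int;
using EventType = std::vector<std::pair<EventKeyType, EventValueType>>;

// Hypothetical mode selector; not Kaldi's AllKeysType.
enum class KeysMode { kUnion, kIntersection };

// Returns the sorted, unique keys that appear in any event (kUnion) or in
// every event (kIntersection).
std::vector<EventKeyType> FindAllKeys(const std::vector<EventType> &events,
                                      KeysMode mode) {
  std::set<EventKeyType> result;
  bool first = true;
  for (const EventType &ev : events) {
    std::set<EventKeyType> keys;
    for (const auto &kv : ev) keys.insert(kv.first);
    if (mode == KeysMode::kUnion || first) {
      result.insert(keys.begin(), keys.end());
    } else {
      std::set<EventKeyType> kept;  // keys present in all events so far
      for (EventKeyType k : result)
        if (keys.count(k) > 0) kept.insert(k);
      result.swap(kept);
    }
    first = false;
  }
  return std::vector<EventKeyType>(result.begin(), result.end());
}
}  // namespace sketch
```

Using a `std::set` keeps the output sorted and free of duplicates without a separate sort/unique pass.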
 
EventMap * DoTableSplit (const EventMap &orig, EventKeyType key, const BuildTreeStatsType &stats, int32 *num_leaves)
 DoTableSplit does a complete split on this key (e.g. the central phone), creating one branch per value of the key found in the stats.
 
EventMap * DoTableSplitMultiple (const EventMap &orig, const std::vector< EventKeyType > &keys, const BuildTreeStatsType &stats, int32 *num_leaves)
 DoTableSplitMultiple does a complete split on all the keys, in order from keys[0], keys[1] and so on.
 
void SplitStatsByMap (const BuildTreeStatsType &stats_in, const EventMap &e, std::vector< BuildTreeStatsType > *stats_out)
 Splits stats according to the EventMap, indexing them at output by the leaf type.
 
void SplitStatsByKey (const BuildTreeStatsType &stats_in, EventKeyType key, std::vector< BuildTreeStatsType > *stats_out)
 SplitStatsByKey splits stats up according to the value of a particular key, which must be always defined and nonnegative.
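The splitting rule can be sketched as below. This is not Kaldi's implementation: the `Stats` type, which attaches a plain count to each event instead of a Clusterable pointer, and the `Lookup` helper are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {
using EventKeyType = int;
using EventValueType = int;
using EventType = std::vector<std::pair<EventKeyType, EventValueType>>;
// Hypothetical stand-in for BuildTreeStatsType: each event carries a plain
// count instead of a Clusterable pointer.
using Stats = std::vector<std::pair<EventType, double>>;

// Linear search for `key` in an event; returns false if it is absent.
bool Lookup(const EventType &ev, EventKeyType key, EventValueType *val) {
  for (const auto &kv : ev) {
    if (kv.first == key) {
      *val = kv.second;
      return true;
    }
  }
  return false;
}

// Appends each item of `in` to (*out)[v], where v is its value for `key`.
void SplitStatsByKey(const Stats &in, EventKeyType key, std::vector<Stats> *out) {
  out->clear();
  for (const auto &item : in) {
    EventValueType v = -1;
    if (!Lookup(item.first, key, &v) || v < 0) {
      assert(false && "key must be defined and nonnegative");
      continue;  // only reached when assertions are disabled
    }
    if (static_cast<std::size_t>(v) >= out->size()) out->resize(v + 1);
    (*out)[v].push_back(item);
  }
}
}  // namespace sketch
```

Because the output is indexed directly by value, values that never occur simply leave empty entries, which is why the key must be nonnegative.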
 
void FilterStatsByKey (const BuildTreeStatsType &stats_in, EventKeyType key, std::vector< EventValueType > &values, bool include_if_present, BuildTreeStatsType *stats_out)
 FilterStatsByKey filters the stats according to the value of a specified key.
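A hedged sketch of the filtering rule, using the same hypothetical simplified types (a plain count per event). It assumes the key is defined in every event, which the brief description does not state, and that `include_if_present` selects between keeping the matching and the non-matching stats.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

namespace sketch {
using EventKeyType = int;
using EventValueType = int;
using EventType = std::vector<std::pair<EventKeyType, EventValueType>>;
using Stats = std::vector<std::pair<EventType, double>>;  // hypothetical

// Keeps an item when "its value for `key` is in `values`" equals
// include_if_present. This sketch assumes `key` is defined in every event.
void FilterStatsByKey(const Stats &in, EventKeyType key,
                      const std::vector<EventValueType> &values,
                      bool include_if_present, Stats *out) {
  const std::set<EventValueType> value_set(values.begin(), values.end());
  out->clear();
  for (const auto &item : in) {
    bool found = false, in_set = false;
    for (const auto &kv : item.first) {
      if (kv.first == key) {
        found = true;
        in_set = value_set.count(kv.second) > 0;
        break;
      }
    }
    assert(found && "key must be defined in this sketch");
    if (found && in_set == include_if_present) out->push_back(item);
  }
}
}  // namespace sketch
```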
 
Clusterable * SumStats (const BuildTreeStatsType &stats_in)
 Sums stats, or returns NULL if stats_in has no non-NULL stats.
 
BaseFloat SumNormalizer (const BuildTreeStatsType &stats_in)
 Sums the normalizer [typically, data-count] over the stats.
 
BaseFloat SumObjf (const BuildTreeStatsType &stats_in)
 Sums the objective function over the stats.
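The three summation helpers can be illustrated with a hypothetical one-dimensional statistics type, `ScalarStats`, whose objective is the negated sum of squared deviations from the mean. The type and its objective are assumptions for this sketch, not Kaldi's Clusterable interface. `SumStats` returns a null pointer when no non-null stats exist, mirroring the NULL case described above.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {
using EventType = std::vector<std::pair<int, int>>;

// Hypothetical 1-D stats: count, sum and sum of squares of the data points.
struct ScalarStats {
  double count = 0.0, sum = 0.0, sumsq = 0.0;
  double Normalizer() const { return count; }
  // Objective: negated sum of squared deviations from the mean.
  double Objf() const { return count > 0 ? -(sumsq - sum * sum / count) : 0.0; }
  void Add(const ScalarStats &o) { count += o.count; sum += o.sum; sumsq += o.sumsq; }
};
using Stats = std::vector<std::pair<EventType, const ScalarStats *>>;

// Sums all non-null stats; returns nullptr if there are none.
std::unique_ptr<ScalarStats> SumStats(const Stats &in) {
  std::unique_ptr<ScalarStats> total;
  for (const auto &item : in) {
    if (item.second == nullptr) continue;
    if (!total) total = std::make_unique<ScalarStats>();
    total->Add(*item.second);
  }
  return total;
}

// Sums the normalizers (data counts) of the non-null stats.
double SumNormalizer(const Stats &in) {
  double n = 0.0;
  for (const auto &item : in)
    if (item.second != nullptr) n += item.second->Normalizer();
  return n;
}

// Sums the per-item objectives, i.e. each item is scored as its own cluster.
double SumObjf(const Stats &in) {
  double f = 0.0;
  for (const auto &item : in)
    if (item.second != nullptr) f += item.second->Objf();
  return f;
}
}  // namespace sketch
```

With two points at 1.0 in one item and two points at 3.0 in another, SumObjf is 0 while the summed stats have an objective of -4 under this assumed objective; the gap is what pooling the two items costs.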
 
void SumStatsVec (const std::vector< BuildTreeStatsType > &stats_in, std::vector< Clusterable * > *stats_out)
 Sums a vector of stats.
 
BaseFloat ObjfGivenMap (const BuildTreeStatsType &stats_in, const EventMap &e)
 Clusters the stats given the event map and returns the total objective function given those clusters.
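ObjfGivenMap can be sketched the same way. Here a hypothetical `std::function` mapping an event to a leaf id stands in for the EventMap, and `ScalarStats` is the same assumed one-dimensional stats type: the stats are summed per leaf and the per-cluster objectives are added.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

namespace sketch {
using EventType = std::vector<std::pair<int, int>>;

// Hypothetical 1-D stats standing in for Clusterable.
struct ScalarStats {
  double count = 0.0, sum = 0.0, sumsq = 0.0;
  double Objf() const { return count > 0 ? -(sumsq - sum * sum / count) : 0.0; }
  void Add(const ScalarStats &o) { count += o.count; sum += o.sum; sumsq += o.sumsq; }
};
using Stats = std::vector<std::pair<EventType, ScalarStats>>;

// Clusters the stats by the leaf each event maps to, then returns the sum of
// the per-cluster objectives. `map` is a stand-in for an EventMap.
double ObjfGivenMap(const Stats &in,
                    const std::function<int(const EventType &)> &map) {
  std::map<int, ScalarStats> clusters;
  for (const auto &item : in) clusters[map(item.first)].Add(item.second);
  double objf = 0.0;
  for (const auto &c : clusters) objf += c.second.Objf();
  return objf;
}
}  // namespace sketch
```

A map that sends everything to one leaf pools all the data, while a map that separates dissimilar data keeps the objective higher.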
 
BaseFloat ComputeInitialSplit (const std::vector< Clusterable *> &summed_stats, const Questions &q_opts, EventKeyType key, std::vector< EventValueType > *yes_set)
 
BaseFloat FindBestSplitForKey (const BuildTreeStatsType &stats, const Questions &qcfg, EventKeyType key, std::vector< EventValueType > *yes_set)
 FindBestSplitForKey is a function used in DoDecisionTreeSplit.
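The core of a split search can be illustrated by scoring candidate yes-sets of key values as objf(yes) + objf(no) - objf(all), computed from stats already summed per value (as the `summed_stats` argument of ComputeInitialSplit suggests). This is a simplified sketch: the fixed candidate list stands in for the questions supplied by Questions, any further refinement Kaldi may perform is omitted, and `BestSplit` and `ScalarStats` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

namespace sketch {
// Hypothetical 1-D stats standing in for Clusterable.
struct ScalarStats {
  double count = 0.0, sum = 0.0, sumsq = 0.0;
  double Objf() const { return count > 0 ? -(sumsq - sum * sum / count) : 0.0; }
  void Add(const ScalarStats &o) { count += o.count; sum += o.sum; sumsq += o.sumsq; }
  void Subtract(const ScalarStats &o) { count -= o.count; sum -= o.sum; sumsq -= o.sumsq; }
};

// summed[v] holds the stats for key value v. Each candidate yes-set (of
// distinct values) is scored as objf(yes) + objf(no) - objf(all); the best
// score is returned and its set written to *yes_set.
double BestSplit(const std::vector<ScalarStats> &summed,
                 const std::vector<std::vector<int>> &candidates,
                 std::vector<int> *yes_set) {
  assert(!candidates.empty());
  ScalarStats all;
  for (const ScalarStats &s : summed) all.Add(s);
  double best = -std::numeric_limits<double>::infinity();
  for (const std::vector<int> &cand : candidates) {
    ScalarStats yes;
    for (int v : cand)
      if (v >= 0 && static_cast<std::size_t>(v) < summed.size()) yes.Add(summed[v]);
    ScalarStats no = all;  // the "no" side is everything not in the yes-set
    no.Subtract(yes);
    const double impr = yes.Objf() + no.Objf() - all.Objf();
    if (impr > best) {
      best = impr;
      *yes_set = cand;
    }
  }
  return best;
}
}  // namespace sketch
```

Summing once per value and then combining those sums keeps each candidate's cost proportional to the number of values rather than the number of events.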
 
EventMap * SplitDecisionTree (const EventMap &orig, const BuildTreeStatsType &stats, Questions &qcfg, BaseFloat thresh, int32 max_leaves, int32 *num_leaves, BaseFloat *objf_impr_out, BaseFloat *smallest_split_change_out)
 Does a decision-tree split at the leaves of an EventMap.
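Greedy best-first splitting with a priority queue can be sketched as follows. This is a toy analogue, not Kaldi's algorithm: the data are sorted 1-D points, the hypothetical "questions" are thresholds between consecutive points, and each leaf caches its best gain so that the leaf with the largest gain is split first, until no gain exceeds `thresh` or `max_leaves` is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

namespace sketch {
// Sum of squared deviations from the mean over pts[b, e).
double Sse(const std::vector<double> &pts, std::size_t b, std::size_t e) {
  double n = 0.0, s = 0.0, s2 = 0.0;
  for (std::size_t i = b; i < e; ++i) {
    n += 1.0;
    s += pts[i];
    s2 += pts[i] * pts[i];
  }
  return n > 0.0 ? s2 - s * s / n : 0.0;
}

struct Leaf {
  std::size_t begin, end;  // range of sorted points owned by this leaf
  double gain;             // best objective improvement from splitting it
  std::size_t split;       // split position that achieves `gain`
  bool operator<(const Leaf &o) const { return gain < o.gain; }  // max-heap
};

// Tries every threshold between consecutive points and caches the best one.
Leaf MakeLeaf(const std::vector<double> &pts, std::size_t b, std::size_t e) {
  Leaf leaf{b, e, 0.0, b};
  const double whole = Sse(pts, b, e);
  for (std::size_t m = b + 1; m < e; ++m) {
    const double g = whole - Sse(pts, b, m) - Sse(pts, m, e);
    if (g > leaf.gain) {
      leaf.gain = g;
      leaf.split = m;
    }
  }
  return leaf;
}

// `pts` must be sorted. Repeatedly splits the leaf with the largest gain while
// that gain exceeds `thresh` and fewer than `max_leaves` leaves exist.
// Returns the number of leaves; *objf_impr gets the total improvement.
int GreedySplit(const std::vector<double> &pts, double thresh, int max_leaves,
                double *objf_impr) {
  assert(thresh >= 0.0);  // a negative threshold would "split" zero-gain leaves
  std::priority_queue<Leaf> heap;
  heap.push(MakeLeaf(pts, 0, pts.size()));
  int num_leaves = 1;
  *objf_impr = 0.0;
  while (num_leaves < max_leaves && heap.top().gain > thresh) {
    const Leaf best = heap.top();
    heap.pop();
    *objf_impr += best.gain;
    heap.push(MakeLeaf(pts, best.begin, best.split));
    heap.push(MakeLeaf(pts, best.split, best.end));
    ++num_leaves;
  }
  return num_leaves;
}
}  // namespace sketch
```

Caching each leaf's best split means only the two new children need to be re-evaluated after a split, rather than every leaf.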
 
int ClusterEventMapGetMapping (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, std::vector< EventMap * > *mapping)
 "ClusterEventMapGetMapping" clusters the leaves of the EventMap, with "thresh" a delta-likelihood threshold to control how many leaves we combine (this might be the same as the delta-likelihood threshold used in splitting).
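A hedged sketch of bottom-up leaf clustering with a delta-objective threshold: the pair of clusters whose merge loses the least objective is merged while that loss is below `thresh`. It is a quadratic-time toy over a hypothetical `ScalarStats` type, not Kaldi's implementation, and naming each cluster by its lowest leaf index is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {
// Hypothetical 1-D stats standing in for Clusterable.
struct ScalarStats {
  double count = 0.0, sum = 0.0, sumsq = 0.0;
  double Objf() const { return count > 0 ? -(sumsq - sum * sum / count) : 0.0; }
  void Add(const ScalarStats &o) { count += o.count; sum += o.sum; sumsq += o.sumsq; }
};

// Repeatedly merges the two clusters whose merge loses the least objective,
// while that loss is below `thresh`. On return (*mapping)[i] is the cluster
// of leaf i, named by its lowest leaf index. Returns the number of merges.
int ClusterLeaves(const std::vector<ScalarStats> &leaves, double thresh,
                  std::vector<int> *mapping) {
  const std::size_t n = leaves.size();
  std::vector<ScalarStats> clusters(leaves);
  std::vector<bool> alive(n, true);
  mapping->resize(n);
  for (std::size_t i = 0; i < n; ++i) (*mapping)[i] = static_cast<int>(i);
  int num_merged = 0;
  while (true) {
    double best_cost = thresh;
    std::size_t bi = n, bj = n;
    for (std::size_t i = 0; i < n; ++i) {
      if (!alive[i]) continue;
      for (std::size_t j = i + 1; j < n; ++j) {
        if (!alive[j]) continue;
        ScalarStats merged = clusters[i];
        merged.Add(clusters[j]);
        const double cost = clusters[i].Objf() + clusters[j].Objf() - merged.Objf();
        if (cost < best_cost) {
          best_cost = cost;
          bi = i;
          bj = j;
        }
      }
    }
    if (bi == n) break;  // no remaining merge is cheaper than thresh
    assert(bi < bj);     // so bi stays the lowest index in the merged cluster
    clusters[bi].Add(clusters[bj]);
    alive[bj] = false;
    for (int &m : *mapping)
      if (m == static_cast<int>(bj)) m = static_cast<int>(bi);
    ++num_merged;
  }
  return num_merged;
}
}  // namespace sketch
```

Raising `thresh` allows more costly merges and therefore leaves fewer clusters.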
 
EventMap * RenumberEventMap (const EventMap &e_in, int32 *num_leaves)
 RenumberEventMap [intended to be used after calling ClusterEventMap] renumbers an EventMap so its leaves are consecutive.
 
EventMap * MapEventMapLeaves (const EventMap &e_in, const std::vector< int32 > &mapping)
 This function remaps the event-map leaves using this mapping, indexed by the number at the leaf.
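Renumbering can be sketched on a flattened list of leaf ids, a hypothetical stand-in for the leaves of an EventMap. The mapping vector is indexed by old leaf id, in the spirit of MapEventMapLeaves; preserving the relative order of the old ids is an assumption of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <set>
#include <vector>

namespace sketch {
// Renumbers non-negative leaf ids so they become 0, 1, ..., n-1, keeping
// their relative order. Returns n, the number of distinct leaves.
int RenumberLeaves(std::vector<int> *leaf_ids) {
  const std::set<int> used(leaf_ids->begin(), leaf_ids->end());
  assert(used.empty() || *used.begin() >= 0);
  const int max_id = used.empty() ? -1 : *used.rbegin();
  // Indexed by old leaf id (cf. MapEventMapLeaves); -1 marks unused ids.
  std::vector<int> mapping(max_id + 1, -1);
  int next = 0;
  for (int id : used) mapping[id] = next++;
  for (int &id : *leaf_ids) id = mapping[id];
  return next;
}
}  // namespace sketch
```

After clustering merges leaves, ids such as 2, 7 and 9 may remain; renumbering turns them into 0, 1 and 2 so they can index a dense array.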
 
EventMap * ClusterEventMap (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, int32 *num_removed)
 This is like ClusterEventMapGetMapping, but with a more convenient interface that exposes less of the internals.
 
EventMap * ShareEventMapLeaves (const EventMap &e_in, EventKeyType key, std::vector< std::vector< EventValueType > > &values, int32 *num_leaves)
 ShareEventMapLeaves performs a rather specific function that allows us to generate trees where, for a certain list of phones, and for all states in the phone, all the pdfs are shared.
 
void DeleteBuildTreeStats (BuildTreeStatsType *stats)
 This frees the Clusterable* pointers in "stats", where non-NULL, and sets them to NULL.
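A minimal sketch of the free-and-reset pattern, using a hypothetical `Stat` type and simplifying each event to an int. Because `delete` on a null pointer is a no-op, resetting each entry to nullptr makes repeated calls harmless.

```cpp
#include <cassert>
#include <utility>
#include <vector>

namespace sketch {
struct Stat { double count; };                      // hypothetical Clusterable stand-in
using Stats = std::vector<std::pair<int, Stat *>>;  // event simplified to an int

// Frees every non-null pointer and resets it to nullptr, so a second call
// (or a later read of the container) never touches freed memory.
void DeleteStats(Stats *stats) {
  for (auto &item : *stats) {
    delete item.second;  // delete on nullptr is a no-op
    item.second = nullptr;
  }
}
}  // namespace sketch
```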
 
EventMap * GetToLengthMap (const BuildTreeStatsType &stats, int32 P, const std::vector< EventValueType > *phones, int32 default_length)
 
static int32 ClusterEventMapRestrictedHelper (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, std::vector< EventKeyType > keys, std::vector< EventMap *> *leaf_mapping)
 
EventMap * ClusterEventMapRestrictedByKeys (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, const std::vector< EventKeyType > &keys, int32 *num_removed)
 This is like ClusterEventMap, but first splits the stats on the keys specified in "keys" and only clusters leaves within the same resulting class.
 
EventMap * ClusterEventMapRestrictedByMap (const EventMap &e_in, const BuildTreeStatsType &stats, BaseFloat thresh, const EventMap &e_restrict, int32 *num_removed)
 This version of ClusterEventMapRestricted restricts the clustering to only allow things that "e_restrict" maps to the same value to be clustered together.
 
EventMap * ClusterEventMapToNClustersRestrictedByMap (const EventMap &e_in, const BuildTreeStatsType &stats, int32 num_clusters, const EventMap &e_restrict, int32 *num_removed)
 This version of ClusterEventMapRestrictedByMap clusters to get a specific number of clusters as specified by 'num_clusters'.
 
EventMap * GetStubMap (int32 P, const std::vector< std::vector< int32 > > &phone_sets, const std::vector< int32 > &phone2num_pdf_classes, const std::vector< bool > &share_roots, int32 *num_leaves)
 GetStubMap is used in tree-building functions to get the initial to-states map, before the decision-tree-building process.
 
bool ConvertStats (int32 oldN, int32 oldP, int32 newN, int32 newP, BuildTreeStatsType *stats)
 Converts stats from a given context-window (N) and central-position (P) to a different N and P, by possibly reducing context.
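A hedged sketch of context reduction on a single event. It assumes, without confirmation from the text above, that keys 0..N-1 are context-window positions with P the central one, that negative keys (such as a pdf-class key) pass through unchanged, and that conversion fails when the new window would extend further from the center than the old one. `ConvertEvent` is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

namespace sketch {
using EventType = std::vector<std::pair<int, int>>;  // (key, value), sorted by key

// Re-expresses an event from window (oldN, oldP) to (newN, newP) under the
// assumed conventions above. Returns false if the new window would need
// context that the old one does not have.
bool ConvertEvent(int oldN, int oldP, int newN, int newP, EventType *ev) {
  assert(0 <= oldP && oldP < oldN && 0 <= newP && newP < newN);
  if (newP > oldP || newN - newP > oldN - oldP) return false;
  EventType out;
  for (const auto &kv : *ev) {
    if (kv.first < 0) {  // non-positional key: keep as-is
      out.push_back(kv);
      continue;
    }
    const int k = kv.first - oldP + newP;  // same offset from the center
    if (k >= 0 && k < newN) out.push_back({k, kv.second});
  }
  *ev = out;  // keys stay sorted because the shift preserves their order
  return true;
}
}  // namespace sketch
```

For example, going from a triphone window (N=3, P=1) to a monophone window (N=1, P=0) keeps only the central phone under these assumptions.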