#include <nnet-component.h>

Public Member Functions
	SoftmaxComponent (int32 dim)

	SoftmaxComponent (const SoftmaxComponent &other)

	SoftmaxComponent ()

virtual std::string	Type () const

virtual bool	BackpropNeedsInput () const

virtual bool	BackpropNeedsOutput () const

virtual void	Propagate (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in, CuMatrixBase< BaseFloat > *out) const
	Perform forward pass propagation Input->Output. More...

virtual void	Backprop (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in_value, const CuMatrixBase< BaseFloat > &out_value, const CuMatrixBase< BaseFloat > &out_deriv, Component *to_update, CuMatrix< BaseFloat > *in_deriv) const
	Perform backward pass propagation of the derivative, and also either update the model (if to_update == this) or update another model or compute the model derivative (otherwise). More...

void	MixUp (int32 num_mixtures, BaseFloat power, BaseFloat min_count, BaseFloat perturb_stddev, AffineComponent *ac, SumGroupComponent *sc)
	Allocate mixtures to states via a power rule, and add any new mixtures. More...

virtual Component *	Copy () const
	Copy component (deep copy). More...

Public Member Functions inherited from NonlinearComponent
void	Init (int32 dim)

	NonlinearComponent (int32 dim)

	NonlinearComponent ()

	NonlinearComponent (const NonlinearComponent &other)

virtual int32	InputDim () const
	Get size of input vectors. More...

virtual int32	OutputDim () const
	Get size of output vectors. More...

virtual void	InitFromString (std::string args)
	We implement InitFromString at this level. More...

virtual void	Read (std::istream &is, bool binary)
	We implement Read at this level as it just needs the Type(). More...

virtual void	Write (std::ostream &os, bool binary) const
	Write component to stream. More...

void	Scale (BaseFloat scale)

void	Add (BaseFloat alpha, const NonlinearComponent &other)

const CuVector< double > &	ValueSum () const

const CuVector< double > &	DerivSum () const

double	Count () const

void	SetDim (int32 dim)

Public Member Functions inherited from Component
	Component ()

virtual int32	Index () const
	Returns the index in the sequence of layers in the neural net; intended only to be used in debugging information. More...

virtual void	SetIndex (int32 index)

virtual std::vector< int32 >	Context () const
	Return a vector describing the temporal context this component requires for each frame of output, as a sorted list. More...

void	Propagate (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in, CuMatrix< BaseFloat > *out) const
	A non-virtual propagate function that first resizes output if necessary. More...

virtual std::string	Info () const

virtual	~Component ()

Private Member Functions
SoftmaxComponent &	operator= (const SoftmaxComponent &other)

Additional Inherited Members
Static Public Member Functions inherited from Component
static Component *	ReadNew (std::istream &is, bool binary)
	Read component from stream. More...

static Component *	NewFromString (const std::string &initializer_line)
	Initialize the Component from one line that will contain first the type, e.g. More...

static Component *	NewComponentOfType (const std::string &type)
	Return a new Component of the given type e.g. More...

Protected Member Functions inherited from NonlinearComponent
void	UpdateStats (const CuMatrixBase< BaseFloat > &out_value, const CuMatrixBase< BaseFloat > *deriv=NULL)

const NonlinearComponent &	operator= (const NonlinearComponent &other)

Protected Attributes inherited from NonlinearComponent
int32	dim_

CuVector< double >	value_sum_

CuVector< double >	deriv_sum_

double	count_

std::mutex	mutex_

Detailed Description

Definition at line 777 of file nnet-component.h.

Constructor & Destructor Documentation

◆ SoftmaxComponent() [1/3]

SoftmaxComponent ( int32 dim )

inline explicit

Definition at line 779 of file nnet-component.h.

779 : NonlinearComponent(dim) { }

◆ SoftmaxComponent() [2/3]

SoftmaxComponent ( const SoftmaxComponent & other )

inline explicit

Definition at line 780 of file nnet-component.h.

780 : NonlinearComponent(other) { }

◆ SoftmaxComponent() [3/3]

SoftmaxComponent ( )

inline

Definition at line 781 of file nnet-component.h.

781 { }

Member Function Documentation

◆ Backprop()

void Backprop	(	const ChunkInfo &	in_info,
		const ChunkInfo &	out_info,
		const CuMatrixBase< BaseFloat > &	in_value,
		const CuMatrixBase< BaseFloat > &	out_value,
		const CuMatrixBase< BaseFloat > &	out_deriv,
		Component *	to_update,
		CuMatrix< BaseFloat > *	in_deriv
	)		const

virtual

Perform backward pass propagation of the derivative, and also either update the model (if to_update == this) or update another model or compute the model derivative (otherwise).

Note: in_value and out_value are the values of the input and output of the component, and these may be dummy variables if respectively BackpropNeedsInput() or BackpropNeedsOutput() return false for that component (not all components need these).

num_chunks lets us treat the input matrix as contiguous-in-time chunks of equal size; it only matters if splicing is involved.

Implements Component.

Definition at line 919 of file nnet-component.cc.

References CuMatrixBase< Real >::DiffSoftmaxPerRow(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrix< Real >::Resize(), and NonlinearComponent::UpdateStats().

                                                                       {
   /*
     Note on the derivative of the softmax function: let it be
     p_i = exp(x_i) / sum_j exp(x_j)
     The [matrix-valued] Jacobian of this function is
     diag(p) - p p^T
     Let the derivative vector at the output be e, and at the input be
     d.  We have
     d = diag(p) e - p (p^T e).
     d_i = p_i e_i - p_i (p^T e).
   */
   in_deriv->Resize(out_deriv.NumRows(), out_deriv.NumCols());
   in_deriv->DiffSoftmaxPerRow(out_value, out_deriv);
 
   // The SoftmaxComponent does not have any real trainable parameters, but
   // during the backprop we store some statistics on the average counts;
   // these may be used in mixing-up.
   if (to_update != NULL) {
     NonlinearComponent *to_update_nonlinear =
         dynamic_cast<NonlinearComponent*>(to_update);
     to_update_nonlinear->UpdateStats(out_value);
   }
 }
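
The per-row formula in the comment above can be illustrated with a minimal sketch in plain C++. `DiffSoftmaxRow` is a hypothetical helper written for this page, not a Kaldi function; it uses `std::vector` in place of `CuMatrixBase` and assumes `p` and `e` have the same length.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Kaldi: for one row, computes
// d_i = p_i * e_i - p_i * (p^T e), where p is the softmax output and e is the
// derivative of the objective with respect to that output.
// Assumes p and e have the same length.
std::vector<float> DiffSoftmaxRow(const std::vector<float> &p,
                                  const std::vector<float> &e) {
  // Inner product p^T e, shared by every element of the row.
  float p_dot_e = 0.0f;
  for (std::size_t i = 0; i < p.size(); i++) p_dot_e += p[i] * e[i];

  // d_i = p_i * (e_i - p^T e), which is the same expression factored.
  std::vector<float> d(p.size());
  for (std::size_t i = 0; i < p.size(); i++) d[i] = p[i] * (e[i] - p_dot_e);
  return d;
}
```

Because the rows of diag(p) - p p^T sum to zero when p sums to one, the elements of the returned vector also sum to zero (up to rounding), which can serve as a quick sanity check.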

◆ BackpropNeedsInput()

virtual bool BackpropNeedsInput ( ) const

inline virtual

Reimplemented from Component.

Definition at line 783 of file nnet-component.h.

783 { return false; }

◆ BackpropNeedsOutput()

virtual bool BackpropNeedsOutput ( ) const

inline virtual

Reimplemented from Component.

Definition at line 784 of file nnet-component.h.

References Component::Propagate().

784 { return true; }

◆ Copy()

virtual Component* Copy ( ) const

inline virtual

Copy component (deep copy).

Implements Component.

Definition at line 805 of file nnet-component.h.

805 { return new SoftmaxComponent(*this); }

◆ MixUp()

void MixUp	(	int32	num_mixtures,
		BaseFloat	power,
		BaseFloat	min_count,
		BaseFloat	perturb_stddev,
		AffineComponent *	ac,
		SumGroupComponent *	sc
	)

Allocate mixtures to states via a power rule, and add any new mixtures.

For each state, total the count over all the output dims of the softmax layer that correspond to that state's mixture. This total is used to allocate new quasi-Gaussians.
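
As an illustration of the totalling step, here is a minimal sketch in plain C++. `GroupTotals` is a hypothetical helper, not a Kaldi function; `sizes` stands in for the result of `SumGroupComponent::GetSizes()` and `per_dim_counts` for `value_sum_`. The sketch assumes the sizes add up to the length of `per_dim_counts`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Kaldi: sums per-dimension counts over
// consecutive groups whose sizes are given by `sizes`.
// Assumes the entries of `sizes` add up to per_dim_counts.size().
std::vector<float> GroupTotals(const std::vector<int> &sizes,
                               const std::vector<float> &per_dim_counts) {
  std::vector<float> totals(sizes.size(), 0.0f);
  std::size_t dim = 0;  // running index into per_dim_counts
  for (std::size_t i = 0; i < sizes.size(); i++)
    for (int d = 0; d < sizes[i]; d++, dim++)
      totals[i] += per_dim_counts[dim];
  return totals;
}
```

For example, sizes `{2, 1}` with per-dimension counts `{1, 2, 4}` give totals `{3, 4}`.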

Definition at line 107 of file mixup-nnet.cc.

References VectorBase< Real >::AddVec(), AffineComponent::bias_params_, CuVectorBase< Real >::CopyFromVec(), VectorBase< Real >::CopyFromVec(), NonlinearComponent::count_, rnnlm::d, VectorBase< Real >::Data(), VectorBase< Real >::Dim(), CuVectorBase< Real >::Dim(), NonlinearComponent::dim_, SumGroupComponent::GetSizes(), kaldi::GetSplitTargets(), rnnlm::i, SumGroupComponent::Init(), AffineComponent::InputDim(), KALDI_ASSERT, KALDI_LOG, AffineComponent::linear_params_, kaldi::Log(), VectorBase< Real >::Range(), MatrixBase< Real >::Range(), CuVector< Real >::Resize(), AffineComponent::SetParams(), VectorBase< Real >::SetRandn(), CuVectorBase< Real >::Sum(), and NonlinearComponent::value_sum_.

Referenced by kaldi::nnet2::MixupNnet().

                                                     {
   // "counts" is derived from this->value_sum_ by summing within each group.
   std::vector<int32> old_sizes;
   sc->GetSizes(&old_sizes);
   Vector<BaseFloat> counts(old_sizes.size());
   int32 old_dim = 0;
   for (size_t i = 0; i < old_sizes.size(); i++) {
     int32 this_input_dim = old_sizes[i];
     BaseFloat this_tot_count = 0.0; 
     for (int32 d = 0; d < this_input_dim; d++, old_dim++)
       this_tot_count += this->value_sum_(old_dim);
     counts(i) = this_tot_count;
   }
   KALDI_ASSERT(old_dim == value_sum_.Dim());
   KALDI_ASSERT(counts.Sum() > 0 && "Cannot do mixing up without counts.");
 
   std::vector<int32> targets; // #mixtures for each state.
 
 
   // Get the target number of mixtures for each state.
   GetSplitTargets(counts, num_mixtures, power, min_count, &targets);
   KALDI_ASSERT(targets.size() == old_sizes.size());
   std::vector<int32> new_sizes(old_sizes.size());
   for (size_t i = 0; i < targets.size(); i++)
     new_sizes[i] = std::max(targets[i], old_sizes[i]);
   int32 new_dim = std::accumulate(new_sizes.begin(), new_sizes.end(),
                                   static_cast<int32>(0)),
       affine_input_dim = ac->InputDim();
   KALDI_ASSERT(new_dim >= old_dim);
   sc->Init(new_sizes);
   
   // bias and linear terms from affine component:
   Vector<BaseFloat> old_bias_term(ac->bias_params_);
   Matrix<BaseFloat> old_linear_term(ac->linear_params_);
   
   Vector<BaseFloat> new_bias_term(new_dim);
   Matrix<BaseFloat> new_linear_term(new_dim, affine_input_dim);
   Vector<BaseFloat> new_counts(new_dim);
 
   // old_offset and new_offset are offsets into the dimension at the
   // input/output of the softmax component, before and after mixing up
   // respectively.  They get incremented in the following loop.
   int32 old_offset = 0, new_offset = 0;
   Vector<BaseFloat> old_counts(this->value_sum_);
   for (size_t i = 0; i < old_sizes.size(); i++) {
     int32 this_old_dim = old_sizes[i],
           this_new_dim = new_sizes[i],
           this_cur_dim = this_old_dim; // this_cur_dim is loop variable.
     
     SubMatrix<BaseFloat> this_old_linear_term(old_linear_term,
                                               old_offset, this_old_dim,
                                               0, affine_input_dim),
         this_new_linear_term(new_linear_term,
                              new_offset, this_new_dim,
                              0, affine_input_dim);
     SubVector<BaseFloat> this_old_bias_term(old_bias_term,
                                             old_offset, this_old_dim),
         this_new_bias_term(new_bias_term, new_offset, this_new_dim),
         this_old_counts(old_counts,
                         old_offset, this_old_dim),
         this_new_counts(new_counts,
                         new_offset, this_new_dim);
     
     // Copy the same-dimensional part of the parameters and counts.
     this_new_linear_term.Range(0, this_old_dim, 0, affine_input_dim).
         CopyFromMat(this_old_linear_term);
     this_new_bias_term.Range(0, this_old_dim).
         CopyFromVec(this_old_bias_term);
     this_new_counts.Range(0, this_old_dim).
         CopyFromVec(this_old_counts);
     // this_new_params is the mixture weights.
     // Add the new components...
     for (; this_cur_dim < this_new_dim; this_cur_dim++) {
       BaseFloat *count_begin = this_new_counts.Data(),
           *count_end  = count_begin + this_cur_dim,
           *count_max = std::max_element(count_begin, count_end);
       KALDI_ASSERT(*count_max > 0.0);
       *count_max *= 0.5;
       *count_end = *count_max; // count for the element we're adding.
       int32 max_index = static_cast<int32>(count_max - count_begin),
           new_index = this_cur_dim;
       SubVector<BaseFloat> cur_vec(this_new_linear_term, max_index),
           new_vec(this_new_linear_term, new_index);
       new_vec.CopyFromVec(cur_vec);
       Vector<BaseFloat> rand(affine_input_dim);
       rand.SetRandn();
       cur_vec.AddVec(perturb_stddev, rand);
       new_vec.AddVec(-perturb_stddev, rand);
       this_new_bias_term(max_index) += Log(0.5);
       this_new_bias_term(new_index) = this_new_bias_term(max_index);
     }
     old_offset += this_old_dim;
     new_offset += this_new_dim;
   }
   KALDI_ASSERT(old_offset == old_dim && new_offset == new_dim);
   ac->SetParams(new_bias_term, new_linear_term);
   this->value_sum_.Resize(new_counts.Dim());
   this->value_sum_.CopyFromVec(new_counts);
   this->count_ = this->value_sum_.Sum();
   this->dim_ = new_dim;
   KALDI_LOG << "Mixed up from dimension of " << old_dim << " to " << new_dim
             << " in the softmax layer.";
 }
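
The inner loop that adds one component at a time can be sketched in isolation. The following is a simplified plain C++ illustration, not the Kaldi implementation: `MixtureGroup` and `SplitLargestComponent` are hypothetical names, `std::vector` replaces the Kaldi matrix types, and the random vector is passed in as `noise` so the behaviour is deterministic. It assumes the group is non-empty and that `noise` has the same length as each weight row.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one state's block of affine parameters and counts.
struct MixtureGroup {
  std::vector<std::vector<float>> weights;  // one row per mixture component
  std::vector<float> biases;
  std::vector<float> counts;
};

// Splits the highest-count component of `g` in two. `noise` plays the role of
// the random vector filled by SetRandn() in the listing above.
// Assumes g is non-empty and noise.size() equals the weight row length.
void SplitLargestComponent(MixtureGroup *g, const std::vector<float> &noise,
                           float perturb_stddev) {
  const std::size_t max_index = static_cast<std::size_t>(
      std::max_element(g->counts.begin(), g->counts.end()) -
      g->counts.begin());

  // Halve the count and give the new component the other half.
  g->counts[max_index] *= 0.5f;
  g->counts.push_back(g->counts[max_index]);

  // Copy the weight row, then perturb the two copies in opposite directions.
  const std::vector<float> copy = g->weights[max_index];
  g->weights.push_back(copy);
  std::vector<float> &cur = g->weights[max_index];
  std::vector<float> &added = g->weights.back();
  for (std::size_t j = 0; j < noise.size(); j++) {
    cur[j] += perturb_stddev * noise[j];
    added[j] -= perturb_stddev * noise[j];
  }

  // Adding log(0.5) to the bias halves the unnormalized exp() score of each
  // copy, mirroring the halved count.
  g->biases[max_index] += std::log(0.5f);
  g->biases.push_back(g->biases[max_index]);
}
```

Calling this repeatedly until the group reaches its target size mirrors the `for (; this_cur_dim < this_new_dim; this_cur_dim++)` loop, with the difference that the real code writes into preallocated sub-matrices rather than growing vectors.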

◆ operator=()

SoftmaxComponent& operator= ( const SoftmaxComponent & other )

private

◆ Propagate()

void Propagate	(	const ChunkInfo &	in_info,
		const ChunkInfo &	out_info,
		const CuMatrixBase< BaseFloat > &	in,
		CuMatrixBase< BaseFloat > *	out
	)		const

virtual

Perform forward pass propagation Input->Output.

Each row of the input is one frame or training example. The input is interpreted as "num_chunks" equally sized chunks of frames; this only matters for layers that do things like context splicing. Typically num_chunks will either be 1 (when processing a single contiguous chunk of data) or the same as in.NumFrames(), but other values are possible if some layers do splicing.

Implements Component.

Definition at line 900 of file nnet-component.cc.

References CuMatrixBase< Real >::ApplyFloor(), ChunkInfo::CheckSize(), KALDI_ASSERT, ChunkInfo::NumChunks(), and CuMatrixBase< Real >::SoftMaxPerRow().

                                                                       {
   in_info.CheckSize(in);
   out_info.CheckSize(*out);
   KALDI_ASSERT(in_info.NumChunks() == out_info.NumChunks());
 
   // Apply softmax function to each row of the output...
   // for that row, we do
   // x_i = exp(x_i) / sum_j exp(x_j).
 
   out->SoftMaxPerRow(in);
 
   // This floor on the output helps us deal with
   // almost-zeros in a way that doesn't lead to overflow.
   out->ApplyFloor(1.0e-20);
 }
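
For readers without a Kaldi build at hand, the forward pass on a single row can be sketched in plain C++. `SoftmaxRowWithFloor` is a hypothetical helper, not a Kaldi function. Subtracting the row maximum before exponentiating is an assumption of this sketch, added for numerical safety; this page does not say how `SoftMaxPerRow()` handles large inputs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Kaldi: softmax of one row, followed by the
// same 1.0e-20 floor that Propagate() applies to its output.
std::vector<float> SoftmaxRowWithFloor(const std::vector<float> &x,
                                       float floor_value = 1.0e-20f) {
  std::vector<float> p(x.size());
  if (x.empty()) return p;

  // Assumption of this sketch: subtract the row maximum so that exp() cannot
  // overflow. This does not change the mathematical result.
  const float max_x = *std::max_element(x.begin(), x.end());
  float sum = 0.0f;
  for (std::size_t i = 0; i < x.size(); i++) {
    p[i] = std::exp(x[i] - max_x);
    sum += p[i];
  }

  // Normalize, then floor so that no output is exactly zero.
  for (std::size_t i = 0; i < x.size(); i++)
    p[i] = std::max(p[i] / sum, floor_value);
  return p;
}
```

The floor matters when a later stage takes the logarithm of the output: an input far below the row maximum underflows to zero after `exp()`, and the floor replaces that zero with a small positive value.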

◆ Type()

virtual std::string Type ( ) const

inline virtual

Implements Component.

Definition at line 782 of file nnet-component.h.

782 { return "SoftmaxComponent"; }

The documentation for this class was generated from the following files:
