MaxpoolingComponent: max-pooling was first used in ConvNets to select a representative activation in an area. More...

#include <nnet-component.h>

Public Member Functions
void	Init (int32 input_dim, int32 output_dim, int32 pool_size, int32 pool_stride)

	MaxpoolingComponent (int32 input_dim, int32 output_dim, int32 pool_size, int32 pool_stride)

	MaxpoolingComponent ()

virtual std::string	Type () const

virtual void	InitFromString (std::string args)
	Initialize, typically from a line of a config file. More...

virtual int32	InputDim () const
	Get size of input vectors. More...

virtual int32	OutputDim () const
	Get size of output vectors. More...

virtual void	Propagate (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in, CuMatrixBase< BaseFloat > *out) const
	Perform forward pass propagation Input->Output. More...

virtual void	Backprop (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in_value, const CuMatrixBase< BaseFloat > &out_value, const CuMatrixBase< BaseFloat > &out_deriv, Component *to_update, CuMatrix< BaseFloat > *in_deriv) const
	Perform backward pass propagation of the derivative, and also either update the model (if to_update == this) or update another model or compute the model derivative (otherwise). More...

virtual bool	BackpropNeedsInput () const

virtual bool	BackpropNeedsOutput () const

virtual Component *	Copy () const
	Copy component (deep copy). More...

virtual void	Read (std::istream &is, bool binary)

virtual void	Write (std::ostream &os, bool binary) const
	Write component to stream. More...

virtual std::string	Info () const

Public Member Functions inherited from Component
	Component ()

virtual int32	Index () const
	Returns the index in the sequence of layers in the neural net; intended only to be used in debugging information. More...

virtual void	SetIndex (int32 index)

virtual std::vector< int32 >	Context () const
	Return a vector describing the temporal context this component requires for each frame of output, as a sorted list. More...

void	Propagate (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in, CuMatrix< BaseFloat > *out) const
	A non-virtual propagate function that first resizes output if necessary. More...

virtual	~Component ()

Protected Attributes
int32	input_dim_

int32	output_dim_

int32	pool_size_

int32	pool_stride_

Additional Inherited Members
Static Public Member Functions inherited from Component
static Component *	ReadNew (std::istream &is, bool binary)
	Read component from stream. More...

static Component *	NewFromString (const std::string &initializer_line)
	Initialize the Component from one line that will contain first the type, e.g. More...

static Component *	NewComponentOfType (const std::string &type)
	Return a new Component of the given type e.g. More...

Detailed Description

MaxpoolingComponent: max-pooling was first used in ConvNets to select a representative activation in an area.

It inspired the Maxout nonlinearity.

The input and output matrices are split into submatrices (column blocks) of width 'pool_stride_'. For instance, a minibatch of 512 frames is propagated by a convolutional layer, resulting in a 512 x 3840 input matrix for MaxpoolingComponent, which is composed of 128 feature maps of 30 values each for every frame (128 x 30). For 3-to-1 max-pooling on each feature map, set 'pool_stride_' to 128 and 'pool_size_' to 3. The component then creates a 512 x 1280 output matrix: the 30 input neurons of each feature map are grouped in threes and the maximum of each group is selected, giving a smaller feature map of 10 values.

Our pooling does not support overlaps, which simplifies the implementation (overlapping pools were not helpful for Ossama).
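The column layout in the example can be made concrete with a small index calculation. The sketch below is not part of Kaldi: `OutputColumnFor` is a hypothetical helper, and it assumes (as the block-wise comparison in Propagate() suggests) that each block of `pool_stride` columns holds one value from every feature map, so the offset inside a block identifies the feature map.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Kaldi function): returns the output column that
// a given input column is pooled into, for non-overlapping max-pooling.
inline int32_t OutputColumnFor(int32_t in_col, int32_t pool_size,
                               int32_t pool_stride) {
  assert(in_col >= 0 && pool_size > 0 && pool_stride > 0);
  int32_t patch = in_col / pool_stride;   // which block of pool_stride columns
  int32_t offset = in_col % pool_stride;  // position inside the block
  int32_t pool = patch / pool_size;       // pool_size consecutive blocks per pool
  return pool * pool_stride + offset;
}
```

With pool_size 3 and pool_stride 128, input columns 0, 128 and 256 all map to output column 0, and the last input column (3839) maps to the last output column (1279).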

Definition at line 468 of file nnet-component.h.

Constructor & Destructor Documentation

◆ MaxpoolingComponent() [1/2]

MaxpoolingComponent	(	int32	input_dim,
		int32	output_dim,
		int32	pool_size,
		int32	pool_stride
	)

inline explicit

Definition at line 472 of file nnet-component.h.

                                                                    {
     Init(input_dim, output_dim, pool_size, pool_stride);
   }

◆ MaxpoolingComponent() [2/2]

MaxpoolingComponent ( )

inline

Definition at line 476 of file nnet-component.h.

     : input_dim_(0), output_dim_(0),
       pool_size_(0), pool_stride_(0) { }

Member Function Documentation

◆ Backprop()

void Backprop	(	const ChunkInfo &	in_info,
		const ChunkInfo &	out_info,
		const CuMatrixBase< BaseFloat > &	in_value,
		const CuMatrixBase< BaseFloat > &	out_value,
		const CuMatrixBase< BaseFloat > &	out_deriv,
		Component *	to_update,
		CuMatrix< BaseFloat > *	in_deriv
	)		const

virtual

Perform backward pass propagation of the derivative, and also either update the model (if to_update == this) or update another model or compute the model derivative (otherwise).

Note: in_value and out_value are the values of the input and output of the component, and these may be dummy variables if respectively BackpropNeedsInput() or BackpropNeedsOutput() return false for that component (not all components need these).

num_chunks lets us treat the input matrix as contiguous-in-time chunks of equal size; it only matters if splicing is involved.

Implements Component.

Definition at line 4318 of file nnet-component.cc.

References CuMatrixBase< Real >::ColRange(), CuMatrixBase< Real >::EqualElementMask(), KALDI_ASSERT, kaldi::kSetZero, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrix< Real >::Resize().

                                                                         {
   int32 num_patches = input_dim_ / pool_stride_;
   int32 num_pools = num_patches / pool_size_;
   std::vector<int32> patch_summands(num_patches, 0);
   in_deriv->Resize(in_value.NumRows(), in_value.NumCols(), kSetZero);
 
   for(int32 q = 0; q < num_pools; q++) {
     for(int32 r = 0; r < pool_size_; r++) {
       int32 p = r + q * pool_size_;
       CuSubMatrix<BaseFloat> in_p(in_value.ColRange(p * pool_stride_, pool_stride_));
       CuSubMatrix<BaseFloat> out_q(out_value.ColRange(q * pool_stride_, pool_stride_));
       CuSubMatrix<BaseFloat> tgt(in_deriv->ColRange(p * pool_stride_, pool_stride_));
       CuMatrix<BaseFloat> src(out_deriv.ColRange(q * pool_stride_, pool_stride_));
       // zero-out mask
       CuMatrix<BaseFloat> mask;
       in_p.EqualElementMask(out_q, &mask);
       src.MulElements(mask);
       tgt.AddMat(1.0, src);
       // summed deriv info
       patch_summands[p] += 1;
     }
   }
 
   // scale in_deriv of overlapped pools
   for(int32 p = 0; p < num_patches; p++) {
     CuSubMatrix<BaseFloat> tgt(in_deriv->ColRange(p * pool_stride_, pool_stride_));
     KALDI_ASSERT(patch_summands[p] > 0);
     tgt.Scale(1.0 / patch_summands[p]);
   }
 }
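To illustrate the masking step in the listing above, here is a minimal plain-C++ sketch of the same gradient routing for a single row (one frame). It is not the Kaldi implementation: `MaxpoolBackpropRow` is a hypothetical name and `std::vector<float>` stands in for the CuMatrix types. Because pools do not overlap, every patch is visited exactly once, so the final scaling by 1 / patch_summands[p] in the listing is a no-op and is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-row sketch of max-pooling backprop (not Kaldi code).
// The output derivative is copied to every input element that equals the
// pooled maximum; all other input derivatives stay zero.
std::vector<float> MaxpoolBackpropRow(const std::vector<float>& in_value,
                                      const std::vector<float>& out_value,
                                      const std::vector<float>& out_deriv,
                                      std::size_t pool_size,
                                      std::size_t pool_stride) {
  assert(pool_size > 0 && pool_stride > 0);
  assert(in_value.size() % (pool_size * pool_stride) == 0);
  const std::size_t num_pools = in_value.size() / (pool_size * pool_stride);
  assert(out_value.size() == num_pools * pool_stride);
  assert(out_deriv.size() == out_value.size());

  std::vector<float> in_deriv(in_value.size(), 0.0f);
  for (std::size_t q = 0; q < num_pools; ++q) {
    for (std::size_t r = 0; r < pool_size; ++r) {
      const std::size_t p = r + q * pool_size;  // patch index
      for (std::size_t j = 0; j < pool_stride; ++j) {
        const std::size_t in_idx = p * pool_stride + j;
        const std::size_t out_idx = q * pool_stride + j;
        // Same role as EqualElementMask(): 1 where input equals the max.
        if (in_value[in_idx] == out_value[out_idx])
          in_deriv[in_idx] += out_deriv[out_idx];
      }
    }
  }
  return in_deriv;
}
```

Note that with an equality mask, tied maxima within a pool each receive the full output derivative.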

◆ BackpropNeedsInput()

virtual bool BackpropNeedsInput ( ) const

inline virtual

Reimplemented from Component.

Definition at line 494 of file nnet-component.h.

{ return true; }

◆ BackpropNeedsOutput()

virtual bool BackpropNeedsOutput ( ) const

inline virtual

Reimplemented from Component.

Definition at line 495 of file nnet-component.h.

{ return true; }

◆ Copy()

virtual Component* Copy ( ) const

inline virtual

Copy component (deep copy).

Implements Component.

Definition at line 496 of file nnet-component.h.

                                   {
     return new MaxpoolingComponent(input_dim_, output_dim_,
                                pool_size_, pool_stride_); }

◆ Info()

std::string Info ( ) const

virtual

Reimplemented from Component.

Definition at line 4380 of file nnet-component.cc.

References Convolutional1dComponent::Type().

                                           {
   std::stringstream stream;
   stream << Type() << ", input-dim = " << input_dim_
          << ", output-dim = " << output_dim_
          << ", pool-size = " << pool_size_
          << ", pool-stride = " << pool_stride_;
   return stream.str();
 }

◆ Init()

void Init	(	int32	input_dim,
		int32	output_dim,
		int32	pool_size,
		int32	pool_stride
	)

Definition at line 4251 of file nnet-component.cc.

References KALDI_ASSERT.

                                                                     {
   input_dim_ = input_dim;
   output_dim_ = output_dim;
   pool_size_ = pool_size;
   pool_stride_ = pool_stride;
 
   // sanity check
   // number of patches
   KALDI_ASSERT(input_dim_ % pool_stride_ == 0);
   int32 num_patches = input_dim_ / pool_stride_;
   // number of pools
   KALDI_ASSERT(num_patches % pool_size_ == 0);
   int32 num_pools = num_patches / pool_size_;
   // check output dim
   KALDI_ASSERT(output_dim_ == num_pools * pool_stride_);
 }
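The three assertions in Init() amount to a divisibility check followed by a consistency check on the output dimension. The sketch below restates them as a standalone predicate that returns false instead of aborting. `ValidMaxpoolDims` is a hypothetical name, not a Kaldi function, and the guard against non-positive sizes is an addition that the original listing does not contain.

```cpp
#include <cstdint>

// Hypothetical predicate (not Kaldi code) mirroring the sanity checks in
// MaxpoolingComponent::Init().
inline bool ValidMaxpoolDims(int32_t input_dim, int32_t output_dim,
                             int32_t pool_size, int32_t pool_stride) {
  // Added guard: avoid division by zero or negative sizes.
  if (pool_size <= 0 || pool_stride <= 0) return false;
  // The input must split evenly into patches of pool_stride columns.
  if (input_dim % pool_stride != 0) return false;
  const int32_t num_patches = input_dim / pool_stride;
  // The patches must split evenly into pools of pool_size patches.
  if (num_patches % pool_size != 0) return false;
  const int32_t num_pools = num_patches / pool_size;
  // Each pool produces one block of pool_stride output columns.
  return output_dim == num_pools * pool_stride;
}
```

For the example in the detailed description, input_dim 3840 with pool_size 3 and pool_stride 128 gives 30 patches and 10 pools, so the only accepted output_dim is 1280.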

◆ InitFromString()

void InitFromString ( std::string args )

virtual

Initialize, typically from a line of a config file.

The "args" will contain any parameters that need to be passed to the Component, e.g. dimensions.

Implements Component.

Definition at line 4269 of file nnet-component.cc.

References Convolutional1dComponent::Init(), KALDI_ERR, KALDI_LOG, kaldi::nnet2::ParseFromString(), and Convolutional1dComponent::Type().

Referenced by kaldi::nnet2::UnitTestMaxpoolingComponent().

                                                        {
   std::string orig_args(args);
   int32 input_dim = 0;
   int32 output_dim = 0;
   int32 pool_size = -1, pool_stride = -1;
   bool ok = true;
 
   ok = ok && ParseFromString("input-dim", &args, &input_dim);
   ok = ok && ParseFromString("output-dim", &args, &output_dim);
   ok = ok && ParseFromString("pool-size", &args, &pool_size);
   ok = ok && ParseFromString("pool-stride", &args, &pool_stride);
 
   KALDI_LOG << output_dim << " " << input_dim << " " << ok;
   KALDI_LOG << "Pool: " << pool_size << " "
             << pool_stride << " " << ok;
   if (!ok || !args.empty() || output_dim <= 0)
     KALDI_ERR << "Invalid initializer for layer of type "
               << Type() << ": \"" << orig_args << "\"";
   Init(input_dim, output_dim, pool_size, pool_stride);
 }
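The listing above consumes one `name=value` option at a time and then rejects the line if anything is left over. The sketch below imitates that flow in plain C++ so it can be run without Kaldi. `ParseIntOption` is a simplified, hypothetical stand-in for kaldi::nnet2::ParseFromString() (its exact behavior is an assumption here), and `MaxpoolConfig` and `ParseMaxpoolConfig` are illustrative names only.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch (not part of Kaldi).
struct MaxpoolConfig {
  int32_t input_dim, output_dim, pool_size, pool_stride;
  bool ok;  // false if the initializer string was rejected
};

// Simplified stand-in for ParseFromString(): looks for "name=<int>" among the
// space-separated tokens of *args, stores the value, and removes that token.
bool ParseIntOption(const std::string& name, std::string* args,
                    int32_t* value) {
  const std::string prefix = name + "=";
  std::istringstream in(*args);
  std::string token, rest;
  bool found = false;
  while (in >> token) {
    if (!found && token.compare(0, prefix.size(), prefix) == 0) {
      std::istringstream num(token.substr(prefix.size()));
      int32_t v = 0;
      if ((num >> v) && num.eof()) {  // whole value must be an integer
        *value = v;
        found = true;
        continue;  // drop the consumed token
      }
    }
    if (!rest.empty()) rest += ' ';
    rest += token;  // keep unconsumed tokens for the caller to inspect
  }
  *args = rest;
  return found;
}

// Mirrors the control flow of InitFromString(): parse the four options, then
// reject the line if an option was missing, tokens remain, or output_dim <= 0.
MaxpoolConfig ParseMaxpoolConfig(std::string args) {
  MaxpoolConfig c = {0, 0, -1, -1, true};
  c.ok = c.ok && ParseIntOption("input-dim", &args, &c.input_dim);
  c.ok = c.ok && ParseIntOption("output-dim", &args, &c.output_dim);
  c.ok = c.ok && ParseIntOption("pool-size", &args, &c.pool_size);
  c.ok = c.ok && ParseIntOption("pool-stride", &args, &c.pool_stride);
  c.ok = c.ok && args.empty() && c.output_dim > 0;
  return c;
}
```

Under these assumptions, the string `input-dim=3840 output-dim=1280 pool-size=3 pool-stride=128` is accepted, while a string with an unknown option or a missing `pool-stride` is rejected.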

◆ InputDim()

virtual int32 InputDim ( ) const

inline virtual

Get size of input vectors.

Implements Component.

Definition at line 480 of file nnet-component.h.

{ return input_dim_; }

◆ OutputDim()

virtual int32 OutputDim ( ) const

inline virtual

Get size of output vectors.

Implements Component.

Definition at line 481 of file nnet-component.h.

References Component::Propagate().

{ return output_dim_; }

◆ Propagate()

void Propagate	(	const ChunkInfo &	in_info,
		const ChunkInfo &	out_info,
		const CuMatrixBase< BaseFloat > &	in,
		CuMatrixBase< BaseFloat > *	out
	)		const

virtual

Perform forward pass propagation Input->Output.

Each row is one frame or training example. Interpreted as "num_chunks" equally sized chunks of frames; this only matters for layers that do things like context splicing. Typically this variable will either be 1 (when we're processing a single contiguous chunk of data) or will be the same as in.NumFrames(), but other values are possible if some layers do splicing.

Implements Component.

Definition at line 4295 of file nnet-component.cc.

References ChunkInfo::CheckSize(), CuMatrixBase< Real >::ColRange(), KALDI_ASSERT, ChunkInfo::NumChunks(), and CuMatrixBase< Real >::Set().

                                                                          {
   in_info.CheckSize(in);
   out_info.CheckSize(*out);
   KALDI_ASSERT(in_info.NumChunks() == out_info.NumChunks());
   int32 num_patches = input_dim_ / pool_stride_;
   int32 num_pools = num_patches / pool_size_;
 
   // do the max-pooling
   for (int32 q = 0; q < num_pools; q++) {
     // get output buffer of the pool
     CuSubMatrix<BaseFloat> pool(out->ColRange(q * pool_stride_, pool_stride_));
     pool.Set(-1e20); // reset to a large negative value
     for (int32 r = 0; r < pool_size_; r++) {
       // col-by-col block comparison pool
       int32 p = r + q * pool_size_;
       pool.Max(in.ColRange(p * pool_stride_, pool_stride_));
     }
   }
 }
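The forward pass above takes an element-wise maximum over `pool_size_` consecutive column blocks. As a rough illustration, the sketch below does the same for a single row using `std::vector<float>` instead of the CuMatrix types. `MaxpoolPropagateRow` is a hypothetical name, not a Kaldi function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-row sketch of the max-pooling forward pass
// (not Kaldi code).
std::vector<float> MaxpoolPropagateRow(const std::vector<float>& in,
                                       std::size_t pool_size,
                                       std::size_t pool_stride) {
  assert(pool_size > 0 && pool_stride > 0);
  assert(in.size() % (pool_size * pool_stride) == 0);
  const std::size_t num_pools = in.size() / (pool_size * pool_stride);

  // Start from a large negative value, as the listing does with Set(-1e20).
  std::vector<float> out(num_pools * pool_stride, -1e20f);
  for (std::size_t q = 0; q < num_pools; ++q) {
    for (std::size_t r = 0; r < pool_size; ++r) {
      const std::size_t p = r + q * pool_size;  // patch index
      for (std::size_t j = 0; j < pool_stride; ++j) {
        float& dst = out[q * pool_stride + j];
        dst = std::max(dst, in[p * pool_stride + j]);
      }
    }
  }
  return out;
}
```

For an 8-element row with pool_size 2 and pool_stride 2, patches {1, 5} and {3, 2} pool to {3, 5}, and patches {0, 7} and {4, 4} pool to {4, 7}.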

◆ Read()

void Read	(	std::istream &	is,
		bool	binary
	)

virtual

Implements Component.

Definition at line 4355 of file nnet-component.cc.

References kaldi::nnet2::ExpectOneOrTwoTokens(), kaldi::ExpectToken(), and kaldi::ReadBasicType().

                                                           {
   ExpectOneOrTwoTokens(is, binary, "<MaxpoolingComponent>", "<InputDim>");
   ReadBasicType(is, binary, &input_dim_);
   ExpectToken(is, binary, "<OutputDim>");
   ReadBasicType(is, binary, &output_dim_);
   ExpectToken(is, binary, "<PoolSize>");
   ReadBasicType(is, binary, &pool_size_);
   ExpectToken(is, binary, "<PoolStride>");
   ReadBasicType(is, binary, &pool_stride_);
   ExpectToken(is, binary, "</MaxpoolingComponent>");
 }

◆ Type()

virtual std::string Type ( ) const

inline virtual

Implements Component.

Definition at line 478 of file nnet-component.h.

{ return "MaxpoolingComponent"; }

◆ Write()

void Write	(	std::ostream &	os,
		bool	binary
	)		const

virtual

Write component to stream.

Implements Component.

Definition at line 4367 of file nnet-component.cc.

References kaldi::WriteBasicType(), and kaldi::WriteToken().

                                                                  {
   WriteToken(os, binary, "<MaxpoolingComponent>");
   WriteToken(os, binary, "<InputDim>");
   WriteBasicType(os, binary, input_dim_);
   WriteToken(os, binary, "<OutputDim>");
   WriteBasicType(os, binary, output_dim_);
   WriteToken(os, binary, "<PoolSize>");
   WriteBasicType(os, binary, pool_size_);
   WriteToken(os, binary, "<PoolStride>");
   WriteBasicType(os, binary, pool_stride_);
   WriteToken(os, binary, "</MaxpoolingComponent>");
 }
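Read() and Write() agree on a fixed token order: the component tag, then each dimension preceded by its own tag, then the closing tag. The sketch below reproduces that order with standard streams to show the round trip. It is only an approximation: `MaxpoolRecord`, `WriteMaxpoolText` and `ReadMaxpoolText` are hypothetical names, the space-separated text layout is an assumption about text mode, binary mode is not covered, and unlike ExpectOneOrTwoTokens() the reader here always requires the opening tag.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type for this sketch (not part of Kaldi).
struct MaxpoolRecord {
  int32_t input_dim, output_dim, pool_size, pool_stride;
  bool ok;  // set by ReadMaxpoolText(); false if a token was missing
};

// Writes the fields in the same token order as MaxpoolingComponent::Write().
std::string WriteMaxpoolText(const MaxpoolRecord& r) {
  std::ostringstream os;
  os << "<MaxpoolingComponent> "
     << "<InputDim> " << r.input_dim << ' '
     << "<OutputDim> " << r.output_dim << ' '
     << "<PoolSize> " << r.pool_size << ' '
     << "<PoolStride> " << r.pool_stride << ' '
     << "</MaxpoolingComponent> ";
  return os.str();
}

// Reads one whitespace-delimited token and checks that it matches.
static bool ExpectTokenText(std::istream& is, const char* token) {
  std::string t;
  return (is >> t) && t == token;
}

// Reads the fields back, failing on the first unexpected or missing token.
MaxpoolRecord ReadMaxpoolText(const std::string& text) {
  MaxpoolRecord r = {0, 0, 0, 0, false};
  std::istringstream is(text);
  r.ok = ExpectTokenText(is, "<MaxpoolingComponent>") &&
         ExpectTokenText(is, "<InputDim>") && (is >> r.input_dim) &&
         ExpectTokenText(is, "<OutputDim>") && (is >> r.output_dim) &&
         ExpectTokenText(is, "<PoolSize>") && (is >> r.pool_size) &&
         ExpectTokenText(is, "<PoolStride>") && (is >> r.pool_stride) &&
         ExpectTokenText(is, "</MaxpoolingComponent>");
  return r;
}
```

Keeping the tags in the stream lets the reader detect a truncated or reordered record at the first mismatching token rather than silently reading wrong values.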

Member Data Documentation

◆ input_dim_

int32 input_dim_

protected

Definition at line 508 of file nnet-component.h.

◆ output_dim_

int32 output_dim_

protected

Definition at line 509 of file nnet-component.h.

◆ pool_size_

int32 pool_size_

protected

Definition at line 510 of file nnet-component.h.

◆ pool_stride_

int32 pool_stride_

protected

Definition at line 511 of file nnet-component.h.

The documentation for this class was generated from the following files:

nnet2/nnet-component.h
nnet2/nnet-component.cc
