MaxpoolingComponent Class Reference

MaxpoolingComponent: the maxpooling component was first used in ConvNets to select a representative activation in an area. More...

#include <nnet-component.h>

Inherits Component.

Public Member Functions

void Init (int32 input_dim, int32 output_dim, int32 pool_size, int32 pool_stride)
 
 MaxpoolingComponent (int32 input_dim, int32 output_dim, int32 pool_size, int32 pool_stride)
 
 MaxpoolingComponent ()
 
virtual std::string Type () const
 
virtual void InitFromString (std::string args)
 Initialize, typically from a line of a config file. More...
 
virtual int32 InputDim () const
 Get size of input vectors. More...
 
virtual int32 OutputDim () const
 Get size of output vectors. More...
 
virtual void Propagate (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in, CuMatrixBase< BaseFloat > *out) const
 Perform forward pass propagation Input->Output. More...
 
virtual void Backprop (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in_value, const CuMatrixBase< BaseFloat > &, const CuMatrixBase< BaseFloat > &out_deriv, Component *to_update, CuMatrix< BaseFloat > *in_deriv) const
 Perform backward pass propagation of the derivative, and also either update the model (if to_update == this) or update another model or compute the model derivative (otherwise). More...
 
virtual bool BackpropNeedsInput () const
 
virtual bool BackpropNeedsOutput () const
 
virtual Component * Copy () const
 Copy component (deep copy). More...
 
virtual void Read (std::istream &is, bool binary)
 
virtual void Write (std::ostream &os, bool binary) const
 Write component to stream. More...
 
virtual std::string Info () const
 
- Public Member Functions inherited from Component
 Component ()
 
virtual int32 Index () const
 Returns the index in the sequence of layers in the neural net; intended only to be used in debugging information. More...
 
virtual void SetIndex (int32 index)
 
virtual std::vector< int32 > Context () const
 Return a vector describing the temporal context this component requires for each frame of output, as a sorted list. More...
 
void Propagate (const ChunkInfo &in_info, const ChunkInfo &out_info, const CuMatrixBase< BaseFloat > &in, CuMatrix< BaseFloat > *out) const
 A non-virtual propagate function that first resizes output if necessary. More...
 
virtual ~Component ()
 

Protected Attributes

int32 input_dim_
 
int32 output_dim_
 
int32 pool_size_
 
int32 pool_stride_
 

Additional Inherited Members

- Static Public Member Functions inherited from Component
static Component * ReadNew (std::istream &is, bool binary)
 Read component from stream. More...
 
static Component * NewFromString (const std::string &initializer_line)
 Initialize the Component from one line that will contain first the type, e.g. More...
 
static Component * NewComponentOfType (const std::string &type)
 Return a new Component of the given type e.g. More...
 

Detailed Description

MaxpoolingComponent: the maxpooling component was first used in ConvNets to select a representative activation in an area.

It inspired the Maxout nonlinearity.

The input/output matrices are split into submatrices of width 'pool_stride_'. For instance, suppose a minibatch of 512 frames is propagated through a convolutional layer, resulting in a 512 x 3840 input matrix for MaxpoolingComponent, which is composed of 128 feature maps for each frame (128 x 30). If you want 3-to-1 maxpooling on each feature map, set 'pool_stride_' and 'pool_size_' to 128 and 3 respectively. The maxpooling component then creates an output matrix of 512 x 1280. The 30 input neurons of each feature map are grouped with a group size of 3, and the maximum in each group is selected, creating a smaller feature map of size 10.

Our pooling does not support overlaps, which simplifies the implementation (and overlapping was not helpful for Ossama).
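
The column layout described above can be made concrete with a small sketch. It is not Kaldi code: InputColumnsForOutput is a hypothetical helper, written only to show which input columns are compared to produce one output column, assuming the indexing used in Propagate() (patch p = r + q * pool_size_, each patch pool_stride_ columns wide).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Kaldi): lists the input columns whose
// maximum becomes output column `out_col`.
std::vector<int> InputColumnsForOutput(int out_col, int pool_size,
                                       int pool_stride) {
  assert(out_col >= 0 && pool_size > 0 && pool_stride > 0);
  int q = out_col / pool_stride;  // index of the pool
  int c = out_col % pool_stride;  // offset inside the stride-wide block
  std::vector<int> cols;
  for (int r = 0; r < pool_size; r++) {
    int p = r + q * pool_size;  // patch index, as in Propagate()
    cols.push_back(p * pool_stride + c);
  }
  return cols;
}
```

With pool_size = 3 and pool_stride = 128, output column 0 is the maximum of input columns 0, 128 and 256, and the last output column (1279) draws on input columns 3583, 3711 and 3839.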

Definition at line 468 of file nnet-component.h.

Constructor & Destructor Documentation

MaxpoolingComponent ( int32  input_dim,
int32  output_dim,
int32  pool_size,
int32  pool_stride 
)
inline explicit

Definition at line 472 of file nnet-component.h.

References MaxpoolingComponent::Init().

473  {
474  Init(input_dim, output_dim, pool_size, pool_stride);
475  }

Member Function Documentation

void Backprop ( const ChunkInfo &  in_info,
const ChunkInfo &  out_info,
const CuMatrixBase< BaseFloat > &  in_value,
const CuMatrixBase< BaseFloat > &  out_value,
const CuMatrixBase< BaseFloat > &  out_deriv,
Component *  to_update,
CuMatrix< BaseFloat > *  in_deriv 
) const
virtual

Perform backward pass propagation of the derivative, and also either update the model (if to_update == this) or update another model or compute the model derivative (otherwise).

Note: in_value and out_value are the values of the input and output of the component, and these may be dummy variables if respectively BackpropNeedsInput() or BackpropNeedsOutput() return false for that component (not all components need these).

num_chunks lets us treat the input matrix as contiguous-in-time chunks of equal size; it only matters if splicing is involved.

Implements Component.

Definition at line 4318 of file nnet-component.cc.

References CuMatrixBase< Real >::ColRange(), CuMatrixBase< Real >::EqualElementMask(), MaxpoolingComponent::input_dim_, KALDI_ASSERT, kaldi::kSetZero, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), MaxpoolingComponent::pool_size_, MaxpoolingComponent::pool_stride_, and CuMatrix< Real >::Resize().

4324  {
4325  int32 num_patches = input_dim_ / pool_stride_;
4326  int32 num_pools = num_patches / pool_size_;
4327  std::vector<int32> patch_summands(num_patches, 0);
4328  in_deriv->Resize(in_value.NumRows(), in_value.NumCols(), kSetZero);
4329 
4330  for(int32 q = 0; q < num_pools; q++) {
4331  for(int32 r = 0; r < pool_size_; r++) {
4332  int32 p = r + q * pool_size_;
4333  CuSubMatrix<BaseFloat> in_p(in_value.ColRange(p * pool_stride_, pool_stride_));
4334  CuSubMatrix<BaseFloat> out_q(out_value.ColRange(q * pool_stride_, pool_stride_));
4335  CuSubMatrix<BaseFloat> tgt(in_deriv->ColRange(p * pool_stride_, pool_stride_));
4336  CuMatrix<BaseFloat> src(out_deriv.ColRange(q * pool_stride_, pool_stride_));
4337  // zero-out mask
4338  CuMatrix<BaseFloat> mask;
4339  in_p.EqualElementMask(out_q, &mask);
4340  src.MulElements(mask);
4341  tgt.AddMat(1.0, src);
4342  // summed deriv info
4343  patch_summands[p] += 1;
4344  }
4345  }
4346 
4347  // scale in_deriv of overlaped pools
4348  for(int32 p = 0; p < num_patches; p++) {
4349  CuSubMatrix<BaseFloat> tgt(in_deriv->ColRange(p * pool_stride_, pool_stride_));
4350  KALDI_ASSERT(patch_summands[p] > 0);
4351  tgt.Scale(1.0 / patch_summands[p]);
4352  }
4353 }
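
The gradient routing in the listing above can be sketched on plain vectors. This is an illustration under stated assumptions, not the Kaldi implementation: MaxpoolBackpropRow is a hypothetical function, and a std::vector<float> stands in for one row of each matrix. It applies the same idea as the EqualElementMask() step: an input element receives the output derivative only where it equals the pooled maximum.

```cpp
#include <cassert>
#include <vector>

// Plain-vector sketch of the backward pass for ONE row (frame).
// Hypothetical function, not Kaldi code.
std::vector<float> MaxpoolBackpropRow(const std::vector<float> &in_value,
                                      const std::vector<float> &out_value,
                                      const std::vector<float> &out_deriv,
                                      int pool_size, int pool_stride) {
  assert(pool_size > 0 && pool_stride > 0);
  int input_dim = static_cast<int>(in_value.size());
  assert(input_dim % (pool_size * pool_stride) == 0);
  int num_pools = input_dim / pool_stride / pool_size;
  assert(static_cast<int>(out_value.size()) == num_pools * pool_stride);
  assert(out_deriv.size() == out_value.size());

  std::vector<float> in_deriv(in_value.size(), 0.0f);
  for (int q = 0; q < num_pools; q++) {
    for (int r = 0; r < pool_size; r++) {
      int p = r + q * pool_size;  // patch index
      for (int c = 0; c < pool_stride; c++) {
        int in_idx = p * pool_stride + c;
        int out_idx = q * pool_stride + c;
        // Mask is 1 where the input equals the pooled maximum, else 0.
        if (in_value[in_idx] == out_value[out_idx])
          in_deriv[in_idx] += out_deriv[out_idx];
      }
    }
  }
  return in_deriv;
}
```

Two observations follow from the listing itself. Every patch p is visited exactly once because pools do not overlap, so patch_summands[p] is always 1 and the final scaling loop leaves in_deriv unchanged. And when several inputs in a pool tie for the maximum, each tied input receives the full output derivative.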
virtual bool BackpropNeedsInput ( ) const
inline virtual

Reimplemented from Component.

Definition at line 494 of file nnet-component.h.

494 { return true; }
virtual bool BackpropNeedsOutput ( ) const
inline virtual

Reimplemented from Component.

Definition at line 495 of file nnet-component.h.

495 { return true; }
std::string Info ( ) const
virtual

Reimplemented from Component.

Definition at line 4380 of file nnet-component.cc.

References MaxpoolingComponent::input_dim_, MaxpoolingComponent::output_dim_, MaxpoolingComponent::pool_size_, MaxpoolingComponent::pool_stride_, and MaxpoolingComponent::Type().

4380  {
4381  std::stringstream stream;
4382  stream << Type() << ", input-dim = " << input_dim_
4383  << ", output-dim = " << output_dim_
4384  << ", pool-size = " << pool_size_
4385  << ", pool-stride = " << pool_stride_;
4386  return stream.str();
4387 }
void Init ( int32  input_dim,
int32  output_dim,
int32  pool_size,
int32  pool_stride 
)

Definition at line 4251 of file nnet-component.cc.

References MaxpoolingComponent::input_dim_, KALDI_ASSERT, MaxpoolingComponent::output_dim_, MaxpoolingComponent::pool_size_, and MaxpoolingComponent::pool_stride_.

Referenced by MaxpoolingComponent::InitFromString(), and MaxpoolingComponent::MaxpoolingComponent().

4252  {
4253  input_dim_ = input_dim;
4254  output_dim_ = output_dim;
4255  pool_size_ = pool_size;
4256  pool_stride_ = pool_stride;
4257 
4258  // sanity check
4259  // number of patches
4261  int32 num_patches = input_dim_ / pool_stride_;
4262  // number of pools
4263  KALDI_ASSERT(num_patches % pool_size_ == 0);
4264  int32 num_pools = num_patches / pool_size_;
4265  // check output dim
4266  KALDI_ASSERT(output_dim_ == num_pools * pool_stride_);
4267 }
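
The dimension relationship enforced by the assertions above can be checked ahead of time with a small helper. This is a sketch, not part of Kaldi: ExpectedOutputDim is a hypothetical name, and the divisibility check on input_dim is an assumption added here for safety rather than something shown in the listing.

```cpp
#include <cassert>

// Hypothetical helper (not in Kaldi): returns the output dimension implied by
// input_dim, pool_size and pool_stride, or -1 if the values are inconsistent.
int ExpectedOutputDim(int input_dim, int pool_size, int pool_stride) {
  assert(pool_size > 0 && pool_stride > 0);
  if (input_dim <= 0) return -1;
  // Assumption: require input_dim to be a whole number of patches.
  if (input_dim % pool_stride != 0) return -1;
  int num_patches = input_dim / pool_stride;
  // Mirrors KALDI_ASSERT(num_patches % pool_size_ == 0).
  if (num_patches % pool_size != 0) return -1;
  int num_pools = num_patches / pool_size;
  // Mirrors KALDI_ASSERT(output_dim_ == num_pools * pool_stride_).
  return num_pools * pool_stride;
}
```

For the example in the detailed description, ExpectedOutputDim(3840, 3, 128) gives 1280, while a pool size of 4 is rejected because 30 patches cannot be split into groups of 4.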
void InitFromString ( std::string  args)
virtual

Initialize, typically from a line of a config file.

The "args" will contain any parameters that need to be passed to the Component, e.g. dimensions.

Implements Component.

Definition at line 4269 of file nnet-component.cc.

References MaxpoolingComponent::Init(), KALDI_ERR, KALDI_LOG, kaldi::nnet2::ParseFromString(), and MaxpoolingComponent::Type().

Referenced by kaldi::nnet2::UnitTestMaxpoolingComponent().

4269  {
4270  std::string orig_args(args);
4271  int32 input_dim = 0;
4272  int32 output_dim = 0;
4273  int32 pool_size = -1, pool_stride = -1;
4274  bool ok = true;
4275 
4276  ok = ok && ParseFromString("input-dim", &args, &input_dim);
4277  ok = ok && ParseFromString("output-dim", &args, &output_dim);
4278  ok = ok && ParseFromString("pool-size", &args, &pool_size);
4279  ok = ok && ParseFromString("pool-stride", &args, &pool_stride);
4280 
4281  KALDI_LOG << output_dim << " " << input_dim << " " << ok;
4282  KALDI_LOG << "Pool: " << pool_size << " "
4283  << pool_stride << " " << ok;
4284  if (!ok || !args.empty() || output_dim <= 0)
4285  KALDI_ERR << "Invalid initializer for layer of type "
4286  << Type() << ": \"" << orig_args << "\"";
4287  Init(input_dim, output_dim, pool_size, pool_stride);
4288 }
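
The parsing pattern above (consume each named option, then require that nothing is left over) can be sketched without Kaldi. ParseIntOption below is a simplified, hypothetical stand-in for ParseFromString; it assumes options are written as whitespace-separated "name=value" tokens, which is an assumption about the config format rather than something this page states.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified stand-in for ParseFromString (hypothetical, not the Kaldi code):
// finds "name=<int>" among whitespace-separated tokens, stores the value in
// *value, and removes that token from *args.
bool ParseIntOption(const std::string &name, std::string *args, int *value) {
  assert(args != nullptr && value != nullptr);
  const std::string prefix = name + "=";
  std::istringstream iss(*args);
  std::string token, rest;
  bool found = false;
  while (iss >> token) {
    if (!found && token.compare(0, prefix.size(), prefix) == 0) {
      std::istringstream val(token.substr(prefix.size()));
      if (val >> *value) {
        found = true;
        continue;  // drop the consumed token
      }
    }
    if (!rest.empty()) rest += " ";
    rest += token;  // keep tokens that were not consumed
  }
  *args = rest;
  return found;
}
```

Under that assumption, a line such as "input-dim=3840 output-dim=1280 pool-size=3 pool-stride=128" is fully consumed after four calls, which corresponds to the args.empty() check in the listing: any leftover text signals an unrecognized option.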
virtual int32 InputDim ( ) const
inline virtual

Get size of input vectors.

Implements Component.

Definition at line 480 of file nnet-component.h.

References MaxpoolingComponent::input_dim_.

virtual int32 OutputDim ( ) const
inline virtual

Get size of output vectors.

Implements Component.

Definition at line 481 of file nnet-component.h.

References MaxpoolingComponent::output_dim_.

void Propagate ( const ChunkInfo &  in_info,
const ChunkInfo &  out_info,
const CuMatrixBase< BaseFloat > &  in,
CuMatrixBase< BaseFloat > *  out 
) const
virtual

Perform forward pass propagation Input->Output.

Each row of the input is one frame or training example. The rows are interpreted as "num_chunks" equally sized chunks of frames; this only matters for layers that do things like context splicing. Typically num_chunks will either be 1 (when we're processing a single contiguous chunk of data) or will be the same as in.NumFrames(), but other values are possible if some layers do splicing.

Implements Component.

Definition at line 4295 of file nnet-component.cc.

References ChunkInfo::CheckSize(), CuMatrixBase< Real >::ColRange(), MaxpoolingComponent::input_dim_, KALDI_ASSERT, ChunkInfo::NumChunks(), MaxpoolingComponent::pool_size_, MaxpoolingComponent::pool_stride_, and CuMatrixBase< Real >::Set().

4298  {
4299  in_info.CheckSize(in);
4300  out_info.CheckSize(*out);
4301  KALDI_ASSERT(in_info.NumChunks() == out_info.NumChunks());
4302  int32 num_patches = input_dim_ / pool_stride_;
4303  int32 num_pools = num_patches / pool_size_;
4304 
4305  // do the max-pooling
4306  for (int32 q = 0; q < num_pools; q++) {
4307  // get output buffer of the pool
4308  CuSubMatrix<BaseFloat> pool(out->ColRange(q * pool_stride_, pool_stride_));
4309  pool.Set(-1e20); // reset a large negative value
4310  for (int32 r = 0; r < pool_size_; r++) {
4311  // col-by-col block comparison pool
4312  int32 p = r + q * pool_size_;
4313  pool.Max(in.ColRange(p * pool_stride_, pool_stride_));
4314  }
4315  }
4316 }
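
The forward pass above can be sketched on a single row using plain vectors. This is an illustration, not Kaldi code: MaxpoolPropagateRow is a hypothetical function, and the innermost loop over c plays the role of the element-wise pool.Max() call on a block of pool_stride_ columns.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Plain-vector sketch of the forward pass for ONE row (frame).
// Hypothetical function, not Kaldi code.
std::vector<float> MaxpoolPropagateRow(const std::vector<float> &in,
                                       int pool_size, int pool_stride) {
  assert(pool_size > 0 && pool_stride > 0);
  int input_dim = static_cast<int>(in.size());
  assert(input_dim % (pool_size * pool_stride) == 0);
  int num_pools = input_dim / pool_stride / pool_size;
  // Start from a large negative value, as the listing does with Set(-1e20).
  std::vector<float> out(num_pools * pool_stride, -1e20f);
  for (int q = 0; q < num_pools; q++) {
    for (int r = 0; r < pool_size; r++) {
      int p = r + q * pool_size;  // patch index
      for (int c = 0; c < pool_stride; c++) {
        float &best = out[q * pool_stride + c];
        best = std::max(best, in[p * pool_stride + c]);
      }
    }
  }
  return out;
}
```

For example, with pool_size = 2 and pool_stride = 2, the row {1, 5, 3, 2, 2, 4, 7, 0} holds four patches of width 2; the first two patches pool to {3, 5} and the last two to {7, 4}.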
void Read ( std::istream &  is,
bool  binary 
)
virtual

Implements Component.

Definition at line 4355 of file nnet-component.cc.

References kaldi::nnet2::ExpectOneOrTwoTokens(), kaldi::ExpectToken(), MaxpoolingComponent::input_dim_, MaxpoolingComponent::output_dim_, MaxpoolingComponent::pool_size_, MaxpoolingComponent::pool_stride_, and kaldi::ReadBasicType().

4355  {
4356  ExpectOneOrTwoTokens(is, binary, "<MaxpoolingComponent>", "<InputDim>");
4357  ReadBasicType(is, binary, &input_dim_);
4358  ExpectToken(is, binary, "<OutputDim>");
4359  ReadBasicType(is, binary, &output_dim_);
4360  ExpectToken(is, binary, "<PoolSize>");
4361  ReadBasicType(is, binary, &pool_size_);
4362  ExpectToken(is, binary, "<PoolStride>");
4363  ReadBasicType(is, binary, &pool_stride_);
4364  ExpectToken(is, binary, "</MaxpoolingComponent>");
4365 }
virtual std::string Type ( ) const
inline virtual

Implements Component.

Definition at line 478 of file nnet-component.h.

Referenced by MaxpoolingComponent::Info(), and MaxpoolingComponent::InitFromString().

478 { return "MaxpoolingComponent"; }
void Write ( std::ostream &  os,
bool  binary 
) const
virtual

Write component to stream.

Implements Component.

Definition at line 4367 of file nnet-component.cc.

References MaxpoolingComponent::input_dim_, MaxpoolingComponent::output_dim_, MaxpoolingComponent::pool_size_, MaxpoolingComponent::pool_stride_, kaldi::WriteBasicType(), and kaldi::WriteToken().

4367  {
4368  WriteToken(os, binary, "<MaxpoolingComponent>");
4369  WriteToken(os, binary, "<InputDim>");
4370  WriteBasicType(os, binary, input_dim_);
4371  WriteToken(os, binary, "<OutputDim>");
4372  WriteBasicType(os, binary, output_dim_);
4373  WriteToken(os, binary, "<PoolSize>");
4374  WriteBasicType(os, binary, pool_size_);
4375  WriteToken(os, binary, "<PoolStride>");
4376  WriteBasicType(os, binary, pool_stride_);
4377  WriteToken(os, binary, "</MaxpoolingComponent>");
4378 }
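
The token order used by Write() and Read() can be illustrated with a text-only round trip. This is a sketch under stated assumptions, not the Kaldi I/O code: PoolParams, WriteText and ReadText are hypothetical names, binary mode is not modeled, and the exact whitespace Kaldi emits is not claimed. Unlike ExpectOneOrTwoTokens in Read(), this sketch always requires the opening <MaxpoolingComponent> token.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical container for the four serialized fields.
struct PoolParams {
  int input_dim, output_dim, pool_size, pool_stride;
};

// Writes the fields as space-separated tokens, in the order used by Write().
std::string WriteText(const PoolParams &p) {
  std::ostringstream os;
  os << "<MaxpoolingComponent> "
     << "<InputDim> " << p.input_dim << " "
     << "<OutputDim> " << p.output_dim << " "
     << "<PoolSize> " << p.pool_size << " "
     << "<PoolStride> " << p.pool_stride << " "
     << "</MaxpoolingComponent> ";
  return os.str();
}

// Reads the tokens back in the same order; returns false on any mismatch.
bool ReadText(const std::string &text, PoolParams *p) {
  assert(p != nullptr);
  std::istringstream is(text);
  std::string tok;
  auto expect = [&](const char *want) { return (is >> tok) && tok == want; };
  return expect("<MaxpoolingComponent>") &&
         expect("<InputDim>") && (is >> p->input_dim) &&
         expect("<OutputDim>") && (is >> p->output_dim) &&
         expect("<PoolSize>") && (is >> p->pool_size) &&
         expect("<PoolStride>") && (is >> p->pool_stride) &&
         expect("</MaxpoolingComponent>");
}
```

Because the reader expects tokens in a fixed order, Write() and Read() must be kept in step: adding or reordering a field in one requires the same change in the other.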

Member Data Documentation


The documentation for this class was generated from the following files: