ConvolutionalComponent Class Reference

ConvolutionalComponent implements convolution over a single axis (i.e. the frequency axis in case we are the 1st component in the NN).

#include <nnet-convolutional-component.h>


Public Member Functions

 ConvolutionalComponent (int32 dim_in, int32 dim_out)

 ~ConvolutionalComponent ()

Component * Copy () const
 Copy component (deep copy).

ComponentType GetType () const
 Get type identification of the component.

void InitData (std::istream &is)
 Initialize the content of the component by the 'line' from the prototype.

void ReadData (std::istream &is, bool binary)
 Reads the component content.

void WriteData (std::ostream &os, bool binary) const
 Writes the component content.

int32 NumParams () const
 Number of trainable parameters.

void GetGradient (VectorBase< BaseFloat > *gradient) const
 Get gradient reshaped as a vector.

void GetParams (VectorBase< BaseFloat > *params) const
 Get the trainable parameters reshaped as a vector.

void SetParams (const VectorBase< BaseFloat > &params)
 Set the trainable parameters from a vector (the parameters reshaped as a vector).

std::string Info () const
 Print some additional info (after <ComponentName> and the dims).

std::string InfoGradient () const
 Print some additional info about gradient (after <...> and dims).

void PropagateFnc (const CuMatrixBase< BaseFloat > &in, CuMatrixBase< BaseFloat > *out)
 Abstract interface for propagation/backpropagation.

void ReverseIndexes (const std::vector< int32 > &forward_indexes, std::vector< std::vector< int32 > > *backward_indexes)

void RearrangeIndexes (const std::vector< std::vector< int32 > > &in, std::vector< std::vector< int32 > > *out)

void BackpropagateFnc (const CuMatrixBase< BaseFloat > &in, const CuMatrixBase< BaseFloat > &out, const CuMatrixBase< BaseFloat > &out_diff, CuMatrixBase< BaseFloat > *in_diff)
 Backward pass transformation (to be implemented by descending class...)

void Update (const CuMatrixBase< BaseFloat > &input, const CuMatrixBase< BaseFloat > &diff)
 Compute gradient and update parameters.
 
- Public Member Functions inherited from UpdatableComponent
 UpdatableComponent (int32 input_dim, int32 output_dim)

virtual ~UpdatableComponent ()

bool IsUpdatable () const
 Check if contains trainable parameters.

virtual void SetTrainOptions (const NnetTrainOptions &opts)
 Set the training options to the component.

const NnetTrainOptions & GetTrainOptions () const
 Get the training options from the component.

virtual void SetLearnRateCoef (BaseFloat val)
 Set the learn-rate coefficient.

virtual void SetBiasLearnRateCoef (BaseFloat val)
 Set the learn-rate coefficient for bias.
 
- Public Member Functions inherited from Component
 Component (int32 input_dim, int32 output_dim)
 Generic interface of a component.

virtual ~Component ()

virtual bool IsMultistream () const
 Check if component has 'Recurrent' interface (trainable and recurrent).

int32 InputDim () const
 Get the dimension of the input.

int32 OutputDim () const
 Get the dimension of the output.

void Propagate (const CuMatrixBase< BaseFloat > &in, CuMatrix< BaseFloat > *out)
 Perform forward-pass propagation 'in' -> 'out'.

void Backpropagate (const CuMatrixBase< BaseFloat > &in, const CuMatrixBase< BaseFloat > &out, const CuMatrixBase< BaseFloat > &out_diff, CuMatrix< BaseFloat > *in_diff)
 Perform backward-pass propagation 'out_diff' -> 'in_diff'.

void Write (std::ostream &os, bool binary) const
 Write the component to a stream.
 

Private Attributes

int32 patch_dim_
 number of consecutive inputs, 1st dim of patch

int32 patch_step_
 step of the convolution (i.e. size of the shift between patch positions)

int32 patch_stride_
 shift for 2nd dim of a patch (i.e. frame length before splicing)

CuMatrix< BaseFloat > filters_
 matrix of filters, each row is one vectorized rectangular filter

CuVector< BaseFloat > bias_
 bias for each filter

CuMatrix< BaseFloat > filters_grad_
 gradient of filters

CuVector< BaseFloat > bias_grad_
 gradient of biases

BaseFloat max_norm_
 limit L2 norm of a neuron's weights to a positive value

CuMatrix< BaseFloat > vectorized_feature_patches_
 Buffer of reshaped inputs: 1row = vectorized rectangular feature patches, 1col = dim over speech frames.

std::vector< int32 > column_map_
 Map of input features: std::vector-dim = patch-position.

CuMatrix< BaseFloat > feature_patch_diffs_
 Buffer for backpropagation: derivatives in the domain of 'vectorized_feature_patches_', 1row = vectorized rectangular feature patches, 1col = dim over speech frames.
 

Additional Inherited Members

- Public Types inherited from Component
enum  ComponentType {
  kUnknown = 0x0, kUpdatableComponent = 0x0100, kAffineTransform, kLinearTransform,
  kConvolutionalComponent, kLstmProjected, kBlstmProjected, kRecurrentComponent,
  kActivationFunction = 0x0200, kSoftmax, kHiddenSoftmax, kBlockSoftmax,
  kSigmoid, kTanh, kParametricRelu, kDropout,
  kLengthNormComponent, kTranform = 0x0400, kRbm, kSplice,
  kCopy, kTranspose, kBlockLinearity, kAddShift,
  kRescale, kKlHmm = 0x0800, kSentenceAveragingComponent, kSimpleSentenceAveragingComponent,
  kAveragePoolingComponent, kMaxPoolingComponent, kFramePoolingComponent, kParallelComponent,
  kMultiBasisComponent
}
 Component type identification mechanism.

- Static Public Member Functions inherited from Component
static const char * TypeToMarker (ComponentType t)
 Converts component type to marker.

static ComponentType MarkerToType (const std::string &s)
 Converts marker to component type (case insensitive).

static Component * Init (const std::string &conf_line)
 Initialize component from a line in config file.

static Component * Read (std::istream &is, bool binary)
 Read the component from a stream (static method).

- Static Public Attributes inherited from Component
static const struct key_value kMarkerMap []
 The table with pairs of Component types and markers (defined in nnet-component.cc).

- Protected Attributes inherited from UpdatableComponent
NnetTrainOptions opts_
 Option-class with training hyper-parameters.

BaseFloat learn_rate_coef_
 Scalar applied to learning rate for weight matrices (to be used in ::Update method).

BaseFloat bias_learn_rate_coef_
 Scalar applied to learning rate for bias (to be used in ::Update method).

- Protected Attributes inherited from Component
int32 input_dim_
 Dimension of the input of the Component.

int32 output_dim_
 Dimension of the output of the Component.
 

Detailed Description

ConvolutionalComponent implements convolution over a single axis (i.e. the frequency axis in case we are the 1st component in the NN). We don't do convolution along the temporal axis, which simplifies the implementation (and was not helpful for Tara).

We assume the input features are spliced, i.e. each frame is in fact a set of stacked frames, where we can form patches which span several frequency bands and the whole time axis.

The convolution is done over the whole axis with the same filters, i.e. we don't use separate filters for different 'regions' of the frequency axis.

In order to have a fast implementation, the filters are represented in vectorized form, where each rectangular filter corresponds to a row in a matrix in which all the filters are stored. The features are then re-shaped to a set of matrices, where one matrix corresponds to a single patch-position at which all the filters get applied.
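The vectorized layout can be illustrated with plain std::vector code. This is only a sketch for a single frame, not the Kaldi implementation: ForwardOneFrame and its argument names are hypothetical, and the real component works on CuMatrix buffers holding many frames at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Kaldi code: forward pass for one frame.
// 'patches' holds num_patches blocks of filter_dim values each
// (one block per patch position); 'filters' has one filter per row.
std::vector<float> ForwardOneFrame(
    const std::vector<float>& patches,
    const std::vector<std::vector<float>>& filters,
    const std::vector<float>& bias) {
  const std::size_t num_filters = filters.size();
  assert(num_filters > 0 && bias.size() == num_filters);
  const std::size_t filter_dim = filters[0].size();
  assert(filter_dim > 0 && patches.size() % filter_dim == 0);
  const std::size_t num_patches = patches.size() / filter_dim;

  // Output layout: all filter activations of patch 0, then patch 1, ...
  std::vector<float> out(num_patches * num_filters);
  for (std::size_t p = 0; p < num_patches; ++p) {
    for (std::size_t f = 0; f < num_filters; ++f) {
      float acc = bias[f];  // start from the bias of filter f
      for (std::size_t k = 0; k < filter_dim; ++k)
        acc += patches[p * filter_dim + k] * filters[f][k];
      out[p * num_filters + f] = acc;
    }
  }
  return out;
}
```

Every patch position reuses the same filter rows, which is why a single matrix product per position is enough in the matrix-based implementation.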

The type of convolution is controlled by hyperparameters:
patch_dim_ ... frequency axis size of the patch
patch_step_ ... size of shift in the convolution
patch_stride_ ... shift for 2nd dim of a patch (i.e. frame length before splicing)
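The derived sizes follow directly from these hyperparameters and the component dimensions. The sketch below restates the arithmetic used in the listings on this page; ConvDims and DeriveConvDims are hypothetical names, and throwing exceptions on inconsistent values is a choice of this sketch, not of the library.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper, not part of Kaldi: the derived sizes of the layer.
struct ConvDims {
  int num_splice;   // number of stacked frames in one input row
  int num_patches;  // number of positions at which the filters are applied
  int filter_dim;   // length of one vectorized filter
  int num_filters;  // number of filters (rows of the filter matrix)
};

ConvDims DeriveConvDims(int input_dim, int output_dim,
                        int patch_dim, int patch_step, int patch_stride) {
  if (patch_dim <= 0 || patch_step <= 0 || patch_stride < patch_dim)
    throw std::invalid_argument("bad patch hyperparameters");
  if (input_dim % patch_stride != 0)
    throw std::invalid_argument("input_dim is not a multiple of patch_stride");
  if ((patch_stride - patch_dim) % patch_step != 0)
    throw std::invalid_argument("patches do not tile the frequency axis");

  ConvDims d;
  d.num_splice = input_dim / patch_stride;
  d.num_patches = 1 + (patch_stride - patch_dim) / patch_step;
  d.filter_dim = d.num_splice * patch_dim;
  if (output_dim % d.num_patches != 0)
    throw std::invalid_argument("output_dim is not a multiple of num_patches");
  d.num_filters = output_dim / d.num_patches;
  return d;
}
```

For example, with illustrative values input_dim = 440, output_dim = 4224, patch_dim = 8, patch_step = 1 and patch_stride = 40, this gives 11 spliced frames, 33 patch positions, a filter length of 88 and 128 filters.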

Because the convolution uses the same weights repeatedly, the final gradient is a sum of all position-specific gradients (the sum was found better than averaging).
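A minimal sketch of this summation, assuming the buffer layouts described on this page (patch blocks and output blocks stored side by side in each row). SumFilterGradient is a hypothetical name, and the loops stand in for the matrix products used by the component.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Kaldi code: filter gradient summed over all
// frames and all patch positions.
//   patches: frames x (num_patches * filter_dim)
//   diff:    frames x (num_patches * num_filters)
std::vector<std::vector<float>> SumFilterGradient(
    const std::vector<std::vector<float>>& patches,
    const std::vector<std::vector<float>>& diff,
    std::size_t num_patches, std::size_t num_filters,
    std::size_t filter_dim) {
  assert(patches.size() == diff.size());
  std::vector<std::vector<float>> grad(
      num_filters, std::vector<float>(filter_dim, 0.0f));
  for (std::size_t t = 0; t < patches.size(); ++t) {
    assert(patches[t].size() == num_patches * filter_dim);
    assert(diff[t].size() == num_patches * num_filters);
    for (std::size_t p = 0; p < num_patches; ++p)      // every position
      for (std::size_t f = 0; f < num_filters; ++f)    // every filter
        for (std::size_t k = 0; k < filter_dim; ++k)   // every weight
          grad[f][k] += diff[t][p * num_filters + f] *
                        patches[t][p * filter_dim + k];
  }
  return grad;
}
```

Note that the contributions of the individual positions are added, not divided by num_patches.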

Definition at line 66 of file nnet-convolutional-component.h.

Constructor & Destructor Documentation

◆ ConvolutionalComponent()

ConvolutionalComponent ( int32  dim_in,
int32  dim_out 
)
inline

Definition at line 68 of file nnet-convolutional-component.h.

Referenced by ConvolutionalComponent::Copy().

68  :
69  UpdatableComponent(dim_in, dim_out),
70  patch_dim_(0),
71  patch_step_(0),
72  patch_stride_(0),
73  max_norm_(0.0)
74  { }

◆ ~ConvolutionalComponent()

Definition at line 76 of file nnet-convolutional-component.h.

77  { }

Member Function Documentation

◆ BackpropagateFnc()

void BackpropagateFnc ( const CuMatrixBase< BaseFloat > &  in,
const CuMatrixBase< BaseFloat > &  out,
const CuMatrixBase< BaseFloat > &  out_diff,
CuMatrixBase< BaseFloat > *  in_diff 
)
inline virtual

Backward pass transformation (to be implemented by descending class...)

Implements Component.

Definition at line 368 of file nnet-convolutional-component.h.

References CuMatrixBase< Real >::AddCols(), CuMatrixBase< Real >::AddMatMat(), CuMatrixBase< Real >::ColRange(), ConvolutionalComponent::column_map_, ConvolutionalComponent::feature_patch_diffs_, ConvolutionalComponent::filters_, kaldi::kNoTrans, ConvolutionalComponent::patch_dim_, ConvolutionalComponent::patch_step_, ConvolutionalComponent::patch_stride_, ConvolutionalComponent::RearrangeIndexes(), and ConvolutionalComponent::ReverseIndexes().

371  {
372  // useful dims
373  int32 num_patches = 1 + (patch_stride_ - patch_dim_) / patch_step_;
374  int32 num_filters = filters_.NumRows();
375  int32 filter_dim = filters_.NumCols();
376 
377  // backpropagate to vector of matrices
378  // (corresponding to position of a filter)
379  for (int32 p = 0; p < num_patches; p++) {
380  CuSubMatrix<BaseFloat> patch_diff(feature_patch_diffs_.ColRange(
381  p * filter_dim, filter_dim));
382  CuSubMatrix<BaseFloat> out_diff_patch(out_diff.ColRange(
383  p * num_filters, num_filters));
384  patch_diff.AddMatMat(1.0, out_diff_patch, kNoTrans,
385  filters_, kNoTrans, 0.0);
386  }
387 
388  // sum the derivatives into in_diff, we will compensate #summands
389  std::vector<std::vector<int32> > reversed_column_map;
390  ReverseIndexes(column_map_, &reversed_column_map);
391  std::vector<std::vector<int32> > rearranged_column_map;
392  RearrangeIndexes(reversed_column_map, &rearranged_column_map);
393  for (int32 p = 0; p < rearranged_column_map.size(); p++) {
394  CuArray<int32> cu_cols(rearranged_column_map[p]);
395  in_diff->AddCols(feature_patch_diffs_, cu_cols);
396  }
397  }

◆ Copy()

Component* Copy ( ) const
inline virtual

Copy component (deep copy).

Implements Component.

Definition at line 79 of file nnet-convolutional-component.h.

References ConvolutionalComponent::ConvolutionalComponent().

79 { return new ConvolutionalComponent(*this); }

◆ GetGradient()

void GetGradient ( VectorBase< BaseFloat > *  gradient) const
inline virtual

Get gradient reshaped as a vector.

Implements UpdatableComponent.

Definition at line 221 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, VectorBase< Real >::Dim(), ConvolutionalComponent::filters_, KALDI_ASSERT, ConvolutionalComponent::NumParams(), and VectorBase< Real >::Range().

221  {
222  KALDI_ASSERT(gradient->Dim() == NumParams());
223  int32 filters_num_elem = filters_.NumRows() * filters_.NumCols();
224  gradient->Range(0, filters_num_elem).CopyRowsFromMat(filters_);
225  gradient->Range(filters_num_elem, bias_.Dim()).CopyFromVec(bias_);
226  }

◆ GetParams()

void GetParams ( VectorBase< BaseFloat > *  params) const
inline virtual

Get the trainable parameters reshaped as a vector.

Implements UpdatableComponent.

Definition at line 228 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, VectorBase< Real >::Dim(), ConvolutionalComponent::filters_, KALDI_ASSERT, ConvolutionalComponent::NumParams(), and VectorBase< Real >::Range().

228  {
229  KALDI_ASSERT(params->Dim() == NumParams());
230  int32 filters_num_elem = filters_.NumRows() * filters_.NumCols();
231  params->Range(0, filters_num_elem).CopyRowsFromMat(filters_);
232  params->Range(filters_num_elem, bias_.Dim()).CopyFromVec(bias_);
233  }

◆ GetType()

ComponentType GetType ( ) const
inline virtual

Get type identification of the component.

Implements Component.

Definition at line 80 of file nnet-convolutional-component.h.

References Component::kConvolutionalComponent.

◆ Info()

std::string Info ( ) const
inline virtual

Print some additional info (after <ComponentName> and the dims).

Reimplemented from Component.

Definition at line 242 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, UpdatableComponent::bias_learn_rate_coef_, ConvolutionalComponent::filters_, UpdatableComponent::learn_rate_coef_, ConvolutionalComponent::max_norm_, kaldi::nnet1::MomentStatistics(), and kaldi::nnet1::ToString().

242  {
243  return std::string("\n filters") + MomentStatistics(filters_) +
244  ", lr-coef " + ToString(learn_rate_coef_) +
245  ", max-norm " + ToString(max_norm_) +
246  "\n bias" + MomentStatistics(bias_) +
247  ", lr-coef " + ToString(bias_learn_rate_coef_);
248  }

◆ InfoGradient()

std::string InfoGradient ( ) const
inline virtual

Print some additional info about gradient (after <...> and dims).

Reimplemented from Component.

Definition at line 250 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_grad_, UpdatableComponent::bias_learn_rate_coef_, ConvolutionalComponent::filters_grad_, UpdatableComponent::learn_rate_coef_, ConvolutionalComponent::max_norm_, kaldi::nnet1::MomentStatistics(), and kaldi::nnet1::ToString().

250  {
251  return std::string("\n filters_grad") + MomentStatistics(filters_grad_) +
252  ", lr-coef " + ToString(learn_rate_coef_) +
253  ", max-norm " + ToString(max_norm_) +
254  "\n bias_grad" + MomentStatistics(bias_grad_) +
255  ", lr-coef " + ToString(bias_learn_rate_coef_);
256  }

◆ InitData()

void InitData ( std::istream &  is)
inline virtual

Initialize the content of the component by the 'line' from the prototype.

Implements UpdatableComponent.

Definition at line 82 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, UpdatableComponent::bias_learn_rate_coef_, ConvolutionalComponent::filters_, Component::input_dim_, KALDI_ASSERT, KALDI_ERR, KALDI_LOG, UpdatableComponent::learn_rate_coef_, ConvolutionalComponent::max_norm_, Component::output_dim_, ConvolutionalComponent::patch_dim_, ConvolutionalComponent::patch_step_, ConvolutionalComponent::patch_stride_, kaldi::nnet1::RandGauss(), kaldi::nnet1::RandUniform(), kaldi::ReadBasicType(), and kaldi::ReadToken().

82  {
83  // define options
84  BaseFloat bias_mean = -2.0, bias_range = 2.0, param_stddev = 0.1;
85  // parse config
86  std::string token;
87  while (is >> std::ws, !is.eof()) {
88  ReadToken(is, false, &token);
89  if (token == "<ParamStddev>") ReadBasicType(is, false, &param_stddev);
90  else if (token == "<BiasMean>") ReadBasicType(is, false, &bias_mean);
91  else if (token == "<BiasRange>") ReadBasicType(is, false, &bias_range);
92  else if (token == "<PatchDim>") ReadBasicType(is, false, &patch_dim_);
93  else if (token == "<PatchStep>") ReadBasicType(is, false, &patch_step_);
94  else if (token == "<PatchStride>") ReadBasicType(is, false, &patch_stride_);
95  else if (token == "<LearnRateCoef>") ReadBasicType(is, false, &learn_rate_coef_);
96  else if (token == "<BiasLearnRateCoef>") ReadBasicType(is, false, &bias_learn_rate_coef_);
97  else if (token == "<MaxNorm>") ReadBasicType(is, false, &max_norm_);
98  else KALDI_ERR << "Unknown token " << token << ", a typo in config?"
99  << " (ParamStddev|BiasMean|BiasRange|PatchDim|PatchStep|PatchStride)";
100  }
101 
102  //
103  // Sanity checks:
104  //
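105  // splice (input are spliced frames):
106  KALDI_ASSERT(input_dim_ % patch_stride_ == 0);
107  int32 num_splice = input_dim_ / patch_stride_;
108  KALDI_LOG << "num_splice " << num_splice;
109  // number of patches:
110  KALDI_ASSERT((patch_stride_ - patch_dim_) % patch_step_ == 0);
111  int32 num_patches = 1 + (patch_stride_ - patch_dim_) / patch_step_;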
112  KALDI_LOG << "num_patches " << num_patches;
113  // filter dim:
114  int32 filter_dim = num_splice * patch_dim_;
115  KALDI_LOG << "filter_dim " << filter_dim;
116  // num filters:
117  KALDI_ASSERT(output_dim_ % num_patches == 0);
118  int32 num_filters = output_dim_ / num_patches;
119  KALDI_LOG << "num_filters " << num_filters;
120  //
121 
122  //
123  // Initialize trainable parameters,
124  //
125  // Gaussian with given std_dev (mean = 0),
126  filters_.Resize(num_filters, filter_dim);
127  RandGauss(0.0, param_stddev, &filters_);
128  // Uniform,
129  bias_.Resize(num_filters);
130  RandUniform(bias_mean, bias_range, &bias_);
131  }
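The token loop above can be imitated with the standard library alone. The following is a simplified sketch, not the Kaldi parser: PatchConfig and ParsePatchConfig are hypothetical names, only a subset of the tokens is handled, and std::istringstream extraction replaces ReadToken and ReadBasicType.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical container for a subset of the options parsed by InitData().
struct PatchConfig {
  int patch_dim = 0;
  int patch_step = 0;
  int patch_stride = 0;
  float max_norm = 0.0f;
};

// Reads "<Token> value" pairs until the end of the line is reached.
PatchConfig ParsePatchConfig(const std::string& line) {
  std::istringstream is(line);
  PatchConfig cfg;
  std::string token;
  while (is >> token) {
    if (token == "<PatchDim>") is >> cfg.patch_dim;
    else if (token == "<PatchStep>") is >> cfg.patch_step;
    else if (token == "<PatchStride>") is >> cfg.patch_stride;
    else if (token == "<MaxNorm>") is >> cfg.max_norm;
    else throw std::invalid_argument("Unknown token " + token);
    if (is.fail())
      throw std::invalid_argument("Missing or malformed value after " + token);
  }
  return cfg;
}
```

Calling ParsePatchConfig("<PatchDim> 8 <PatchStep> 1 <PatchStride> 40") with these illustrative values fills the three patch fields and leaves max_norm at its default of 0.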

◆ NumParams()

int32 NumParams ( ) const
inline virtual

Number of trainable parameters.

Implements UpdatableComponent.

Definition at line 217 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, and ConvolutionalComponent::filters_.

Referenced by ConvolutionalComponent::GetGradient(), ConvolutionalComponent::GetParams(), and ConvolutionalComponent::SetParams().

217  {
218  return filters_.NumRows()*filters_.NumCols() + bias_.Dim();
219  }

◆ PropagateFnc()

void PropagateFnc ( const CuMatrixBase< BaseFloat > &  in,
CuMatrixBase< BaseFloat > *  out 
)
inline virtual

Abstract interface for propagation/backpropagation.

Forward pass transformation (to be implemented by descending class...)

Implements Component.

Definition at line 258 of file nnet-convolutional-component.h.

References CuMatrixBase< Real >::AddVecToRows(), ConvolutionalComponent::bias_, CuMatrixBase< Real >::ColRange(), ConvolutionalComponent::column_map_, rnnlm::d, ConvolutionalComponent::feature_patch_diffs_, ConvolutionalComponent::filters_, Component::input_dim_, kaldi::kNoTrans, kaldi::kSetZero, kaldi::kTrans, kaldi::kUndefined, CuMatrixBase< Real >::NumRows(), ConvolutionalComponent::patch_dim_, ConvolutionalComponent::patch_step_, ConvolutionalComponent::patch_stride_, and ConvolutionalComponent::vectorized_feature_patches_.

259  {
260  // useful dims
261  int32 num_splice = input_dim_ / patch_stride_;
262  int32 num_patches = 1 + (patch_stride_ - patch_dim_) / patch_step_;
263  int32 num_filters = filters_.NumRows();
264  int32 num_frames = in.NumRows();
265  int32 filter_dim = filters_.NumCols();
266 
267  // we will need the buffers
268  if (vectorized_feature_patches_.NumRows() != num_frames) {
269  vectorized_feature_patches_.Resize(num_frames, filter_dim * num_patches, kUndefined);
270  feature_patch_diffs_.Resize(num_frames, filter_dim * num_patches, kSetZero);
271  }
272 
273  /* Prepare feature patches, the layout is:
274  * |----------|----------|----------|---------| (in = spliced frames)
275  * xxx xxx xxx xxx (x = selected elements)
276  *
277  * xxx : patch dim
278  * xxx
279  * ^---: patch step
280  * |----------| : patch stride
281  *
282  * xxx-xxx-xxx-xxx : filter dim
283  *
284  */
285  // build-up a column selection map:
286  int32 index = 0;
287  column_map_.resize(filter_dim * num_patches);
288  for (int32 p = 0; p < num_patches; p++) {
289  for (int32 s = 0; s < num_splice; s++) {
290  for (int32 d = 0; d < patch_dim_; d++) {
291  column_map_[index] = p * patch_step_ + s * patch_stride_ + d;
292  index++;
293  }
294  }
295  }
296  // select the columns
297  CuArray<int32> cu_column_map(column_map_);
298  vectorized_feature_patches_.CopyCols(in, cu_column_map);
299 
300  // compute filter activations
301  for (int32 p = 0; p < num_patches; p++) {
302  CuSubMatrix<BaseFloat> tgt(out->ColRange(p * num_filters, num_filters));
303  CuSubMatrix<BaseFloat> patch(vectorized_feature_patches_.ColRange(
304  p * filter_dim, filter_dim));
305  tgt.AddVecToRows(1.0, bias_, 0.0); // add bias
306  // apply all filters
307  tgt.AddMatMat(1.0, patch, kNoTrans, filters_, kTrans, 1.0);
308  }
309  }
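The column selection map built in the listing above is easiest to follow on a small case. The sketch below reproduces the same triple loop with std::vector; BuildColumnMap is a hypothetical name and the sizes in the example are made up.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the column-map loop of PropagateFnc().
// Entry k of the result is the input column copied to column k of the
// patch buffer.
std::vector<int> BuildColumnMap(int num_splice, int num_patches,
                                int patch_dim, int patch_step,
                                int patch_stride) {
  std::vector<int> column_map;
  column_map.reserve(num_patches * num_splice * patch_dim);
  for (int p = 0; p < num_patches; ++p)        // patch position
    for (int s = 0; s < num_splice; ++s)       // spliced frame
      for (int d = 0; d < patch_dim; ++d)      // band inside the patch
        column_map.push_back(p * patch_step + s * patch_stride + d);
  return column_map;
}
```

With 2 spliced frames of 4 bands each (patch_stride = 4), patch_dim = 2 and patch_step = 1 there are 3 patch positions, and the map is 0 1 4 5, 1 2 5 6, 2 3 6 7. Input columns 1, 2, 5 and 6 are selected twice because neighbouring patches overlap.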

◆ ReadData()

void ReadData ( std::istream &  is,
bool  binary 
)
inline virtual

Reads the component content.

Reimplemented from Component.

Definition at line 133 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, UpdatableComponent::bias_learn_rate_coef_, kaldi::ExpectToken(), ConvolutionalComponent::filters_, Component::input_dim_, KALDI_ASSERT, UpdatableComponent::learn_rate_coef_, ConvolutionalComponent::max_norm_, Component::output_dim_, ConvolutionalComponent::patch_dim_, ConvolutionalComponent::patch_step_, ConvolutionalComponent::patch_stride_, kaldi::PeekToken(), and kaldi::ReadBasicType().

133  {
134  // convolution hyperparameters,
135  ExpectToken(is, binary, "<PatchDim>");
136  ReadBasicType(is, binary, &patch_dim_);
137  ExpectToken(is, binary, "<PatchStep>");
138  ReadBasicType(is, binary, &patch_step_);
139  ExpectToken(is, binary, "<PatchStride>");
140  ReadBasicType(is, binary, &patch_stride_);
141 
142  // variant-length list of parameters,
143  bool end_loop = false;
144  while (!end_loop) {
145  int first_char = PeekToken(is, binary);
146  switch (first_char) {
147  case 'L': ExpectToken(is, binary, "<LearnRateCoef>");
148  ReadBasicType(is, binary, &learn_rate_coef_);
149  break;
150  case 'B': ExpectToken(is, binary, "<BiasLearnRateCoef>");
151  ReadBasicType(is, binary, &bias_learn_rate_coef_);
152  break;
153  case 'M': ExpectToken(is, binary, "<MaxNorm>");
154  ReadBasicType(is, binary, &max_norm_);
155  break;
156  case '!': ExpectToken(is, binary, "<!EndOfComponent>");
157  default: end_loop = true;
158  }
159  }
160 
161  // trainable parameters
162  ExpectToken(is, binary, "<Filters>");
163  filters_.Read(is, binary);
164  ExpectToken(is, binary, "<Bias>");
165  bias_.Read(is, binary);
166 
167  //
168  // Sanity checks:
169  //
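170  // splice (input are spliced frames):
171  KALDI_ASSERT(input_dim_ % patch_stride_ == 0);
172  int32 num_splice = input_dim_ / patch_stride_;
173  // number of patches:
174  KALDI_ASSERT((patch_stride_ - patch_dim_) % patch_step_ == 0);
175  int32 num_patches = 1 + (patch_stride_ - patch_dim_) / patch_step_;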
176  // filter dim:
177  int32 filter_dim = num_splice * patch_dim_;
178  // num filters:
179  KALDI_ASSERT(output_dim_ % num_patches == 0);
180  int32 num_filters = output_dim_ / num_patches;
181  // check parameter dims:
182  KALDI_ASSERT(num_filters == filters_.NumRows());
183  KALDI_ASSERT(num_filters == bias_.Dim());
184  KALDI_ASSERT(filter_dim == filters_.NumCols());
185  //
186  }

◆ RearrangeIndexes()

void RearrangeIndexes ( const std::vector< std::vector< int32 > > &  in,
std::vector< std::vector< int32 > > *  out 
)
inline

Definition at line 351 of file nnet-convolutional-component.h.

References rnnlm::i, and rnnlm::j.

Referenced by ConvolutionalComponent::BackpropagateFnc().

352  {
353  int32 D = in.size();
354  int32 L = 0;
355  for (int32 i = 0; i < D; i++)
356  if (in[i].size() > L)
357  L = in[i].size();
358  out->resize(L);
359  for (int32 i = 0; i < L; i++)
360  (*out)[i].resize(D, -1);
361  for (int32 i = 0; i < D; i++) {
362  for (int32 j = 0; j < in[i].size(); j++) {
363  (*out)[j][i] = in[i][j];
364  }
365  }
366  }

◆ ReverseIndexes()

void ReverseIndexes ( const std::vector< int32 > &  forward_indexes,
std::vector< std::vector< int32 > > *  backward_indexes 
)
inline

Definition at line 323 of file nnet-convolutional-component.h.

References rnnlm::i, Component::input_dim_, rnnlm::j, and KALDI_ASSERT.

Referenced by ConvolutionalComponent::BackpropagateFnc().

324  {
325  int32 i;
326  int32 size = forward_indexes.size();
327  backward_indexes->resize(input_dim_);
328  int32 reserve_size = 2+ forward_indexes.size() / input_dim_;
329  std::vector<std::vector<int32> >::iterator iter = backward_indexes->begin(),
330  end = backward_indexes->end();
331  for (; iter != end; ++iter)
332  iter->reserve(reserve_size);
333  for (int32 j = 0; j < size; j++) {
334  i = forward_indexes[j];
335  KALDI_ASSERT(i < input_dim_);
336  (*backward_indexes)[i].push_back(j);
337  }
338  }
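ReverseIndexes() and RearrangeIndexes() are used together in BackpropagateFnc(), so a combined sketch may help. ReverseMap and Rearrange below are hypothetical std::vector re-implementations written for illustration, not the member functions themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For every input column, list the patch-buffer columns copied from it.
std::vector<std::vector<int>> ReverseMap(const std::vector<int>& forward,
                                         int input_dim) {
  std::vector<std::vector<int>> backward(input_dim);
  for (std::size_t j = 0; j < forward.size(); ++j) {
    assert(forward[j] >= 0 && forward[j] < input_dim);
    backward[forward[j]].push_back(static_cast<int>(j));
  }
  return backward;
}

// Transpose the ragged lists into rows of equal length, padded with -1.
// Row r holds, for each input column, its r-th source column (or -1).
std::vector<std::vector<int>> Rearrange(
    const std::vector<std::vector<int>>& in) {
  std::size_t longest = 0;
  for (const std::vector<int>& v : in) longest = std::max(longest, v.size());
  std::vector<std::vector<int>> out(longest,
                                    std::vector<int>(in.size(), -1));
  for (std::size_t i = 0; i < in.size(); ++i)
    for (std::size_t j = 0; j < in[i].size(); ++j)
      out[j][i] = in[i][j];
  return out;
}
```

For the forward map 0 1 4 5 1 2 5 6 2 3 6 7 over 8 input columns, the rearranged result has two rows: 0 1 5 9 2 3 7 11 and -1 4 8 -1 -1 6 10 -1. Each row can then drive one column-wise accumulation into in_diff, with -1 marking input columns that have no further contribution.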

◆ SetParams()

void SetParams ( const VectorBase< BaseFloat > &  params)
inline virtual

Set the trainable parameters from a vector (the parameters reshaped as a vector).

Implements UpdatableComponent.

Definition at line 235 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, VectorBase< Real >::Dim(), ConvolutionalComponent::filters_, KALDI_ASSERT, ConvolutionalComponent::NumParams(), and VectorBase< Real >::Range().

235  {
236  KALDI_ASSERT(params.Dim() == NumParams());
237  int32 filters_num_elem = filters_.NumRows() * filters_.NumCols();
238  filters_.CopyRowsFromVec(params.Range(0, filters_num_elem));
239  bias_.CopyFromVec(params.Range(filters_num_elem, bias_.Dim()));
240  }
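GetParams() and SetParams() share one flat layout: the filter matrix row by row, followed by the bias vector. A small round-trip sketch with std::vector is given below; FlattenParams and UnflattenParams are hypothetical names, not Kaldi functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Filters row by row, then the bias (same order as GetParams()).
std::vector<float> FlattenParams(
    const std::vector<std::vector<float>>& filters,
    const std::vector<float>& bias) {
  std::vector<float> params;
  for (const std::vector<float>& row : filters)
    params.insert(params.end(), row.begin(), row.end());
  params.insert(params.end(), bias.begin(), bias.end());
  return params;
}

// Inverse operation (same order as SetParams()); shapes must already match.
void UnflattenParams(const std::vector<float>& params,
                     std::vector<std::vector<float>>* filters,
                     std::vector<float>* bias) {
  std::size_t expected = bias->size();
  for (const std::vector<float>& row : *filters) expected += row.size();
  assert(params.size() == expected);
  std::size_t k = 0;
  for (std::vector<float>& row : *filters)
    for (float& w : row) w = params[k++];
  for (float& b : *bias) b = params[k++];
}
```

The total length equals NumParams(), i.e. rows times columns of the filter matrix plus the bias dimension.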

◆ Update()

void Update ( const CuMatrixBase< BaseFloat > &  input,
const CuMatrixBase< BaseFloat > &  diff 
)
inline virtual

Compute gradient and update parameters.

Implements UpdatableComponent.

Definition at line 400 of file nnet-convolutional-component.h.

References CuVectorBase< Real >::AddColSumMat(), CuVectorBase< Real >::ApplyFloor(), ConvolutionalComponent::bias_, ConvolutionalComponent::bias_grad_, UpdatableComponent::bias_learn_rate_coef_, CuMatrixBase< Real >::ColRange(), ConvolutionalComponent::filters_, ConvolutionalComponent::filters_grad_, CuVectorBase< Real >::InvertElements(), kaldi::kNoTrans, kaldi::kSetZero, kaldi::kTrans, NnetTrainOptions::learn_rate, UpdatableComponent::learn_rate_coef_, ConvolutionalComponent::max_norm_, CuMatrixBase< Real >::MulElements(), UpdatableComponent::opts_, ConvolutionalComponent::patch_dim_, ConvolutionalComponent::patch_step_, ConvolutionalComponent::patch_stride_, CuVectorBase< Real >::Scale(), and ConvolutionalComponent::vectorized_feature_patches_.

401  {
402  // useful dims
403  int32 num_patches = 1 + (patch_stride_ - patch_dim_) / patch_step_;
404  int32 num_filters = filters_.NumRows();
405  int32 filter_dim = filters_.NumCols();
406 
407  // we use the following hyperparameters from the option class
408  const BaseFloat lr = opts_.learn_rate;
409 
410  //
411  // calculate the gradient
412  //
413  filters_grad_.Resize(num_filters, filter_dim, kSetZero); // reset
414  bias_grad_.Resize(num_filters, kSetZero); // reset
415  // use all the patches
416  for (int32 p = 0; p < num_patches; p++) { // sum
417  CuSubMatrix<BaseFloat> diff_patch(diff.ColRange(p * num_filters,
418  num_filters));
419  CuSubMatrix<BaseFloat> patch(vectorized_feature_patches_.ColRange(
420  p * filter_dim, filter_dim));
421  filters_grad_.AddMatMat(1.0, diff_patch, kTrans, patch, kNoTrans, 1.0);
422  bias_grad_.AddRowSumMat(1.0, diff_patch, 1.0);
423  }
424 
425  //
426  // update
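427  //
428  filters_.AddMat(-lr * learn_rate_coef_, filters_grad_);
429  bias_.AddVec(-lr * bias_learn_rate_coef_, bias_grad_);
430  //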
431 
432  // max-norm
433  if (max_norm_ > 0.0) {
434  CuMatrix<BaseFloat> lin_sqr(filters_);
435  lin_sqr.MulElements(filters_);
436  CuVector<BaseFloat> l2(filters_.NumRows());
437  l2.AddColSumMat(1.0, lin_sqr, 0.0);
438  l2.ApplyPow(0.5); // we have per-neuron L2 norms
439  CuVector<BaseFloat> scl(l2);
440  scl.Scale(1.0/max_norm_);
441  scl.ApplyFloor(1.0);
442  scl.InvertElements();
443  filters_.MulRowsVec(scl); // shrink to sphere!
444  }
445  }
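
The max-norm block at the end of Update() can be read as a per-row rescaling: each filter whose L2 norm exceeds max_norm_ is shrunk back onto the sphere of that radius, and all other filters are left unchanged. The following is a minimal CPU sketch of that step, assuming a row-major std::vector in place of CuMatrix; `ApplyMaxNorm` is a hypothetical name, not a Kaldi function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rescales each row of a row-major matrix so that its L2 norm is at most
// max_norm. Rows that are already inside the sphere are multiplied by 1.
void ApplyMaxNorm(std::vector<float>* w, std::size_t num_rows,
                  std::size_t num_cols, float max_norm) {
  assert(w->size() == num_rows * num_cols);
  if (max_norm <= 0.0f) return;  // same guard as in Update()
  for (std::size_t r = 0; r < num_rows; ++r) {
    float sum_sq = 0.0f;
    for (std::size_t c = 0; c < num_cols; ++c) {
      const float v = (*w)[r * num_cols + c];
      sum_sq += v * v;
    }
    const float l2 = std::sqrt(sum_sq);  // per-row L2 norm
    // Scale, floor at 1.0, invert: scl = 1 / max(1, l2 / max_norm).
    const float scl = 1.0f / std::max(1.0f, l2 / max_norm);
    for (std::size_t c = 0; c < num_cols; ++c) {
      (*w)[r * num_cols + c] *= scl;
    }
  }
}
```

For example, with max_norm = 1 a row (3, 4) has norm 5 and is scaled by 0.2, while a row (0.3, 0.4) has norm 0.5 and is left as it is.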

◆ WriteData()

void WriteData ( std::ostream &  os,
bool  binary 
) const
inlinevirtual

Writes the component content.

Reimplemented from Component.

Definition at line 188 of file nnet-convolutional-component.h.

References ConvolutionalComponent::bias_, UpdatableComponent::bias_learn_rate_coef_, ConvolutionalComponent::filters_, UpdatableComponent::learn_rate_coef_, ConvolutionalComponent::max_norm_, ConvolutionalComponent::patch_dim_, ConvolutionalComponent::patch_step_, ConvolutionalComponent::patch_stride_, kaldi::WriteBasicType(), and kaldi::WriteToken().

188  {
189  // convolution hyperparameters
190  WriteToken(os, binary, "<PatchDim>");
191  WriteBasicType(os, binary, patch_dim_);
192  WriteToken(os, binary, "<PatchStep>");
193  WriteBasicType(os, binary, patch_step_);
194  WriteToken(os, binary, "<PatchStride>");
195  WriteBasicType(os, binary, patch_stride_);
196  if (!binary) os << "\n";
197 
198  // re-scale learn rate
199  WriteToken(os, binary, "<LearnRateCoef>");
200  WriteBasicType(os, binary, learn_rate_coef_);
201  WriteToken(os, binary, "<BiasLearnRateCoef>");
202  WriteBasicType(os, binary, bias_learn_rate_coef_);
203  // max-norm regularization
204  WriteToken(os, binary, "<MaxNorm>");
205  WriteBasicType(os, binary, max_norm_);
206  if (!binary) os << "\n";
207 
208  // trainable parameters
209  WriteToken(os, binary, "<Filters>");
210  if (!binary) os << "\n";
211  filters_.Write(os, binary);
212  WriteToken(os, binary, "<Bias>");
213  if (!binary) os << "\n";
214  bias_.Write(os, binary);
215  }
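
The order of the hyperparameter tokens written by WriteData() can be sketched with a plain string stream. This is only an illustration of the token order in text mode: `PutToken`, `PutValue` and `WriteHeader` are hypothetical helpers, their whitespace handling is an assumption and is not claimed to match Kaldi's WriteToken() and WriteBasicType() exactly, and the binary format and the `<Filters>` and `<Bias>` payloads are left out.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical text-mode helpers (assumed formatting: item followed by a space).
inline void PutToken(std::ostream& os, const char* token) { os << token << ' '; }

template <typename T>
inline void PutValue(std::ostream& os, T value) { os << value << ' '; }

// Emits the hyperparameter header in the same token order as WriteData().
std::string WriteHeader(int patch_dim, int patch_step, int patch_stride,
                        float learn_rate_coef, float bias_learn_rate_coef,
                        float max_norm) {
  assert(patch_dim > 0 && patch_step > 0);
  std::ostringstream os;
  // convolution hyperparameters
  PutToken(os, "<PatchDim>");
  PutValue(os, patch_dim);
  PutToken(os, "<PatchStep>");
  PutValue(os, patch_step);
  PutToken(os, "<PatchStride>");
  PutValue(os, patch_stride);
  os << '\n';
  // learning-rate scaling and max-norm regularization
  PutToken(os, "<LearnRateCoef>");
  PutValue(os, learn_rate_coef);
  PutToken(os, "<BiasLearnRateCoef>");
  PutValue(os, bias_learn_rate_coef);
  PutToken(os, "<MaxNorm>");
  PutValue(os, max_norm);
  os << '\n';
  return os.str();
}
```

Every token is followed directly by its value, which is the pairing a matching reader relies on when it parses the stream back.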

Member Data Documentation

◆ bias_

CuVector<BaseFloat> bias_

bias for each filter

◆ bias_grad_

CuVector<BaseFloat> bias_grad_
private

gradient of biases

Definition at line 458 of file nnet-convolutional-component.h.

Referenced by ConvolutionalComponent::InfoGradient(), and ConvolutionalComponent::Update().

◆ column_map_

std::vector<int32> column_map_
private

◆ feature_patch_diffs_

CuMatrix<BaseFloat> feature_patch_diffs_
private

Buffer for backpropagation: derivatives in the domain of 'vectorized_feature_patches_', 1row = vectorized rectangular feature patches, 1col = dim over speech frames.

Definition at line 476 of file nnet-convolutional-component.h.

Referenced by ConvolutionalComponent::BackpropagateFnc(), and ConvolutionalComponent::PropagateFnc().

◆ filters_

CuMatrix<BaseFloat> filters_

◆ filters_grad_

CuMatrix<BaseFloat> filters_grad_
private

gradient of filters

Definition at line 457 of file nnet-convolutional-component.h.

Referenced by ConvolutionalComponent::InfoGradient(), and ConvolutionalComponent::Update().
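
The accumulation that fills filters_grad_ and bias_grad_ in Update() sums, over all patch positions, the product of the transposed output derivative block with the matching input patch block. The following is a minimal CPU sketch of that sum, assuming row-major std::vector buffers with one row per frame; `ConvGrad` and `AccumulateGradient` are hypothetical names, not Kaldi types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the two gradients (not a Kaldi type).
struct ConvGrad {
  std::vector<float> filters_grad;  // row-major, num_filters x filter_dim
  std::vector<float> bias_grad;     // num_filters
};

// diff:    num_frames x (num_patches * num_filters), row-major
// patches: num_frames x (num_patches * filter_dim),  row-major
ConvGrad AccumulateGradient(const std::vector<float>& diff,
                            const std::vector<float>& patches,
                            std::size_t num_frames, std::size_t num_patches,
                            std::size_t num_filters, std::size_t filter_dim) {
  assert(diff.size() == num_frames * num_patches * num_filters);
  assert(patches.size() == num_frames * num_patches * filter_dim);
  ConvGrad g{std::vector<float>(num_filters * filter_dim, 0.0f),
             std::vector<float>(num_filters, 0.0f)};
  const std::size_t diff_cols = num_patches * num_filters;
  const std::size_t patch_cols = num_patches * filter_dim;
  for (std::size_t p = 0; p < num_patches; ++p) {    // use all the patches
    for (std::size_t t = 0; t < num_frames; ++t) {   // sum over frames
      for (std::size_t f = 0; f < num_filters; ++f) {
        const float d = diff[t * diff_cols + p * num_filters + f];
        g.bias_grad[f] += d;  // sum of the derivative over rows and patches
        for (std::size_t k = 0; k < filter_dim; ++k) {
          // element (f, k) of diff_patch^T * patch
          g.filters_grad[f * filter_dim + k] +=
              d * patches[t * patch_cols + p * filter_dim + k];
        }
      }
    }
  }
  return g;
}
```

Because the same filters are applied at every patch position, each position contributes to the same gradient matrix, which is why the loop adds into the accumulators instead of overwriting them.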

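◆ max_norm_

BaseFloat max_norm_

limit the L2 norm of a neuron's weights to a positive value

◆ patch_dim_

int32 patch_dim_

number of consecutive inputs, 1st dim of patch

◆ patch_step_

int32 patch_step_

step of the convolution

◆ patch_stride_

int32 patch_stride_

shift for 2nd dim of a patch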

◆ vectorized_feature_patches_

CuMatrix<BaseFloat> vectorized_feature_patches_
private

Buffer of reshaped inputs: 1row = vectorized rectangular feature patches, 1col = dim over speech frames. Map of input features: std::vector-dim = patch-position.

Definition at line 468 of file nnet-convolutional-component.h.

Referenced by ConvolutionalComponent::PropagateFnc(), and ConvolutionalComponent::Update().
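
How Update() addresses this buffer can be sketched with two small helpers: the number of patch positions, and the column range that one patch occupies in a buffer row. Both formulas are taken from the Update() listing; the names `NumPatches` and `PatchColumns` are hypothetical and the sketch does not cover how the buffer is filled in PropagateFnc().

```cpp
#include <cassert>
#include <utility>

// Number of patch positions along the convolved axis, as computed in Update():
// num_patches = 1 + (patch_stride - patch_dim) / patch_step (integer division).
inline int NumPatches(int patch_stride, int patch_dim, int patch_step) {
  assert(patch_step > 0 && patch_dim > 0 && patch_stride >= patch_dim);
  return 1 + (patch_stride - patch_dim) / patch_step;
}

// Half-open column range [first, second) that patch p occupies in one row of
// the buffer, matching ColRange(p * filter_dim, filter_dim) in Update().
inline std::pair<int, int> PatchColumns(int p, int filter_dim) {
  assert(p >= 0 && filter_dim > 0);
  return {p * filter_dim, (p + 1) * filter_dim};
}
```

For example, a stride of 40 with patches of 8 inputs and a step of 1 gives 33 patch positions, and with a step of 4 it gives 9.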


The documentation for this class was generated from the following file: