// nnet3/nnet-normalize-component.h

// Copyright 2011-2013  Karel Vesely
//           2012-2015  Johns Hopkins University (author: Daniel Povey)
//                2013  Xiaohui Zhang
//           2014-2015  Vijayaditya Peddinti
//           2014-2015  Guoguo Chen
//                2015  Daniel Galvez
//                2015  Tom Ko

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//  http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

#ifndef KALDI_NNET3_NNET_NORMALIZE_COMPONENT_H_
#define KALDI_NNET3_NNET_NORMALIZE_COMPONENT_H_

#include "nnet3/nnet-common.h"
#include "nnet3/nnet-component-itf.h"
#include <iostream>

namespace kaldi {
namespace nnet3 {

/*
   NormalizeComponent implements the function:

       y = x * (sqrt(dim(x)) * target-rms) / |x|

   where |x| is the 2-norm of the vector x.  I.e. its output is its input
   scaled such that the root-mean-square value of its elements equals
   target-rms.  (As a special case, if the input is zero, it outputs zero.)
   This is like Hinton's layer-norm, except that it normalizes only the
   variance, not the mean.

   Note: if you specify add-log-stddev=true, it adds an extra element to
   y which equals log(|x| / sqrt(dim(x))).

   Configuration values accepted:
      dim, or input-dim   Input dimension of this component, e.g. 1024.
                          Will be the same as the output dimension if
                          add-log-stddev=false.
      block-dim           Defaults to 'dim', but you may specify a divisor
                          of 'dim'.  In this case the input dimension will
                          be interpreted as blocks of dimension 'block-dim',
                          to which the nonlinearity described above is
                          applied separately.
      add-log-stddev      You can set this to true to add an extra output
                          dimension which will equal log(|x| / sqrt(dim(x))).
                          If block-dim is specified, this is done per block.
      target-rms          This defaults to 1.0, but if you set it to another
                          (nonzero) value, the output will be scaled by this
                          factor.
 */
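// As a sanity check, the forward computation described above can be sketched
// outside Kaldi on plain CPU vectors.  The `Normalize` and `Rms` helpers
// below are illustrative only, not part of this header or of Kaldi's API:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU sketch of the NormalizeComponent forward computation:
//   y = x * (sqrt(dim(x)) * target_rms) / |x|
// where |x| is the 2-norm.  A zero input produces a zero output.
std::vector<double> Normalize(const std::vector<double> &x, double target_rms) {
  double sumsq = 0.0;
  for (double v : x) sumsq += v * v;
  std::vector<double> y(x.size(), 0.0);
  if (sumsq == 0.0) return y;  // special case mentioned above.
  double scale =
      std::sqrt(static_cast<double>(x.size())) * target_rms / std::sqrt(sumsq);
  for (std::size_t i = 0; i < x.size(); i++) y[i] = x[i] * scale;
  return y;
}

// Root-mean-square of a vector; after Normalize(), Rms(y) equals target_rms
// for any nonzero input.
double Rms(const std::vector<double> &v) {
  double sumsq = 0.0;
  for (double x : v) sumsq += x * x;
  return std::sqrt(sumsq / v.size());
}
```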
class NormalizeComponent: public Component {
 public:
  explicit NormalizeComponent(const NormalizeComponent &other);

  virtual int32 Properties() const {
    return kSimpleComponent|kBackpropNeedsInput|kBackpropAdds|
        (add_log_stddev_ ? 0 : kPropagateInPlace|kBackpropInPlace) |
        (block_dim_ != input_dim_ ? kInputContiguous|kOutputContiguous : 0);
  }
  NormalizeComponent() { }
  virtual std::string Type() const { return "NormalizeComponent"; }
  virtual void InitFromConfig(ConfigLine *cfl);
  virtual Component* Copy() const { return new NormalizeComponent(*this); }
  virtual void* Propagate(const ComponentPrecomputedIndexes *indexes,
                          const CuMatrixBase<BaseFloat> &in,
                          CuMatrixBase<BaseFloat> *out) const;
  virtual void Backprop(const std::string &debug_info,
                        const ComponentPrecomputedIndexes *indexes,
                        const CuMatrixBase<BaseFloat> &in_value,
                        const CuMatrixBase<BaseFloat> &, // out_value
                        const CuMatrixBase<BaseFloat> &out_deriv,
                        void *memo,
                        Component *to_update,
                        CuMatrixBase<BaseFloat> *in_deriv) const;

  virtual void Read(std::istream &is, bool binary);
  virtual void Write(std::ostream &os, bool binary) const;
  virtual int32 InputDim() const { return input_dim_; }
  virtual int32 OutputDim() const {
    return (input_dim_ + (add_log_stddev_ ? (input_dim_ / block_dim_) : 0));
  }
  virtual std::string Info() const;
 private:
  NormalizeComponent &operator = (const NormalizeComponent &other); // Disallow.
  enum { kExpSquaredNormFloor = -66 };
  // kSquaredNormFloor is about 0.7e-20.  We need a value that's exactly
  // representable in float and whose inverse square root is also exactly
  // representable in float (hence, an even power of two).
  static const BaseFloat kSquaredNormFloor;
  int32 input_dim_;
  int32 block_dim_;
  BaseFloat target_rms_;  // The target rms for outputs, default 1.0.

  bool add_log_stddev_;  // If true, log(max(epsi, sqrt(row_in^T row_in / D)))
                         // is an extra dimension of the output.
};
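// The OutputDim() arithmetic above is easy to misread; this hypothetical
// standalone version (a free function, not Kaldi API) shows how block-dim
// and add-log-stddev interact: one extra log-stddev element per block.

```cpp
#include <cassert>

// Mirrors NormalizeComponent::OutputDim(): with add-log-stddev=true, one
// extra element is appended per block of size block_dim, so the output
// dimension is input_dim + input_dim / block_dim.
int OutputDim(int input_dim, int block_dim, bool add_log_stddev) {
  return input_dim + (add_log_stddev ? input_dim / block_dim : 0);
}
```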


/*
  BatchNormComponent

  This implements batch normalization; for each dimension of the input it
  normalizes the data to be zero-mean and unit-variance.  You can set the
  block-dim configuration value to implement spatial batch normalization;
  see the comment for the block_dim_ variable below.

  If you want to combine this with the trainable offset and scale that the
  original batch-norm paper used, then follow this component by a
  ScaleAndOffsetComponent.

  It's a simple component (it uses the kSimpleComponent flag), but it is
  unusual in that it will give different results if you call it on half the
  matrix at a time.  Most of the time this would be pretty harmless, so we
  still return the kSimpleComponent flag.  We may have to modify the test
  code a little to account for this, or possibly remove the kSimpleComponent
  flag.  In some sense each output Index depends on every input Index, but
  putting those dependencies explicitly into the dependency-tracking
  framework as a GeneralComponent would be very impractical and might lead
  to a lot of unnecessary things being computed.  You have to be a bit
  careful where you put this component and understand what you're doing;
  e.g. putting it in the path of a recurrence is a bit problematic if the
  minibatch size is small.

  Accepted configuration values:
     dim          Dimension of the input and output.
     block-dim    Defaults to 'dim', but may be set to a divisor of 'dim'.
                  In this case, each block of dimension 'block-dim' is
                  treated like a separate row of the input matrix, which
                  means that the stats from the n'th element of each block
                  are pooled into one class, for each n.
     epsilon      Small term added to the variance that is used to prevent
                  division by zero.
     target-rms   This defaults to 1.0, but if set, for instance, to 2.0, it
                  will normalize the standard deviation of the output to
                  2.0.  'target-stddev' might be a more suitable name, but
                  this was chosen for consistency with NormalizeComponent.
 */
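// The training-mode normalization described above can be sketched for a
// single input dimension on plain CPU vectors.  BatchNormDim, Mean and
// UncenteredVar below are illustrative helpers, not Kaldi API:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU sketch of batch normalization for one input dimension, training mode:
// pool stats over the rows of the minibatch, then apply
//   scale  = target_rms / sqrt(var + epsilon)
//   offset = -mean * scale
std::vector<double> BatchNormDim(const std::vector<double> &x,
                                 double epsilon, double target_rms) {
  double sum = 0.0, sumsq = 0.0;
  for (double v : x) { sum += v; sumsq += v * v; }
  double n = static_cast<double>(x.size());
  double mean = sum / n;
  double var = sumsq / n - mean * mean;  // uncentered variance minus mean^2
  double scale = target_rms / std::sqrt(var + epsilon);
  double offset = -mean * scale;
  std::vector<double> y(x.size());
  for (std::size_t i = 0; i < x.size(); i++) y[i] = x[i] * scale + offset;
  return y;
}

// Helpers to check the invariants: output mean ~ 0, output variance
// (uncentered, since the mean is zero) ~ target_rms^2.
double Mean(const std::vector<double> &v) {
  double s = 0.0;
  for (double x : v) s += x;
  return s / v.size();
}

double UncenteredVar(const std::vector<double> &v) {
  double s = 0.0;
  for (double x : v) s += x * x;
  return s / v.size();
}
```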
class BatchNormComponent: public Component {
 public:

  BatchNormComponent() { }

  // call this with 'true' to set 'test mode', where the batch normalization
  // is done with stored stats.  There won't normally be any need to specially
  // accumulate these stats; they are stored as a matter of course on each
  // iteration of training, as for NonlinearComponents, and we'll use the
  // stats from the most recent [script-level] iteration.
  // (Note: it will refuse to actually set test-mode to true if there
  // are no stats stored.)
  void SetTestMode(bool test_mode);

  // constructor using another component
  BatchNormComponent(const BatchNormComponent &other);

  virtual int32 InputDim() const { return dim_; }
  virtual int32 OutputDim() const { return dim_; }

  virtual std::string Info() const;
  virtual void InitFromConfig(ConfigLine *cfl);
  virtual std::string Type() const { return "BatchNormComponent"; }
  virtual int32 Properties() const {
    // If the block-dim is less than the dim, we need the input and output
    // matrices to be contiguous (stride==num-cols), as we'll be reshaping
    // internally.  This is not much of a cost, because this will be used
    // in convnets where we have to do this anyway.
    return kSimpleComponent|kBackpropNeedsInput|kPropagateInPlace|
        kBackpropInPlace|
        (block_dim_ < dim_ ? kInputContiguous|kOutputContiguous : 0)|
        (test_mode_ ? 0 : kUsesMemo|kStoresStats);
  }
  virtual void* Propagate(const ComponentPrecomputedIndexes *indexes,
                          const CuMatrixBase<BaseFloat> &in,
                          CuMatrixBase<BaseFloat> *out) const;
  virtual void Backprop(const std::string &debug_info,
                        const ComponentPrecomputedIndexes *indexes,
                        const CuMatrixBase<BaseFloat> &in_value,
                        const CuMatrixBase<BaseFloat> &out_value,
                        const CuMatrixBase<BaseFloat> &out_deriv,
                        void *memo,
                        Component *, // to_update
                        CuMatrixBase<BaseFloat> *in_deriv) const;

  virtual void Read(std::istream &is, bool binary);  // This Read function
  // requires that the Component has the correct type.

  virtual void Write(std::ostream &os, bool binary) const;
  virtual Component* Copy() const { return new BatchNormComponent(*this); }

  virtual void Scale(BaseFloat scale);
  virtual void Add(BaseFloat alpha, const Component &other);
  virtual void ZeroStats();


  virtual void DeleteMemo(void *memo) const { delete static_cast<Memo*>(memo); }

  virtual void StoreStats(const CuMatrixBase<BaseFloat> &in_value,
                          const CuMatrixBase<BaseFloat> &out_value,
                          void *memo);

  // Members specific to this component type.
  // Note: the offset and scale will only be nonempty in 'test mode'.
  const CuVector<BaseFloat> &Offset() const { return offset_; }
  const CuVector<BaseFloat> &Scale() const { return scale_; }

 private:

  struct Memo {
    // number of frames (after any reshaping).
    int32 num_frames;
    // 'sum_sumsq_scale' is of dimension 5 by block_dim_:
    // Row 0 = mean = the mean of the rows of the input
    // Row 1 = uvar = the uncentered variance of the input (= sumsq / num_frames).
    // Row 2 = scale = the scale of the renormalization.
    // Rows 3 and 4 are used as temporaries in Backprop.
    CuMatrix<BaseFloat> sum_sumsq_scale;
  };

  void Check() const;

  // this function is used in a couple of places; it turns the raw stats into
  // the offset/scale term of a normalizing transform.
  static void ComputeOffsetAndScale(double count,
                                    BaseFloat epsilon,
                                    const Vector<double> &stats_sum,
                                    const Vector<double> &stats_sumsq,
                                    Vector<BaseFloat> *offset,
                                    Vector<BaseFloat> *scale);
  // computes derived parameters offset_ and scale_.
  void ComputeDerived();

  // Dimension of the input and output.
  int32 dim_;
  // This would normally be the same as dim_, but if it's less (and it must be
  // > 0 and must divide dim_), then each separate block of the input of
  // dimension 'block_dim_' is treated like a separate frame for the purposes
  // of normalization.  This can be used to implement spatial batch
  // normalization for convolutional setups-- assuming the filter-dim has
  // stride 1, which it always will in the new code in
  // nnet-convolutional-component.h.
  int32 block_dim_;

  // Used to avoid exact-zero variances; epsilon has the dimension of a
  // variance.
  BaseFloat epsilon_;

  // This value will normally be 1.0, which is the default, but you can set it
  // to other values as a way to control how fast the following layer learns
  // (smaller -> slower).  The same config exists in NormalizeComponent.
  BaseFloat target_rms_;

  // This is true if we want the batch normalization to operate in 'test
  // mode', meaning the data mean and stddev used for the normalization are
  // fixed quantities based on previously accumulated stats.  Note: the stats
  // we use for this are based on the same 'StoreStats' mechanism as we use
  // for components like SigmoidComponent and ReluComponent; we'll be using
  // the stats from the most recent [script-level] iteration of training.
  bool test_mode_;


  // total count of stats stored by StoreStats().
  double count_;
  // sum-of-data component of stats of input data.
  CuVector<double> stats_sum_;
  // sum-of-squared component of stats of input data.
  CuVector<double> stats_sumsq_;

  // offset_ and scale_ are derived from stats_sum_ and stats_sumsq_; they
  // dictate the transform that is done in 'test mode'.  They are set only
  // when reading the model from disk and when calling SetTestMode(true);
  // they are resized to empty when the stats are updated, to ensure that
  // out-of-date values are not kept around.
  CuVector<BaseFloat> offset_;
  CuVector<BaseFloat> scale_;
};
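// The block_dim_ reshaping described in the comments above can be sketched
// as a plain CPU operation: each row of the input is split into
// dim / block_dim consecutive blocks, and every block is then treated as a
// separate row when accumulating stats.  SplitIntoBlocks is an illustrative
// helper, not Kaldi API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits each row of 'in' into consecutive blocks of size block_dim; the
// returned matrix has (dim / block_dim) times as many rows, each of
// dimension block_dim, so the n'th element of every block is pooled into
// one stats class.
std::vector<std::vector<double> > SplitIntoBlocks(
    const std::vector<std::vector<double> > &in, std::size_t block_dim) {
  std::vector<std::vector<double> > out;
  for (const std::vector<double> &row : in)
    for (std::size_t start = 0; start + block_dim <= row.size();
         start += block_dim)
      out.push_back(std::vector<double>(row.begin() + start,
                                        row.begin() + start + block_dim));
  return out;
}
```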



} // namespace nnet3
} // namespace kaldi


#endif  // KALDI_NNET3_NNET_NORMALIZE_COMPONENT_H_