BlstmProjected Class Reference

#include <nnet-blstm-projected.h>


Public Member Functions

 BlstmProjected (int32 input_dim, int32 output_dim)
 
 ~BlstmProjected ()
 
Component * Copy () const
 Copy component (deep copy).
 
ComponentType GetType () const
 Get the type identification of the component.
 
void InitData (std::istream &is)
 Initialize the content of the component from the 'line' in the prototype.
 
void ReadData (std::istream &is, bool binary)
 Reads the component content.
 
void WriteData (std::ostream &os, bool binary) const
 Writes the component content.
 
int32 NumParams () const
 Number of trainable parameters.
 
void GetGradient (VectorBase< BaseFloat > *gradient) const
 Get the gradient reshaped as a vector.
 
void GetParams (VectorBase< BaseFloat > *params) const
 Get the trainable parameters reshaped as a vector.
 
void SetParams (const VectorBase< BaseFloat > &params)
 Set the trainable parameters from a vector (the parameters reshaped as a vector).
 
std::string Info () const
 Print some additional info (after <ComponentName> and the dims).
 
std::string InfoGradient () const
 Print some additional info about the gradient (after <...> and the dims).
 
void PropagateFnc (const CuMatrixBase< BaseFloat > &in, CuMatrixBase< BaseFloat > *out)
 Abstract interface for propagation/backpropagation.
 
void BackpropagateFnc (const CuMatrixBase< BaseFloat > &in, const CuMatrixBase< BaseFloat > &out, const CuMatrixBase< BaseFloat > &out_diff, CuMatrixBase< BaseFloat > *in_diff)
 Backward pass transformation (to be implemented by the descending class).
 
void Update (const CuMatrixBase< BaseFloat > &input, const CuMatrixBase< BaseFloat > &diff)
 Compute gradient and update parameters.
 
- Public Member Functions inherited from MultistreamComponent
 MultistreamComponent (int32 input_dim, int32 output_dim)
 
bool IsMultistream () const
 Check if component has 'Recurrent' interface (trainable and recurrent).
 
virtual void SetSeqLengths (const std::vector< int32 > &sequence_lengths)
 
int32 NumStreams () const
 
virtual void ResetStreams (const std::vector< int32 > &stream_reset_flag)
 Optional function to reset the transfer of context (not used for BLSTMs).
 
- Public Member Functions inherited from UpdatableComponent
 UpdatableComponent (int32 input_dim, int32 output_dim)
 
virtual ~UpdatableComponent ()
 
bool IsUpdatable () const
 Check if the component contains trainable parameters.
 
virtual void SetTrainOptions (const NnetTrainOptions &opts)
 Set the training options of the component.
 
const NnetTrainOptions & GetTrainOptions () const
 Get the training options from the component.
 
virtual void SetLearnRateCoef (BaseFloat val)
 Set the learn-rate coefficient.
 
virtual void SetBiasLearnRateCoef (BaseFloat val)
 Set the learn-rate coefficient for bias.
 
- Public Member Functions inherited from Component
 Component (int32 input_dim, int32 output_dim)
 Generic interface of a component.
 
virtual ~Component ()
 
int32 InputDim () const
 Get the dimension of the input.
 
int32 OutputDim () const
 Get the dimension of the output.
 
void Propagate (const CuMatrixBase< BaseFloat > &in, CuMatrix< BaseFloat > *out)
 Perform forward-pass propagation 'in' -> 'out'.
 
void Backpropagate (const CuMatrixBase< BaseFloat > &in, const CuMatrixBase< BaseFloat > &out, const CuMatrixBase< BaseFloat > &out_diff, CuMatrix< BaseFloat > *in_diff)
 Perform backward-pass propagation 'out_diff' -> 'in_diff'.
 
void Write (std::ostream &os, bool binary) const
 Write the component to a stream.
 

Private Attributes

int32 cell_dim_
 The number of memory-cell blocks.
 
int32 proj_dim_
 Recurrent projection layer dimension.
 
BaseFloat cell_clip_
 Clipping of 'cell-values' in forward pass (per-frame).
 
BaseFloat diff_clip_
 Clipping of 'derivatives' in backprop (per-frame).
 
BaseFloat cell_diff_clip_
 Clipping of 'cell-derivatives' accumulated over CEC (per-frame).
 
BaseFloat grad_clip_
 Clipping of the updates.
 
CuMatrix< BaseFloat > f_w_gifo_x_
 
CuMatrix< BaseFloat > f_w_gifo_x_corr_
 
CuMatrix< BaseFloat > b_w_gifo_x_
 
CuMatrix< BaseFloat > b_w_gifo_x_corr_
 
CuMatrix< BaseFloat > f_w_gifo_r_
 
CuMatrix< BaseFloat > f_w_gifo_r_corr_
 
CuMatrix< BaseFloat > b_w_gifo_r_
 
CuMatrix< BaseFloat > b_w_gifo_r_corr_
 
CuVector< BaseFloat > f_bias_
 
CuVector< BaseFloat > f_bias_corr_
 
CuVector< BaseFloat > b_bias_
 
CuVector< BaseFloat > b_bias_corr_
 
CuVector< BaseFloat > f_peephole_i_c_
 
CuVector< BaseFloat > f_peephole_f_c_
 
CuVector< BaseFloat > f_peephole_o_c_
 
CuVector< BaseFloat > b_peephole_i_c_
 
CuVector< BaseFloat > b_peephole_f_c_
 
CuVector< BaseFloat > b_peephole_o_c_
 
CuVector< BaseFloat > f_peephole_i_c_corr_
 
CuVector< BaseFloat > f_peephole_f_c_corr_
 
CuVector< BaseFloat > f_peephole_o_c_corr_
 
CuVector< BaseFloat > b_peephole_i_c_corr_
 
CuVector< BaseFloat > b_peephole_f_c_corr_
 
CuVector< BaseFloat > b_peephole_o_c_corr_
 
CuMatrix< BaseFloat > f_w_r_m_
 
CuMatrix< BaseFloat > f_w_r_m_corr_
 
CuMatrix< BaseFloat > b_w_r_m_
 
CuMatrix< BaseFloat > b_w_r_m_corr_
 
CuMatrix< BaseFloat > f_propagate_buf_
 
CuMatrix< BaseFloat > b_propagate_buf_
 
CuMatrix< BaseFloat > f_backpropagate_buf_
 
CuMatrix< BaseFloat > b_backpropagate_buf_
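
The clip attributes bound values elementwise. In the BackpropagateFnc listing on this page, diff_clip_ and cell_diff_clip_ are applied as ApplyFloor(-clip) followed by ApplyCeiling(clip), and only when the threshold is positive. The following is a minimal sketch of that behaviour on a plain std::vector. ClipInPlace is a hypothetical helper written for illustration, not a Kaldi function, and it is not claimed to cover how cell_clip_ or grad_clip_ are used.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for ApplyFloor(-clip) followed by ApplyCeiling(clip).
// Not part of Kaldi; operates on a std::vector instead of a CuMatrix.
inline void ClipInPlace(std::vector<float> *values, float clip) {
  assert(values != nullptr);
  if (clip <= 0.0f) return;  // a non-positive threshold disables clipping
  for (float &v : *values) {
    v = std::min(std::max(v, -clip), clip);  // limit v to [-clip, clip]
  }
}
```

With the constructor defaults shown on this page, cell_diff_clip_ is 0.0, so under this reading the cell-derivative clipping stays off until a positive value is configured.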
 

Additional Inherited Members

- Public Types inherited from Component
enum  ComponentType {
  kUnknown = 0x0, kUpdatableComponent = 0x0100, kAffineTransform, kLinearTransform,
  kConvolutionalComponent, kConvolutional2DComponent, kLstmProjected, kBlstmProjected,
  kRecurrentComponent, kActivationFunction = 0x0200, kSoftmax, kHiddenSoftmax,
  kBlockSoftmax, kSigmoid, kTanh, kParametricRelu,
  kDropout, kLengthNormComponent, kTranform = 0x0400, kRbm,
  kSplice, kCopy, kTranspose, kBlockLinearity,
  kAddShift, kRescale, kKlHmm = 0x0800, kSentenceAveragingComponent,
  kSimpleSentenceAveragingComponent, kAveragePoolingComponent, kAveragePooling2DComponent, kMaxPoolingComponent,
  kMaxPooling2DComponent, kFramePoolingComponent, kParallelComponent, kMultiBasisComponent
}
 Component type identification mechanism.
 
- Static Public Member Functions inherited from Component
static const char * TypeToMarker (ComponentType t)
 Converts component type to marker.
 
static ComponentType MarkerToType (const std::string &s)
 Converts marker to component type (case insensitive).
 
static Component * Init (const std::string &conf_line)
 Initialize component from a line in config file.
 
static Component * Read (std::istream &is, bool binary)
 Read the component from a stream (static method).
 
- Static Public Attributes inherited from Component
static const struct key_value kMarkerMap []
 The table with pairs of Component types and markers (defined in nnet-component.cc).
 
- Protected Attributes inherited from MultistreamComponent
std::vector< int32 > sequence_lengths_
 
- Protected Attributes inherited from UpdatableComponent
NnetTrainOptions opts_
 Option-class with training hyper-parameters.
 
BaseFloat learn_rate_coef_
 Scalar applied to learning rate for weight matrices (to be used in ::Update method).
 
BaseFloat bias_learn_rate_coef_
 Scalar applied to learning rate for bias (to be used in ::Update method).
 
- Protected Attributes inherited from Component
int32 input_dim_
 Dimension of the input of the Component.
 
int32 output_dim_
 Dimension of the output of the Component.
 

Detailed Description

Definition at line 50 of file nnet-blstm-projected.h.

Constructor & Destructor Documentation

BlstmProjected ( int32  input_dim,
int32  output_dim 
)
inline

Definition at line 52 of file nnet-blstm-projected.h.

Referenced by BlstmProjected::Copy().

52  :
53  MultistreamComponent(input_dim, output_dim),
54  cell_dim_(0),
55  proj_dim_(static_cast<int32>(output_dim/2)),
56  cell_clip_(50.0),
57  diff_clip_(1.0),
58  cell_diff_clip_(0.0),
59  grad_clip_(250.0)
60  { }
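
The constructor sets proj_dim_ to half of output_dim, and BackpropagateFnc reads the forward-direction derivatives from columns [0, proj_dim_) of out_diff and the backward-direction derivatives from [proj_dim_, 2*proj_dim_). A small sketch of that split follows. OutputSplit and SplitOutput are hypothetical names, and the requirement that output_dim be even is an assumption of the sketch (the constructor itself simply truncates the division).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Kaldi: where each direction's projection
// sits inside one row of the BLSTM output.
struct OutputSplit {
  int32_t proj_dim;         // width of each direction's projection
  int32_t forward_offset;   // first column of the forward-direction part
  int32_t backward_offset;  // first column of the backward-direction part
};

inline OutputSplit SplitOutput(int32_t output_dim) {
  // Assumption of this sketch: an even, positive output dimension.
  assert(output_dim > 0 && output_dim % 2 == 0);
  const int32_t proj_dim = output_dim / 2;  // same division as the constructor
  return OutputSplit{proj_dim, 0, proj_dim};
}
```

For example, SplitOutput(512) describes two 256-column halves, forward first.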
~BlstmProjected ( )
inline

Definition at line 62 of file nnet-blstm-projected.h.

63  { }

Member Function Documentation

void BackpropagateFnc ( const CuMatrixBase< BaseFloat > &  in,
const CuMatrixBase< BaseFloat > &  out,
const CuMatrixBase< BaseFloat > &  out_diff,
CuMatrixBase< BaseFloat > *  in_diff 
)
inline virtual

Backward pass transformation (to be implemented by the descending class).

Implements Component.

Definition at line 720 of file nnet-blstm-projected.h.

References CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddMatMat(), CuVectorBase< Real >::AddRowSumMat(), BlstmProjected::b_backpropagate_buf_, BlstmProjected::b_bias_corr_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_f_c_corr_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_i_c_corr_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_peephole_o_c_corr_, BlstmProjected::b_propagate_buf_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_r_corr_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_gifo_x_corr_, BlstmProjected::b_w_r_m_, BlstmProjected::b_w_r_m_corr_, BlstmProjected::cell_diff_clip_, BlstmProjected::cell_dim_, CuMatrixBase< Real >::ColRange(), BlstmProjected::diff_clip_, BlstmProjected::f_backpropagate_buf_, BlstmProjected::f_bias_corr_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_f_c_corr_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_i_c_corr_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_peephole_o_c_corr_, BlstmProjected::f_propagate_buf_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_r_corr_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_gifo_x_corr_, BlstmProjected::f_w_r_m_, BlstmProjected::f_w_r_m_corr_, Component::input_dim_, kaldi::kNoTrans, kaldi::kSetZero, kaldi::kTrans, NnetTrainOptions::momentum, CuMatrixBase< Real >::NumRows(), MultistreamComponent::NumStreams(), UpdatableComponent::opts_, BlstmProjected::proj_dim_, CuVector< Real >::Resize(), CuMatrix< Real >::Resize(), CuMatrixBase< Real >::RowRange(), and MultistreamComponent::sequence_lengths_.
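
BackpropagateFnc addresses f_propagate_buf_, b_propagate_buf_ and the two backpropagate buffers through ColRange and RowRange slices. The sketch below spells out the layout those slices imply: eight column blocks in the order g, i, f, o, c, h, m, r, and one group of S rows per frame, with frame 0 and frame T+1 used as padding. BufferLayout is a hypothetical struct for illustration only; the row count (T+2)*S is inferred from the "BufferPadding" comment in the listing rather than from an allocation shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the buffer layout; not a Kaldi type.
struct BufferLayout {
  int32_t cell_dim;     // width of each of the g, i, f, o, c, h, m blocks
  int32_t proj_dim;     // width of the r block
  int32_t num_streams;  // S
  int32_t num_frames;   // T

  // Column blocks in order: g=0, i=1, f=2, o=3, c=4, h=5, m=6, r=7.
  int32_t ColOffset(int32_t block) const {
    assert(block >= 0 && block <= 7);
    return block * cell_dim;
  }
  int32_t ColWidth(int32_t block) const {
    assert(block >= 0 && block <= 7);
    return block == 7 ? proj_dim : cell_dim;
  }
  int32_t NumCols() const { return 7 * cell_dim + proj_dim; }

  // Frames 0 and T+1 are padding; frames 1..T hold the current sequence.
  int32_t NumRows() const { return (num_frames + 2) * num_streams; }
  int32_t RowOf(int32_t t, int32_t s) const {
    assert(t >= 0 && t <= num_frames + 1);
    assert(s >= 0 && s < num_streams);
    return t * num_streams + s;  // frame-major, streams interleaved
  }
};
```

Under this layout, RowRange(t*S, S) selects all streams of frame t, and ColRange(0, 4*cell_dim) selects the g, i, f, o blocks together, which is how the listing forms F_DGIFO and B_DGIFO.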

723  {
724 
725  // T: number of frames per stream, S: number of streams (sequences) processed in parallel
726  int32 T = in.NumRows() / NumStreams();
727  int32 S = NumStreams();
728 
729  // buffers,
732 
733  // FORWARD DIRECTION,
734  // forward-direction activations,
735  CuSubMatrix<BaseFloat> F_YG(f_propagate_buf_.ColRange(0*cell_dim_, cell_dim_));
736  CuSubMatrix<BaseFloat> F_YI(f_propagate_buf_.ColRange(1*cell_dim_, cell_dim_));
737  CuSubMatrix<BaseFloat> F_YF(f_propagate_buf_.ColRange(2*cell_dim_, cell_dim_));
738  CuSubMatrix<BaseFloat> F_YO(f_propagate_buf_.ColRange(3*cell_dim_, cell_dim_));
739  CuSubMatrix<BaseFloat> F_YC(f_propagate_buf_.ColRange(4*cell_dim_, cell_dim_));
740  CuSubMatrix<BaseFloat> F_YH(f_propagate_buf_.ColRange(5*cell_dim_, cell_dim_));
741  CuSubMatrix<BaseFloat> F_YM(f_propagate_buf_.ColRange(6*cell_dim_, cell_dim_));
742  CuSubMatrix<BaseFloat> F_YR(f_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
743 
744  // forward-direction derivatives,
745  CuSubMatrix<BaseFloat> F_DG(f_backpropagate_buf_.ColRange(0*cell_dim_, cell_dim_));
746  CuSubMatrix<BaseFloat> F_DI(f_backpropagate_buf_.ColRange(1*cell_dim_, cell_dim_));
747  CuSubMatrix<BaseFloat> F_DF(f_backpropagate_buf_.ColRange(2*cell_dim_, cell_dim_));
748  CuSubMatrix<BaseFloat> F_DO(f_backpropagate_buf_.ColRange(3*cell_dim_, cell_dim_));
749  CuSubMatrix<BaseFloat> F_DC(f_backpropagate_buf_.ColRange(4*cell_dim_, cell_dim_));
750  CuSubMatrix<BaseFloat> F_DH(f_backpropagate_buf_.ColRange(5*cell_dim_, cell_dim_));
751  CuSubMatrix<BaseFloat> F_DM(f_backpropagate_buf_.ColRange(6*cell_dim_, cell_dim_));
752  CuSubMatrix<BaseFloat> F_DR(f_backpropagate_buf_.ColRange(7*cell_dim_, proj_dim_));
753  CuSubMatrix<BaseFloat> F_DGIFO(f_backpropagate_buf_.ColRange(0, 4*cell_dim_));
754 
755  // pre-copy partial derivatives from the BLSTM output,
756  F_DR.RowRange(1*S, T*S).CopyFromMat(out_diff.ColRange(0, proj_dim_));
757 
758  // BufferPadding [T0]:dummy, [1,T]:current sequence, [T+1]: dummy,
759  for (int t = T; t >= 1; t--) {
760  CuSubMatrix<BaseFloat> y_g(F_YG.RowRange(t*S, S));
761  CuSubMatrix<BaseFloat> y_i(F_YI.RowRange(t*S, S));
762  CuSubMatrix<BaseFloat> y_f(F_YF.RowRange(t*S, S));
763  CuSubMatrix<BaseFloat> y_o(F_YO.RowRange(t*S, S));
764  CuSubMatrix<BaseFloat> y_c(F_YC.RowRange(t*S, S));
765  CuSubMatrix<BaseFloat> y_h(F_YH.RowRange(t*S, S));
766  CuSubMatrix<BaseFloat> y_m(F_YM.RowRange(t*S, S));
767  CuSubMatrix<BaseFloat> y_r(F_YR.RowRange(t*S, S));
768 
769  CuSubMatrix<BaseFloat> d_all(f_backpropagate_buf_.RowRange(t*S, S));
770  CuSubMatrix<BaseFloat> d_g(F_DG.RowRange(t*S, S));
771  CuSubMatrix<BaseFloat> d_i(F_DI.RowRange(t*S, S));
772  CuSubMatrix<BaseFloat> d_f(F_DF.RowRange(t*S, S));
773  CuSubMatrix<BaseFloat> d_o(F_DO.RowRange(t*S, S));
774  CuSubMatrix<BaseFloat> d_c(F_DC.RowRange(t*S, S));
775  CuSubMatrix<BaseFloat> d_h(F_DH.RowRange(t*S, S));
776  CuSubMatrix<BaseFloat> d_m(F_DM.RowRange(t*S, S));
777  CuSubMatrix<BaseFloat> d_r(F_DR.RowRange(t*S, S));
778  CuSubMatrix<BaseFloat> d_gifo(F_DGIFO.RowRange(t*S, S));
779 
780  // r
781  // Version 1 (precise gradients):
782  // backprop error from g(t+1), i(t+1), f(t+1), o(t+1) to r(t)
783  d_r.AddMatMat(1.0, F_DGIFO.RowRange((t+1)*S, S), kNoTrans, f_w_gifo_r_, kNoTrans, 1.0);
784 
785  /*
786  // Version 2 (Alex Graves' PhD dissertation):
787  // only backprop g(t+1) to r(t)
788  CuSubMatrix<BaseFloat> w_g_r_(w_gifo_r_.RowRange(0, cell_dim_));
789  d_r.AddMatMat(1.0, DG.RowRange((t+1)*S,S), kNoTrans, w_g_r_, kNoTrans, 1.0);
790  */
791 
792  /*
793  // Version 3 (Felix Gers' PhD dissertation):
794  // truncate gradients of g(t+1), i(t+1), f(t+1), o(t+1) once they leak out memory block
795  // CEC(with forget connection) is the only "error-bridge" through time
796  ;
797  */
798 
799  // r -> m
800  d_m.AddMatMat(1.0, d_r, kNoTrans, f_w_r_m_, kNoTrans, 0.0);
801 
802  // m -> h, via output gate
803  d_h.AddMatMatElements(1.0, d_m, y_o, 0.0);
804  d_h.DiffTanh(y_h, d_h);
805 
806  // o
807  d_o.AddMatMatElements(1.0, d_m, y_h, 0.0);
808  d_o.DiffSigmoid(y_o, d_o);
809 
810  // c
811  // 1. diff from h(t)
812  // 2. diff from c(t+1) (via forget-gate between CEC)
813  // 3. diff from i(t+1) (via peephole)
814  // 4. diff from f(t+1) (via peephole)
815  // 5. diff from o(t) (via peephole, not recurrent)
816  d_c.AddMat(1.0, d_h);
817  d_c.AddMatMatElements(1.0, F_DC.RowRange((t+1)*S, S), F_YF.RowRange((t+1)*S, S), 1.0);
818  d_c.AddMatDiagVec(1.0, F_DI.RowRange((t+1)*S, S), kNoTrans, f_peephole_i_c_, 1.0);
819  d_c.AddMatDiagVec(1.0, F_DF.RowRange((t+1)*S, S), kNoTrans, f_peephole_f_c_, 1.0);
820  d_c.AddMatDiagVec(1.0, d_o , kNoTrans, f_peephole_o_c_, 1.0);
821  // optionally clip the cell_derivative,
822  if (cell_diff_clip_ > 0.0) {
823  d_c.ApplyFloor(-cell_diff_clip_);
824  d_c.ApplyCeiling(cell_diff_clip_);
825  }
826 
827  // f
828  d_f.AddMatMatElements(1.0, d_c, F_YC.RowRange((t-1)*S, S), 0.0);
829  d_f.DiffSigmoid(y_f, d_f);
830 
831  // i
832  d_i.AddMatMatElements(1.0, d_c, y_g, 0.0);
833  d_i.DiffSigmoid(y_i, d_i);
834 
835  // c -> g, via input gate
836  d_g.AddMatMatElements(1.0, d_c, y_i, 0.0);
837  d_g.DiffTanh(y_g, d_g);
838 
839  // Clipping per-frame derivatives for the next `t'.
840  // Clipping applied to gates and input gate (as done in Google).
841  // [ICASSP2015, Sak, Learning acoustic frame labelling...],
842  //
843  // The path from 'out_diff' to 'd_c' via 'd_h' is unclipped,
844  // which is probably important for the 'Constant Error Carousel'
845  // to work well.
846  //
847  if (diff_clip_ > 0.0) {
848  d_gifo.ApplyFloor(-diff_clip_);
849  d_gifo.ApplyCeiling(diff_clip_);
850  }
851 
852  // set zeros to padded frames,
853  if (sequence_lengths_.size() > 0) {
854  for (int s = 0; s < S; s++) {
855  if (t > sequence_lengths_[s]) {
856  d_all.Row(s).SetZero();
857  }
858  }
859  }
860  }
861 
862  // BACKWARD DIRECTION,
863  // backward-direction activations,
864  CuSubMatrix<BaseFloat> B_YG(b_propagate_buf_.ColRange(0*cell_dim_, cell_dim_));
865  CuSubMatrix<BaseFloat> B_YI(b_propagate_buf_.ColRange(1*cell_dim_, cell_dim_));
866  CuSubMatrix<BaseFloat> B_YF(b_propagate_buf_.ColRange(2*cell_dim_, cell_dim_));
867  CuSubMatrix<BaseFloat> B_YO(b_propagate_buf_.ColRange(3*cell_dim_, cell_dim_));
868  CuSubMatrix<BaseFloat> B_YC(b_propagate_buf_.ColRange(4*cell_dim_, cell_dim_));
869  CuSubMatrix<BaseFloat> B_YH(b_propagate_buf_.ColRange(5*cell_dim_, cell_dim_));
870  CuSubMatrix<BaseFloat> B_YM(b_propagate_buf_.ColRange(6*cell_dim_, cell_dim_));
871  CuSubMatrix<BaseFloat> B_YR(b_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
872 
873  // backward-direction derivatives,
874  CuSubMatrix<BaseFloat> B_DG(b_backpropagate_buf_.ColRange(0*cell_dim_, cell_dim_));
875  CuSubMatrix<BaseFloat> B_DI(b_backpropagate_buf_.ColRange(1*cell_dim_, cell_dim_));
876  CuSubMatrix<BaseFloat> B_DF(b_backpropagate_buf_.ColRange(2*cell_dim_, cell_dim_));
877  CuSubMatrix<BaseFloat> B_DO(b_backpropagate_buf_.ColRange(3*cell_dim_, cell_dim_));
878  CuSubMatrix<BaseFloat> B_DC(b_backpropagate_buf_.ColRange(4*cell_dim_, cell_dim_));
879  CuSubMatrix<BaseFloat> B_DH(b_backpropagate_buf_.ColRange(5*cell_dim_, cell_dim_));
880  CuSubMatrix<BaseFloat> B_DM(b_backpropagate_buf_.ColRange(6*cell_dim_, cell_dim_));
881  CuSubMatrix<BaseFloat> B_DR(b_backpropagate_buf_.ColRange(7*cell_dim_, proj_dim_));
882  CuSubMatrix<BaseFloat> B_DGIFO(b_backpropagate_buf_.ColRange(0, 4*cell_dim_));
883 
884  // pre-copy partial derivatives from the BLSTM output,
885  B_DR.RowRange(1*S, T*S).CopyFromMat(out_diff.ColRange(proj_dim_, proj_dim_));
886 
887  // BufferPadding [T0]:dummy, [1,T]:current sequence, [T+1]: dummy,
888  for (int t = 1; t <= T; t++) {
889  CuSubMatrix<BaseFloat> y_g(B_YG.RowRange(t*S, S));
890  CuSubMatrix<BaseFloat> y_i(B_YI.RowRange(t*S, S));
891  CuSubMatrix<BaseFloat> y_f(B_YF.RowRange(t*S, S));
892  CuSubMatrix<BaseFloat> y_o(B_YO.RowRange(t*S, S));
893  CuSubMatrix<BaseFloat> y_c(B_YC.RowRange(t*S, S));
894  CuSubMatrix<BaseFloat> y_h(B_YH.RowRange(t*S, S));
895  CuSubMatrix<BaseFloat> y_m(B_YM.RowRange(t*S, S));
896  CuSubMatrix<BaseFloat> y_r(B_YR.RowRange(t*S, S));
897 
898  CuSubMatrix<BaseFloat> d_all(b_backpropagate_buf_.RowRange(t*S, S));
899  CuSubMatrix<BaseFloat> d_g(B_DG.RowRange(t*S, S));
900  CuSubMatrix<BaseFloat> d_i(B_DI.RowRange(t*S, S));
901  CuSubMatrix<BaseFloat> d_f(B_DF.RowRange(t*S, S));
902  CuSubMatrix<BaseFloat> d_o(B_DO.RowRange(t*S, S));
903  CuSubMatrix<BaseFloat> d_c(B_DC.RowRange(t*S, S));
904  CuSubMatrix<BaseFloat> d_h(B_DH.RowRange(t*S, S));
905  CuSubMatrix<BaseFloat> d_m(B_DM.RowRange(t*S, S));
906  CuSubMatrix<BaseFloat> d_r(B_DR.RowRange(t*S, S));
907  CuSubMatrix<BaseFloat> d_gifo(B_DGIFO.RowRange(t*S, S));
908 
909  // r
910  // Version 1 (precise gradients):
911  // backprop error from g(t-1), i(t-1), f(t-1), o(t-1) to r(t)
912  d_r.AddMatMat(1.0, B_DGIFO.RowRange((t-1)*S, S), kNoTrans, b_w_gifo_r_, kNoTrans, 1.0);
913 
914  /*
915  // Version 2 (Alex Graves' PhD dissertation):
916  // only backprop g(t+1) to r(t)
917  CuSubMatrix<BaseFloat> w_g_r_(w_gifo_r_.RowRange(0, cell_dim_));
918  d_r.AddMatMat(1.0, DG.RowRange((t+1)*S,S), kNoTrans, w_g_r_, kNoTrans, 1.0);
919  */
920 
921  /*
922  // Version 3 (Felix Gers' PhD dissertation):
923  // truncate gradients of g(t+1), i(t+1), f(t+1), o(t+1) once they leak out memory block
924  // CEC(with forget connection) is the only "error-bridge" through time
925  */
926 
927  // r -> m
928  d_m.AddMatMat(1.0, d_r, kNoTrans, b_w_r_m_, kNoTrans, 0.0);
929 
930  // m -> h via output gate
931  d_h.AddMatMatElements(1.0, d_m, y_o, 0.0);
932  d_h.DiffTanh(y_h, d_h);
933 
934  // o
935  d_o.AddMatMatElements(1.0, d_m, y_h, 0.0);
936  d_o.DiffSigmoid(y_o, d_o);
937 
938  // c
939  // 1. diff from h(t)
940  // 2. diff from c(t-1) (via forget-gate between CEC)
941  // 3. diff from i(t-1) (via peephole)
942  // 4. diff from f(t-1) (via peephole)
943  // 5. diff from o(t) (via peephole, not recurrent)
944  d_c.AddMat(1.0, d_h);
945  d_c.AddMatMatElements(1.0, B_DC.RowRange((t-1)*S, S), B_YF.RowRange((t-1)*S, S), 1.0);
946  d_c.AddMatDiagVec(1.0, B_DI.RowRange((t-1)*S, S), kNoTrans, b_peephole_i_c_, 1.0);
947  d_c.AddMatDiagVec(1.0, B_DF.RowRange((t-1)*S, S), kNoTrans, b_peephole_f_c_, 1.0);
948  d_c.AddMatDiagVec(1.0, d_o , kNoTrans, b_peephole_o_c_, 1.0);
949  // optionally clip the cell_derivative,
950  if (cell_diff_clip_ > 0.0) {
951  d_c.ApplyFloor(-cell_diff_clip_);
952  d_c.ApplyCeiling(cell_diff_clip_);
953  }
954 
955  // f
956  d_f.AddMatMatElements(1.0, d_c, B_YC.RowRange((t-1)*S, S), 0.0);
957  d_f.DiffSigmoid(y_f, d_f);
958 
959  // i
960  d_i.AddMatMatElements(1.0, d_c, y_g, 0.0);
961  d_i.DiffSigmoid(y_i, d_i);
962 
963  // c -> g, via input gate,
964  d_g.AddMatMatElements(1.0, d_c, y_i, 0.0);
965  d_g.DiffTanh(y_g, d_g);
966 
967  // Clipping per-frame derivatives for the next `t'.
968  // Clipping applied to gates and input gate (as done in Google).
969  // [ICASSP2015, Sak, Learning acoustic frame labelling...],
970  //
971  // The path from 'out_diff' to 'd_c' via 'd_h' is unclipped,
972  // which is probably important for the 'Constant Error Carousel'
973  // to work well.
974  //
975  if (diff_clip_ > 0.0) {
976  d_gifo.ApplyFloor(-diff_clip_);
977  d_gifo.ApplyCeiling(diff_clip_);
978  }
979 
980  // set zeros to padded frames,
981  if (sequence_lengths_.size() > 0) {
982  for (int s = 0; s < S; s++) {
983  if (t > sequence_lengths_[s]) {
984  d_all.Row(s).SetZero();
985  }
986  }
987  }
988  }
989 
990  // g,i,f,o -> x, calculating input derivatives,
991  // forward direction difference
992  in_diff->AddMatMat(1.0, F_DGIFO.RowRange(1*S, T*S), kNoTrans, f_w_gifo_x_, kNoTrans, 0.0);
993  // backward direction difference
994  in_diff->AddMatMat(1.0, B_DGIFO.RowRange(1*S, T*S), kNoTrans, b_w_gifo_x_, kNoTrans, 1.0);
995 
996  // lazy initialization of update buffers,
997  if (f_w_gifo_x_corr_.NumRows() == 0) {
998  // init delta buffers,
999  // forward direction,
1007 
1008  // backward direction,
1016  }
1017 
1018  // calculate delta
1019  const BaseFloat mmt = opts_.momentum;
1020 
1021  // forward direction
1022  // weight x -> g, i, f, o
1023  f_w_gifo_x_corr_.AddMatMat(1.0, F_DGIFO.RowRange(1*S, T*S), kTrans,
1024  in, kNoTrans, mmt);
1025  // recurrent weight r -> g, i, f, o
1026  f_w_gifo_r_corr_.AddMatMat(1.0, F_DGIFO.RowRange(1*S, T*S), kTrans,
1027  F_YR.RowRange(0*S, T*S), kNoTrans, mmt);
1028  // bias of g, i, f, o
1029  f_bias_corr_.AddRowSumMat(1.0, F_DGIFO.RowRange(1*S, T*S), mmt);
1030 
1031  // recurrent peephole c -> i
1032  f_peephole_i_c_corr_.AddDiagMatMat(1.0, F_DI.RowRange(1*S, T*S), kTrans,
1033  F_YC.RowRange(0*S, T*S), kNoTrans, mmt);
1034  // recurrent peephole c -> f
1035  f_peephole_f_c_corr_.AddDiagMatMat(1.0, F_DF.RowRange(1*S, T*S), kTrans,
1036  F_YC.RowRange(0*S, T*S), kNoTrans, mmt);
1037  // peephole c -> o
1038  f_peephole_o_c_corr_.AddDiagMatMat(1.0, F_DO.RowRange(1*S, T*S), kTrans,
1039  F_YC.RowRange(1*S, T*S), kNoTrans, mmt);
1040 
1041  f_w_r_m_corr_.AddMatMat(1.0, F_DR.RowRange(1*S, T*S), kTrans,
1042  F_YM.RowRange(1*S, T*S), kNoTrans, mmt);
1043 
1044  // backward direction backpropagate
1045  // weight x -> g, i, f, o
1046  b_w_gifo_x_corr_.AddMatMat(1.0, B_DGIFO.RowRange(1*S, T*S), kTrans, in, kNoTrans, mmt);
1047  // recurrent weight r -> g, i, f, o
1048  b_w_gifo_r_corr_.AddMatMat(1.0, B_DGIFO.RowRange(1*S, T*S), kTrans,
1049  B_YR.RowRange(0*S, T*S) , kNoTrans, mmt);
1050  // bias of g, i, f, o
1051  b_bias_corr_.AddRowSumMat(1.0, B_DGIFO.RowRange(1*S, T*S), mmt);
1052 
1053  // recurrent peephole c -> i, c(t+1) --> i
1054  b_peephole_i_c_corr_.AddDiagMatMat(1.0, B_DI.RowRange(1*S, T*S), kTrans,
1055  B_YC.RowRange(2*S, T*S), kNoTrans, mmt);
1056  // recurrent peephole c -> f, c(t+1) --> f
1057  b_peephole_f_c_corr_.AddDiagMatMat(1.0, B_DF.RowRange(1*S, T*S), kTrans,
1058  B_YC.RowRange(2*S, T*S), kNoTrans, mmt);
1059  // peephole c -> o
1060  b_peephole_o_c_corr_.AddDiagMatMat(1.0, B_DO.RowRange(1*S, T*S), kTrans,
1061  B_YC.RowRange(1*S, T*S), kNoTrans, mmt);
1062 
1063  b_w_r_m_corr_.AddMatMat(1.0, B_DR.RowRange(1*S, T*S), kTrans,
1064  B_YM.RowRange(1*S, T*S), kNoTrans, mmt);
1065  }
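
The weight-gradient lines at the end of the listing all have the form corr.AddMatMat(1.0, D, kTrans, X, kNoTrans, mmt), which by the AddMatMat definition (C = alpha * A(^T)*B(^T) + beta * C) computes corr = D^T * X + mmt * corr. A minimal CPU sketch of that accumulation follows. The Matrix alias and AccumulateGradient are hypothetical and stand in for the CuMatrix calls only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative row-major matrix type; not a Kaldi type.
using Matrix = std::vector<std::vector<float>>;

// corr = diff^T * input + momentum * corr
// diff is N x R (derivatives), input is N x C (activations), corr is R x C.
inline void AccumulateGradient(const Matrix &diff, const Matrix &input,
                               float momentum, Matrix *corr) {
  assert(corr != nullptr);
  assert(diff.size() == input.size());  // same number of frames N
  for (std::size_t r = 0; r < corr->size(); ++r) {
    for (std::size_t c = 0; c < (*corr)[r].size(); ++c) {
      float sum = 0.0f;
      for (std::size_t n = 0; n < diff.size(); ++n) {
        sum += diff[n][r] * input[n][c];  // sum over frames
      }
      // The previous buffer content is scaled by the momentum, then the
      // new gradient is added, as with beta = mmt in AddMatMat.
      (*corr)[r][c] = sum + momentum * (*corr)[r][c];
    }
  }
}
```

With momentum set to zero the buffer holds only the gradient of the current mini-batch; a non-zero momentum carries over a scaled copy of the previous buffer.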
Component* Copy ( ) const
inline virtual

Copy component (deep copy).

Implements Component.

Definition at line 65 of file nnet-blstm-projected.h.

References BlstmProjected::BlstmProjected().

65 { return new BlstmProjected(*this); }
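
Copy() relies on the copy constructor: `new BlstmProjected(*this)` duplicates every member, so the caller receives an independent component through a base-class pointer. The pattern in miniature is sketched below. Base and Layer are hypothetical classes, not Kaldi types, and the sketch assumes members with value semantics (here a std::vector), which is what makes the copy deep.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the Copy() pattern; not Kaldi classes.
class Base {
 public:
  virtual ~Base() {}
  virtual Base *Copy() const = 0;  // caller owns the returned object
  virtual float First() const = 0;
  virtual void SetFirst(float v) = 0;
};

class Layer : public Base {
 public:
  explicit Layer(float v) : weights_(1, v) {}
  // The implicit copy constructor duplicates weights_, giving a deep copy.
  Base *Copy() const override { return new Layer(*this); }
  float First() const override {
    assert(!weights_.empty());
    return weights_[0];
  }
  void SetFirst(float v) override {
    assert(!weights_.empty());
    weights_[0] = v;
  }

 private:
  std::vector<float> weights_;
};
```

Changing the copy through the base pointer leaves the original untouched, and the caller is responsible for deleting the returned object.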
void GetGradient ( VectorBase< BaseFloat > *  gradient) const
inline virtual

Get the gradient reshaped as a vector.

Implements UpdatableComponent.

Definition at line 243 of file nnet-blstm-projected.h.

References BlstmProjected::b_bias_, BlstmProjected::b_bias_corr_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_f_c_corr_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_i_c_corr_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_peephole_o_c_corr_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_r_corr_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_gifo_x_corr_, BlstmProjected::b_w_r_m_, BlstmProjected::b_w_r_m_corr_, VectorBase< Real >::Dim(), CuVectorBase< Real >::Dim(), BlstmProjected::f_bias_, BlstmProjected::f_bias_corr_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_f_c_corr_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_i_c_corr_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_peephole_o_c_corr_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_r_corr_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_gifo_x_corr_, BlstmProjected::f_w_r_m_, BlstmProjected::f_w_r_m_corr_, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), BlstmProjected::NumParams(), CuMatrixBase< Real >::NumRows(), and VectorBase< Real >::Range().

243  {
244  KALDI_ASSERT(gradient->Dim() == NumParams());
245  int32 offset, len;
246 
247  // Copying parameters corresponding to forward direction
248  offset = 0; len = f_w_gifo_x_.NumRows() * f_w_gifo_x_.NumCols();
249  gradient->Range(offset, len).CopyRowsFromMat(f_w_gifo_x_corr_);
250 
251  offset += len; len = f_w_gifo_r_.NumRows() * f_w_gifo_r_.NumCols();
252  gradient->Range(offset, len).CopyRowsFromMat(f_w_gifo_r_corr_);
253 
254  offset += len; len = f_bias_.Dim();
255  gradient->Range(offset, len).CopyFromVec(f_bias_corr_);
256 
257  offset += len; len = f_peephole_i_c_.Dim();
258  gradient->Range(offset, len).CopyFromVec(f_peephole_i_c_corr_);
259 
260  offset += len; len = f_peephole_f_c_.Dim();
261  gradient->Range(offset, len).CopyFromVec(f_peephole_f_c_corr_);
262 
263  offset += len; len = f_peephole_o_c_.Dim();
264  gradient->Range(offset, len).CopyFromVec(f_peephole_o_c_corr_);
265 
266  offset += len; len = f_w_r_m_.NumRows() * f_w_r_m_.NumCols();
267  gradient->Range(offset, len).CopyRowsFromMat(f_w_r_m_corr_);
268 
269  // Copying parameters corresponding to backward direction
270  offset += len; len = b_w_gifo_x_.NumRows() * b_w_gifo_x_.NumCols();
271  gradient->Range(offset, len).CopyRowsFromMat(b_w_gifo_x_corr_);
272 
273  offset += len; len = b_w_gifo_r_.NumRows() * b_w_gifo_r_.NumCols();
274  gradient->Range(offset, len).CopyRowsFromMat(b_w_gifo_r_corr_);
275 
276  offset += len; len = b_bias_.Dim();
277  gradient->Range(offset, len).CopyFromVec(b_bias_corr_);
278 
279  offset += len; len = b_peephole_i_c_.Dim();
280  gradient->Range(offset, len).CopyFromVec(b_peephole_i_c_corr_);
281 
282  offset += len; len = b_peephole_f_c_.Dim();
283  gradient->Range(offset, len).CopyFromVec(b_peephole_f_c_corr_);
284 
285  offset += len; len = b_peephole_o_c_.Dim();
286  gradient->Range(offset, len).CopyFromVec(b_peephole_o_c_corr_);
287 
288  offset += len; len = b_w_r_m_.NumRows() * b_w_r_m_.NumCols();
289  gradient->Range(offset, len).CopyRowsFromMat(b_w_r_m_corr_);
290 
291  // check the dim,
292  offset += len;
293  KALDI_ASSERT(offset == NumParams());
294  }
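
GetGradient walks one flat vector with a running (offset, len) pair: each matrix contributes NumRows() * NumCols() elements with its rows laid end to end, each vector contributes Dim() elements, and the final offset must equal NumParams(). A reduced sketch of the same pattern with one matrix and one bias vector follows. Flatten and the Matrix alias are hypothetical and use std::vector in place of the Kaldi types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative row-major matrix type; not a Kaldi type.
using Matrix = std::vector<std::vector<float>>;

// Hypothetical helper following the offset/len pattern of the listing.
inline std::vector<float> Flatten(const Matrix &w,
                                  const std::vector<float> &bias) {
  const std::size_t cols = w.empty() ? 0 : w[0].size();
  const std::size_t num_params = w.size() * cols + bias.size();
  std::vector<float> flat(num_params);
  std::size_t offset = 0, len = 0;

  // Matrix part: rows are stacked one after another.
  len = w.size() * cols;
  for (std::size_t r = 0; r < w.size(); ++r) {
    assert(w[r].size() == cols);  // all rows must have the same width
    for (std::size_t c = 0; c < cols; ++c) {
      flat[offset + r * cols + c] = w[r][c];
    }
  }

  // Vector part: copied directly after the matrix.
  offset += len; len = bias.size();
  for (std::size_t i = 0; i < len; ++i) flat[offset + i] = bias[i];

  // Same closing check as the listing: every slot was accounted for.
  offset += len;
  assert(offset == num_params);
  return flat;
}
```

Keeping the same block order in GetGradient and GetParams is what lets a flat gradient vector be matched elementwise against a flat parameter vector.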
void GetParams ( VectorBase< BaseFloat > *  params) const
inline virtual

Get the trainable parameters reshaped as a vector.

Implements UpdatableComponent.

Definition at line 296 of file nnet-blstm-projected.h.

References BlstmProjected::b_bias_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_r_m_, VectorBase< Real >::Dim(), CuVectorBase< Real >::Dim(), BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), BlstmProjected::NumParams(), CuMatrixBase< Real >::NumRows(), and VectorBase< Real >::Range().

296  {
297  KALDI_ASSERT(params->Dim() == NumParams());
298  int32 offset, len;
299 
300  // Copying parameters corresponding to forward direction
301  offset = 0; len = f_w_gifo_x_.NumRows() * f_w_gifo_x_.NumCols();
302  params->Range(offset, len).CopyRowsFromMat(f_w_gifo_x_);
303 
304  offset += len; len = f_w_gifo_r_.NumRows() * f_w_gifo_r_.NumCols();
305  params->Range(offset, len).CopyRowsFromMat(f_w_gifo_r_);
306 
307  offset += len; len = f_bias_.Dim();
308  params->Range(offset, len).CopyFromVec(f_bias_);
309 
310  offset += len; len = f_peephole_i_c_.Dim();
311  params->Range(offset, len).CopyFromVec(f_peephole_i_c_);
312 
313  offset += len; len = f_peephole_f_c_.Dim();
314  params->Range(offset, len).CopyFromVec(f_peephole_f_c_);
315 
316  offset += len; len = f_peephole_o_c_.Dim();
317  params->Range(offset, len).CopyFromVec(f_peephole_o_c_);
318 
319  offset += len; len = f_w_r_m_.NumRows() * f_w_r_m_.NumCols();
320  params->Range(offset, len).CopyRowsFromMat(f_w_r_m_);
321 
322  // Copying parameters corresponding to backward direction
323  offset += len; len = b_w_gifo_x_.NumRows() * b_w_gifo_x_.NumCols();
324  params->Range(offset, len).CopyRowsFromMat(b_w_gifo_x_);
325 
326  offset += len; len = b_w_gifo_r_.NumRows() * b_w_gifo_r_.NumCols();
327  params->Range(offset, len).CopyRowsFromMat(b_w_gifo_r_);
328 
329  offset += len; len = b_bias_.Dim();
330  params->Range(offset, len).CopyFromVec(b_bias_);
331 
332  offset += len; len = b_peephole_i_c_.Dim();
333  params->Range(offset, len).CopyFromVec(b_peephole_i_c_);
334 
335  offset += len; len = b_peephole_f_c_.Dim();
336  params->Range(offset, len).CopyFromVec(b_peephole_f_c_);
337 
338  offset += len; len = b_peephole_o_c_.Dim();
339  params->Range(offset, len).CopyFromVec(b_peephole_o_c_);
340 
341  offset += len; len = b_w_r_m_.NumRows() * b_w_r_m_.NumCols();
342  params->Range(offset, len).CopyRowsFromMat(b_w_r_m_);
343 
344  // check the dim,
345  offset += len;
346  KALDI_ASSERT(offset == NumParams());
347  }
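
The offset/len bookkeeping in the listing above can be illustrated without Kaldi types. The following is a minimal sketch, assuming plain std::vector containers in place of CuMatrix and CuVector; Matrix and FlattenParams are hypothetical names used only for illustration. For brevity it copies all matrices first and then all vectors, whereas GetParams() interleaves them in the order shown in the listing. Matrices are copied row by row, and the final offset is checked against the total size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense row-major matrix (not a Kaldi type).
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;  // rows * cols values, row-major
};

// Copies each block into 'flat' at a running offset, mirroring the
// "offset += len; len = ..." pattern of the listing.
std::vector<float> FlattenParams(const std::vector<Matrix> &mats,
                                 const std::vector<std::vector<float>> &vecs) {
  std::size_t total = 0;
  for (const Matrix &m : mats) total += m.rows * m.cols;
  for (const std::vector<float> &v : vecs) total += v.size();

  std::vector<float> flat(total);
  std::size_t offset = 0;
  for (const Matrix &m : mats) {
    const std::size_t len = m.rows * m.cols;
    for (std::size_t i = 0; i < len; ++i) flat[offset + i] = m.data[i];
    offset += len;
  }
  for (const std::vector<float> &v : vecs) {
    for (std::size_t i = 0; i < v.size(); ++i) flat[offset + i] = v[i];
    offset += v.size();
  }
  // Same kind of sanity check as the final KALDI_ASSERT in the listing.
  assert(offset == total);
  return flat;
}
```

Because every block is written at a known offset, the inverse operation can walk the flat vector with the same sequence of lengths.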
ComponentType GetType ( ) const
inline virtual

Get the type identification of the component.

Implements Component.

Definition at line 66 of file nnet-blstm-projected.h.

References Component::kBlstmProjected.

std::string Info ( ) const
inline virtual

Print some additional info (after <ComponentName> and the dims).

Reimplemented from Component.

Definition at line 403 of file nnet-blstm-projected.h.

References BlstmProjected::b_bias_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_r_m_, UpdatableComponent::bias_learn_rate_coef_, BlstmProjected::cell_clip_, BlstmProjected::cell_dim_, BlstmProjected::diff_clip_, BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, BlstmProjected::grad_clip_, UpdatableComponent::learn_rate_coef_, kaldi::nnet1::MomentStatistics(), and kaldi::nnet1::ToString().

403  {
404  return std::string("cell-dim 2x") + ToString(cell_dim_) + " " +
405  "( learn_rate_coef_ " + ToString(learn_rate_coef_) +
406  ", bias_learn_rate_coef_ " + ToString(bias_learn_rate_coef_) +
407  ", cell_clip_ " + ToString(cell_clip_) +
408  ", diff_clip_ " + ToString(diff_clip_) +
409  ", grad_clip_ " + ToString(grad_clip_) + " )" +
410  "\n Forward Direction weights:" +
411  "\n f_w_gifo_x_ " + MomentStatistics(f_w_gifo_x_) +
412  "\n f_w_gifo_r_ " + MomentStatistics(f_w_gifo_r_) +
413  "\n f_bias_ " + MomentStatistics(f_bias_) +
414  "\n f_peephole_i_c_ " + MomentStatistics(f_peephole_i_c_) +
415  "\n f_peephole_f_c_ " + MomentStatistics(f_peephole_f_c_) +
416  "\n f_peephole_o_c_ " + MomentStatistics(f_peephole_o_c_) +
417  "\n f_w_r_m_ " + MomentStatistics(f_w_r_m_) +
418  "\n Backward Direction weights:" +
419  "\n b_w_gifo_x_ " + MomentStatistics(b_w_gifo_x_) +
420  "\n b_w_gifo_r_ " + MomentStatistics(b_w_gifo_r_) +
421  "\n b_bias_ " + MomentStatistics(b_bias_) +
422  "\n b_peephole_i_c_ " + MomentStatistics(b_peephole_i_c_) +
423  "\n b_peephole_f_c_ " + MomentStatistics(b_peephole_f_c_) +
424  "\n b_peephole_o_c_ " + MomentStatistics(b_peephole_o_c_) +
425  "\n b_w_r_m_ " + MomentStatistics(b_w_r_m_);
426  }
std::string InfoGradient ( ) const
inline virtual

Print some additional info about the gradient (after <...> and dims).

Reimplemented from Component.

Definition at line 429 of file nnet-blstm-projected.h.

References BlstmProjected::b_backpropagate_buf_, BlstmProjected::b_bias_corr_, BlstmProjected::b_peephole_f_c_corr_, BlstmProjected::b_peephole_i_c_corr_, BlstmProjected::b_peephole_o_c_corr_, BlstmProjected::b_propagate_buf_, BlstmProjected::b_w_gifo_r_corr_, BlstmProjected::b_w_gifo_x_corr_, BlstmProjected::b_w_r_m_corr_, UpdatableComponent::bias_learn_rate_coef_, BlstmProjected::cell_clip_, BlstmProjected::cell_dim_, CuMatrixBase< Real >::ColRange(), BlstmProjected::diff_clip_, BlstmProjected::f_backpropagate_buf_, BlstmProjected::f_bias_corr_, BlstmProjected::f_peephole_f_c_corr_, BlstmProjected::f_peephole_i_c_corr_, BlstmProjected::f_peephole_o_c_corr_, BlstmProjected::f_propagate_buf_, BlstmProjected::f_w_gifo_r_corr_, BlstmProjected::f_w_gifo_x_corr_, BlstmProjected::f_w_r_m_corr_, BlstmProjected::grad_clip_, UpdatableComponent::learn_rate_coef_, kaldi::nnet1::MomentStatistics(), BlstmProjected::proj_dim_, and kaldi::nnet1::ToString().

429  {
430  // forward-direction activations,
431  const CuSubMatrix<BaseFloat> YG_FW(f_propagate_buf_.ColRange(0*cell_dim_, cell_dim_));
432  const CuSubMatrix<BaseFloat> YI_FW(f_propagate_buf_.ColRange(1*cell_dim_, cell_dim_));
433  const CuSubMatrix<BaseFloat> YF_FW(f_propagate_buf_.ColRange(2*cell_dim_, cell_dim_));
434  const CuSubMatrix<BaseFloat> YO_FW(f_propagate_buf_.ColRange(3*cell_dim_, cell_dim_));
435  const CuSubMatrix<BaseFloat> YC_FW(f_propagate_buf_.ColRange(4*cell_dim_, cell_dim_));
436  const CuSubMatrix<BaseFloat> YH_FW(f_propagate_buf_.ColRange(5*cell_dim_, cell_dim_));
437  const CuSubMatrix<BaseFloat> YM_FW(f_propagate_buf_.ColRange(6*cell_dim_, cell_dim_));
438  const CuSubMatrix<BaseFloat> YR_FW(f_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
439 
440  // forward-direction derivatives,
441  const CuSubMatrix<BaseFloat> DG_FW(f_backpropagate_buf_.ColRange(0*cell_dim_, cell_dim_));
442  const CuSubMatrix<BaseFloat> DI_FW(f_backpropagate_buf_.ColRange(1*cell_dim_, cell_dim_));
443  const CuSubMatrix<BaseFloat> DF_FW(f_backpropagate_buf_.ColRange(2*cell_dim_, cell_dim_));
444  const CuSubMatrix<BaseFloat> DO_FW(f_backpropagate_buf_.ColRange(3*cell_dim_, cell_dim_));
445  const CuSubMatrix<BaseFloat> DC_FW(f_backpropagate_buf_.ColRange(4*cell_dim_, cell_dim_));
446  const CuSubMatrix<BaseFloat> DH_FW(f_backpropagate_buf_.ColRange(5*cell_dim_, cell_dim_));
447  const CuSubMatrix<BaseFloat> DM_FW(f_backpropagate_buf_.ColRange(6*cell_dim_, cell_dim_));
448  const CuSubMatrix<BaseFloat> DR_FW(f_backpropagate_buf_.ColRange(7*cell_dim_, proj_dim_));
449 
450  // backward-direction activations,
451  const CuSubMatrix<BaseFloat> YG_BW(b_propagate_buf_.ColRange(0*cell_dim_, cell_dim_));
452  const CuSubMatrix<BaseFloat> YI_BW(b_propagate_buf_.ColRange(1*cell_dim_, cell_dim_));
453  const CuSubMatrix<BaseFloat> YF_BW(b_propagate_buf_.ColRange(2*cell_dim_, cell_dim_));
454  const CuSubMatrix<BaseFloat> YO_BW(b_propagate_buf_.ColRange(3*cell_dim_, cell_dim_));
455  const CuSubMatrix<BaseFloat> YC_BW(b_propagate_buf_.ColRange(4*cell_dim_, cell_dim_));
456  const CuSubMatrix<BaseFloat> YH_BW(b_propagate_buf_.ColRange(5*cell_dim_, cell_dim_));
457  const CuSubMatrix<BaseFloat> YM_BW(b_propagate_buf_.ColRange(6*cell_dim_, cell_dim_));
458  const CuSubMatrix<BaseFloat> YR_BW(b_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
459 
460  // backward-direction derivatives,
461  const CuSubMatrix<BaseFloat> DG_BW(b_backpropagate_buf_.ColRange(0*cell_dim_, cell_dim_));
462  const CuSubMatrix<BaseFloat> DI_BW(b_backpropagate_buf_.ColRange(1*cell_dim_, cell_dim_));
463  const CuSubMatrix<BaseFloat> DF_BW(b_backpropagate_buf_.ColRange(2*cell_dim_, cell_dim_));
464  const CuSubMatrix<BaseFloat> DO_BW(b_backpropagate_buf_.ColRange(3*cell_dim_, cell_dim_));
465  const CuSubMatrix<BaseFloat> DC_BW(b_backpropagate_buf_.ColRange(4*cell_dim_, cell_dim_));
466  const CuSubMatrix<BaseFloat> DH_BW(b_backpropagate_buf_.ColRange(5*cell_dim_, cell_dim_));
467  const CuSubMatrix<BaseFloat> DM_BW(b_backpropagate_buf_.ColRange(6*cell_dim_, cell_dim_));
468  const CuSubMatrix<BaseFloat> DR_BW(b_backpropagate_buf_.ColRange(7*cell_dim_, proj_dim_));
469 
470  return std::string("") +
471  "( learn_rate_coef_ " + ToString(learn_rate_coef_) +
472  ", bias_learn_rate_coef_ " + ToString(bias_learn_rate_coef_) +
473  ", cell_clip_ " + ToString(cell_clip_) +
474  ", diff_clip_ " + ToString(diff_clip_) +
475  ", grad_clip_ " + ToString(grad_clip_) + " )" +
476  "\n ### Gradients " +
477  "\n f_w_gifo_x_corr_ " + MomentStatistics(f_w_gifo_x_corr_) +
478  "\n f_w_gifo_r_corr_ " + MomentStatistics(f_w_gifo_r_corr_) +
479  "\n f_bias_corr_ " + MomentStatistics(f_bias_corr_) +
480  "\n f_peephole_i_c_corr_ " + MomentStatistics(f_peephole_i_c_corr_) +
481  "\n f_peephole_f_c_corr_ " + MomentStatistics(f_peephole_f_c_corr_) +
482  "\n f_peephole_o_c_corr_ " + MomentStatistics(f_peephole_o_c_corr_) +
483  "\n f_w_r_m_corr_ " + MomentStatistics(f_w_r_m_corr_) +
484  "\n ---" +
485  "\n b_w_gifo_x_corr_ " + MomentStatistics(b_w_gifo_x_corr_) +
486  "\n b_w_gifo_r_corr_ " + MomentStatistics(b_w_gifo_r_corr_) +
487  "\n b_bias_corr_ " + MomentStatistics(b_bias_corr_) +
488  "\n b_peephole_i_c_corr_ " + MomentStatistics(b_peephole_i_c_corr_) +
489  "\n b_peephole_f_c_corr_ " + MomentStatistics(b_peephole_f_c_corr_) +
490  "\n b_peephole_o_c_corr_ " + MomentStatistics(b_peephole_o_c_corr_) +
491  "\n b_w_r_m_corr_ " + MomentStatistics(b_w_r_m_corr_) +
492  "\n" +
493  "\n ### Activations (mostly after non-linearities)" +
494  "\n YI_FW(0..1)^ " + MomentStatistics(YI_FW) +
495  "\n YF_FW(0..1)^ " + MomentStatistics(YF_FW) +
496  "\n YO_FW(0..1)^ " + MomentStatistics(YO_FW) +
497  "\n YG_FW(-1..1) " + MomentStatistics(YG_FW) +
498  "\n YC_FW(-R..R)* " + MomentStatistics(YC_FW) +
499  "\n YH_FW(-1..1) " + MomentStatistics(YH_FW) +
500  "\n YM_FW(-1..1) " + MomentStatistics(YM_FW) +
501  "\n YR_FW(-R..R) " + MomentStatistics(YR_FW) +
502  "\n ---" +
503  "\n YI_BW(0..1)^ " + MomentStatistics(YI_BW) +
504  "\n YF_BW(0..1)^ " + MomentStatistics(YF_BW) +
505  "\n YO_BW(0..1)^ " + MomentStatistics(YO_BW) +
506  "\n YG_BW(-1..1) " + MomentStatistics(YG_BW) +
507  "\n YC_BW(-R..R)* " + MomentStatistics(YC_BW) +
508  "\n YH_BW(-1..1) " + MomentStatistics(YH_BW) +
509  "\n YM_BW(-1..1) " + MomentStatistics(YM_BW) +
510  "\n YR_BW(-R..R) " + MomentStatistics(YR_BW) +
511  "\n" +
512  "\n ### Derivatives (w.r.t. inputs of non-linearities)" +
513  "\n DI_FW^ " + MomentStatistics(DI_FW) +
514  "\n DF_FW^ " + MomentStatistics(DF_FW) +
515  "\n DO_FW^ " + MomentStatistics(DO_FW) +
516  "\n DG_FW " + MomentStatistics(DG_FW) +
517  "\n DC_FW* " + MomentStatistics(DC_FW) +
518  "\n DH_FW " + MomentStatistics(DH_FW) +
519  "\n DM_FW " + MomentStatistics(DM_FW) +
520  "\n DR_FW " + MomentStatistics(DR_FW) +
521  "\n ---" +
522  "\n DI_BW^ " + MomentStatistics(DI_BW) +
523  "\n DF_BW^ " + MomentStatistics(DF_BW) +
524  "\n DO_BW^ " + MomentStatistics(DO_BW) +
525  "\n DG_BW " + MomentStatistics(DG_BW) +
526  "\n DC_BW* " + MomentStatistics(DC_BW) +
527  "\n DH_BW " + MomentStatistics(DH_BW) +
528  "\n DM_BW " + MomentStatistics(DM_BW) +
529  "\n DR_BW " + MomentStatistics(DR_BW);
530  }
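
The ColRange() calls in the listing above imply a fixed column layout for the propagate and backpropagate buffers: seven blocks of width cell_dim_ (g, i, f, o, c, h, m) followed by one block of width proj_dim_ (r). The following is a minimal sketch of that layout; BlockRange, BufferLayout and TotalCols are hypothetical helpers, not part of Kaldi.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type (not part of Kaldi): one named column block.
struct BlockRange {
  std::string name;
  int col_offset;
  int num_cols;
};

// Column layout implied by the ColRange() calls: seven blocks of width
// cell_dim (g, i, f, o, c, h, m) followed by one block of width proj_dim (r).
std::vector<BlockRange> BufferLayout(int cell_dim, int proj_dim) {
  const char *names[] = {"g", "i", "f", "o", "c", "h", "m"};
  std::vector<BlockRange> layout;
  for (int k = 0; k < 7; ++k) {
    layout.push_back({names[k], k * cell_dim, cell_dim});
  }
  layout.push_back({"r", 7 * cell_dim, proj_dim});
  return layout;
}

// Total number of buffer columns covered by the layout.
int TotalCols(const std::vector<BlockRange> &layout) {
  const BlockRange &last = layout.back();
  return last.col_offset + last.num_cols;
}
```

With cell_dim = 4 and proj_dim = 2, the r block starts at column 28 and the buffer is 30 columns wide, that is 7 * cell_dim + proj_dim.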
void InitData ( std::istream &  is)
inline virtual

Initialize the content of the component from the 'line' of the prototype.

Implements UpdatableComponent.

Definition at line 68 of file nnet-blstm-projected.h.

References BlstmProjected::b_bias_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_r_m_, UpdatableComponent::bias_learn_rate_coef_, BlstmProjected::cell_clip_, BlstmProjected::cell_diff_clip_, BlstmProjected::cell_dim_, BlstmProjected::diff_clip_, BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, BlstmProjected::grad_clip_, Component::input_dim_, KALDI_ASSERT, KALDI_ERR, kaldi::kUndefined, UpdatableComponent::learn_rate_coef_, BlstmProjected::proj_dim_, kaldi::nnet1::RandUniform(), CuVectorBase< Real >::Range(), kaldi::ReadBasicType(), kaldi::ReadToken(), CuVector< Real >::Resize(), and CuMatrix< Real >::Resize().

68  {
69  // define options,
70  float param_range = 0.1;
71  // parse the line from prototype,
72  std::string token;
73  while (is >> std::ws, !is.eof()) {
74  ReadToken(is, false, &token);
75  if (token == "<ParamRange>") ReadBasicType(is, false, &param_range);
76  else if (token == "<CellDim>") ReadBasicType(is, false, &cell_dim_);
77  else if (token == "<LearnRateCoef>") ReadBasicType(is, false, &learn_rate_coef_);
78  else if (token == "<BiasLearnRateCoef>") ReadBasicType(is, false, &bias_learn_rate_coef_);
79  else if (token == "<CellClip>") ReadBasicType(is, false, &cell_clip_);
80  else if (token == "<DiffClip>") ReadBasicType(is, false, &diff_clip_);
81  else if (token == "<CellDiffClip>") ReadBasicType(is, false, &cell_diff_clip_);
82  else if (token == "<GradClip>") ReadBasicType(is, false, &grad_clip_);
83  else KALDI_ERR << "Unknown token " << token << ", a typo in config?"
84  << " (ParamRange|CellDim|LearnRateCoef|BiasLearnRateCoef|CellClip|DiffClip|CellDiffClip|GradClip)";
85  }
86 
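87  // init the weights and biases (from uniform dist.),
88  // forward direction,
89  f_w_gifo_x_.Resize(4*cell_dim_, input_dim_, kUndefined);
90  f_w_gifo_r_.Resize(4*cell_dim_, proj_dim_, kUndefined);
91  f_bias_.Resize(4*cell_dim_, kUndefined);
92  f_peephole_i_c_.Resize(cell_dim_, kUndefined);
93  f_peephole_f_c_.Resize(cell_dim_, kUndefined);
94  f_peephole_o_c_.Resize(cell_dim_, kUndefined);
95  f_w_r_m_.Resize(proj_dim_, cell_dim_, kUndefined);
96  // (mean), (range)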
97  RandUniform(0.0, 2.0 * param_range, &f_w_gifo_x_);
98  RandUniform(0.0, 2.0 * param_range, &f_w_gifo_r_);
99  RandUniform(0.0, 2.0 * param_range, &f_bias_);
100  RandUniform(0.0, 2.0 * param_range, &f_peephole_i_c_);
101  RandUniform(0.0, 2.0 * param_range, &f_peephole_f_c_);
102  RandUniform(0.0, 2.0 * param_range, &f_peephole_o_c_);
103  RandUniform(0.0, 2.0 * param_range, &f_w_r_m_);
104 
105  // Add 1.0 to forget-gate bias
106  // [Miao IS16: AN EMPIRICAL EXPLORATION...]
107  f_bias_.Range(2*cell_dim_, cell_dim_).Add(1.0);
108 
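109  // backward direction,
110  b_w_gifo_x_.Resize(4*cell_dim_, input_dim_, kUndefined);
111  b_w_gifo_r_.Resize(4*cell_dim_, proj_dim_, kUndefined);
112  b_bias_.Resize(4*cell_dim_, kUndefined);
113  b_peephole_i_c_.Resize(cell_dim_, kUndefined);
114  b_peephole_f_c_.Resize(cell_dim_, kUndefined);
115  b_peephole_o_c_.Resize(cell_dim_, kUndefined);
116  b_w_r_m_.Resize(proj_dim_, cell_dim_, kUndefined);
117 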
118  RandUniform(0.0, 2.0 * param_range, &b_w_gifo_x_);
119  RandUniform(0.0, 2.0 * param_range, &b_w_gifo_r_);
120  RandUniform(0.0, 2.0 * param_range, &b_bias_);
121  RandUniform(0.0, 2.0 * param_range, &b_peephole_i_c_);
122  RandUniform(0.0, 2.0 * param_range, &b_peephole_f_c_);
123  RandUniform(0.0, 2.0 * param_range, &b_peephole_o_c_);
124  RandUniform(0.0, 2.0 * param_range, &b_w_r_m_);
125 
126  // Add 1.0 to forget-gate bias,
127  // [Miao IS16: AN EMPIRICAL EXPLORATION...]
128  b_bias_.Range(2*cell_dim_, cell_dim_).Add(1.0);
129 
130  KALDI_ASSERT(cell_dim_ > 0);
133  }
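
The two stages of InitData() above, token parsing followed by random initialization, can be sketched with the standard library only. This is a minimal illustration under stated assumptions, not the Kaldi implementation: BlstmOptions, ParsePrototypeLine and InitBias are hypothetical names, only three of the tokens are handled, and RandUniform(mu, range) is assumed to draw from an interval of width range centered at mu, so that RandUniform(0.0, 2.0 * param_range, ...) covers roughly [-param_range, param_range].

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options struct; param_range = 0.1 matches the listing,
// the other defaults are assumptions made for this sketch.
struct BlstmOptions {
  int cell_dim = 0;
  float param_range = 0.1f;
  float cell_clip = 0.0f;
};

// Reads "<Token> value" pairs in any order until the stream is exhausted.
BlstmOptions ParsePrototypeLine(const std::string &line) {
  BlstmOptions opts;
  std::istringstream is(line);
  std::string token;
  while (is >> token) {
    if (token == "<CellDim>") is >> opts.cell_dim;
    else if (token == "<ParamRange>") is >> opts.param_range;
    else if (token == "<CellClip>") is >> opts.cell_clip;
    else throw std::runtime_error("Unknown token " + token);
  }
  if (opts.cell_dim <= 0) throw std::runtime_error("<CellDim> must be > 0");
  return opts;
}

// Draws a bias vector laid out as [g | i | f | o] from a uniform distribution
// on [-param_range, param_range), then adds 1.0 to the forget-gate block,
// which starts at index 2 * cell_dim.
std::vector<float> InitBias(const BlstmOptions &opts, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> dist(-opts.param_range,
                                             opts.param_range);
  const std::size_t cell_dim = static_cast<std::size_t>(opts.cell_dim);
  std::vector<float> bias(4 * cell_dim);
  for (float &b : bias) b = dist(rng);
  for (std::size_t k = 2 * cell_dim; k < 3 * cell_dim; ++k) bias[k] += 1.0f;
  return bias;
}
```

A call such as InitBias(ParsePrototypeLine("<CellDim> 3 <ParamRange> 0.1"), 0) yields 12 values, of which entries 6 to 8 (the forget-gate block) are shifted up by 1.0.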
int32 NumParams ( ) const
inline virtual

Number of trainable parameters.

Implements UpdatableComponent.

Definition at line 233 of file nnet-blstm-projected.h.

References CuVectorBase< Real >::Dim(), BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::NumRows().

Referenced by BlstmProjected::GetGradient(), BlstmProjected::GetParams(), and BlstmProjected::SetParams().

233  {
234  return 2 * ( f_w_gifo_x_.NumRows() * f_w_gifo_x_.NumCols() +
235  f_w_gifo_r_.NumRows() * f_w_gifo_r_.NumCols() +
236  f_bias_.Dim() +
237  f_peephole_i_c_.Dim() +
238  f_peephole_f_c_.Dim() +
239  f_peephole_o_c_.Dim() +
240  f_w_r_m_.NumRows() * f_w_r_m_.NumCols() );
241  }
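
As a worked example of this count, the sketch below sums the same blocks that GetParams() copies, for one direction, and doubles the result. The shapes are an assumption inferred from how the matrices are used in PropagateFnc(): the input and recurrent weights have 4 * cell_dim rows, the bias has 4 * cell_dim entries, each peephole vector has cell_dim entries, and the projection matrix is proj_dim by cell_dim. The function names are hypothetical.

```cpp
#include <cassert>

// Parameter count for one direction, using the assumed shapes:
//   w_gifo_x: 4*cell_dim x input_dim, w_gifo_r: 4*cell_dim x proj_dim,
//   bias: 4*cell_dim, three peephole vectors: cell_dim each,
//   w_r_m: proj_dim x cell_dim.
int NumParamsOneDirection(int input_dim, int cell_dim, int proj_dim) {
  return 4 * cell_dim * input_dim +
         4 * cell_dim * proj_dim +
         4 * cell_dim +
         3 * cell_dim +
         proj_dim * cell_dim;
}

// The forward and backward directions have identical shapes, hence the
// factor of two.
int NumParamsBlstm(int input_dim, int cell_dim, int proj_dim) {
  return 2 * NumParamsOneDirection(input_dim, cell_dim, proj_dim);
}
```

For input_dim = 40, cell_dim = 512 and proj_dim = 256, one direction has 81920 + 524288 + 2048 + 1536 + 131072 = 740864 parameters, so the component has 1481728 in total.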
void PropagateFnc ( const CuMatrixBase< BaseFloat > &  in,
CuMatrixBase< BaseFloat > *  out 
)
inline virtual

Abstract interface for propagation/backpropagation.

Forward pass transformation (to be implemented by the derived class...)

Implements Component.

Definition at line 532 of file nnet-blstm-projected.h.

References CuMatrixBase< Real >::AddMatMat(), BlstmProjected::b_bias_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_propagate_buf_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_r_m_, BlstmProjected::cell_clip_, BlstmProjected::cell_dim_, CuMatrixBase< Real >::ColRange(), CuMatrixBase< Real >::CopyFromMat(), BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_propagate_buf_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kSetZero, kaldi::kTrans, CuMatrixBase< Real >::NumRows(), MultistreamComponent::NumStreams(), BlstmProjected::proj_dim_, CuMatrix< Real >::Resize(), CuMatrixBase< Real >::RowRange(), and MultistreamComponent::sequence_lengths_.

533  {
534 
535  KALDI_ASSERT(in.NumRows() % NumStreams() == 0);
536  int32 S = NumStreams();
537  int32 T = in.NumRows() / NumStreams();
538 
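539  // buffers,
540  f_propagate_buf_.Resize((T+2)*S, 7 * cell_dim_ + proj_dim_, kSetZero);
541  b_propagate_buf_.Resize((T+2)*S, 7 * cell_dim_ + proj_dim_, kSetZero);
542 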
543  // forward-direction activations,
544  CuSubMatrix<BaseFloat> F_YG(f_propagate_buf_.ColRange(0*cell_dim_, cell_dim_));
545  CuSubMatrix<BaseFloat> F_YI(f_propagate_buf_.ColRange(1*cell_dim_, cell_dim_));
546  CuSubMatrix<BaseFloat> F_YF(f_propagate_buf_.ColRange(2*cell_dim_, cell_dim_));
547  CuSubMatrix<BaseFloat> F_YO(f_propagate_buf_.ColRange(3*cell_dim_, cell_dim_));
548  CuSubMatrix<BaseFloat> F_YC(f_propagate_buf_.ColRange(4*cell_dim_, cell_dim_));
549  CuSubMatrix<BaseFloat> F_YH(f_propagate_buf_.ColRange(5*cell_dim_, cell_dim_));
550  CuSubMatrix<BaseFloat> F_YM(f_propagate_buf_.ColRange(6*cell_dim_, cell_dim_));
551  CuSubMatrix<BaseFloat> F_YR(f_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
552  CuSubMatrix<BaseFloat> F_YGIFO(f_propagate_buf_.ColRange(0, 4*cell_dim_));
553 
554  // backward-direction activations,
555  CuSubMatrix<BaseFloat> B_YG(b_propagate_buf_.ColRange(0*cell_dim_, cell_dim_));
556  CuSubMatrix<BaseFloat> B_YI(b_propagate_buf_.ColRange(1*cell_dim_, cell_dim_));
557  CuSubMatrix<BaseFloat> B_YF(b_propagate_buf_.ColRange(2*cell_dim_, cell_dim_));
558  CuSubMatrix<BaseFloat> B_YO(b_propagate_buf_.ColRange(3*cell_dim_, cell_dim_));
559  CuSubMatrix<BaseFloat> B_YC(b_propagate_buf_.ColRange(4*cell_dim_, cell_dim_));
560  CuSubMatrix<BaseFloat> B_YH(b_propagate_buf_.ColRange(5*cell_dim_, cell_dim_));
561  CuSubMatrix<BaseFloat> B_YM(b_propagate_buf_.ColRange(6*cell_dim_, cell_dim_));
562  CuSubMatrix<BaseFloat> B_YR(b_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
563  CuSubMatrix<BaseFloat> B_YGIFO(b_propagate_buf_.ColRange(0, 4*cell_dim_));
564 
565  // FORWARD DIRECTION,
566  // x -> g, i, f, o, not recurrent, do it all at once
567  F_YGIFO.RowRange(1*S, T*S).AddMatMat(1.0, in, kNoTrans, f_w_gifo_x_, kTrans, 0.0);
568 
569  // bias -> g, i, f, o
570  F_YGIFO.RowRange(1*S, T*S).AddVecToRows(1.0, f_bias_);
571 
572  // BufferPadding [T0]:dummy, [1, T]:current sequence, [T+1]:dummy
573  for (int t = 1; t <= T; t++) {
574  // multistream buffers for current time-step,
575  CuSubMatrix<BaseFloat> y_all(f_propagate_buf_.RowRange(t*S, S));
576  CuSubMatrix<BaseFloat> y_g(F_YG.RowRange(t*S, S));
577  CuSubMatrix<BaseFloat> y_i(F_YI.RowRange(t*S, S));
578  CuSubMatrix<BaseFloat> y_f(F_YF.RowRange(t*S, S));
579  CuSubMatrix<BaseFloat> y_o(F_YO.RowRange(t*S, S));
580  CuSubMatrix<BaseFloat> y_c(F_YC.RowRange(t*S, S));
581  CuSubMatrix<BaseFloat> y_h(F_YH.RowRange(t*S, S));
582  CuSubMatrix<BaseFloat> y_m(F_YM.RowRange(t*S, S));
583  CuSubMatrix<BaseFloat> y_r(F_YR.RowRange(t*S, S));
584  CuSubMatrix<BaseFloat> y_gifo(F_YGIFO.RowRange(t*S, S));
585 
586  // r(t-1) -> g, i, f, o
587  y_gifo.AddMatMat(1.0, F_YR.RowRange((t-1)*S, S), kNoTrans, f_w_gifo_r_, kTrans, 1.0);
588 
589  // c(t-1) -> i(t) via peephole
590  y_i.AddMatDiagVec(1.0, F_YC.RowRange((t-1)*S, S), kNoTrans, f_peephole_i_c_, 1.0);
591 
592  // c(t-1) -> f(t) via peephole
593  y_f.AddMatDiagVec(1.0, F_YC.RowRange((t-1)*S, S), kNoTrans, f_peephole_f_c_, 1.0);
594 
595  // i, f sigmoid squashing
596  y_i.Sigmoid(y_i);
597  y_f.Sigmoid(y_f);
598 
599  // g tanh squashing
600  y_g.Tanh(y_g);
601 
602  // g * i -> c
603  y_c.AddMatMatElements(1.0, y_g, y_i, 0.0);
604  // c(t-1) * f -> c(t) via forget-gate
605  y_c.AddMatMatElements(1.0, F_YC.RowRange((t-1)*S, S), y_f, 1.0);
606 
607  if (cell_clip_ > 0.0) {
608  y_c.ApplyFloor(-cell_clip_); // Optional clipping of cell activation,
609  y_c.ApplyCeiling(cell_clip_); // Google paper Interspeech2014: LSTM for LVCSR
610  }
611 
612  // c(t) -> o(t) via peephole (not recurrent, using c(t))
613  y_o.AddMatDiagVec(1.0, y_c, kNoTrans, f_peephole_o_c_, 1.0);
614 
615  // o sigmoid squashing,
616  y_o.Sigmoid(y_o);
617 
618  // c -> h, tanh squashing,
619  y_h.Tanh(y_c);
620 
621  // h * o -> m via output gate,
622  y_m.AddMatMatElements(1.0, y_h, y_o, 0.0);
623 
624  // m -> r
625  y_r.AddMatMat(1.0, y_m, kNoTrans, f_w_r_m_, kTrans, 0.0);
626 
627  // set zeros to padded frames,
628  if (sequence_lengths_.size() > 0) {
629  for (int s = 0; s < S; s++) {
630  if (t > sequence_lengths_[s]) {
631  y_all.Row(s).SetZero();
632  }
633  }
634  }
635  }
636 
637  // BACKWARD DIRECTION,
638  // x -> g, i, f, o, not recurrent, do it all at once
639  B_YGIFO.RowRange(1*S, T*S).AddMatMat(1.0, in, kNoTrans, b_w_gifo_x_, kTrans, 0.0);
640 
641  // bias -> g, i, f, o
642  B_YGIFO.RowRange(1*S, T*S).AddVecToRows(1.0, b_bias_);
643 
644  // BufferPadding [T0]:dummy, [1, T]:current sequence, [T+1]:dummy
645  for (int t = T; t >= 1; t--) {
646  // multistream buffers for current time-step,
647  CuSubMatrix<BaseFloat> y_all(b_propagate_buf_.RowRange(t*S, S));
648  CuSubMatrix<BaseFloat> y_g(B_YG.RowRange(t*S, S));
649  CuSubMatrix<BaseFloat> y_i(B_YI.RowRange(t*S, S));
650  CuSubMatrix<BaseFloat> y_f(B_YF.RowRange(t*S, S));
651  CuSubMatrix<BaseFloat> y_o(B_YO.RowRange(t*S, S));
652  CuSubMatrix<BaseFloat> y_c(B_YC.RowRange(t*S, S));
653  CuSubMatrix<BaseFloat> y_h(B_YH.RowRange(t*S, S));
654  CuSubMatrix<BaseFloat> y_m(B_YM.RowRange(t*S, S));
655  CuSubMatrix<BaseFloat> y_r(B_YR.RowRange(t*S, S));
656  CuSubMatrix<BaseFloat> y_gifo(B_YGIFO.RowRange(t*S, S));
657 
658  // r(t+1) -> g, i, f, o
659  y_gifo.AddMatMat(1.0, B_YR.RowRange((t+1)*S, S), kNoTrans, b_w_gifo_r_, kTrans, 1.0);
660 
661  // c(t+1) -> i(t) via peephole
662  y_i.AddMatDiagVec(1.0, B_YC.RowRange((t+1)*S, S), kNoTrans, b_peephole_i_c_, 1.0);
663 
664  // c(t+1) -> f(t) via peephole
665  y_f.AddMatDiagVec(1.0, B_YC.RowRange((t+1)*S, S), kNoTrans, b_peephole_f_c_, 1.0);
666 
667  // i, f sigmoid squashing
668  y_i.Sigmoid(y_i);
669  y_f.Sigmoid(y_f);
670 
671  // g tanh squashing
672  y_g.Tanh(y_g);
673 
674  // g * i -> c
675  y_c.AddMatMatElements(1.0, y_g, y_i, 0.0);
676  // c(t+1) * f -> c(t) via forget-gate
677  y_c.AddMatMatElements(1.0, B_YC.RowRange((t+1)*S, S), y_f, 1.0);
678 
679  if (cell_clip_ > 0.0) {
680  y_c.ApplyFloor(-cell_clip_); // optional clipping of cell activation,
681  y_c.ApplyCeiling(cell_clip_); // Google paper Interspeech2014: LSTM for LVCSR
682  }
683 
684  // c(t) -> o(t) via peephole (not recurrent, using c(t))
685  y_o.AddMatDiagVec(1.0, y_c, kNoTrans, b_peephole_o_c_, 1.0);
686 
687  // o sigmoid squashing,
688  y_o.Sigmoid(y_o);
689 
690  // h tanh squashing,
691  y_h.Tanh(y_c);
692 
693  // h * o -> m via output gate,
694  y_m.AddMatMatElements(1.0, y_h, y_o, 0.0);
695 
696  // m -> r
697  y_r.AddMatMat(1.0, y_m, kNoTrans, b_w_r_m_, kTrans, 0.0);
698 
699  // set zeros to padded frames,
700  if (sequence_lengths_.size() > 0) {
701  for (int s = 0; s < S; s++) {
702  if (t > sequence_lengths_[s]) {
703  y_all.Row(s).SetZero();
704  }
705  }
706  }
707  }
708 
709  CuMatrix<BaseFloat> YR_FB;
710  YR_FB.Resize((T+2)*S, 2 * proj_dim_, kSetZero);
711  // forward part
712  YR_FB.ColRange(0, proj_dim_).CopyFromMat(f_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
713  // backward part
714  YR_FB.ColRange(proj_dim_, proj_dim_).CopyFromMat(b_propagate_buf_.ColRange(7*cell_dim_, proj_dim_));
715  // recurrent projection layer is also feed-forward as BLSTM output
716  out->CopyFromMat(YR_FB.RowRange(1*S, T*S));
717  }
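
The per-frame computation inside each time loop above can be written out for a single stream with plain loops. The following is a minimal sketch, not the Kaldi implementation: it uses std::vector instead of CuMatrix, omits multistream batching, buffer padding and the zeroing of padded frames, and all type and function names (Vec, Mat, LstmpParams, LstmpState, LstmpStep) are hypothetical. The "previous" state is the one at t-1 for the forward direction and at t+1 for the backward direction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // row-major: Mat[row][col]

// Hypothetical parameter bundle for one direction; C = cell dim, P = proj dim.
struct LstmpParams {
  Mat w_gifo_x;                // 4*C x input_dim
  Mat w_gifo_r;                // 4*C x P
  Vec bias;                    // 4*C, laid out as [g | i | f | o]
  Vec peep_i, peep_f, peep_o;  // C each (diagonal peephole weights)
  Mat w_r_m;                   // P x C
  float cell_clip;             // values <= 0 disable clipping
};

struct LstmpState {
  Vec c;  // cell values, size C
  Vec r;  // projected recurrent output, size P
};

static float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One time step for a single stream, following the order of operations
// in the listing: gates, cell update, optional clipping, output, projection.
LstmpState LstmpStep(const LstmpParams &p, const Vec &x,
                     const LstmpState &prev) {
  const std::size_t C = prev.c.size();
  const std::size_t P = prev.r.size();

  // Pre-activations of g, i, f, o: W_x * x + W_r * r_prev + bias.
  Vec a(4 * C);
  for (std::size_t k = 0; k < 4 * C; ++k) {
    float sum = p.bias[k];
    for (std::size_t j = 0; j < x.size(); ++j) sum += p.w_gifo_x[k][j] * x[j];
    for (std::size_t j = 0; j < P; ++j) sum += p.w_gifo_r[k][j] * prev.r[j];
    a[k] = sum;
  }

  LstmpState next{Vec(C), Vec(P, 0.0f)};
  Vec m(C);
  for (std::size_t k = 0; k < C; ++k) {
    const float g = std::tanh(a[k]);
    // Input and forget gates see the previous cell value via peepholes.
    const float i = Sigmoid(a[C + k] + p.peep_i[k] * prev.c[k]);
    const float f = Sigmoid(a[2 * C + k] + p.peep_f[k] * prev.c[k]);
    float c = g * i + prev.c[k] * f;
    if (p.cell_clip > 0.0f) {
      c = std::fmax(-p.cell_clip, std::fmin(p.cell_clip, c));
    }
    // The output gate sees the current cell value.
    const float o = Sigmoid(a[3 * C + k] + p.peep_o[k] * c);
    m[k] = std::tanh(c) * o;
    next.c[k] = c;
  }

  // Recurrent projection: r = W_rm * m.
  for (std::size_t q = 0; q < P; ++q) {
    for (std::size_t k = 0; k < C; ++k) next.r[q] += p.w_r_m[q][k] * m[k];
  }
  return next;
}
```

Running this step for t = 1..T gives the forward projections and running it for t = T..1 with separate parameters gives the backward ones; the component output for each frame is the concatenation of the two, which is why its width is 2 * proj_dim_.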
void ReadData (std::istream &is, bool binary)  [inline, virtual]

Reads the component content.

Reimplemented from Component.

Definition at line 135 of file nnet-blstm-projected.h.

References BlstmProjected::b_bias_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_r_m_, UpdatableComponent::bias_learn_rate_coef_, BlstmProjected::cell_clip_, BlstmProjected::cell_diff_clip_, BlstmProjected::cell_dim_, BlstmProjected::diff_clip_, kaldi::ExpectToken(), BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, BlstmProjected::grad_clip_, KALDI_ASSERT, KALDI_ERR, UpdatableComponent::learn_rate_coef_, kaldi::Peek(), kaldi::PeekToken(), CuVector< Real >::Read(), CuMatrix< Real >::Read(), kaldi::ReadBasicType(), and kaldi::ReadToken().

135  {
136  // Read all the '<Tokens>' in arbitrary order,
137  while ('<' == Peek(is, binary)) {
138  std::string token;
139  int first_char = PeekToken(is, binary);
140  switch (first_char) {
141  case 'C': ReadToken(is, false, &token);
142  if (token == "<CellDim>") ReadBasicType(is, binary, &cell_dim_);
143  else if (token == "<CellClip>") ReadBasicType(is, binary, &cell_clip_);
144  else if (token == "<CellDiffClip>") ReadBasicType(is, binary, &cell_diff_clip_);
145  else if (token == "<ClipGradient>") ReadBasicType(is, binary, &grad_clip_); // bwd-compat.
146  else KALDI_ERR << "Unknown token: " << token;
147  break;
148  case 'L': ExpectToken(is, binary, "<LearnRateCoef>");
149  ReadBasicType(is, binary, &learn_rate_coef_);
150  break;
151  case 'B': ExpectToken(is, binary, "<BiasLearnRateCoef>");
152  ReadBasicType(is, binary, &bias_learn_rate_coef_);
153  break;
154  case 'D': ExpectToken(is, binary, "<DiffClip>");
155  ReadBasicType(is, binary, &diff_clip_);
156  break;
157  case 'G': ExpectToken(is, binary, "<GradClip>");
158  ReadBasicType(is, binary, &grad_clip_);
159  break;
160  default: ReadToken(is, false, &token);
161  KALDI_ERR << "Unknown token: " << token;
162  }
163  }
164  KALDI_ASSERT(cell_dim_ != 0);
165  // Read the data (data follow the tokens),
166 
167  // reading parameters corresponding to forward direction
168  f_w_gifo_x_.Read(is, binary);
169  f_w_gifo_r_.Read(is, binary);
170  f_bias_.Read(is, binary);
171 
172  f_peephole_i_c_.Read(is, binary);
173  f_peephole_f_c_.Read(is, binary);
174  f_peephole_o_c_.Read(is, binary);
175 
176  f_w_r_m_.Read(is, binary);
177 
178  // reading parameters corresponding to backward direction
179  b_w_gifo_x_.Read(is, binary);
180  b_w_gifo_r_.Read(is, binary);
181  b_bias_.Read(is, binary);
182 
183  b_peephole_i_c_.Read(is, binary);
184  b_peephole_f_c_.Read(is, binary);
185  b_peephole_o_c_.Read(is, binary);
186 
187  b_w_r_m_.Read(is, binary);
188  }
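The token loop above accepts the header fields in any order and stops at the first character that does not open a `<Token>`. A stand-alone sketch of the same pattern for text-mode input follows. It uses only the C++ standard library, not the Kaldi I/O functions. `HeaderOptions` and `ParseHeader` are hypothetical names, and the default values in the struct are placeholders, not the defaults of `BlstmProjected`.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical container for the header fields (placeholder defaults).
struct HeaderOptions {
  int cell_dim = 0;
  float learn_rate_coef = 1.0f;
  float bias_learn_rate_coef = 1.0f;
  float cell_clip = 0.0f;
  float diff_clip = 0.0f;
  float cell_diff_clip = 0.0f;
  float grad_clip = 0.0f;
};

// Reads '<Token> value' pairs in arbitrary order (text mode only) and leaves
// the stream positioned at the first non-token character.
HeaderOptions ParseHeader(std::istream &is) {
  HeaderOptions opts;
  while ((is >> std::ws) && is.peek() == '<') {
    std::string token;
    is >> token;
    if (token == "<CellDim>") is >> opts.cell_dim;
    else if (token == "<LearnRateCoef>") is >> opts.learn_rate_coef;
    else if (token == "<BiasLearnRateCoef>") is >> opts.bias_learn_rate_coef;
    else if (token == "<CellClip>") is >> opts.cell_clip;
    else if (token == "<DiffClip>") is >> opts.diff_clip;
    else if (token == "<CellDiffClip>") is >> opts.cell_diff_clip;
    else if (token == "<GradClip>") is >> opts.grad_clip;
    else if (token == "<ClipGradient>") is >> opts.grad_clip;  // legacy alias
    else throw std::runtime_error("Unknown token: " + token);
    if (!is) throw std::runtime_error("Bad value after " + token);
  }
  // mirrors the KALDI_ASSERT(cell_dim_ != 0) check in the listing
  if (opts.cell_dim == 0) throw std::runtime_error("<CellDim> is missing");
  return opts;
}
```

For example, parsing `<CellClip> 50 <CellDim> 4 <ClipGradient> 5` followed by matrix data fills three fields and leaves the stream at the start of the matrix.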
void SetParams (const VectorBase< BaseFloat > &params)  [inline, virtual]

Set the trainable parameters from params, given as a single flattened vector.

Implements UpdatableComponent.

Definition at line 349 of file nnet-blstm-projected.h.

References BlstmProjected::b_bias_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_r_m_, CuVectorBase< Real >::CopyFromVec(), CuMatrixBase< Real >::CopyRowsFromVec(), VectorBase< Real >::Dim(), CuVectorBase< Real >::Dim(), BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), BlstmProjected::NumParams(), CuMatrixBase< Real >::NumRows(), and VectorBase< Real >::Range().

349  {
350  KALDI_ASSERT(params.Dim() == NumParams());
351  int32 offset, len;
352 
353  // Copying parameters corresponding to forward direction
354  offset = 0; len = f_w_gifo_x_.NumRows() * f_w_gifo_x_.NumCols();
355  f_w_gifo_x_.CopyRowsFromVec(params.Range(offset, len));
356 
357  offset += len; len = f_w_gifo_r_.NumRows() * f_w_gifo_r_.NumCols();
358  f_w_gifo_r_.CopyRowsFromVec(params.Range(offset, len));
359 
360  offset += len; len = f_bias_.Dim();
361  f_bias_.CopyFromVec(params.Range(offset, len));
362 
363  offset += len; len = f_peephole_i_c_.Dim();
364  f_peephole_i_c_.CopyFromVec(params.Range(offset, len));
365 
366  offset += len; len = f_peephole_f_c_.Dim();
367  f_peephole_f_c_.CopyFromVec(params.Range(offset, len));
368 
369  offset += len; len = f_peephole_o_c_.Dim();
370  f_peephole_o_c_.CopyFromVec(params.Range(offset, len));
371 
372  offset += len; len = f_w_r_m_.NumRows() * f_w_r_m_.NumCols();
373  f_w_r_m_.CopyRowsFromVec(params.Range(offset, len));
374 
375  // Copying parameters corresponding to backward direction
376  offset += len; len = b_w_gifo_x_.NumRows() * b_w_gifo_x_.NumCols();
377  b_w_gifo_x_.CopyRowsFromVec(params.Range(offset, len));
378 
379  offset += len; len = b_w_gifo_r_.NumRows() * b_w_gifo_r_.NumCols();
380  b_w_gifo_r_.CopyRowsFromVec(params.Range(offset, len));
381 
382  offset += len; len = b_bias_.Dim();
383  b_bias_.CopyFromVec(params.Range(offset, len));
384 
385  offset += len; len = b_peephole_i_c_.Dim();
386  b_peephole_i_c_.CopyFromVec(params.Range(offset, len));
387 
388  offset += len; len = b_peephole_f_c_.Dim();
389  b_peephole_f_c_.CopyFromVec(params.Range(offset, len));
390 
391  offset += len; len = b_peephole_o_c_.Dim();
392  b_peephole_o_c_.CopyFromVec(params.Range(offset, len));
393 
394  offset += len; len = b_w_r_m_.NumRows() * b_w_r_m_.NumCols();
395  b_w_r_m_.CopyRowsFromVec(params.Range(offset, len));
396 
397  // check the dim,
398  offset += len;
399  KALDI_ASSERT(offset == NumParams());
400  }
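The offset bookkeeping above follows one pattern: each parameter block consumes the next `len` elements of the flat vector, and the final offset must equal `NumParams()`. A reduced sketch with three blocks instead of fourteen is given below. `ToyParams` and its members are hypothetical, and `std::vector<float>` stands in for the Kaldi matrix and vector types (the matrix is assumed to be stored row by row).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical component with one weight matrix and two vectors.
struct ToyParams {
  std::vector<float> w;         // weight matrix, stored row by row
  std::vector<float> bias;      // bias vector
  std::vector<float> peephole;  // peephole vector

  std::size_t NumParams() const {
    return w.size() + bias.size() + peephole.size();
  }

  // Copies consecutive ranges of 'flat' into the blocks, in a fixed order.
  void SetParams(const std::vector<float> &flat) {
    assert(flat.size() == NumParams());
    std::size_t offset = 0;
    auto take = [&](std::vector<float> *dst) {
      std::copy(flat.begin() + offset, flat.begin() + offset + dst->size(),
                dst->begin());
      offset += dst->size();
    };
    take(&w);
    take(&bias);
    take(&peephole);
    // check the dim, as in the listing
    assert(offset == NumParams());
  }
};
```

The block order here must match the order used when the parameters are flattened, otherwise the values are assigned to the wrong blocks without any error being raised.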
void Update (const CuMatrixBase< BaseFloat > &input, const CuMatrixBase< BaseFloat > &diff)  [inline, virtual]

Compute gradient and update parameters.

Implements UpdatableComponent.

Definition at line 1067 of file nnet-blstm-projected.h.

References CuMatrixBase< Real >::AddMat(), CuVectorBase< Real >::AddVec(), CuVectorBase< Real >::ApplyCeiling(), CuMatrixBase< Real >::ApplyCeiling(), CuVectorBase< Real >::ApplyFloor(), CuMatrixBase< Real >::ApplyFloor(), BlstmProjected::b_bias_, BlstmProjected::b_bias_corr_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_f_c_corr_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_i_c_corr_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_peephole_o_c_corr_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_r_corr_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_gifo_x_corr_, BlstmProjected::b_w_r_m_, BlstmProjected::b_w_r_m_corr_, UpdatableComponent::bias_learn_rate_coef_, BlstmProjected::f_bias_, BlstmProjected::f_bias_corr_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_f_c_corr_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_i_c_corr_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_peephole_o_c_corr_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_r_corr_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_gifo_x_corr_, BlstmProjected::f_w_r_m_, BlstmProjected::f_w_r_m_corr_, BlstmProjected::grad_clip_, NnetTrainOptions::learn_rate, UpdatableComponent::learn_rate_coef_, and UpdatableComponent::opts_.

1068  {
1069 
1070  // apply the gradient clipping,
1071  if (grad_clip_ > 0.0) {
1086 
1101  }
1102 
1103  const BaseFloat lr = opts_.learn_rate;
1104 
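1105  // forward direction update
1106  f_w_gifo_x_.AddMat(-lr * learn_rate_coef_, f_w_gifo_x_corr_);
1107  f_w_gifo_r_.AddMat(-lr * learn_rate_coef_, f_w_gifo_r_corr_);
1108  f_bias_.AddVec(-lr * bias_learn_rate_coef_, f_bias_corr_, 1.0);
1109 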
1110  f_peephole_i_c_.AddVec(-lr * bias_learn_rate_coef_, f_peephole_i_c_corr_, 1.0);
1111  f_peephole_f_c_.AddVec(-lr * bias_learn_rate_coef_, f_peephole_f_c_corr_, 1.0);
1112  f_peephole_o_c_.AddVec(-lr * bias_learn_rate_coef_, f_peephole_o_c_corr_, 1.0);
1113 
1114  f_w_r_m_.AddMat(-lr * learn_rate_coef_, f_w_r_m_corr_);
1115 
1116  // backward direction update
1117  b_w_gifo_x_.AddMat(-lr * learn_rate_coef_, b_w_gifo_x_corr_);
1118  b_w_gifo_r_.AddMat(-lr * learn_rate_coef_, b_w_gifo_r_corr_);
1119  b_bias_.AddVec(-lr * bias_learn_rate_coef_, b_bias_corr_, 1.0);
1120 
1121  b_peephole_i_c_.AddVec(-lr * bias_learn_rate_coef_, b_peephole_i_c_corr_, 1.0);
1122  b_peephole_f_c_.AddVec(-lr * bias_learn_rate_coef_, b_peephole_f_c_corr_, 1.0);
1123  b_peephole_o_c_.AddVec(-lr * bias_learn_rate_coef_, b_peephole_o_c_corr_, 1.0);
1124 
1125  b_w_r_m_.AddMat(-lr * learn_rate_coef_, b_w_r_m_corr_);
1126  }
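Each update above has the form `param += -lr * coef * corr`, optionally preceded by gradient clipping. The body of the clipping branch is not shown in the listing. The sketch below assumes element-wise clamping of each gradient to `[-grad_clip, grad_clip]`, which is consistent with the `ApplyFloor()` and `ApplyCeiling()` calls listed under References. `ClipGradient` and `SgdStep` are hypothetical helpers operating on `std::vector<float>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Clamps every element of 'grad' to [-clip, clip]; no-op when clip <= 0.
void ClipGradient(std::vector<float> *grad, float clip) {
  if (clip <= 0.0f) return;
  for (float &g : *grad) {
    g = std::min(std::max(g, -clip), clip);
  }
}

// Plain SGD step: param += -lr * coef * grad.
// 'coef' plays the role of learn_rate_coef_ or bias_learn_rate_coef_.
void SgdStep(std::vector<float> *param, const std::vector<float> &grad,
             float lr, float coef) {
  assert(param->size() == grad.size());
  for (std::size_t k = 0; k < grad.size(); ++k) {
    (*param)[k] -= lr * coef * grad[k];
  }
}
```

In the listing, weight matrices are scaled by `learn_rate_coef_`, while biases and peephole vectors are scaled by `bias_learn_rate_coef_`.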
void WriteData (std::ostream &os, bool binary) const  [inline, virtual]

Writes the component content.

Reimplemented from Component.

Definition at line 190 of file nnet-blstm-projected.h.

References BlstmProjected::b_bias_, BlstmProjected::b_peephole_f_c_, BlstmProjected::b_peephole_i_c_, BlstmProjected::b_peephole_o_c_, BlstmProjected::b_w_gifo_r_, BlstmProjected::b_w_gifo_x_, BlstmProjected::b_w_r_m_, UpdatableComponent::bias_learn_rate_coef_, BlstmProjected::cell_clip_, BlstmProjected::cell_diff_clip_, BlstmProjected::cell_dim_, BlstmProjected::diff_clip_, BlstmProjected::f_bias_, BlstmProjected::f_peephole_f_c_, BlstmProjected::f_peephole_i_c_, BlstmProjected::f_peephole_o_c_, BlstmProjected::f_w_gifo_r_, BlstmProjected::f_w_gifo_x_, BlstmProjected::f_w_r_m_, BlstmProjected::grad_clip_, UpdatableComponent::learn_rate_coef_, CuVector< Real >::Write(), CuMatrixBase< Real >::Write(), kaldi::WriteBasicType(), and kaldi::WriteToken().

190  {
191  WriteToken(os, binary, "<CellDim>");
192  WriteBasicType(os, binary, cell_dim_);
193 
194  WriteToken(os, binary, "<LearnRateCoef>");
195  WriteBasicType(os, binary, learn_rate_coef_);
196  WriteToken(os, binary, "<BiasLearnRateCoef>");
197  WriteBasicType(os, binary, bias_learn_rate_coef_);
198 
199  WriteToken(os, binary, "<CellClip>");
200  WriteBasicType(os, binary, cell_clip_);
201  WriteToken(os, binary, "<DiffClip>");
202  WriteBasicType(os, binary, diff_clip_);
203  WriteToken(os, binary, "<CellDiffClip>");
204  WriteBasicType(os, binary, cell_diff_clip_);
205  WriteToken(os, binary, "<GradClip>");
206  WriteBasicType(os, binary, grad_clip_);
207 
208  if (!binary) os << "\n";
209  // writing parameters, forward direction,
210  f_w_gifo_x_.Write(os, binary);
211  f_w_gifo_r_.Write(os, binary);
212  f_bias_.Write(os, binary);
213 
214  f_peephole_i_c_.Write(os, binary);
215  f_peephole_f_c_.Write(os, binary);
216  f_peephole_o_c_.Write(os, binary);
217 
218  f_w_r_m_.Write(os, binary);
219 
220  if (!binary) os << "\n";
221  // writing parameters, backward direction,
222  b_w_gifo_x_.Write(os, binary);
223  b_w_gifo_r_.Write(os, binary);
224  b_bias_.Write(os, binary);
225 
226  b_peephole_i_c_.Write(os, binary);
227  b_peephole_f_c_.Write(os, binary);
228  b_peephole_o_c_.Write(os, binary);
229 
230  b_w_r_m_.Write(os, binary);
231  }
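The header written above is a sequence of `<Token> value` pairs followed by the parameter blocks. A text-mode sketch of the header part follows. `HeaderFields`, `WriteField` and `WriteHeader` are hypothetical names. The single-space separators are an assumption made for this example, not a statement about the Kaldi on-disk format.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical container for the scalar fields of the header.
struct HeaderFields {
  int cell_dim;
  float learn_rate_coef;
  float bias_learn_rate_coef;
  float cell_clip;
  float diff_clip;
  float cell_diff_clip;
  float grad_clip;
};

// Writes one '<Token> value ' pair in text mode.
template <typename T>
void WriteField(std::ostream &os, const char *token, const T &value) {
  os << token << ' ' << value << ' ';
}

// Writes the fields in the same order as the listing, then a newline
// before the parameter blocks would follow.
void WriteHeader(std::ostream &os, const HeaderFields &h) {
  WriteField(os, "<CellDim>", h.cell_dim);
  WriteField(os, "<LearnRateCoef>", h.learn_rate_coef);
  WriteField(os, "<BiasLearnRateCoef>", h.bias_learn_rate_coef);
  WriteField(os, "<CellClip>", h.cell_clip);
  WriteField(os, "<DiffClip>", h.diff_clip);
  WriteField(os, "<CellDiffClip>", h.cell_diff_clip);
  WriteField(os, "<GradClip>", h.grad_clip);
  os << '\n';
}
```

Because every value is preceded by its token, a reader that dispatches on the token name can accept the fields in any order.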

Member Data Documentation

CuMatrix<BaseFloat> b_backpropagate_buf_  [private]

BaseFloat cell_clip_  [private]

BaseFloat cell_diff_clip_  [private]

Clipping of 'cell-derivatives' accumulated over the CEC (constant error carousel), per frame.

Definition at line 1135 of file nnet-blstm-projected.h.

Referenced by BlstmProjected::BackpropagateFnc(), BlstmProjected::InitData(), BlstmProjected::ReadData(), and BlstmProjected::WriteData().

CuMatrix<BaseFloat> f_backpropagate_buf_  [private]

int32 proj_dim_  [private]

The documentation for this class was generated from the following file: