CuMatrixBase< Real > Class Template Reference

Matrix for CUDA computing.

#include <matrix-common.h>


Public Member Functions

void CopyCols (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Copies column r from column indexes[r] of src.

void AddCols (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indices)
 Add column indices[r] of src to column r.

void CopyRows (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Copies row r from row indexes[r] of src.

void CopyRows (const CuArray< const Real * > &src)
 Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

void CopyToRows (const CuArray< Real * > &dst) const
 For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

void AddRows (Real alpha, const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Does for each row r, this.Row(r) += alpha * src.Row(indexes[r]).

void AddRows (Real alpha, const CuArray< const Real * > &src)
 Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols().

void AddToRows (Real alpha, const CuArray< Real * > &dst) const
 For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

void SumColumnRanges (const CuMatrixBase< Real > &src, const CuArray< Int32Pair > &indexes)
 For each row r of this and for each column c, sets (*this)(r, c) to the sum of src(r, j), where j ranges from indexes[c].first through indexes[c].second - 1.

void AddRowRanges (const CuMatrixBase< Real > &src, const CuArray< Int32Pair > &indexes)
 For each row r of this and for each column c, does (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

void AddToDiag (Real value)
 Adds "value" to the diagonal elements of the matrix.

MatrixIndexT NumRows () const
 Dimensions.
 
MatrixIndexT NumCols () const
 
MatrixIndexT Stride () const
 
::MatrixDim Dim () const
 
Real FrobeniusNorm () const
 
bool IsUnit (Real tol=0.001) const
 
bool ApproxEqual (const CuMatrixBase< Real > &other, float tol=0.01) const
 True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()

MatrixIndexT SizeInBytes () const
 Get size of matrix in bytes.
 
template<typename OtherReal >
void CopyFromMat (const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromGeneralMat (const GeneralMatrix &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromMat (const MatrixBase< Real > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromSp (const CuSpMatrix< Real > &M)
 
template<typename OtherReal >
void CopyFromTp (const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyFromMat (const CuMatrixBase< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyToMat (MatrixBase< OtherReal > *dst, MatrixTransposeType trans=kNoTrans) const
 
void CopyRowsFromVec (const CuVectorBase< Real > &v)
 This function has two modes of operation.

void CopyRowsFromVec (const VectorBase< Real > &v)
 Version of CopyRowsFromVec() that takes a CPU-based vector.

void CopyColsFromVec (const CuVectorBase< Real > &v)
 Copies vector into matrix, column-by-column.

void CopyColFromVec (const CuVectorBase< Real > &v, const MatrixIndexT col)
 Copy vector into specific column of matrix.

void Sigmoid (const CuMatrixBase< Real > &src)
 Set each element to the sigmoid of the corresponding element of "src": element by element, *this = 1 / (1 + exp(-src)).

void Heaviside (const CuMatrixBase< Real > &src)
 Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x==0.]

void SoftHinge (const CuMatrixBase< Real > &src)
 Apply the function y = log(1 + exp(x)) to each element.

void GroupPnorm (const CuMatrixBase< Real > &src, Real pow)
 Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} |x_j|^pow)^(1 / pow), where x is src, y is *this, and G = x.NumCols() / y.NumCols() must be an integer.

void DiffGroupPnorm (const CuMatrixBase< Real > &in_value, const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv, Real power)
 Differentiate backward through the GroupPnorm function.

void GroupMax (const CuMatrixBase< Real > &src)
 Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer.

void GroupMaxDeriv (const CuMatrixBase< Real > &input, const CuMatrixBase< Real > &output)
 Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. its "src" argument) and "output" is its result.

void ParametricRelu (const CuMatrixBase< Real > &src, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Compute the parametric rectified linear unit function; element by element, *this = src * (src > 0 ? alpha : beta)

void DiffParametricRelu (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Differentiate backward through the parametric relu function.

void Tanh (const CuMatrixBase< Real > &src)
 Compute the hyperbolic tangent (tanh) function; element by element, *this = tanh(src).

void DiffSigmoid (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the sigmoid function.

void DiffTanh (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the tanh function.

void DiffSoftmaxPerRow (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the softmax function.

void DiffLogSoftmaxPerRow (const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv)
 Differentiate backward through the log softmax function.

void DiffXent (const CuArray< int32 > &tgt, CuVector< Real > *log_post_tgt)
 Differentiate the block [softmax+cross-entropy]: dE/da = posterior_mat - target_mat, where E is the error function and a is the activation at the softmax input.

void Cholesky (CuMatrixBase< Real > *inv_cholesky=NULL)
 Sets *this to the Cholesky factor of *this (i.e. the lower-triangular C such that *this = C C^T); if inv_cholesky is non-NULL, it is set to the inverse of C.

void SymInvertPosDef ()
 Inversion for positive definite symmetric matrices.

void ApplyPow (Real power)
 Raise each element to the given power.

void ApplyPowAbs (Real power, bool include_sign=false)
 Apply power to the absolute value of each element; if include_sign is true, the result is multiplied by the sign of the input element.

void ApplyHeaviside ()
 For each element, sets x = (x > 0 ? 1.0 : 0.0).
 
void ApplyFloor (Real floor_val)
 
void ApplyCeiling (Real ceiling_val)
 
void ApplyExp ()
 
void ApplySoftMaxPerRow (const CuMatrixBase< Real > &src)
 Softmax nonlinearity Y = Softmax(X): Yij = e^Xij / sum_k(e^Xik), applied to each row; for each row, the max value is first subtracted for numerical stability.

void ApplyLogSoftMaxPerRow (const CuMatrixBase< Real > &src)
 LogSoftmax nonlinearity Y = LogSoftmax(X): Yij = Xij - log(sum_k(e^Xik)), applied to each row; for each row, the max value is first subtracted for numerical stability.

void FindRowMaxId (CuArray< int32 > *id) const
 Find the id of the maximal element for each row.

void SetZero ()
 Sets all elements to zero.
 
void Set (Real value)
 
void Add (Real value)
 
void SetZeroAboveDiag ()
 Zeroes all elements for which col > row.

void Scale (Real value)

void ApplyLog ()

void MulElements (const CuMatrixBase< Real > &A)
 Multiply two matrices elementwise: C = C .* A.

void DivElements (const CuMatrixBase< Real > &A)
 Divide two matrices elementwise: C = C ./ A.

void Max (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = max(*this, A).

void Min (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = min(*this, A).

void MulColsVec (const CuVectorBase< Real > &scale)
 Scale the i'th column by scale[i].

void MulRowsVec (const CuVectorBase< Real > &scale)
 Scale the i'th row by scale[i].

void MulRowsGroupMat (const CuMatrixBase< Real > &src)
 Divide each row into src.NumCols() groups, and then scale the i'th row's j'th group of elements by src(i, j).

void DivRowsVec (const CuVectorBase< Real > &div)
 Divide the i'th row by div[i].

void InvertElements ()
 Invert each element: x = 1 / x.

void AddMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A

void AddMatBlocks (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 This function is like AddMat (it does *this += alpha * A), except that it supports cases where *this and A have different dimensions.

void AddVecToCols (Real alpha, const CuVectorBase< Real > &col, Real beta=1.0)
 (for each column c of *this), c = alpha * col + beta * c

void AddVecToRows (Real alpha, const CuVectorBase< Real > &row, Real beta=1.0)
 (for each row r of *this), r = alpha * row + beta * r

void AddMatMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, Real beta)
 C = alpha * A(^T)*B(^T) + beta * C.

void AddVecVec (Real alpha, const CuVectorBase< Real > &x, const CuVectorBase< Real > &y)
 *this = alpha * x * y^T + *this.

void SetMatMatDivMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const CuMatrixBase< Real > &C)
 *this = A .* B ./ C, elementwise; where an element of C is 0, the result is the corresponding element of A. *this can safely be an alias of A, B or C.

void SymAddMat2 (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transA, Real beta)
 *this = beta * *this + alpha * M M^T, for symmetric matrices.

void AddMatBlock (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuBlockMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 This function is like AddMatMat but for the case where the second argument is of type CuBlockMatrix (a block-diagonal matrix).

void AddDiagVecMat (const Real alpha, const CuVectorBase< Real > &v, const CuMatrixBase< Real > &M, MatrixTransposeType transM, Real beta=1.0)
 *this = beta * *this + alpha * diag(v) * M [or M^T].

void AddMatDiagVec (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transM, CuVectorBase< Real > &v, Real beta=1.0)
 *this = beta * *this + alpha * M [or M^T] * diag(v).

void AddMatMatElements (const Real alpha, const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const Real beta)
 *this = beta * *this + alpha * A .* B (.* denotes element-by-element multiplication)

void AddMatSp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuSpMatrix< Real > &B, const Real beta)
 *this <- beta * *this + alpha * A * B (B symmetric, packed)

void AddSpMat (const Real alpha, const CuSpMatrix< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 *this <- beta * *this + alpha * A * B (A symmetric, packed)

void AddTpMat (const Real alpha, const CuTpMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 *this <- beta * *this + alpha * A * B (A triangular, packed).

void AddMatTp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuTpMatrix< Real > &B, MatrixTransposeType transB, const Real beta)
 *this <- beta * *this + alpha * A * B (B triangular, packed).
 
void CopyFromBlock (const CuBlockMatrix< Real > &B, MatrixTransposeType trans=kNoTrans)
 
void CopyLowerToUpper ()
 
void CopyUpperToLower ()
 
CuSubMatrix< Real > Range (const MatrixIndexT row_offset, const MatrixIndexT num_rows, const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
CuSubMatrix< Real > RowRange (const MatrixIndexT row_offset, const MatrixIndexT num_rows) const
 
CuSubMatrix< Real > ColRange (const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
const CuSubVector< Real > Row (MatrixIndexT i) const
 
CuSubVector< Real > Row (MatrixIndexT i)
 
CuValue< Real > operator() (MatrixIndexT r, MatrixIndexT c)
 
Real operator() (MatrixIndexT r, MatrixIndexT c) const
 
Real Sum () const
 
Real Max () const
 
Real Min () const
 
Real Trace (bool check_square=true) const
 Return the trace. If check_square = true, will crash if matrix is not square.
 
void SetRandn ()
 
void SetRandUniform ()
 
void Write (std::ostream &os, bool binary) const
 
void AddElements (Real alpha, const std::vector< MatrixElement< Real > > &input)
 
void AddElements (Real alpha, const CuArray< Int32Pair > &indexes, const Real *input)
 
void Lookup (const std::vector< Int32Pair > &indexes, Real *output) const
 
void Lookup (const CuArray< Int32Pair > &indexes, Real *output) const
 
void EqualElementMask (const CuMatrixBase< Real > &mat, CuMatrix< Real > *mask) const
 
const Real * RowData (MatrixIndexT r) const
 Get raw row pointer (const).

Real * RowData (MatrixIndexT r)
 Get raw row pointer.

const Real * Data () const
 Return data pointer (const).

Real * Data ()
 Return data pointer.
 
const MatrixBase< Real > & Mat () const
 
MatrixBase< Real > & Mat ()
 
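The index-range members SumColumnRanges() and AddRowRanges() above are easiest to check against a plain loop. The following is a minimal CPU sketch of their documented semantics in standard C++, assuming a hypothetical unpadded row-major DenseMat and std::pair<int, int> in place of Int32Pair; the names SumColumnRangesRef and AddRowRangesRef are illustrative and are not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical dense row-major matrix for this sketch only (not CuMatrix):
// element (r, c) is stored at data[r * cols + c], with no padding.
struct DenseMat {
  int rows, cols;
  std::vector<double> data;
  DenseMat(int r, int c)
      : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0.0) {}
  double &at(int r, int c) { return data[static_cast<std::size_t>(r) * cols + c]; }
  double at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
};

// SumColumnRanges semantics: dst(r, c) = sum of src(r, j) over the
// half-open range j in [ranges[c].first, ranges[c].second).
void SumColumnRangesRef(const DenseMat &src,
                        const std::vector<std::pair<int, int>> &ranges,
                        DenseMat *dst) {
  assert(dst->rows == src.rows && static_cast<int>(ranges.size()) == dst->cols);
  for (int r = 0; r < dst->rows; ++r) {
    for (int c = 0; c < dst->cols; ++c) {
      assert(0 <= ranges[c].first && ranges[c].first <= ranges[c].second &&
             ranges[c].second <= src.cols);
      double sum = 0.0;
      for (int j = ranges[c].first; j < ranges[c].second; ++j) sum += src.at(r, j);
      dst->at(r, c) = sum;  // overwrites; it does not accumulate
    }
  }
}

// AddRowRanges semantics: dst(r, c) += sum of src(j, c) over the
// half-open range j in [ranges[r].first, ranges[r].second).
void AddRowRangesRef(const DenseMat &src,
                     const std::vector<std::pair<int, int>> &ranges,
                     DenseMat *dst) {
  assert(dst->cols == src.cols && static_cast<int>(ranges.size()) == dst->rows);
  for (int r = 0; r < dst->rows; ++r) {
    assert(0 <= ranges[r].first && ranges[r].first <= ranges[r].second &&
           ranges[r].second <= src.rows);
    for (int j = ranges[r].first; j < ranges[r].second; ++j)
      for (int c = 0; c < dst->cols; ++c) dst->at(r, c) += src.at(j, c);
  }
}
```

Note the asymmetry the summary describes: SumColumnRanges sets its output from ranges over columns, while AddRowRanges adds into its output from ranges over rows.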

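ApplySoftMaxPerRow() and ApplyLogSoftMaxPerRow() are described as subtracting each row's maximum before exponentiating. A sketch of why that helps, written as a standalone C++ reference of the log-softmax formula; the function name LogSoftmaxPerRowRef and the std::vector layout are assumptions for illustration, and Kaldi's CUDA kernels are organized differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise log-softmax of a rows x cols row-major buffer:
// y(i, j) = x(i, j) - log(sum_k exp(x(i, k))).
// The row maximum m is factored out first, using
// log(sum_k exp(x_k)) = m + log(sum_k exp(x_k - m)), so every exp()
// argument is <= 0 and cannot overflow.
std::vector<double> LogSoftmaxPerRowRef(const std::vector<double> &x,
                                        int rows, int cols) {
  assert(rows >= 0 && cols > 0 &&
         x.size() == static_cast<std::size_t>(rows) * cols);
  std::vector<double> y(x.size());
  for (int r = 0; r < rows; ++r) {
    const double *row = x.data() + static_cast<std::size_t>(r) * cols;
    double *out = y.data() + static_cast<std::size_t>(r) * cols;
    const double max_val = *std::max_element(row, row + cols);
    double sum = 0.0;
    for (int k = 0; k < cols; ++k) sum += std::exp(row[k] - max_val);
    const double log_norm = max_val + std::log(sum);
    for (int c = 0; c < cols; ++c) out[c] = row[c] - log_norm;
  }
  return y;
}
```

For a row such as {1000, 1000}, the naive formula breaks down because exp(1000.0) overflows to infinity in double precision; the shifted form returns log(0.5) for both entries. Exponentiating the output gives the softmax itself.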
Protected Member Functions

 CuMatrixBase ()
 
 CuMatrixBase (Real *data, MatrixIndexT num_rows, MatrixIndexT num_cols, MatrixIndexT stride)
 This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.
 

Protected Attributes

Real * data_
 GPU data pointer (or a regular CPU matrix data pointer if CUDA is not compiled in or no GPU is in use).
 
MatrixIndexT num_cols_
 
MatrixIndexT num_rows_
 
MatrixIndexT stride_
 
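The protected fields above describe a strided row-major layout, and the protected constructor lets CuSubMatrix reuse a parent's storage. A minimal host-side sketch of that addressing scheme, assuming a hypothetical MatrixView struct (it is not Kaldi's class, and it ignores GPU memory entirely):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data_/num_rows_/num_cols_/stride_ fields:
// a non-owning row-major view in which row r starts at data + r * stride.
// stride >= num_cols; entries between num_cols and stride are padding.
struct MatrixView {
  double *data;
  int num_rows, num_cols, stride;

  double *RowData(int r) const {
    return data + static_cast<std::ptrdiff_t>(r) * stride;
  }
  double &operator()(int r, int c) const {
    assert(r >= 0 && r < num_rows && c >= 0 && c < num_cols);
    return RowData(r)[c];
  }
  // A sub-matrix shares the parent's storage and stride; only the start
  // pointer and the dimensions change.
  MatrixView Range(int row_offset, int rows, int col_offset, int cols) const {
    assert(row_offset >= 0 && rows >= 0 && row_offset + rows <= num_rows);
    assert(col_offset >= 0 && cols >= 0 && col_offset + cols <= num_cols);
    return MatrixView{RowData(row_offset) + col_offset, rows, cols, stride};
  }
};
```

Because the view keeps the parent's stride, writing through a sub-matrix updates the parent in place without any copy, which is the behaviour the (data, num_rows, num_cols, stride) constructor enables for CuSubMatrix.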

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (CuMatrixBase)
 

Friends

class CuMatrixBase< float >
 
class CuMatrixBase< double >
 
class CuVectorBase< float >
 
class CuVectorBase< double >
 
class VectorBase< Real >
 
class CuSpMatrix< Real >
 
class CuTpMatrix< float >
 
class CuTpMatrix< double >
 
class CuVectorBase< Real >
 
class CuSubMatrix< Real >
 
class CuRand< Real >
 
class CuSubVector< Real >
 
class CuBlockMatrix< Real >
 
class CuSparseMatrix< float >
 
class CuSparseMatrix< double >
 
class CuSparseMatrix< Real >
 
Real TraceMatMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
 
Real TraceMatSmat (const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType trans)
 
void AddMatMatBatched (const Real alpha, std::vector< CuSubMatrix< Real > * > &C, const std::vector< CuSubMatrix< Real > * > &A, MatrixTransposeType transA, const std::vector< CuSubMatrix< Real > * > &B, MatrixTransposeType transB, const Real beta)
 Does multiple matrix multiplications, executing them in parallel using cuBLAS's gemmBatched if we are using a GPU.
 
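AddMatMatBatched() runs several independent products in one call. As a hedged reference for what each entry computes (transposes omitted; the Dense struct and the AddMatMatBatchedRef name are hypothetical), the semantics reduce to one GEMM per index i:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix for this sketch only.
struct Dense {
  int rows, cols;
  std::vector<double> v;
  double &at(int r, int c) { return v[static_cast<std::size_t>(r) * cols + c]; }
  double at(int r, int c) const { return v[static_cast<std::size_t>(r) * cols + c]; }
};

// For every i: C[i] = alpha * A[i] * B[i] + beta * C[i].
// The products are independent of each other, which is what allows a GPU
// implementation to execute them together.
void AddMatMatBatchedRef(double alpha, std::vector<Dense *> &C,
                         const std::vector<const Dense *> &A,
                         const std::vector<const Dense *> &B, double beta) {
  assert(A.size() == C.size() && B.size() == C.size());
  for (std::size_t i = 0; i < C.size(); ++i) {
    const Dense &a = *A[i], &b = *B[i];
    Dense &c = *C[i];
    assert(a.rows == c.rows && b.cols == c.cols && a.cols == b.rows);
    for (int r = 0; r < c.rows; ++r) {
      for (int k = 0; k < c.cols; ++k) {
        double dot = 0.0;
        for (int j = 0; j < a.cols; ++j) dot += a.at(r, j) * b.at(j, k);
        c.at(r, k) = alpha * dot + beta * c.at(r, k);
      }
    }
  }
}
```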

Detailed Description

template<typename Real>
class kaldi::CuMatrixBase< Real >

Matrix for CUDA computing.

Does the computation on the CUDA card when CUDA is compiled in and we have a suitable GPU (CuDevice::Instantiate().Enabled() == true); otherwise, does it on the CPU.

Definition at line 67 of file matrix-common.h.

Constructor & Destructor Documentation

CuMatrixBase ( )
inline, protected

Definition at line 642 of file cu-matrix.h.

  : data_(NULL), num_cols_(0), num_rows_(0), stride_(0) { }
CuMatrixBase ( Real *  data,
MatrixIndexT  num_rows,
MatrixIndexT  num_cols,
MatrixIndexT  stride 
)
inline, protected

This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.

Definition at line 646 of file cu-matrix.h.

  : data_(data), num_cols_(num_cols), num_rows_(num_rows), stride_(stride) { }

Member Function Documentation

void Add ( Real  value)

Definition at line 546 of file cu-matrix.cc.

References data_.

Referenced by BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), Xent::Eval(), main(), kaldi::MeanVariance(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), TanhComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAdd(), kaldi::UnitTestCuMatrixAdd2(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixObjfDeriv(), and kaldi::UnitTestCuMatrixSetRandUniform().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    CuTimer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_add(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().Add(value);
  }
}
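Add() launches an elementwise kernel over Dim(), which carries the row count, column count and stride. Two details can be sketched on the host: the ceiling division used to size a launch grid (other listings on this page call n_blocks() for this; NBlocks below is a hypothetical re-implementation, not Kaldi's declaration), and the fact that only the first NumCols() entries of each padded row are modified. AddScalarRef is likewise an illustrative name; the real CPU fallback simply calls Mat().Add(value).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: the number of blocks of `block_size` threads needed to
// cover `size` items. Hypothetical helper; it mirrors the role of n_blocks().
int NBlocks(int size, int block_size) {
  assert(size >= 0 && block_size > 0);
  return (size + block_size - 1) / block_size;
}

// Host sketch of Add(value) on a padded row-major buffer described by
// (num_rows, num_cols, stride): only the first num_cols entries of each row
// change; the padding between num_cols and stride is left alone.
void AddScalarRef(std::vector<double> &data, int num_rows, int num_cols,
                  int stride, double value) {
  assert(num_cols <= stride &&
         data.size() >= static_cast<std::size_t>(num_rows) * stride);
  for (int r = 0; r < num_rows; ++r)
    for (int c = 0; c < num_cols; ++c)
      data[static_cast<std::size_t>(r) * stride + c] += value;
}
```

With hypothetical 16 x 16 thread blocks, a 100-column matrix needs NBlocks(100, 16) = 7 blocks along that axis, so some threads in the last block fall outside the matrix and must do nothing.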
void AddCols ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indices 
)

Add column indices[r] of src to column r.

As a special case, if indices[r] == -1, column r is skipped. indices.Dim() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().

Definition at line 2470 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by Convolutional1dComponent::Backprop(), ConvolutionalComponent::BackpropagateFnc(), ConvolutionComponent::InderivPatchesToInderiv(), and MaxpoolingComponent::InderivPatchesToInderiv().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(indices.Dim() == NumCols());
    KALDI_ASSERT(NumRows() == src.NumRows());
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
                  Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddCols(src.Mat(), indices.Data());
  }
}
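A minimal CPU sketch of the AddCols() contract described above, including the -1 "skip" convention, written against unpadded row-major std::vector buffers. AddColsRef and its parameter list are hypothetical; the real CPU path delegates to Mat().AddCols().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For every row r and column c of dst: dst(r, c) += src(r, indices[c]).
// indices[c] == -1 leaves column c of dst unchanged.
// src is num_rows x src_cols, dst is num_rows x indices.size().
void AddColsRef(const std::vector<double> &src, int src_cols,
                const std::vector<int> &indices,
                std::vector<double> &dst, int num_rows) {
  const int dst_cols = static_cast<int>(indices.size());
  assert(src.size() == static_cast<std::size_t>(num_rows) * src_cols);
  assert(dst.size() == static_cast<std::size_t>(num_rows) * dst_cols);
  for (int r = 0; r < num_rows; ++r) {
    for (int c = 0; c < dst_cols; ++c) {
      const int j = indices[c];
      assert(j >= -1 && j < src_cols);
      if (j == -1) continue;  // skipped column
      dst[static_cast<std::size_t>(r) * dst_cols + c] +=
          src[static_cast<std::size_t>(r) * src_cols + j];
    }
  }
}
```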
void AddDiagVecMat ( const Real  alpha,
const CuVectorBase< Real > &  v,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
Real  beta = 1.0 
)

*this = beta * *this + alpha * diag(v) * M [or M^T].

The same as adding M but scaling each row M_i by v(i).

Definition at line 1231 of file cu-matrix.cc.

References CU2DBLOCK, CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by HiddenSoftmax::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), MultiBasisComponent::PropagateFnc(), and kaldi::TestCuMatrixAddDiagVecMat().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transM == kNoTrans) {
      KALDI_ASSERT(SameDim(*this, M));
    } else {
      KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
    }
    KALDI_ASSERT(v.Dim() == this->NumRows());

    CuTimer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
    if (transM == kTrans)
      std::swap(M_row_stride, M_col_stride);
    cuda_add_diag_vec_mat(dimGrid, dimBlock, alpha, data_, Dim(),
                          v.Data(), M.Data(), M_row_stride, M_col_stride, beta);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddDiagVecMat(alpha, v.Vec(), M.Mat(), transM, beta);
  }
}
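The GPU path handles transM by swapping M_row_stride and M_col_stride rather than transposing M. The same trick in a self-contained CPU sketch; AddDiagVecMatRef and its parameter list are illustrative assumptions, not Kaldi's signature, and all buffers are unpadded row-major std::vector storage.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// out = beta * out + alpha * diag(v) * op(M); out is rows x cols, M is
// m_rows x m_cols, and op(M) is M, or M^T when trans_m is true.
// As in the listing, M^T is read by swapping the strides used to index M.
void AddDiagVecMatRef(double alpha, const std::vector<double> &v,
                      const std::vector<double> &m, int m_rows, int m_cols,
                      bool trans_m, double beta,
                      std::vector<double> &out, int rows, int cols) {
  std::ptrdiff_t row_stride = m_cols, col_stride = 1;  // row-major M
  if (trans_m) {
    assert(m_rows == cols && m_cols == rows);
    std::swap(row_stride, col_stride);
  } else {
    assert(m_rows == rows && m_cols == cols);
  }
  assert(static_cast<int>(v.size()) == rows);
  assert(m.size() == static_cast<std::size_t>(m_rows) * m_cols);
  assert(out.size() == static_cast<std::size_t>(rows) * cols);
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      const double op_m_ij = m[i * row_stride + j * col_stride];  // op(M)(i, j)
      double &o = out[static_cast<std::size_t>(i) * cols + j];
      o = beta * o + alpha * v[i] * op_m_ij;  // row i of op(M) scaled by v[i]
    }
  }
}
```

For example, with M = [[1, 2, 3], [4, 5, 6]], trans_m = true, v = (1, 10, 100), beta = 2 and out initially all ones, the rows of the 3 x 2 result are (3, 6), (22, 52) and (302, 602).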
void AddElements ( Real  alpha,
const std::vector< MatrixElement< Real > > &  input 
)

Definition at line 2987 of file cu-matrix.cc.

References CU1DBLOCK, data_, and KALDI_ASSERT.

Referenced by OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), DiscriminativeComputation::ProcessPosteriors(), and kaldi::UnitTestCuMatrixAddElements().

{
  // Checks the dimension.
  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
  for (int32 i = 0; i < input.size(); ++i) {
    KALDI_ASSERT(input[i].row < num_rows && input[i].row >= 0 &&
                 input[i].column < num_cols && input[i].column >= 0);
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    void *addr = CuDevice::Instantiate().Malloc(input.size() * sizeof(MatrixElement<Real>));
    CU_SAFE_CALL(cudaMemcpy(addr, input.data(),
                            input.size() * sizeof(MatrixElement<Real>),
                            cudaMemcpyHostToDevice));

    CuTimer tim;
    int dimBlock(CU1DBLOCK);
    int dimGrid(n_blocks(input.size(), CU1DBLOCK));

    cuda_matrix_add_elements(dimGrid, dimBlock, this->data_, this->Dim(),
                             alpha, (MatrixElement<Real>*)addr, input.size());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().Free(addr);
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    for (int32 i = 0; i < input.size(); i++) {
      (*this)(input[i].row, input[i].column) += alpha * input[i].weight;
    }
  }
}
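This AddElements() overload validates every coordinate before touching the matrix, then adds alpha * weight at each (row, column). A host-only sketch of that CPU path follows; ElementSketch is a hypothetical stand-in for MatrixElement<Real> that carries only the row, column and weight fields the listing reads, and AddElementsRef is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the fields of MatrixElement<Real> used in the listing.
struct ElementSketch {
  int row;
  int column;
  double weight;
};

// mat is num_rows x num_cols, row-major, unpadded.
// First pass: bounds-check every entry. Second pass:
// mat(row, column) += alpha * weight for each entry.
void AddElementsRef(double alpha, const std::vector<ElementSketch> &input,
                    std::vector<double> &mat, int num_rows, int num_cols) {
  assert(mat.size() == static_cast<std::size_t>(num_rows) * num_cols);
  for (const ElementSketch &e : input)
    assert(e.row >= 0 && e.row < num_rows && e.column >= 0 && e.column < num_cols);
  for (const ElementSketch &e : input)
    mat[static_cast<std::size_t>(e.row) * num_cols + e.column] += alpha * e.weight;
}
```

In this sketch repeated coordinates simply accumulate, as they do in the CPU loop of the listing.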
void AddElements ( Real  alpha,
const CuArray< Int32Pair > &  indexes,
const Real *  input 
)

Definition at line 3021 of file cu-matrix.cc.

References CU1DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), Int32Pair::first, KALDI_ASSERT, kaldi::kUndefined, and Int32Pair::second.

{
  if (indexes.Dim() == 0) return;
  KALDI_ASSERT(input != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CuVector<Real> tmp_vec(indexes.Dim(), kUndefined);
    CU_SAFE_CALL(cudaMemcpy(tmp_vec.Data(), input, indexes.Dim() * sizeof(Real),
                            cudaMemcpyHostToDevice));

    int dimBlock(CU1DBLOCK);
    int dimGrid = n_blocks(indexes.Dim(), CU1DBLOCK);
    cuda_matrix_add_indexed_values(dimGrid, dimBlock, this->Dim(), alpha,
                                   indexes.Data(), tmp_vec.Data(), indexes.Dim(), this->data_);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indexes.Data();
    for (int32 i = 0; i < indexes.Dim(); i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      (*this)(index[i].first, index[i].second) += alpha * input[i];
    }
  }
}
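This overload takes the coordinates as Int32Pair (first = row, second = column) and the values as a host pointer; the GPU path copies input to the device with cudaMemcpyHostToDevice before launching. A CPU sketch of the same contract, assuming std::pair<int, int> in place of Int32Pair and a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// mat(indexes[i].first, indexes[i].second) += alpha * input[i] for each i.
// mat is num_rows x num_cols, row-major, unpadded. An empty index list is a
// no-op (and input may then be null), matching the early return in the listing.
void AddIndexedValuesRef(double alpha,
                         const std::vector<std::pair<int, int>> &indexes,
                         const double *input, std::vector<double> &mat,
                         int num_rows, int num_cols) {
  if (indexes.empty()) return;
  assert(input != nullptr);
  assert(mat.size() == static_cast<std::size_t>(num_rows) * num_cols);
  for (std::size_t i = 0; i < indexes.size(); ++i) {
    const int r = indexes[i].first, c = indexes[i].second;
    assert(r >= 0 && r < num_rows && c >= 0 && c < num_cols);
    mat[static_cast<std::size_t>(r) * num_cols + c] += alpha * input[i];
  }
}
```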
void AddMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A (or alpha * A^T when trans == kTrans; the definition in cu-matrix.cc names this parameter transA).

Definition at line 939 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by TimeHeightConvolutionComponent::Add(), RepeatedAffineComponent::Add(), NaturalGradientAffineComponent::Add(), AffineComponent::Add(), BlockAffineComponent::Add(), ConvolutionComponent::Add(), Convolutional1dComponent::Add(), LstmNonlinearityComponent::Add(), CuRand< Real >::AddGaussNoise(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApproxEqual(), SigmoidComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Splice::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), NnetComputer::ExecuteCommand(), TimeHeightConvolutionComponent::PerturbParams(), RepeatedAffineComponent::PerturbParams(), AffineComponent::PerturbParams(), BlockAffineComponent::PerturbParams(), ConvolutionComponent::PerturbParams(), Convolutional1dComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), AdditiveNoiseComponent::Propagate(), Rbm::RbmUpdate(), ClipGradientComponent::RepairGradients(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestLstmNonlinearity(), kaldi::nnet3::UnitTestNnetInputDerivatives(), LinearTransform::Update(), AffineTransform::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), LstmProjected::Update(), NaturalGradientRepeatedAffineComponent::Update(), BlstmProjected::Update(), ConvolutionComponent::Update(), Convolutional1dComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

{

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transA == kNoTrans) {
      KALDI_ASSERT(A.NumRows() == num_rows_ && A.NumCols() == num_cols_);
    } else {
      KALDI_ASSERT(A.NumCols() == num_rows_ && A.NumRows() == num_cols_);
    }
    if (num_rows_ == 0) return;
    CuTimer tim;
    // This block dimension seems to work better than the
    // one from GetBlockSizesForSimpleMatrixOperation().
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
                 n_blocks(NumRows(), CU2DBLOCK));
    cuda_add_mat(dimGrid, dimBlock, alpha, A.data_,
                 data_, Dim(), A.Stride(),
                 (transA == kTrans ? 1 : 0));
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddMat(alpha, A.Mat(), transA);
  }
}
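A small CPU reference for AddMat() with and without transposition, useful for checking the dimension rules asserted in the listing (A must be NumRows() x NumCols(), or NumCols() x NumRows() when transposed). AddMatRef and the unpadded row-major buffers are assumptions for illustration, not Kaldi's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out += alpha * op(A), where out is rows x cols and op(A) is A (rows x cols)
// or A^T (A stored as cols x rows) when trans_a is true.
void AddMatRef(double alpha, const std::vector<double> &a, bool trans_a,
               std::vector<double> &out, int rows, int cols) {
  assert(out.size() == static_cast<std::size_t>(rows) * cols);
  assert(a.size() == out.size());
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      // op(A)(r, c): A(r, c), or A(c, r) read from a matrix with `rows` columns.
      const double a_rc = trans_a ? a[static_cast<std::size_t>(c) * rows + r]
                                  : a[static_cast<std::size_t>(r) * cols + c];
      out[static_cast<std::size_t>(r) * cols + c] += alpha * a_rc;
    }
  }
}
```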
void AddMatBlock ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuBlockMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

This function is like AddMatMat but for where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks).

Definition at line 2915 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMatMat(), CuBlockMatrix< Real >::Block(), CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and kaldi::swap().

Referenced by kaldi::UnitTestCuBlockMatrixAddMatBlock().

2919  {
2920  // Check dimensions
2921  int32 A_num_rows = A.NumRows(), A_num_cols = A.NumCols(),
2922  A_row_stride = A.Stride(), A_col_stride = 1,
2923  B_num_rows = B.NumRows(), B_num_cols = B.NumCols();
2924  if (transA == kTrans) {
2925  std::swap(A_num_rows, A_num_cols);
2926  std::swap(A_row_stride, A_col_stride);
2927  }
2928  if (transB == kTrans) {
2929  std::swap(B_num_rows, B_num_cols);
2930  }
2931  // At this point the {A,B}_{rows,cols} variables are
2932  // after any transposition.
2933  KALDI_ASSERT(NumRows() == A_num_rows && NumCols() == B_num_cols);
2934  KALDI_ASSERT(A_num_cols == B_num_rows);
2935  int32 B_num_blocks = B.NumBlocks();
2936 
2937  if (num_rows_ == 0) return;
2938 #if HAVE_CUDA == 1
2939  if (CuDevice::Instantiate().Enabled()) {
2940  CuTimer tim;
2941  MatrixDim this_dim = Dim();
2942 
2943  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2944  // (x,y) indices will be (row of *this, block of B)
2945  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
2946  n_blocks(B_num_blocks, CU2DBLOCK));
2947 
2948  // caution: the use of x as the row-index is not good, but
2949  // this code is not much used, so I'm not updating it.
2950  cuda_add_mat_blockmat(dimGrid, dimBlock, data_, this_dim, A.Data(),
2951  A_num_rows, A_num_cols, A_row_stride, A_col_stride,
2952  B.CuData(), B_num_blocks, alpha, beta,
2953  (transB == kTrans ? 1 : 0));
2954 
2955  CU_SAFE_CALL(cudaGetLastError());
2956 
2957  CuDevice::Instantiate().AccuProfile(__func__, tim);
2958  } else
2959 #endif
2960  {
2961  // "row_offset" and "col_offset" are offsets into B (or into B^T, if
2962  // transB == kTrans).
2963  int32 row_offset = 0, col_offset = 0;
2964  for (int32 b = 0; b < B_num_blocks; b++) {
2965  const CuSubMatrix<Real> this_block = B.Block(b);
2966  int32 this_num_rows = this_block.NumRows(),
2967  this_num_cols = this_block.NumCols();
2968  if (transB == kTrans) std::swap(this_num_rows, this_num_cols);
2969  CuSubMatrix<Real> this_part(*this, 0, num_rows_,
2970  col_offset, this_num_cols);
2971  CuSubMatrix<Real> A_part = (transA == kNoTrans ?
2972  CuSubMatrix<Real>(A, 0, num_rows_,
2973  row_offset, this_num_rows) :
2974  CuSubMatrix<Real>(A, row_offset, this_num_rows,
2975  0, num_rows_));
2976  this_part.AddMatMat(alpha, A_part, transA, this_block, transB, beta);
2977  row_offset += this_num_rows;
2978  col_offset += this_num_cols;
2979  }
2980  // Note: the values being compared below are all after applying any
2981  // transposition to B.
2982  KALDI_ASSERT(row_offset == B_num_rows && col_offset == B_num_cols);
2983  }
2984 }
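The CPU branch above splits B into its diagonal blocks and issues one AddMatMat call per block, each writing only the column range of *this that the block covers. Below is a minimal sketch of that decomposition for the non-transposed case, under stated assumptions: `Mat` is a hypothetical unpadded row-major matrix, a `std::vector<Mat>` stands in for CuBlockMatrix, and `AddMatBlockSketch` is an illustrative name, not part of Kaldi.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// out = beta * out + alpha * A * blockdiag(blocks[0], ..., blocks[n-1]), no transposes.
// Each block touches only the columns of `out` it occupies, mirroring the
// per-block AddMatMat calls in the CPU branch of AddMatBlock.
void AddMatBlockSketch(double alpha, const Mat &A, const std::vector<Mat> &blocks,
                       double beta, Mat &out) {
  assert(A.rows == out.rows);
  int row_offset = 0, col_offset = 0;
  for (const Mat &blk : blocks) {
    for (int i = 0; i < out.rows; i++) {
      for (int j = 0; j < blk.cols; j++) {
        double sum = 0.0;
        for (int k = 0; k < blk.rows; k++)
          sum += A(i, row_offset + k) * blk(k, j);
        double &c = out(i, col_offset + j);
        c = beta * c + alpha * sum;
      }
    }
    row_offset += blk.rows;
    col_offset += blk.cols;
  }
  // Like the final KALDI_ASSERT: the blocks must exactly tile A's columns and out's columns.
  assert(row_offset == A.cols && col_offset == out.cols);
}
```

Because the blocks tile the columns of *this, beta is applied to every output element exactly once, just as in the per-block AddMatMat calls.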
void AddMatBlocks ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

This function is like AddMat (it does *this += alpha * src, where src is the argument A), except that it supports cases where *this and src have different dimensions.

There are two allowed cases:

(1) *this is larger than src; we do a broadcasting operation. *this must have NumRows() == a * src.NumRows() and NumCols() == b * src.NumCols() for integers a >= 1, b >= 1. *this will be treated as being made up of blocks with the same size as src, and to each block we'll add alpha * src. This case does not support trans == kTrans.

(2) *this is smaller than src; we sum. src.NumRows() must equal a * this->NumRows(), and src.NumCols() must equal b * this->NumCols(), for integers a >= 1, b >= 1. In this case, src will be treated as being made up of blocks with the same size as *this, and to *this we will add alpha times the sum of all of those blocks. This case also supports trans == kTrans, in which case these conditions apply to the transpose of src.

Definition at line 970 of file cu-matrix.cc.

References MatrixBase< Real >::AddMat(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), rnnlm::i, rnnlm::j, KALDI_ASSERT, KALDI_ERR, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), MatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumBlockComponent::Backprop(), SumBlockComponent::Propagate(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

971  {
972  if (num_rows_ == 0 || num_cols_ == 0) return;
973 
974  if (A.NumRows() >= num_rows_ && A.NumCols() >= num_cols_) {
975  // This is the "summing", not broadcasting, version of AddMatBlocks.
976  // It supports both regular and transposed operation.
977  int32 num_row_blocks, num_col_blocks;
978  if (transA == kNoTrans) {
979  KALDI_ASSERT(A.NumRows() % num_rows_ == 0 && A.NumCols() % num_cols_ == 0);
980  num_row_blocks = A.Mat().NumRows() / num_rows_;
981  num_col_blocks = A.Mat().NumCols() / num_cols_;
982  } else {
983  KALDI_ASSERT(A.NumRows() % num_cols_ == 0 && A.NumCols() % num_rows_ == 0);
984  num_row_blocks = A.Mat().NumRows() / num_cols_;
985  num_col_blocks = A.Mat().NumCols() / num_rows_;
986  }
987 #if HAVE_CUDA == 1
988  if (CuDevice::Instantiate().Enabled()) {
989  CuTimer tim;
990  dim3 dimGrid, dimBlock;
991  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
992  &dimGrid, &dimBlock);
993  cuda_add_mat_blocks(dimGrid, dimBlock, alpha, A.data_, num_row_blocks,
994  num_col_blocks, data_, Dim(), A.Stride(),
995  (transA == kTrans ? 1 : 0));
996  CU_SAFE_CALL(cudaGetLastError());
997 
998  CuDevice::Instantiate().AccuProfile(__func__, tim);
999  } else
1000 #endif
1001  {
1002  int32 nr, nc;
1003  if (transA == kNoTrans) {
1004  nr = num_rows_;
1005  nc = num_cols_;
1006  } else {
1007  nr = num_cols_;
1008  nc = num_rows_;
1009  }
1010  for (int32 i = 0; i < num_row_blocks; i++) {
1011  for (int32 j = 0; j < num_col_blocks; j++) {
1012  Mat().AddMat(alpha, SubMatrix<Real>(A.Mat(), i * nr, nr, j * nc, nc),
1013  transA);
1014  }
1015  }
1016  }
1017  } else {
1018  // This is the "broadcasting" version of AddMatBlocks, where
1019  // *this is larger than src.
1020  if (!(num_rows_ % A.NumRows() == 0 && num_cols_ % A.NumCols() == 0))
1021  KALDI_ERR << "Invalid sizes of arguments";
1022  if (transA != kNoTrans)
1023  KALDI_ERR << "Transposed operation not supported currently.";
1024 #if HAVE_CUDA == 1
1025  if (CuDevice::Instantiate().Enabled()) {
1026  CuTimer tim;
1027  dim3 dimGrid, dimBlock;
1028  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1029  &dimGrid, &dimBlock);
1030  cuda_add_mat_repeated(dimGrid, dimBlock, alpha,
1031  A.data_, A.Dim(), data_, Dim());
1032  CU_SAFE_CALL(cudaGetLastError());
1033  CuDevice::Instantiate().AccuProfile(__func__, tim);
1034  } else
1035 #endif
1036  {
1037  const MatrixBase<Real> &src_mat = A.Mat(),
1038  &this_mat = this->Mat();
1039  for (int32 row_offset = 0; row_offset < NumRows();
1040  row_offset += src_mat.NumRows()) {
1041  for (int32 col_offset = 0; col_offset < NumCols();
1042  col_offset += src_mat.NumCols()) {
1043  SubMatrix<Real> this_part(this_mat,
1044  row_offset, src_mat.NumRows(),
1045  col_offset, src_mat.NumCols());
1046  this_part.AddMat(alpha, src_mat);
1047  }
1048  }
1049  }
1050  }
1051 }
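As a rough illustration of the two cases of AddMatBlocks, the sketch below uses modular indexing: when summing, element (i, j) of src lands at (i % NumRows(), j % NumCols()) of *this; when broadcasting, element (i, j) of *this reads src(i % src.NumRows(), j % src.NumCols()). The `Mat` type and `AddMatBlocksSketch` name are hypothetical, and transposition is not handled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// dst += alpha * src with mismatched sizes (no transposition):
//  - src at least as large as dst: every dst-sized tile of src is summed into dst;
//  - otherwise: src is broadcast (tiled) across dst.
void AddMatBlocksSketch(double alpha, const Mat &src, Mat &dst) {
  if (dst.rows == 0 || dst.cols == 0) return;  // same early exit as the code above
  if (src.rows >= dst.rows && src.cols >= dst.cols) {
    assert(src.rows % dst.rows == 0 && src.cols % dst.cols == 0);
    for (int i = 0; i < src.rows; i++)
      for (int j = 0; j < src.cols; j++)
        dst(i % dst.rows, j % dst.cols) += alpha * src(i, j);
  } else {
    assert(dst.rows % src.rows == 0 && dst.cols % src.cols == 0);
    for (int i = 0; i < dst.rows; i++)
      for (int j = 0; j < dst.cols; j++)
        dst(i, j) += alpha * src(i % src.rows, j % src.cols);
  }
}
```

Like the code above, the sketch treats equal sizes as the summing case with a = b = 1.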
void AddMatDiagVec ( const Real  alpha,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
CuVectorBase< Real > &  v,
Real  beta = 1.0 
)

Definition at line 1264 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by BatchNormComponent::Backprop(), SigmoidComponent::RepairGradients(), and TanhComponent::RepairGradients().

1268  {
1269 #if HAVE_CUDA == 1
1270  if (CuDevice::Instantiate().Enabled()) {
1271  if (transM == kNoTrans) {
1272  KALDI_ASSERT(SameDim(*this, M));
1273  } else {
1274  KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
1275  }
1276  KALDI_ASSERT(v.Dim() == this->NumCols());
1277 
1278  CuTimer tim;
1279  dim3 dimGrid, dimBlock;
1280  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1281  &dimGrid, &dimBlock);
1282  MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
1283  if (transM == kTrans) std::swap(M_row_stride, M_col_stride);
1284  cuda_add_mat_diag_vec(dimGrid, dimBlock, alpha, data_, Dim(),
1285  M.Data(), M_row_stride, M_col_stride, v.Data(), beta);
1286  CU_SAFE_CALL(cudaGetLastError());
1287  CuDevice::Instantiate().AccuProfile(__func__, tim);
1288  } else
1289 #endif
1290  {
1291  Mat().AddMatDiagVec(alpha, M.Mat(), transM, v.Vec(), beta);
1292  }
1293 }
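AddMatDiagVec has no description here; its CPU branch forwards to MatrixBase::AddMatDiagVec. Assuming the semantics *this = beta * *this + alpha * M * diag(v), i.e. column j of M scaled by v(j), which is consistent with the assertion v.Dim() == NumCols() above, a sketch of the non-transposed case looks like this (the `Mat` type and function name are hypothetical):

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// Assumed semantics: out = beta * out + alpha * M * diag(v), no transposition.
void AddMatDiagVecSketch(double alpha, const Mat &M, const std::vector<double> &v,
                         double beta, Mat &out) {
  assert(M.rows == out.rows && M.cols == out.cols);  // SameDim(*this, M)
  assert(static_cast<int>(v.size()) == out.cols);    // v.Dim() == NumCols()
  for (int i = 0; i < out.rows; i++)
    for (int j = 0; j < out.cols; j++)
      out(i, j) = beta * out(i, j) + alpha * M(i, j) * v[j];
}
```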
void AddMatMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

*this = alpha * A(^T) * B(^T) + beta * *this, where (^T) denotes an optional transpose selected by transA or transB.

Definition at line 1141 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, kaldi::kTrans, CuMatrixBase< Real >::Mat(), rnnlm::n, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by CuMatrixBase< Real >::AddMatBlock(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< BaseFloat >::AddMatSp(), CuMatrixBase< BaseFloat >::AddMatTp(), CuMatrixBase< BaseFloat >::AddSpMat(), CuMatrixBase< BaseFloat >::AddTpMat(), RepeatedAffineComponent::Backprop(), AffineComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), LinearTransform::BackpropagateFnc(), AffineTransform::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), AffineComponent::CollapseWithNext(), AffineComponent::CollapseWithPrevious(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::CuVectorUnitTestAddDiagMatMat(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), AffineComponent::Propagate(), DctComponent::Propagate(), FixedLinearComponent::Propagate(), FixedAffineComponent::Propagate(), KlHmm::PropagateFnc(), LinearTransform::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), Rbm::RbmUpdate(), Rbm::Reconstruct(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::TestCuMatrixMatMat(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixSymAddMat2(), kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuSpMatrixInvert(), LinearTransform::Update(), AffineTransform::Update(), 
RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), AffineComponent::UpdateSimple(), and BlockAffineComponent::UpdateSimple().

1143  {
1144 
1145 
1146  // CUBLAS is col-major, cudamatrix is row-major, how to do the mapping?
1147  // keep trans..., just swap A&B matrices: A->B B->A
1148  MatrixIndexT m = ((transB==kTrans)? B.NumRows() : B.NumCols());
1149  MatrixIndexT n = ((transA==kTrans)? A.NumCols() : A.NumRows());
1150  MatrixIndexT k = ((transB==kTrans)? B.NumCols() : B.NumRows());
1151  MatrixIndexT k1 = ((transA==kTrans)? A.NumRows() : A.NumCols());
1152 
1153  KALDI_ASSERT(m == NumCols());
1154  KALDI_ASSERT(n == NumRows());
1155  KALDI_ASSERT(k == k1);
1156 
1157  if (m == 0) return;
1158 
1159 
1160 #if HAVE_CUDA == 1
1161  if (CuDevice::Instantiate().Enabled()) {
1162  CuTimer tim;
1163  CU_SAFE_CALL(cublas_gemm(GetCublasHandle(),
1164  (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1165  (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1166  m, n, k, alpha, B.data_, B.Stride(),
1167  A.data_, A.Stride(), beta, data_, Stride()));
1168 
1169  CuDevice::Instantiate().AccuProfile(__func__, tim);
1170  } else
1171 #endif
1172  {
1173  Mat().AddMatMat(alpha, A.Mat(), transA, B.Mat(), transB, beta);
1174  }
1175 }
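The comment at the top of AddMatMat refers to a layout identity: a row-major buffer read as column-major is the transpose of the matrix, and (A B)^T = B^T A^T. So passing B first and A second to a column-major GEMM, as the cublas_gemm call above does, yields the row-major product directly, with no copies. The sketch below demonstrates this with a hypothetical naive column-major `GemmColMajor` (no transpose flags; not cuBLAS's real signature) standing in for cuBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive column-major GEMM without transposes, standing in for cuBLAS:
// C[m x n] = alpha * A[m x k] * B[k x n] + beta * C; element (r, c) is at r + c * ld.
void GemmColMajor(int m, int n, int k, double alpha, const double *A, int lda,
                  const double *B, int ldb, double beta, double *C, int ldc) {
  for (int c = 0; c < n; c++)
    for (int r = 0; r < m; r++) {
      double sum = 0.0;
      for (int p = 0; p < k; p++) sum += A[r + p * lda] * B[p + c * ldb];
      C[r + c * ldc] = beta * C[r + c * ldc] + alpha * sum;
    }
}

// Row-major C[rows x cols] = alpha * A[rows x inner] * B[inner x cols] + beta * C.
// Viewed column-major, each buffer is the transpose, so computing C^T = B^T * A^T
// means passing B first, as AddMatMat does with cublas_gemm.
void AddMatMatRowMajor(int rows, int cols, int inner, double alpha,
                       const std::vector<double> &A, const std::vector<double> &B,
                       double beta, std::vector<double> &C) {
  assert(A.size() == static_cast<std::size_t>(rows) * inner);
  assert(B.size() == static_cast<std::size_t>(inner) * cols);
  assert(C.size() == static_cast<std::size_t>(rows) * cols);
  // m = cols (columns of B), n = rows (rows of A), k = inner; leading dims are row lengths.
  GemmColMajor(cols, rows, inner, alpha, B.data(), cols, A.data(), inner,
               beta, C.data(), cols);
}
```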
void AddMatMatElements ( const Real  alpha,
const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
const Real  beta 
)

*this = beta * *this + alpha * A .* B, where .* denotes element-by-element multiplication.

Definition at line 1296 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

1297  {
1298 #if HAVE_CUDA == 1
1299  if (CuDevice::Instantiate().Enabled()) {
1300  KALDI_ASSERT(SameDim(*this, A) && SameDim(A, B));
1301  CuTimer tim;
1302  dim3 dimGrid, dimBlock;
1303  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1304  &dimGrid, &dimBlock);
1305  cuda_add_mat_mat_elements(dimGrid, dimBlock, this->data_, A.Data(),
1306  B.Data(), Dim(), A.Stride(), B.Stride(), alpha, beta);
1307  CuDevice::Instantiate().AccuProfile(__func__, tim);
1308  } else
1309 #endif
1310  {
1311  Mat().AddMatMatElements(alpha, A.Mat(), B.Mat(), beta);
1312  }
1313 }
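A minimal sketch of the element-wise update on flat, unpadded buffers (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out = beta * out + alpha * (A .* B) over same-sized, unpadded buffers.
void AddMatMatElementsSketch(double alpha, const std::vector<double> &A,
                             const std::vector<double> &B, double beta,
                             std::vector<double> &out) {
  assert(A.size() == out.size() && B.size() == out.size());  // SameDim checks
  for (std::size_t i = 0; i < out.size(); i++)
    out[i] = beta * out[i] + alpha * A[i] * B[i];
}
```

The flat loop is valid only because these buffers have no row padding. A CuMatrix row may be padded (Stride() >= NumCols()), which is why the kernel above receives A.Stride() and B.Stride() separately.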
void AddMatSp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuSpMatrix< Real > &  B,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A(^T) * B, where B is symmetric.

Definition at line 489 of file cu-matrix.h.

492  {
493  CuMatrix<Real> M(B);
494  return AddMatMat(alpha, A, transA, M, kNoTrans, beta);
495  }
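AddMatSp does no specialised symmetric arithmetic: it expands B into a full CuMatrix and calls AddMatMat. Below is a sketch of that expansion. It assumes the packed storage holds the lower triangle row by row, so element (i, j) with i >= j sits at index i * (i + 1) / 2 + j. `UnpackSymmetric` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands an n x n symmetric matrix stored as a packed lower triangle (assumed
// layout: (i, j), i >= j, at index i * (i + 1) / 2 + j) into a dense row-major buffer.
std::vector<double> UnpackSymmetric(int n, const std::vector<double> &packed) {
  assert(packed.size() == static_cast<std::size_t>(n) * (n + 1) / 2);
  std::vector<double> full(static_cast<std::size_t>(n) * n);
  for (int i = 0; i < n; i++)
    for (int j = 0; j <= i; j++) {
      double v = packed[i * (i + 1) / 2 + j];
      full[i * n + j] = v;  // lower triangle
      full[j * n + i] = v;  // mirrored upper triangle
    }
  return full;
}
```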
void AddMatTp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuTpMatrix< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A(^T) * B(^T), where B is triangular.

Definition at line 516 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddMatTp().

519  {
520  CuMatrix<Real> M(B);
521  return AddMatMat(alpha, A, transA, M, transB, beta);
522  }
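AddMatTp takes the same route for a triangular B, but unlike AddMatSp it also forwards transB. The transpose of a lower-triangular matrix is upper-triangular and no longer equal to the original. The sketch below shows the difference under the same packed-layout assumption. The hypothetical `UnpackTriangular` applies the transpose while unpacking, whereas AddMatTp unpacks B as-is and lets AddMatMat handle transB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands an n x n lower-triangular matrix stored as a packed lower triangle
// (assumed layout as above) into a dense row-major buffer; if `transpose` is
// true, the upper-triangular transpose is produced instead.
std::vector<double> UnpackTriangular(int n, const std::vector<double> &packed,
                                     bool transpose) {
  assert(packed.size() == static_cast<std::size_t>(n) * (n + 1) / 2);
  std::vector<double> full(static_cast<std::size_t>(n) * n, 0.0);  // zeros elsewhere
  for (int i = 0; i < n; i++)
    for (int j = 0; j <= i; j++) {
      double v = packed[i * (i + 1) / 2 + j];
      if (transpose)
        full[j * n + i] = v;
      else
        full[i * n + j] = v;
    }
  return full;
}
```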
void AddRowRanges ( const CuMatrixBase< Real > &  src,
const CuArray< Int32Pair > &  indexes 
)

For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

In general, each indexes[r].first must be >= 0 and each indexes[r].second must be <= src.NumRows(), so that every row j in the range is a valid row of src; to represent an empty range you may use the pair (-1, -1) or any pair of numbers (i, j) such that i >= j.

Definition at line 2641 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuArray< T >::Dim(), CuMatrixBase< Real >::Dim(), Int32Pair::first, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), Int32Pair::second, and CuMatrixBase< Real >::stride_.

Referenced by StatisticsPoolingComponent::Backprop(), NnetComputer::ExecuteCommand(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixAddRowRanges().

2642  {
2643  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2644  KALDI_ASSERT(src.NumCols() == NumCols());
2645  if (NumRows() == 0) return;
2646 #if HAVE_CUDA == 1
2647  if (CuDevice::Instantiate().Enabled()) {
2648  CuTimer tim;
2649  dim3 dimGrid, dimBlock;
2650  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2651  &dimGrid, &dimBlock);
2652  cuda_add_row_ranges(dimGrid, dimBlock,
2653  data_, Dim(), src.Data(), src.Dim(), indexes.Data());
2654  CU_SAFE_CALL(cudaGetLastError());
2655  CuDevice::Instantiate().AccuProfile(__func__, tim);
2656  } else
2657 #endif
2658  { // Implement here for the CPU.
2659  int32 num_rows = this->num_rows_, num_cols = this->num_cols_,
2660  this_stride = this->stride_, src_stride = src.stride_;
2661  Real *data = this->data_;
2662  const Real *src_data = src.data_;
2663  const Int32Pair *indexes_data = indexes.Data();
2664  for (int32 row = 0; row < num_rows; row++) {
2665  int32 start_row = indexes_data[row].first,
2666  end_row = indexes_data[row].second;
2667  for (int32 col = 0; col < num_cols; col++) {
2668  Real sum = 0.0;
2669  for (int32 src_row = start_row; src_row < end_row; src_row++)
2670  sum += src_data[src_row * src_stride + col];
2671  data[row * this_stride + col] += sum;
2672  }
2673  }
2674  }
2675 }
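The CPU loop above can be restated compactly. This sketch uses a hypothetical `Mat` type and `std::pair<int, int>` in place of Int32Pair; ranges with first >= second contribute nothing.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// dst(r, c) += sum of src(j, c) for j in [ranges[r].first, ranges[r].second).
void AddRowRangesSketch(const Mat &src, const std::vector<std::pair<int, int>> &ranges,
                        Mat &dst) {
  assert(static_cast<int>(ranges.size()) == dst.rows && src.cols == dst.cols);
  for (int r = 0; r < dst.rows; r++)
    for (int c = 0; c < dst.cols; c++) {
      double sum = 0.0;
      for (int j = ranges[r].first; j < ranges[r].second; j++) sum += src(j, c);
      dst(r, c) += sum;
    }
}
```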
void AddRows ( Real  alpha,
const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Does for each row r, this.Row(r) += alpha * src.Row(indexes[r]).

If indexes[r] < 0, does not add anything. src.NumCols() must equal this.NumCols().

Definition at line 2535 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), and NnetComputer::ExecuteCommand().

2537  {
2538  if (NumRows() == 0) return;
2539 #if HAVE_CUDA == 1
2540  if (CuDevice::Instantiate().Enabled()) {
2541  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2542  KALDI_ASSERT(src.NumCols() == NumCols());
2543  CuTimer tim;
2544  dim3 dimGrid, dimBlock;
2545  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2546  &dimGrid, &dimBlock);
2547  cuda_add_rows(dimGrid, dimBlock, alpha,
2548  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
2549  CU_SAFE_CALL(cudaGetLastError());
2550  CuDevice::Instantiate().AccuProfile(__func__, tim);
2551  } else
2552 #endif
2553  {
2554  Mat().AddRows(alpha, src.Mat(), indexes.Data());
2555  }
2556 }
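A host-side sketch of this AddRows overload (hypothetical `Mat` type and function name). One source row may feed several destination rows, and a negative index leaves its row unchanged.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// dst.Row(r) += alpha * src.Row(indexes[r]); rows with a negative index are skipped.
void AddRowsSketch(double alpha, const Mat &src, const std::vector<int> &indexes,
                   Mat &dst) {
  assert(static_cast<int>(indexes.size()) == dst.rows && src.cols == dst.cols);
  for (int r = 0; r < dst.rows; r++) {
    if (indexes[r] < 0) continue;
    for (int c = 0; c < dst.cols; c++) dst(r, c) += alpha * src(indexes[r], c);
  }
}
```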
void AddRows ( Real  alpha,
const CuArray< const Real * > &  src 
)

Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols().

Definition at line 2560 of file cu-matrix.cc.

References CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

2560  {
2561  if (NumRows() == 0) return;
2562 #if HAVE_CUDA == 1
2563  if (CuDevice::Instantiate().Enabled()) {
2564  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2565  CuTimer tim;
2566  dim3 dimGrid, dimBlock;
2567  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2568  &dimGrid, &dimBlock);
2569  cuda_add_rows(dimGrid, dimBlock, alpha, data_, src.Data(), Dim());
2570  CU_SAFE_CALL(cudaGetLastError());
2571  CuDevice::Instantiate().AccuProfile(__func__, tim);
2572  } else
2573 #endif
2574  {
2575  Mat().AddRows(alpha, src.Data());
2576  }
2577 }
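A host-side sketch of the pointer-array overload (hypothetical `Mat` type and function name). It works on CPU memory only. Presumably, as with CopyRows and AddToRows, each src[r] must point to GPU memory when a GPU is in use.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// dst.Row(r) += alpha * (the dst.cols values starting at src[r]).
void AddRowsFromPointers(double alpha, const std::vector<const double *> &src, Mat &dst) {
  assert(static_cast<int>(src.size()) == dst.rows);
  for (int r = 0; r < dst.rows; r++)
    for (int c = 0; c < dst.cols; c++) dst(r, c) += alpha * src[r][c];
}
```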
void AddSpMat ( const Real  alpha,
const CuSpMatrix< Real > &  A,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A * B(^T), where A is symmetric.

Definition at line 498 of file cu-matrix.h.

501  {
502  CuMatrix<Real> M(A);
503  return AddMatMat(alpha, M, kNoTrans, B, transB, beta);
504  }
void AddToDiag ( Real  value)

Adds "value" to the diagonal elements of the matrix.

The matrix *this does not have to be square.

Definition at line 568 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, and data_.

Referenced by kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCholesky(), and kaldi::UnitTestCuMatrixAddToDiag().

568  {
569 #if HAVE_CUDA == 1
570  if (CuDevice::Instantiate().Enabled()) {
571  if (num_rows_ == 0) return;
572  CuTimer tim;
573  // We'll create a fake matrix with "num_diag" rows, one
574  // column, and a stride of "this_stride". The y-value of
575  // the grid/blocks corresponds to the row, in this kernel.
576  MatrixIndexT num_diag = std::min(num_rows_, num_cols_),
577  this_stride = stride_ + 1;
578  dim3 dimBlock(1, CU1DBLOCK);
579  dim3 dimGrid(1, n_blocks(num_diag, CU1DBLOCK));
580  ::MatrixDim d = { num_diag, 1, this_stride };
581  cuda_add(dimGrid, dimBlock, data_, value, d);
582  CU_SAFE_CALL(cudaGetLastError());
583 
584  CuDevice::Instantiate().AccuProfile(__func__, tim);
585  } else
586  #endif
587  {
588  Mat().AddToDiag(value);
589  }
590 }
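The GPU branch of AddToDiag relies on a stride trick: in row-major storage element (i, i) sits at offset i * stride_ + i = i * (stride_ + 1), so the diagonal is just a one-column matrix with num_diag rows and stride stride_ + 1. Here is a plain C++ sketch of the same walk over a padded buffer (the function name is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Adds `value` to elements (i, i), i < min(rows, cols), of a row-major buffer
// whose rows are `stride` apart, treating the diagonal as a column of stride + 1.
void AddToDiagSketch(double value, int rows, int cols, int stride,
                     std::vector<double> &data) {
  assert(stride >= cols && data.size() >= static_cast<std::size_t>(rows) * stride);
  int num_diag = std::min(rows, cols);
  int diag_stride = stride + 1;
  for (int i = 0; i < num_diag; i++) data[i * diag_stride] += value;
}
```

The matrix does not need to be square; only min(rows, cols) diagonal elements are touched.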
void AddToRows ( Real  alpha,
const CuArray< Real * > &  dst 
) const

For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, does not do anything for that row. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2581 of file cu-matrix.cc.

References CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

Referenced by NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixAddToRows().

2582  {
2583  if (NumRows() == 0) return;
2584 #if HAVE_CUDA == 1
2585  if (CuDevice::Instantiate().Enabled()) {
2586  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2587  CuTimer tim;
2588  dim3 dimGrid, dimBlock;
2589  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2590  &dimGrid, &dimBlock);
2591  cuda_add_to_rows(dimGrid, dimBlock, alpha, dst.Data(), data_, Dim());
2592  CU_SAFE_CALL(cudaGetLastError());
2593  CuDevice::Instantiate().AccuProfile(__func__, tim);
2594  } else
2595 #endif
2596  {
2597  Mat().AddToRows(alpha, dst.Data());
2598  }
2599 }
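A host-side sketch of AddToRows that includes the documented skip for null pointers (hypothetical `Mat` type and function name). The no-overlap requirement does not matter in this sequential loop. Presumably it exists because the GPU kernel processes rows in parallel.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// For each row r of src, adds alpha * src.Row(r) to the buffer at dst[r];
// null entries are skipped.
void AddToRowsSketch(double alpha, const Mat &src, const std::vector<double *> &dst) {
  assert(static_cast<int>(dst.size()) == src.rows);
  for (int r = 0; r < src.rows; r++) {
    if (dst[r] == nullptr) continue;
    for (int c = 0; c < src.cols; c++) dst[r][c] += alpha * src(r, c);
  }
}
```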
void AddTpMat ( const Real  alpha,
const CuTpMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A(^T) * B(^T), where A is triangular.

Definition at line 507 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddTpMat().

510  {
511  CuMatrix<Real> M(A);
512  return AddMatMat(alpha, M, transA, B, transB, beta);
513  }
void AddVecToCols ( Real  alpha,
const CuVectorBase< Real > &  col,
Real  beta = 1.0 
)

(for each column c of *this), c = alpha * col + beta * c

Definition at line 1082 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by KlHmm::PropagateFnc(), and kaldi::UnitTestCuMatrixAddVecToCols().

1084  {
1085  if (col.Dim() != NumRows()) {
1086  KALDI_ERR << "Non matching dimensions: Rows:" << NumRows() << " VectorDim:" << col.Dim();
1087  }
1088 
1089  #if HAVE_CUDA == 1
1090  if (CuDevice::Instantiate().Enabled()) {
1091  CuTimer tim;
1092  dim3 dimGrid, dimBlock;
1093  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1094  &dimGrid, &dimBlock);
1095  cuda_add_vec_to_cols(dimGrid, dimBlock, alpha, col.data_, beta,
1096  data_, Dim());
1097  CU_SAFE_CALL(cudaGetLastError());
1098 
1099  CuDevice::Instantiate().AccuProfile(__func__, tim);
1100  } else
1101  #endif
1102  {
1103  if (beta != 1.0) Mat().Scale(beta);
1104  Mat().AddVecToCols(alpha, col.Vec());
1105  }
1106 }
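Here is a sketch of the AddVecToCols update on a hypothetical unpadded `Mat`. Every element of row i receives the same scalar col[i]:

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// For each column c of m: c = alpha * col + beta * c.
void AddVecToColsSketch(double alpha, const std::vector<double> &col, double beta,
                        Mat &m) {
  assert(static_cast<int>(col.size()) == m.rows);
  for (int i = 0; i < m.rows; i++)
    for (int j = 0; j < m.cols; j++)
      m(i, j) = alpha * col[i] + beta * m(i, j);
}
```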
void AddVecToRows ( Real  alpha,
const CuVectorBase< Real > &  row,
Real  beta = 1.0 
)

(for each row r of *this), r = alpha * row + beta * r

Definition at line 1111 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), BatchNormComponent::Backprop(), SimpleSentenceAveragingComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), DecodableNnetSimple::DoNnetComputation(), PerElementOffsetComponent::Propagate(), FixedAffineComponent::Propagate(), FixedBiasComponent::Propagate(), ConvolutionComponent::Propagate(), Convolutional1dComponent::Propagate(), BatchNormComponent::Propagate(), SimpleSentenceAveragingComponent::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), AddShift::PropagateFnc(), Rbm::Reconstruct(), SigmoidComponent::RepairGradients(), RectifiedLinearComponent::RepairGradients(), PdfPrior::SubtractOnLogpost(), kaldi::UnitTestCuMatrixAddVecToRows(), and SentenceAveragingComponent::Update().

1113  {
1114  if (row.Dim() != NumCols()) {
1115  KALDI_ERR << "Non matching dimensions: Cols:" << NumCols() << " VectorDim:" << row.Dim();
1116  }
1117 #if HAVE_CUDA == 1
1118  if (CuDevice::Instantiate().Enabled()) {
1119  CuTimer tim;
1120  dim3 dimGrid, dimBlock;
1121  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1122  &dimGrid, &dimBlock);
1123  cuda_add_vec_to_rows(dimGrid, dimBlock, alpha, row.data_, beta, data_, Dim());
1124  CU_SAFE_CALL(cudaGetLastError());
1125 
1126  CuDevice::Instantiate().AccuProfile(__func__, tim);
1127  } else
1128 #endif
1129  {
1130  if (beta != 1.0) Mat().Scale(beta);
1131  Mat().AddVecToRows(alpha, row.Vec());
1132  }
1133 }
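The row-wise counterpart is sketched below (hypothetical names). A typical use, for example in affine layers, is adding a bias vector to every row of a minibatch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix used only for illustration; not Kaldi's CuMatrix.
struct Mat {
  int rows, cols;
  std::vector<double> data;
  Mat(int r, int c, double v = 0.0) : rows(r), cols(c), data(r * c, v) {}
  double &operator()(int i, int j) { return data[i * cols + j]; }
  double operator()(int i, int j) const { return data[i * cols + j]; }
};

// For each row r of m: r = alpha * row + beta * r.
void AddVecToRowsSketch(double alpha, const std::vector<double> &row, double beta,
                        Mat &m) {
  assert(static_cast<int>(row.size()) == m.cols);
  for (int i = 0; i < m.rows; i++)
    for (int j = 0; j < m.cols; j++)
      m(i, j) = alpha * row[j] + beta * m(i, j);
}
```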
void AddVecVec ( Real  alpha,
const CuVectorBase< Real > &  x,
const CuVectorBase< Real > &  y 
)

*this = alpha * x * y^T + *this.

Definition at line 1179 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, rnnlm::n, and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuMatrixAddVecVec().

1180  {
1181 
1182  MatrixIndexT m = y.Dim();
1183  MatrixIndexT n = x.Dim();
1184  KALDI_ASSERT(m == NumCols());
1185  KALDI_ASSERT(n == NumRows());
1186 
1187 #if HAVE_CUDA == 1
1188  if (CuDevice::Instantiate().Enabled()) {
1189  CuTimer tim;
1190  CU_SAFE_CALL(cublas_ger(GetCublasHandle(), m, n, alpha,
1191  y.Data(), 1, x.Data(), 1, data_, Stride()));
1192 
1193  CuDevice::Instantiate().AccuProfile(__func__, tim);
1194  } else
1195 #endif
1196  {
1197  Mat().AddVecVec(alpha, x.Vec(), y.Vec());
1198  }
1199 }
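AddVecVec() is a rank-one update. The CUDA branch passes y before x to cublas_ger. The reason is that cuBLAS assumes column-major storage, so to it the row-major *this looks like the transpose A^T. The update A^T += alpha * y * x^T is therefore the same as A += alpha * x * y^T. A plain CPU sketch of the update itself follows (hypothetical names, not Kaldi API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Rank-one update A += alpha * x * y^T, with x.size() == rows, y.size() == cols.
void AddVecVecRef(double alpha, const std::vector<double> &x,
                  const std::vector<double> &y, Matrix *a) {
  assert(a->size() == x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert((*a)[i].size() == y.size());
    for (std::size_t j = 0; j < y.size(); ++j)
      (*a)[i][j] += alpha * x[i] * y[j];
  }
}
```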
void ApplyCeiling ( Real  ceiling_val)

Definition at line 2374 of file cu-matrix.cc.

References data_.

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), kaldi::UnitTestCuMatrixApplyCeiling(), ParametricRelu::Update(), LstmProjected::Update(), and BlstmProjected::Update().

2374  {
2375 #if HAVE_CUDA == 1
2376  if (CuDevice::Instantiate().Enabled()) {
2377  CuTimer tim;
2378  dim3 dimGrid, dimBlock;
2379  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2380  &dimGrid, &dimBlock);
2381  cuda_apply_ceiling(dimGrid, dimBlock, data_, ceiling_val, Dim());
2382  CU_SAFE_CALL(cudaGetLastError());
2383  CuDevice::Instantiate().AccuProfile(__func__, tim);
2384  } else
2385 #endif
2386  {
2387  Mat().ApplyCeiling(ceiling_val);
2388  }
2389 }
void ApplyExp ( )

Definition at line 2337 of file cu-matrix.cc.

References data_.

Referenced by DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuMatrixApplyExp().

2337  {
2338 #if HAVE_CUDA == 1
2339  if (CuDevice::Instantiate().Enabled()) {
2340  CuTimer tim;
2341  dim3 dimGrid, dimBlock;
2342  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2343  &dimGrid, &dimBlock);
2344  cuda_apply_exp(dimGrid, dimBlock, data_, Dim());
2345  CU_SAFE_CALL(cudaGetLastError());
2346  CuDevice::Instantiate().AccuProfile(__func__, tim);
2347  } else
2348 #endif
2349  {
2350  Mat().ApplyExp();
2351  }
2352 }
void ApplyFloor ( Real  floor_val)

Definition at line 2356 of file cu-matrix.cc.

References data_.

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), main(), StatisticsPoolingComponent::Propagate(), RectifiedLinearComponent::Propagate(), SoftmaxComponent::Propagate(), LogSoftmaxComponent::Propagate(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyFloor(), kaldi::UnitTestCuMatrixObjfDeriv(), ParametricRelu::Update(), LstmProjected::Update(), BlstmProjected::Update(), and NnetLogprobTask::~NnetLogprobTask().

2356  {
2357 #if HAVE_CUDA == 1
2358  if (CuDevice::Instantiate().Enabled()) {
2359  CuTimer tim;
2360  dim3 dimGrid, dimBlock;
2361  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2362  &dimGrid, &dimBlock);
2363  cuda_apply_floor(dimGrid, dimBlock, data_, floor_val, Dim());
2364  CU_SAFE_CALL(cudaGetLastError());
2365  CuDevice::Instantiate().AccuProfile(__func__, tim);
2366  } else
2367 #endif
2368  {
2369  Mat().ApplyFloor(floor_val);
2370  }
2371 }
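Neither ApplyFloor() nor ApplyCeiling() has a description here. The sketch below assumes the usual element-wise meaning, which is what the CPU fallbacks in MatrixBase are understood to do:
- floor: x = max(x, floor_val)
- ceiling: x = min(x, ceiling_val)

The names are hypothetical. With floor_val = 0 the floor is a ReLU, which fits RectifiedLinearComponent::Propagate() appearing among the callers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Assumed element-wise floor: x = max(x, floor_val).
void ApplyFloorRef(double floor_val, Matrix *m) {
  for (auto &row : *m)
    for (double &x : row) x = std::max(x, floor_val);
}

// Assumed element-wise ceiling: x = min(x, ceiling_val).
void ApplyCeilingRef(double ceiling_val, Matrix *m) {
  for (auto &row : *m)
    for (double &x : row) x = std::min(x, ceiling_val);
}
```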
void ApplyHeaviside ( )

For each element, sets x = (x > 0 ? 1.0 : 0.0).

See also Heaviside().

Definition at line 2298 of file cu-matrix.cc.

References data_.

Referenced by BackpropTruncationComponent::Backprop(), RectifiedLinearComponent::Backprop(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), SigmoidComponent::RepairGradients(), TanhComponent::RepairGradients(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixHeaviside(), and kaldi::UnitTestCuMatrixApplyHeaviside().

2298  {
2299 #if HAVE_CUDA == 1
2300  if (CuDevice::Instantiate().Enabled()) {
2301  CuTimer tim;
2302  dim3 dimGrid, dimBlock;
2303  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2304  &dimGrid, &dimBlock);
2305  cuda_apply_heaviside(dimGrid, dimBlock, data_, Dim());
2306  CU_SAFE_CALL(cudaGetLastError());
2307  CuDevice::Instantiate().AccuProfile(__func__, tim);
2308  } else
2309 #endif
2310  {
2311  Mat().ApplyHeaviside();
2312  }
2313 }
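Here is a sketch of ApplyHeaviside() on a plain std::vector matrix (hypothetical names). Inputs that are exactly zero map to 0, so the result is the ReLU derivative mask. That fits its use in RectifiedLinearComponent::Backprop(), listed above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// x = (x > 0 ? 1.0 : 0.0); an input of exactly 0 maps to 0.
void ApplyHeavisideRef(Matrix *m) {
  for (auto &row : *m)
    for (double &x : row) x = (x > 0.0 ? 1.0 : 0.0);
}
```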
void ApplyLog ( )

Definition at line 630 of file cu-matrix.cc.

References data_.

Referenced by DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), Xent::Eval(), main(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyLog(), kaldi::UnitTestCuMatrixObjfDeriv(), and NnetLogprobTask::~NnetLogprobTask().

630  {
631  #if HAVE_CUDA == 1
632  if (CuDevice::Instantiate().Enabled()) {
633  if (num_rows_ == 0) return;
634  CuTimer tim;
635 
636  dim3 dimGrid, dimBlock;
637  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
638  &dimGrid, &dimBlock);
639 
640  cuda_apply_log(dimGrid, dimBlock, data_, Dim());
641  CU_SAFE_CALL(cudaGetLastError());
642 
643  CuDevice::Instantiate().AccuProfile(__func__, tim);
644  } else
645  #endif
646  {
647  Mat().ApplyLog();
648  }
649 }
void ApplyLogSoftMaxPerRow ( const CuMatrixBase< Real > &  src)

LogSoftmax nonlinearity Y = LogSoftmax(X): Yij = Xij - log(sum_k(e^Xik)), applied to each row independently. For each row, the maximum value is first subtracted for numerical stability.

Definition at line 1586 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Propagate(), kaldi::TestCuMatrixLogSoftmax(), and kaldi::UnitTestCuLogSoftmax().

1586  {
1587  KALDI_ASSERT(SameDim(*this, src));
1588 #if HAVE_CUDA == 1
1589  if (CuDevice::Instantiate().Enabled()) {
1590  CuTimer tim;
1591  size_t dimBlock = CU1DBLOCK;
1592  size_t dimGrid = src.num_rows_;
1593  cuda_log_softmax_reduce(dimGrid, dimBlock,
1594  data_, src.data_, Dim(), src.Stride());
1595  CU_SAFE_CALL(cudaGetLastError());
1596 
1597  CuDevice::Instantiate().AccuProfile(__func__, tim);
1598  } else
1599 #endif
1600  {
1601  MatrixBase<Real> &mat(this->Mat());
1602  mat.CopyFromMat(src.Mat());
1603  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1604  mat.Row(r).ApplyLogSoftMax();
1605  }
1606  }
1607 }
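The CPU branch copies src and applies a log-softmax to each row. The sketch below (hypothetical names, not Kaldi API) shows why subtracting the row maximum matters. It computes log(sum_k e^Xik) as max + log(sum_k e^(Xik - max)). That value stays finite even for inputs around 1000, where e^X alone would overflow a double.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Row-wise log-softmax: y_ij = x_ij - log(sum_k exp(x_ik)), evaluated with the
// row maximum subtracted first so exp() never overflows.
Matrix LogSoftmaxPerRowRef(const Matrix &src) {
  Matrix out = src;
  for (auto &row : out) {
    if (row.empty()) continue;
    const double max = *std::max_element(row.begin(), row.end());
    double sum = 0.0;
    for (double x : row) sum += std::exp(x - max);
    const double log_sum = max + std::log(sum);  // = log(sum_k exp(x_k))
    for (double &x : row) x -= log_sum;
  }
  return out;
}
```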
void ApplyPow ( Real  power)

Raises each element to the given power, i.e. x = pow(x, power).

The rest of this comment in the header describes ApplyPowAbs() rather than ApplyPow():
- The power is applied to the absolute value of each element.
- If include_sign is true, the result is multiplied by the sign of the input value.
- If the power is negative and the input is zero, the output is set to zero.

Definition at line 2262 of file cu-matrix.cc.

References data_.

Referenced by TanhComponent::Backprop(), kaldi::MeanVariance(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), TanhComponent::StoreStats(), kaldi::UnitTestCuMatrixApplyPow(), kaldi::UnitTestCuMatrixSetRandn(), and kaldi::UnitTestCuMatrixSetRandUniform().

2262  {
2263 #if HAVE_CUDA == 1
2264  if (CuDevice::Instantiate().Enabled()) {
2265  CuTimer tim;
2266  dim3 dimGrid, dimBlock;
2267  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2268  &dimGrid, &dimBlock);
2269  cuda_apply_pow(dimGrid, dimBlock, data_, power, Dim());
2270  CU_SAFE_CALL(cudaGetLastError());
2271  CuDevice::Instantiate().AccuProfile(__func__, tim);
2272  } else
2273 #endif
2274  {
2275  Mat().ApplyPow(power);
2276  }
2277 }
void ApplyPowAbs ( Real  power,
bool  include_sign = false 
)

Definition at line 2280 of file cu-matrix.cc.

References data_.

Referenced by PowerComponent::Backprop(), PowerComponent::Propagate(), ClipGradientComponent::RepairGradients(), and kaldi::UnitTestCuMatrixApplyPowAbs().

2280  {
2281 #if HAVE_CUDA == 1
2282  if (CuDevice::Instantiate().Enabled()) {
2283  CuTimer tim;
2284  dim3 dimGrid, dimBlock;
2285  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2286  &dimGrid, &dimBlock);
2287  cuda_apply_pow_abs(dimGrid, dimBlock, data_, power, include_sign, Dim());
2288  CU_SAFE_CALL(cudaGetLastError());
2289  CuDevice::Instantiate().AccuProfile(__func__, tim);
2290  } else
2291 #endif
2292  {
2293  Mat().ApplyPowAbs(power, include_sign);
2294  }
2295 }
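Here is a minimal sketch of the behaviour documented for ApplyPowAbs():
- the power is applied to |x|;
- the sign is optionally restored;
- a zero input raised to a negative power gives zero output.

The names are hypothetical. The special cases in Kaldi's actual CPU and CUDA code (e.g. for powers 1, 2 and 0.5) are not reproduced.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Documented ApplyPowAbs() behaviour (hypothetical reimplementation):
//   x = |x|^power, negated when include_sign is true and x < 0;
//   a zero input with a negative power gives 0 rather than infinity.
void ApplyPowAbsRef(double power, bool include_sign, Matrix *m) {
  for (auto &row : *m) {
    for (double &x : row) {
      if (power < 0.0 && x == 0.0) {
        x = 0.0;
        continue;
      }
      const double y = std::pow(std::fabs(x), power);
      x = (include_sign && x < 0.0) ? -y : y;
    }
  }
}
```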
void ApplySoftMaxPerRow ( const CuMatrixBase< Real > &  src)

Softmax nonlinearity Y = Softmax(X): Yij = e^Xij / sum_k(e^Xik), applied to each row independently. For each row, the maximum value is first subtracted for numerical stability.

Definition at line 1563 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SoftmaxComponent::Propagate(), Softmax::PropagateFnc(), HiddenSoftmax::PropagateFnc(), BlockSoftmax::PropagateFnc(), kaldi::TestCuMatrixSoftmax(), and kaldi::UnitTestCuSoftmax().

1563  {
1564  KALDI_ASSERT(SameDim(*this, src));
1565 #if HAVE_CUDA == 1
1566  if (CuDevice::Instantiate().Enabled()) {
1567  CuTimer tim;
1568  size_t dimBlock = CU1DBLOCK;
1569  size_t dimGrid = src.num_rows_;
1570  cuda_softmax_reduce(dimGrid, dimBlock, data_, src.data_, Dim(), src.Stride());
1571  CU_SAFE_CALL(cudaGetLastError());
1572 
1573  CuDevice::Instantiate().AccuProfile(__func__, tim);
1574  } else
1575  #endif
1576  {
1577  MatrixBase<Real> &mat(this->Mat());
1578  mat.CopyFromMat(src.Mat());
1579  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1580  mat.Row(r).ApplySoftMax();
1581  }
1582  }
1583 }
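Here is a standalone sketch of the per-row softmax with the max-subtraction described above (hypothetical names). Subtracting the row maximum does not change the result, because the factor e^-max cancels between numerator and denominator. It does keep every exponent at or below zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Row-wise softmax y_ij = exp(x_ij - max_i) / sum_k exp(x_ik - max_i).
Matrix SoftmaxPerRowRef(const Matrix &src) {
  Matrix out = src;
  for (auto &row : out) {
    if (row.empty()) continue;
    const double max = *std::max_element(row.begin(), row.end());
    double sum = 0.0;
    for (double &x : row) {
      x = std::exp(x - max);  // every exponent is <= 0, so no overflow
      sum += x;
    }
    for (double &x : row) x /= sum;
  }
  return out;
}
```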
bool ApproxEqual ( const CuMatrixBase< Real > &  other,
float  tol = 0.01 
) const

True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()

Definition at line 1973 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMat(), and CuMatrixBase< Real >::FrobeniusNorm().

Referenced by kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuCopy().

1974  {
1975  CuMatrix<Real> diff(*this);
1976  diff.AddMat(-1.0, other);
1977  return (diff.FrobeniusNorm() <= tol * (*this).FrobeniusNorm());
1978 }
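ApproxEqual() is a relative test scaled only by the norm of *this. It is therefore not symmetric in its two arguments. When *this is the zero matrix, only an exactly equal other passes. Here is a sketch with hypothetical names:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

double FrobeniusNorm(const Matrix &m) {
  double sum = 0.0;
  for (const auto &row : m)
    for (double x : row) sum += x * x;
  return std::sqrt(sum);
}

// ||A - B||_F <= tol * ||A||_F; the tolerance scales with A only.
bool ApproxEqualRef(const Matrix &a, const Matrix &b, double tol = 0.01) {
  Matrix diff = a;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[i].size(); ++j) diff[i][j] -= b[i][j];
  return FrobeniusNorm(diff) <= tol * FrobeniusNorm(a);
}
```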
void Cholesky ( CuMatrixBase< Real > *  inv_cholesky = NULL)

Sets *this to the Cholesky factor of *this, i.e. the lower-triangular C satisfying *this = C C^T. If inv_cholesky is supplied, it also sets *inv_cholesky to the inverse of C.

*this is treated as a symmetric matrix, but only its lower triangle is accessed.

Definition at line 1824 of file cu-matrix.cc.

References TpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), CuSpMatrix< Real >::CopyFromMat(), CuTpMatrix< Real >::CopyFromTp(), CuMatrixBase< Real >::CopyFromTp(), TpMatrix< Real >::Invert(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTakeLower, kaldi::kTrans, kaldi::kUndefined, and CuMatrixBase< Real >::SymAddMat2().

Referenced by CuTpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), kaldi::TestCuMatrixCholesky(), kaldi::UnitTestCholesky(), and kaldi::UnitTestCuCholesky().

1824  {
1825  KALDI_ASSERT(this->NumRows() == this->NumCols());
1826  const int32 block_size = 64; // We can tune this.
1827 #if HAVE_CUDA == 1
1828  bool have_gpu = CuDevice::Instantiate().Enabled();
1829 #else
1830  bool have_gpu = false;
1831 #endif
1832  if (this->NumRows() == 0) {
1833  return;
1834  }
1835  if (inv_cholesky == NULL && this->NumRows() >= block_size * 2 && have_gpu) {
1836  // Even if the user did not request the inverse Cholesky, for large enough
1837  // matrices (on GPUs) it's going to be more efficient to compute it anyway
1838  // as the recursion depends on it.
1839  CuMatrix<Real> inv(this->NumRows(), this->NumCols());
1840  Cholesky(&inv);
1841  return;
1842  }
1843  if (this->NumRows() <= block_size || inv_cholesky == NULL || !have_gpu) {
1844  // Don't recurse: compute the Cholesky (and inverse Cholesky, if requested)
1845  // directly, on the CPU.
1846  int32 dim = this->NumRows();
1847  CuSpMatrix<Real> this_sp(dim, kUndefined);
1848  this_sp.CopyFromMat(*this, kTakeLower);
1849  SpMatrix<Real> this_sp_cpu(this_sp);
1850  TpMatrix<Real> C_cpu(dim);
1851  C_cpu.Cholesky(this_sp_cpu);
1852  CuTpMatrix<Real> C(C_cpu);
1853  this->CopyFromTp(C);
1854  if (inv_cholesky != NULL) {
1855  C_cpu.Invert(); // Get inverse Cholesky on CPU.
1856  C.CopyFromTp(C_cpu);
1857  inv_cholesky->CopyFromTp(C); // Copy inverse Cholesky from CPU.
1858  }
1859  return;
1860  }
1861  // At this point, if none of the other cases apply, we recurse.
1862 
1863  // The selection of dim1 is a heuristic. We could also just take half.
1864  int32 tot_dim = this->NumRows();
1865  int32 dim1;
1866  // Break it up into a whole number of blocks, for better memory alignment.
1867  // The line below, setting dim1 can be decided on a heuristic basis: from
1868  // the point of view of correctness, it can really be any value
1869  // 0 < dim1 < tot_dim.
1870  dim1 = block_size * std::max<int32>(1, tot_dim / (2 * block_size));
1871 
1872  int32 dim2 = tot_dim - dim1;
1873  CuSubMatrix<Real> this_11(*this, 0, dim1, 0, dim1),
1874  this_12(*this, 0, dim1, dim1, dim2),
1875  this_21(*this, dim1, dim2, 0, dim1),
1876  this_22(*this, dim1, dim2, dim1, dim2);
1877  CuSubMatrix<Real> inv_11(*inv_cholesky, 0, dim1, 0, dim1),
1878  inv_12(*inv_cholesky, 0, dim1, dim1, dim2),
1879  inv_21(*inv_cholesky, dim1, dim2, 0, dim1),
1880  inv_22(*inv_cholesky, dim1, dim2, dim1, dim2);
1881  /*
1882  Here is the math on block-wise Cholesky. We'll use a Matlab-like notation for blocks of a matrix,
1883  e.g. [ A B; C D ], and also for transposes, e.g. A' is the transpose of A.
1884  Let A be the input matrix; we want to compute both its Cholesky L and its inverse Cholesky, which
1885  we'll call M.
1886  OK, let L = [ L11 0; L21 L22 ] be the Cholesky factor of A.
1887  We have A = L L' = [ L11 0; L21 L22 ] * [ L11' L21'; 0 L22' ]. Multiplying it out,
1888  if A = [ A11 A12; A21 A22 ]; then
1889  A11 = L11 L11', A21 = L21 L11', A22 = L21 L21' + L22 L22', and A12 = A21'.
1890 
1891  We also want an expression for the inverse of L (we call this M).
1892  If M = [ M11 0; M21 M22 ], then it's not hard to see that
1893  M11 = inv(L11), M22 = inv(L22).
1894  We can work out M21 as follows. We know that [ L11 0; L21 L22 ] [ M11 0; M21 M22 ] = [ I 0; 0 I ].
1895  Considering the zero on the bottom of the rhs, we have: L21 M11 + L22 M21 = 0, which gives us:
1896  M21 = - L22^{-1} L21 M11 = - M22 L21 M11.
1897 
1898  Next, we want expressions for L21 and L22. From the equation A21 = L21 L11', we have:
1899  L21 = A21 inv(L11') = A21 M11'
1900  We can compute L22 and M22 recursively by doing Cholesky (and computing the inverse Cholesky)
1901  on the quantity T = (A22 - L21 L21'). [we give it the name T just for easy reference.]
1902 
1903  Computationally, we do this as follows:
1904  (1) Recurse to get L11 and M11.
1905  (2) Compute L21 = A21 M11'
1906  (3) Compute T = A22 - L21 L21'
1907  (4) Recurse on T to get L22 and M22.
1908  (5) Compute M21 = -M22 L21 M11.
1909  Next, we have to consider the in-place nature of the computation, since L overwrites A
1910  [M has its own storage, in "inv_cholesky"].
1911  We address this here:
1912  (1) is in-place [L11 replaces A11, M11 has its own storage].
1913  (2) L21 gets written where M21 belongs.
1914  (3) T replaces A22.
1915  (4) is in-place [L22 replaces T where A22 was, M22 has its own storage]
1916  (5):(a) we first compute the transpose of (L21 M11) in the upper part of A/L,
1917  where A12 or L12 would be. Define a temporary expression
1918  U = (L21 M11)' = M11' L21'; this goes where A12 or L12 would be.
1919  (b) copy L21 to where it should be, in *this.
1920  (c) Compute M21 = -M22 U', in the correct place for M21.
1921  (d) zero L12 and M12. */
1922 
1923  // (1) compute L11 and M11.
1924  this_11.Cholesky(&inv_11);
1925  // (2) compute L21 = A21 M11'. For now it's in the "wrong place", where M21 should be.
1926  inv_21.AddMatMat(1.0, this_21, kNoTrans, inv_11, kTrans, 0.0);
1927  // (3) compute T = A22 - L21 L21'. Note: only the lower triangle of T will be valid, but
1928  // that's OK because Cholesky will ignore the upper part.
1929  this_22.SymAddMat2(-1.0, inv_21, kNoTrans, 1.0);
1930  // (4) Recurse to compute L22 and M22.
1931  this_22.Cholesky(&inv_22);
1932  // (5)(a) compute U = M11' L21'. We use the storage of this_12 for this. Note that L21 is
1933  // currently where M21 should be.
1934  this_12.AddMatMat(1.0, inv_11, kTrans, inv_21, kTrans, 0.0);
1935  // (5)(b) copy L21 to where it should be.
1936  this_21.CopyFromMat(inv_21);
1937  // (5)(c) compute M21 = -M22 U'.
1938  inv_21.AddMatMat(-1.0, inv_22, kNoTrans, this_12, kTrans, 0.0);
1939  // (5)(d) zero L12 and M12.
1940  this_12.SetZero();
1941  inv_12.SetZero();
1942 }
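The five steps in the comment can be checked with an out-of-place sketch on plain std::vector matrices. It uses the same dim1 heuristic, with three differences from the code above:
- it always computes the inverse factor M;
- it allocates new storage instead of working in place;
- it runs entirely on the CPU.

CholeskyRec, CholeskyDirect and the other helpers are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense matrix

Matrix Zeros(std::size_t rows, std::size_t cols) {
  return Matrix(rows, std::vector<double>(cols, 0.0));
}

Matrix MatMul(const Matrix &a, const Matrix &b) {
  Matrix c = Zeros(a.size(), b.empty() ? 0 : b[0].size());
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t k = 0; k < b.size(); ++k)
      for (std::size_t j = 0; j < c[i].size(); ++j) c[i][j] += a[i][k] * b[k][j];
  return c;
}

Matrix Transpose(const Matrix &a) {
  Matrix t = Zeros(a.empty() ? 0 : a[0].size(), a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[i].size(); ++j) t[j][i] = a[i][j];
  return t;
}

// Copies the rows x cols block of m whose top-left corner is (r0, c0).
Matrix Block(const Matrix &m, int r0, int c0, int rows, int cols) {
  Matrix b = Zeros(rows, cols);
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j) b[i][j] = m[r0 + i][c0 + j];
  return b;
}

// Writes s * b into m with its top-left corner at (r0, c0).
void Put(const Matrix &b, int r0, int c0, double s, Matrix *m) {
  for (std::size_t i = 0; i < b.size(); ++i)
    for (std::size_t j = 0; j < b[i].size(); ++j) (*m)[r0 + i][c0 + j] = s * b[i][j];
}

// Unblocked Cholesky (reads only the lower triangle of a), then the inverse
// of the lower-triangular factor by forward substitution.
void CholeskyDirect(const Matrix &a, Matrix *l, Matrix *m) {
  const int n = static_cast<int>(a.size());
  *l = Zeros(n, n);
  *m = Zeros(n, n);
  for (int j = 0; j < n; ++j) {
    double s = a[j][j];
    for (int k = 0; k < j; ++k) s -= (*l)[j][k] * (*l)[j][k];
    assert(s > 0.0 && "matrix is not positive definite");
    (*l)[j][j] = std::sqrt(s);
    for (int i = j + 1; i < n; ++i) {
      double t = a[i][j];
      for (int k = 0; k < j; ++k) t -= (*l)[i][k] * (*l)[j][k];
      (*l)[i][j] = t / (*l)[j][j];
    }
  }
  for (int j = 0; j < n; ++j) {
    (*m)[j][j] = 1.0 / (*l)[j][j];
    for (int i = j + 1; i < n; ++i) {
      double t = 0.0;
      for (int k = j; k < i; ++k) t += (*l)[i][k] * (*m)[k][j];
      (*m)[i][j] = -t / (*l)[i][i];
    }
  }
}

// Blocked recursion: L = [L11 0; L21 L22], M = inv(L) = [M11 0; M21 M22].
void CholeskyRec(const Matrix &a, Matrix *l, Matrix *m, int block_size) {
  const int n = static_cast<int>(a.size());
  if (n <= block_size) { CholeskyDirect(a, l, m); return; }
  const int dim1 = block_size * std::max(1, n / (2 * block_size));
  const int dim2 = n - dim1;
  Matrix l11, m11, l22, m22;
  CholeskyRec(Block(a, 0, 0, dim1, dim1), &l11, &m11, block_size);    // (1)
  Matrix l21 = MatMul(Block(a, dim1, 0, dim2, dim1), Transpose(m11));  // (2) L21 = A21 M11'
  Matrix t = Block(a, dim1, dim1, dim2, dim2);                         // (3) T = A22 - L21 L21'
  Matrix l21_l21t = MatMul(l21, Transpose(l21));
  for (int i = 0; i < dim2; ++i)
    for (int j = 0; j < dim2; ++j) t[i][j] -= l21_l21t[i][j];
  CholeskyRec(t, &l22, &m22, block_size);                              // (4)
  Matrix m22_l21_m11 = MatMul(MatMul(m22, l21), m11);                  // (5) M21 = -M22 L21 M11
  *l = Zeros(n, n);
  *m = Zeros(n, n);
  Put(l11, 0, 0, 1.0, l);
  Put(l21, dim1, 0, 1.0, l);
  Put(l22, dim1, dim1, 1.0, l);
  Put(m11, 0, 0, 1.0, m);
  Put(m22_l21_m11, dim1, 0, -1.0, m);
  Put(m22, dim1, dim1, 1.0, m);
}
```

Take block_size = 2 and a 5 x 5 input. The recursion splits 5 = 2 + 3 and then 3 = 2 + 1, so the block formulas are exercised at two levels.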
CuSubMatrix<Real> ColRange ( const MatrixIndexT  col_offset,
const MatrixIndexT  num_cols 
) const
inline

Definition at line 540 of file cu-matrix.h.

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxpoolingComponent::Backprop(), BlockAffineComponent::Backprop(), ConvolutionComponent::Backprop(), Convolutional1dComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MaxPooling2DComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), MultiTaskLoss::Eval(), LstmProjected::InfoGradient(), BlstmProjected::InfoGradient(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), MaxpoolingComponent::Propagate(), BlockAffineComponent::Propagate(), ConvolutionComponent::Propagate(), Convolutional1dComponent::Propagate(), AveragePoolingComponent::PropagateFnc(), MaxPoolingComponent::PropagateFnc(), AveragePooling2DComponent::PropagateFnc(), MaxPooling2DComponent::PropagateFnc(), BlockSoftmax::PropagateFnc(), FramePoolingComponent::PropagateFnc(), ParallelComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), MultiBasisComponent::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::UnitTestLstmNonlinearity(), FramePoolingComponent::Update(), SentenceAveragingComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

541  {
542  return CuSubMatrix<Real>(*this, 0, num_rows_, col_offset, num_cols);
543  }
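ColRange() returns a CuSubMatrix over all rows and the columns [col_offset, col_offset + num_cols). Such a sub-matrix is a view. It points into the parent's data and keeps the parent's stride. No data is copied, and writes go through to the parent. Here is a minimal model of that layout (MatView and ColRange here are hypothetical, not Kaldi types):

```cpp
#include <cassert>
#include <vector>

// Hypothetical strided view: element (r, c) lives at data[r * stride + c].
struct MatView {
  double *data;
  int rows, cols, stride;
  double &operator()(int r, int c) { return data[r * stride + c]; }
};

// Columns [col_offset, col_offset + num_cols) of every row. Only the pointer
// moves; the stride stays the parent's, so the view is not contiguous.
MatView ColRange(MatView m, int col_offset, int num_cols) {
  assert(col_offset >= 0 && num_cols >= 0 && col_offset + num_cols <= m.cols);
  return MatView{m.data + col_offset, m.rows, num_cols, m.stride};
}
```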
void CopyColFromVec ( const CuVectorBase< Real > &  v,
const MatrixIndexT  col 
)

Copy vector into specific column of matrix.

Definition at line 2241 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by kaldi::cu::NormalizePerRow(), StatisticsExtractionComponent::Propagate(), DropoutMaskComponent::Propagate(), NaturalGradientRepeatedAffineComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

2242  {
2243  KALDI_ASSERT(v.Dim() == num_rows_ &&
2244  static_cast<UnsignedMatrixIndexT>(col) <
2245  static_cast<UnsignedMatrixIndexT>(num_cols_));
2246 #if HAVE_CUDA == 1
2247  if (CuDevice::Instantiate().Enabled()) {
2248  CuTimer tim;
2249  cublas_copy(GetCublasHandle(),
2250  v.Dim(), v.Data(), 1,
2251  this->data_ + col, this->stride_);
2252  CU_SAFE_CALL(cudaGetLastError());
2253  CuDevice::Instantiate().AccuProfile(__func__, tim);
2254  } else
2255 #endif
2256  {
2257  Mat().CopyColFromVec(v.Vec(), col);
2258  }
2259 }
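On the GPU, the column is written with a single strided BLAS copy. The destination starts at data_ + col and advances by stride_ per element, which lands on successive rows of column col. The assertion also casts col to an unsigned type, so a single comparison rejects both negative and too-large values. Here is a CPU sketch of the same addressing (hypothetical names):

```cpp
#include <cassert>
#include <vector>

// BLAS-style strided copy: y[i * incy] = x[i * incx] for i in [0, n).
void StridedCopy(int n, const double *x, int incx, double *y, int incy) {
  for (int i = 0; i < n; ++i) y[i * incy] = x[i * incx];
}

// Copies v into column `col` of a row-major matrix with the given stride.
void CopyColFromVecRef(const std::vector<double> &v, int col, double *data,
                       int num_rows, int num_cols, int stride) {
  // A negative col wraps to a huge unsigned value, so this one test also
  // rejects col < 0.
  assert(static_cast<int>(v.size()) == num_rows &&
         static_cast<unsigned>(col) < static_cast<unsigned>(num_cols));
  StridedCopy(num_rows, v.data(), 1, data + col, stride);
}
```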
void CopyCols ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Copies column r from column indexes[r] of src.

As a special case, if indexes[r] == -1, column r is set to zero. indexes.Dim() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().

Definition at line 2425 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumGroupComponent::Backprop(), PermuteComponent::Backprop(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), PermuteComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionalComponent::PropagateFnc(), and Convolutional1dComponent::Update().

2426  {
2427 #if HAVE_CUDA == 1
2428  if (CuDevice::Instantiate().Enabled()) {
2429  KALDI_ASSERT(indices.Dim() == NumCols());
2430  KALDI_ASSERT(NumRows() == src.NumRows());
2431  CuTimer tim;
2432  dim3 dimGrid, dimBlock;
2433  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2434  &dimGrid, &dimBlock);
2435  cuda_copy_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(), Dim(), src.Stride());
2436  CU_SAFE_CALL(cudaGetLastError());
2437  CuDevice::Instantiate().AccuProfile(__func__, tim);
2438  } else
2439 #endif
2440  {
2441  Mat().CopyCols(src.Mat(), indices.Data());
2442  }
2443 }
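Here is a sketch of the CopyCols() semantics, including the -1 special case, on a plain std::vector matrix (hypothetical names). Gathering columns through an index array is one way to express a column permutation. That fits PermuteComponent appearing among the callers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Column r of *dst becomes column indexes[r] of src, or zeros if indexes[r] == -1.
void CopyColsRef(const Matrix &src, const std::vector<int> &indexes, Matrix *dst) {
  assert(dst->size() == src.size());
  for (std::size_t i = 0; i < dst->size(); ++i) {
    assert((*dst)[i].size() == indexes.size());
    for (std::size_t r = 0; r < indexes.size(); ++r)
      (*dst)[i][r] = (indexes[r] == -1) ? 0.0 : src[i][indexes[r]];
  }
}
```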
void CopyColsFromVec ( const CuVectorBase< Real > &  v)

Copies vector into matrix, column-by-column.

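Note that v.Dim() (rv.Dim() in the definition below) must equal either NumRows()*NumCols() or NumRows(), giving two modes of operation:
- In the first mode, v holds the columns of the matrix one after another.
- In the second mode, v is copied into every column.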

Definition at line 2203 of file cu-matrix.cc.

References MatrixDim_::cols, CU1DBLOCK, CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ERR, MatrixDim_::rows, and CuVectorBase< Real >::Vec().

Referenced by DropoutComponent::Propagate(), and kaldi::UnitTestCuMatrixCopyColsFromVec().

2203  {
2204 #if HAVE_CUDA == 1
2205  if (CuDevice::Instantiate().Enabled()) {
2206  CuTimer tim;
2207  if (rv.Dim() == num_rows_ * num_cols_) {
2208  // treat rv as a matrix of the size (num_cols x num_rows_)
2209  // and use transposed copy to fill *this
2210  // see CuMatrixBase<Real>::CopyFromMat() for more detail of the impl
2211  MatrixDim rv_dim = { num_cols_, num_rows_, num_rows_ };
2212  const int32 warpSize = 32;
2213  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2214  dim3 dimGrid(n_blocks(rv_dim.cols, warpSize),
2215  n_blocks(rv_dim.rows, warpSize));
2216  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, rv.Data(), Dim(),
2217  rv_dim);
2218  CU_SAFE_CALL(cudaGetLastError());
2219  } else if (rv.Dim() == num_rows_) {
2220  // use 2D block (8x32) and large enough grid to cover matrix *this
2221  // dimBlock.x needs to be at least warpSize for coalesced memory access.
2222  const int32 warpSize = 32;
2223  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2224  dim3 dimGrid(n_blocks(num_cols_, dimBlock.x),
2225  n_blocks(num_rows_, dimBlock.y));
2226  cuda_copy_cols_from_vec(dimGrid, dimBlock, Data(), Dim(), rv.Data());
2227  CU_SAFE_CALL(cudaGetLastError());
2228  } else {
2229  KALDI_ERR<< "Wrong sized arguments";
2230  }
2231  CuDevice::Instantiate().AccuProfile(__func__, tim);
2232  } else
2233 #endif
2234  {
2235  Mat().CopyColsFromVec(rv.Vec());
2236  }
2237 }
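Here is a CPU sketch of the two modes (hypothetical names). In the first mode, the CUDA branch views v as a NumCols() x NumRows() row-major matrix and copies its transpose into *this. That is the same as reading v column by column.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Mode 1: v.size() == rows * cols, v holds the columns one after another.
// Mode 2: v.size() == rows, v is copied into every column.
void CopyColsFromVecRef(const std::vector<double> &v, Matrix *m) {
  const std::size_t rows = m->size(), cols = rows ? (*m)[0].size() : 0;
  if (v.size() == rows * cols) {
    for (std::size_t c = 0; c < cols; ++c)
      for (std::size_t r = 0; r < rows; ++r) (*m)[r][c] = v[c * rows + r];
  } else if (v.size() == rows) {
    for (std::size_t c = 0; c < cols; ++c)
      for (std::size_t r = 0; r < rows; ++r) (*m)[r][c] = v[r];
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```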
void CopyFromBlock ( const CuBlockMatrix< Real > &  B,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 158 of file cu-matrix.cc.

References CuBlockMatrix< Real >::Block(), CuMatrixBase< Real >::CopyFromMat(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), and CuMatrixBase< Real >::NumRows().

Referenced by CuMatrix< Real >::CuMatrix().

159  {
160  this->SetZero();
161  if (trans == kNoTrans) {
162  KALDI_ASSERT(NumRows() == B.NumRows() && NumCols() == B.NumCols());
163  int32 row_offset = 0, col_offset = 0;
164  for (int32 b = 0; b < B.NumBlocks(); b++) {
165  const CuMatrixBase<Real> &block = B.Block(b);
166  int32 num_rows = block.NumRows(), num_cols = block.NumCols();
167  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
168  col_offset, num_cols);
169  this_block.CopyFromMat(block);
170  row_offset += num_rows;
171  col_offset += num_cols;
172  }
173  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
174  } else {
175  KALDI_ASSERT(NumRows() == B.NumCols() && NumCols() == B.NumRows());
176  int32 row_offset = 0, col_offset = 0;
177  for (int32 b = 0; b < B.NumBlocks(); b++) {
178  const CuMatrixBase<Real> &block = B.Block(b);
179  int32 num_rows = block.NumCols(), num_cols = block.NumRows();
180  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
181  col_offset, num_cols);
182  this_block.CopyFromMat(block, kTrans);
183  row_offset += num_rows;
184  col_offset += num_cols;
185  }
186  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
187  }
188 }
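CopyFromBlock() has no description here. From the body above, it does the following:
- zeros *this;
- copies block b of the CuBlockMatrix to consecutive positions along the diagonal, transposing each block when trans == kTrans;
- asserts that the blocks cover *this exactly.

Here is a sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
using Matrix = std::vector<std::vector<double>>;

// Zeros *m, then copies blocks[0], blocks[1], ... to consecutive positions
// along its diagonal, transposing each block first if trans is true.
void CopyFromBlockRef(const std::vector<Matrix> &blocks, bool trans, Matrix *m) {
  const std::size_t width = m->empty() ? 0 : (*m)[0].size();
  for (auto &row : *m) std::fill(row.begin(), row.end(), 0.0);
  std::size_t row_offset = 0, col_offset = 0;
  for (const Matrix &b : blocks) {
    const std::size_t br = b.size(), bc = br ? b[0].size() : 0;
    const std::size_t nr = trans ? bc : br, nc = trans ? br : bc;
    assert(row_offset + nr <= m->size() && col_offset + nc <= width);
    for (std::size_t i = 0; i < nr; ++i)
      for (std::size_t j = 0; j < nc; ++j)
        (*m)[row_offset + i][col_offset + j] = trans ? b[j][i] : b[i][j];
    row_offset += nr;
    col_offset += nc;
  }
  // As in the code above, the blocks must cover *m exactly.
  assert(row_offset == m->size() && col_offset == width);
}
```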
void CopyFromGeneralMat ( const GeneralMatrix src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 2806 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CopyToMat(), SparseMatrix< Real >::CopyToMat(), GeneralMatrix::GetFullMatrix(), GeneralMatrix::GetMatrix(), GeneralMatrix::GetSparseMatrix(), KALDI_ERR, kaldi::kCompressedMatrix, kaldi::kFullMatrix, kaldi::kSparseMatrix, and GeneralMatrix::Type().

Referenced by NnetComputer::AcceptInputs(), and kaldi::nnet3::ComputeObjectiveFunction().

2807  {
2808  switch (src.Type()) {
2809  case kFullMatrix: {
2810  const Matrix<BaseFloat> &src_full_mat = src.GetFullMatrix();
2811  this->CopyFromMat(src_full_mat, trans);
2812  return;
2813  }
2814  case kCompressedMatrix: {
2815  Matrix<BaseFloat> mat;
2816  src.GetMatrix(&mat);
2817  this->CopyFromMat(mat, trans);
2818  return;
2819  }
2820  case kSparseMatrix: {
2821  const SparseMatrix<BaseFloat> &smat = src.GetSparseMatrix();
2822 #if HAVE_CUDA == 1
2823  if (CuDevice::Instantiate().Enabled()) {
2824  // only take this branch if we're actually using CUDA, or it would
2825  // entail a wasteful copy of the sparse matrix.
2826  CuSparseMatrix<BaseFloat> cu_smat(smat);
2827  cu_smat.CopyToMat(this, trans);
2828  return;
2829  }
2830 #endif
2831  smat.CopyToMat(&(Mat()), trans);
2832  return;
2833  }
2834  default:
2835  KALDI_ERR << "Invalid GeneralMatrix type.";
2836  }
2837 }
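In the kSparseMatrix case, the non-CUDA path above expands the sparse matrix into the dense one with `SparseMatrix::CopyToMat()`, passing the transpose flag through. The sketch below shows what such an expansion involves. It assumes a hypothetical sparse layout (one list of (column, value) pairs per row) and a hypothetical function `SparseToDense()`. Neither is Kaldi's actual representation or API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse layout: one list of (column, value) pairs per row.
using SparseRow = std::vector<std::pair<std::size_t, double>>;

// Expands a sparse matrix with `num_cols` columns into a dense row-major
// buffer. With transpose=true the result is the (num_cols x rows) transpose.
std::vector<double> SparseToDense(const std::vector<SparseRow> &rows,
                                  std::size_t num_cols, bool transpose) {
  const std::size_t num_rows = rows.size();
  std::vector<double> dense(num_rows * num_cols, 0.0);  // unset entries stay zero
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (const auto &entry : rows[r]) {
      const std::size_t c = entry.first;
      const double value = entry.second;
      assert(c < num_cols);
      if (transpose)
        dense[c * num_rows + r] = value;  // output row c, column r
      else
        dense[r * num_cols + c] = value;  // output row r, column c
    }
  }
  return dense;
}
```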
void CopyFromMat ( const MatrixBase< OtherReal > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 337 of file cu-matrix.cc.

Referenced by ElementwiseProductComponent::Backprop(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), ScaleComponent::Backprop(), FixedScaleComponent::Backprop(), FixedBiasComponent::Backprop(), NoOpComponent::Backprop(), ClipGradientComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), BatchNormComponent::Backprop(), Softmax::BackpropagateFnc(), HiddenSoftmax::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), AddShift::BackpropagateFnc(), Rescale::BackpropagateFnc(), BlockAffineComponent::BlockAffineComponent(), NnetOnlineComputer::Compute(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), CuMatrixBase< Real >::CopyFromBlock(), CuBlockMatrix< Real >::CopyFromMat(), GeneralMatrix::CopyToMat(), CuMatrix< Real >::CuMatrix(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), NnetComputer::ExecuteCommand(), NnetRescaler::FormatInput(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), main(), kaldi::nnet2::NnetComputation(), kaldi::cu::NormalizePerRow(), CuMatrix< BaseFloat >::operator=(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditionerSimple::PreconditionDirections(), OnlineNaturalGradientSimple::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), DropoutComponent::Propagate(), ElementwiseProductComponent::Propagate(), BackpropTruncationComponent::Propagate(), PowerComponent::Propagate(), RectifiedLinearComponent::Propagate(), 
ScaleComponent::Propagate(), NoOpComponent::Propagate(), SpliceMaxComponent::Propagate(), ClipGradientComponent::Propagate(), PerElementScaleComponent::Propagate(), PerElementOffsetComponent::Propagate(), FixedScaleComponent::Propagate(), FixedBiasComponent::Propagate(), AdditiveNoiseComponent::Propagate(), BatchNormComponent::Propagate(), KlHmm::PropagateFnc(), ParallelComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), AddShift::PropagateFnc(), Rescale::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), kaldi::TestCuFindRowMaxId(), kaldi::TestCuMatrixTransposeCross(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::TestSymInvertPosDef(), kaldi::UnitInvert(), kaldi::UnitTestCheck(), kaldi::UnitTestCholesky(), kaldi::UnitTestConstructor(), kaldi::UnitTestCopyFromMat(), kaldi::UnitTestCopySp(), kaldi::UnitTestCuCopy(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffNormalizePerRow(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuFindRowMaxId(), kaldi::UnitTestCuLogSoftmax(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddMatMatBatched(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyFromMat(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), 
kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuSoftmax(), kaldi::UnitTestCuTanh(), kaldi::UnitTestCuVectorAddColSumMat(), kaldi::UnitTestCuVectorAddColSumMatLarge(), kaldi::UnitTestCuVectorAddRowSumMat(), kaldi::UnitTestCuVectorAddRowSumMatLarge(), kaldi::UnitTestInvert(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), BlockAffineComponentPreconditioned::Update(), and MatrixRandomizer::Value().

338  {
339  CuMatrix<OtherReal> temp(src);
340  this->CopyFromMat(temp, trans);
341 }
void CopyFromMat ( const MatrixBase< Real > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 309 of file cu-matrix.cc.

References MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

310  {
311 #if HAVE_CUDA == 1
312  if (CuDevice::Instantiate().Enabled()) {
313  if (trans == kNoTrans) {
314  KALDI_ASSERT(src.NumRows() == num_rows_ && src.NumCols() == num_cols_);
315  CuTimer tim;
316 
317  MatrixIndexT dst_pitch = stride_*sizeof(Real);
318  MatrixIndexT src_pitch = src.Stride()*sizeof(Real);
319  MatrixIndexT width = src.NumCols()*sizeof(Real);
320  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, src.Data(), src_pitch,
321  width, src.NumRows(), cudaMemcpyHostToDevice));
322 
323  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from CPU)", tim);
324  } else {
325  CuMatrix<Real> trans_mat(src); // Do the transpose on the GPU board.
326  this->CopyFromMat(trans_mat, kTrans);
327  }
328  } else
329 #endif
330  {
331  Mat().CopyFromMat(src, trans);
332  }
333 }
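The host-to-device branch above converts element strides into byte pitches (`stride_*sizeof(Real)`, `src.Stride()*sizeof(Real)`) and copies `src.NumRows()` rows of `width` bytes with `cudaMemcpy2D`. Each row starts one pitch after the previous one, so any padding between rows is left untouched. The sketch below is a CPU analogue of that row-by-row pitched copy. The helper names `Memcpy2D()` and `CopyStrided()` are hypothetical, and the sketch is not CUDA's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU analogue of cudaMemcpy2D (hypothetical helper): copies `height` rows of
// `width` bytes. Consecutive rows are `dpitch`/`spitch` bytes apart.
void Memcpy2D(void *dst, std::size_t dpitch, const void *src, std::size_t spitch,
              std::size_t width, std::size_t height) {
  assert(width <= dpitch && width <= spitch);
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t row = 0; row < height; ++row)
    std::memcpy(d + row * dpitch, s + row * spitch, width);
}

// Copies a rows x cols float matrix between buffers whose strides (in
// elements) differ, computing pitches the same way CopyFromMat() does.
void CopyStrided(float *dst, std::size_t dst_stride, const float *src,
                 std::size_t src_stride, std::size_t rows, std::size_t cols) {
  Memcpy2D(dst, dst_stride * sizeof(float), src, src_stride * sizeof(float),
           cols * sizeof(float), rows);
}
```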
void CopyFromMat ( const CuMatrixBase< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 205 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

206  {
207  if (sizeof(Real) == sizeof(OtherReal) &&
208  static_cast<const void*>(M.Data()) ==
209  static_cast<const void*>(this->Data())) {
210  if (M.Data() == NULL)
211  return;
212  // CopyFromMat called on same data. Nothing to do (except sanity checks)
213  KALDI_ASSERT(trans == kNoTrans && M.NumRows() == NumRows() &&
214  M.NumCols() == NumCols() && M.Stride() == Stride());
215  return;
216  }
217 #if HAVE_CUDA == 1
218  if (CuDevice::Instantiate().Enabled()) {
219  if (trans == kNoTrans) {
220  KALDI_ASSERT(M.NumRows() == num_rows_ && M.NumCols() == num_cols_);
221  } else {
222  KALDI_ASSERT(M.NumCols() == num_rows_ && M.NumRows() == num_cols_);
223  }
224  if (M.num_rows_ == 0) return; // Nothing to do.
225  CuTimer tim;
226  if (sizeof(Real) == sizeof(OtherReal) && trans == kNoTrans ) {
227  MatrixIndexT dst_pitch = stride_ * sizeof(Real);
228  MatrixIndexT src_pitch = M.Stride() * sizeof(Real);
229  MatrixIndexT width = M.NumCols() * sizeof(Real);
230  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, M.data_, src_pitch,
231  width, M.num_rows_, cudaMemcpyDeviceToDevice));
232  } else {
233  if (trans == kNoTrans) {
234  dim3 dimGrid, dimBlock;
235  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
236  &dimGrid, &dimBlock);
237  cuda_copy_from_mat(dimGrid, dimBlock, data_, M.data_, Dim(), M.Dim());
238  } else {
239  // 2D thread block with warps (blockDim.x) along the row-dim of input M.
240  // Each (8x32) thread block will transpose (32x32) data
241  const int32 warpSize = 32;
242  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
243  dim3 dimGrid(n_blocks(M.NumCols(), warpSize),
244  n_blocks(M.NumRows(), warpSize));
245  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, M.data_, Dim(),
246  M.Dim());
247  }
248  CU_SAFE_CALL(cudaGetLastError());
249  }
250  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from other CuMatrixBase)", tim);
251  } else
252 #endif
253  {
254  Mat().CopyFromMat(M.Mat(), trans);
255  }
256 }
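In the kTrans branch above, the grid is sized in 32-element tiles of M (`n_blocks(M.NumCols(), warpSize)` by `n_blocks(M.NumRows(), warpSize)`), and each thread block transposes one 32x32 tile. The kernel `cuda_copy_from_mat_trans` itself is not shown here. The sketch below is a CPU loop-blocking analogue of the same tiling, not that kernel. It assumes contiguous row-major buffers (no stride), and the function name `TransposeTiled()` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Blocked transpose: dst (cols x rows) = src (rows x cols)^T, processed in
// kTile x kTile tiles, analogous to one 32x32 tile per CUDA thread block.
void TransposeTiled(const std::vector<double> &src, std::size_t rows,
                    std::size_t cols, std::vector<double> *dst) {
  constexpr std::size_t kTile = 32;
  assert(src.size() == rows * cols);
  dst->assign(rows * cols, 0.0);
  for (std::size_t r0 = 0; r0 < rows; r0 += kTile)
    for (std::size_t c0 = 0; c0 < cols; c0 += kTile) {
      const std::size_t r1 = std::min(r0 + kTile, rows);  // clip partial tiles
      const std::size_t c1 = std::min(c0 + kTile, cols);
      for (std::size_t r = r0; r < r1; ++r)
        for (std::size_t c = c0; c < c1; ++c)
          (*dst)[c * rows + r] = src[r * cols + c];
    }
}
```

Tiling keeps both the reads and the writes of a tile close together in memory. On the GPU, this closeness is what allows coalesced access for both the source and the destination.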
void CopyFromSp ( const CuSpMatrix< Real > &  M)

Definition at line 353 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, KALDI_ASSERT, CuSpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrix< BaseFloat >::CuMatrix(), CuSpMatrix< Real >::Invert(), and kaldi::TestCuMatrixCopyFromSp().

353  {
354  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
355  if (num_rows_ == 0)
356  return;
357 #if HAVE_CUDA == 1
358  if (CuDevice::Instantiate().Enabled()) {
359  CuTimer tim;
360  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
361  dim3 dimGrid(n_blocks(NumRows(), CU2DBLOCK),
362  n_blocks(NumRows(), CU2DBLOCK));
363  cuda_copy_from_sp(dimGrid, dimBlock, M.Data(), data_, Dim());
364  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyFromSp", tim);
365  } else
366 #endif
367  {
368  Mat().CopyFromSp(M.Mat());
369  }
370 }
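CopyFromSp() expands a packed symmetric matrix into this full square matrix, so each stored element lands at both (i, j) and (j, i). The sketch below assumes that the lower triangle is packed row by row, so that element (i, j) with j <= i sits at index i*(i+1)/2 + j. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of element (i, j), j <= i, in a lower triangle packed row by row.
std::size_t PackedIndex(std::size_t i, std::size_t j) {
  assert(j <= i);
  return i * (i + 1) / 2 + j;
}

// Expands a packed symmetric n x n matrix into a full row-major buffer,
// writing each stored element to both (i, j) and (j, i).
std::vector<double> FullFromPackedSymmetric(const std::vector<double> &packed,
                                            std::size_t n) {
  assert(packed.size() == n * (n + 1) / 2);
  std::vector<double> full(n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j <= i; ++j) {
      const double v = packed[PackedIndex(i, j)];
      full[i * n + j] = v;
      full[j * n + i] = v;
    }
  return full;
}
```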
template<typename OtherReal> void CopyFromTp ( const CuTpMatrix< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 275 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, CuTpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrixBase< Real >::Cholesky(), CuMatrix< BaseFloat >::CuMatrix(), CuTpMatrix< Real >::Invert(), kaldi::TestCuMatrixCopyFromTp(), and kaldi::UnitTestCuMatrixCopyFromTp().

276  {
277  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
278  if (num_rows_ == 0)
279  return;
280 #if HAVE_CUDA == 1
281  if (CuDevice::Instantiate().Enabled()) {
282  CuTimer tim;
283  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
284  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
285  n_blocks(num_rows_, CU2DBLOCK));
286  if (trans == kNoTrans) {
287  cuda_copy_from_tp(dimGrid, dimBlock, data_, M.Data(), Dim());
288  } else {
289  cuda_copy_from_tp_trans(dimGrid, dimBlock, data_, M.Data(), Dim());
290  }
291  CuDevice::Instantiate().AccuProfile(__func__, tim);
292  } else
293 #endif
294  {
295  Mat().CopyFromTp(M.Mat(), trans);
296  }
297 }
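CopyFromTp() selects a different kernel for kNoTrans and kTrans (`cuda_copy_from_tp` vs. `cuda_copy_from_tp_trans`). The sketch below shows the intended result under stated assumptions:

- The triangular input is lower-triangular and packed row by row.
- Entries outside the triangle are set to zero.
- With transpose, the output is the upper-triangular transpose.

The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands a lower-triangular n x n matrix, packed row by row, into a full
// row-major buffer. With transpose=true the result is its upper-triangular
// transpose. Entries outside the triangle are zero.
std::vector<double> FullFromPackedLower(const std::vector<double> &packed,
                                        std::size_t n, bool transpose) {
  assert(packed.size() == n * (n + 1) / 2);
  std::vector<double> full(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j <= i; ++j) {
      const double v = packed[i * (i + 1) / 2 + j];
      if (transpose)
        full[j * n + i] = v;  // element (i, j) of T becomes (j, i)
      else
        full[i * n + j] = v;
    }
  return full;
}
```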
void CopyLowerToUpper ( )

Definition at line 2679 of file cu-matrix.cc.

References CU2DBLOCK, data_, and KALDI_ASSERT.

Referenced by kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuMatrixCopyLowerToUpper().

2679  {
2680  KALDI_ASSERT(num_cols_ == num_rows_);
2681  if (num_rows_ == 0) return;
2682 #if HAVE_CUDA == 1
2683  if (CuDevice::Instantiate().Enabled()) {
2684  CuTimer tim;
2685  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2686  int32 dim = num_rows_;
2687  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2688  n_blocks(dim, CU2DBLOCK));
2689  cuda_copy_low_upp(dimGrid, dimBlock, data_, Dim());
2690  CU_SAFE_CALL(cudaGetLastError());
2691  CuDevice::Instantiate().AccuProfile(__func__, tim);
2692  } else
2693 #endif
2694  {
2695  Mat().CopyLowerToUpper();
2696  }
2697 }
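CopyLowerToUpper() makes a square matrix symmetric by overwriting its upper triangle with the mirrored lower triangle. CopyUpperToLower(), documented further on, does the reverse. The sketch below shows both directions on a row-major buffer. The function names are hypothetical, and the sketch is not the `cuda_copy_low_upp` / `cuda_copy_upp_low` kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// a(i, j) = a(j, i) for every j > i: lower triangle copied over the upper.
void CopyLowerToUpperInPlace(std::vector<double> *a, std::size_t n) {
  assert(a->size() == n * n);  // square matrices only
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      (*a)[i * n + j] = (*a)[j * n + i];
}

// a(i, j) = a(j, i) for every j < i: upper triangle copied over the lower.
void CopyUpperToLowerInPlace(std::vector<double> *a, std::size_t n) {
  assert(a->size() == n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < i; ++j)
      (*a)[i * n + j] = (*a)[j * n + i];
}
```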
void CopyRows ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Copies row r from row indexes[r] of src.

As a special case, if indexes[r] < 0, row r is set to zero. src.NumCols() must equal this.NumCols().

Definition at line 2447 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), SpliceComponent::Backprop(), NnetComputer::ExecuteCommand(), main(), DistributeComponent::Propagate(), and SpliceMaxComponent::Propagate().

2448  {
2449 #if HAVE_CUDA == 1
2450  if (CuDevice::Instantiate().Enabled()) {
2451  KALDI_ASSERT(static_cast<MatrixIndexT>(indices.Dim()) == NumRows());
2452  KALDI_ASSERT(NumCols() == src.NumCols());
2453 
2454  CuTimer tim;
2455  dim3 dimGrid, dimBlock;
2456  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2457  &dimGrid, &dimBlock);
2458  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
2459  Dim(), src.Stride());
2460  CU_SAFE_CALL(cudaGetLastError());
2461  CuDevice::Instantiate().AccuProfile(__func__, tim);
2462  } else
2463 #endif
2464  {
2465  Mat().CopyRows(src.Mat(), indices.Data());
2466  }
2467 }
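CopyRows() with an index array is a row gather: row r of the output comes from row indexes[r] of src, and a negative index zeroes the row. The same source row may be selected more than once. The sketch below is a CPU version of this. It assumes row-major buffers with equal column counts and 32-bit indexes (MatrixIndexT is int32). The function name `CopyRowsByIndex()` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Gather: row r of dst <- row indexes[r] of src; a negative index zeroes row r.
// Both matrices are row-major with `cols` columns.
void CopyRowsByIndex(const std::vector<double> &src, std::size_t src_rows,
                     const std::vector<std::int32_t> &indexes, std::size_t cols,
                     std::vector<double> *dst) {
  assert(src.size() == src_rows * cols);
  dst->assign(indexes.size() * cols, 0.0);
  for (std::size_t r = 0; r < indexes.size(); ++r) {
    const std::int32_t idx = indexes[r];
    if (idx < 0) continue;  // row r stays zero
    assert(static_cast<std::size_t>(idx) < src_rows);
    for (std::size_t c = 0; c < cols; ++c)
      (*dst)[r * cols + c] = src[static_cast<std::size_t>(idx) * cols + c];
  }
}
```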
void CopyRows ( const CuArray< const Real * > &  src)

Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (that is, the data it points to must be on the GPU if a GPU is in use, and on the CPU otherwise).

src.Dim() must equal this.NumRows(), and if any src[r] is NULL, this.Row(r) is set to zero.

Definition at line 2492 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

2492  {
2493  if (NumRows() == 0) return;
2494 #if HAVE_CUDA == 1
2495  if (CuDevice::Instantiate().Enabled()) {
2496  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2497  CuTimer tim;
2498  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2499  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2500  n_blocks(num_rows_, CU2DBLOCK));
2501  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), Dim());
2502  CU_SAFE_CALL(cudaGetLastError());
2503  CuDevice::Instantiate().AccuProfile(__func__, tim);
2504  } else
2505 #endif
2506  {
2507  Mat().CopyRows(src.Data());
2508  }
2509 }
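The pointer-array overload of CopyRows() gathers each row from an arbitrary address instead of from a row of one source matrix. A NULL entry zeroes the row. The sketch below is a CPU version of this. It assumes a row-major destination, and the function name `CopyRowsFromPointers()` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row r of dst (row-major, `cols` columns) is copied from the `cols` values
// starting at src[r]; a null pointer sets row r to zero.
void CopyRowsFromPointers(const std::vector<const double *> &src, std::size_t cols,
                          std::vector<double> *dst) {
  dst->assign(src.size() * cols, 0.0);
  for (std::size_t r = 0; r < src.size(); ++r) {
    if (src[r] == nullptr) continue;  // row r stays zero
    for (std::size_t c = 0; c < cols; ++c)
      (*dst)[r * cols + c] = src[r][c];
  }
}
```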
void CopyRowsFromVec ( const CuVectorBase< Real > &  v)

This function has two modes of operation.

If v.Dim() == NumRows() * NumCols(), it treats the vector as a row-by-row concatenation of a matrix and copies it into *this. If v.Dim() == NumCols(), it sets each row of *this to a copy of v. Any other dimension is an error.

Definition at line 2135 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by kaldi::CuVectorUnitTestCopyFromMat(), NnetOnlineComputer::Flush(), NnetRescaler::FormatInput(), TimeHeightConvolutionComponent::Propagate(), ConstantComponent::Propagate(), RepeatedAffineComponent::Propagate(), AffineComponent::Propagate(), FixedAffineComponent::Propagate(), BlockAffineComponent::Propagate(), ConstantFunctionComponent::Propagate(), LinearTransform::SetParams(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), kaldi::UnitTestCuMatrixCopyRowsFromVec(), TimeHeightConvolutionComponent::UnVectorize(), RepeatedAffineComponent::UnVectorize(), AffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), and LstmNonlinearityComponent::UnVectorize().

2135  {
2136 #if HAVE_CUDA == 1
2137  if (CuDevice::Instantiate().Enabled()) {
2138  CuTimer tim;
2139  if (v.Dim() == num_rows_*num_cols_) {
2140  if (stride_ == num_cols_) {
2141  const Real* v_data = v.Data();
2142  CU_SAFE_CALL(cudaMemcpy(data_, v_data,
2143  sizeof(Real)*num_rows_*num_cols_,
2144  cudaMemcpyDeviceToDevice));
2145  } else {
2146  CU_SAFE_CALL(cudaMemcpy2D(data_, stride_ * sizeof(Real), v.Data(),
2147  num_cols_*sizeof(Real), num_cols_*sizeof(Real),
2148  num_rows_,
2149  cudaMemcpyDeviceToDevice));
2150  }
2151  } else if (v.Dim() == num_cols_) {
2152  dim3 dimGrid, dimBlock;
2153  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2154  &dimGrid, &dimBlock);
2155  cuda_copy_rows_from_vec(dimGrid, dimBlock, data_, this->Dim(), v.Data());
2156  CU_SAFE_CALL(cudaGetLastError());
2157  } else {
2158  KALDI_ERR << "Wrong sized arguments";
2159  }
2160  CuDevice::Instantiate().AccuProfile(__func__, tim);
2161  } else
2162 #endif
2163  {
2164  Mat().CopyRowsFromVec(v.Vec());
2165  }
2166 }
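The two modes of CopyRowsFromVec() can be sketched on the CPU as shown below. When the destination stride equals its column count, the concatenated mode is a single contiguous copy, which is what the `cudaMemcpy` branch above exploits. Otherwise, each row is copied to its own offset. The sketch assumes a row-major destination with an element stride, and it throws `std::invalid_argument` in place of KALDI_ERR. The function name `CopyRowsFromVecSketch()` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Mode 1: v.size() == rows * cols, read as a row-by-row concatenation.
// Mode 2: v.size() == cols, every row becomes a copy of v.
// dst is row-major with a row stride of `stride` >= cols elements.
void CopyRowsFromVecSketch(const std::vector<double> &v, std::size_t rows,
                           std::size_t cols, std::size_t stride,
                           std::vector<double> *dst) {
  assert(stride >= cols);
  assert(dst->size() >= (rows == 0 ? 0 : (rows - 1) * stride + cols));
  if (v.size() == rows * cols) {
    for (std::size_t r = 0; r < rows; ++r)  // row r comes from v[r*cols, (r+1)*cols)
      std::copy(v.begin() + r * cols, v.begin() + (r + 1) * cols,
                dst->begin() + r * stride);
  } else if (v.size() == cols) {
    for (std::size_t r = 0; r < rows; ++r)
      std::copy(v.begin(), v.end(), dst->begin() + r * stride);
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```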
void CopyRowsFromVec ( const VectorBase< Real > &  v)

Version of CopyRowsFromVec() that takes a CPU-based vector.

Definition at line 2169 of file cu-matrix.cc.

References VectorBase< Real >::Data(), data_, VectorBase< Real >::Dim(), and KALDI_ERR.

2169  {
2170 #if HAVE_CUDA == 1
2171  if (CuDevice::Instantiate().Enabled()) {
2172  CuTimer tim;
2173  if (v.Dim() == num_rows_*num_cols_) {
2174  if (stride_ == num_cols_) {
2175  const Real* v_data = v.Data();
2176  cudaMemcpy(data_, v_data, sizeof(Real)*num_rows_*num_cols_, cudaMemcpyHostToDevice);
2177  } else {
2178  const Real *v_data = v.Data();
2179  for (MatrixIndexT r = 0; r < num_rows_; r++) {
2180  Real *row_data = RowData(r);
2181  cudaMemcpy(row_data, v_data, sizeof(Real)*num_cols_, cudaMemcpyHostToDevice);
2182  v_data += num_cols_;
2183  }
2184  }
2185  } else if (v.Dim() == num_cols_) {
2186  dim3 dimGrid, dimBlock;
2187  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2188  &dimGrid, &dimBlock);
2189  cuda_copy_rows_from_vec(dimGrid, dimBlock, this->data_, this->Dim(), v.Data());
2190  CU_SAFE_CALL(cudaGetLastError());
2191  } else {
2192  KALDI_ERR << "Wrong sized arguments";
2193  }
2194  CuDevice::Instantiate().AccuProfile(__func__, tim);
2195  } else
2196 #endif
2197  {
2198  Mat().CopyRowsFromVec(v);
2199  }
2200 }
template<typename OtherReal> void CopyToMat ( MatrixBase< OtherReal > *  dst,
MatrixTransposeType  trans = kNoTrans 
) const

Definition at line 413 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyToMat(), MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

Referenced by NnetComputerFromEg::Compute(), CuMatrixBase< Real >::CopyToMat(), kaldi::nnet1::MomentStatistics(), KlHmm::PropagateFnc(), kaldi::UnitInvert(), kaldi::UnitTestCholesky(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixAddVecVec(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuTanh(), kaldi::UnitTestInvert(), kaldi::UnitTestMatrix(), UnitTestMatrixRandomizer(), kaldi::UnitTestSetZeroAboveDiag(), kaldi::UnitTestSwapCu2Cu(), and kaldi::UnitTestSwapCu2M().

414  {
415 #if HAVE_CUDA == 1
416  if (CuDevice::Instantiate().Enabled()) {
417  if (trans == kTrans || sizeof(OtherReal) != sizeof(Real)) {
418  CuMatrix<OtherReal> this_trans(*this, trans);
419  this_trans.CopyToMat(dst, kNoTrans);
420  } else {
421  KALDI_ASSERT(dst->NumRows() == NumRows() && dst->NumCols() == NumCols());
422  if (num_rows_ == 0) return;
423  CuTimer tim;
424 
425  MatrixIndexT src_pitch = stride_*sizeof(Real);
426  MatrixIndexT dst_pitch = dst->Stride()*sizeof(Real);
427  MatrixIndexT width = NumCols()*sizeof(Real);
428  CU_SAFE_CALL(cudaMemcpy2D(dst->Data(), dst_pitch, this->data_, src_pitch,
429  width, this->num_rows_, cudaMemcpyDeviceToHost));
430 
431  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyToMatD2H", tim);
432  }
433  } else
434  #endif
435  {
436  dst->CopyFromMat(Mat(), trans);
437  }
438 }
void CopyToRows ( const CuArray< Real * > &  dst) const

For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if a GPU is in use, or on the CPU otherwise).

If dst[r] is NULL, row r is not copied anywhere. The memory regions pointed to by the pointers in dst must not overlap (e.g. no two pointers may be the same).

Definition at line 2513 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

Referenced by DistributeComponent::Backprop(), NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixCopyToRows().

2513  {
2514  if (NumRows() == 0) return;
2515 #if HAVE_CUDA == 1
2516  if (CuDevice::Instantiate().Enabled()) {
2517  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2518 
2519  CuTimer tim;
2520  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2521  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2522  n_blocks(num_rows_, CU2DBLOCK));
2523  cuda_copy_to_rows(dimGrid, dimBlock, dst.Data(), data_, Dim());
2524  CU_SAFE_CALL(cudaGetLastError());
2525  CuDevice::Instantiate().AccuProfile(__func__, tim);
2526  } else
2527 #endif
2528  {
2529  Mat().CopyToRows(dst.Data());
2530  }
2531 }
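CopyToRows() is the scatter counterpart of the pointer-array CopyRows(): row r is written to the address in dst[r], and NULL entries are skipped. The no-overlap requirement presumably exists because the GPU kernel writes rows concurrently, so overlapping regions would race. The sequential CPU sketch below does not depend on that requirement. The function name `CopyToRowsSketch()` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scatter: row r of src (row-major, `cols` columns) is written to the `cols`
// values starting at dst[r]; null entries are skipped.
void CopyToRowsSketch(const std::vector<double> &src, std::size_t cols,
                      const std::vector<double *> &dst) {
  assert(src.size() == dst.size() * cols);
  for (std::size_t r = 0; r < dst.size(); ++r) {
    if (dst[r] == nullptr) continue;  // row r is not copied anywhere
    for (std::size_t c = 0; c < cols; ++c)
      dst[r][c] = src[r * cols + c];
  }
}
```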
void CopyUpperToLower ( )

Definition at line 2700 of file cu-matrix.cc.

References CU2DBLOCK, data_, and KALDI_ASSERT.

Referenced by kaldi::TestCuMatrixCopyUpperToLower(), and kaldi::UnitTestCuMatrixCopyUpperToLower().

2700  {
2701  KALDI_ASSERT(num_cols_ == num_rows_);
2702  if (num_rows_ == 0) return;
2703 #if HAVE_CUDA == 1
2704  if (CuDevice::Instantiate().Enabled()) {
2705  CuTimer tim;
2706  int32 dim = this->num_rows_;
2707  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2708  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2709  n_blocks(dim, CU2DBLOCK));
2710  cuda_copy_upp_low(dimGrid, dimBlock, data_, Dim());
2711  CU_SAFE_CALL(cudaGetLastError());
2712  CuDevice::Instantiate().AccuProfile(__func__, tim);
2713  } else
2714 #endif
2715  {
2716  Mat().CopyUpperToLower();
2717  }
2718 }
const Real* Data ( ) const
inline

Return data pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 621 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), RepeatedAffineComponent::Backprop(), BatchNormComponent::Backprop(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrix< Real >::CompObjfAndDeriv(), DistributeComponent::ComputeInputPointers(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), CuMatrixBase< Real >::EqualElementMask(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMaxDeriv(), CuTpMatrix< Real >::Invert(), kaldi::cu::NormalizePerRow(), TimeHeightConvolutionComponent::Propagate(), RepeatedAffineComponent::Propagate(), BatchNormComponent::Propagate(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), CuBlockMatrix< Real >::SetCudaData(), kaldi::cu::Splice(), BatchNormComponent::StoreStats(), CuMatrixBase< Real >::SumColumnRanges(), CuMatrixBase< Real >::SymAddMat2(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), and TimeHeightConvolutionComponent::UpdateSimple().

621 { return data_; }
Real* Data ( )
inline

Return data pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 624 of file cu-matrix.h.

624 { return data_; }
void DiffGroupPnorm ( const CuMatrixBase< Real > &  in_value,
const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv,
Real  power 
)

Differentiate backward through the GroupPnorm function.

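Here, "in_value" and "out_value" are the input and output of GroupPnorm, "out_deriv" is the derivative with respect to that output, and *this receives the derivative with respect to the input. On the CPU this is computed as MatrixBase::GroupPnormDeriv followed by MulRowsGroupMat.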

Definition at line 826 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Backprop(), and kaldi::UnitTestCuMatrixDiffGroupPnorm().

829  {
830  KALDI_ASSERT(out_value.NumCols() > 0);
831  KALDI_ASSERT(out_value.NumCols() == out_deriv.NumCols());
832  int group_size = this->NumCols() / out_value.NumCols();
833  KALDI_ASSERT(this->NumCols() == out_value.NumCols() * group_size);
834 #if HAVE_CUDA == 1
835  if (CuDevice::Instantiate().Enabled()) {
836  CuTimer tim;
837  const int kWarpSize = 32;
838  dim3 dimBlock(kWarpSize, CU1DBLOCK / kWarpSize);
839  dim3 dimGrid(n_blocks(NumCols(), dimBlock.x),
840  n_blocks(NumRows(), dimBlock.y));
841  if (dimGrid.x * dimGrid.y > 1024) {
842  dimGrid.y = std::max(1024 / dimGrid.x, unsigned(1));
843  }
844  cuda_diff_group_pnorm(dimGrid, dimBlock, this->data_, in_value.Data(),
845  out_value.Data(), out_deriv.Data(), Dim(),
846  in_value.Stride(), out_value.Stride(),
847  out_deriv.Stride(), group_size, power);
848  CU_SAFE_CALL(cudaGetLastError());
849  CuDevice::Instantiate().AccuProfile(__func__, tim);
850  } else
851 #endif
852  {
853  Mat().GroupPnormDeriv(in_value.Mat(), out_value.Mat(), power);
854  MulRowsGroupMat(out_deriv);
855  }
856 }
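The backward rule can be checked against a small CPU sketch. It assumes a plain row-major std::vector matrix in place of CuMatrixBase, a finite power p >= 1, and absolute values inside the p-norm; the name DiffGroupPnormRef is hypothetical and not part of Kaldi. It applies dy_i/dx_j = sign(x_j) |x_j|^(p-1) / y_i^(p-1) to each input in group i and then scales by out_deriv, the role the description assigns to GroupPnormDeriv followed by MulRowsGroupMat.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical row-major stand-in for a Kaldi matrix.
using Mat = std::vector<std::vector<double>>;

// Backward pass through a group p-norm y_i = (sum_j |x_j|^p)^(1/p), p >= 1.
// dy_i/dx_j = sign(x_j) * |x_j|^(p-1) / y_i^(p-1), taken as 0 when y_i == 0.
Mat DiffGroupPnormRef(const Mat &in_value, const Mat &out_value,
                      const Mat &out_deriv, double p) {
  assert(p >= 1.0);
  assert(!out_value.empty() && !out_value[0].empty());
  size_t num_rows = in_value.size();
  size_t group_size = in_value[0].size() / out_value[0].size();
  assert(in_value[0].size() == out_value[0].size() * group_size);
  Mat in_deriv(num_rows, std::vector<double>(in_value[0].size(), 0.0));
  for (size_t r = 0; r < num_rows; r++) {
    for (size_t c = 0; c < in_value[r].size(); c++) {
      size_t g = c / group_size;  // group index = output column
      double x = in_value[r][c], y = out_value[r][g];
      double d = 0.0;
      if (y != 0.0) {
        double sign = (x > 0) - (x < 0);
        d = sign * std::pow(std::fabs(x), p - 1) / std::pow(y, p - 1);
      }
      in_deriv[r][c] = out_deriv[r][g] * d;  // chain rule
    }
  }
  return in_deriv;
}
```

For p = 2 and a group (3, 4), the output is 5 and the input derivatives are 3/5 and 4/5 times out_deriv.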
void DiffLogSoftmaxPerRow ( const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv 
)

Differentiate backward through the log softmax function.

Here, "out_value" is the log softmax output. For each row i, computes *this(i) = out_deriv(i) - sum(out_deriv(i)) .* exp(out_value(i)), where xxxx(i) denotes row i of the matrix, treated as a row vector.

Definition at line 1748 of file cu-matrix.cc.

References CuVectorBase< Real >::AddColSumMat(), CuMatrixBase< Real >::AddMat(), CuMatrixBase< Real >::ApplyExp(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::MulRowsVec(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Scale(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffLogSoftmax().

1749  {
1750 
1751  KALDI_ASSERT(SameDim(out_value, out_deriv) && SameDim(out_value, *this));
1752 
1753 #if HAVE_CUDA == 1
1754  if (CuDevice::Instantiate().Enabled()) {
1755  CuTimer tim;
1756 
1757  // CUDA thread layout: one thread block per matrix-row.
1758  dim3 dimBlock(CU1DBLOCK);
1759  dim3 dimGrid(num_rows_);
1760  cuda_diff_log_softmax(dimGrid, dimBlock, this->Dim(), out_value.Data(),
1761  out_value.Stride(), out_deriv.Data(),
1762  out_deriv.Stride(), data_);
1763  CU_SAFE_CALL(cudaGetLastError());
1764 
1765  CuDevice::Instantiate().AccuProfile(__func__, tim);
1766  } else
1767 #endif
1768  {
1769  /*
1770  Let the output be y, then
1771  y_i = x_i - log(sum_j exp(x_j))
1772  where x_i is the input to the component. The Jacobian matrix of this
1773  function is
1774  J = I - 1 exp(y)^T
1775  where 1 is a vector of ones. Let the derivative vector at the output be e,
1776  and at the input be d, then we have
1777  d = e - exp(y) Sum(e)
1778  d_i = e_i - exp(y_i) Sum(e)
1779  */
1780  const CuMatrixBase<Real> &Y(out_value), &E(out_deriv);
1781  CuMatrixBase<Real> &D(*this);
1782 
1783  D.CopyFromMat(Y);
1784  D.ApplyExp(); // exp(y)
1785  CuVector<Real> E_sum(D.NumRows()); // Initializes to zero
1786  E_sum.AddColSumMat(1.0, E); // Sum(e)
1787  D.MulRowsVec(E_sum); // exp(y) Sum(e)
1788  D.Scale(-1.0); // - exp(y) Sum(e)
1789  D.AddMat(1.0, E, kNoTrans); // e - exp(y) Sum(e)
1790  }
1791 }
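The CPU branch's formula d_i = e_i - exp(y_i) Sum(e) can be sketched for a single row as follows; std::vector stands in for a matrix row, and DiffLogSoftmaxRow is a hypothetical name. Because exp(y) sums to 1 over a row of log-softmax outputs, the resulting derivative always sums to 0, which the check below relies on.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One row of the log-softmax backward pass:
//   d_i = e_i - exp(y_i) * Sum(e)
// y: log-softmax outputs for the row, e: derivative w.r.t. those outputs.
std::vector<double> DiffLogSoftmaxRow(const std::vector<double> &y,
                                      const std::vector<double> &e) {
  assert(y.size() == e.size());
  double e_sum = 0.0;
  for (double v : e) e_sum += v;  // Sum(e)
  std::vector<double> d(y.size());
  for (size_t i = 0; i < y.size(); i++) d[i] = e[i] - std::exp(y[i]) * e_sum;
  return d;
}
```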
void DiffParametricRelu ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff,
const CuVectorBase< Real > &  alpha,
const CuVectorBase< Real > &  beta 
)

Differentiate backward through the parametric relu function.

Here, "value" is the ReLU input. Element by element, computes *this = diff * (value > 0 ? alpha : beta), where alpha and beta hold one slope per column (the CPU branch below tests value >= 0).

Definition at line 1350 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::BackpropagateFnc().

1354  {
1355 #if HAVE_CUDA == 1
1356  if (CuDevice::Instantiate().Enabled()) {
1357  CuTimer tim;
1358 
1359  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1360  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK), n_blocks(num_rows_, CU2DBLOCK));
1361 
1362  cuda_diff_parametric_relu(dimGrid, dimBlock, data_, diff.data_, value.data_,
1363  Dim(), diff.Stride(), value.Stride(),
1364  alpha.data_, beta.data_);
1365  CU_SAFE_CALL(cudaGetLastError());
1366 
1367  CuDevice::Instantiate().AccuProfile(__func__, tim);
1368  } else
1369 #endif
1370  {
1371  // Do it on CPU,
1372  for (MatrixIndexT r = 0; r < NumRows(); r++) {
1373  for (MatrixIndexT c = 0; c < NumCols(); c++) {
1374  Real value_elem = value.Mat()(r,c);
1375  this->Mat()(r,c) = diff.Mat()(r,c) *
1376  (value_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
1377  }
1378  }
1379  }
1380 }
void DiffSigmoid ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the sigmoid function.

Here, "value" is the sigmoid output. Does, element-by-element, *this = diff * value * (1 - value).

Definition at line 1610 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SigmoidComponent::Backprop(), Sigmoid::BackpropagateFnc(), and kaldi::UnitTestCuDiffSigmoid().

1611  {
1612  KALDI_ASSERT(SameDim(*this, value) && SameDim(*this, diff));
1613 #if HAVE_CUDA == 1
1614  if (CuDevice::Instantiate().Enabled()) {
1615  CuTimer tim;
1616  dim3 dimGrid, dimBlock;
1617  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1618  &dimGrid, &dimBlock);
1619  cuda_diff_sigmoid(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1620  CU_SAFE_CALL(cudaGetLastError());
1621 
1622  CuDevice::Instantiate().AccuProfile(__func__, tim);
1623  } else
1624 #endif
1625  {
1626  Mat().DiffSigmoid(value.Mat(), diff.Mat());
1627  }
1628 }
void DiffSoftmaxPerRow ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the softmax function.

Here, "value" is the softmax output. For each row i, computes *this(i) = diff(i) * diag(value(i)) - diff(i) * (value(i)^T * value(i)), where xxxx(i) denotes row i of the matrix, treated as a row vector, and '*' and '-' are matrix operations.

Definition at line 1714 of file cu-matrix.cc.

References CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffSoftmax().

1715  {
1716 
1717  KALDI_ASSERT(SameDim(value, diff) && SameDim(value, *this));
1718 
1719 #if HAVE_CUDA == 1
1720  if (CuDevice::Instantiate().Enabled()) {
1721  CuTimer tim;
1722 
1723  // CUDA thread layout: one thread block per matrix-row.
1724  dim3 dimBlock(CU1DBLOCK);
1725  dim3 dimGrid(num_rows_);
1726  cuda_diff_softmax(dimGrid, dimBlock, data_, this->Dim(), value.Data(),
1727  value.Stride(), diff.Data(), diff.Stride());
1728  CU_SAFE_CALL(cudaGetLastError());
1729 
1730  CuDevice::Instantiate().AccuProfile(__func__, tim);
1731  } else
1732 #endif
1733  {
1734  const CuMatrixBase<Real> &P(value), &E(diff);
1735  CuMatrixBase<Real> &D(*this);
1736 
1737  D.CopyFromMat(P);
1738  D.MulElements(E);
1739  // At this point, D = P .* E (in matlab notation)
1740  CuVector<Real> pe_vec(D.NumRows()); // For each row i, the dot product (p_i . e_i).
1741  pe_vec.AddDiagMatMat(1.0, P, kNoTrans, E, kTrans, 0.0);
1742 
1743  D.AddDiagVecMat(-1.0, pe_vec, P, kNoTrans, 1.0); // does D -= diag(pe_vec) * P.
1744  }
1745 }
void DiffTanh ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the tanh function.

Here, "value" is the tanh output. Does, element-by-element, *this = diff * (1 - value^2).

Definition at line 1655 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Mat(), and CuMatrixBase< Real >::Stride().

Referenced by TanhComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), Tanh::BackpropagateFnc(), and kaldi::UnitTestCuDiffTanh().

1656  {
1657 #if HAVE_CUDA == 1
1658  if (CuDevice::Instantiate().Enabled()) {
1659  CuTimer tim;
1660  dim3 dimGrid, dimBlock;
1661  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1662  &dimGrid, &dimBlock);
1663  cuda_diff_tanh(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1664  CU_SAFE_CALL(cudaGetLastError());
1665 
1666  CuDevice::Instantiate().AccuProfile(__func__, tim);
1667  } else
1668 #endif
1669  {
1670  Mat().DiffTanh(value.Mat(), diff.Mat());
1671  }
1672 }
void DiffXent ( const CuArray< int32 > &  tgt,
CuVector< Real > *  log_post_tgt 
)

Differentiate through the combined [softmax + cross-entropy] block: dE/da = posterior_mat - target_mat, where E is the error function and a is the activation at the softmax input.

Interface:
tgt ... index vector encoding the matrix of targets (the target column for each row)
net_out_or_diff ... this matrix: the network output before the call, the derivative dE/da after it
log_post_tgt ... per-frame statistics for the cross-entropy computation: log(sum_row(posterior_mat .* target_mat))

Definition at line 1794 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuVectorBase< Real >::data_, CuArray< T >::Dim(), KALDI_ASSERT, kaldi::Log(), CuVector< Real >::Resize(), and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuDiffXent().

1795  {
1796 
1797  KALDI_ASSERT(tgt.Dim() == num_rows_);
1798  log_post_tgt->Resize(tgt.Dim());
1799 
1800 #if HAVE_CUDA == 1
1801  if (CuDevice::Instantiate().Enabled()) {
1802  CuTimer tim;
1803  dim3 dimBlock(1, CU2DBLOCK*8);
1804  dim3 dimGrid(1, n_blocks(tgt.Dim(), CU2DBLOCK*8));
1805  cuda_diff_xent(dimGrid, dimBlock, tgt.Data(), data_,
1806  log_post_tgt->data_, Dim());
1807 
1808  CuDevice::Instantiate().AccuProfile(__func__, tim);
1809  } else
1810 #endif
1811  {
1812  MatrixIndexT num_rows = num_rows_;
1813  for(int32 r = 0; r < num_rows; r++) {
1814  int32 col_tgt = tgt.Data()[r];
1815  Real &value = Mat()(r, col_tgt);
1816  log_post_tgt->Vec()(r) = Log(value);
1817  value -= 1.0;
1818  }
1819  }
1820 }
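The CPU branch edits the posterior matrix in place: for each row it records the log posterior of the target class and subtracts 1 from that entry, leaving posterior minus one-hot target. A sketch of the same loop, assuming int targets and a std::vector matrix; DiffXentRef is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical row-major stand-in for a Kaldi matrix.
using Mat = std::vector<std::vector<double>>;

// In place: net_out_or_diff becomes posterior - one_hot(tgt);
// log_post_tgt[r] receives log(posterior of the target class in row r).
void DiffXentRef(const std::vector<int> &tgt, Mat *net_out_or_diff,
                 std::vector<double> *log_post_tgt) {
  assert(tgt.size() == net_out_or_diff->size());
  log_post_tgt->assign(tgt.size(), 0.0);
  for (size_t r = 0; r < tgt.size(); r++) {
    assert(tgt[r] >= 0 &&
           static_cast<size_t>(tgt[r]) < (*net_out_or_diff)[r].size());
    double &value = (*net_out_or_diff)[r][static_cast<size_t>(tgt[r])];
    (*log_post_tgt)[r] = std::log(value);
    value -= 1.0;
  }
}
```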
void DivElements ( const CuMatrixBase< Real > &  A)

Divide this matrix elementwise by A: *this = *this ./ A.

Definition at line 676 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTestCuMatrixDivElements(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

676  {
677  #if HAVE_CUDA == 1
678  if (CuDevice::Instantiate().Enabled()) {
679  CuTimer tim;
680 
681  KALDI_ASSERT(num_cols_ == A.NumCols());
682  KALDI_ASSERT(num_rows_ == A.NumRows());
683 
684  dim3 dimGrid, dimBlock;
685  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
686  &dimGrid, &dimBlock);
687 
688  cuda_div_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
689  CU_SAFE_CALL(cudaGetLastError());
690 
691  CuDevice::Instantiate().AccuProfile(__func__, tim);
692  } else
693  #endif
694  {
695  Mat().DivElements(A.Mat());
696  }
697 }
void DivRowsVec ( const CuVectorBase< Real > &  div)

Divide the i'th row by div(i).

Definition at line 884 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), VectorBase< Real >::InvertElements(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by StatisticsPoolingComponent::Backprop(), StatisticsPoolingComponent::Propagate(), kaldi::TestCuMatrixDivRowsVec(), and kaldi::UnitTestCuMatrixDivRowsVec().

884  {
885 #if HAVE_CUDA == 1
886  if (CuDevice::Instantiate().Enabled()) {
887  CuTimer tim;
888 
889  KALDI_ASSERT(div.Dim() == NumRows());
890 
891  dim3 dimGrid, dimBlock;
892  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
893  &dimGrid, &dimBlock);
894  // For a large matrix we do more work per thread by limiting
895  // the grid size to reduce the block launching overhead.
896  if (dimGrid.x * dimGrid.y > 1024) {
897  dimGrid.x = 1024 / dimGrid.y;
898  if (dimGrid.x == 0) {
899  dimGrid.x = 1;
900  }
901  }
902  cuda_div_rows_vec(dimGrid, dimBlock, data_, div.data_, Dim());
903  CU_SAFE_CALL(cudaGetLastError());
904 
905  CuDevice::Instantiate().AccuProfile(__func__, tim);
906  } else
907 #endif
908  {
909  Vector<Real> temp(div.Vec()); // will copy.
910  temp.InvertElements();
911  Mat().MulRowsVec(temp);
912  }
913 }
void EqualElementMask ( const CuMatrixBase< Real > &  mat,
CuMatrix< Real > *  mask 
) const

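Resizes *mask to the dimensions of this matrix and sets each element to 1.0 where this matrix and "mat" have equal elements, and to 0.0 elsewhere.

Definition at line 3112 of file cu-matrix.cc.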

References CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kSetZero, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrix< Real >::Resize(), and CuMatrixBase< Real >::Stride().

Referenced by MaxpoolingComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), and MaxPooling2DComponent::BackpropagateFnc().

3112  {
3113  // Check the inputs:
3114  KALDI_ASSERT(mat.NumRows() == NumRows() && mat.NumCols() == NumCols());
3115  KALDI_ASSERT(mask != NULL);
3116  // Resizes the output matrix:
3117  mask->Resize(NumRows(), NumCols(), kSetZero);
3118 
3119 #if HAVE_CUDA == 1
3120  if (CuDevice::Instantiate().Enabled()) {
3121  CuTimer tim;
3122  dim3 dimGrid, dimBlock;
3123  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
3124  &dimGrid, &dimBlock);
3125  cuda_equal_element_mask(dimGrid, dimBlock, this->data_, mat.Data(),
3126  mask->Data(), this->Dim(), mat.Stride(),
3127  mask->Stride());
3128  CU_SAFE_CALL(cudaGetLastError());
3129 
3130  CuDevice::Instantiate().AccuProfile(__func__, tim);
3131  } else
3132 #endif
3133  {
3134  for (int32 r = 0; r < NumRows(); r++) {
3135  for (int32 c = 0; c < NumCols(); c++) {
3136  (*mask)(r,c) = ((*this)(r,c) == mat(r,c) ? 1.0 : 0.0);
3137  }
3138  }
3139  }
3140 }
void FindRowMaxId ( CuArray< int32 > *  id) const

Find the id of the maximal element for each row.

Definition at line 1675 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, CuArray< T >::Data(), and data_.

Referenced by kaldi::nnet3::ComputeAccuracy(), NnetUpdater::ComputeTotAccuracy(), Xent::Eval(), kaldi::TestCuFindRowMaxId(), and kaldi::UnitTestCuFindRowMaxId().

1675  {
1676 #if HAVE_CUDA == 1
1677  if (CuDevice::Instantiate().Enabled()) {
1678  CuTimer tim;
1679  id->Resize(num_rows_);
1680  MatrixDim d = Dim();
1681 
1682  // CUDA thread layout: one thread block per matrix-row.
1683  dim3 dimBlock(CU1DBLOCK);
1684  dim3 dimGrid(num_rows_);
1685  cuda_find_row_max_id(dimGrid, dimBlock, data_, NULL, id->Data(), d);
1686  CU_SAFE_CALL(cudaGetLastError());
1687 
1688  // now we have the indices!
1689  CuDevice::Instantiate().AccuProfile(__func__, tim);
1690  } else
1691 #endif
1692  {
1693  // allocate index buffer
1694  id->Resize(num_rows_);
1695  id->Set(-1);
1696  // find maxima
1697  MatrixIndexT num_rows = num_rows_, num_cols = num_cols_;
1698  for (MatrixIndexT r = 0; r < num_rows; r++) {
1699  Real max = -1e21;
1700  int32 max_id = -1;
1701  const Real *row_data = Mat().RowData(r);
1702  for (MatrixIndexT c = 0; c < num_cols; c++) {
1703  if (max < row_data[c]) {
1704  max = row_data[c];
1705  max_id = c;
1706  }
1707  }
1708  id->Data()[r] = max_id;
1709  }
1710  }
1711 }
Real FrobeniusNorm ( ) const
inline

Return the Frobenius norm of the matrix, computed as sqrt(trace(M M^T)).

Definition at line 206 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::ApproxEqual(), and kaldi::UnitTestCuSparseMatrixFrobeniusNorm().

206 { return sqrt(TraceMatMat(*this, *this, kTrans)); }
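FrobeniusNorm relies on trace(M M^T) being the sum of the squared elements of M. A sketch of that identity, with hypothetical helper names over a std::vector matrix:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical row-major stand-in for a Kaldi matrix.
using Mat = std::vector<std::vector<double>>;

// trace(A * B^T) = sum_{i,j} A(i,j) * B(i,j).
double TraceABt(const Mat &a, const Mat &b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (size_t i = 0; i < a.size(); i++) {
    assert(a[i].size() == b[i].size());
    for (size_t j = 0; j < a[i].size(); j++) sum += a[i][j] * b[i][j];
  }
  return sum;
}

// Frobenius norm = sqrt(trace(A A^T)) = sqrt(sum of squared elements).
double FrobeniusNormRef(const Mat &a) { return std::sqrt(TraceABt(a, a)); }
```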
void GroupMax ( const CuMatrixBase< Real > &  src)

Apply the function $y_i = \max_{iG \le j \le (i+1)G-1} x_j$, where $G$ = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1466 of file cu-matrix.cc.

References CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Propagate(), kaldi::TestCuMatrixGroupMax(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), and kaldi::UnitTestCuMatrixGroupMax().

1466  {
1467  int group_size = src.NumCols() / this->NumCols();
1468  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1469  this->NumRows() == src.NumRows());
1470 #if HAVE_CUDA == 1
1471  if (CuDevice::Instantiate().Enabled()) {
1472  CuTimer tim;
1473  // One thread block per row.
1474  // Use 2D block for small group size to simplify the calculation.
1475  // Each group is reduced by threads_per_group threads.
1476  // threads_per_group should be a power of 2 for fast tree reduction.
1477  // group size: 1 2 3 4 5 6 7 .. 12 13 .. 24 25 .. 48 ...
1478  // threads_per_group: 1 1 1 2 2 2 4 .. 4 8 .. 8 16 .. 16 ...
1479  int threads_per_group = CU1DBLOCK;
1480  while (threads_per_group * 3 / 2 >= group_size) {
1481  threads_per_group >>= 1;
1482  }
1483  if (group_size == 1) {
1484  threads_per_group = 1;
1485  }
1486  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1487  dim3 dimGrid(NumRows());
1488  cuda_group_max(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1489  src.Stride(), group_size);
1490  CU_SAFE_CALL(cudaGetLastError());
1491  CuDevice::Instantiate().AccuProfile(__func__, tim);
1492  } else
1493 #endif
1494  {
1495  Mat().GroupMax(src.Mat());
1496  }
1497 }
void GroupMaxDeriv ( const CuMatrixBase< Real > &  input,
const CuMatrixBase< Real > &  output 
)

Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function (i.e. the "src" variable) and "output" is the result of the computation (i.e. the "this" of that function call); *this must have the same dimension as "input". Each element of *this is set to 1 if the corresponding input equals the output of its group, and to 0 otherwise. This equals the derivative of the function where it is defined (it is not defined where multiple inputs in a group are equal to the output). In the definition below, "input" is named src1 and "output" is named src2.

Definition at line 859 of file cu-matrix.cc.

References CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), kaldi::TestCuMatrixGroupMaxDeriv(), and kaldi::UnitTestCuMatrixGroupMaxDeriv().

860  {
861  KALDI_ASSERT(src2.NumCols() > 0);
862  int group_size = this->NumCols() / src2.NumCols();
863  KALDI_ASSERT(this->NumCols() == src2.NumCols() * group_size);
864 #if HAVE_CUDA == 1
865  if (CuDevice::Instantiate().Enabled()) {
866  CuTimer tim;
867  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
868  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
869  n_blocks(NumRows(), CU2DBLOCK));
870  cuda_calc_group_max_deriv(dimGrid, dimBlock, this->data_, src1.Data(),
871  src2.Data(), Dim(), src1.Stride(), src2.Stride(),
872  group_size);
873  CU_SAFE_CALL(cudaGetLastError());
874 
875  CuDevice::Instantiate().AccuProfile(__func__, tim);
876  } else
877 #endif
878  {
879  Mat().GroupMaxDeriv(src1.Mat(), src2.Mat());
880  }
881 }
void GroupPnorm ( const CuMatrixBase< Real > &  src,
Real  pow 
)

Apply the function $y_i = \left( \sum_{j=iG}^{(i+1)G-1} |x_j|^p \right)^{1/p}$, where $p$ is the power argument (named pow in the declaration and power in the definition below) and $G$ = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1425 of file cu-matrix.cc.

References CU1DBLOCK, CU2DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Propagate(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixGroupPnorm(), and kaldi::UnitTestCuMatrixGroupPnorm().

1425  {
1426  int group_size = src.NumCols() / this->NumCols();
1427  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1428  this->NumRows() == src.NumRows());
1429 #if HAVE_CUDA == 1
1430  if (CuDevice::Instantiate().Enabled()) {
1431  CuTimer tim;
1432  if (power == Real(0) || power == Real(1) || power == Real(2)
1433  || power == std::numeric_limits<Real>::infinity()) {
1434  // One thread block per row.
1435  // Use 2D block for small group size to simplify the calculation
1436  // Each group is reduced by threads_per_group threads.
1437  // threads_per_group should be a power of 2 for fast tree reduction.
1438  int threads_per_group = CU1DBLOCK;
1439  while (threads_per_group * 3 / 2 >= group_size) {
1440  threads_per_group >>= 1;
1441  }
1442  if (group_size == 1) {
1443  threads_per_group = 1;
1444  }
1445  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1446  dim3 dimGrid(NumRows());
1447  cuda_group_spec_pnorm(dimGrid, dimBlock, this->data_, src.data_,
1448  this->Dim(), src.Stride(), group_size, power);
1449  } else {
1450  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1451  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
1452  n_blocks(NumRows(), CU2DBLOCK));
1453  cuda_group_pnorm(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1454  src.Stride(), group_size, power);
1455  }
1456  CU_SAFE_CALL(cudaGetLastError());
1457  CuDevice::Instantiate().AccuProfile(__func__, tim);
1458  } else
1459 #endif
1460  {
1461  Mat().GroupPnorm(src.Mat(), power);
1462  }
1463 }
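A CPU sketch for a finite power p > 0, assuming the absolute values of a standard p-norm; GroupPnormRef is a hypothetical name. The CUDA branch above switches to a specialized kernel for power 0, 1, 2 and infinity, which this sketch does not cover.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical row-major stand-in for a Kaldi matrix.
using Mat = std::vector<std::vector<double>>;

// y(r, i) = (sum over group i of |x(r, j)|^p)^(1/p), G = cols(x) / out_cols.
Mat GroupPnormRef(const Mat &x, size_t out_cols, double p) {
  assert(out_cols > 0 && p > 0.0);
  Mat y(x.size(), std::vector<double>(out_cols, 0.0));
  for (size_t r = 0; r < x.size(); r++) {
    size_t group_size = x[r].size() / out_cols;
    assert(group_size > 0 && x[r].size() == out_cols * group_size);
    for (size_t i = 0; i < out_cols; i++) {
      double sum = 0.0;
      for (size_t j = i * group_size; j < (i + 1) * group_size; j++)
        sum += std::pow(std::fabs(x[r][j]), p);
      y[r][i] = std::pow(sum, 1.0 / p);
    }
  }
  return y;
}
```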
void Heaviside ( const CuMatrixBase< Real > &  src)

Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x == 0.]

Definition at line 2316 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by RectifiedLinearComponent::Backprop(), CuRand< Real >::BinarizeProbs(), Dropout::PropagateFnc(), RectifiedLinearComponent::StoreStats(), and kaldi::UnitTestCuMatrixHeaviside().

2316  {
2317  KALDI_ASSERT(SameDim(*this, src));
2318 #if HAVE_CUDA == 1
2319  if (CuDevice::Instantiate().Enabled()) {
2320  CuTimer tim;
2321  dim3 dimGrid, dimBlock;
2322  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2323  &dimGrid, &dimBlock);
2324  cuda_heaviside(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
2325  src.Stride());
2326  CU_SAFE_CALL(cudaGetLastError());
2327 
2328  CuDevice::Instantiate().AccuProfile(__func__, tim);
2329  } else
2330  #endif
2331  {
2332  Mat().Heaviside(src.Mat());
2333  }
2334 }
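A sketch with a hypothetical HeavisideRef; under this definition an input of exactly 0 maps to 0.0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical row-major stand-in for a Kaldi matrix.
using Mat = std::vector<std::vector<double>>;

// out(r,c) = src(r,c) > 0 ? 1.0 : 0.0
Mat HeavisideRef(const Mat &src) {
  Mat out = src;
  for (auto &row : out)
    for (double &x : row) x = (x > 0.0) ? 1.0 : 0.0;
  return out;
}
```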
void InvertElements ( )

Invert the matrix element by element, i.e. replace each element x by 1 / x.

Definition at line 917 of file cu-matrix.cc.

References data_.

Referenced by kaldi::TestCuMatrixCompObjfAndDeriv(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixInvertElements(), and kaldi::UnitTestCuMatrixObjfDeriv().

917  {
918 #if HAVE_CUDA == 1
919  if (CuDevice::Instantiate().Enabled()) {
920  CuTimer tim;
921 
922  dim3 dimGrid, dimBlock;
923  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
924  &dimGrid, &dimBlock);
925 
926  cuda_invert_elements(dimGrid, dimBlock, data_, Dim());
927  CU_SAFE_CALL(cudaGetLastError());
928 
929  CuDevice::Instantiate().AccuProfile(__func__, tim);
930  } else
931 #endif
932  {
933  Mat().InvertElements();
934  }
935 }
bool IsUnit ( Real  tol = 0.001) const

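Returns true if this square matrix is approximately the identity, i.e. if trace((*this - I)(*this - I)^T), the squared Frobenius norm of *this - I, is at most tol * NumRows().

Definition at line 593 of file cu-matrix.cc.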

References KALDI_ASSERT, kaldi::kTrans, and kaldi::TraceMatMat().

Referenced by OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::UnitTestCuMatrixSymInvertPosDef(), and kaldi::UnitTestCuSpMatrixInvert().

593  {
594  // want to return:
595  // FrobeniusNorm(*this - I)^2 <= tol * NumRows(), i.e.:
596  // trace((*this - I)(*this - I)^T) <= tol * NumRows()
597  // trace(*this * *this^T) + trace(I) - 2 * trace(*this) <= tol * NumRows()
598  // trace(*this * *this^T) + dim - 2 * this->Trace() <= tol * NumRows()
599  // (TraceMatMat(*this, *this, kTrans) computes trace(*this * *this^T).)
600  KALDI_ASSERT(this->NumRows() == this->NumCols());
601  return (TraceMatMat(*this, *this, kTrans) + this->NumRows() - 2.0 * this->Trace() <=
602  tol * this->NumRows());
603 }
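The comments in the IsUnit() listing phrase the test in terms of the Frobenius norm, but the returned expression drops the square root. TraceMatMat(M, M, kTrans) is tr(M M^T), the sum of squared elements, so the left-hand side tr(M M^T) + n - 2 tr(M) equals ||M - I||_F^2, the squared Frobenius distance to the identity, which is compared against tol * n. The CPU sketch below, with hypothetical names IsUnitSketch and SquaredDistanceToIdentity, assumes a dense n x n row-major matrix and checks that identity against a direct computation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch of the IsUnit() test for an n x n row-major matrix m.
// sum_sq is tr(M M^T) (what TraceMatMat(M, M, kTrans) computes), so
// sum_sq + n - 2 * trace == ||M - I||_F^2, the squared distance to I.
bool IsUnitSketch(const std::vector<double> &m, int n, double tol = 0.001) {
  assert(static_cast<int>(m.size()) == n * n);
  double sum_sq = 0.0, trace = 0.0;
  for (int i = 0; i < n; ++i) {
    trace += m[i * n + i];
    for (int j = 0; j < n; ++j) sum_sq += m[i * n + j] * m[i * n + j];
  }
  return sum_sq + n - 2.0 * trace <= tol * n;
}

// Direct computation of ||M - I||_F^2, used to cross-check the identity.
double SquaredDistanceToIdentity(const std::vector<double> &m, int n) {
  double acc = 0.0;
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) {
      double d = m[i * n + j] - (i == j ? 1.0 : 0.0);
      acc += d * d;
    }
  return acc;
}
```

Working with the squared norm avoids a square root; the practical effect is that `tol` bounds ||M - I||_F^2 / n rather than ||M - I||_F / n.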
KALDI_DISALLOW_COPY_AND_ASSIGN ( CuMatrixBase< Real >  )
private
void Lookup ( const std::vector< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3053 of file cu-matrix.cc.

References rnnlm::i, and KALDI_ASSERT.

Referenced by NnetDiscriminativeUpdater::LatticeComputations(), DiscriminativeComputation::LookupNnetOutput(), kaldi::TestCuMatrixLookup(), and kaldi::UnitTestCuMatrixLookup().

{
  // Checks the dimension.
  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
  for (int32 i = 0; i < indices.size(); ++i) {
    KALDI_ASSERT(indices[i].first < num_rows && indices[i].first >= 0 &&
                 indices[i].second < num_cols && indices[i].second >= 0);
  }
  if (indices.size() == 0) return;
  KALDI_ASSERT(output != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuArray<Int32Pair> cuda_indices(indices);
    Lookup(cuda_indices, output);
  } else
#endif
  {
    for (int32 i = 0; i < indices.size(); i++) {
      output[i] = (*this)(indices[i].first, indices[i].second);
    }
  }
}
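The std::vector overload of Lookup() validates every (row, column) pair first and then gathers one element per pair into `output`. The CPU sketch below reproduces that order of operations; it uses std::pair as a stand-in for Kaldi's Int32Pair (whose members are also named `first` and `second`), and LookupSketch and its stride-based storage are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for Kaldi's Int32Pair: (row, column).
using IndexPair = std::pair<int32_t, int32_t>;

// Hypothetical CPU sketch of Lookup(): check every index first, then set
// output[i] = M(indices[i].first, indices[i].second). M is row-major with a
// row stride >= cols.
void LookupSketch(const std::vector<float> &data, int rows, int cols,
                  int stride, const std::vector<IndexPair> &indices,
                  float *output) {
  for (const IndexPair &p : indices) {
    assert(p.first >= 0 && p.first < rows && p.second >= 0 && p.second < cols);
  }
  if (indices.empty()) return;  // output may be null when there is nothing to do
  assert(output != nullptr);
  for (std::size_t i = 0; i < indices.size(); ++i) {
    output[i] = data[indices[i].first * stride + indices[i].second];
  }
}
```

Validating up front means an out-of-range pair is reported before any work is dispatched, which in the real code includes copying the indices to the GPU.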
void Lookup ( const CuArray< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3078 of file cu-matrix.cc.

References CuArray< T >::CopyToHost(), CU1DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), Int32Pair::first, rnnlm::i, KALDI_ASSERT, and Int32Pair::second.

{
  int32 num_elements = indices.Dim();
  if (num_elements == 0) return;
  KALDI_ASSERT(output != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuArray<Real> cuda_output(num_elements);
    CuTimer tim;
    dim3 dimBlock(CU1DBLOCK, 1);
    dim3 dimGrid(n_blocks(num_elements, CU1DBLOCK), 1);

    cuda_matrix_lookup(dimGrid, dimBlock, this->data_, this->Dim(),
                       indices.Data(), num_elements, cuda_output.Data());
    CU_SAFE_CALL(cudaGetLastError());

    cuda_output.CopyToHost(output);
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indices.Data();
    for (int32 i = 0; i < num_elements; i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      output[i] = (*this)(index[i].first, index[i].second);
    }
  }
}
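In the GPU branch of the CuArray overload, the launch uses one-dimensional blocks of CU1DBLOCK threads and a grid of n_blocks(num_elements, CU1DBLOCK) blocks. The sketch below assumes n_blocks is a ceiling division (enough blocks to cover every element) and assumes a block size of 256; both the helper names and the constant are illustrative, and the real value comes from the CU1DBLOCK macro.

```cpp
#include <cassert>
#include <cstdint>

// Assumed block size for illustration; Kaldi takes it from CU1DBLOCK.
constexpr int32_t kBlockSize = 256;

// Hypothetical ceiling division: the smallest block count such that
// num_blocks * block_size >= size.
int32_t NumBlocksSketch(int32_t size, int32_t block_size) {
  assert(size >= 0 && block_size > 0);
  return size / block_size + (size % block_size == 0 ? 0 : 1);
}

// With one thread per element, thread t of block b would handle element
// b * block_size + t; because the last block may be only partly used,
// such a kernel has to skip indices >= num_elements.
bool ThreadHasWork(int32_t block, int32_t thread, int32_t num_elements) {
  return block * kBlockSize + thread < num_elements;
}
```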
const MatrixBase<Real>& Mat ( ) const
inline

Definition at line 630 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), CuVectorBase< Real >::AddDiagMat2(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuMatrixBase< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRows(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApplyLogSoftMaxPerRow(), CuMatrixBase< Real >::ApplySoftMaxPerRow(), kaldi::cu::BackpropLstmNonlinearity(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), GeneralMatrix::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffParametricRelu(), CuMatrixBase< Real >::DiffSigmoid(), CuMatrixBase< Real >::DiffTanh(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), CuMatrixBase< Real >::Heaviside(), CuMatrixBase< Real >::Max(), CuMatrixBase< Real >::Min(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), CuMatrixBase< Real >::ParametricRelu(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), CuMatrixBase< Real >::SetMatMatDivMat(), CuMatrixBase< Real >::Sigmoid(), CuMatrixBase< Real >::SoftHinge(), kaldi::cu::Splice(), CuMatrixBase< Real >::SymAddMat2(), CuMatrixBase< Real >::Tanh(), kaldi::TraceMatMat(), and kaldi::TraceMatSmat().

{
  return *(reinterpret_cast<const MatrixBase<Real>* >(this));
}
MatrixBase<Real>& Mat ( )
inline

Definition at line 633 of file cu-matrix.h.

{
  return *(reinterpret_cast<MatrixBase<Real>* >(this));
}
void Max ( const CuMatrixBase< Real > &  A)

Set *this = max(*this, A), elementwise.

Definition at line 700 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by main(), SpliceMaxComponent::Propagate(), MaxpoolingComponent::Propagate(), kaldi::TestCuMatrixMax(), kaldi::UnitTestCuMatrixMax(), and kaldi::UnitTestCuMatrixReduceMax().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_max(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
  #endif
  {
    Mat().Max(A.Mat());
  }
}
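Elementwise Max(A) keeps, at every position, the larger of the current value and the corresponding value of A; the GPU branch first asserts that the shapes match. A minimal CPU sketch, assuming both matrices are stored densely (stride equal to the number of columns) in std::vector; MaxElementsSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: self(i, j) = max(self(i, j), a(i, j)) for two dense
// matrices of the same shape. The size check stands in for the
// NumRows()/NumCols() asserts in the listing.
void MaxElementsSketch(std::vector<float> &self, const std::vector<float> &a) {
  assert(self.size() == a.size());
  for (std::size_t i = 0; i < self.size(); ++i)
    self[i] = std::max(self[i], a[i]);
}
```

A typical use, suggested by the referencing SpliceMaxComponent and MaxpoolingComponent, is accumulating a running maximum over several inputs of equal shape.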
Real Max ( ) const

Definition at line 2743 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Max().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
    CuTimer tim;

    CuVector<Real> col_max(num_rows_, kUndefined);
    cuda_max_mat_cols(num_rows_, CU1DBLOCK, col_max.Data(), data_, Dim());
    Real ans = col_max.Max();

    CuDevice::Instantiate().AccuProfile(__func__, tim);
    return ans;
  } else
#endif
  {
    return Mat().Max();
  }
}
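The GPU branch of Max() computes the maximum in two stages: it fills a CuVector with num_rows_ entries, presumably one maximum per row taken over that row's columns, and then reduces that vector with CuVectorBase::Max(). The CPU sketch below mirrors that structure under the same non-empty-matrix precondition; MatrixMaxSketch and the stride-based storage are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical two-stage reduction: per-row maxima first (one partial result
// per row), then the maximum of those partial results. Padding slots beyond
// column cols-1 in each stride-length row are ignored.
float MatrixMaxSketch(const std::vector<float> &data, int rows, int cols,
                      int stride) {
  assert(rows > 0 && cols > 0);  // the GPU branch asserts a non-empty matrix
  std::vector<float> row_max(rows);
  for (int r = 0; r < rows; ++r) {
    const float *row = data.data() + r * stride;
    row_max[r] = *std::max_element(row, row + cols);
  }
  return *std::max_element(row_max.begin(), row_max.end());
}
```

Splitting the work this way gives each row an independent partial reduction, which maps naturally onto one GPU block per row; the final pass over num_rows_ values is comparatively small.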
void Min ( const CuMatrixBase< Real > &  A)

Set *this = min(*this, A), elementwise.

Definition at line 725 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by main(), kaldi::TestCuMatrixMin(), kaldi::UnitTestCuMatrixMin(), and kaldi::UnitTestCuMatrixReduceMin().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_min(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
  #endif
  {
    Mat().Min(A.Mat());
  }
}
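The cuda_min call receives this->Dim() and, separately, A.Stride(): the two operands must share a shape but not necessarily a row stride. The CPU sketch below makes that explicit by indexing each matrix with its own stride; MinElementsSketch and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch of elementwise self = min(self, a) for two rows x cols
// row-major matrices whose row strides may differ. Padding slots of either
// matrix are never read or written.
void MinElementsSketch(std::vector<float> &self, int self_stride,
                       const std::vector<float> &a, int a_stride,
                       int rows, int cols) {
  assert(self_stride >= cols && a_stride >= cols);
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c) {
      float &dst = self[r * self_stride + c];
      dst = std::min(dst, a[r * a_stride + c]);
    }
}
```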
Real Min ( ) const

Definition at line 2764 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Min().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
    CuTimer tim;

    CuVector<Real> col_min(num_rows_, kUndefined);
    cuda_min_mat_cols(num_rows_, CU1DBLOCK, col_min.Data(), data_, Dim());
    Real ans = col_min.Min();

    CuDevice::Instantiate().AccuProfile(__func__, tim);
    return ans;
  } else
#endif
  {
    return Mat().Min();
  }
}
void MulColsVec ( const CuVectorBase< Real > &  scale)

Scale the i'th column by scale[i].

Definition at line 750 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by PerElementScaleComponent::Backprop(), FixedScaleComponent::Backprop(), BatchNormComponent::Backprop(), Convolutional2DComponent::BackpropagateFnc(), Rescale::BackpropagateFnc(), PerElementScaleComponent::Propagate(), FixedScaleComponent::Propagate(), BatchNormComponent::Propagate(), Rescale::PropagateFnc(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixMulColsVec(), and NnetLogprobTask::~NnetLogprobTask().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(scale.Dim() == NumCols());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_cols_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().MulColsVec(scale.Vec());
  }
}
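MulColsVec multiplies column j by scale[j], which is the same as right-multiplying by a diagonal matrix, M <- M * diag(scale). A minimal CPU sketch, assuming a dense row-major matrix; MulColsVecSketch is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of MulColsVec: m(r, c) *= scale[c] for a dense
// rows x cols row-major matrix, i.e. M <- M * diag(scale).
void MulColsVecSketch(std::vector<float> &m, int rows, int cols,
                      const std::vector<float> &scale) {
  assert(static_cast<int>(scale.size()) == cols);  // scale.Dim() == NumCols()
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c) m[r * cols + c] *= scale[c];
}
```

This is the per-dimension scaling used by components such as PerElementScaleComponent and FixedScaleComponent in the "Referenced by" list: each output dimension (column) gets its own factor, shared across all frames (rows).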
void MulElements ( const CuMatrixBase< Real > &  A)

Multiply two matrices elementwise: C = C .* A.

Definition at line 652 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by ElementwiseProductComponent::Backprop(), BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), MaxpoolingComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), Dropout::BackpropagateFnc(), kaldi::nnet1::ComputeStdDev(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), ElementwiseProductComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixMulElements(), kaldi::nnet1::UnitTestLengthNorm(), ParametricRelu::Update(), AffineTransform::Update(), FramePoolingComponent::Update(), ConvolutionalComponent::Update(), Rescale::Update(), and NaturalGradientPerElementScaleComponent::Update().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
  #endif
  {
    Mat().MulElements(A.Mat());
  }
}
void MulRowsGroupMat ( const CuMatrixBase< Real > &  src)

Divide each row into src.NumCols() groups, then scale the j'th group of elements in row i by src(i, j).

Definition at line 801 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), and kaldi::UnitTestCuMatrixMulRowsGroupMat().

{
  KALDI_ASSERT(src.NumCols() > 0);
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    int group_size = this->NumCols() / src.NumCols();

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_group_mat(dimGrid, dimBlock, this->data_, src.data_,
                            this->Dim(), src.Stride(), group_size);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().MulRowsGroupMat(src.Mat());
  }
}
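In MulRowsGroupMat, the listing computes group_size = NumCols() / src.NumCols(), so each row of *this is split into src.NumCols() runs of group_size consecutive elements. The sketch below assumes those groups are contiguous and that NumCols() is an exact multiple of src.NumCols() (the listing itself only asserts src.NumCols() > 0); MulRowsGroupMatSketch and the dense storage are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: m is rows x cols, src is rows x num_groups (both dense,
// row-major). Row i is split into num_groups contiguous groups of
// group_size = cols / num_groups elements; group j is scaled by src(i, j).
void MulRowsGroupMatSketch(std::vector<float> &m, int rows, int cols,
                           const std::vector<float> &src, int num_groups) {
  assert(num_groups > 0 && cols % num_groups == 0);
  const int group_size = cols / num_groups;
  for (int i = 0; i < rows; ++i)
    for (int c = 0; c < cols; ++c)
      m[i * cols + c] *= src[i * num_groups + c / group_size];
}
```

The column-to-group mapping is just integer division, c / group_size, which is why the kernel needs only group_size and not the number of groups.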
void MulRowsVec ( const CuVectorBase< Real > &  scale)

Scale the i'th row by scale[i].

Definition at line 777 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by BackpropTruncationComponent::Backprop(), ClipGradientComponent::Backprop(), BlockSoftmax::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), Xent::Eval(), Mse::Eval(), kaldi::nnet2::PreconditionDirections(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), LengthNormComponent::PropagateFnc(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::UnitTestCuMatrixMulRowsVec(), AffineTransform::Update(), ConvolutionalComponent::Update(), and kaldi::nnet3::time_height_convolution::ZeroBlankRows().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(scale.Dim() == NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
  #endif
  {
    Mat().MulRowsVec(scale.Vec());
  }
}
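MulRowsVec is the row-wise counterpart: row i is multiplied by scale[i], i.e. M <- diag(scale) * M, and the GPU branch asserts scale.Dim() == NumRows(). A minimal CPU sketch, assuming a dense row-major matrix; MulRowsVecSketch is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of MulRowsVec: m(r, c) *= scale[r] for a dense
// rows x cols row-major matrix, i.e. M <- diag(scale) * M.
void MulRowsVecSketch(std::vector<double> &m, int rows, int cols,
                      const std::vector<double> &scale) {
  assert(static_cast<int>(scale.size()) == rows);  // scale.Dim() == NumRows()
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c) m[r * cols + c] *= scale[r];
}
```

With scale[r] set to the reciprocal of row r's norm, this performs per-row normalization, the kind of operation that LengthNormComponent in the "Referenced by" list needs.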
MatrixIndexT NumCols ( ) const
inline

Definition at line 196 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetLdaStatsAccumulator::AccStatsFromOutput(), MatrixRandomizer::AddData(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuRand< Real >::AddGaussNoise(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), CuVectorBase< Real >::AddRowSumMat(), NnetUpdater::Backprop(), StatisticsExtractionComponent::Backprop(), MaxoutComponent::Backprop(), MaxpoolingComponent::Backprop(), PnormComponent::Backprop(), NormalizeComponent::Backprop(), SigmoidComponent::Backprop(), RepeatedAffineComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), DropoutComponent::Backprop(), ConvolutionComponent::Backprop(), Convolutional1dComponent::Backprop(), LstmNonlinearityComponent::Backprop(), BatchNormComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), Splice::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), CuRand< Real >::BinarizeProbs(), TimeHeightConvolutionComponent::Check(), ChunkInfo::CheckSize(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), DiscriminativeComputation::Compute(), kaldi::nnet3::ComputeAccuracy(), 
NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), ConvolutionComponent::ConvolutionComponent(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuVectorBase< Real >::CopyDiagFromMat(), CuMatrixBase< Real >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrix< Real >::CuMatrix(), kaldi::CuRandGaussianMatrixBaseSpeedTest(), kaldi::CuRandGaussianMatrixSpeedTest(), kaldi::CuRandUniformMatrixBaseSpeedTest(), kaldi::CuRandUniformMatrixSpeedTest(), CuSubVector< Real >::CuSubVector(), CuTpMatrix< Real >::CuTpMatrix(), CuMatrixBase< Real >::DiffGroupPnorm(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), NnetComputer::ExecuteCommand(), AffineTransform::GetGradient(), RecurrentComponent::GetGradient(), LstmProjected::GetGradient(), ConvolutionalComponent::GetGradient(), Convolutional2DComponent::GetGradient(), BlstmProjected::GetGradient(), Convolutional1dComponent::GetParameterDim(), AffineTransform::GetParams(), RecurrentComponent::GetParams(), LstmProjected::GetParams(), ConvolutionalComponent::GetParams(), 
Convolutional2DComponent::GetParams(), BlstmProjected::GetParams(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), ConvolutionComponent::InderivPatchesToInderiv(), MaxpoolingComponent::InderivPatchesToInderiv(), AffineComponent::Info(), AffineComponentPreconditioned::Info(), AffineComponentPreconditionedOnline::Info(), DctComponent::Info(), FixedLinearComponent::Info(), FixedAffineComponent::Info(), Convolutional1dComponent::Info(), LstmNonlinearityComponent::Info(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), LinearTransform::InitData(), TimeHeightConvolutionComponent::InitFromConfig(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), RepeatedAffineComponent::InputDim(), AffineComponent::InputDim(), BlockAffineComponent::InputDim(), FixedLinearComponent::InputDim(), FixedAffineComponent::InputDim(), Convolutional1dComponent::InputDim(), LstmNonlinearityComponent::InputDim(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< Real >::Max(), kaldi::MeanVariance(), CuMatrixBase< Real >::Min(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), TimeHeightConvolutionComponent::NumParameters(), BlockAffineComponent::NumParameters(), RepeatedAffineComponent::NumParameters(), ConvolutionComponent::NumParameters(), LstmNonlinearityComponent::NumParameters(), LinearTransform::NumParams(), AffineTransform::NumParams(), 
RecurrentComponent::NumParams(), LstmProjected::NumParams(), ConvolutionalComponent::NumParams(), Convolutional2DComponent::NumParams(), BlstmProjected::NumParams(), CuMatrix< BaseFloat >::operator=(), DctComponent::OutputDim(), LstmNonlinearityComponent::OutputDim(), CuMatrixBase< Real >::ParametricRelu(), TimeHeightConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), kaldi::nnet3::PrintParameterStats(), NnetComputeProb::ProcessOutputs(), DistributeComponent::Propagate(), Component::Propagate(), ElementwiseProductComponent::Propagate(), NormalizeComponent::Propagate(), StatisticsExtractionComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), DropoutMaskComponent::Propagate(), SpliceComponent::Propagate(), SumBlockComponent::Propagate(), BlockAffineComponent::Propagate(), DctComponent::Propagate(), DropoutComponent::Propagate(), AdditiveNoiseComponent::Propagate(), ConvolutionComponent::Propagate(), Convolutional1dComponent::Propagate(), BatchNormComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), FramePoolingComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), Dropout::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), TimeHeightConvolutionComponent::Read(), LstmNonlinearityComponent::Read(), AffineTransform::ReadData(), LinearTransform::ReadData(), 
ConvolutionalComponent::ReadData(), Rbm::ReadData(), Rbm::Reconstruct(), kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), Mse::Report(), NnetRescaler::RescaleComponent(), CuMatrixBase< BaseFloat >::Row(), OnlineNaturalGradient::SelfTest(), OnlinePreconditioner::SelfTest(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), NaturalGradientRepeatedAffineComponent::SetNaturalGradientConfigs(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), kaldi::cu::Splice(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), BatchNormComponent::StoreStats(), NonlinearComponent::StoreStatsInternal(), PdfPrior::SubtractOnLogpost(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), TimeHeightConvolutionComponent::UnVectorize(), RepeatedAffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), ConvolutionComponent::Update(), Convolutional1dComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), TimeHeightConvolutionComponent::UpdateSimple(), BlockAffineComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), 
MatrixRandomizer::Value(), TimeHeightConvolutionComponent::Vectorize(), RepeatedAffineComponent::Vectorize(), BlockAffineComponent::Vectorize(), ConvolutionComponent::Vectorize(), DctComponent::Write(), and LstmNonlinearityComponent::Write().

{ return num_cols_; }
MatrixIndexT NumRows ( ) const
inline

Dimensions.

Definition at line 195 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetLdaStatsAccumulator::AccStatsFromOutput(), CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), MatrixRandomizer::AddData(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuRand< Real >::AddGaussNoise(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMat(), CuVectorBase< Real >::AddMatVec(), CuVectorBase< Real >::AddRowSumMat(), AffineComponent::AffineComponent(), DistributeComponent::Backprop(), NnetUpdater::Backprop(), ElementwiseProductComponent::Backprop(), StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxoutComponent::Backprop(), MaxpoolingComponent::Backprop(), BackpropTruncationComponent::Backprop(), PnormComponent::Backprop(), NormalizeComponent::Backprop(), SigmoidComponent::Backprop(), RepeatedAffineComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), AffineComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), ClipGradientComponent::Backprop(), SumGroupComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), DropoutComponent::Backprop(), ConvolutionComponent::Backprop(), Convolutional1dComponent::Backprop(), LstmNonlinearityComponent::Backprop(), BatchNormComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), HiddenSoftmax::BackpropagateFnc(), Splice::BackpropagateFnc(), SimpleSentenceAveragingComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), 
RecurrentComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), CuRand< Real >::BinarizeProbs(), BlockAffineComponent::BlockAffineComponent(), TimeHeightConvolutionComponent::Check(), ChunkInfo::CheckSize(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), DiscriminativeComputation::Compute(), DecodableAmNnetParallel::Compute(), kaldi::nnet3::ComputeAccuracy(), NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), NnetUpdater::ComputeTotAccuracy(), Convolutional1dComponent::Convolutional1dComponent(), ConvolutionComponent::ConvolutionComponent(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuMatrixBase< Real >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrix< Real >::CuMatrix(), kaldi::CuRandGaussianMatrixBaseSpeedTest(), kaldi::CuRandGaussianMatrixSpeedTest(), kaldi::CuRandUniformMatrixBaseSpeedTest(), kaldi::CuRandUniformMatrixSpeedTest(), CuTpMatrix< Real >::CuTpMatrix(), DecodableAmNnet::DecodableAmNnet(), 
CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), AffineTransform::GetGradient(), RecurrentComponent::GetGradient(), LstmProjected::GetGradient(), ConvolutionalComponent::GetGradient(), Convolutional2DComponent::GetGradient(), BlstmProjected::GetGradient(), Convolutional1dComponent::GetParameterDim(), AffineTransform::GetParams(), RecurrentComponent::GetParams(), LstmProjected::GetParams(), ConvolutionalComponent::GetParams(), Convolutional2DComponent::GetParams(), BlstmProjected::GetParams(), NnetComputer::GetPointers(), AffineComponentPreconditioned::GetScalingFactor(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupPnorm(), AffineComponent::Info(), AffineComponentPreconditioned::Info(), AffineComponentPreconditionedOnline::Info(), DctComponent::Info(), FixedLinearComponent::Info(), FixedAffineComponent::Info(), ConvolutionComponent::Info(), Convolutional1dComponent::Info(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), LinearTransform::InitData(), TimeHeightConvolutionComponent::InitFromConfig(), FixedAffineComponent::InitFromConfig(), FixedLinearComponent::InitFromString(), FixedAffineComponent::InitFromString(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), TimeHeightConvolutionComponent::InitUnit(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< Real >::Max(), kaldi::MeanVariance(), CuMatrixBase< Real >::Min(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< Real >::MulElements(), 
NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), DecodableAmNnetParallel::NumFramesReady(), TimeHeightConvolutionComponent::NumParameters(), BlockAffineComponent::NumParameters(), RepeatedAffineComponent::NumParameters(), ConvolutionComponent::NumParameters(), LstmNonlinearityComponent::NumParameters(), LinearTransform::NumParams(), AffineTransform::NumParams(), RecurrentComponent::NumParams(), LstmProjected::NumParams(), ConvolutionalComponent::NumParams(), Convolutional2DComponent::NumParams(), BlstmProjected::NumParams(), NnetLogprobTask::operator()(), CuMatrix< BaseFloat >::operator=(), RepeatedAffineComponent::OutputDim(), AffineComponent::OutputDim(), BlockAffineComponent::OutputDim(), DctComponent::OutputDim(), FixedLinearComponent::OutputDim(), FixedAffineComponent::OutputDim(), ConvolutionComponent::OutputDim(), Convolutional1dComponent::OutputDim(), CuMatrixBase< Real >::ParametricRelu(), TimeHeightConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), kaldi::nnet3::PrintParameterStats(), DistributeComponent::Propagate(), NnetOnlineComputer::Propagate(), DropoutComponent::Propagate(), Component::Propagate(), ElementwiseProductComponent::Propagate(), StatisticsExtractionComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), DropoutMaskComponent::Propagate(), SpliceComponent::Propagate(), SumBlockComponent::Propagate(), BlockAffineComponent::Propagate(), DctComponent::Propagate(), AdditiveNoiseComponent::Propagate(), ConvolutionComponent::Propagate(), 
Convolutional1dComponent::Propagate(), MaxpoolingComponent::Propagate(), BatchNormComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), SimpleSentenceAveragingComponent::PropagateFnc(), RecurrentComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), TimeHeightConvolutionComponent::Read(), AffineTransform::ReadData(), LinearTransform::ReadData(), ConvolutionalComponent::ReadData(), Rbm::ReadData(), Rbm::Reconstruct(), kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), ClipGradientComponent::RepairGradients(), NnetRescaler::RescaleComponent(), LstmProjected::ResetStreams(), OnlineNaturalGradient::SelfTest(), OnlinePreconditioner::SelfTest(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), AffineComponent::SetParams(), ConvolutionComponent::SetParams(), Convolutional1dComponent::SetParams(), kaldi::cu::Splice(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), BatchNormComponent::StoreStats(), NonlinearComponent::StoreStatsInternal(), CuMatrixBase< Real >::SumColumnRanges(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCheck(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), 
kaldi::nnet2::UnitTestNnetCompute(), kaldi::nnet3::UnitTestPreconditionDirectionsOnline(), kaldi::nnet2::UnitTestPreconditionDirectionsOnline(), kaldi::nnet1::UnitTestSimpleSentenceAveragingComponent(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), TimeHeightConvolutionComponent::UnVectorize(), RepeatedAffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), LinearTransform::Update(), AffineTransform::Update(), SentenceAveragingComponent::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), ConvolutionComponent::Update(), Convolutional1dComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), TimeHeightConvolutionComponent::UpdateSimple(), BlockAffineComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), TimeHeightConvolutionComponent::Vectorize(), RepeatedAffineComponent::Vectorize(), BlockAffineComponent::Vectorize(), ConvolutionComponent::Vectorize(), DctComponent::Write(), kaldi::nnet3::time_height_convolution::ZeroBlankRows(), and NnetLogprobTask::~NnetLogprobTask().

{ return num_rows_; }
CuValue<Real> operator() (MatrixIndexT r, MatrixIndexT c)
inline

Definition at line 557 of file cu-matrix.h.

{
  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
                        static_cast<UnsignedMatrixIndexT>(num_rows_) &&
                        static_cast<UnsignedMatrixIndexT>(c) <
                        static_cast<UnsignedMatrixIndexT>(num_cols_));
  return CuValue<Real>(data_ + r * stride_ + c);
}
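The non-const operator() locates element (r, c) at data_ + r * stride_ + c, so each row occupies stride_ elements of which only the first NumCols() belong to the matrix. The host-only sketch below reproduces that addressing. StridedMatrix is a hypothetical stand-in, not a Kaldi type, and it returns a plain float& rather than modelling CuValue<Real>, the wrapper Kaldi uses so that single-element reads and writes can reach GPU memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for the CuMatrixBase storage layout:
// rows are stored one after another, each occupying `stride` elements,
// of which only the first `num_cols` belong to the matrix (the rest is padding).
struct StridedMatrix {
  int num_rows;
  int num_cols;
  int stride;               // assumed: stride >= num_cols
  std::vector<float> data;  // num_rows * stride elements

  StridedMatrix(int rows, int cols, int stride_in)
      : num_rows(rows), num_cols(cols), stride(stride_in),
        data(static_cast<std::size_t>(rows) * stride_in, 0.0f) {
    assert(stride_in >= cols);
  }

  // Same address arithmetic as operator(): data_ + r * stride_ + c.
  float &operator()(int r, int c) {
    return data[static_cast<std::size_t>(r) * stride + c];
  }
  float operator()(int r, int c) const {
    return data[static_cast<std::size_t>(r) * stride + c];
  }
};
```

For a 2 x 3 matrix with stride 4, element (1, 2) lives at offset 1 * 4 + 2 = 6 in the underlying buffer; offsets 3 and 7 are padding.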
Real operator() (MatrixIndexT r, MatrixIndexT c) const
inline

Definition at line 565 of file cu-matrix.h.

{
  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
                        static_cast<UnsignedMatrixIndexT>(num_rows_) &&
                        static_cast<UnsignedMatrixIndexT>(c) <
                        static_cast<UnsignedMatrixIndexT>(num_cols_));
  return CuValue<Real>(data_ + r * stride_ + c);  // will be cast to Real.
}
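The bounds checks above use a single unsigned comparison per index. Converting a negative 32-bit signed index to the matching unsigned type yields a value of at least 2^31, larger than any valid dimension, so one `<` test rejects both negative and too-large indices. A minimal sketch, assuming MatrixIndexT is int32 (as listed on this page) and UnsignedMatrixIndexT is its 32-bit unsigned counterpart; InRange is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32-bit index types, mirroring a default Kaldi build.
typedef int32_t MatrixIndexT;
typedef uint32_t UnsignedMatrixIndexT;

// A negative i converts to a huge unsigned value, so this single comparison
// is equivalent to (i >= 0 && i < dim) for any non-negative dim.
inline bool InRange(MatrixIndexT i, MatrixIndexT dim) {
  return static_cast<UnsignedMatrixIndexT>(i) <
         static_cast<UnsignedMatrixIndexT>(dim);
}
```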
void ParametricRelu (const CuMatrixBase< Real > &src, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)

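Compute the parametric rectified linear unit (PReLU) function element by element, with one pair of slopes per column: *this(r, c) = src(r, c) * (src(r, c) > 0 ? alpha(c) : beta(c)). src must have the same dimensions as *this, and alpha and beta must both have dimension NumCols(). The CPU branch tests src >= 0 rather than src > 0; the result is the same, since a zero input gives zero either way.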

Definition at line 1316 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::PropagateFnc().

{
  KALDI_ASSERT(src.NumRows() == this->NumRows());
  KALDI_ASSERT(src.NumCols() == this->NumCols());
  KALDI_ASSERT(alpha.Dim() == this->NumCols());
  KALDI_ASSERT(beta.Dim() == this->NumCols());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(src.NumCols(), CU2DBLOCK), n_blocks(src.NumRows(), CU2DBLOCK));

    cuda_parametric_relu(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
                         src.Stride(), alpha.data_, beta.data_);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    // Do it on the CPU.
    for (MatrixIndexT r = 0; r < NumRows(); r++) {
      for (MatrixIndexT c = 0; c < NumCols(); c++) {
        Real src_elem = src.Mat()(r,c);
        this->Mat()(r,c) = src_elem * (src_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
      }
    }
  }
}
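To make the per-column roles of alpha and beta concrete, here is a host-only sketch of the CPU branch using dense row-major std::vector storage (stride equal to the number of columns). ParametricReluSketch is a hypothetical name; it is not the Kaldi implementation and does not use CuMatrixBase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the CPU branch of ParametricRelu on dense
// row-major storage: out(r, c) = src(r, c) * (src(r, c) >= 0 ? alpha[c] : beta[c]).
std::vector<float> ParametricReluSketch(const std::vector<float> &src,
                                        int num_rows, int num_cols,
                                        const std::vector<float> &alpha,
                                        const std::vector<float> &beta) {
  assert(src.size() == static_cast<std::size_t>(num_rows) * num_cols);
  assert(alpha.size() == static_cast<std::size_t>(num_cols));
  assert(beta.size() == static_cast<std::size_t>(num_cols));
  std::vector<float> out(src.size());
  for (int r = 0; r < num_rows; r++) {
    for (int c = 0; c < num_cols; c++) {
      std::size_t idx = static_cast<std::size_t>(r) * num_cols + c;
      float x = src[idx];
      // The slopes are indexed by column only, as in the CPU loop above.
      out[idx] = x * (x >= 0.0f ? alpha[c] : beta[c]);
    }
  }
  return out;
}
```

With alpha all ones and beta all zeros this reduces to an ordinary ReLU; a small constant beta gives a leaky ReLU.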
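const CuSubVector<Real> Row (MatrixIndexT i) const
inline
CuSubVector<Real> Row (MatrixIndexT i)
inline

Returns a CuSubVector that refers to row i in place: it is built from data_ + i * stride_ with length NumCols(), so no data is copied and writes through the non-const version modify the matrix.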

Definition at line 551 of file cu-matrix.h.

{
  KALDI_ASSERT(static_cast<UnsignedMatrixIndexT>(i) <
               static_cast<UnsignedMatrixIndexT>(num_rows_));
  return CuSubVector<Real>(data_ + (i * stride_), NumCols());
}
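The sketch below mimics Row() on the host to show that writes through the returned view land in the matrix storage, at an offset that includes any row padding. RowView and PaddedMatrix are hypothetical types, not Kaldi classes, and a plain assert stands in for KALDI_ASSERT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning row view: a pointer plus a length, like the
// (data_ + i * stride_, NumCols()) pair passed to CuSubVector in Row().
struct RowView {
  float *data;
  int dim;
  float &operator[](int j) { return data[j]; }
};

// Hypothetical padded row-major matrix (stride >= num_cols).
struct PaddedMatrix {
  int num_rows, num_cols, stride;
  std::vector<float> storage;
  PaddedMatrix(int rows, int cols, int stride_in)
      : num_rows(rows), num_cols(cols), stride(stride_in),
        storage(static_cast<std::size_t>(rows) * stride_in, 0.0f) {}

  RowView Row(int i) {
    assert(i >= 0 && i < num_rows);  // stands in for KALDI_ASSERT
    return RowView{storage.data() + static_cast<std::size_t>(i) * stride,
                   num_cols};
  }
};
```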
const Real* RowData (MatrixIndexT r) const
inline

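Get raw row pointer (const): returns data_ + r * stride_, the address of the first element of row r. The index r is not bounds-checked.

Warning: when a GPU is in use, this points to GPU memory and must not be dereferenced on the host.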

Definition at line 615 of file cu-matrix.h.

Referenced by CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSubVector< Real >::CuSubVector(), RectifiedLinearComponent::RepairGradients(), kaldi::TestCuMatrixAddRows2(), kaldi::TestCuMatrixAddToRows(), kaldi::TestCuMatrixCopyRows2(), kaldi::TestCuMatrixCopyToRows(), kaldi::UnitTestCuMatrixAddRows(), and kaldi::UnitTestCuMatrixCopyRows().

{ return data_ + r * stride_; }
Real* RowData (MatrixIndexT r)
inline

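Get raw row pointer: returns data_ + r * stride_, the address of the first element of row r. The index r is not bounds-checked.

Warning: when a GPU is in use, this points to GPU memory and must not be dereferenced on the host.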

Definition at line 618 of file cu-matrix.h.

{ return data_ + r * stride_; }
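RowData() pointers are what the pointer-array overloads, such as CopyRows(const CuArray<const Real*> &src), expect as input. The host-only sketch below gathers rows through such pointers. CopyRowsFromPointers is a hypothetical function and std::vector stands in for CuArray; on a GPU build the real pointers would refer to device memory and could not be dereferenced like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: row r of the destination (row-major, dst_stride
// elements per row) is copied from the num_cols floats that src_rows[r]
// points to, e.g. pointers obtained the way RowData() computes them.
void CopyRowsFromPointers(const std::vector<const float *> &src_rows,
                          float *dst, int num_cols, int dst_stride) {
  for (std::size_t r = 0; r < src_rows.size(); r++) {
    float *dst_row = dst + r * static_cast<std::size_t>(dst_stride);
    for (int c = 0; c < num_cols; c++) {
      dst_row[c] = src_rows[r][c];
    }
  }
}
```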
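void Scale (Real value)

Multiplies every element of the matrix by value, on the GPU if one is enabled and on the CPU otherwise.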

Definition at line 608 of file cu-matrix.cc.

References data_.

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), ScaleComponent::Backprop(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), DecodableNnetSimple::DoNnetComputation(), MultiTaskLoss::Eval(), RepeatedAffineComponent::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), BlockAffineComponent::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), LstmNonlinearityComponent::Init(), OnlineNaturalGradient::InitDefault(), OnlinePreconditioner::InitDefault(), TimeHeightConvolutionComponent::InitFromConfig(), main(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), BackpropTruncationComponent::Propagate(), ScaleComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), Dropout::PropagateFnc(), Rbm::RbmUpdate(), LstmNonlinearityComponent::Read(), ClipGradientComponent::RepairGradients(), TimeHeightConvolutionComponent::Scale(), RepeatedAffineComponent::Scale(), NaturalGradientAffineComponent::Scale(), AffineComponent::Scale(), BlockAffineComponent::Scale(), ConvolutionComponent::Scale(), Convolutional1dComponent::Scale(), LstmNonlinearityComponent::Scale(), TanhComponent::StoreStats(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixScale(), kaldi::UnitTestLstmNonlinearity(), and Convolutional2DComponent::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    CuTimer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_scale(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().Scale(value);
  }
}
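A host-only sketch of the effect of Scale(value) on strided storage, under the assumption that only the NumRows() x NumCols() region should change and row padding is left untouched. ScaleStrided is a hypothetical name; like the GPU branch above, it returns immediately for a matrix with no rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: multiply each element of a num_rows x num_cols
// matrix stored row-major with `stride` elements per row; padding
// elements beyond num_cols in each row are not modified.
void ScaleStrided(std::vector<float> &data, int num_rows, int num_cols,
                  int stride, float value) {
  if (num_rows == 0) return;
  for (int r = 0; r < num_rows; r++) {
    float *row = data.data() + static_cast<std::size_t>(r) * stride;
    for (int c = 0; c < num_cols; c++) row[c] *= value;
  }
}
```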