CuMatrixBase< Real > Class Template Reference

Matrix for CUDA computing.

#include <matrix-common.h>


Public Member Functions

void CopyCols (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Copies column r from column indexes[r] of src.

void AddCols (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indices)
 Add column indices[r] of src to column r.

void CopyRows (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Copies row r from row indexes[r] of src.

void CopyRows (const CuArray< const Real * > &src)
 Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on the CPU otherwise).

void CopyToRows (const CuArray< Real * > &dst) const
 For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, and on the CPU otherwise).

void AddRows (Real alpha, const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Does for each row r, this.Row(r) += alpha * src.Row(indexes[r]).

void AddRows (Real alpha, const CuArray< const Real * > &src)
 Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols().

void AddToRows (Real alpha, const CuArray< Real * > &dst) const
 For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, and on the CPU otherwise).

void SumColumnRanges (const CuMatrixBase< Real > &src, const CuArray< Int32Pair > &indexes)
 For each row r of this and for each column c, sets (*this)(r, c) to the sum of src(r, j), where j ranges from indexes[c].first through indexes[c].second - 1.

void AddRowRanges (const CuMatrixBase< Real > &src, const CuArray< Int32Pair > &indexes)
 For each row r of this and for each column c, does (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

void AddToDiag (Real value)
 Adds "value" to the diagonal elements of the matrix.

MatrixIndexT NumRows () const
 Dimensions.
 
MatrixIndexT NumCols () const
 
MatrixIndexT Stride () const
 
::MatrixDim Dim () const
 
Real FrobeniusNorm () const
 
bool IsUnit (Real tol=0.001) const
 
bool ApproxEqual (const CuMatrixBase< Real > &other, float tol=0.01) const
 True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm().

MatrixIndexT SizeInBytes () const
 Get size of matrix in bytes.
 
template<typename OtherReal >
void CopyFromMat (const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromGeneralMat (const GeneralMatrix &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromMat (const MatrixBase< Real > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromSp (const CuSpMatrix< Real > &M)
 
template<typename OtherReal >
void CopyFromTp (const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyFromMat (const CuMatrixBase< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyToMat (MatrixBase< OtherReal > *dst, MatrixTransposeType trans=kNoTrans) const
 
void CopyRowsFromVec (const CuVectorBase< Real > &v)
 This function has two modes of operation. If v.Dim() == NumRows() * NumCols(), it treats v as the row-by-row concatenation of a matrix and copies it into *this; if v.Dim() == NumCols(), it sets each row of *this to a copy of v.

void CopyRowsFromVec (const VectorBase< Real > &v)
 Version of CopyRowsFromVec() that takes a CPU-based vector.

void CopyColsFromVec (const CuVectorBase< Real > &v)
 Copies vector into matrix, column-by-column.

void CopyColFromVec (const CuVectorBase< Real > &v, const MatrixIndexT col)
 Copy vector into a specific column of the matrix.

void Sigmoid (const CuMatrixBase< Real > &src)
 Set each element to the sigmoid of the corresponding element of "src": element by element, y = 1 / (1 + exp(-x)).

void Heaviside (const CuMatrixBase< Real > &src)
 Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x==0.]

void SoftHinge (const CuMatrixBase< Real > &src)
 Apply the function y = log(1 + exp(x)) to each element.

void GroupPnorm (const CuMatrixBase< Real > &src, Real pow)
 Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} |x_j|^pow)^(1 / pow) to each row, where x is src, y is *this, and G = x.NumCols() / y.NumCols() must be an integer.

void DiffGroupPnorm (const CuMatrixBase< Real > &in_value, const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv, Real power)
 Differentiate backward through the GroupPnorm function.

void GroupMax (const CuMatrixBase< Real > &src)
 Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j to each row, where G = x.NumCols() / y.NumCols() must be an integer.

void GroupMaxDeriv (const CuMatrixBase< Real > &input, const CuMatrixBase< Real > &output)
 Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function (i.e. its "src" argument) and "output" is its result; *this must have the same dimensions as "input".

void ParametricRelu (const CuMatrixBase< Real > &src, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Compute the parametric rectified linear unit function; element by element, *this = src * (src > 0 ? alpha : beta).

void DiffParametricRelu (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Differentiate backward through the parametric relu function.

void Tanh (const CuMatrixBase< Real > &src)
 Compute the hyperbolic tangent (tanh) function; element by element, *this = tanh(src).

void DiffSigmoid (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the sigmoid function.

void DiffTanh (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the tanh function.

void DiffSoftmaxPerRow (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the softmax function.

void DiffLogSoftmaxPerRow (const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv)
 Differentiate backward through the log softmax function.

void DiffXent (const CuArray< int32 > &tgt, CuVector< Real > *log_post_tgt)
 Differentiate the block [softmax+cross-entropy]: dE/da = posterior_mat - target_mat, where 'E' is the error function and 'a' is the activation at the softmax input.

void Cholesky (CuMatrixBase< Real > *inv_cholesky=NULL)
 Sets *this to the Cholesky factor of *this (i.e. the C satisfying *this = C C^T), and sets "inv_cholesky" (if supplied) to its inverse. *this is treated as a symmetric matrix, but only the lower triangle is accessed.

void SymInvertPosDef ()
 Inversion for positive definite symmetric matrices.
 
void ApplyPow (Real power)
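 Raise each element to the given power.

void ApplyPowAbs (Real power, bool include_sign=false)
 Apply power to the absolute value of each element; if include_sign is true, the result is multiplied by the sign of the input value.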
 
void ApplyHeaviside ()
 For each element, sets x = (x > 0 ? 1.0 : 0.0).

void ApplyFloor (Real floor_val)

void ApplyCeiling (Real ceiling_val)

void ApplyExp ()

void ApplySoftMaxPerRow (const CuMatrixBase< Real > &src)
 Softmax nonlinearity Y = Softmax(X): Yij = e^Xij / sum_k(e^Xik), applied to each row. For good numerical stability, the maximum value of each row is subtracted first.

void ApplyLogSoftMaxPerRow (const CuMatrixBase< Real > &src)
 LogSoftmax nonlinearity Y = LogSoftmax(X): Yij = Xij - log(sum_k(e^Xik)), applied to each row. For good numerical stability, the maximum value of each row is subtracted first.

void FindRowMaxId (CuArray< int32 > *id) const
 Find the id of the maximal element for each row.

void SetZero ()
 Set all elements to zero. (First of the math operations, some of which call CUDA kernels.)
 
void Set (Real value)
 
void Add (Real value)
 
void SetZeroAboveDiag ()
 Zeroes all elements for which col > row.

void Scale (Real value)

void ApplyLog ()

void MulElements (const CuMatrixBase< Real > &A)
 Multiply two matrices elementwise: C = C .* A.

void DivElements (const CuMatrixBase< Real > &A)
 Divide two matrices elementwise: C = C ./ A.

void Max (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = max(*this, A).

void Min (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = min(*this, A).

void MulColsVec (const CuVectorBase< Real > &scale)
 Scale the i'th column by scale[i].

void MulRowsVec (const CuVectorBase< Real > &scale)
 Scale the i'th row by scale[i].

void MulRowsGroupMat (const CuMatrixBase< Real > &src)
 Divide each row into src.NumCols() groups, and then scale the i'th row's j'th group of elements by src(i, j).

void DivRowsVec (const CuVectorBase< Real > &div)
 Divide the i'th row by div[i].

void InvertElements ()
 Invert the matrix elementwise: each element x becomes 1 / x.

void AddMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A (or alpha * A^T if trans == kTrans).

void AddMatBlocks (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 If A.NumRows() is a multiple of this->NumRows() and A.NumCols() is a multiple of this->NumCols(), divide A into blocks of the same size as *this and add them to *this (times alpha).

void AddVecToCols (Real alpha, const CuVectorBase< Real > &col, Real beta=1.0)
 (for each column c of *this), c = alpha * col + beta * c.

void AddVecToRows (Real alpha, const CuVectorBase< Real > &row, Real beta=1.0)
 (for each row r of *this), r = alpha * row + beta * r.

void AddMatMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, Real beta)
 C = alpha * A(^T)*B(^T) + beta * C.

void AddVecVec (Real alpha, const CuVectorBase< Real > &x, const CuVectorBase< Real > &y)
 A = alpha * x * y^T + A.

void SetMatMatDivMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const CuMatrixBase< Real > &C)
 *this = A .* B ./ C, element by element (where an element of C is 0, the result is the corresponding element of A). *this can safely alias A, B or C and still get the expected result.

void SymAddMat2 (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transA, Real beta)
 *this = beta * *this + alpha * M M^T, for symmetric matrices.

void AddMatBlock (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuBlockMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 This function is like AddMatMat but for where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks).

void AddDiagVecMat (const Real alpha, const CuVectorBase< Real > &v, const CuMatrixBase< Real > &M, MatrixTransposeType transM, Real beta=1.0)
 *this = beta * *this + alpha * diag(v) * M [or M^T].

void AddMatDiagVec (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transM, CuVectorBase< Real > &v, Real beta=1.0)

void AddMatMatElements (const Real alpha, const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const Real beta)
 *this = beta * *this + alpha * A .* B (.* is element-by-element multiplication).

void AddMatSp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuSpMatrix< Real > &B, const Real beta)
 this <-- beta*this + alpha*A*B.

void AddSpMat (const Real alpha, const CuSpMatrix< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <-- beta*this + alpha*SpA*B.

void AddTpMat (const Real alpha, const CuTpMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <-- beta*this + alpha*A*B.

void AddMatTp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuTpMatrix< Real > &B, MatrixTransposeType transB, const Real beta)
 this <-- beta*this + alpha*A*B.
 
void CopyFromBlock (const CuBlockMatrix< Real > &B, MatrixTransposeType trans=kNoTrans)
 
void CopyLowerToUpper ()
 
void CopyUpperToLower ()
 
CuSubMatrix< Real > Range (const MatrixIndexT row_offset, const MatrixIndexT num_rows, const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
CuSubMatrix< Real > RowRange (const MatrixIndexT row_offset, const MatrixIndexT num_rows) const
 
CuSubMatrix< Real > ColRange (const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
const CuSubVector< Real > Row (MatrixIndexT i) const
 
CuSubVector< Real > Row (MatrixIndexT i)
 
CuValue< Real > operator() (MatrixIndexT r, MatrixIndexT c)
 
Real operator() (MatrixIndexT r, MatrixIndexT c) const
 
Real Sum () const
 
Real Max () const
 
Real Min () const
 
Real Trace (bool check_square=true) const
 Return the trace. If check_square = true, will crash if matrix is not square.
 
void SetRandn ()
 
void SetRandUniform ()
 
void Write (std::ostream &os, bool binary) const
 
void AddElements (Real alpha, const std::vector< MatrixElement< Real > > &input)
 
void AddElements (Real alpha, const CuArray< Int32Pair > &indexes, const Real *input)
 
void Lookup (const std::vector< Int32Pair > &indexes, Real *output) const
 
void Lookup (const CuArray< Int32Pair > &indexes, Real *output) const
 
void EqualElementMask (const CuMatrixBase< Real > &mat, CuMatrix< Real > *mask) const
 
const Real * RowData (MatrixIndexT r) const
 Get raw row pointer (const).

Real * RowData (MatrixIndexT r)
 Get raw row pointer.

const Real * Data () const
 Return data pointer (const).

Real * Data ()
 Return data pointer.
 
const MatrixBase< Real > & Mat () const
 
MatrixBase< Real > & Mat ()
 

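ApplySoftMaxPerRow and ApplyLogSoftMaxPerRow both subtract each row's maximum before exponentiating. The CPU-only sketch below illustrates that technique. The helper names SoftmaxRowsSketch and LogSoftmaxRowsSketch are hypothetical, and std::vector rows stand in for a CuMatrixBase. It is not the Kaldi implementation, which dispatches to the GPU when one is enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of Kaldi): per-row softmax on a dense
// matrix stored as a vector of rows. The row maximum is subtracted before
// calling exp() so large inputs cannot overflow; the factor exp(-max)
// cancels between numerator and denominator, so the result is unchanged.
void SoftmaxRowsSketch(std::vector<std::vector<double>> &m) {
  for (std::vector<double> &row : m) {
    if (row.empty()) continue;
    const double mx = *std::max_element(row.begin(), row.end());
    double sum = 0.0;
    for (double &x : row) {
      x = std::exp(x - mx);
      sum += x;
    }
    for (double &x : row) x /= sum;
  }
}

// Hypothetical log-softmax counterpart:
// y_j = (x_j - max) - log(sum_k exp(x_k - max)).
void LogSoftmaxRowsSketch(std::vector<std::vector<double>> &m) {
  for (std::vector<double> &row : m) {
    if (row.empty()) continue;
    const double mx = *std::max_element(row.begin(), row.end());
    double sum = 0.0;
    for (double x : row) sum += std::exp(x - mx);
    const double log_sum = std::log(sum);
    for (double &x : row) x = x - mx - log_sum;
  }
}
```

Without the max subtraction, a row such as {1000, 1000} would make exp() overflow to infinity and yield NaN. With it, the row becomes {0.5, 0.5}.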
Protected Member Functions

 CuMatrixBase ()
 
 CuMatrixBase (Real *data, MatrixIndexT num_rows, MatrixIndexT num_cols, MatrixIndexT stride)
 This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.
 

Protected Attributes

Real * data_
 GPU data pointer (or regular matrix data pointer, if either CUDA was not compiled in or we could not acquire the device).
 
MatrixIndexT num_cols_
 
MatrixIndexT num_rows_
 
MatrixIndexT stride_
 

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (CuMatrixBase)
 

Friends

class CuMatrixBase< float >
 
class CuMatrixBase< double >
 
class CuVectorBase< float >
 
class CuVectorBase< double >
 
class VectorBase< Real >
 
class CuSpMatrix< Real >
 
class CuTpMatrix< float >
 
class CuTpMatrix< double >
 
class CuVectorBase< Real >
 
class CuSubMatrix< Real >
 
class CuRand< Real >
 
class CuSubVector< Real >
 
class CuBlockMatrix< Real >
 
class CuSparseMatrix< float >
 
class CuSparseMatrix< double >
 
class CuSparseMatrix< Real >
 
Real TraceMatMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
 
Real TraceMatSmat (const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType trans)
 
void AddMatMatBatched (const Real alpha, std::vector< CuSubMatrix< Real > * > &C, const std::vector< CuSubMatrix< Real > * > &A, MatrixTransposeType transA, const std::vector< CuSubMatrix< Real > * > &B, MatrixTransposeType transB, const Real beta)
 Does multiple matrix multiplications, executing them in parallel using cuBLAS's gemmBatched if we are using a GPU.
 

Detailed Description

template<typename Real>
class kaldi::CuMatrixBase< Real >

Matrix for CUDA computing.

Does the computation on the CUDA card when CUDA is compiled in and we have a suitable GPU (CuDevice::Instantiate().Enabled() == true); otherwise, does it on the CPU.

Definition at line 67 of file matrix-common.h.
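The member-function listings on this page share one dispatch shape. A compile-time `#if HAVE_CUDA == 1` guard wraps a runtime `CuDevice::Instantiate().Enabled()` check, and a CPU fallback sits in the `else` block. The sketch below reproduces only that control flow. `GpuEnabled()` and `AddScalarSketch()` are hypothetical stand-ins, and the GPU branch is intentionally empty because no CUDA code is compiled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CuDevice::Instantiate().Enabled().
// In this CPU-only sketch it always reports that no GPU is in use.
bool GpuEnabled() { return false; }

// Hypothetical helper mirroring the dispatch shape of the listings:
// a compile-time guard, then a runtime check, then a CPU fallback.
void AddScalarSketch(std::vector<float> &data, float value) {
#if defined(HAVE_CUDA) && HAVE_CUDA == 1
  if (GpuEnabled()) {
    // A real implementation would launch a CUDA kernel here;
    // this sketch has none, so the branch is left empty.
    return;
  } else
#endif
  {
    // CPU fallback: always taken when HAVE_CUDA is not 1.
    for (float &x : data) x += value;
  }
}
```

Because the `else` is inside the `#if`, the same braced block serves both as the runtime fallback and as the only code path in CPU-only builds.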

Constructor & Destructor Documentation

CuMatrixBase ( )
inline, protected

Definition at line 626 of file cu-matrix.h.

CuMatrixBase(): data_(NULL), num_cols_(0), num_rows_(0), stride_(0) { }
CuMatrixBase ( Real *  data,
MatrixIndexT  num_rows,
MatrixIndexT  num_cols,
MatrixIndexT  stride 
)
inline, protected

This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.

Definition at line 630 of file cu-matrix.h.

CuMatrixBase(Real *data, MatrixIndexT num_rows, MatrixIndexT num_cols,
             MatrixIndexT stride):
    data_(data), num_cols_(num_cols), num_rows_(num_rows), stride_(stride) { }

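The protected constructor stores only a pointer, the two dimensions and a stride. That is enough for CuSubMatrix to act as a view without copying. The CPU-only sketch below makes two assumptions: storage is row-major with element (r, c) at data[r * stride + c], and a sub-matrix keeps the parent's stride while offsetting the pointer. `MatView` and `SubView` are hypothetical names, not Kaldi types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-owning view holding the same four fields that
// CuMatrixBase stores: data pointer, rows, columns and stride.
struct MatView {
  float *data;
  int num_rows, num_cols, stride;
  // Assumed layout: element (r, c) lives at data[r * stride + c],
  // with stride >= num_cols (rows may be padded).
  float &at(int r, int c) { return data[r * stride + c]; }
};

// Hypothetical analogue of constructing a sub-matrix: the view keeps the
// parent's stride and only moves the data pointer, so no data is copied.
MatView SubView(const MatView &m, int row_offset, int num_rows,
                int col_offset, int num_cols) {
  assert(row_offset >= 0 && row_offset + num_rows <= m.num_rows);
  assert(col_offset >= 0 && col_offset + num_cols <= m.num_cols);
  return MatView{m.data + row_offset * m.stride + col_offset,
                 num_rows, num_cols, m.stride};
}
```

Writing through the sub-view modifies the parent's buffer. For a 3x3 matrix with stride 4, `SubView(m, 1, 2, 1, 2).at(0, 0)` refers to `buf[5]`.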
Member Function Documentation

void Add ( Real  value)

Definition at line 546 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), Xent::Eval(), main(), kaldi::MeanVariance(), DropoutComponent::Propagate(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), TanhComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAdd(), kaldi::UnitTestCuMatrixAdd2(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixObjfDeriv(), and kaldi::UnitTestCuMatrixSetRandUniform().

template<typename Real>
void CuMatrixBase<Real>::Add(Real value) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_add(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Add(value);
  }
}
void AddCols ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indices 
)

Add column indices[r] of src to column r.

As a special case, if indices[r] == -1, column r is skipped (left unchanged). indices.Dim() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().

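The following CPU-only sketch makes the index semantics concrete. For every row i and column r it computes dst(i, r) += src(i, indices[r]), and indices[r] == -1 leaves column r alone. `AddColsSketch` and the vector-of-rows representation are hypothetical. The real function operates on CuMatrixBase data and a CuArray of indices.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU sketch of the AddCols semantics (not the Kaldi API):
// for every row i and column r, dst[i][r] += src[i][indices[r]],
// skipping columns whose index is -1.
void AddColsSketch(std::vector<std::vector<float>> &dst,
                   const std::vector<std::vector<float>> &src,
                   const std::vector<int> &indices) {
  assert(dst.size() == src.size());  // same number of rows
  for (std::size_t i = 0; i < dst.size(); ++i) {
    assert(indices.size() == dst[i].size());  // one index per column
    for (std::size_t r = 0; r < indices.size(); ++r) {
      if (indices[r] < 0) continue;  // -1: leave column r unchanged
      dst[i][r] += src[i][indices[r]];
    }
  }
}
```

The source may have a different number of columns than the destination. Only the row counts must match.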
Definition at line 2431 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by Convolutional1dComponent::Backprop(), ConvolutionalComponent::BackpropagateFnc(), ConvolutionComponent::InderivPatchesToInderiv(), and MaxpoolingComponent::InderivPatchesToInderiv().

template<typename Real>
void CuMatrixBase<Real>::AddCols(const CuMatrixBase<Real> &src,
                                 const CuArray<MatrixIndexT> &indices) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(indices.Dim() == NumCols());
    KALDI_ASSERT(NumRows() == src.NumRows());
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
                  Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddCols(src.Mat(), indices.Data());
  }
}
void AddDiagVecMat ( const Real  alpha,
const CuVectorBase< Real > &  v,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
Real  beta = 1.0 
)

*this = beta * *this + alpha * diag(v) * M [or M^T].

The same as adding M but scaling each row M_i by v(i).

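In the listing for this function, the CUDA branch handles M^T without copying: it swaps M's row and column strides. The CPU sketch below applies the same trick to row-major float arrays to compute *this = beta * *this + alpha * diag(v) * M [or M^T], so row i of the M term is scaled by v(i). `AddDiagVecMatSketch` and its parameter list are hypothetical, not part of Kaldi.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical CPU sketch: out = beta * out + alpha * diag(v) * M (or M^T).
// All matrices are row-major with explicit strides. Transposition is done
// by swapping M's row and column strides instead of copying M.
void AddDiagVecMatSketch(float alpha, const std::vector<float> &v,
                         const float *M, int M_stride, bool transM,
                         float beta, float *out, int rows, int cols,
                         int out_stride) {
  assert(static_cast<int>(v.size()) == rows);
  int row_stride = M_stride, col_stride = 1;
  if (transM) std::swap(row_stride, col_stride);  // read M(j, i) as (i, j)
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      const float m_ij = M[i * row_stride + j * col_stride];
      float &o = out[i * out_stride + j];
      o = beta * o + alpha * v[i] * m_ij;
    }
  }
}
```

With transM set, the element read at (i, j) is M[i + j * M_stride], which is M(j, i). The kernel therefore never needs a transposed copy.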
Definition at line 1192 of file cu-matrix.cc.

References CU2DBLOCK, CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by NormalizeComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), MultiBasisComponent::PropagateFnc(), and kaldi::TestCuMatrixAddDiagVecMat().

template<typename Real>
void CuMatrixBase<Real>::AddDiagVecMat(
    const Real alpha, const CuVectorBase<Real> &v,
    const CuMatrixBase<Real> &M, MatrixTransposeType transM,
    Real beta) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transM == kNoTrans) {
      KALDI_ASSERT(SameDim(*this, M));
    } else {
      KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
    }
    KALDI_ASSERT(v.Dim() == this->NumRows());

    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
    if (transM == kTrans)
      std::swap(M_row_stride, M_col_stride);
    cuda_add_diag_vec_mat(dimGrid, dimBlock, alpha, data_, Dim(),
                          v.Data(), M.Data(), M_row_stride, M_col_stride, beta);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddDiagVecMat(alpha, v.Vec(), M.Mat(), transM, beta);
  }
}
void AddElements ( Real  alpha,
const std::vector< MatrixElement< Real > > &  input 
)

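This overload has no brief description. Its listing first bounds-checks every (row, column) pair and then adds alpha * weight at each position. Below is a minimal CPU sketch of that behavior, assuming row-major storage with an explicit stride. `ElementSketch` mirrors only the row, column and weight fields that the listing reads from MatrixElement<Real>. Both `ElementSketch` and `AddElementsSketch` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the fields read from MatrixElement<Real>.
struct ElementSketch {
  int row;
  int column;
  float weight;
};

// Hypothetical CPU sketch: validate all positions first, then add
// alpha * weight at each one. In this sketch, repeated positions
// accumulate. Storage is row-major with an explicit stride.
void AddElementsSketch(float alpha, const std::vector<ElementSketch> &input,
                       float *data, int num_rows, int num_cols, int stride) {
  for (const ElementSketch &e : input)
    assert(e.row >= 0 && e.row < num_rows &&
           e.column >= 0 && e.column < num_cols);
  for (const ElementSketch &e : input)
    data[e.row * stride + e.column] += alpha * e.weight;
}
```

Checking all indices before touching the matrix matches the order in the listing, where the dimension check loop runs before either the GPU or the CPU branch.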
Definition at line 2948 of file cu-matrix.cc.

References CU1DBLOCK, data_, and KALDI_ASSERT.

Referenced by OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), DiscriminativeComputation::ProcessPosteriors(), and kaldi::UnitTestCuMatrixAddElements().

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha,
                                     const std::vector<MatrixElement<Real> >& input) {
  // Checks the dimension.
  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
  for (int32 i = 0; i < input.size(); ++i) {
    KALDI_ASSERT(input[i].row < num_rows && input[i].row >= 0 &&
                 input[i].column < num_cols && input[i].column >= 0);
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    void *addr = CuDevice::Instantiate().Malloc(input.size() * sizeof(MatrixElement<Real>));
    CU_SAFE_CALL(cudaMemcpy(addr, input.data(),
                            input.size() * sizeof(MatrixElement<Real>),
                            cudaMemcpyHostToDevice));

    Timer tim;
    int dimBlock(CU1DBLOCK);
    int dimGrid(n_blocks(input.size(), CU1DBLOCK));

    cuda_matrix_add_elements(dimGrid, dimBlock, this->data_, this->Dim(),
                             alpha, (MatrixElement<Real>*)addr, input.size());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().Free(addr);
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    for (int32 i = 0; i < input.size(); i++) {
      (*this)(input[i].row, input[i].column) += alpha * input[i].weight;
    }
  }
}
void AddElements ( Real  alpha,
const CuArray< Int32Pair > &  indexes,
const Real *  input 
)

Definition at line 2982 of file cu-matrix.cc.

References CU1DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), Timer::Elapsed(), Int32Pair::first, KALDI_ASSERT, kaldi::kUndefined, and Int32Pair::second.

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha, const CuArray<Int32Pair> &indexes,
                                     const Real *input) {
  if (indexes.Dim() == 0) return;
  KALDI_ASSERT(input != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    CuVector<Real> tmp_vec(indexes.Dim(), kUndefined);
    CU_SAFE_CALL(cudaMemcpy(tmp_vec.Data(), input, indexes.Dim() * sizeof(Real),
                            cudaMemcpyHostToDevice));

    int dimBlock(CU1DBLOCK);
    int dimGrid = n_blocks(indexes.Dim(), CU1DBLOCK);
    cuda_matrix_add_indexed_values(dimGrid, dimBlock, this->Dim(), alpha,
                                   indexes.Data(), tmp_vec.Data(), indexes.Dim(), this->data_);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indexes.Data();
    for (int32 i = 0; i < indexes.Dim(); i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      (*this)(index[i].first, index[i].second) += alpha * input[i];
    }
  }
}
void AddMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A

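When trans == kTrans, A must have the transposed shape of *this, which is what the KALDI_ASSERTs in the listing check. Below is a CPU sketch on densely packed row-major vectors. `AddMatSketch` is a hypothetical name, and the stride is assumed to equal the number of columns (no padding).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch: dst += alpha * A, or dst += alpha * A^T when
// trans is true. dst is rows x cols; A is rows x cols (no trans) or
// cols x rows (trans). Both are row-major with no row padding.
void AddMatSketch(float alpha, const std::vector<float> &A, bool trans,
                  std::vector<float> &dst, int rows, int cols) {
  assert(dst.size() == static_cast<std::size_t>(rows) * cols);
  assert(A.size() == dst.size());
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      // A^T(i, j) == A(j, i), and A has `rows` columns when transposed.
      const float a = trans ? A[j * rows + i] : A[i * cols + j];
      dst[i * cols + j] += alpha * a;
    }
  }
}
```

For example, adding a 3x2 matrix {1, 2, 3, 4, 5, 6} with trans = true to a zero 2x3 matrix gives {1, 3, 5, 2, 4, 6}.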
Definition at line 938 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by RepeatedAffineComponent::Add(), NaturalGradientAffineComponent::Add(), AffineComponent::Add(), BlockAffineComponent::Add(), ConvolutionComponent::Add(), LstmNonlinearityComponent::Add(), Convolutional1dComponent::Add(), CuRand< Real >::AddGaussNoise(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApproxEqual(), SigmoidComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Splice::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), NnetComputer::ExecuteCommand(), RepeatedAffineComponent::PerturbParams(), AffineComponent::PerturbParams(), BlockAffineComponent::PerturbParams(), ConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), Convolutional1dComponent::PerturbParams(), SumReduceComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Rbm::RbmUpdate(), ClipGradientComponent::RepairGradients(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestLstmNonlinearity(), kaldi::nnet3::UnitTestNnetInputDerivatives(), LinearTransform::Update(), AffineTransform::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), LstmProjected::Update(), BlstmProjected::Update(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

template<typename Real>
void CuMatrixBase<Real>::AddMat(Real alpha, const CuMatrixBase<Real>& A,
                                MatrixTransposeType transA) {

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transA == kNoTrans) {
      KALDI_ASSERT(A.NumRows() == num_rows_ && A.NumCols() == num_cols_);
    } else {
      KALDI_ASSERT(A.NumCols() == num_rows_ && A.NumRows() == num_cols_);
    }
    if (num_rows_ == 0) return;
    Timer tim;
    // This block dimension seems to work better than the
    // one from GetBlockSizesForSimpleMatrixOperation().
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
                 n_blocks(NumRows(), CU2DBLOCK));
    cuda_add_mat(dimGrid, dimBlock, alpha, A.data_,
                 data_, Dim(), A.Stride(),
                 (transA == kTrans ? 1 : 0));
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddMat(alpha, A.Mat(), transA);
  }
}
void AddMatBlock ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuBlockMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

This function is like AddMatMat but for where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks).

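Because B is block-diagonal, block b multiplies only a contiguous range of A's columns and writes only a contiguous range of the output's columns. The CPU fallback in the listing walks the blocks with row_offset and col_offset for this reason. The sketch below shows that idea for the no-transpose case only. It uses a hypothetical `Dense` type and a hypothetical `AddMatBlockSketch` function rather than CuBlockMatrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix used only for this sketch.
struct Dense {
  int rows, cols;
  std::vector<float> v;
  Dense(int r, int c)
      : rows(r), cols(c), v(static_cast<std::size_t>(r) * c, 0.0f) {}
  float &operator()(int r, int c) { return v[r * cols + c]; }
  float operator()(int r, int c) const { return v[r * cols + c]; }
};

// Hypothetical sketch of C = beta * C + alpha * A * blockdiag(blocks...),
// with no transposes. Block b reads A's columns [row_offset, row_offset +
// rows_b) and writes C's columns [col_offset, col_offset + cols_b); each
// column of C is visited exactly once, so beta is applied once.
void AddMatBlockSketch(float alpha, const Dense &A,
                       const std::vector<Dense> &blocks, float beta,
                       Dense &C) {
  assert(A.rows == C.rows);
  int row_offset = 0, col_offset = 0;
  for (const Dense &Bb : blocks) {
    for (int i = 0; i < C.rows; ++i) {
      for (int j = 0; j < Bb.cols; ++j) {
        float sum = 0.0f;
        for (int k = 0; k < Bb.rows; ++k)
          sum += A(i, row_offset + k) * Bb(k, j);
        float &c = C(i, col_offset + j);
        c = beta * c + alpha * sum;
      }
    }
    row_offset += Bb.rows;
    col_offset += Bb.cols;
  }
  // The blocks must exactly tile A's columns and C's columns.
  assert(row_offset == A.cols && col_offset == C.cols);
}
```

This skips the zero off-diagonal regions of B entirely, which is the point of using a block-diagonal type instead of a dense AddMatMat.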
Definition at line 2876 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMatMat(), CuBlockMatrix< Real >::Block(), CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and kaldi::swap().

Referenced by kaldi::UnitTestCuBlockMatrixAddMatBlock().

{
  // Check dimensions
  int32 A_num_rows = A.NumRows(), A_num_cols = A.NumCols(),
      A_row_stride = A.Stride(), A_col_stride = 1,
      B_num_rows = B.NumRows(), B_num_cols = B.NumCols();
  if (transA == kTrans) {
    std::swap(A_num_rows, A_num_cols);
    std::swap(A_row_stride, A_col_stride);
  }
  if (transB == kTrans) {
    std::swap(B_num_rows, B_num_cols);
  }
  // At this point the {A,B}_{rows,cols} variables are
  // after any transposition.
  KALDI_ASSERT(NumRows() == A_num_rows && NumCols() == B_num_cols);
  KALDI_ASSERT(A_num_cols == B_num_rows);
  int32 B_num_blocks = B.NumBlocks();

  if (num_rows_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    MatrixDim this_dim = Dim();

    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    // (x,y) indices will be (row of *this, block of B)
    dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
                 n_blocks(B_num_blocks, CU2DBLOCK));

    // caution: the use of x as the row-index is not good, but
    // this code is not much used, so I'm not updating it.
    cuda_add_mat_blockmat(dimGrid, dimBlock, data_, this_dim, A.Data(),
                          A_num_rows, A_num_cols, A_row_stride, A_col_stride,
                          B.CuData(), B_num_blocks, alpha, beta,
                          (transB == kTrans ? 1 : 0));

    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    // "row_offset" and "col_offset" are offsets into B (or into B^T, if
    // transB == kTrans).
    int32 row_offset = 0, col_offset = 0;
    for (int32 b = 0; b < B_num_blocks; b++) {
      const CuSubMatrix<Real> this_block = B.Block(b);
      int32 this_num_rows = this_block.NumRows(),
          this_num_cols = this_block.NumCols();
      if (transB == kTrans) std::swap(this_num_rows, this_num_cols);
      CuSubMatrix<Real> this_part(*this, 0, num_rows_,
                                  col_offset, this_num_cols);
      CuSubMatrix<Real> A_part = (transA == kNoTrans ?
                                  CuSubMatrix<Real>(A, 0, num_rows_,
                                                    row_offset, this_num_rows) :
                                  CuSubMatrix<Real>(A, row_offset, this_num_rows,
                                                    0, num_rows_));
      this_part.AddMatMat(alpha, A_part, transA, this_block, transB, beta);
      row_offset += this_num_rows;
      col_offset += this_num_cols;
    }
    // Note: the values being compared below are all after applying any
    // transposition to B.
    KALDI_ASSERT(row_offset == B_num_rows && col_offset == B_num_cols);
  }
}
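The CPU branch above never forms the full block-diagonal B: each block multiplies only the matching column slice of A and writes only the matching column slice of *this. A minimal sketch of that loop for transA == transB == kNoTrans, on a hypothetical row-major `Mat` type (not Kaldi's), with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only (not Kaldi's type).
struct Mat {
  int rows, cols;
  std::vector<double> d;
  Mat(int r, int c) : rows(r), cols(c), d(static_cast<std::size_t>(r) * c, 0.0) {}
  double& at(int r, int c) { return d[static_cast<std::size_t>(r) * cols + c]; }
  double at(int r, int c) const { return d[static_cast<std::size_t>(r) * cols + c]; }
};

// Mirrors the CPU branch of AddMatBlock with no transposes:
//   C = beta * C + alpha * A * blockdiag(blocks[0], blocks[1], ...).
// Block b reads columns [row_offset, row_offset + b.rows) of A and writes only
// columns [col_offset, col_offset + b.cols) of C; the zero off-diagonal parts
// of B are never touched.
void AddMatBlockSketch(double alpha, const Mat& A, const std::vector<Mat>& blocks,
                       double beta, Mat* C) {
  assert(A.rows == C->rows);
  int row_offset = 0, col_offset = 0;
  for (const Mat& blk : blocks) {
    for (int r = 0; r < C->rows; r++) {
      for (int c = 0; c < blk.cols; c++) {
        double sum = 0.0;
        for (int k = 0; k < blk.rows; k++)
          sum += A.at(r, row_offset + k) * blk.at(k, c);
        double& out = C->at(r, col_offset + c);
        out = beta * out + alpha * sum;
      }
    }
    row_offset += blk.rows;
    col_offset += blk.cols;
  }
  // Same check as the final KALDI_ASSERT: the blocks must tile A's and C's columns.
  assert(row_offset == A.cols && col_offset == C->cols);
}
```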
void AddMatBlocks ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

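If A.NumRows() is a multiple of this->NumRows() and A.NumCols() is a multiple of this->NumCols(), divides A into blocks of the same size as *this and adds them (times alpha) to *this. If trans == kTrans, the divisibility is checked against the transposed dimensions and each block is transposed before being added.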

Definition at line 969 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), rnnlm::i, rnnlm::j, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTestCuMatrixAddMatBlocks(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

{
  if (num_rows_ == 0 || num_cols_ == 0) return;
  int32 num_row_blocks, num_col_blocks;
  if (transA == kNoTrans) {
    KALDI_ASSERT(A.NumRows() % num_rows_ == 0 && A.NumCols() % num_cols_ == 0);
    num_row_blocks = A.Mat().NumRows() / num_rows_;
    num_col_blocks = A.Mat().NumCols() / num_cols_;
  } else {
    KALDI_ASSERT(A.NumRows() % num_cols_ == 0 && A.NumCols() % num_rows_ == 0);
    num_row_blocks = A.Mat().NumRows() / num_cols_;
    num_col_blocks = A.Mat().NumCols() / num_rows_;
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_mat_blocks(dimGrid, dimBlock, alpha, A.data_, num_row_blocks,
                        num_col_blocks, data_, Dim(), A.Stride(),
                        (transA == kTrans ? 1 : 0));
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    int32 nr, nc;
    if (transA == kNoTrans) {
      nr = num_rows_;
      nc = num_cols_;
    } else {
      nr = num_cols_;
      nc = num_rows_;
    }
    for (int32 i = 0; i < num_row_blocks; i++) {
      for (int32 j = 0; j < num_col_blocks; j++) {
        Mat().AddMat(alpha, SubMatrix<Real>(A.Mat(), i * nr, nr, j * nc, nc),
                     transA);
      }
    }
  }
}
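The CPU branch above walks the grid of nr x nc tiles of A and adds each one into *this. A minimal sketch of the same summation for trans == kNoTrans, on contiguous row-major std::vector data (stride equal to the column count); the helper name is hypothetical:

```cpp
#include <cassert>
#include <vector>

// Sketch of AddMatBlocks with trans == kNoTrans. C is nr x nc; A is
// a_rows x a_cols, where a_rows must be a multiple of nr and a_cols a
// multiple of nc (the KALDI_ASSERT check). Every nr x nc tile of A,
// scaled by alpha, is added into C.
void AddMatBlocksSketch(double alpha, const std::vector<double>& A, int a_rows,
                        int a_cols, std::vector<double>* C, int nr, int nc) {
  assert(static_cast<int>(A.size()) == a_rows * a_cols);
  assert(static_cast<int>(C->size()) == nr * nc);
  assert(a_rows % nr == 0 && a_cols % nc == 0);
  for (int i = 0; i < a_rows / nr; i++)      // row-block index
    for (int j = 0; j < a_cols / nc; j++)    // column-block index
      for (int r = 0; r < nr; r++)
        for (int c = 0; c < nc; c++)
          (*C)[r * nc + c] += alpha * A[(i * nr + r) * a_cols + j * nc + c];
}
```

For a 2 x 4 matrix A and a 1 x 2 target, the four 1 x 2 tiles of A are summed into the target.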
void AddMatDiagVec ( const Real  alpha,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
CuVectorBase< Real > &  v,
Real  beta = 1.0 
)

Definition at line 1225 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by SigmoidComponent::RepairGradients(), and TanhComponent::RepairGradients().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transM == kNoTrans) {
      KALDI_ASSERT(SameDim(*this, M));
    } else {
      KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
    }
    KALDI_ASSERT(v.Dim() == this->NumCols());

    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
    if (transM == kTrans) std::swap(M_row_stride, M_col_stride);
    cuda_add_mat_diag_vec(dimGrid, dimBlock, alpha, data_, Dim(),
                          M.Data(), M_row_stride, M_col_stride, v.Data(), beta);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddMatDiagVec(alpha, M.Mat(), transM, v.Vec(), beta);
  }
}
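AddMatDiagVec has no description above. Judging from the dimension checks (v.Dim() == NumCols()) and the CPU fallback to MatrixBase::AddMatDiagVec, it appears to compute *this = beta * *this + alpha * M * diag(v), i.e. column c of M (or of M^T) is scaled by v[c]. The sketch below assumes exactly that, for transM == kNoTrans only, on contiguous row-major data; the helper name is hypothetical and this is an illustration, not a specification:

```cpp
#include <cassert>
#include <vector>

// Assumed semantics (transM == kNoTrans):
//   out = beta * out + alpha * M * diag(v),
// so element (r, c) becomes beta * out(r, c) + alpha * M(r, c) * v[c].
void AddMatDiagVecSketch(double alpha, const std::vector<double>& M,
                         const std::vector<double>& v, double beta,
                         std::vector<double>* out, int rows, int cols) {
  assert(static_cast<int>(v.size()) == cols);  // v.Dim() == NumCols()
  assert(static_cast<int>(M.size()) == rows * cols);
  assert(static_cast<int>(out->size()) == rows * cols);
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++) {
      double& o = (*out)[r * cols + c];
      o = beta * o + alpha * M[r * cols + c] * v[c];
    }
}
```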
void AddMatMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

C = alpha * A(^T)*B(^T) + beta * C.

Definition at line 1102 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kTrans, CuMatrixBase< Real >::Mat(), rnnlm::n, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by CuMatrixBase< Real >::AddMatBlock(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< BaseFloat >::AddMatSp(), CuMatrixBase< BaseFloat >::AddMatTp(), CuMatrixBase< BaseFloat >::AddSpMat(), CuMatrixBase< BaseFloat >::AddTpMat(), RepeatedAffineComponent::Backprop(), AffineComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), LinearTransform::BackpropagateFnc(), AffineTransform::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), AffineComponent::CollapseWithNext(), AffineComponent::CollapseWithPrevious(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), kaldi::CuVectorUnitTestAddDiagMatMat(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), AffineComponent::Propagate(), DctComponent::Propagate(), FixedLinearComponent::Propagate(), FixedAffineComponent::Propagate(), KlHmm::PropagateFnc(), LinearTransform::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), Rbm::RbmUpdate(), Rbm::Reconstruct(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::TestCuMatrixMatMat(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixSymAddMat2(), kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuSpMatrixInvert(), LinearTransform::Update(), AffineTransform::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), 
AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), AffineComponent::UpdateSimple(), and BlockAffineComponent::UpdateSimple().

{
  // CUBLAS is col-major, cudamatrix is row-major, how to do the mapping?
  // keep trans..., just swap A&B matrices: A->B B->A
  MatrixIndexT m = ((transB==kTrans)? B.NumRows() : B.NumCols());
  MatrixIndexT n = ((transA==kTrans)? A.NumCols() : A.NumRows());
  MatrixIndexT k = ((transB==kTrans)? B.NumCols() : B.NumRows());
  MatrixIndexT k1 = ((transA==kTrans)? A.NumRows() : A.NumCols());

  KALDI_ASSERT(m == NumCols());
  KALDI_ASSERT(n == NumRows());
  KALDI_ASSERT(k == k1);

  if (m == 0) return;

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    CU_SAFE_CALL(cublas_gemm(GetCublasHandle(),
                             (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
                             (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
                             m, n, k, alpha, B.data_, B.Stride(),
                             A.data_, A.Stride(), beta, data_, Stride()));

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddMatMat(alpha, A.Mat(), transA, B.Mat(), transB, beta);
  }
}
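The comment at the top of the listing explains the row-major/column-major mapping: a row-major buffer read as column-major is the transpose, so asking a column-major GEMM for C^T = B^T A^T (B passed first, m = NumCols(), n = NumRows()) produces C = A B in row-major order. A minimal CPU check of that identity, assuming no transposes; `ColMajorGemm` is a hypothetical naive stand-in for cublas_gemm, not the cuBLAS API:

```cpp
#include <cassert>
#include <vector>

// Naive column-major GEMM without transposes (hypothetical stand-in):
//   C(m x n) = alpha * X(m x k) * Y(k x n) + beta * C, all column-major.
void ColMajorGemm(int m, int n, int k, double alpha, const double* X, int ldx,
                  const double* Y, int ldy, double beta, double* C, int ldc) {
  for (int j = 0; j < n; j++)
    for (int i = 0; i < m; i++) {
      double sum = 0.0;
      for (int p = 0; p < k; p++) sum += X[i + p * ldx] * Y[p + j * ldy];
      C[i + j * ldc] = beta * C[i + j * ldc] + alpha * sum;
    }
}

// Row-major C (rows x cols) = alpha * A (rows x k) * B (k x cols) + beta * C,
// computed by swapping A and B, as AddMatMat does before calling cuBLAS.
void RowMajorAddMatMat(int rows, int cols, int k, double alpha,
                       const std::vector<double>& A, const std::vector<double>& B,
                       double beta, std::vector<double>* C) {
  assert(static_cast<int>(A.size()) == rows * k);
  assert(static_cast<int>(B.size()) == k * cols);
  assert(static_cast<int>(C->size()) == rows * cols);
  ColMajorGemm(/*m=*/cols, /*n=*/rows, k, alpha, B.data(), /*ldx=*/cols,
               A.data(), /*ldy=*/k, beta, C->data(), /*ldc=*/cols);
}
```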
void AddMatMatElements ( const Real  alpha,
const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
const Real  beta 
)

*this = beta * *this + alpha * A .* B, where .* denotes element-by-element multiplication.

Definition at line 1257 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(SameDim(*this, A) && SameDim(A, B));
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_mat_mat_elements(dimGrid, dimBlock, this->data_, A.Data(),
                              B.Data(), Dim(), A.Stride(), B.Stride(), alpha, beta);
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddMatMatElements(alpha, A.Mat(), B.Mat(), beta);
  }
}
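An element-wise sketch of AddMatMatElements, assuming all three matrices are stored contiguously (stride equal to the column count) so they can be treated as flat arrays; the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[i] = beta * out[i] + alpha * A[i] * B[i] for equally sized matrices.
void AddMatMatElementsSketch(double alpha, const std::vector<double>& A,
                             const std::vector<double>& B, double beta,
                             std::vector<double>* out) {
  assert(A.size() == B.size() && A.size() == out->size());  // SameDim checks
  for (std::size_t i = 0; i < out->size(); i++)
    (*out)[i] = beta * (*out)[i] + alpha * A[i] * B[i];
}
```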
void AddMatSp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuSpMatrix< Real > &  B,
const Real  beta 
)
inline

this <-- beta*this + alpha*A*B, where A is transposed if transA == kTrans and B is symmetric.

Definition at line 473 of file cu-matrix.h.

{
  CuMatrix<Real> M(B);
  return AddMatMat(alpha, A, transA, M, kNoTrans, beta);
}
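AddMatSp does not multiply by the packed symmetric matrix directly: it first builds a full CuMatrix from B and then calls AddMatMat. The sketch below shows what such an expansion involves. The packed layout (lower triangle stored row by row, so element (i, j) with j <= i sits at index i * (i + 1) / 2 + j) is an assumption of this sketch, and the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands an n x n symmetric matrix from an assumed row-wise lower-triangular
// packed layout into a full row-major matrix.
std::vector<double> UnpackSymmetric(const std::vector<double>& packed, int n) {
  assert(static_cast<int>(packed.size()) == n * (n + 1) / 2);
  std::vector<double> full(static_cast<std::size_t>(n) * n);
  for (int i = 0; i < n; i++)
    for (int j = 0; j <= i; j++) {
      double v = packed[i * (i + 1) / 2 + j];
      full[i * n + j] = v;  // lower triangle
      full[j * n + i] = v;  // mirrored upper triangle
    }
  return full;
}
```

The expansion costs an extra n x n allocation per call, which is the price of reusing the general AddMatMat path.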
void AddMatTp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuTpMatrix< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

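this <-- beta*this + alpha*A*B, where A is transposed if transA == kTrans and the triangular matrix B is transposed if transB == kTrans.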

Definition at line 500 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddMatTp().

{
  CuMatrix<Real> M(B);
  return AddMatMat(alpha, A, transA, M, transB, beta);
}
void AddRowRanges ( const CuMatrixBase< Real > &  src,
const CuArray< Int32Pair > &  indexes 
)

For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

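In general, the indexes must be >= 0, with indexes[r].first < src.NumRows() and indexes[r].second <= src.NumRows() (the end of each range is exclusive); but to represent an empty range you may use the pair (-1, -1) or any pair of numbers (i, j) such that i >= j.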

Definition at line 2602 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuArray< T >::Dim(), CuMatrixBase< Real >::Dim(), Timer::Elapsed(), Int32Pair::first, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), Int32Pair::second, and CuMatrixBase< Real >::stride_.

Referenced by StatisticsPoolingComponent::Backprop(), NnetComputer::ExecuteCommand(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixAddRowRanges().

{
  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
  KALDI_ASSERT(src.NumCols() == NumCols());
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_row_ranges(dimGrid, dimBlock,
                        data_, Dim(), src.Data(), src.Dim(), indexes.Data());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  { // Implement here for the CPU..
    int32 num_rows = this->num_rows_, num_cols = this->num_cols_,
        this_stride = this->stride_, src_stride = src.stride_;
    Real *data = this->data_;
    const Real *src_data = src.data_;
    const Int32Pair *indexes_data = indexes.Data();
    for (int32 row = 0; row < num_rows; row++) {
      int32 start_row = indexes_data[row].first,
          end_row = indexes_data[row].second;
      for (int32 col = 0; col < num_cols; col++) {
        Real sum = 0.0;
        for (int32 src_row = start_row; src_row < end_row; src_row++)
          sum += src_data[src_row * src_stride + col];
        data[row * this_stride + col] += sum;
      }
    }
  }
}
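A standalone version of the CPU branch above, on contiguous row-major std::vector data, with std::pair<int, int> standing in for Int32Pair (hypothetical helper name). A pair such as (-1, -1) makes the inner loop run zero times, which is why it represents an empty range:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// dst row r accumulates the sum of src rows [ranges[r].first, ranges[r].second).
void AddRowRangesSketch(const std::vector<double>& src, int num_cols,
                        const std::vector<std::pair<int, int>>& ranges,
                        std::vector<double>* dst) {
  assert(dst->size() == ranges.size() * num_cols);  // indexes.Dim() == NumRows()
  for (std::size_t r = 0; r < ranges.size(); r++)
    for (int c = 0; c < num_cols; c++) {
      double sum = 0.0;
      for (int s = ranges[r].first; s < ranges[r].second; s++)
        sum += src[s * num_cols + c];
      (*dst)[r * num_cols + c] += sum;
    }
}
```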
void AddRows ( Real  alpha,
const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

For each row r, does this.Row(r) += alpha * src.Row(indexes[r]).

If indexes[r] < 0, nothing is added to row r. src.NumCols() must equal this.NumCols().

Definition at line 2496 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), and NnetComputer::ExecuteCommand().

{
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
    KALDI_ASSERT(src.NumCols() == NumCols());
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_rows(dimGrid, dimBlock, alpha,
                  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddRows(alpha, src.Mat(), indexes.Data());
  }
}
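The index-based AddRows is a gather-and-accumulate. A minimal sketch on contiguous row-major std::vector data (hypothetical helper, not the Kaldi API), honoring the documented rule that a negative index leaves the row untouched:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each row r of dst: dst.Row(r) += alpha * src.Row(indexes[r]),
// skipping rows whose index is negative.
void AddRowsSketch(double alpha, const std::vector<double>& src, int num_cols,
                   const std::vector<int>& indexes, std::vector<double>* dst) {
  assert(dst->size() == indexes.size() * num_cols);  // indexes.Dim() == NumRows()
  for (std::size_t r = 0; r < indexes.size(); r++) {
    if (indexes[r] < 0) continue;  // negative index: row r is left unchanged
    for (int c = 0; c < num_cols; c++)
      (*dst)[r * num_cols + c] += alpha * src[indexes[r] * num_cols + c];
  }
}
```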
void AddRows ( Real  alpha,
const CuArray< const Real * > &  src 
)

For each row r, does this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats of the same length as this.NumCols().

Definition at line 2521 of file cu-matrix.cc.

References CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

{
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_rows(dimGrid, dimBlock, alpha, data_, src.Data(), Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddRows(alpha, src.Data());
  }
}
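A host-memory sketch of the pointer-based AddRows, assuming every src[r] points at num_cols readable values (hypothetical helper name). Since the sources are only read, this sketch lets the same pointer serve several rows:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row r of dst (row-major, num_cols wide) gets alpha * src[r][0..num_cols) added.
void AddRowsFromPointersSketch(double alpha, const std::vector<const double*>& src,
                               int num_cols, std::vector<double>* dst) {
  assert(dst->size() == src.size() * num_cols);  // src.Dim() == NumRows()
  for (std::size_t r = 0; r < src.size(); r++)
    for (int c = 0; c < num_cols; c++)
      (*dst)[r * num_cols + c] += alpha * src[r][c];
}
```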
void AddSpMat ( const Real  alpha,
const CuSpMatrix< Real > &  A,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

this <-- beta*this + alpha*SpA*B, where SpA is the symmetric matrix A and B is transposed if transB == kTrans.

Definition at line 482 of file cu-matrix.h.

{
  CuMatrix<Real> M(A);
  return AddMatMat(alpha, M, kNoTrans, B, transB, beta);
}
void AddToDiag ( Real  value)

Adds "value" to the diagonal elements of the matrix.

The matrix *this does not have to be square.

Definition at line 568 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, data_, and Timer::Elapsed().

Referenced by kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCholesky(), and kaldi::UnitTestCuMatrixAddToDiag().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;
    // We'll create a fake matrix with "num_diag" rows, one
    // column, and a stride of "this_stride". The y-value of
    // the grid/blocks corresponds to the row, in this kernel.
    MatrixIndexT num_diag = std::min(num_rows_, num_cols_),
        this_stride = stride_ + 1;
    dim3 dimBlock(1, CU1DBLOCK);
    dim3 dimGrid(1, n_blocks(num_diag, CU1DBLOCK));
    ::MatrixDim d = { num_diag, 1, this_stride };
    cuda_add(dimGrid, dimBlock, data_, value, d);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddToDiag(value);
  }
}
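The CUDA branch above works because, in row-major storage with row stride `stride`, element (i, i) sits at offset i * stride + i = i * (stride + 1). The diagonal is therefore a num_diag x 1 matrix with stride stride + 1, and a plain "add to every element" kernel over that view touches only the diagonal. A CPU sketch of the same addressing (hypothetical helper, not Kaldi code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// data holds a rows x cols row-major matrix with row stride `stride` (>= cols).
void AddToDiagSketch(std::vector<double>* data, int rows, int cols, int stride,
                     double value) {
  assert(stride >= cols);
  assert(static_cast<int>(data->size()) >= (rows - 1) * stride + cols);
  int num_diag = std::min(rows, cols);  // *this need not be square
  int diag_stride = stride + 1;         // "this_stride" in the listing
  for (int i = 0; i < num_diag; i++) (*data)[i * diag_stride] += value;
}
```

For a 2 x 3 matrix stored with row stride 4, the diagonal elements are at offsets 0 and 5, and the padding column is never touched.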
void AddToRows ( Real  alpha,
const CuArray< Real * > &  dst 
) const

For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, does not do anything for that row. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2542 of file cu-matrix.cc.

References CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

Referenced by NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixAddToRows().

{
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_to_rows(dimGrid, dimBlock, alpha, dst.Data(), data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddToRows(alpha, dst.Data());
  }
}
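AddToRows is the scatter counterpart of the pointer-based AddRows. A host-side sketch (hypothetical helper name): each source row, scaled by alpha, is added into dst[r], and null entries are skipped as documented. The no-overlap requirement is not checked here; presumably it exists because rows may be processed in parallel on the GPU, where overlapping targets could race.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// src is a row-major matrix num_cols wide with dst.size() rows.
void AddToRowsSketch(double alpha, const std::vector<double>& src, int num_cols,
                     const std::vector<double*>& dst) {
  assert(src.size() == dst.size() * num_cols);  // dst.Dim() == NumRows()
  for (std::size_t r = 0; r < dst.size(); r++) {
    if (dst[r] == nullptr) continue;  // NULL target: nothing is done for row r
    for (int c = 0; c < num_cols; c++) dst[r][c] += alpha * src[r * num_cols + c];
  }
}
```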
void AddTpMat ( const Real  alpha,
const CuTpMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

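this <-- beta*this + alpha*A*B, where the triangular matrix A is transposed if transA == kTrans and B is transposed if transB == kTrans.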

Definition at line 491 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddTpMat().

{
  CuMatrix<Real> M(A);
  return AddMatMat(alpha, M, transA, B, transB, beta);
}
void AddVecToCols ( Real  alpha,
const CuVectorBase< Real > &  col,
Real  beta = 1.0 
)

(for each column c of *this), c = alpha * col + beta * c

Definition at line 1043 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by KlHmm::PropagateFnc(), and kaldi::UnitTestCuMatrixAddVecToCols().

{
  if (col.Dim() != NumRows()) {
    KALDI_ERR << "Non matching dimensions: Rows:" << NumRows() << " VectorDim:" << col.Dim();
  }

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_vec_to_cols(dimGrid, dimBlock, alpha, col.data_, beta,
                         data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    if (beta != 1.0) Mat().Scale(beta);
    Mat().AddVecToCols(alpha, col.Vec());
  }
}
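Written element-wise, AddVecToCols computes M(r, c) = beta * M(r, c) + alpha * col[r]. A one-pass sketch on contiguous row-major data (hypothetical helper, where col.size() plays the role of NumRows()):

```cpp
#include <cassert>
#include <vector>

// Every column of M (rows x cols) becomes alpha * col + beta * column.
void AddVecToColsSketch(double alpha, const std::vector<double>& col, double beta,
                        std::vector<double>* M, int cols) {
  int rows = static_cast<int>(col.size());
  assert(static_cast<int>(M->size()) == rows * cols);
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++) {
      double& m = (*M)[r * cols + c];
      m = beta * m + alpha * col[r];
    }
}
```

The CPU branch above reaches the same result in two passes (Scale(beta), then MatrixBase::AddVecToCols), skipping the scaling when beta == 1.0.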
void AddVecToRows ( Real  alpha,
const CuVectorBase< Real > &  row,
Real  beta = 1.0 
)

(for each row r of *this), r = alpha * row + beta * r

Definition at line 1072 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), SimpleSentenceAveragingComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), DecodableNnetSimple::DoNnetComputation(), PerElementOffsetComponent::Propagate(), FixedAffineComponent::Propagate(), ConvolutionComponent::Propagate(), FixedBiasComponent::Propagate(), Convolutional1dComponent::Propagate(), SimpleSentenceAveragingComponent::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), AddShift::PropagateFnc(), Rbm::Reconstruct(), SigmoidComponent::RepairGradients(), RectifiedLinearComponent::RepairGradients(), PdfPrior::SubtractOnLogpost(), kaldi::UnitTestCuMatrixAddVecToRows(), and SentenceAveragingComponent::Update().

{
  if (row.Dim() != NumCols()) {
    KALDI_ERR << "Non matching dimensions: Cols:" << NumCols() << " VectorDim:" << row.Dim();
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_vec_to_rows(dimGrid, dimBlock, alpha, row.data_, beta, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    if (beta != 1.0) Mat().Scale(beta);
    Mat().AddVecToRows(alpha, row.Vec());
  }
}
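Element-wise, AddVecToRows computes M(r, c) = beta * M(r, c) + alpha * row[c]; with alpha = beta = 1 this adds the same vector (for example a bias) to every row, as several of the Propagate callers listed above appear to do. A sketch on contiguous row-major data (hypothetical helper, where row.size() plays the role of NumCols()):

```cpp
#include <cassert>
#include <vector>

// Every row of M (rows x cols) becomes alpha * row + beta * (that row).
void AddVecToRowsSketch(double alpha, const std::vector<double>& row, double beta,
                        std::vector<double>* M, int rows) {
  int cols = static_cast<int>(row.size());
  assert(static_cast<int>(M->size()) == rows * cols);
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++) {
      double& m = (*M)[r * cols + c];
      m = beta * m + alpha * row[c];
    }
}
```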
void AddVecVec ( Real  alpha,
const CuVectorBase< Real > &  x,
const CuVectorBase< Real > &  y 
)

A = alpha * x * y^T + A, where A is *this (a rank-one update).

Definition at line 1140 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, rnnlm::n, and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuMatrixAddVecVec().

{
  MatrixIndexT m = y.Dim();
  MatrixIndexT n = x.Dim();
  KALDI_ASSERT(m == NumCols());
  KALDI_ASSERT(n == NumRows());

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    CU_SAFE_CALL(cublas_ger(GetCublasHandle(), m, n, alpha,
                            y.Data(), 1, x.Data(), 1, data_, Stride()));

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddVecVec(alpha, x.Vec(), y.Vec());
  }
}
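AddVecVec passes y before x to cublas_ger for the same reason AddMatMat swaps A and B: column-major cuBLAS sees the row-major matrix as its transpose, and (x y^T)^T = y x^T. A CPU check of that mapping; `ColMajorGer` is a hypothetical naive stand-in for cublas_ger, not the cuBLAS API:

```cpp
#include <cassert>
#include <vector>

// Naive column-major rank-1 update (hypothetical stand-in):
//   M(i, j) += alpha * u[i] * w[j] for an m x n column-major M.
void ColMajorGer(int m, int n, double alpha, const double* u, const double* w,
                 double* M, int ldm) {
  for (int j = 0; j < n; j++)
    for (int i = 0; i < m; i++) M[i + j * ldm] += alpha * u[i] * w[j];
}

// Row-major A (x.size() x y.size()) += alpha * x * y^T, with y passed first.
void RowMajorAddVecVec(double alpha, const std::vector<double>& x,
                       const std::vector<double>& y, std::vector<double>* A) {
  int rows = static_cast<int>(x.size()), cols = static_cast<int>(y.size());
  assert(static_cast<int>(A->size()) == rows * cols);
  ColMajorGer(/*m=*/cols, /*n=*/rows, alpha, y.data(), x.data(), A->data(),
              /*ldm=*/cols);
}
```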
void ApplyCeiling ( Real  ceiling_val)

Definition at line 2335 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), kaldi::UnitTestCuMatrixApplyCeiling(), ParametricRelu::Update(), LstmProjected::Update(), and BlstmProjected::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_ceiling(dimGrid, dimBlock, data_, ceiling_val, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyCeiling(ceiling_val);
  }
}
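ApplyCeiling has no description above. Its CPU branch forwards to MatrixBase::ApplyCeiling, which this sketch assumes replaces each element x by min(x, ceiling_val); the helper name is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed semantics: clamp every element from above at ceiling_val.
void ApplyCeilingSketch(std::vector<double>* data, double ceiling_val) {
  for (double& x : *data) x = std::min(x, ceiling_val);
}
```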
void ApplyExp ( )

Definition at line 2298 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuMatrixApplyExp().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_exp(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyExp();
  }
}
void ApplyFloor ( Real  floor_val)

Definition at line 2317 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), main(), StatisticsPoolingComponent::Propagate(), RectifiedLinearComponent::Propagate(), SoftmaxComponent::Propagate(), LogSoftmaxComponent::Propagate(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyFloor(), kaldi::UnitTestCuMatrixObjfDeriv(), ParametricRelu::Update(), LstmProjected::Update(), BlstmProjected::Update(), and NnetLogprobTask::~NnetLogprobTask().

2317  {
2318 #if HAVE_CUDA == 1
2319  if (CuDevice::Instantiate().Enabled()) {
2320  Timer tim;
2321  dim3 dimGrid, dimBlock;
2322  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2323  &dimGrid, &dimBlock);
2324  cuda_apply_floor(dimGrid, dimBlock, data_, floor_val, Dim());
2325  CU_SAFE_CALL(cudaGetLastError());
2326  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2327  } else
2328 #endif
2329  {
2330  Mat().ApplyFloor(floor_val);
2331  }
2332 }
void ApplyHeaviside ( )

For each element, sets x = (x > 0 ? 1.0 : 0.0).

See also Heaviside().

Definition at line 2259 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by BackpropTruncationComponent::Backprop(), RectifiedLinearComponent::Backprop(), DropoutComponent::Propagate(), SigmoidComponent::RepairGradients(), TanhComponent::RepairGradients(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixHeaviside(), and kaldi::UnitTestCuMatrixApplyHeaviside().

2259  {
2260 #if HAVE_CUDA == 1
2261  if (CuDevice::Instantiate().Enabled()) {
2262  Timer tim;
2263  dim3 dimGrid, dimBlock;
2264  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2265  &dimGrid, &dimBlock);
2266  cuda_apply_heaviside(dimGrid, dimBlock, data_, Dim());
2267  CU_SAFE_CALL(cudaGetLastError());
2268  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2269  } else
2270 #endif
2271  {
2272  Mat().ApplyHeaviside();
2273  }
2274 }
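RectifiedLinearComponent::Backprop() is among the callers of ApplyHeaviside() listed above. The derivative of the rectifier max(0, x) is the Heaviside step of its output. A backward pass can therefore turn the forward output into a 0/1 mask and multiply it into the incoming gradient.

The sketch below assumes that pattern. It is plain C++ on std::vector. `ApplyHeavisideSketch()` and `ReluBackpropSketch()` are hypothetical names, not the component's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise Heaviside step as documented: x = (x > 0 ? 1 : 0).
// An element that is exactly 0 maps to 0.
void ApplyHeavisideSketch(std::vector<float> *x) {
  for (float &v : *x) v = (v > 0.0f ? 1.0f : 0.0f);
}

// Hypothetical ReLU backward pass: d/dx max(0, x) is the step of the output,
// so the input gradient is the output gradient masked by that step.
std::vector<float> ReluBackpropSketch(const std::vector<float> &relu_out,
                                      const std::vector<float> &out_deriv) {
  assert(relu_out.size() == out_deriv.size());
  std::vector<float> in_deriv(relu_out);
  ApplyHeavisideSketch(&in_deriv);  // 0/1 mask from the forward output
  for (std::size_t i = 0; i < in_deriv.size(); i++) in_deriv[i] *= out_deriv[i];
  return in_deriv;
}
```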
void ApplyLog ( )

Definition at line 630 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), Xent::Eval(), main(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyLog(), kaldi::UnitTestCuMatrixObjfDeriv(), and NnetLogprobTask::~NnetLogprobTask().

630  {
631  #if HAVE_CUDA == 1
632  if (CuDevice::Instantiate().Enabled()) {
633  if (num_rows_ == 0) return;
634  Timer tim;
635 
636  dim3 dimGrid, dimBlock;
637  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
638  &dimGrid, &dimBlock);
639 
640  cuda_apply_log(dimGrid, dimBlock, data_, Dim());
641  CU_SAFE_CALL(cudaGetLastError());
642 
643  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
644  } else
645  #endif
646  {
647  Mat().ApplyLog();
648  }
649 }
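ApplyFloor() and ApplyLog() share several callers listed on this page, for example DecodableAmNnetParallel::Compute() and NnetLogprobTask::~NnetLogprobTask(). A typical reason to pair them is to floor probabilities at a small positive value before taking logs, so that zeros do not become -inf. We have not verified that this is what those particular callers do.

A minimal sketch of the combination follows, assuming plain std::vector storage. `FloorThenLogSketch()` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Floors each element at floor_val (x = max(x, floor_val), as ApplyFloor does
// per element), then takes the natural log in place (as ApplyLog does).
// With floor_val > 0, every result is finite.
void FloorThenLogSketch(std::vector<double> *probs, double floor_val) {
  assert(floor_val > 0.0);
  for (double &p : *probs) {
    if (p < floor_val) p = floor_val;
    p = std::log(p);
  }
}
```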
void ApplyLogSoftMaxPerRow ( const CuMatrixBase< Real > &  src)

LogSoftmax nonlinearity Y = LogSoftmax(X): Yij = Xij - log(sum_k(e^Xik)), applied to each row of src, with the result written to *this. For each row, the maximum value is first subtracted, for numerical stability.

Definition at line 1547 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Propagate(), kaldi::TestCuMatrixLogSoftmax(), and kaldi::UnitTestCuLogSoftmax().

1547  {
1548  KALDI_ASSERT(SameDim(*this, src));
1549 #if HAVE_CUDA == 1
1550  if (CuDevice::Instantiate().Enabled()) {
1551  Timer tim;
1552  size_t dimBlock = CU1DBLOCK;
1553  size_t dimGrid = src.num_rows_;
1554  cuda_log_softmax_reduce(dimGrid, dimBlock,
1555  data_, src.data_, Dim(), src.Stride());
1556  CU_SAFE_CALL(cudaGetLastError());
1557 
1558  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1559  } else
1560 #endif
1561  {
1562  MatrixBase<Real> &mat(this->Mat());
1563  mat.CopyFromMat(src.Mat());
1564  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1565  mat.Row(r).ApplyLogSoftMax();
1566  }
1567  }
1568 }
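In the GPU branch, cuda_log_softmax_reduce is launched with one block of CU1DBLOCK threads per row (dimGrid = src.num_rows_). The CPU-only sketch below spells out the max-subtraction for a dense row-major matrix. It is not Kaldi code, and `LogSoftmaxPerRowSketch()` is a hypothetical name.

Let m_i be the row maximum. Then Xij - m_i - log(sum_k exp(Xik - m_i)) equals Xij - log(sum_k exp(Xik)). The difference is that every exponent is at most zero, so nothing overflows.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Row-wise log-softmax on a dense row-major matrix with `cols` columns:
// y_ij = x_ij - m_i - log(sum_k exp(x_ik - m_i)), where m_i is the row max.
std::vector<double> LogSoftmaxPerRowSketch(const std::vector<double> &x,
                                           int cols) {
  assert(cols > 0 && x.size() % cols == 0);
  std::vector<double> y(x.size());
  const int rows = static_cast<int>(x.size()) / cols;
  for (int r = 0; r < rows; r++) {
    const double *in = &x[r * cols];
    double *out = &y[r * cols];
    const double max_val = *std::max_element(in, in + cols);
    double sum = 0.0;
    for (int c = 0; c < cols; c++) sum += std::exp(in[c] - max_val);
    const double log_sum = std::log(sum);
    for (int c = 0; c < cols; c++) out[c] = in[c] - max_val - log_sum;
  }
  return y;
}
```

Without the subtraction, exp(1000) would overflow a double. With it, a row of {1000, 1000} yields {-log 2, -log 2}.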
void ApplyPow ( Real  power)

Raises each element to the given power, i.e. x = pow(x, power).

To apply the power to the absolute value of each element instead, use ApplyPowAbs(). If include_sign is true, its result is multiplied by the sign of the input value. If the power is negative and the input is zero, its output is set to zero.

Definition at line 2223 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by TanhComponent::Backprop(), kaldi::MeanVariance(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), TanhComponent::StoreStats(), kaldi::UnitTestCuMatrixApplyPow(), kaldi::UnitTestCuMatrixSetRandn(), and kaldi::UnitTestCuMatrixSetRandUniform().

2223  {
2224 #if HAVE_CUDA == 1
2225  if (CuDevice::Instantiate().Enabled()) {
2226  Timer tim;
2227  dim3 dimGrid, dimBlock;
2228  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2229  &dimGrid, &dimBlock);
2230  cuda_apply_pow(dimGrid, dimBlock, data_, power, Dim());
2231  CU_SAFE_CALL(cudaGetLastError());
2232  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2233  } else
2234 #endif
2235  {
2236  Mat().ApplyPow(power);
2237  }
2238 }
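Below is a minimal sketch of the element-wise semantics, x = pow(x, power), in plain C++ rather than Kaldi code. `ApplyPowSketch()` is a hypothetical name.

Because the sign is not removed first, a negative element raised to a non-integer power gives NaN from std::pow in this sketch. Kaldi's own CPU and GPU paths may handle such cases differently, so do not rely on this detail.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Element-wise x = pow(x, power). The sign is kept, so e.g. (-2)^2 = 4,
// but (-4)^0.5 is NaN under std::pow.
void ApplyPowSketch(std::vector<double> *x, double power) {
  for (double &v : *x) v = std::pow(v, power);
}
```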
void ApplyPowAbs ( Real  power,
bool  include_sign = false 
)

Definition at line 2241 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by PowerComponent::Backprop(), PowerComponent::Propagate(), ClipGradientComponent::RepairGradients(), and kaldi::UnitTestCuMatrixApplyPowAbs().

2241  {
2242 #if HAVE_CUDA == 1
2243  if (CuDevice::Instantiate().Enabled()) {
2244  Timer tim;
2245  dim3 dimGrid, dimBlock;
2246  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2247  &dimGrid, &dimBlock);
2248  cuda_apply_pow_abs(dimGrid, dimBlock, data_, power, include_sign, Dim());
2249  CU_SAFE_CALL(cudaGetLastError());
2250  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2251  } else
2252 #endif
2253  {
2254  Mat().ApplyPowAbs(power, include_sign);
2255  }
2256 }
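Going by the documented behaviour, ApplyPowAbs() computes y = |x|^power. The result is negated for negative inputs when include_sign is true. When the power is negative and the input is zero, the output is 0, where |x|^power would otherwise be +inf.

The sketch below is plain C++ with a hypothetical helper name, not the Kaldi kernel.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// y = |x|^power; negated for negative x when include_sign is true;
// forced to 0 when power < 0 and x == 0 instead of producing +inf.
void ApplyPowAbsSketch(std::vector<double> *x, double power,
                       bool include_sign = false) {
  for (double &v : *x) {
    if (power < 0.0 && v == 0.0) {
      v = 0.0;
      continue;
    }
    const double magnitude = std::pow(std::fabs(v), power);
    v = (include_sign && v < 0.0) ? -magnitude : magnitude;
  }
}
```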
void ApplySoftMaxPerRow ( const CuMatrixBase< Real > &  src)

Softmax nonlinearity Y = Softmax(X): Yij = e^Xij / sum_k(e^Xik), applied to each row of src, with the result written to *this. For each row, the maximum value is first subtracted, for numerical stability.

Definition at line 1524 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SoftmaxComponent::Propagate(), Softmax::PropagateFnc(), HiddenSoftmax::PropagateFnc(), BlockSoftmax::PropagateFnc(), kaldi::TestCuMatrixSoftmax(), and kaldi::UnitTestCuSoftmax().

1524  {
1525  KALDI_ASSERT(SameDim(*this, src));
1526 #if HAVE_CUDA == 1
1527  if (CuDevice::Instantiate().Enabled()) {
1528  Timer tim;
1529  size_t dimBlock = CU1DBLOCK;
1530  size_t dimGrid = src.num_rows_;
1531  cuda_softmax_reduce(dimGrid, dimBlock, data_, src.data_, Dim(), src.Stride());
1532  CU_SAFE_CALL(cudaGetLastError());
1533 
1534  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1535  } else
1536  #endif
1537  {
1538  MatrixBase<Real> &mat(this->Mat());
1539  mat.CopyFromMat(src.Mat());
1540  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1541  mat.Row(r).ApplySoftMax();
1542  }
1543  }
1544 }
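The same max-subtraction applies to softmax. Let m_i be the row maximum. Since exp(Xij - m_i) / sum_k exp(Xik - m_i) = exp(Xij) / sum_k exp(Xik), subtracting m_i leaves the result unchanged while keeping every exponent at most zero.

Below is a CPU-only sketch in plain C++. It is not Kaldi code, and `SoftmaxPerRowSketch()` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Row-wise softmax on a dense row-major matrix with `cols` columns:
// y_ij = exp(x_ij - m_i) / sum_k exp(x_ik - m_i), where m_i is the row max.
std::vector<double> SoftmaxPerRowSketch(const std::vector<double> &x,
                                        int cols) {
  assert(cols > 0 && x.size() % cols == 0);
  std::vector<double> y(x.size());
  const int rows = static_cast<int>(x.size()) / cols;
  for (int r = 0; r < rows; r++) {
    const double *in = &x[r * cols];
    double *out = &y[r * cols];
    const double max_val = *std::max_element(in, in + cols);
    double sum = 0.0;
    for (int c = 0; c < cols; c++) {
      out[c] = std::exp(in[c] - max_val);  // exponent is always <= 0
      sum += out[c];
    }
    for (int c = 0; c < cols; c++) out[c] /= sum;  // normalize the row
  }
  return y;
}
```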
bool ApproxEqual ( const CuMatrixBase< Real > &  other,
float  tol = 0.01 
) const

True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()

Definition at line 1934 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMat(), and CuMatrixBase< Real >::FrobeniusNorm().

Referenced by kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuCopy().

1935  {
1936  CuMatrix<Real> diff(*this);
1937  diff.AddMat(-1.0, other);
1938  return (diff.FrobeniusNorm() <= tol * (*this).FrobeniusNorm());
1939 }
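The test in ApproxEqual() is relative to the norm of *this only, so it is not symmetric in its two arguments. The sketch below restates it on flat std::vector storage. It is not Kaldi code, and `FrobeniusNormSketch()` and `ApproxEqualSketch()` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Frobenius norm of a matrix stored as a flat vector: sqrt(sum of squares).
double FrobeniusNormSketch(const std::vector<double> &m) {
  double sum = 0.0;
  for (double v : m) sum += v * v;
  return std::sqrt(sum);
}

// Same test as the listing: ||A - B||_F <= tol * ||A||_F. Only A's norm
// appears on the right-hand side, so swapping the arguments can change
// the answer.
bool ApproxEqualSketch(const std::vector<double> &a,
                       const std::vector<double> &b, double tol = 0.01) {
  assert(a.size() == b.size());
  std::vector<double> diff(a);
  for (std::size_t i = 0; i < diff.size(); i++) diff[i] -= b[i];
  return FrobeniusNormSketch(diff) <= tol * FrobeniusNormSketch(a);
}
```

For example, with tol = 1.0, comparing {100} against {0} passes. Comparing {0} against {100} fails, because the right-hand side becomes 1.0 * 0.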
void Cholesky ( CuMatrixBase< Real > *  inv_cholesky = NULL)

Sets *this to its Cholesky factor, i.e. the lower-triangular C satisfying *this = C C^T, and sets *inv_cholesky (if supplied) to the inverse of C. *this is treated as a symmetric matrix, but only its lower triangle is accessed.

Definition at line 1785 of file cu-matrix.cc.

References TpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), CuSpMatrix< Real >::CopyFromMat(), CuTpMatrix< Real >::CopyFromTp(), CuMatrixBase< Real >::CopyFromTp(), TpMatrix< Real >::Invert(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTakeLower, kaldi::kTrans, kaldi::kUndefined, and CuMatrixBase< Real >::SymAddMat2().

Referenced by CuTpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), kaldi::TestCuMatrixCholesky(), kaldi::UnitTestCholesky(), and kaldi::UnitTestCuCholesky().

1785  {
1786  KALDI_ASSERT(this->NumRows() == this->NumCols());
1787  const int32 block_size = 64; // We can tune this.
1788 #if HAVE_CUDA == 1
1789  bool have_gpu = CuDevice::Instantiate().Enabled();
1790 #else
1791  bool have_gpu = false;
1792 #endif
1793  if (this->NumRows() == 0) {
1794  return;
1795  }
1796  if (inv_cholesky == NULL && this->NumRows() >= block_size * 2 && have_gpu) {
1797  // Even if the user did not request the inverse Cholesky, for large enough
1798  // matrices (on GPUs) it's going to be more efficient to compute it anyway
1799  // as the recursion depends on it.
1800  CuMatrix<Real> inv(this->NumRows(), this->NumCols());
1801  Cholesky(&inv);
1802  return;
1803  }
1804  if (this->NumRows() <= block_size || inv_cholesky == NULL || !have_gpu) {
1805  // Don't recurse: compute the Cholesky (and inverse Cholesky, if requested)
1806  // directly, on the CPU.
1807  int32 dim = this->NumRows();
1808  CuSpMatrix<Real> this_sp(dim, kUndefined);
1809  this_sp.CopyFromMat(*this, kTakeLower);
1810  SpMatrix<Real> this_sp_cpu(this_sp);
1811  TpMatrix<Real> C_cpu(dim);
1812  C_cpu.Cholesky(this_sp_cpu);
1813  CuTpMatrix<Real> C(C_cpu);
1814  this->CopyFromTp(C);
1815  if (inv_cholesky != NULL) {
1816  C_cpu.Invert(); // Get inverse Cholesky on CPU.
1817  C.CopyFromTp(C_cpu);
1818  inv_cholesky->CopyFromTp(C); // Copy inverse Cholesky from CPU.
1819  }
1820  return;
1821  }
1822  // At this point, if none of the other cases apply, we recurse.
1823 
1824  // The selection of dim1 is a heuristic. We could also just take half.
1825  int32 tot_dim = this->NumRows();
1826  int32 dim1;
1827  // Break it up into a whole number of blocks, for better memory alignment.
1828  // The line below, setting dim1 can be decided on a heuristic basis: from
1829  // the point of view of correctness, it can really be any value
1830  // 0 < dim1 < tot_dim.
1831  dim1 = block_size * std::max<int32>(1, tot_dim / (2 * block_size));
1832 
1833  int32 dim2 = tot_dim - dim1;
1834  CuSubMatrix<Real> this_11(*this, 0, dim1, 0, dim1),
1835  this_12(*this, 0, dim1, dim1, dim2),
1836  this_21(*this, dim1, dim2, 0, dim1),
1837  this_22(*this, dim1, dim2, dim1, dim2);
1838  CuSubMatrix<Real> inv_11(*inv_cholesky, 0, dim1, 0, dim1),
1839  inv_12(*inv_cholesky, 0, dim1, dim1, dim2),
1840  inv_21(*inv_cholesky, dim1, dim2, 0, dim1),
1841  inv_22(*inv_cholesky, dim1, dim2, dim1, dim2);
1842  /*
1843  Here is the math on block-wise Cholesky. We'll use a Matlab-like notation for blocks of a matrix,
1844  e.g. [ A B; C D ], and also for transposes, e.g. A' is the transpose of A.
1845  Let A be the input matrix; we want to compute both its Cholesky L and its inverse Cholesky, which
1846  we'll call M.
1847  OK. let L = [ L11 0; L21 L22 ] be the Cholesky factor of A.
1848  We have A = L L' = [ L11 0; L21 L22 ] * [ L11' L21'; 0 L22' ]. Multiplying it out,
1849  if A = [ A11 A12; A21 A22 ]; then
1850  A11 = L11 L11', A21 = L21 L11', A22 = L21 L21' + L22 L22', and A12 = A21'.
1851 
1852  We also want an expression for the inverse of L (we call this M).
1853  If M = [ M11 0; M21 M22 ], then it's not hard to see that
1854  M11 = inv(L11), M22 = inv(L22).
1855  We can work out M21 as follows. We know that [ L11 0; L21 L22 ] [ M11 0; M21 M22 ] = [ I 0; 0 I ].
1856  Considering the zero on the bottom of the rhs, we have: L21 M11 + L22 M21 = 0, which gives us:
1857  M21 = - L22^{-1} L21 M11 = - M22 L21 M11.
1858 
1859  Next, we want expressions for L21 and L22. From the equation A21 = L21 L11', we have:
1860  L21 = A21 inv(L11') = A21 M11'
1861  We can compute L22 and M22 recursively by doing Cholesky (and computing the inverse Cholesky)
1862  on the quantity T = (A22 - L21 L21'). [we give it the name T just for easy reference.]
1863 
1864  Computationally, we do this as follows:
1865  (1) Recurse to get L11 and M11.
1866  (2) Compute L21 = A21 M11'
1867  (3) Compute T = A22 - L21 L21'
1868  (4) Recurse on T to get L22 and M22.
1869  (5) Compute M21 = -M22 L21 M11.
1870  Next, we have to consider the in-place nature of the computation, since L overwrites A
1871  [M has its own storage, in "inv_cholesky"].
1872  We address this here:
1873  (1) is in-place [L11 replaces A11, M11 has its own storage].
1874  (2) L21 gets written where M21 belongs.
1875  (3) T replaces A22.
1876  (4) is in-place [L22 replaces T where A22 was, M22 has its own storage]
1877  (5):(a) we first compute the transpose of (L21 M11) in the upper part of A/L,
1878  where A12 or L12 would be. Define a temporary expression
1879  U = (L21 M11)' = M11' L21'; this goes where A12 or L12 would be.
1880  (b) copy L21 to where it should be, in *this.
1881  (c) Compute M21 = -M22 U', in the correct place for M21.
1882  (d) zero L12 and M12. */
1883 
1884  // (1) compute L11 and M11.
1885  this_11.Cholesky(&inv_11);
1886  // (2) compute L21 = A21 M11'. For now it's in the "wrong place", where M21 should be.
1887  inv_21.AddMatMat(1.0, this_21, kNoTrans, inv_11, kTrans, 0.0);
1888  // (3) compute T = A22 - L21 L21'. Note: only the lower triangle of T will be valid, but
1889  // that's OK because Cholesky will ignore the upper part.
1890  this_22.SymAddMat2(-1.0, inv_21, kNoTrans, 1.0);
1891  // (4) Recurse to compute L22 and M22.
1892  this_22.Cholesky(&inv_22);
1893  // (5)(a) compute U = M11' L21'. We use the storage of this_12 for this. Note that L21 is
1894  // currently where M21 should be.
1895  this_12.AddMatMat(1.0, inv_11, kTrans, inv_21, kTrans, 0.0);
1896  // (5)(b) copy L21 to where it should be.
1897  this_21.CopyFromMat(inv_21);
1898  // (5)(c) compute M21 = -M22 U'.
1899  inv_21.AddMatMat(-1.0, inv_22, kNoTrans, this_12, kTrans, 0.0);
1900  // (5)(d) zero L12 and M12.
1901  this_12.SetZero();
1902  inv_12.SetZero();
1903 }
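The block recursion in steps (1) to (5) can be checked with a small dense sketch. The code below is plain C++, not Kaldi code. `DenseMat`, `Zeros()`, `MatMul()`, `Block()`, `Put()`, `CholeskyBase()` and `CholeskyRecursive()` are all hypothetical names.

It differs from the listing in two ways. It allocates fresh storage for every block instead of reusing the A12 and M21 slots in place. It also recurses whenever the dimension exceeds `block_size`, regardless of GPU availability. It chooses dim1 with the same heuristic, block_size * max(1, tot_dim / (2 * block_size)).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense matrix type: a vector of equal-length rows.
using DenseMat = std::vector<std::vector<double>>;

DenseMat Zeros(int r, int c) { return DenseMat(r, std::vector<double>(c, 0.0)); }

// Returns alpha * op(A) * op(B), where op() transposes when its flag is true.
DenseMat MatMul(const DenseMat &A, bool trans_a, const DenseMat &B,
                bool trans_b, double alpha) {
  const int n = static_cast<int>(trans_a ? A[0].size() : A.size());
  const int k = static_cast<int>(trans_a ? A.size() : A[0].size());
  const int m = static_cast<int>(trans_b ? B.size() : B[0].size());
  DenseMat C = Zeros(n, m);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      for (int p = 0; p < k; p++)
        C[i][j] += alpha * (trans_a ? A[p][i] : A[i][p]) *
                   (trans_b ? B[j][p] : B[p][j]);
  return C;
}

// Copies out the nr x nc block of A whose top-left corner is (r0, c0).
DenseMat Block(const DenseMat &A, int r0, int nr, int c0, int nc) {
  DenseMat B = Zeros(nr, nc);
  for (int i = 0; i < nr; i++)
    for (int j = 0; j < nc; j++) B[i][j] = A[r0 + i][c0 + j];
  return B;
}

// Writes B into *A with its top-left corner at (r0, c0).
void Put(const DenseMat &B, int r0, int c0, DenseMat *A) {
  for (std::size_t i = 0; i < B.size(); i++)
    for (std::size_t j = 0; j < B[i].size(); j++) (*A)[r0 + i][c0 + j] = B[i][j];
}

// Unblocked base case: L with A = L L' (only the lower triangle of A is
// read), then M = inv(L) by forward substitution on L M = I.
void CholeskyBase(const DenseMat &A, DenseMat *L, DenseMat *M) {
  const int n = static_cast<int>(A.size());
  *L = Zeros(n, n);
  *M = Zeros(n, n);
  DenseMat &l = *L, &m = *M;
  for (int j = 0; j < n; j++) {
    double d = A[j][j];
    for (int k = 0; k < j; k++) d -= l[j][k] * l[j][k];
    assert(d > 0.0);  // A must be positive definite
    l[j][j] = std::sqrt(d);
    for (int i = j + 1; i < n; i++) {
      double s = A[i][j];
      for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
      l[i][j] = s / l[j][j];
    }
  }
  for (int c = 0; c < n; c++)
    for (int i = c; i < n; i++) {
      double s = (i == c) ? 1.0 : 0.0;
      for (int k = c; k < i; k++) s -= l[i][k] * m[k][c];
      m[i][c] = s / l[i][i];
    }
}

// Block recursion following steps (1)-(5) of the comment in the listing.
void CholeskyRecursive(const DenseMat &A, int block_size, DenseMat *L,
                       DenseMat *M) {
  assert(block_size >= 1);
  const int n = static_cast<int>(A.size());
  if (n <= block_size) {
    CholeskyBase(A, L, M);
    return;
  }
  const int d1 = block_size * std::max(1, n / (2 * block_size)), d2 = n - d1;
  DenseMat L11, M11, L22, M22;
  CholeskyRecursive(Block(A, 0, d1, 0, d1), block_size, &L11, &M11);     // (1)
  DenseMat L21 = MatMul(Block(A, d1, d2, 0, d1), false, M11, true, 1.0);  // (2)
  DenseMat T = Block(A, d1, d2, d1, d2);  // (3) T = A22 - L21 L21'
  const DenseMat L21L21t = MatMul(L21, false, L21, true, 1.0);
  for (int i = 0; i < d2; i++)
    for (int j = 0; j < d2; j++) T[i][j] -= L21L21t[i][j];
  CholeskyRecursive(T, block_size, &L22, &M22);  // (4)
  const DenseMat M21 =  // (5) M21 = -M22 L21 M11
      MatMul(MatMul(M22, false, L21, false, -1.0), false, M11, false, 1.0);
  *L = Zeros(n, n);
  *M = Zeros(n, n);
  Put(L11, 0, 0, L); Put(L21, d1, 0, L); Put(L22, d1, d1, L);
  Put(M11, 0, 0, M); Put(M21, d1, 0, M); Put(M22, d1, d1, M);
}
```

A quick way to validate the block formulas is to check that L L' reproduces A and that L M is the identity. A small diagonally dominant symmetric 5 x 5 matrix with block_size = 2 works well here, because it exercises two levels of recursion.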
CuSubMatrix<Real> ColRange ( const MatrixIndexT  col_offset,
const MatrixIndexT  num_cols 
) const
inline

Definition at line 524 of file cu-matrix.h.

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxpoolingComponent::Backprop(), BlockAffineComponent::Backprop(), ConvolutionComponent::Backprop(), Convolutional1dComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MaxPooling2DComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), MultiTaskLoss::Eval(), LstmProjected::InfoGradient(), BlstmProjected::InfoGradient(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), MaxpoolingComponent::Propagate(), BlockAffineComponent::Propagate(), ConvolutionComponent::Propagate(), Convolutional1dComponent::Propagate(), AveragePoolingComponent::PropagateFnc(), MaxPoolingComponent::PropagateFnc(), AveragePooling2DComponent::PropagateFnc(), MaxPooling2DComponent::PropagateFnc(), BlockSoftmax::PropagateFnc(), FramePoolingComponent::PropagateFnc(), ParallelComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), MultiBasisComponent::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::UnitTestLstmNonlinearity(), FramePoolingComponent::Update(), SentenceAveragingComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

525  {
526  return CuSubMatrix<Real>(*this, 0, num_rows_, col_offset, num_cols);
527  }
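ColRange() returns a CuSubMatrix spanning all num_rows_ rows and the num_cols columns that start at col_offset. CuSubMatrix is a non-owning view, so writes through the result modify the parent's memory.

Below is a minimal sketch of how such a column view can work, assuming a row-major layout with a row stride. `MatrixView` and `ColRangeSketch()` are hypothetical names, not Kaldi's classes. Only the start pointer and the column count change; the parent's stride is kept.

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-owning view: shares the parent's storage and row stride.
struct MatrixView {
  float *data;
  int rows, cols, stride;
  float &At(int r, int c) { return data[r * stride + c]; }
};

// All rows, columns [col_offset, col_offset + num_cols): the view starts
// col_offset elements into row 0 and keeps the parent's stride.
MatrixView ColRangeSketch(MatrixView parent, int col_offset, int num_cols) {
  assert(col_offset >= 0 && num_cols >= 0 &&
         col_offset + num_cols <= parent.cols);
  return MatrixView{parent.data + col_offset, parent.rows, num_cols,
                    parent.stride};
}
```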
void CopyColFromVec ( const CuVectorBase< Real > &  v,
const MatrixIndexT  col 
)

Copies vector v into column col of this matrix. v.Dim() must equal NumRows(), and col must lie in the range [0, NumCols()).

Definition at line 2202 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by kaldi::cu::NormalizePerRow(), StatisticsExtractionComponent::Propagate(), and NaturalGradientRepeatedAffineComponent::Update().

2203  {
2204  KALDI_ASSERT(v.Dim() == num_rows_ &&
2205  static_cast<UnsignedMatrixIndexT>(col) <
2206  static_cast<UnsignedMatrixIndexT>(num_cols_));
2207 #if HAVE_CUDA == 1
2208  if (CuDevice::Instantiate().Enabled()) {
2209  Timer tim;
2210  cublas_copy(GetCublasHandle(),
2211  v.Dim(), v.Data(), 1,
2212  this->data_ + col, this->stride_);
2213  CU_SAFE_CALL(cudaGetLastError());
2214  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2215  } else
2216 #endif
2217  {
2218  Mat().CopyColFromVec(v.Vec(), col);
2219  }
2220 }
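The GPU branch is a BLAS-style strided copy into one column. It calls cublas_copy with an input increment of 1 and an output pointer of data_ + col advanced by stride_.

The assertion casts col and num_cols_ to UnsignedMatrixIndexT. A negative col then wraps to a large unsigned value, so a single comparison also rejects it.

Both ideas are sketched below in plain C++. `ColInRange()`, `StridedCopy()` and `CopyColFromVecSketch()` are hypothetical names, not Kaldi's cuBLAS wrapper.

```cpp
#include <cassert>
#include <vector>

// Single-comparison range check, as in the listing's assertion: a negative
// col becomes a large unsigned value, so `col < num_cols` fails for it too.
bool ColInRange(int col, int num_cols) {
  return static_cast<unsigned>(col) < static_cast<unsigned>(num_cols);
}

// BLAS-style strided copy: y[i * incy] = x[i * incx] for i in [0, n).
void StridedCopy(int n, const float *x, int incx, float *y, int incy) {
  for (int i = 0; i < n; i++) y[i * incy] = x[i * incx];
}

// Copies v into column col of a row-major matrix whose rows are `stride`
// elements apart: start at data + col and step by stride.
void CopyColFromVecSketch(const std::vector<float> &v, int col, int num_rows,
                          int num_cols, int stride, std::vector<float> *data) {
  assert(static_cast<int>(v.size()) == num_rows && ColInRange(col, num_cols));
  StridedCopy(num_rows, v.data(), 1, data->data() + col, stride);
}
```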
void CopyCols ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Copies column r of this matrix from column indexes[r] of src.

As a special case, if indexes[r] == -1, column r is set to zero. indexes.Dim() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().

Definition at line 2386 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumGroupComponent::Backprop(), PermuteComponent::Backprop(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), PermuteComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionalComponent::PropagateFnc(), and Convolutional1dComponent::Update().

2387  {
2388 #if HAVE_CUDA == 1
2389  if (CuDevice::Instantiate().Enabled()) {
2390  KALDI_ASSERT(indices.Dim() == NumCols());
2391  KALDI_ASSERT(NumRows() == src.NumRows());
2392  Timer tim;
2393  dim3 dimGrid, dimBlock;
2394  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2395  &dimGrid, &dimBlock);
2396  cuda_copy_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(), Dim(), src.Stride());
2397  CU_SAFE_CALL(cudaGetLastError());
2398  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2399  } else
2400 #endif
2401  {
2402  Mat().CopyCols(src.Mat(), indices.Data());
2403  }
2404 }
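Below is a dense, CPU-only sketch of the documented behaviour, including the -1 convention. It is plain C++ on std::vector, and `CopyColsSketch()` is a hypothetical name. It sizes its output from indexes rather than writing into an existing matrix. The same source column may be selected more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column c of the result is column indexes[c] of src (rows x src_cols,
// row-major), or all zeros when indexes[c] == -1.
std::vector<float> CopyColsSketch(const std::vector<float> &src, int rows,
                                  int src_cols,
                                  const std::vector<int> &indexes) {
  assert(static_cast<int>(src.size()) == rows * src_cols);
  const int dst_cols = static_cast<int>(indexes.size());
  std::vector<float> dst(static_cast<std::size_t>(rows) * dst_cols, 0.0f);
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < dst_cols; c++) {
      const int j = indexes[c];
      assert(j >= -1 && j < src_cols);
      if (j != -1) dst[r * dst_cols + c] = src[r * src_cols + j];
    }
  return dst;
}
```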
void CopyColsFromVec ( const CuVectorBase< Real > &  v)

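Copies vector v into this matrix, column by column.

v.Dim() must equal either NumRows()*NumCols() or NumRows(), which gives two modes of operation. In the first, v is treated as the concatenation of the columns: the first NumRows() elements form column 0, and so on. In the second, v is copied into every column.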

Definition at line 2164 of file cu-matrix.cc.

References MatrixDim_::cols, CU1DBLOCK, CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ERR, MatrixDim_::rows, and CuVectorBase< Real >::Vec().

Referenced by DropoutComponent::Propagate(), and kaldi::UnitTestCuMatrixCopyColsFromVec().

2164  {
2165 #if HAVE_CUDA == 1
2166  if (CuDevice::Instantiate().Enabled()) {
2167  Timer tim;
2168  if (rv.Dim() == num_rows_ * num_cols_) {
2169  // treat rv as a matrix of the size (num_cols x num_rows_)
2170  // and use transposed copy to fill *this
2171  // see CuMatrixBase<Real>::CopyFromMat() for more detail of the impl
2172  MatrixDim rv_dim = { num_cols_, num_rows_, num_rows_ };
2173  const int32 warpSize = 32;
2174  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2175  dim3 dimGrid(n_blocks(rv_dim.cols, warpSize),
2176  n_blocks(rv_dim.rows, warpSize));
2177  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, rv.Data(), Dim(),
2178  rv_dim);
2179  CU_SAFE_CALL(cudaGetLastError());
2180  } else if (rv.Dim() == num_rows_) {
2181  // use 2D block (8x32) and large enough grid to cover matrix *this
2182  // dimBlock.x need to be at least warpSize for coalesced memory access.
2183  const int32 warpSize = 32;
2184  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2185  dim3 dimGrid(n_blocks(num_cols_, dimBlock.x),
2186  n_blocks(num_rows_, dimBlock.y));
2187  cuda_copy_cols_from_vec(dimGrid, dimBlock, Data(), Dim(), rv.Data());
2188  CU_SAFE_CALL(cudaGetLastError());
2189  } else {
2190  KALDI_ERR<< "Wrong sized arguments";
2191  }
2192  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2193  } else
2194 #endif
2195  {
2196  Mat().CopyColsFromVec(rv.Vec());
2197  }
2198 }
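Below is a dense, CPU-only sketch of the two modes. It is plain C++ with a hypothetical helper name, and it throws where the listing calls KALDI_ERR.

The first mode is effectively a transposed copy. That is why the GPU branch reuses cuda_copy_from_mat_trans, treating v as a NumCols() x NumRows() matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fills a dense row-major rows x cols matrix *m from v:
//  * v.size() == rows * cols: v holds the columns back to back, so element r
//    of segment c becomes m[r][c] (a transposed copy);
//  * v.size() == rows: every column receives a copy of v.
void CopyColsFromVecSketch(const std::vector<float> &v, int rows, int cols,
                           std::vector<float> *m) {
  assert(rows >= 0 && cols >= 0);
  m->assign(static_cast<std::size_t>(rows) * cols, 0.0f);
  if (v.size() == static_cast<std::size_t>(rows) * cols) {
    for (int c = 0; c < cols; c++)
      for (int r = 0; r < rows; r++) (*m)[r * cols + c] = v[c * rows + r];
  } else if (v.size() == static_cast<std::size_t>(rows)) {
    for (int r = 0; r < rows; r++)
      for (int c = 0; c < cols; c++) (*m)[r * cols + c] = v[r];
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```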
void CopyFromBlock ( const CuBlockMatrix< Real > &  B,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 158 of file cu-matrix.cc.

References CuBlockMatrix< Real >::Block(), CuMatrixBase< Real >::CopyFromMat(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), and CuMatrixBase< Real >::NumRows().

Referenced by CuMatrix< Real >::CuMatrix().

159  {
160  this->SetZero();
161  if (trans == kNoTrans) {
162  KALDI_ASSERT(NumRows() == B.NumRows() && NumCols() == B.NumCols());
163  int32 row_offset = 0, col_offset = 0;
164  for (int32 b = 0; b < B.NumBlocks(); b++) {
165  const CuMatrixBase<Real> &block = B.Block(b);
166  int32 num_rows = block.NumRows(), num_cols = block.NumCols();
167  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
168  col_offset, num_cols);
169  this_block.CopyFromMat(block);
170  row_offset += num_rows;
171  col_offset += num_cols;
172  }
173  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
174  } else {
175  KALDI_ASSERT(NumRows() == B.NumCols() && NumCols() == B.NumRows());
176  int32 row_offset = 0, col_offset = 0;
177  for (int32 b = 0; b < B.NumBlocks(); b++) {
178  const CuMatrixBase<Real> &block = B.Block(b);
179  int32 num_rows = block.NumCols(), num_cols = block.NumRows();
180  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
181  col_offset, num_cols);
182  this_block.CopyFromMat(block, kTrans);
183  row_offset += num_rows;
184  col_offset += num_cols;
185  }
186  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
187  }
188 }
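CopyFromBlock() zeroes *this and places each block of B on the diagonal. With trans == kTrans it places each block's transpose instead. This works because the transpose of a block-diagonal matrix is the block-diagonal matrix of the transposed blocks.

Below is a plain C++ sketch. The `Dense` struct and `CopyFromBlockSketch()` are hypothetical, not Kaldi's CuBlockMatrix API. The sketch sizes its output from the blocks instead of asserting against an existing matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix.
struct Dense {
  int rows, cols;
  std::vector<double> data;  // rows * cols elements
  double At(int r, int c) const { return data[r * cols + c]; }
  double &At(int r, int c) { return data[r * cols + c]; }
};

// Places each block (or its transpose) on the diagonal, advancing the row and
// column offsets by the placed block's size; all other elements stay zero.
Dense CopyFromBlockSketch(const std::vector<Dense> &blocks, bool transpose) {
  int total_rows = 0, total_cols = 0;
  for (const Dense &b : blocks) {
    total_rows += transpose ? b.cols : b.rows;
    total_cols += transpose ? b.rows : b.cols;
  }
  Dense out{total_rows, total_cols,
            std::vector<double>(
                static_cast<std::size_t>(total_rows) * total_cols, 0.0)};
  int row_offset = 0, col_offset = 0;
  for (const Dense &b : blocks) {
    const int nr = transpose ? b.cols : b.rows;
    const int nc = transpose ? b.rows : b.cols;
    for (int i = 0; i < nr; i++)
      for (int j = 0; j < nc; j++)
        out.At(row_offset + i, col_offset + j) =
            transpose ? b.At(j, i) : b.At(i, j);
    row_offset += nr;
    col_offset += nc;
  }
  assert(row_offset == out.rows && col_offset == out.cols);
  return out;
}
```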
void CopyFromGeneralMat ( const GeneralMatrix src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 2767 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CopyToMat(), SparseMatrix< Real >::CopyToMat(), GeneralMatrix::GetFullMatrix(), GeneralMatrix::GetMatrix(), GeneralMatrix::GetSparseMatrix(), KALDI_ERR, kaldi::kCompressedMatrix, kaldi::kFullMatrix, kaldi::kSparseMatrix, and GeneralMatrix::Type().

Referenced by NnetComputer::AcceptInputs(), and kaldi::nnet3::ComputeObjectiveFunction().

2768  {
2769  switch (src.Type()) {
2770  case kFullMatrix: {
2771  const Matrix<BaseFloat> &src_full_mat = src.GetFullMatrix();
2772  this->CopyFromMat(src_full_mat, trans);
2773  return;
2774  }
2775  case kCompressedMatrix: {
2776  Matrix<BaseFloat> mat;
2777  src.GetMatrix(&mat);
2778  this->CopyFromMat(mat, trans);
2779  return;
2780  }
2781  case kSparseMatrix: {
2782  const SparseMatrix<BaseFloat> &smat = src.GetSparseMatrix();
2783 #if HAVE_CUDA == 1
2784  if (CuDevice::Instantiate().Enabled()) {
2785  // only take this branch if we're actually using CUDA, or it would
2786  // entail a wasteful copy of the sparse matrix.
2787  CuSparseMatrix<BaseFloat> cu_smat(smat);
2788  cu_smat.CopyToMat(this, trans);
2789  return;
2790  }
2791 #endif
2792  smat.CopyToMat(&(Mat()), trans);
2793  return;
2794  }
2795  default:
2796  KALDI_ERR << "Invalid GeneralMatrix type.";
2797  }
2798 }
void CopyFromMat ( const MatrixBase< OtherReal > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 337 of file cu-matrix.cc.

Referenced by ElementwiseProductComponent::Backprop(), SumReduceComponent::Backprop(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), ScaleComponent::Backprop(), FixedScaleComponent::Backprop(), FixedBiasComponent::Backprop(), NoOpComponent::Backprop(), ClipGradientComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), Softmax::BackpropagateFnc(), HiddenSoftmax::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), AddShift::BackpropagateFnc(), Rescale::BackpropagateFnc(), BlockAffineComponent::BlockAffineComponent(), NnetOnlineComputer::Compute(), CuMatrixBase< Real >::CopyFromBlock(), CuBlockMatrix< Real >::CopyFromMat(), GeneralMatrix::CopyToMat(), CuMatrix< Real >::CuMatrix(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), NnetComputer::ExecuteCommand(), NnetRescaler::FormatInput(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), main(), kaldi::nnet2::NnetComputation(), kaldi::cu::NormalizePerRow(), CuMatrix< BaseFloat >::operator=(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditionerSimple::PreconditionDirections(), OnlineNaturalGradientSimple::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), ElementwiseProductComponent::Propagate(), SumReduceComponent::Propagate(), BackpropTruncationComponent::Propagate(), PowerComponent::Propagate(), RectifiedLinearComponent::Propagate(), ScaleComponent::Propagate(), NoOpComponent::Propagate(), 
ClipGradientComponent::Propagate(), SpliceMaxComponent::Propagate(), PerElementScaleComponent::Propagate(), PerElementOffsetComponent::Propagate(), FixedScaleComponent::Propagate(), FixedBiasComponent::Propagate(), AdditiveNoiseComponent::Propagate(), KlHmm::PropagateFnc(), ParallelComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), AddShift::PropagateFnc(), Rescale::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), kaldi::TestCuFindRowMaxId(), kaldi::TestCuMatrixTransposeCross(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::TestSymInvertPosDef(), kaldi::UnitInvert(), kaldi::UnitTestCheck(), kaldi::UnitTestCholesky(), kaldi::UnitTestConstructor(), kaldi::UnitTestCopyFromMat(), kaldi::UnitTestCopySp(), kaldi::UnitTestCuCopy(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuFindRowMaxId(), kaldi::UnitTestCuLogSoftmax(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddMatMatBatched(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyFromMat(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), 
kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuSoftmax(), kaldi::UnitTestCuTanh(), kaldi::UnitTestCuVectorAddColSumMat(), kaldi::UnitTestCuVectorAddColSumMatLarge(), kaldi::UnitTestCuVectorAddRowSumMat(), kaldi::UnitTestCuVectorAddRowSumMatLarge(), kaldi::UnitTestInvert(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), BlockAffineComponentPreconditioned::Update(), and MatrixRandomizer::Value().

{
  CuMatrix<OtherReal> temp(src);
  this->CopyFromMat(temp, trans);
}
void CopyFromMat ( const MatrixBase< Real > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 309 of file cu-matrix.cc.

References MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (trans == kNoTrans) {
      KALDI_ASSERT(src.NumRows() == num_rows_ && src.NumCols() == num_cols_);
      Timer tim;

      MatrixIndexT dst_pitch = stride_*sizeof(Real);
      MatrixIndexT src_pitch = src.Stride()*sizeof(Real);
      MatrixIndexT width = src.NumCols()*sizeof(Real);
      CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, src.Data(), src_pitch,
                                width, src.NumRows(), cudaMemcpyHostToDevice));

      CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from CPU)",tim.Elapsed());
    } else {
      CuMatrix<Real> trans_mat(src); // Do the transpose on the GPU board.
      this->CopyFromMat(trans_mat, kTrans);
    }
  } else
#endif
  {
    Mat().CopyFromMat(src, trans);
  }
}
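Both this host-to-device branch and the device-to-device branch of the next overload rely on row-major storage with padding. Each row occupies Stride() elements, of which only NumCols() hold data, so the copy is expressed as a pitched 2-D copy whose pitches and width are measured in bytes. The following host-only sketch mirrors that arithmetic. `Copy2DPitched` and `CopyStridedMatrix` are hypothetical helpers that stand in for cudaMemcpy2D; they are not part of Kaldi or CUDA. `float` stands in for Real.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side analogue of cudaMemcpy2D: copies `height` rows of
// `width_bytes` bytes each, advancing the destination by `dst_pitch` bytes and
// the source by `src_pitch` bytes per row. Padding bytes are not touched.
void Copy2DPitched(void *dst, std::size_t dst_pitch,
                   const void *src, std::size_t src_pitch,
                   std::size_t width_bytes, std::size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t r = 0; r < height; ++r)
    std::memcpy(d + r * dst_pitch, s + r * src_pitch, width_bytes);
}

// Copies a num_rows x num_cols float matrix between two strided buffers,
// computing the pitches and the width in bytes as the listing does.
void CopyStridedMatrix(float *dst, int dst_stride,
                       const float *src, int src_stride,
                       int num_rows, int num_cols) {
  assert(num_cols <= dst_stride && num_cols <= src_stride);
  Copy2DPitched(dst, dst_stride * sizeof(float),
                src, src_stride * sizeof(float),
                num_cols * sizeof(float), num_rows);
}
```

Only `width` bytes are written per row, so any padding between NumCols() and Stride() in the destination keeps its previous contents.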
void CopyFromMat ( const CuMatrixBase< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 205 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

{
  if (sizeof(Real) == sizeof(OtherReal) &&
      static_cast<const void*>(M.Data()) ==
      static_cast<const void*>(this->Data())) {
    if (M.Data() == NULL)
      return;
    // CopyFromMat called on same data. Nothing to do (except sanity checks)
    KALDI_ASSERT(trans == kNoTrans && M.NumRows() == NumRows() &&
                 M.NumCols() == NumCols() && M.Stride() == Stride());
    return;
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (trans == kNoTrans) {
      KALDI_ASSERT(M.NumRows() == num_rows_ && M.NumCols() == num_cols_);
    } else {
      KALDI_ASSERT(M.NumCols() == num_rows_ && M.NumRows() == num_cols_);
    }
    if (M.num_rows_ == 0) return; // Nothing to do.
    Timer tim;
    if (sizeof(Real) == sizeof(OtherReal) && trans == kNoTrans ) {
      MatrixIndexT dst_pitch = stride_ * sizeof(Real);
      MatrixIndexT src_pitch = M.Stride() * sizeof(Real);
      MatrixIndexT width = M.NumCols() * sizeof(Real);
      CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, M.data_, src_pitch,
                                width, M.num_rows_, cudaMemcpyDeviceToDevice));
    } else {
      if (trans == kNoTrans) {
        dim3 dimGrid, dimBlock;
        GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                              &dimGrid, &dimBlock);
        cuda_copy_from_mat(dimGrid, dimBlock, data_, M.data_, Dim(), M.Dim());
      } else {
        // 2D thread block with warps (blockDim.x) along the row-dim of input M.
        // Each (8x32) thread block will transpose (32x32) data
        const int32 warpSize = 32;
        dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
        dim3 dimGrid(n_blocks(M.NumCols(), warpSize),
                     n_blocks(M.NumRows(), warpSize));
        cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, M.data_, Dim(),
                                 M.Dim());
      }
      CU_SAFE_CALL(cudaGetLastError());
    }
    CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from other CuMatrixBase)", tim.Elapsed());
  } else
#endif
  {
    Mat().CopyFromMat(M.Mat(), trans);
  }
}
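The transpose branch launches a 32x8 thread block for each 32x32 tile of M. The grid is therefore ceil(M.NumCols()/32) by ceil(M.NumRows()/32), and each thread handles four elements of its tile. The sketch below reproduces that tiling on the CPU so the index arithmetic can be checked. It makes three assumptions: `CeilDiv` stands in for n_blocks (taken to be a rounding-up division), CU1DBLOCK is taken to be 256 (consistent with the listing's "8x32" comment), and the loop order is illustrative only. It does not claim to reproduce how cuda_copy_from_mat_trans is written.

```cpp
#include <cassert>
#include <vector>

constexpr int kTile = 32;      // tile edge, matching warpSize in the listing
constexpr int kBlockRows = 8;  // CU1DBLOCK / warpSize, assuming CU1DBLOCK == 256

// Rounding-up division; assumed to match what n_blocks() computes.
int CeilDiv(int size, int block) { return (size + block - 1) / block; }

// CPU emulation of a tiled transpose: dst (cols x rows) = src (rows x cols)^T.
// Both matrices are row-major with explicit strides.
void TransposeTiled(const float *src, int rows, int cols, int src_stride,
                    float *dst, int dst_stride) {
  const int grid_x = CeilDiv(cols, kTile);  // tiles along the columns of src
  const int grid_y = CeilDiv(rows, kTile);  // tiles along the rows of src
  for (int by = 0; by < grid_y; ++by)
    for (int bx = 0; bx < grid_x; ++bx)
      // One (kTile x kBlockRows) "thread block": tx walks along a row of src,
      // and each ty covers kTile / kBlockRows rows of the tile.
      for (int ty = 0; ty < kBlockRows; ++ty)
        for (int tx = 0; tx < kTile; ++tx)
          for (int k = ty; k < kTile; k += kBlockRows) {
            const int r = by * kTile + k;   // row in src
            const int c = bx * kTile + tx;  // column in src
            if (r < rows && c < cols)       // edge tiles are partial
              dst[c * dst_stride + r] = src[r * src_stride + c];
          }
}
```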
void CopyFromSp ( const CuSpMatrix< Real > &  M)

Definition at line 353 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuSpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrix< BaseFloat >::CuMatrix(), CuSpMatrix< Real >::Invert(), and kaldi::TestCuMatrixCopyFromSp().

{
  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
  if (num_rows_ == 0)
    return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(NumRows(), CU2DBLOCK),
                 n_blocks(NumRows(), CU2DBLOCK));
    cuda_copy_from_sp(dimGrid, dimBlock, M.Data(), data_, Dim());
    CuDevice::Instantiate().AccuProfile("CuMatrix::CopyFromSp",tim.Elapsed());
  } else
#endif
  {
    Mat().CopyFromSp(M.Mat());
  }
}
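CopyFromSp expands a packed symmetric matrix into full square storage; the assertion requires *this to be square with M.NumRows() rows. The CPU sketch below assumes the packed buffer holds the lower triangle row by row, so that element (i, j) with j <= i is at index i*(i+1)/2 + j. That layout is an assumption here, and `PackedIndex` and `UnpackSymmetric` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of element (i, j), j <= i, in a row-major packed lower triangle
// (assumed layout).
inline std::size_t PackedIndex(int i, int j) {
  return static_cast<std::size_t>(i) * (i + 1) / 2 + j;
}

// Expands an n x n symmetric matrix from packed lower-triangular storage into
// a full row-major matrix with the given stride.
void UnpackSymmetric(const std::vector<float> &packed, int n,
                     float *dst, int stride) {
  assert(packed.size() == static_cast<std::size_t>(n) * (n + 1) / 2);
  assert(n <= stride);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j <= i; ++j) {
      const float v = packed[PackedIndex(i, j)];
      dst[i * stride + j] = v;  // lower triangle, including the diagonal
      dst[j * stride + i] = v;  // mirrored into the upper triangle
    }
}
```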
template<typename OtherReal> void CopyFromTp ( const CuTpMatrix< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 275 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, CuTpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrixBase< Real >::Cholesky(), CuMatrix< BaseFloat >::CuMatrix(), CuTpMatrix< Real >::Invert(), kaldi::TestCuMatrixCopyFromTp(), and kaldi::UnitTestCuMatrixCopyFromTp().

{
  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
  if (num_rows_ == 0)
    return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    if (trans == kNoTrans) {
      cuda_copy_from_tp(dimGrid, dimBlock, data_, M.Data(), Dim());
    } else {
      cuda_copy_from_tp_trans(dimGrid, dimBlock, data_, M.Data(), Dim());
    }
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyFromTp(M.Mat(), trans);
  }
}
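With trans == kNoTrans, the packed lower-triangular matrix is expanded as-is; with kTrans, its transpose (which is upper-triangular) is written instead. The CPU sketch below illustrates that behaviour under the same packed-layout assumption as for CopyFromSp (element (i, j) with j <= i at index i*(i+1)/2 + j). It also assumes that the unused triangle of *this ends up zero; the listing does not show this. `UnpackTriangular` is a hypothetical name, and the `transpose` flag stands in for MatrixTransposeType.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands an n x n packed lower-triangular matrix L into full row-major
// storage. If transpose is false the result is L (lower-triangular); if true
// it is L^T (upper-triangular). The other triangle is set to zero.
void UnpackTriangular(const std::vector<float> &packed, int n, bool transpose,
                      float *dst, int stride) {
  assert(packed.size() == static_cast<std::size_t>(n) * (n + 1) / 2);
  assert(n <= stride);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) {
      // dst(i, j) = L(r, c), where (r, c) is (i, j) or (j, i).
      const int r = transpose ? j : i, c = transpose ? i : j;
      // L is zero above its diagonal (c > r).
      dst[i * stride + j] =
          (c <= r) ? packed[static_cast<std::size_t>(r) * (r + 1) / 2 + c] : 0.0f;
    }
}
```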
void CopyLowerToUpper ( )

Definition at line 2640 of file cu-matrix.cc.

References CU2DBLOCK, data_, Timer::Elapsed(), and KALDI_ASSERT.

Referenced by kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuMatrixCopyLowerToUpper().

{
  KALDI_ASSERT(num_cols_ == num_rows_);
  if (num_rows_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    int32 dim = num_rows_;
    dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
                 n_blocks(dim, CU2DBLOCK));
    cuda_copy_low_upp(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyLowerToUpper();
  }
}
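CopyLowerToUpper operates on a square matrix and makes it symmetric by overwriting every element above the diagonal with its mirror image from below. CopyUpperToLower, further down this page, does the opposite. Here is a CPU sketch on a strided row-major buffer; `CopyLowerToUpperCpu` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// For a square n x n row-major matrix, sets A(i, j) = A(j, i) for all j > i,
// i.e. copies the lower triangle onto the upper one. The diagonal is unchanged.
void CopyLowerToUpperCpu(float *a, int n, int stride) {
  assert(n <= stride);
  for (int i = 0; i < n; ++i)
    for (int j = i + 1; j < n; ++j)
      a[i * stride + j] = a[j * stride + i];
}
```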
void CopyRows ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Copies row r from row indexes[r] of src.

As a special case, if indexes[r] < 0, row r is set to zero. src.NumCols() must equal this.NumCols().

Definition at line 2408 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), SpliceComponent::Backprop(), NnetComputer::ExecuteCommand(), main(), DistributeComponent::Propagate(), and SpliceMaxComponent::Propagate().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(indices.Dim()) == NumRows());
    KALDI_ASSERT(NumCols() == src.NumCols());

    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
                   Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyRows(src.Mat(), indices.Data());
  }
}
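CopyRows(src, indexes) is a row gather: the same source row may be selected more than once, and a negative index produces a zero row. The CPU sketch below follows that contract. `int32_t` stands in for MatrixIndexT (an int32), and `CopyRowsByIndex` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Row r of dst becomes row indexes[r] of src, or all zeros if indexes[r] < 0.
// dst has indexes.size() rows; both matrices are row-major with num_cols
// columns and their own strides.
void CopyRowsByIndex(const float *src, int src_rows, int src_stride,
                     const std::vector<int32_t> &indexes,
                     float *dst, int dst_stride, int num_cols) {
  const int num_rows = static_cast<int>(indexes.size());
  for (int r = 0; r < num_rows; ++r) {
    float *dst_row = dst + r * dst_stride;
    const int32_t idx = indexes[r];
    assert(idx < src_rows);  // an out-of-range index is a caller error
    for (int c = 0; c < num_cols; ++c)
      dst_row[c] = (idx < 0) ? 0.0f : src[idx * src_stride + c];
  }
}
```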
void CopyRows ( const CuArray< const Real * > &  src)

Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

src.Dim() must equal this.NumRows(), and if any src[r] is NULL, this.Row(r) is set to zero.

Definition at line 2453 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

{
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyRows(src.Data());
  }
}
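The pointer-array overload is also a gather. Each src[r] is just the start of NumCols() values, so different rows can come from different matrices or vectors. The CPU sketch below implements the documented behaviour, including the rule that a NULL pointer yields a zero row. `CopyRowsFromPointers` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Row r of dst is filled from the num_cols floats starting at src_rows[r],
// or set to zero if src_rows[r] is null.
void CopyRowsFromPointers(const std::vector<const float *> &src_rows,
                          float *dst, int dst_stride, int num_cols) {
  assert(num_cols <= dst_stride);
  for (int r = 0; r < static_cast<int>(src_rows.size()); ++r) {
    float *dst_row = dst + r * dst_stride;
    for (int c = 0; c < num_cols; ++c)
      dst_row[c] = src_rows[r] ? src_rows[r][c] : 0.0f;
  }
}
```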
void CopyRowsFromVec ( const CuVectorBase< Real > &  v)

This function has two modes of operation.

If v.Dim() == NumRows() * NumCols(), it treats v as the row-by-row concatenation of a matrix and copies it into *this. If v.Dim() == NumCols(), it sets each row of *this to a copy of v. Any other dimension is an error.

Definition at line 2096 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by kaldi::CuVectorUnitTestCopyFromMat(), NnetOnlineComputer::Flush(), NnetRescaler::FormatInput(), RepeatedAffineComponent::Propagate(), ConstantComponent::Propagate(), FixedAffineComponent::Propagate(), AffineComponent::Propagate(), BlockAffineComponent::Propagate(), ConstantFunctionComponent::Propagate(), LinearTransform::SetParams(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), kaldi::UnitTestCuMatrixCopyRowsFromVec(), RepeatedAffineComponent::UnVectorize(), AffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), and LstmNonlinearityComponent::UnVectorize().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    if (v.Dim() == num_rows_*num_cols_) {
      if (stride_ == num_cols_) {
        const Real* v_data = v.Data();
        CU_SAFE_CALL(cudaMemcpy(data_, v_data,
                                sizeof(Real)*num_rows_*num_cols_,
                                cudaMemcpyDeviceToDevice));
      } else {
        CU_SAFE_CALL(cudaMemcpy2D(data_, stride_ * sizeof(Real), v.Data(),
                                  num_cols_*sizeof(Real), num_cols_*sizeof(Real),
                                  num_rows_,
                                  cudaMemcpyDeviceToDevice));
      }
    } else if (v.Dim() == num_cols_) {
      dim3 dimGrid, dimBlock;
      GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                            &dimGrid, &dimBlock);
      cuda_copy_rows_from_vec(dimGrid, dimBlock, data_, this->Dim(), v.Data());
      CU_SAFE_CALL(cudaGetLastError());
    } else {
      KALDI_ERR << "Wrong sized arguments";
    }
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyRowsFromVec(v.Vec());
  }
}
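The two modes can be sketched on the CPU as follows, together with the stride handling that the listing performs with cudaMemcpy2D when stride_ != num_cols_. `CopyRowsFromVecCpu` is a hypothetical name, and throwing `std::invalid_argument` is only a stand-in for KALDI_ERR.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fills a num_rows x num_cols row-major matrix whose rows are `stride` floats
// apart (stride >= num_cols) from v, using the two modes described above.
void CopyRowsFromVecCpu(const std::vector<float> &v, float *dst,
                        int num_rows, int num_cols, int stride) {
  assert(num_cols <= stride);
  const std::size_t n = v.size();
  if (n == static_cast<std::size_t>(num_rows) * num_cols) {
    // Mode 1: v is the row-by-row concatenation of the matrix.
    for (int r = 0; r < num_rows; ++r)
      for (int c = 0; c < num_cols; ++c)
        dst[r * stride + c] = v[r * num_cols + c];
  } else if (n == static_cast<std::size_t>(num_cols)) {
    // Mode 2: every row becomes a copy of v.
    for (int r = 0; r < num_rows; ++r)
      for (int c = 0; c < num_cols; ++c)
        dst[r * stride + c] = v[c];
  } else {
    // Kaldi reports this case with KALDI_ERR ("Wrong sized arguments").
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```

When num_rows == 1 the two modes coincide, so checking the concatenation case first is harmless.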
void CopyRowsFromVec ( const VectorBase< Real > &  v)

Version of CopyRowsFromVec() that takes a CPU-based vector.

Definition at line 2130 of file cu-matrix.cc.

References VectorBase< Real >::Data(), data_, VectorBase< Real >::Dim(), Timer::Elapsed(), and KALDI_ERR.

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    if (v.Dim() == num_rows_*num_cols_) {
      if (stride_ == num_cols_) {
        const Real* v_data = v.Data();
        cudaMemcpy(data_, v_data, sizeof(Real)*num_rows_*num_cols_, cudaMemcpyHostToDevice);
      } else {
        const Real *v_data = v.Data();
        for (MatrixIndexT r = 0; r < num_rows_; r++) {
          Real *row_data = RowData(r);
          cudaMemcpy(row_data, v_data, sizeof(Real)*num_cols_, cudaMemcpyHostToDevice);
          v_data += num_cols_;
        }
      }
    } else if (v.Dim() == num_cols_) {
      dim3 dimGrid, dimBlock;
      GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                            &dimGrid, &dimBlock);
      cuda_copy_rows_from_vec(dimGrid, dimBlock, this->data_, this->Dim(), v.Data());
      CU_SAFE_CALL(cudaGetLastError());
    } else {
      KALDI_ERR << "Wrong sized arguments";
    }
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyRowsFromVec(v);
  }
}
template<typename OtherReal> void CopyToMat ( MatrixBase< OtherReal > *  dst,
MatrixTransposeType  trans = kNoTrans 
) const

Definition at line 413 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyToMat(), MatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

Referenced by NnetComputerFromEg::Compute(), CuMatrixBase< Real >::CopyToMat(), kaldi::nnet1::MomentStatistics(), KlHmm::PropagateFnc(), kaldi::UnitInvert(), kaldi::UnitTestCholesky(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixAddVecVec(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuTanh(), kaldi::UnitTestInvert(), kaldi::UnitTestMatrix(), UnitTestMatrixRandomizer(), kaldi::UnitTestSetZeroAboveDiag(), kaldi::UnitTestSwapCu2Cu(), and kaldi::UnitTestSwapCu2M().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (trans == kTrans || sizeof(OtherReal) != sizeof(Real)) {
      CuMatrix<OtherReal> this_trans(*this, trans);
      this_trans.CopyToMat(dst, kNoTrans);
    } else {
      KALDI_ASSERT(dst->NumRows() == NumRows() && dst->NumCols() == NumCols());
      if (num_rows_ == 0) return;
      Timer tim;

      MatrixIndexT src_pitch = stride_*sizeof(Real);
      MatrixIndexT dst_pitch = dst->Stride()*sizeof(Real);
      MatrixIndexT width = NumCols()*sizeof(Real);
      CU_SAFE_CALL(cudaMemcpy2D(dst->Data(), dst_pitch, this->data_, src_pitch,
                                width, this->num_rows_, cudaMemcpyDeviceToHost));

      CuDevice::Instantiate().AccuProfile("CuMatrix::CopyToMatD2H",tim.Elapsed());
    }
  } else
#endif
  {
    dst->CopyFromMat(Mat(), trans);
  }
}
void CopyToRows ( const CuArray< Real * > &  dst) const

For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, row r is not copied anywhere. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2474 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

Referenced by DistributeComponent::Backprop(), NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixCopyToRows().

{
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());

    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    cuda_copy_to_rows(dimGrid, dimBlock, dst.Data(), data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyToRows(dst.Data());
  }
}
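CopyToRows is the scatter counterpart of the pointer-array CopyRows. Here is a CPU sketch with the hypothetical name `CopyToRowsCpu`. On the GPU the rows are presumably written concurrently, which would make overlapping destinations unsafe; the sequential sketch below would simply let later rows overwrite earlier ones.

```cpp
#include <cassert>
#include <vector>

// Row r of src is written to the num_cols floats starting at dst_rows[r];
// null pointers are skipped. Destination regions are assumed not to overlap.
void CopyToRowsCpu(const float *src, int src_stride, int num_cols,
                   const std::vector<float *> &dst_rows) {
  assert(num_cols <= src_stride);
  for (int r = 0; r < static_cast<int>(dst_rows.size()); ++r) {
    if (dst_rows[r] == nullptr) continue;  // no destination for this row
    for (int c = 0; c < num_cols; ++c)
      dst_rows[r][c] = src[r * src_stride + c];
  }
}
```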
void CopyUpperToLower ( )

Definition at line 2661 of file cu-matrix.cc.

References CU2DBLOCK, data_, Timer::Elapsed(), and KALDI_ASSERT.

Referenced by kaldi::TestCuMatrixCopyUpperToLower(), and kaldi::UnitTestCuMatrixCopyUpperToLower().

{
  KALDI_ASSERT(num_cols_ == num_rows_);
  if (num_rows_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    int32 dim = this->num_rows_;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
                 n_blocks(dim, CU2DBLOCK));
    cuda_copy_upp_low(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyUpperToLower();
  }
}
const Real* Data ( ) const
inline

Return data pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 605 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), NormalizeComponent::Backprop(), RepeatedAffineComponent::Backprop(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrix< Real >::CompObjfAndDeriv(), DistributeComponent::ComputeInputPointers(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), CuMatrixBase< Real >::EqualElementMask(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMaxDeriv(), CuTpMatrix< Real >::Invert(), kaldi::cu::NormalizePerRow(), RepeatedAffineComponent::Propagate(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), CuBlockMatrix< Real >::SetCudaData(), kaldi::cu::Splice(), CuMatrixBase< Real >::SumColumnRanges(), CuMatrixBase< Real >::SymAddMat2(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), RepeatedAffineComponent::Update(), and NaturalGradientRepeatedAffineComponent::Update().

{ return data_; }
Real* Data ( )
inline

Return data pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 608 of file cu-matrix.h.

{ return data_; }
void DiffGroupPnorm ( const CuMatrixBase< Real > &  in_value,
const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv,
Real  power 
)

Differentiate backward through the GroupPnorm function.

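It is equivalent to GroupPnormDeriv followed by MulRowsGroupMat(out_deriv). The CPU path makes exactly those two calls, while the GPU path fuses them into a single kernel.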

Definition at line 826 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Backprop(), and kaldi::UnitTestCuMatrixDiffGroupPnorm().

{
  KALDI_ASSERT(out_value.NumCols() > 0);
  KALDI_ASSERT(out_value.NumCols() == out_deriv.NumCols());
  int group_size = this->NumCols() / out_value.NumCols();
  KALDI_ASSERT(this->NumCols() == out_value.NumCols() * group_size);
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    const int kWarpSize = 32;
    dim3 dimBlock(kWarpSize, CU1DBLOCK / kWarpSize);
    dim3 dimGrid(n_blocks(NumCols(), dimBlock.x),
                 n_blocks(NumRows(), dimBlock.y));
    if (dimGrid.x * dimGrid.y > 1024) {
      dimGrid.y = std::max(1024 / dimGrid.x, unsigned(1));
    }
    cuda_diff_group_pnorm(dimGrid, dimBlock, this->data_, in_value.Data(),
                          out_value.Data(), out_deriv.Data(), Dim(),
                          in_value.Stride(), out_value.Stride(),
                          out_deriv.Stride(), group_size, power);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().GroupPnormDeriv(in_value.Mat(), out_value.Mat(), power);
    MulRowsGroupMat(out_deriv);
  }
}
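For one group of inputs x_1..x_G with p-norm output y = (sum_k |x_k|^p)^(1/p), the chain rule gives dy/dx_k = sign(x_k) |x_k|^(p-1) / y^(p-1). DiffGroupPnorm multiplies this by the output derivative of the element's group, which is the MulRowsGroupMat step. The sketch below handles one row with power >= 1. It assumes the derivative is taken to be 0 when y == 0. Kaldi's own treatment of edge cases such as p = 1, infinite p, or y == 0 may differ, so treat this as illustrative. `DiffGroupPnormRow` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Backprop through a group p-norm for a single row.
// in_value: num_groups * group_size inputs; out_value: y per group;
// out_deriv: dL/dy per group. Returns dL/dx for the row.
std::vector<double> DiffGroupPnormRow(const std::vector<double> &in_value,
                                      const std::vector<double> &out_value,
                                      const std::vector<double> &out_deriv,
                                      double power) {
  assert(power >= 1.0);
  assert(out_value.size() == out_deriv.size() && !out_value.empty());
  const std::size_t group_size = in_value.size() / out_value.size();
  assert(group_size * out_value.size() == in_value.size());
  std::vector<double> in_deriv(in_value.size());
  for (std::size_t i = 0; i < in_value.size(); ++i) {
    const std::size_t g = i / group_size;  // group this input belongs to
    const double x = in_value[i], y = out_value[g];
    double dy_dx = 0.0;                    // assumed convention when y == 0
    if (y != 0.0) {
      const double sign = (x > 0) - (x < 0);
      dy_dx = sign * std::pow(std::fabs(x), power - 1) * std::pow(y, 1 - power);
    }
    in_deriv[i] = dy_dx * out_deriv[g];    // scale by the group's dL/dy
  }
  return in_deriv;
}
```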
void DiffLogSoftmaxPerRow ( const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv 
)

Differentiate backward through the log softmax function.

Here "out_value" is the log-softmax output. For each row i, this computes *this(i) = out_deriv(i) - sum(out_deriv(i)) .* exp(out_value(i)), where xxxx(i) denotes row i as a row vector and sum(out_deriv(i)) is the scalar sum of that row.

Definition at line 1709 of file cu-matrix.cc.

References CuVectorBase< Real >::AddColSumMat(), CuMatrixBase< Real >::AddMat(), CuMatrixBase< Real >::ApplyExp(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::MulRowsVec(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Scale(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffLogSoftmax().

1710  {
1711 
1712  KALDI_ASSERT(SameDim(out_value, out_deriv) && SameDim(out_value, *this));
1713 
1714 #if HAVE_CUDA == 1
1715  if (CuDevice::Instantiate().Enabled()) {
1716  Timer tim;
1717 
1718  // CUDA thread layout: one thread block per matrix-row.
1719  dim3 dimBlock(CU1DBLOCK);
1720  dim3 dimGrid(num_rows_);
1721  cuda_diff_log_softmax(dimGrid, dimBlock, this->Dim(), out_value.Data(),
1722  out_value.Stride(), out_deriv.Data(),
1723  out_deriv.Stride(), data_);
1724  CU_SAFE_CALL(cudaGetLastError());
1725 
1726  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1727  } else
1728 #endif
1729  {
1730  /*
1731  Let the output be y, then
1732  y_i = x_i - log(sum_j exp(x_j))
1733  where x_i is the input to the component. The Jacobian matrix of this
1734  function is
1735  J = I - 1 exp(y^T)
1736  where 1 is a vector of ones. Let the derivative vector at the output be e,
1737  and at the input be d, then we have
1738  d = e - exp(y) Sum(e)
1739  d_i = e_i - exp(y_i) Sum(e)
1740  */
1741  const CuMatrixBase<Real> &Y(out_value), &E(out_deriv);
1742  CuMatrixBase<Real> &D(*this);
1743 
1744  D.CopyFromMat(Y);
1745  D.ApplyExp(); // exp(y)
1746  CuVector<Real> E_sum(D.NumRows()); // Initializes to zero
1747  E_sum.AddColSumMat(1.0, E); // Sum(e)
1748  D.MulRowsVec(E_sum); // exp(y) Sum(e)
1749  D.Scale(-1.0); // - exp(y) Sum(e)
1750  D.AddMat(1.0, E, kNoTrans); // e - exp(y_i) Sum(e)
1751  }
1752 }
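The CPU branch of the listing computes d = e - exp(y) Sum(e) one row at a time. The sketch below writes out that same formula on plain nested `std::vector`s, so the arithmetic can be checked without a GPU. `Mat` and `DiffLogSoftmaxRef` are hypothetical names used only for this illustration. They are not part of Kaldi.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix type used only for this sketch.
using Mat = std::vector<std::vector<double>>;

// Reference for the log-softmax backward pass, mirroring the CPU fallback:
//   d(i, j) = e(i, j) - exp(y(i, j)) * Sum_k e(i, k)
// where y is the log-softmax output and e = dE/dy.
Mat DiffLogSoftmaxRef(const Mat &y, const Mat &e) {
  assert(y.size() == e.size());
  Mat d(y.size());
  for (std::size_t i = 0; i < y.size(); ++i) {
    assert(y[i].size() == e[i].size());
    double e_sum = 0.0;  // Sum(e) over row i
    for (double v : e[i]) e_sum += v;
    d[i].resize(y[i].size());
    for (std::size_t j = 0; j < y[i].size(); ++j)
      d[i][j] = e[i][j] - std::exp(y[i][j]) * e_sum;
  }
  return d;
}
```

Because exp(y) sums to 1 along each row of a valid log-softmax output, every row of the result sums to zero.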
void DiffParametricRelu ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff,
const CuVectorBase< Real > &  alpha,
const CuVectorBase< Real > &  beta 
)

Differentiate backward through the parametric relu function.

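Here, "value" is the ReLU input. Does, element by element, *this(r, c) = diff(r, c) * (value(r, c) > 0 ? alpha(c) : beta(c)), where alpha and beta are indexed by column. Note that the CPU fallback in the listing below tests value >= 0, so it applies alpha where value == 0.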

Definition at line 1311 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::BackpropagateFnc().

1315  {
1316 #if HAVE_CUDA == 1
1317  if (CuDevice::Instantiate().Enabled()) {
1318  Timer tim;
1319 
1320  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1321  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK), n_blocks(num_rows_, CU2DBLOCK));
1322 
1323  cuda_diff_parametric_relu(dimGrid, dimBlock, data_, diff.data_, value.data_,
1324  Dim(), diff.Stride(), value.Stride(),
1325  alpha.data_, beta.data_);
1326  CU_SAFE_CALL(cudaGetLastError());
1327 
1328  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1329  } else
1330 #endif
1331  {
1332  // Do it on CPU,
1333  for (MatrixIndexT r = 0; r < NumRows(); r++) {
1334  for (MatrixIndexT c = 0; c < NumCols(); c++) {
1335  Real value_elem = value.Mat()(r,c);
1336  this->Mat()(r,c) = diff.Mat()(r,c) *
1337  (value_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
1338  }
1339  }
1340  }
1341 }
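The CPU loop in the listing selects a per-column slope from the sign of the ReLU input. The sketch below reproduces that loop, including its `>= 0` test. `Mat`, `Vec` and `DiffParametricReluRef` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix
using Vec = std::vector<double>;

// Mirrors the CPU fallback: the per-column slope alpha[c] (for value >= 0)
// or beta[c] (for value < 0) scales the incoming derivative element-wise.
Mat DiffParametricReluRef(const Mat &value, const Mat &diff,
                          const Vec &alpha, const Vec &beta) {
  assert(value.size() == diff.size());
  Mat out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    assert(alpha.size() == value[r].size() && beta.size() == value[r].size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c)
      out[r][c] = diff[r][c] * (value[r][c] >= 0.0 ? alpha[c] : beta[c]);
  }
  return out;
}
```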
void DiffSigmoid ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the sigmoid function.

Here, "value" is the sigmoid output. Does, element-by-element, *this = diff * value * (1 - value).

Definition at line 1571 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SigmoidComponent::Backprop(), Sigmoid::BackpropagateFnc(), and kaldi::UnitTestCuDiffSigmoid().

1572  {
1573  KALDI_ASSERT(SameDim(*this, value) && SameDim(*this, diff));
1574 #if HAVE_CUDA == 1
1575  if (CuDevice::Instantiate().Enabled()) {
1576  Timer tim;
1577  dim3 dimGrid, dimBlock;
1578  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1579  &dimGrid, &dimBlock);
1580  cuda_diff_sigmoid(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1581  CU_SAFE_CALL(cudaGetLastError());
1582 
1583  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1584  } else
1585 #endif
1586  {
1587  Mat().DiffSigmoid(value.Mat(), diff.Mat());
1588  }
1589 }
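DiffSigmoid takes the sigmoid *output* as its input because, for y = 1 / (1 + exp(-x)), dy/dx = y (1 - y), so the derivative can be formed from y alone. The sketch below illustrates this and checks it against a finite difference. `Mat` and `DiffSigmoidRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// out = diff * y * (1 - y), element by element, where y is the sigmoid output.
Mat DiffSigmoidRef(const Mat &value, const Mat &diff) {
  assert(value.size() == diff.size());
  Mat out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c) {
      double y = value[r][c];
      out[r][c] = diff[r][c] * y * (1.0 - y);
    }
  }
  return out;
}
```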
void DiffSoftmaxPerRow ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the softmax function.

Here, "value" is the softmax output. Does, for each row i, *this(i) = diff(i) * diag(value(i)) - diff(i) * (value(i)^T * value(i)), where xxxx(i) denotes the i'th row (a row vector) of xxxx, and '*' and '-' are matrix operations.

Definition at line 1675 of file cu-matrix.cc.

References CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffSoftmax().

1676  {
1677 
1678  KALDI_ASSERT(SameDim(value, diff) && SameDim(value, *this));
1679 
1680 #if HAVE_CUDA == 1
1681  if (CuDevice::Instantiate().Enabled()) {
1682  Timer tim;
1683 
1684  // CUDA thread layout: one thread block per matrix-row.
1685  dim3 dimBlock(CU1DBLOCK);
1686  dim3 dimGrid(num_rows_);
1687  cuda_diff_softmax(dimGrid, dimBlock, data_, this->Dim(), value.Data(),
1688  value.Stride(), diff.Data(), diff.Stride());
1689  CU_SAFE_CALL(cudaGetLastError());
1690 
1691  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1692  } else
1693 #endif
1694  {
1695  const CuMatrixBase<Real> &P(value), &E(diff);
1696  CuMatrixBase<Real> &D(*this);
1697 
1698  D.CopyFromMat(P);
1699  D.MulElements(E);
1700  // At this point, D = P .* E (in matlab notation)
1701  CuVector<Real> pe_vec(D.NumRows()); // For each row i, the dot product (p_i . e_i).
1702  pe_vec.AddDiagMatMat(1.0, P, kNoTrans, E, kTrans, 0.0);
1703 
1704  D.AddDiagVecMat(-1.0, pe_vec, P, kNoTrans, 1.0); // does D -= diag(pe_vec) * P.
1705  }
1706 }
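The matrix form in the description reduces to a simple per-row rule. For row vectors p (softmax output) and e (incoming derivative), e diag(p) has entries e_j p_j, and e (p^T p) equals (p . e) p. The CPU branch therefore computes p .* e and then subtracts (p . e) p. The sketch below follows those steps. `Mat` and `DiffSoftmaxRef` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// Per row i, mirroring the CPU fallback:
//   d_i = p_i .* e_i - (p_i . e_i) * p_i
Mat DiffSoftmaxRef(const Mat &p, const Mat &e) {
  assert(p.size() == e.size());
  Mat d(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) {
    assert(p[i].size() == e[i].size());
    double pe = 0.0;  // dot product p_i . e_i
    for (std::size_t j = 0; j < p[i].size(); ++j) pe += p[i][j] * e[i][j];
    d[i].resize(p[i].size());
    for (std::size_t j = 0; j < p[i].size(); ++j)
      d[i][j] = p[i][j] * e[i][j] - pe * p[i][j];
  }
  return d;
}
```

When each p_i sums to 1, each output row sums to (p . e) - (p . e) = 0.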
void DiffTanh ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the tanh function.

Here, "value" is the tanh output. Does, element-by-element, *this = diff * (1 - value^2).

Definition at line 1616 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), CuMatrixBase< Real >::Mat(), and CuMatrixBase< Real >::Stride().

Referenced by TanhComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), Tanh::BackpropagateFnc(), and kaldi::UnitTestCuDiffTanh().

1617  {
1618 #if HAVE_CUDA == 1
1619  if (CuDevice::Instantiate().Enabled()) {
1620  Timer tim;
1621  dim3 dimGrid, dimBlock;
1622  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1623  &dimGrid, &dimBlock);
1624  cuda_diff_tanh(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1625  CU_SAFE_CALL(cudaGetLastError());
1626 
1627  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1628  } else
1629 #endif
1630  {
1631  Mat().DiffTanh(value.Mat(), diff.Mat());
1632  }
1633 }
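DiffTanh follows the same pattern as DiffSigmoid. For y = tanh(x), dy/dx = 1 - y^2, so only the output is needed. The sketch below is a minimal illustration with hypothetical names (`Mat`, `DiffTanhRef`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// out = diff * (1 - y^2), element by element, where y is the tanh output.
Mat DiffTanhRef(const Mat &value, const Mat &diff) {
  assert(value.size() == diff.size());
  Mat out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c) {
      double y = value[r][c];
      out[r][c] = diff[r][c] * (1.0 - y * y);
    }
  }
  return out;
}
```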
void DiffXent ( const CuArray< int32 > &  tgt,
CuVector< Real > *  log_post_tgt 
)

Differentiate the block [softmax + cross-entropy]: dE/da = posterior_mat - target_mat, where 'E' is the error function and 'a' is the activation at the softmax input.

Interface:
tgt ... index vector that encodes the matrix of targets (one target column per row)
net_out_or_diff ... *this: the network output before invocation, the derivative dE/da after
log_post_tgt ... per-frame statistics for the cross-entropy computation: log(sum_row(posterior_mat .* target_mat))

Definition at line 1755 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuVectorBase< Real >::data_, CuArray< T >::Dim(), Timer::Elapsed(), KALDI_ASSERT, kaldi::Log(), CuVector< Real >::Resize(), and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuDiffXent().

1756  {
1757 
1758  KALDI_ASSERT(tgt.Dim() == num_rows_);
1759  log_post_tgt->Resize(tgt.Dim());
1760 
1761 #if HAVE_CUDA == 1
1762  if (CuDevice::Instantiate().Enabled()) {
1763  Timer tim;
1764  dim3 dimBlock(1, CU2DBLOCK*8);
1765  dim3 dimGrid(1, n_blocks(tgt.Dim(), CU2DBLOCK*8));
1766  cuda_diff_xent(dimGrid, dimBlock, tgt.Data(), data_,
1767  log_post_tgt->data_, Dim());
1768 
1769  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1770  } else
1771 #endif
1772  {
1773  MatrixIndexT num_rows = num_rows_;
1774  for(int32 r = 0; r < num_rows; r++) {
1775  int32 col_tgt = tgt.Data()[r];
1776  Real &value = Mat()(r, col_tgt);
1777  log_post_tgt->Vec()(r) = Log(value);
1778  value -= 1.0;
1779  }
1780  }
1781 }
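Because the targets are one-hot, posterior_mat - target_mat only changes one element per row. The CPU branch records the log posterior of the target class and then subtracts 1 from that element in place. The sketch below mirrors those steps. It uses `std::vector<int>` as a stand-in for `CuArray<int32>`, and `Mat` and `DiffXentRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// In place, like the CPU fallback: on entry *net_out holds softmax posteriors;
// on exit it holds dE/da = posteriors - one_hot(targets).
void DiffXentRef(const std::vector<int> &tgt, Mat *net_out,
                 std::vector<double> *log_post_tgt) {
  assert(tgt.size() == net_out->size());
  log_post_tgt->resize(tgt.size());
  for (std::size_t r = 0; r < tgt.size(); ++r) {
    // Posterior of the target class (at() throws on a bad column index).
    double &value = (*net_out)[r].at(static_cast<std::size_t>(tgt[r]));
    (*log_post_tgt)[r] = std::log(value);  // per-frame log posterior
    value -= 1.0;                          // subtract the one-hot target
  }
}
```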
void DivElements ( const CuMatrixBase< Real > &  A)

Divide two matrices elementwise: *this = *this ./ A.

Definition at line 676 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTestCuMatrixDivElements(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

676  {
677  #if HAVE_CUDA == 1
678  if (CuDevice::Instantiate().Enabled()) {
679  Timer tim;
680 
681  KALDI_ASSERT(num_cols_ == A.NumCols());
682  KALDI_ASSERT(num_rows_ == A.NumRows());
683 
684  dim3 dimGrid, dimBlock;
685  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
686  &dimGrid, &dimBlock);
687 
688  cuda_div_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
689  CU_SAFE_CALL(cudaGetLastError());
690 
691  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
692  } else
693  #endif
694  {
695  Mat().DivElements(A.Mat());
696  }
697 }
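In DivElements, A is the divisor and the result overwrites *this. In the listing, the KALDI_ASSERT dimension checks sit inside the CUDA branch only. The sketch below checks dimensions up front in both cases. `Mat` and `DivElementsRef` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// In-place element-wise quotient: m(r, c) /= a(r, c).
void DivElementsRef(Mat *m, const Mat &a) {
  assert(m->size() == a.size());
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert((*m)[r].size() == a[r].size());
    for (std::size_t c = 0; c < a[r].size(); ++c) (*m)[r][c] /= a[r][c];
  }
}
```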
void DivRowsVec ( const CuVectorBase< Real > &  div)

Divide the i'th row by div(i).

Definition at line 884 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), VectorBase< Real >::InvertElements(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by StatisticsPoolingComponent::Backprop(), StatisticsPoolingComponent::Propagate(), kaldi::TestCuMatrixDivRowsVec(), and kaldi::UnitTestCuMatrixDivRowsVec().

884  {
885 #if HAVE_CUDA == 1
886  if (CuDevice::Instantiate().Enabled()) {
887  Timer tim;
888 
889  KALDI_ASSERT(div.Dim() == NumRows());
890 
891  dim3 dimGrid, dimBlock;
892  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
893  &dimGrid, &dimBlock);
894  // For large matrices we do more work per thread by limiting
895  // the grid size, to reduce the block-launching overhead.
896  if (dimGrid.x * dimGrid.y > 1024) {
897  dimGrid.x = 1024 / dimGrid.y;
898  if (dimGrid.x == 0) {
899  dimGrid.x = 1;
900  }
901  }
902  cuda_div_rows_vec(dimGrid, dimBlock, data_, div.data_, Dim());
903  CU_SAFE_CALL(cudaGetLastError());
904 
905  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
906  } else
907 #endif
908  {
909  Vector<Real> temp(div.Vec()); // will copy.
910  temp.InvertElements();
911  Mat().MulRowsVec(temp);
912  }
913 }
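The CPU branch does not divide directly. It copies div, inverts the copy with InvertElements(), and then calls MulRowsVec(), so each row is multiplied by a reciprocal. The sketch below follows that strategy. Multiplying by 1/div(i) can differ from true division in the last bit, so the checks use a tolerance. `Mat` and `DivRowsVecRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// Mirrors the CPU fallback: invert div once, then scale row r by 1 / div[r].
void DivRowsVecRef(Mat *m, const std::vector<double> &div) {
  assert(div.size() == m->size());
  std::vector<double> inv(div);             // "will copy"
  for (double &v : inv) v = 1.0 / v;        // like InvertElements()
  for (std::size_t r = 0; r < m->size(); ++r)
    for (double &x : (*m)[r]) x *= inv[r];  // like MulRowsVec()
}
```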
void EqualElementMask ( const CuMatrixBase< Real > &  mat,
CuMatrix< Real > *  mask 
) const

Definition at line 3073 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kSetZero, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrix< Real >::Resize(), and CuMatrixBase< Real >::Stride().

Referenced by MaxpoolingComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), and MaxPooling2DComponent::BackpropagateFnc().

3073  {
3074  // Check the inputs:
3075  KALDI_ASSERT(mat.NumRows() == NumRows() && mat.NumCols() == NumCols());
3076  KALDI_ASSERT(mask != NULL);
3077  // Resizes the output matrix:
3078  mask->Resize(NumRows(), NumCols(), kSetZero);
3079 
3080 #if HAVE_CUDA == 1
3081  if (CuDevice::Instantiate().Enabled()) {
3082  Timer tim;
3083  dim3 dimGrid, dimBlock;
3084  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
3085  &dimGrid, &dimBlock);
3086  cuda_equal_element_mask(dimGrid, dimBlock, this->data_, mat.Data(),
3087  mask->Data(), this->Dim(), mat.Stride(),
3088  mask->Stride());
3089  CU_SAFE_CALL(cudaGetLastError());
3090 
3091  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
3092  } else
3093 #endif
3094  {
3095  for (int32 r = 0; r < NumRows(); r++) {
3096  for (int32 c = 0; c < NumCols(); c++) {
3097  (*mask)(r,c) = ((*this)(r,c) == mat(r,c) ? 1.0 : 0.0);
3098  }
3099  }
3100  }
3101 }
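EqualElementMask has no description above. Judging from the listing, it resizes `mask` to the shape of *this and sets each element to 1.0 where (*this)(r, c) == mat(r, c) and to 0.0 elsewhere. The "Referenced by" list shows it is used in max-pooling backprop. The sketch below reproduces that behaviour with hypothetical names (`Mat`, `EqualElementMaskRef`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// mask(r, c) = 1.0 where a(r, c) == b(r, c), else 0.0; the mask is resized
// to a's shape first, as in the listing.
void EqualElementMaskRef(const Mat &a, const Mat &b, Mat *mask) {
  assert(a.size() == b.size());
  mask->assign(a.size(), std::vector<double>());
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == b[r].size());
    (*mask)[r].assign(a[r].size(), 0.0);
    for (std::size_t c = 0; c < a[r].size(); ++c)
      (*mask)[r][c] = (a[r][c] == b[r][c]) ? 1.0 : 0.0;
  }
}
```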
void FindRowMaxId ( CuArray< int32 > *  id) const

Find the id of the maximal element for each row.

Definition at line 1636 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, CuArray< T >::Data(), data_, and Timer::Elapsed().

Referenced by kaldi::nnet3::ComputeAccuracy(), NnetUpdater::ComputeTotAccuracy(), Xent::Eval(), kaldi::TestCuFindRowMaxId(), and kaldi::UnitTestCuFindRowMaxId().

1636  {
1637 #if HAVE_CUDA == 1
1638  if (CuDevice::Instantiate().Enabled()) {
1639  Timer tim;
1640  id->Resize(num_rows_);
1641  MatrixDim d = Dim();
1642 
1643  // CUDA thread layout: one thread block per matrix-row.
1644  dim3 dimBlock(CU1DBLOCK);
1645  dim3 dimGrid(num_rows_);
1646  cuda_find_row_max_id(dimGrid, dimBlock, data_, NULL, id->Data(), d);
1647  CU_SAFE_CALL(cudaGetLastError());
1648 
1649  // now we have the indices!
1650  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1651  } else
1652 #endif
1653  {
1654  // allocate index buffer
1655  id->Resize(num_rows_);
1656  id->Set(-1);
1657  // find maxima
1658  MatrixIndexT num_rows = num_rows_, num_cols = num_cols_;
1659  for (MatrixIndexT r = 0; r < num_rows; r++) {
1660  Real max = -1e21;
1661  int32 max_id = -1;
1662  const Real *row_data = Mat().RowData(r);
1663  for (MatrixIndexT c = 0; c < num_cols; c++) {
1664  if (max < row_data[c]) {
1665  max = row_data[c];
1666  max_id = c;
1667  }
1668  }
1669  id->Data()[r] = max_id;
1670  }
1671  }
1672 }
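The CPU branch of FindRowMaxId has two details worth noting. Its strict `<` comparison keeps the *first* maximum when there are ties. It also starts from -1e21, so a row whose elements are all <= -1e21 (or all NaN) reports -1. The sketch below reproduces the CPU loop only. The listing does not show how the GPU kernel breaks ties, so do not assume the two agree on ties. `Mat` and `FindRowMaxIdRef` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// Mirrors the CPU fallback: per-row argmax, -1 if nothing beats -1e21.
std::vector<int> FindRowMaxIdRef(const Mat &m) {
  std::vector<int> id(m.size(), -1);
  for (std::size_t r = 0; r < m.size(); ++r) {
    double max = -1e21;
    for (std::size_t c = 0; c < m[r].size(); ++c) {
      if (max < m[r][c]) {  // strict: the first maximum wins on ties
        max = m[r][c];
        id[r] = static_cast<int>(c);
      }
    }
  }
  return id;
}
```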
Real FrobeniusNorm ( ) const
inline

Definition at line 206 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::ApproxEqual(), and kaldi::UnitTestCuSparseMatrixFrobeniusNorm().

206 { return sqrt(TraceMatMat(*this, *this, kTrans)); }
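FrobeniusNorm relies on the identity trace(M M^T) = sum_{i,j} M(i,j)^2. TraceMatMat(A, B, kTrans) computes trace(A B^T). With A = B = *this, that trace is the sum of squared entries, and its square root is the Frobenius norm. The sketch below spells this out. `Mat`, `TraceMatMatTransRef` and `FrobeniusNormRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// trace(A * B^T): diagonal element i of A B^T is sum_j A(i, j) * B(i, j).
double TraceMatMatTransRef(const Mat &a, const Mat &b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(a[i].size() == b[i].size());
    for (std::size_t j = 0; j < a[i].size(); ++j) sum += a[i][j] * b[i][j];
  }
  return sum;
}

double FrobeniusNormRef(const Mat &m) {
  return std::sqrt(TraceMatMatTransRef(m, m));
}
```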
void GroupMax ( const CuMatrixBase< Real > &  src)

Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer.

[Note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1427 of file cu-matrix.cc.

References CU1DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Propagate(), kaldi::TestCuMatrixGroupMax(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), and kaldi::UnitTestCuMatrixGroupMax().

1427  {
1428  int group_size = src.NumCols() / this->NumCols();
1429  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1430  this->NumRows() == src.NumRows());
1431 #if HAVE_CUDA == 1
1432  if (CuDevice::Instantiate().Enabled()) {
1433  Timer tim;
1434  // One thread block per row.
1435  // Use 2D block for small group size to simplify the calculation.
1436  // Each group is reduced by threads_per_group threads.
1437  // threads_per_group should be a power of 2 for fast tree reduction.
1438  // group size: 1 2 3 4 5 6 7 .. 12 13 .. 24 25 .. 48 ...
1439  // threads_per_group: 1 1 1 2 2 2 4 .. 4 8 .. 8 16 .. 16 ...
1440  int threads_per_group = CU1DBLOCK;
1441  while (threads_per_group * 3 / 2 >= group_size) {
1442  threads_per_group >>= 1;
1443  }
1444  if (group_size == 1) {
1445  threads_per_group = 1;
1446  }
1447  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1448  dim3 dimGrid(NumRows());
1449  cuda_group_max(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1450  src.Stride(), group_size);
1451  CU_SAFE_CALL(cudaGetLastError());
1452  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1453  } else
1454 #endif
1455  {
1456  Mat().GroupMax(src.Mat());
1457  }
1458 }
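The sketch below shows two things. First, it gives the reference semantics of GroupMax (a max over each run of G consecutive columns). Second, it reproduces the host-side threads_per_group heuristic from the CUDA branch, so the table in the listing's comment can be checked. CU1DBLOCK is assumed to be 256 here. The real value is defined in cu-matrixdim.h. `Mat`, `GroupMaxRef` and `ThreadsPerGroup` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// Reference semantics: y(r, i) = max of x(r, i*G .. (i+1)*G - 1).
Mat GroupMaxRef(const Mat &x, std::size_t out_cols) {
  assert(out_cols > 0);
  Mat y(x.size(), std::vector<double>(out_cols));
  for (std::size_t r = 0; r < x.size(); ++r) {
    assert(x[r].size() % out_cols == 0);
    std::size_t g = x[r].size() / out_cols;  // group size G
    assert(g > 0);
    for (std::size_t i = 0; i < out_cols; ++i) {
      auto first = x[r].begin() + static_cast<std::ptrdiff_t>(i * g);
      y[r][i] = *std::max_element(first, first + static_cast<std::ptrdiff_t>(g));
    }
  }
  return y;
}

// Block-shape heuristic from the CUDA branch (CU1DBLOCK assumed to be 256):
// halve until threads_per_group * 3 / 2 < group_size.
int ThreadsPerGroup(int group_size, int cu1dblock = 256) {
  assert(group_size >= 1);  // group_size 0 would loop forever
  int threads_per_group = cu1dblock;
  while (threads_per_group * 3 / 2 >= group_size) threads_per_group >>= 1;
  if (group_size == 1) threads_per_group = 1;  // the loop drives it to 0
  return threads_per_group;
}
```

Keeping threads_per_group a power of two no larger than CU1DBLOCK keeps the tree reduction simple and keeps CU1DBLOCK / threads_per_group an integer >= 1.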
void GroupMaxDeriv ( const CuMatrixBase< Real > &  input,
const CuMatrixBase< Real > &  output 
)

Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. the "src" variable) and "output" is the result of the computation (i.e. the "this" of that function call); *this must have the same dimension as "input". Each element of *this will be set to 1 if the corresponding input equals the output of its group, and 0 otherwise. This equals the derivative of the function where it is defined (it is not defined where multiple inputs in the group are equal to the output). In the listing below, "input" and "output" appear as src1 and src2.

Definition at line 859 of file cu-matrix.cc.

References CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), kaldi::TestCuMatrixGroupMaxDeriv(), and kaldi::UnitTestCuMatrixGroupMaxDeriv().

860  {
861  KALDI_ASSERT(src2.NumCols() > 0);
862  int group_size = this->NumCols() / src2.NumCols();
863  KALDI_ASSERT(this->NumCols() == src2.NumCols() * group_size);
864 #if HAVE_CUDA == 1
865  if (CuDevice::Instantiate().Enabled()) {
866  Timer tim;
867  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
868  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
869  n_blocks(NumRows(), CU2DBLOCK));
870  cuda_calc_group_max_deriv(dimGrid, dimBlock, this->data_, src1.Data(),
871  src2.Data(), Dim(), src1.Stride(), src2.Stride(),
872  group_size);
873  CU_SAFE_CALL(cudaGetLastError());
874 
875  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
876  } else
877 #endif
878  {
879  Mat().GroupMaxDeriv(src1.Mat(), src2.Mat());
880  }
881 }
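For column j of the input, the owning group is j / G. The derivative mask is therefore a single comparison per element. The sketch below also shows the tie case mentioned above, where every tied element receives 1. `Mat` and `GroupMaxDerivRef` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// deriv(r, j) = 1 if input(r, j) == output(r, j / G), else 0,
// with G = input columns / output columns.
Mat GroupMaxDerivRef(const Mat &input, const Mat &output) {
  assert(input.size() == output.size());
  Mat deriv(input.size());
  for (std::size_t r = 0; r < input.size(); ++r) {
    assert(!output[r].empty() && input[r].size() % output[r].size() == 0);
    std::size_t g = input[r].size() / output[r].size();
    deriv[r].resize(input[r].size());
    for (std::size_t j = 0; j < input[r].size(); ++j)
      deriv[r][j] = (input[r][j] == output[r][j / g]) ? 1.0 : 0.0;
  }
  return deriv;
}
```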
void GroupPnorm ( const CuMatrixBase< Real > &  src,
Real  pow 
)

Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j ^ power) ^ (1 / power), where power is the second argument and G = x.NumCols() / y.NumCols() must be an integer.

[Note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1386 of file cu-matrix.cc.

References CU1DBLOCK, CU2DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Propagate(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixGroupPnorm(), and kaldi::UnitTestCuMatrixGroupPnorm().

1386  {
1387  int group_size = src.NumCols() / this->NumCols();
1388  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1389  this->NumRows() == src.NumRows());
1390 #if HAVE_CUDA == 1
1391  if (CuDevice::Instantiate().Enabled()) {
1392  Timer tim;
1393  if (power == Real(0) || power == Real(1) || power == Real(2)
1394  || power == std::numeric_limits<Real>::infinity()) {
1395  // One thread block per row.
1396  // Use 2D block for small group size to simplify the calculation
1397  // Each group is reduced by threads_per_group threads.
1398  // threads_per_group should be a power of 2 for fast tree reduction.
1399  int threads_per_group = CU1DBLOCK;
1400  while (threads_per_group * 3 / 2 >= group_size) {
1401  threads_per_group >>= 1;
1402  }
1403  if (group_size == 1) {
1404  threads_per_group = 1;
1405  }
1406  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1407  dim3 dimGrid(NumRows());
1408  cuda_group_spec_pnorm(dimGrid, dimBlock, this->data_, src.data_,
1409  this->Dim(), src.Stride(), group_size, power);
1410  } else {
1411  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1412  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
1413  n_blocks(NumRows(), CU2DBLOCK));
1414  cuda_group_pnorm(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1415  src.Stride(), group_size, power);
1416  }
1417  CU_SAFE_CALL(cudaGetLastError());
1418  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1419  } else
1420 #endif
1421  {
1422  Mat().GroupPnorm(src.Mat(), power);
1423  }
1424 }
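The CUDA branch uses a specialized reduction kernel for power 0, 1, 2 and infinity, and a general kernel for other powers. The sketch below gives the reference semantics for positive and infinite powers. One assumption applies: it takes absolute values |x_j|, as in the conventional p-norm, whereas the formula above writes x_j. Power 0 is not handled. `Mat` and `GroupPnormRef` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// y(r, i) = (sum over group i of |x(r, j)|^power)^(1/power);
// power = infinity gives the max absolute value in the group.
Mat GroupPnormRef(const Mat &x, std::size_t out_cols, double power) {
  assert(out_cols > 0 && power > 0.0);
  const bool inf = std::isinf(power);
  Mat y(x.size(), std::vector<double>(out_cols, 0.0));
  for (std::size_t r = 0; r < x.size(); ++r) {
    assert(x[r].size() % out_cols == 0);
    std::size_t g = x[r].size() / out_cols;  // group size G
    for (std::size_t i = 0; i < out_cols; ++i) {
      double acc = 0.0;
      for (std::size_t j = i * g; j < (i + 1) * g; ++j) {
        double a = std::fabs(x[r][j]);
        acc = inf ? std::max(acc, a) : acc + std::pow(a, power);
      }
      y[r][i] = inf ? acc : std::pow(acc, 1.0 / power);
    }
  }
  return y;
}
```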
void Heaviside ( const CuMatrixBase< Real > &  src)

Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0). [Note: in general, there are different ways to deal with the situation when x == 0.]

Definition at line 2277 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by RectifiedLinearComponent::Backprop(), CuRand< Real >::BinarizeProbs(), Dropout::PropagateFnc(), RectifiedLinearComponent::StoreStats(), and kaldi::UnitTestCuMatrixHeaviside().

2277  {
2278  KALDI_ASSERT(SameDim(*this, src));
2279 #if HAVE_CUDA == 1
2280  if (CuDevice::Instantiate().Enabled()) {
2281  Timer tim;
2282  dim3 dimGrid, dimBlock;
2283  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2284  &dimGrid, &dimBlock);
2285  cuda_heaviside(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
2286  src.Stride());
2287  CU_SAFE_CALL(cudaGetLastError());
2288 
2289  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2290  } else
2291  #endif
2292  {
2293  Mat().Heaviside(src.Mat());
2294  }
2295 }
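The definition above resolves x == 0 to 0.0, because the test is a strict `>`. In the sketch below, NaN also maps to 0.0, since every comparison with NaN is false. That NaN behaviour is a property of this sketch and is not documented for Kaldi. `Mat` and `HeavisideRef` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical row-major matrix

// out(r, c) = src(r, c) > 0 ? 1.0 : 0.0 (zero and, here, NaN give 0.0).
Mat HeavisideRef(const Mat &src) {
  Mat out(src.size());
  for (std::size_t r = 0; r < src.size(); ++r) {
    out[r].resize(src[r].size());
    for (std::size_t c = 0; c < src[r].size(); ++c)
      out[r][c] = (src[r][c] > 0.0) ? 1.0 : 0.0;
  }
  return out;
}
```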
void InvertElements ( )

Invert the matrix elementwise, replacing each element x with 1/x.

Definition at line 916 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by kaldi::TestCuMatrixCompObjfAndDeriv(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixInvertElements(), and kaldi::UnitTestCuMatrixObjfDeriv().

916  {
917 #if HAVE_CUDA == 1
918  if (CuDevice::Instantiate().Enabled()) {
919  Timer tim;
920 
921  dim3 dimGrid, dimBlock;
922  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
923  &dimGrid, &dimBlock);
924 
925  cuda_invert_elements(dimGrid, dimBlock, data_, Dim());
926  CU_SAFE_CALL(cudaGetLastError());
927 
928  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
929  } else
930 #endif
931  {
932  Mat().InvertElements();
933  }
934 }
bool IsUnit ( Real  tol = 0.001) const

Definition at line 593 of file cu-matrix.cc.

References KALDI_ASSERT, kaldi::kTrans, and kaldi::TraceMatMat().

Referenced by OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::UnitTestCuMatrixSymInvertPosDef(), and kaldi::UnitTestCuSpMatrixInvert().

593  {
594  // Want to return true iff
595  // FrobeniusNorm(*this - I)^2 <= tol * NumRows(), where
596  // FrobeniusNorm(*this - I)^2 = trace((*this - I)(*this - I)^T)
597  // = trace(*this * *this^T) - 2 * trace(*this) + trace(I)
598  // = TraceMatMat(*this, *this, kTrans) - 2 * this->Trace() + dim.
599  // Note that the tolerance applies to the squared norm.
600  KALDI_ASSERT(this->NumRows() == this->NumCols());
601  return (TraceMatMat(*this, *this, kTrans) + this->NumRows() - 2.0 * this->Trace() <=
602  tol * this->NumRows());
603 }
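IsUnit has no description above. Judging from the listing, it requires a square matrix and returns true when ||*this - I||_F^2 <= tol * NumRows(), using the trace expansion trace(M M^T) - 2 trace(M) + n. The sketch below evaluates that expansion and compares it with a direct sum of (M - I)^2. `Mat`, `SqDistViaTrace`, `SqDistDirect` and `IsUnitRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // hypothetical square matrix

// ||M - I||_F^2 via the trace identity used in the listing.
double SqDistViaTrace(const Mat &m) {
  const std::size_t n = m.size();
  double sumsq = 0.0, trace = 0.0;  // trace(M M^T) and trace(M)
  for (std::size_t r = 0; r < n; ++r) {
    assert(m[r].size() == n);
    for (std::size_t c = 0; c < n; ++c) sumsq += m[r][c] * m[r][c];
    trace += m[r][r];
  }
  return sumsq + static_cast<double>(n) - 2.0 * trace;
}

// ||M - I||_F^2 computed directly, for comparison.
double SqDistDirect(const Mat &m) {
  double sum = 0.0;
  for (std::size_t r = 0; r < m.size(); ++r)
    for (std::size_t c = 0; c < m[r].size(); ++c) {
      double d = m[r][c] - (r == c ? 1.0 : 0.0);
      sum += d * d;
    }
  return sum;
}

bool IsUnitRef(const Mat &m, double tol = 0.001) {
  return SqDistViaTrace(m) <= tol * static_cast<double>(m.size());
}
```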
KALDI_DISALLOW_COPY_AND_ASSIGN ( CuMatrixBase< Real >  )
private
void Lookup ( const std::vector< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3014 of file cu-matrix.cc.

References rnnlm::i, and KALDI_ASSERT.

Referenced by NnetDiscriminativeUpdater::LatticeComputations(), DiscriminativeComputation::LookupNnetOutput(), kaldi::TestCuMatrixLookup(), and kaldi::UnitTestCuMatrixLookup().

3015  {
3016  // Checks the dimension.
3017  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3018  for (int32 i = 0; i < indices.size(); ++i) {
3019  KALDI_ASSERT(indices[i].first < num_rows && indices[i].first >= 0 &&
3020  indices[i].second < num_cols && indices[i].second >= 0);
3021  }
3022  if (indices.size() == 0) return;
3023  KALDI_ASSERT(output != NULL);
3024 
3025 #if HAVE_CUDA == 1
3026  if (CuDevice::Instantiate().Enabled()) {
3027  CuArray<Int32Pair> cuda_indices(indices);
3028  Lookup(cuda_indices, output);
3029  } else
3030 #endif
3031  {
3032  for (int32 i = 0; i < indices.size(); i++) {
3033  output[i] = (*this)(indices[i].first, indices[i].second);
3034  }
3035  }
3036 }
void Lookup ( const CuArray< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3039 of file cu-matrix.cc.

References CuArray< T >::CopyToHost(), CU1DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), Timer::Elapsed(), Int32Pair::first, rnnlm::i, KALDI_ASSERT, and Int32Pair::second.

{
  int32 num_elements = indices.Dim();
  if (num_elements == 0) return;
  KALDI_ASSERT(output != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuArray<Real> cuda_output(num_elements);
    Timer tim;
    dim3 dimBlock(CU1DBLOCK, 1);
    dim3 dimGrid(n_blocks(num_elements, CU1DBLOCK), 1);

    cuda_matrix_lookup(dimGrid, dimBlock, this->data_, this->Dim(),
                       indices.Data(), num_elements, cuda_output.Data());
    CU_SAFE_CALL(cudaGetLastError());

    cuda_output.CopyToHost(output);
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indices.Data();
    for (int32 i = 0; i < num_elements; i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      output[i] = (*this)(index[i].first, index[i].second);
    }
  }
}
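The GPU branch launches a one-dimensional grid with one thread per requested element, so the number of blocks is a ceiling division of num_elements by the block size CU1DBLOCK. The sketch below shows that arithmetic. It assumes CU1DBLOCK is 256 (check cu-matrixdim.h for your build). `NumBlocksSketch` is a hypothetical stand-in for `n_blocks`, and `ThreadHasWork` is a hypothetical illustration of the guard that such a kernel typically needs for the surplus threads in the last block.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value; Kaldi defines CU1DBLOCK in cu-matrixdim.h.
constexpr std::int32_t kBlockSize = 256;

// Smallest number of blocks with num_blocks * block_size >= size,
// i.e. ceil(size / block_size) for non-negative size.
std::int32_t NumBlocksSketch(std::int32_t size, std::int32_t block_size) {
  assert(size >= 0 && block_size > 0);
  return size / block_size + (size % block_size == 0 ? 0 : 1);
}

// A thread's global index is block_idx * block_size + thread_idx; threads
// whose index is past the end of the index array must do nothing.
bool ThreadHasWork(std::int32_t block_idx, std::int32_t thread_idx,
                   std::int32_t size) {
  std::int32_t k = block_idx * kBlockSize + thread_idx;
  return k < size;
}
```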
const MatrixBase<Real>& Mat ( ) const
inline

Definition at line 614 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddDiagMat2(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuMatrixBase< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRows(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApplyLogSoftMaxPerRow(), CuMatrixBase< Real >::ApplySoftMaxPerRow(), kaldi::cu::BackpropLstmNonlinearity(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), GeneralMatrix::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffParametricRelu(), CuMatrixBase< Real >::DiffSigmoid(), CuMatrixBase< Real >::DiffTanh(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), CuMatrixBase< Real >::Heaviside(), CuMatrixBase< Real >::Max(), CuMatrixBase< Real >::Min(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), CuMatrixBase< Real >::ParametricRelu(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), CuMatrixBase< Real >::SetMatMatDivMat(), CuMatrixBase< Real >::Sigmoid(), CuMatrixBase< Real >::SoftHinge(), kaldi::cu::Splice(), CuMatrixBase< Real >::SymAddMat2(), CuMatrixBase< Real >::Tanh(), kaldi::TraceMatMat(), and kaldi::TraceMatSmat().

{
  return *(reinterpret_cast<const MatrixBase<Real>* >(this));
}
MatrixBase<Real>& Mat ( )
inline

Definition at line 617 of file cu-matrix.h.

{
  return *(reinterpret_cast<MatrixBase<Real>* >(this));
}
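Both Mat() overloads view the CuMatrixBase object as a MatrixBase through a reinterpret_cast. The callers use this only in their CPU branches, where data_ points to host memory. It also depends on the two classes starting with the same data members in the same order. The sketch below assumes that order is data_, num_cols_, num_rows_, stride_. It uses two hypothetical standard-layout structs to show how such a layout assumption can be checked at compile time with offsetof. It does not reproduce the cast itself, because standard C++ does not formally guarantee access through a pointer to an unrelated class type. The member-wise copy in `AsHostView` (also hypothetical) is the well-defined alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

using MatrixIndexT = std::int32_t;  // int32, as in matrix-common.h

// Hypothetical mirrors of the assumed leading members of each class.
struct HostMatrixLayout {
  float *data_;
  MatrixIndexT num_cols_;
  MatrixIndexT num_rows_;
  MatrixIndexT stride_;
};

struct CuMatrixLayout {
  float *data_;
  MatrixIndexT num_cols_;
  MatrixIndexT num_rows_;
  MatrixIndexT stride_;
};

// offsetof is only well-defined for standard-layout types.
static_assert(std::is_standard_layout<HostMatrixLayout>::value, "layout");
static_assert(std::is_standard_layout<CuMatrixLayout>::value, "layout");

// The view only makes sense if every member sits at the same offset.
static_assert(offsetof(HostMatrixLayout, data_) ==
              offsetof(CuMatrixLayout, data_), "data_ offset differs");
static_assert(offsetof(HostMatrixLayout, num_cols_) ==
              offsetof(CuMatrixLayout, num_cols_), "num_cols_ offset differs");
static_assert(offsetof(HostMatrixLayout, num_rows_) ==
              offsetof(CuMatrixLayout, num_rows_), "num_rows_ offset differs");
static_assert(offsetof(HostMatrixLayout, stride_) ==
              offsetof(CuMatrixLayout, stride_), "stride_ offset differs");
static_assert(sizeof(HostMatrixLayout) == sizeof(CuMatrixLayout), "size");

// Well-defined alternative to the cast: copy the fields member-wise.
HostMatrixLayout AsHostView(const CuMatrixLayout &m) {
  return HostMatrixLayout{m.data_, m.num_cols_, m.num_rows_, m.stride_};
}
```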
void Max ( const CuMatrixBase< Real > &  A)

Sets, elementwise, *this = max(*this, A). A must have the same dimensions as *this.

Definition at line 700 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by main(), SpliceMaxComponent::Propagate(), MaxpoolingComponent::Propagate(), kaldi::TestCuMatrixMax(), kaldi::UnitTestCuMatrixMax(), and kaldi::UnitTestCuMatrixReduceMax().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_max(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Max(A.Mat());
  }
}
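Both branches compute an elementwise maximum in place. The kernel receives this matrix's dimensions (Dim(), which carries its stride) and A.Stride() separately, so the two operands may have different row padding. The following is a minimal CPU sketch of that strided loop. `MaxInPlaceSketch` is a hypothetical name, and the matrices are plain row-major vectors rather than Kaldi types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// In-place elementwise maximum: a(r, c) = max(a(r, c), b(r, c)).
// `a` and `b` are row-major with their own row strides; only the
// num_rows x num_cols valid elements are touched, never the padding.
void MaxInPlaceSketch(std::vector<float> &a, int a_stride,
                      const std::vector<float> &b, int b_stride,
                      int num_rows, int num_cols) {
  assert(a_stride >= num_cols && b_stride >= num_cols);
  for (int r = 0; r < num_rows; ++r) {
    for (int c = 0; c < num_cols; ++c) {
      float &x = a[r * a_stride + c];
      x = std::max(x, b[r * b_stride + c]);
    }
  }
}
```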
Real Max ( ) const

Returns the largest element of the matrix. The GPU branch asserts that the matrix is nonempty.

Definition at line 2704 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Max().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
    Timer tim;

    CuVector<Real> col_max(num_rows_, kUndefined);
    cuda_max_mat_cols(num_rows_, CU1DBLOCK, col_max.Data(), data_, Dim());
    Real ans = col_max.Max();

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
    return ans;
  } else
#endif
  {
    return Mat().Max();
  }
}
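The GPU branch is a two-stage reduction. cuda_max_mat_cols fills a vector with num_rows_ entries (despite the name col_max), so each entry is presumably the maximum over the columns of one row. CuVectorBase::Max() then reduces that vector to a scalar. The sketch below reproduces the two stages on the CPU. `RowMaxes` and `MatrixMaxSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stage 1: one maximum per row, over that row's num_cols valid elements.
std::vector<float> RowMaxes(const std::vector<float> &m, int num_rows,
                            int num_cols, int stride) {
  assert(num_rows > 0 && num_cols > 0 && stride >= num_cols);
  std::vector<float> maxes(num_rows);
  for (int r = 0; r < num_rows; ++r) {
    const float *row = m.data() + r * stride;
    maxes[r] = *std::max_element(row, row + num_cols);
  }
  return maxes;
}

// Stage 2: reduce the per-row results to a single scalar.
float MatrixMaxSketch(const std::vector<float> &m, int num_rows,
                      int num_cols, int stride) {
  std::vector<float> maxes = RowMaxes(m, num_rows, num_cols, stride);
  return *std::max_element(maxes.begin(), maxes.end());
}
```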
void Min ( const CuMatrixBase< Real > &  A)

Sets, elementwise, *this = min(*this, A). A must have the same dimensions as *this.

Definition at line 725 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by main(), kaldi::TestCuMatrixMin(), kaldi::UnitTestCuMatrixMin(), and kaldi::UnitTestCuMatrixReduceMin().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_min(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Min(A.Mat());
  }
}
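Min(A) is Max(A) with the comparison reversed. One way to combine the pair is to clamp every element into [lo, hi]: first take the elementwise maximum with a matrix filled with lo, then the elementwise minimum with a matrix filled with hi. The sketch below is a hypothetical CPU illustration of that idea, not a Kaldi routine. For brevity it uses dense matrices (stride equal to the number of columns) stored as flat vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Elementwise in-place min/max over two equally sized dense matrices,
// mirroring Min(A) and Max(A).
void MinInPlace(std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = std::min(a[i], b[i]);
}

void MaxInPlace(std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = std::max(a[i], b[i]);
}

// Hypothetical helper: clamp every element of `a` into [lo, hi].
void ClampSketch(std::vector<float> &a, float lo, float hi) {
  assert(lo <= hi);
  MaxInPlace(a, std::vector<float>(a.size(), lo));  // raise values below lo
  MinInPlace(a, std::vector<float>(a.size(), hi));  // lower values above hi
}
```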
Real Min ( ) const

Returns the smallest element of the matrix. The GPU branch asserts that the matrix is nonempty.

Definition at line 2725 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Min().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
    Timer tim;

    CuVector<Real> col_min(num_rows_, kUndefined);
    cuda_min_mat_cols(num_rows_, CU1DBLOCK, col_min.Data(), data_, Dim());
    Real ans = col_min.Min();

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
    return ans;
  } else
#endif
  {
    return Mat().Min();
  }
}
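Like Max(), this reduction has to cover only the NumCols() valid elements of each row and skip the padding between NumCols() and Stride(). The hypothetical sketch below contrasts a stride-aware minimum with a naive scan over the whole buffer. When the padding holds a smaller value, the naive scan returns that value instead of the true minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stride-aware: minimum over the num_rows x num_cols valid elements only.
float MatrixMinSketch(const std::vector<float> &m, int num_rows,
                      int num_cols, int stride) {
  assert(num_rows > 0 && num_cols > 0 && stride >= num_cols);
  float ans = m[0];  // element (0, 0) is always valid
  for (int r = 0; r < num_rows; ++r) {
    const float *row = m.data() + r * stride;
    ans = std::min(ans, *std::min_element(row, row + num_cols));
  }
  return ans;
}

// Naive: scans the whole buffer, padding included; wrong when stride > num_cols.
float NaiveBufferMin(const std::vector<float> &m) {
  return *std::min_element(m.begin(), m.end());
}
```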
void MulColsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th column by scale[i]; scale.Dim() must equal NumCols().

Definition at line 750 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by PerElementScaleComponent::Backprop(), FixedScaleComponent::Backprop(), Convolutional2DComponent::BackpropagateFnc(), Rescale::BackpropagateFnc(), PerElementScaleComponent::Propagate(), FixedScaleComponent::Propagate(), Rescale::PropagateFnc(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixMulColsVec(), and NnetLogprobTask::~NnetLogprobTask().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(scale.Dim() == NumCols());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_cols_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().MulColsVec(scale.Vec());
  }
}
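Scaling element (i, j) by scale[j] is the same as right-multiplying the matrix by diag(scale). The hypothetical CPU sketch below implements the in-place column scaling. Alongside it is a deliberately naive reference that forms diag(scale) explicitly and multiplies, so the two results can be compared.

```cpp
#include <cassert>
#include <vector>

// In-place column scaling of a row-major matrix with row stride `stride`:
// m(i, j) *= scale[j]; scale.size() must equal num_cols.
void MulColsVecSketch(std::vector<float> &m, int num_rows, int num_cols,
                      int stride, const std::vector<float> &scale) {
  assert(static_cast<int>(scale.size()) == num_cols);
  for (int i = 0; i < num_rows; ++i)
    for (int j = 0; j < num_cols; ++j)
      m[i * stride + j] *= scale[j];
}

// Reference: explicit dense product m * D with D = diag(scale).
// Assumes m is dense (stride == num_cols).
std::vector<float> RightMulDiagReference(const std::vector<float> &m,
                                         int num_rows, int num_cols,
                                         const std::vector<float> &scale) {
  std::vector<float> d(num_cols * num_cols, 0.0f);
  for (int j = 0; j < num_cols; ++j) d[j * num_cols + j] = scale[j];
  std::vector<float> out(num_rows * num_cols, 0.0f);
  for (int i = 0; i < num_rows; ++i)
    for (int j = 0; j < num_cols; ++j)
      for (int k = 0; k < num_cols; ++k)
        out[i * num_cols + j] += m[i * num_cols + k] * d[k * num_cols + j];
  return out;
}
```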
void MulElements ( const CuMatrixBase< Real > &  A)

Multiply two matrices elementwise: C = C .* A.

Definition at line 652 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by ElementwiseProductComponent::Backprop(), BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), MaxpoolingComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), Dropout::BackpropagateFnc(), kaldi::nnet1::ComputeStdDev(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), ElementwiseProductComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixMulElements(), kaldi::nnet1::UnitTestLengthNorm(), ParametricRelu::Update(), AffineTransform::Update(), FramePoolingComponent::Update(), ConvolutionalComponent::Update(), Rescale::Update(), and NaturalGradientPerElementScaleComponent::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().MulElements(A.Mat());
  }
}
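Many of the callers listed above are Backprop() methods. A typical use of an elementwise product there is the chain rule for an elementwise nonlinearity: the input derivative is the output derivative times f'(x), element by element. The sketch below is a hypothetical CPU illustration for the sigmoid, where f'(x) = y(1 - y) with y = sigmoid(x). It is not taken from any Kaldi component.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place elementwise (Hadamard) product: c[i] *= a[i].
void MulElementsSketch(std::vector<float> &c, const std::vector<float> &a) {
  assert(c.size() == a.size());
  for (std::size_t i = 0; i < c.size(); ++i) c[i] *= a[i];
}

// Backprop through y = sigmoid(x): in_deriv = out_deriv .* y .* (1 - y).
std::vector<float> SigmoidBackpropSketch(const std::vector<float> &y,
                                         const std::vector<float> &out_deriv) {
  std::vector<float> in_deriv(y.size());
  for (std::size_t i = 0; i < y.size(); ++i) in_deriv[i] = 1.0f - y[i];
  MulElementsSketch(in_deriv, y);          // now y .* (1 - y)
  MulElementsSketch(in_deriv, out_deriv);  // now ... .* out_deriv
  return in_deriv;
}
```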
void MulRowsGroupMat ( const CuMatrixBase< Real > &  src)

Divides each row into src.NumCols() contiguous groups of NumCols() / src.NumCols() elements each, then scales the j'th group of elements in row i by src(i, j).

Definition at line 801 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), and kaldi::UnitTestCuMatrixMulRowsGroupMat().

{
  KALDI_ASSERT(src.NumCols() > 0);
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    int group_size = this->NumCols() / src.NumCols();

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_group_mat(dimGrid, dimBlock, this->data_, src.data_,
                            this->Dim(), src.Stride(), group_size);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().MulRowsGroupMat(src.Mat());
  }
}
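group_size is computed with integer division, so NumCols() is presumably expected to be a multiple of src.NumCols(), and src is expected to have one row per row of this matrix. The hypothetical CPU sketch below asserts both explicitly, as assumptions. Element (i, j) is multiplied by src(i, j / group_size).

```cpp
#include <cassert>
#include <vector>

// m is num_rows x num_cols and src is num_rows x num_groups, both dense
// and row-major. Each row of m is split into num_groups contiguous groups
// of group_size = num_cols / num_groups elements, and every element of
// group g in row i is multiplied by src(i, g).
void MulRowsGroupMatSketch(std::vector<float> &m, int num_rows, int num_cols,
                           const std::vector<float> &src, int num_groups) {
  assert(num_groups > 0 && num_cols % num_groups == 0);
  assert(static_cast<int>(src.size()) == num_rows * num_groups);
  const int group_size = num_cols / num_groups;
  for (int i = 0; i < num_rows; ++i)
    for (int j = 0; j < num_cols; ++j)
      m[i * num_cols + j] *= src[i * num_groups + j / group_size];
}
```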
void MulRowsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th row by scale[i]; scale.Dim() must equal NumRows().

Definition at line 777 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by NormalizeComponent::Backprop(), BackpropTruncationComponent::Backprop(), ClipGradientComponent::Backprop(), BlockSoftmax::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), kaldi::nnet2::PreconditionDirections(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), LengthNormComponent::PropagateFnc(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::UnitTestCuMatrixMulRowsVec(), AffineTransform::Update(), and ConvolutionalComponent::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(scale.Dim() == NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().MulRowsVec(scale.Vec());
  }
}
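Scaling row i by scale[i] is left-multiplication by diag(scale). A natural use is a per-row rescaling such as length normalization, and LengthNormComponent appears among the callers above. The hypothetical CPU sketch below computes one inverse L2 norm per row and then applies it with a row-scaling helper that mirrors MulRowsVec. It does not claim to be how any Kaldi component is written.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// In-place row scaling: m(i, j) *= scale[i]; scale.size() must equal num_rows.
void MulRowsVecSketch(std::vector<float> &m, int num_rows, int num_cols,
                      int stride, const std::vector<float> &scale) {
  assert(static_cast<int>(scale.size()) == num_rows);
  for (int i = 0; i < num_rows; ++i)
    for (int j = 0; j < num_cols; ++j)
      m[i * stride + j] *= scale[i];
}

// Example use: rescale each row to unit L2 norm (rows assumed nonzero).
void NormalizeRowsSketch(std::vector<float> &m, int num_rows, int num_cols,
                         int stride) {
  std::vector<float> inv_norms(num_rows);
  for (int i = 0; i < num_rows; ++i) {
    float sumsq = 0.0f;
    for (int j = 0; j < num_cols; ++j)
      sumsq += m[i * stride + j] * m[i * stride + j];
    assert(sumsq > 0.0f);
    inv_norms[i] = 1.0f / std::sqrt(sumsq);
  }
  MulRowsVecSketch(m, num_rows, num_cols, stride, inv_norms);
}
```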
MatrixIndexT NumCols ( ) const
inline

Definition at line 196 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetLdaStatsAccumulator::AccStatsFromOutput(), CuVectorBase< Real >::AddColSumMat(), MatrixRandomizer::AddData(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuRand< Real >::AddGaussNoise(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), CuVectorBase< Real >::AddRowSumMat(), NnetUpdater::Backprop(), StatisticsExtractionComponent::Backprop(), SumReduceComponent::Backprop(), MaxoutComponent::Backprop(), MaxpoolingComponent::Backprop(), PnormComponent::Backprop(), NormalizeComponent::Backprop(), RepeatedAffineComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), ConvolutionComponent::Backprop(), DropoutComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Convolutional1dComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), Splice::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), CuRand< Real >::BinarizeProbs(), ChunkInfo::CheckSize(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), DiscriminativeComputation::Compute(), kaldi::nnet3::ComputeAccuracy(), 
NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), ConvolutionComponent::ConvolutionComponent(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuVectorBase< Real >::CopyDiagFromMat(), CuMatrixBase< Real >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrix< Real >::CuMatrix(), kaldi::CuRandGaussianMatrixBaseSpeedTest(), kaldi::CuRandGaussianMatrixSpeedTest(), kaldi::CuRandUniformMatrixBaseSpeedTest(), kaldi::CuRandUniformMatrixSpeedTest(), CuSubVector< Real >::CuSubVector(), CuTpMatrix< Real >::CuTpMatrix(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), NnetComputer::ExecuteCommand(), AffineTransform::GetGradient(), RecurrentComponent::GetGradient(), LstmProjected::GetGradient(), ConvolutionalComponent::GetGradient(), Convolutional2DComponent::GetGradient(), BlstmProjected::GetGradient(), Convolutional1dComponent::GetParameterDim(), AffineTransform::GetParams(), RecurrentComponent::GetParams(), LstmProjected::GetParams(), ConvolutionalComponent::GetParams(), Convolutional2DComponent::GetParams(), BlstmProjected::GetParams(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), ConvolutionComponent::InderivPatchesToInderiv(), MaxpoolingComponent::InderivPatchesToInderiv(), AffineComponent::Info(), AffineComponentPreconditioned::Info(), AffineComponentPreconditionedOnline::Info(), DctComponent::Info(), 
FixedLinearComponent::Info(), FixedAffineComponent::Info(), LstmNonlinearityComponent::Info(), Convolutional1dComponent::Info(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), LinearTransform::InitData(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), RepeatedAffineComponent::InputDim(), AffineComponent::InputDim(), BlockAffineComponent::InputDim(), FixedLinearComponent::InputDim(), FixedAffineComponent::InputDim(), LstmNonlinearityComponent::InputDim(), Convolutional1dComponent::InputDim(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< Real >::Max(), kaldi::MeanVariance(), CuMatrixBase< Real >::Min(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), BlockAffineComponent::NumParameters(), RepeatedAffineComponent::NumParameters(), ConvolutionComponent::NumParameters(), LstmNonlinearityComponent::NumParameters(), LinearTransform::NumParams(), AffineTransform::NumParams(), RecurrentComponent::NumParams(), LstmProjected::NumParams(), ConvolutionalComponent::NumParams(), Convolutional2DComponent::NumParams(), BlstmProjected::NumParams(), CuMatrix< BaseFloat >::operator=(), DctComponent::OutputDim(), LstmNonlinearityComponent::OutputDim(), CuMatrixBase< Real >::ParametricRelu(), LstmNonlinearityComponent::PerturbParams(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), 
kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), kaldi::nnet3::PrintParameterStats(), NnetComputeProb::ProcessOutputs(), DistributeComponent::Propagate(), Component::Propagate(), ElementwiseProductComponent::Propagate(), NormalizeComponent::Propagate(), StatisticsExtractionComponent::Propagate(), SumReduceComponent::Propagate(), StatisticsPoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), SpliceComponent::Propagate(), BlockAffineComponent::Propagate(), DctComponent::Propagate(), ConvolutionComponent::Propagate(), DropoutComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Convolutional1dComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), FramePoolingComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), Dropout::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), LstmNonlinearityComponent::Read(), AffineTransform::ReadData(), LinearTransform::ReadData(), ConvolutionalComponent::ReadData(), Rbm::ReadData(), Rbm::Reconstruct(), kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), Mse::Report(), NnetRescaler::RescaleComponent(), CuMatrixBase< BaseFloat >::Row(), OnlineNaturalGradient::SelfTest(), OnlinePreconditioner::SelfTest(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), NaturalGradientRepeatedAffineComponent::SetNaturalGradientConfigs(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), 
kaldi::cu::Splice(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), NonlinearComponent::StoreStatsInternal(), PdfPrior::SubtractOnLogpost(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), RepeatedAffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), ConvolutionComponent::Update(), Convolutional1dComponent::Update(), BlockAffineComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), MatrixRandomizer::Value(), RepeatedAffineComponent::Vectorize(), BlockAffineComponent::Vectorize(), ConvolutionComponent::Vectorize(), DctComponent::Write(), and LstmNonlinearityComponent::Write().

{ return num_cols_; }
MatrixIndexT NumRows ( ) const
inline

Dimensions. Returns the number of rows.

Definition at line 195 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetLdaStatsAccumulator::AccStatsFromOutput(), CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), MatrixRandomizer::AddData(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuRand< Real >::AddGaussNoise(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMat(), CuVectorBase< Real >::AddMatVec(), CuVectorBase< Real >::AddRowSumMat(), AffineComponent::AffineComponent(), DistributeComponent::Backprop(), NnetUpdater::Backprop(), ElementwiseProductComponent::Backprop(), StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), SumReduceComponent::Backprop(), MaxoutComponent::Backprop(), BackpropTruncationComponent::Backprop(), MaxpoolingComponent::Backprop(), PnormComponent::Backprop(), NormalizeComponent::Backprop(), RepeatedAffineComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), AffineComponent::Backprop(), ClipGradientComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), SumGroupComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), ConvolutionComponent::Backprop(), DropoutComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Convolutional1dComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), HiddenSoftmax::BackpropagateFnc(), Splice::BackpropagateFnc(), SimpleSentenceAveragingComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), 
RecurrentComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), CuRand< Real >::BinarizeProbs(), BlockAffineComponent::BlockAffineComponent(), ChunkInfo::CheckSize(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), DiscriminativeComputation::Compute(), DecodableAmNnetParallel::Compute(), kaldi::nnet3::ComputeAccuracy(), NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), NnetUpdater::ComputeTotAccuracy(), Convolutional1dComponent::Convolutional1dComponent(), ConvolutionComponent::ConvolutionComponent(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuMatrixBase< Real >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrix< Real >::CuMatrix(), kaldi::CuRandGaussianMatrixBaseSpeedTest(), kaldi::CuRandGaussianMatrixSpeedTest(), kaldi::CuRandUniformMatrixBaseSpeedTest(), kaldi::CuRandUniformMatrixSpeedTest(), CuTpMatrix< Real >::CuTpMatrix(), DecodableAmNnet::DecodableAmNnet(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), AffineTransform::GetGradient(), RecurrentComponent::GetGradient(), LstmProjected::GetGradient(), ConvolutionalComponent::GetGradient(), Convolutional2DComponent::GetGradient(), BlstmProjected::GetGradient(), 
Convolutional1dComponent::GetParameterDim(), AffineTransform::GetParams(), RecurrentComponent::GetParams(), LstmProjected::GetParams(), ConvolutionalComponent::GetParams(), Convolutional2DComponent::GetParams(), BlstmProjected::GetParams(), NnetComputer::GetPointers(), AffineComponentPreconditioned::GetScalingFactor(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupPnorm(), AffineComponent::Info(), AffineComponentPreconditioned::Info(), AffineComponentPreconditionedOnline::Info(), DctComponent::Info(), FixedLinearComponent::Info(), FixedAffineComponent::Info(), ConvolutionComponent::Info(), Convolutional1dComponent::Info(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), LinearTransform::InitData(), FixedAffineComponent::InitFromConfig(), FixedLinearComponent::InitFromString(), FixedAffineComponent::InitFromString(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< Real >::Max(), kaldi::MeanVariance(), CuMatrixBase< Real >::Min(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< Real >::MulElements(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), DecodableAmNnetParallel::NumFramesReady(), BlockAffineComponent::NumParameters(), RepeatedAffineComponent::NumParameters(), ConvolutionComponent::NumParameters(), LstmNonlinearityComponent::NumParameters(), LinearTransform::NumParams(), AffineTransform::NumParams(), RecurrentComponent::NumParams(), LstmProjected::NumParams(), ConvolutionalComponent::NumParams(), Convolutional2DComponent::NumParams(), BlstmProjected::NumParams(), NnetLogprobTask::operator()(), CuMatrix< BaseFloat >::operator=(), 
RepeatedAffineComponent::OutputDim(), AffineComponent::OutputDim(), BlockAffineComponent::OutputDim(), DctComponent::OutputDim(), FixedLinearComponent::OutputDim(), FixedAffineComponent::OutputDim(), ConvolutionComponent::OutputDim(), Convolutional1dComponent::OutputDim(), CuMatrixBase< Real >::ParametricRelu(), LstmNonlinearityComponent::PerturbParams(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), kaldi::nnet3::PrintParameterStats(), DistributeComponent::Propagate(), NnetOnlineComputer::Propagate(), DropoutComponent::Propagate(), Component::Propagate(), ElementwiseProductComponent::Propagate(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), SumReduceComponent::Propagate(), RepeatedAffineComponent::Propagate(), SpliceComponent::Propagate(), BlockAffineComponent::Propagate(), DctComponent::Propagate(), ConvolutionComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Convolutional1dComponent::Propagate(), MaxpoolingComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), SimpleSentenceAveragingComponent::PropagateFnc(), RecurrentComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), AffineTransform::ReadData(), LinearTransform::ReadData(), ConvolutionalComponent::ReadData(), Rbm::ReadData(), Rbm::Reconstruct(), 
kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), ClipGradientComponent::RepairGradients(), NnetRescaler::RescaleComponent(), LstmProjected::ResetStreams(), OnlineNaturalGradient::SelfTest(), OnlinePreconditioner::SelfTest(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), AffineComponent::SetParams(), ConvolutionComponent::SetParams(), Convolutional1dComponent::SetParams(), kaldi::cu::Splice(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), NonlinearComponent::StoreStatsInternal(), CuMatrixBase< Real >::SumColumnRanges(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCheck(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::nnet2::UnitTestPreconditionDirectionsOnline(), kaldi::nnet3::UnitTestPreconditionDirectionsOnline(), kaldi::nnet1::UnitTestSimpleSentenceAveragingComponent(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), RepeatedAffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), LinearTransform::Update(), AffineTransform::Update(), SentenceAveragingComponent::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), ConvolutionComponent::Update(), 
Convolutional1dComponent::Update(), BlockAffineComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), RepeatedAffineComponent::Vectorize(), BlockAffineComponent::Vectorize(), ConvolutionComponent::Vectorize(), DctComponent::Write(), and NnetLogprobTask::~NnetLogprobTask().

{ return num_rows_; }
MatrixIndexT num_rows_
Definition: cu-matrix.h:645
CuValue<Real> operator() (MatrixIndexT r, MatrixIndexT c)
inline

Definition at line 541 of file cu-matrix.h.

{
  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
                        static_cast<UnsignedMatrixIndexT>(num_rows_) &&
                        static_cast<UnsignedMatrixIndexT>(c) <
                        static_cast<UnsignedMatrixIndexT>(num_cols_));
  return CuValue<Real>(data_ + r * stride_ + c);
}
MatrixIndexT num_cols_
Definition: cu-matrix.h:644
MatrixIndexT stride_
Definition: cu-matrix.h:646
Real * data_
GPU data pointer (or a regular CPU matrix data pointer when no GPU is in use).
Definition: cu-matrix.h:636
#define KALDI_PARANOID_ASSERT(cond)
Definition: kaldi-error.h:182
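The paranoid assertion above checks each index with a single unsigned comparison: converting a negative int32 index to an unsigned type yields a very large value, so one `<` test rejects both negative and too-large indices. A minimal host-only sketch, assuming `MatrixIndexT` is `int32_t` (as the `int32 MatrixIndexT` typedef suggests) and `UnsignedMatrixIndexT` is `uint32_t`; `InRange` is a hypothetical helper, not part of Kaldi:

```cpp
#include <cassert>
#include <cstdint>

using MatrixIndexT = int32_t;          // assumption: matches the int32 typedef
using UnsignedMatrixIndexT = uint32_t; // assumption: its unsigned counterpart

// Hypothetical helper: true iff 0 <= i < n, using a single comparison.
// A negative i converts to a value >= 2^31, which is never < n for n >= 0.
inline bool InRange(MatrixIndexT i, MatrixIndexT n) {
  return static_cast<UnsignedMatrixIndexT>(i) <
         static_cast<UnsignedMatrixIndexT>(n);
}
```

For example, `InRange(-1, 3)` is false because -1 becomes 4294967295 after the cast.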
Real operator() (MatrixIndexT r, MatrixIndexT c) const
inline

Definition at line 549 of file cu-matrix.h.

{
  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
                        static_cast<UnsignedMatrixIndexT>(num_rows_) &&
                        static_cast<UnsignedMatrixIndexT>(c) <
                        static_cast<UnsignedMatrixIndexT>(num_cols_));
  return CuValue<Real>(data_ + r * stride_ + c);  // will be casted to Real.
}
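Both overloads address element (r, c) as data_ + r * stride_ + c, so each row occupies stride_ slots of which only the first num_cols_ belong to the matrix; per the listing's comment, the const overload's CuValue<Real> is then converted to Real. A host-only sketch of that addressing, using a hypothetical `StridedView` struct in place of the real class (it has no GPU proxy and uses plain references):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side view with the same layout fields as CuMatrixBase:
// row r starts at data + r * stride, and only the first num_cols entries of
// each row belong to the matrix (stride >= num_cols).
struct StridedView {
  float *data;
  int32_t num_rows, num_cols, stride;

  float &operator()(int32_t r, int32_t c) {
    assert(r >= 0 && r < num_rows && c >= 0 && c < num_cols);
    return data[r * stride + c];  // same offset arithmetic as the listing
  }
  float operator()(int32_t r, int32_t c) const {
    assert(r >= 0 && r < num_rows && c >= 0 && c < num_cols);
    return data[r * stride + c];
  }
};
```

With a 2 x 3 view over a buffer of stride 4, writing m(1, 2) lands at offset 6 and leaves the padding slot at offset 3 untouched.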
void ParametricRelu (const CuMatrixBase< Real > & src, const CuVectorBase< Real > & alpha, const CuVectorBase< Real > & beta)

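Computes the parametric rectified linear unit element by element, with per-column slopes: (*this)(r, c) = src(r, c) * (src(r, c) > 0 ? alpha(c) : beta(c)). src must have the same dimensions as *this, and alpha and beta must each have dimension NumCols().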

Definition at line 1277 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::PropagateFnc().

{
  KALDI_ASSERT(src.NumRows() == this->NumRows());
  KALDI_ASSERT(src.NumCols() == this->NumCols());
  KALDI_ASSERT(alpha.Dim() == this->NumCols());
  KALDI_ASSERT(beta.Dim() == this->NumCols());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(src.NumCols(), CU2DBLOCK), n_blocks(src.NumRows(), CU2DBLOCK));

    cuda_parametric_relu(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
                         src.Stride(), alpha.data_, beta.data_);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    // Do it on CPU,
    for (MatrixIndexT r = 0; r < NumRows(); r++) {
      for (MatrixIndexT c = 0; c < NumCols(); c++) {
        Real src_elem = src.Mat()(r,c);
        this->Mat()(r,c) = src_elem * (src_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
      }
    }
  }
}
MatrixIndexT NumCols() const
Definition: cu-matrix.h:196
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:614
int32 MatrixIndexT
Definition: matrix-common.h:96
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:195
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
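The CPU branch above is a plain double loop, which makes a convenient reference for checking the GPU kernel. A hypothetical CPU reference on a dense row-major buffer (stride equal to the column count, an assumption of this sketch); it uses `>= 0` like the listing, which agrees with the documented `> 0` because an input of 0 produces 0 either way:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the element-wise rule in ParametricRelu:
// out(r,c) = src(r,c) * (src(r,c) >= 0 ? alpha[c] : beta[c]).
// Matrices are dense row-major (stride == num_cols) in this sketch.
std::vector<float> ParametricReluRef(const std::vector<float> &src,
                                     std::size_t num_rows, std::size_t num_cols,
                                     const std::vector<float> &alpha,
                                     const std::vector<float> &beta) {
  assert(src.size() == num_rows * num_cols);
  assert(alpha.size() == num_cols && beta.size() == num_cols);
  std::vector<float> out(src.size());
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (std::size_t c = 0; c < num_cols; ++c) {
      float x = src[r * num_cols + c];
      // Per-column slopes: alpha for non-negative inputs, beta for negative.
      out[r * num_cols + c] = x * (x >= 0.0f ? alpha[c] : beta[c]);
    }
  }
  return out;
}
```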
CuSubVector<Real> Row ( MatrixIndexT  i)
inline

Returns row i as a CuSubVector<Real> that points into this matrix's data; no elements are copied.

Definition at line 535 of file cu-matrix.h.

{
  KALDI_ASSERT(static_cast<UnsignedMatrixIndexT>(i) <
               static_cast<UnsignedMatrixIndexT>(num_rows_));
  return CuSubVector<Real>(data_ + (i * stride_), NumCols());
}
friend class CuSubVector< Real >
Definition: cu-matrix.h:90
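Row(i) builds its CuSubVector from the pointer data_ + i * stride_ and length NumCols(), so the result is a view: writes through it modify the matrix. A host-only sketch of the same idea with a hypothetical `RowView` struct and `GetRow` function (neither is a Kaldi API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical non-owning row view, analogous to the CuSubVector that Row(i)
// builds from (data_ + i * stride_, NumCols()): it copies no data, so writes
// through the view change the underlying matrix.
struct RowView {
  float *data;
  int32_t dim;
  float &operator[](int32_t j) {
    assert(j >= 0 && j < dim);
    return data[j];
  }
};

inline RowView GetRow(float *matrix_data, int32_t num_rows, int32_t num_cols,
                      int32_t stride, int32_t i) {
  assert(i >= 0 && i < num_rows);
  return RowView{matrix_data + i * stride, num_cols};
}
```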
const Real* RowData ( MatrixIndexT  r) const
inline

Get raw row pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 599 of file cu-matrix.h.

Referenced by CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSubVector< Real >::CuSubVector(), RectifiedLinearComponent::RepairGradients(), kaldi::TestCuMatrixAddRows2(), kaldi::TestCuMatrixAddToRows(), kaldi::TestCuMatrixCopyRows2(), kaldi::TestCuMatrixCopyToRows(), kaldi::UnitTestCuMatrixAddRows(), and kaldi::UnitTestCuMatrixCopyRows().

{ return data_ + r * stride_; }
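Const row pointers from RowData() are what CopyRows(const CuArray<const Real*> &src) expects: row r of the destination is filled from the NumCols() values starting at src[r]. A CPU-only sketch of that gather, with a hypothetical `CopyRowsFromPointers` function and std::vector standing in for CuArray; in Kaldi the pointers must live in the same memory space (GPU or CPU) as the destination, which is trivially true here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-only analogue of CopyRows(const CuArray<const Real*> &src):
// row r of dst (row-major, dst_stride) is filled from the num_cols floats
// starting at src_rows[r]. The pointers play the role of RowData() results.
void CopyRowsFromPointers(const std::vector<const float *> &src_rows,
                          float *dst, int32_t num_cols, int32_t dst_stride) {
  for (std::size_t r = 0; r < src_rows.size(); ++r) {
    for (int32_t c = 0; c < num_cols; ++c)
      dst[r * dst_stride + c] = src_rows[r][c];
  }
}
```

Building the pointer array as `{a.data() + 1 * 3, a.data()}` for a 2 x 3 matrix `a` mimics `{RowData(1), RowData(0)}` and swaps its rows into the destination.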
Real* RowData ( MatrixIndexT  r)
inline

Get raw row pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 602 of file cu-matrix.h.

{ return data_ + r * stride_; }
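Non-const row pointers are the destinations that AddToRows(alpha, const CuArray<Real*> &dst) writes into: for each row r of this matrix, alpha times that row is added to the NumCols() values at dst[r]. A CPU-only sketch with a hypothetical `AddToRowPointers` function, again using std::vector in place of CuArray:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-only analogue of AddToRows(alpha, const CuArray<Real*> &dst):
// for each row r of a source matrix (row-major, src_stride), adds alpha times
// that row to the num_cols floats at dst_rows[r]. The destination pointers play
// the role of RowData() results from another matrix.
void AddToRowPointers(float alpha, const float *src, int32_t num_cols,
                      int32_t src_stride, const std::vector<float *> &dst_rows) {
  for (std::size_t r = 0; r < dst_rows.size(); ++r)
    for (int32_t c = 0; c < num_cols; ++c)
      dst_rows[r][c] += alpha * src[r * src_stride + c];
}
```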
void Scale (Real value)

Multiplies every element of *this by value.

Definition at line 608 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), ScaleComponent::Backprop(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), DecodableNnetSimple::DoNnetComputation(), MultiTaskLoss::Eval(), RepeatedAffineComponent::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), BlockAffineComponent::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), LstmNonlinearityComponent::Init(), OnlineNaturalGradient::InitDefault(), OnlinePreconditioner::InitDefault(), main(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), BackpropTruncationComponent::Propagate(), ScaleComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), Dropout::PropagateFnc(), Rbm::RbmUpdate(), LstmNonlinearityComponent::Read(), ClipGradientComponent::RepairGradients(), RepeatedAffineComponent::Scale(), NaturalGradientAffineComponent::Scale(), AffineComponent::Scale(), BlockAffineComponent::Scale(), ConvolutionComponent::Scale(), LstmNonlinearityComponent::Scale(), Convolutional1dComponent::Scale(), TanhComponent::StoreStats(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixScale(), kaldi::UnitTestLstmNonlinearity(), and Convolutional2DComponent::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_scale(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Scale(value);
  }
}
::MatrixDim Dim() const
Definition: cu-matrix.h:201
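Scale() (like Set() and ParametricRelu()) combines a compile-time guard with a runtime check: the `} else` sits just before `#endif`, so in a GPU build the braced CPU block is the else-branch, while in a CPU-only build it runs unconditionally. A sketch of that structure, with HAVE_CUDA defaulted to 0 so only the CPU path is compiled; `gpu_enabled` is a hypothetical stand-in for CuDevice::Instantiate().Enabled(), and the CPU loop is an assumed equivalent of Mat().Scale(value):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

#ifndef HAVE_CUDA
#define HAVE_CUDA 0  // this sketch builds the CPU-only variant
#endif

// Scales the num_rows x num_cols region of a strided row-major buffer.
void ScaleRowMajor(float *data, int32_t num_rows, int32_t num_cols,
                   int32_t stride, float value) {
#if HAVE_CUDA == 1
  bool gpu_enabled = false;  // hypothetical: would query the device
  if (gpu_enabled) {
    // A kernel launch would go here.
  } else
#endif
  {
    // CPU path: touch only the num_cols valid entries of each row.
    for (int32_t r = 0; r < num_rows; ++r)
      for (int32_t c = 0; c < num_cols; ++c)
        data[r * stride + c] *= value;
  }
}
```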
void Set (Real value)

Sets every element of *this to value.

Definition at line 495 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), MaxpoolingComponent::Propagate(), MaxPoolingComponent::PropagateFnc(), MaxPooling2DComponent::PropagateFnc(), SigmoidComponent::StoreStats(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixSet(), kaldi::nnet1::UnitTestDropoutComponent(), and kaldi::nnet1::UnitTestMaxPoolingComponent().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_set_const(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
  #endif
  {
    Mat().Set(value);
  }
}
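Set() hands the kernel Dim(), which records the stride as well as the row and column counts, rather than filling num_rows_ * stride_ contiguous values. This matters for sub-matrix views: a view covering a column range shares its parent's stride, so the "padding" after each of its rows is real data in neighbouring columns of the parent. A host-only sketch with a hypothetical `SetStrided` function:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fills a num_rows x num_cols strided region with value, skipping the
// (stride - num_cols) trailing entries of each row. For a column-range view
// of a larger matrix those entries belong to the parent's other columns, so
// a flat fill of num_rows * stride floats would overwrite them.
void SetStrided(float *data, int32_t num_rows, int32_t num_cols,
                int32_t stride, float value) {
  if (num_rows == 0) return;  // mirrors the early return in Set()
  for (int32_t r = 0; r < num_rows; ++r)
    for (int32_t c = 0; c < num_cols; ++c)
      data[r * stride + c] = value;
}
```

Calling `SetStrided(parent.data() + 1, 2, 2, 4, 0.0f)` on a 2 x 4 parent clears only its columns 1 and 2.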
void SetMatMatDivMat (const CuMatrixBase< Real > & A, const CuMatrixBase< Real > & B, const CuMatrixBase< Real > & C)

Sets *this = A * B / C element by element; wherever an element of C is 0, the corresponding element of *this is set to the element of A instead. *this may safely alias A, B, or C and still gives the expected result.
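Aliasing is safe because each output element depends only on the same position of A, B and C, so it can be computed from values read before that position is written. A CPU reference sketch of the documented rule on flat arrays; `SetMatMatDivMatRef` is hypothetical and ignores stride:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the documented rule:
// dst[i] = a[i] * b[i] / c[i], or dst[i] = a[i] where c[i] == 0.
// Each output reads only index i of the inputs, and all three reads happen
// before the write, so dst may alias a, b or c.
void SetMatMatDivMatRef(const float *a, const float *b, const float *c,
                        float *dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    float ai = a[i], bi = b[i], ci = c[i];  // read all inputs first
    dst[i] = (ci != 0.0f) ? ai * bi / ci : ai;
  }
}
```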

Definition at line 1017 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_cols_, CuMatrixBase< Real >::num_rows_, and CuMatrixBase< Real >::Stride().

Referenced by DropoutComponent::Backprop().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_rows_ == A.num_rows_ && num_cols_ == A.num_cols_);
    KALDI_ASSERT(num_rows_ == B.num_rows_ && num_cols_ == B.num_cols_);
    KALDI_ASSERT(num_rows_ == C.num_rows_ && num_cols_ == C.num_cols_);
    if (num_rows_ == 0) return;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),