CuMatrixBase< Real > Class Template Reference

Matrix for CUDA computing. More...

#include <matrix-common.h>


Public Member Functions

void CopyCols (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Copies column r from column indexes[r] of src. More...
 
void AddCols (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indices)
 Add column indices[r] of src to column r. More...
 
void CopyRows (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Copies row r from row indexes[r] of src. More...
 
void CopyRows (const CuArrayBase< const Real * > &src)
 Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise). More...
 
void CopyToRows (const CuArrayBase< Real * > &dst) const
 For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. More...
 
void AddRows (Real alpha, const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 For each row r, does this.Row(r) += alpha * src.Row(indexes[r]). More...
 
void AddRows (Real alpha, const CuArrayBase< const Real * > &src)
 Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols(). More...
 
void AddToRows (Real alpha, const CuArrayBase< MatrixIndexT > &indexes, CuMatrixBase< Real > *dst) const
 For each row i of *this, adds alpha * this->Row(i) to dst->Row(indexes[i]) if indexes[i] >= 0; otherwise does nothing. More...
 
void AddToRows (Real alpha, const CuArrayBase< Real * > &dst) const
 For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. More...
 
void SumColumnRanges (const CuMatrixBase< Real > &src, const CuArrayBase< Int32Pair > &indexes)
 For each row r of this and for each column c, sets (*this)(r, c) to the sum of src(r, j) over j, where j ranges from indexes[c].first through indexes[c].second - 1. More...
 
void AddRowRanges (const CuMatrixBase< Real > &src, const CuArrayBase< Int32Pair > &indexes)
 For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1. More...
 
void AddToDiag (Real value)
 Adds "value" to the diagonal elements of the matrix. More...
 
MatrixIndexT NumRows () const
 Dimensions. More...
 
MatrixIndexT NumCols () const
 
MatrixIndexT Stride () const
 
::MatrixDim Dim () const
 
Real FrobeniusNorm () const
 
bool IsUnit (Real tol=0.001) const
 
bool ApproxEqual (const CuMatrixBase< Real > &other, float tol=0.01) const
 True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm() More...
 
MatrixIndexT SizeInBytes () const
 Get size of matrix in bytes. More...
 
template<typename OtherReal >
void CopyFromMat (const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromGeneralMat (const GeneralMatrix &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromMat (const MatrixBase< Real > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromSp (const CuSpMatrix< Real > &M)
 
template<typename OtherReal >
void CopyFromTp (const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyFromMat (const CuMatrixBase< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyToMat (MatrixBase< OtherReal > *dst, MatrixTransposeType trans=kNoTrans) const
 
void CopyRowsFromVec (const CuVectorBase< Real > &v)
 This function has two modes of operation. More...
 
void CopyRowsFromVec (const VectorBase< Real > &v)
 Version of CopyRowsFromVec() that takes a CPU-based vector. More...
 
void CopyColsFromVec (const CuVectorBase< Real > &v)
 Copies vector into matrix, column-by-column. More...
 
void CopyColFromVec (const CuVectorBase< Real > &v, const MatrixIndexT col)
 Copy vector into specific column of matrix. More...
 
void Sigmoid (const CuMatrixBase< Real > &src)
 Set each element to the sigmoid of the corresponding element of "src": element by element, x = 1 / (1 + exp(-x)) More...
 
void Heaviside (const CuMatrixBase< Real > &src)
 Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x==0. More...
 
void SoftHinge (const CuMatrixBase< Real > &src)
 Apply the function y = log(1 + exp(x)), to each element. More...
 
void GroupPnorm (const CuMatrixBase< Real > &src, Real pow)
 Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j ^ (power)) ^ (1 / power), where power is the 'pow' argument and G = x.NumCols() / y.NumCols() must be an integer. More...
 
void DiffGroupPnorm (const CuMatrixBase< Real > &in_value, const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv, Real power)
 Differentiate backward through the GroupPnorm function. More...
 
void GroupMax (const CuMatrixBase< Real > &src)
 Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer. More...
 
void GroupMaxDeriv (const CuMatrixBase< Real > &input, const CuMatrixBase< Real > &output)
 Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. More...
 
void ParametricRelu (const CuMatrixBase< Real > &src, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Compute the parametric rectified linear unit function; element by element, *this = src * (src > 0 ? alpha : beta) More...
 
void DiffParametricRelu (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Differentiate backward through the parametric relu function. More...
 
void Tanh (const CuMatrixBase< Real > &src)
 Compute the hyperbolic tangent (tanh) function; element by element, *this = tanh(src). More...
 
void DiffSigmoid (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the sigmoid function. More...
 
void DiffTanh (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the tanh function. More...
 
void DiffSoftmaxPerRow (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the softmax function. More...
 
void DiffLogSoftmaxPerRow (const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv)
 Differentiate backward through the log softmax function. More...
 
void DiffXent (const CuArrayBase< int32 > &tgt, CuVector< Real > *log_post_tgt)
 Differentiate the block [softmax+cross-entropy] : dE/da = posterior_mat - target_mat, 'E' is error function, 'a' is activation on softmax input. More...
 
void Cholesky (CuMatrixBase< Real > *inv_cholesky=NULL)
 This function sets *this to the Cholesky factor of *this (i.e. More...
 
void SymInvertPosDef ()
 Inversion for positive definite symmetric matrices. More...
 
void ApplyPow (Real power)
 
void ApplyPowAbs (Real power, bool include_sign=false)
 Apply power to the absolute value of each element. More...
 
void ApplyHeaviside ()
 For each element, sets x = (x > 0 ? 1.0 : 0.0). More...
 
void ApplyFloor (Real floor_val)
 
void ApplyCeiling (Real ceiling_val)
 
void ApplyExp ()
 
void ApplyExpSpecial ()
 For each element x of the matrix, set it to (x < 0 ? exp(x) : x + 1). More...
 
void ApplySoftMaxPerRow (const CuMatrixBase< Real > &src)
 Softmax nonlinearity Y = Softmax(X) : Yij = e^Xij / sum_k(e^Xik), done to each row, with attention to avoiding overflow or underflow. More...
 
void ApplyLogSoftMaxPerRow (const CuMatrixBase< Real > &src)
 LogSoftmax nonlinearity Y = LogSoftmax(X) : Yij = Xij - log(sum_k(e^Xik)), done to each row, with attention to avoiding overflow or underflow. More...
 
void FindRowMaxId (CuArray< int32 > *id) const
 Find the id of the maximal element for each row (resizes the 'id' array to the appropriate size). More...
 
void SetZero ()
 Math operations, some calling kernels. More...
 
void Set (Real value)
 
void Add (Real value)
 
void SetZeroAboveDiag ()
 Zeroes all elements for which col > row. More...
 
void Scale (Real value)
 
void ApplyLog ()
 
void MulElements (const CuMatrixBase< Real > &A)
 Multiply two matrices elementwise: C = C .* A. More...
 
void DivElements (const CuMatrixBase< Real > &A)
 Divide two matrices elementwise: C = C ./ A. More...
 
void Max (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = max(*this, A). More...
 
void Min (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = min(*this, A). More...
 
void MulColsVec (const CuVectorBase< Real > &scale)
 scale i'th column by scale[i] More...
 
void MulRowsVec (const CuVectorBase< Real > &scale)
 scale i'th row by scale[i] More...
 
void MulRowsGroupMat (const CuMatrixBase< Real > &src)
 divide each row into src.NumCols() groups, and then scale i'th row's jth group of elements by src[i, j]. More...
 
void DivRowsVec (const CuVectorBase< Real > &div)
 divide i'th row by div[i] More...
 
void InvertElements ()
 invert the matrix by elements. More...
 
void AddMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A More...
 
void AddSmat (Real alpha, const CuSparseMatrix< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A. More...
 
void AddSmatMat (Real alpha, const CuSparseMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, Real beta)
 (*this) = alpha * op(A) * B + beta * (*this), where A is sparse. More...
 
void AddMatSmat (Real alpha, const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 (*this) = alpha * A * op(B) + beta * (*this), where B is sparse and op(B) is either B or trans(B) depending on the 'transB' argument. More...
 
void AddToElements (Real alpha, const CuArrayBase< int32 > &elements)
 This is a rather special purpose function; we might generalize it later by adding a transpose-type option. More...
 
void AddMatBlocks (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 This function is like AddMat (it does *this += alpha * src), except that it supports cases where *this and src have different dimension. More...
 
void AddVecToCols (Real alpha, const CuVectorBase< Real > &col, Real beta=1.0)
 (for each column c of *this), c = alpha * col + beta * c More...
 
void AddVecToRows (Real alpha, const CuVectorBase< Real > &row, Real beta=1.0)
 (for each row r of *this), r = alpha * row + beta * r More...
 
void AddMatMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, Real beta)
 C = alpha * A(^T)*B(^T) + beta * C. More...
 
void AddVecVec (Real alpha, const CuVectorBase< Real > &x, const CuVectorBase< Real > &y)
 A = alpha * x * y^T + A . More...
 
void SetMatMatDivMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const CuMatrixBase< Real > &C)
 *this = a * b / c (by element; when c = 0, *this = a). *this can safely be an alias of a, b or c and still give the expected result. More...
 
void SymAddMat2 (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transA, Real beta)
 *this = beta * *this + alpha * M M^T, for symmetric matrices. More...
 
void AddMatBlock (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuBlockMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 This function is like AddMatMat but for where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks). More...
 
void AddDiagVecMat (const Real alpha, const CuVectorBase< Real > &v, const CuMatrixBase< Real > &M, MatrixTransposeType transM, Real beta=1.0)
 *this = beta * *this + alpha * diag(v) * M [or M^T]. More...
 
void AddMatDiagVec (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transM, CuVectorBase< Real > &v, Real beta=1.0)
 
void AddMatMatElements (const Real alpha, const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const Real beta)
 *this = beta * *this + alpha * A .* B (.* element by element multiplication) More...
 
void AddMatSp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuSpMatrix< Real > &B, const Real beta)
 this <– beta*this + alpha*A*B More...
 
void AddSpMat (const Real alpha, const CuSpMatrix< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <– beta*this + alpha*SpA*B More...
 
void AddTpMat (const Real alpha, const CuTpMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <– beta*this + alpha*A*B. More...
 
void AddMatTp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuTpMatrix< Real > &B, MatrixTransposeType transB, const Real beta)
 this <– beta*this + alpha*A*B. More...
 
void CopyFromBlock (const CuBlockMatrix< Real > &B, MatrixTransposeType trans=kNoTrans)
 
void CopyLowerToUpper ()
 
void CopyUpperToLower ()
 
CuSubMatrix< Real > Range (const MatrixIndexT row_offset, const MatrixIndexT num_rows, const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
CuSubMatrix< Real > RowRange (const MatrixIndexT row_offset, const MatrixIndexT num_rows) const
 
CuSubMatrix< Real > ColRange (const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
const CuSubVector< Real > Row (MatrixIndexT i) const
 
CuSubVector< Real > Row (MatrixIndexT i)
 
CuValue< Real > operator() (MatrixIndexT r, MatrixIndexT c)
 
Real operator() (MatrixIndexT r, MatrixIndexT c) const
 
Real Sum () const
 
Real Max () const
 
Real Min () const
 
Real Trace (bool check_square=true) const
 Return the trace. If check_square = true, will crash if matrix is not square. More...
 
void SetRandn ()
 
void SetRandUniform ()
 
void Write (std::ostream &os, bool binary) const
 
void AddElements (Real alpha, const std::vector< MatrixElement< Real > > &input)
 
void AddElements (Real alpha, const CuArrayBase< Int32Pair > &indexes, const Real *input)
 
void Lookup (const std::vector< Int32Pair > &indexes, Real *output) const
 
void Lookup (const CuArrayBase< Int32Pair > &indexes, Real *output) const
 
void EqualElementMask (const CuMatrixBase< Real > &mat, CuMatrix< Real > *mask) const
 
const Real * RowData (MatrixIndexT r) const
 Get raw row pointer (const). More...
 
Real * RowData (MatrixIndexT r)
 Get raw row pointer. More...
 
const Real * Data () const
 Return data pointer (const). More...
 
Real * Data ()
 Return data pointer. More...
 
const MatrixBase< Real > & Mat () const
 
MatrixBase< Real > & Mat ()
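ApplySoftMaxPerRow in the list above is described as paying attention to overflow and underflow. The usual way to achieve that is to subtract each row's maximum before exponentiating. The following is a minimal CPU sketch of that technique, not the Kaldi kernel itself; the names Matrix and SoftmaxPerRow are illustrative only and are not part of this class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative row-major matrix type; not a Kaldi type.
using Matrix = std::vector<std::vector<double>>;

// Writes the row-wise softmax of `src` into `dst`:
//   dst(i, j) = exp(src(i, j)) / sum_k exp(src(i, k)).
// Subtracting the row maximum first keeps every exponent <= 0, so exp()
// cannot overflow, and the largest term is exactly exp(0) = 1.
void SoftmaxPerRow(const Matrix &src, Matrix *dst) {
  assert(dst->size() == src.size());
  for (std::size_t r = 0; r < src.size(); ++r) {
    const std::vector<double> &in = src[r];
    std::vector<double> &out = (*dst)[r];
    assert(out.size() == in.size());
    if (in.empty()) continue;
    const double row_max = *std::max_element(in.begin(), in.end());
    double sum = 0.0;
    for (std::size_t c = 0; c < in.size(); ++c) {
      out[c] = std::exp(in[c] - row_max);
      sum += out[c];
    }
    for (std::size_t c = 0; c < in.size(); ++c) out[c] /= sum;
  }
}
```

Without the subtraction, an input such as 1000.0 would make exp() overflow to infinity in both float and double; with it, a row of equal values yields a uniform distribution.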
 

Protected Member Functions

 CuMatrixBase ()
 
 CuMatrixBase (Real *data, MatrixIndexT num_rows, MatrixIndexT num_cols, MatrixIndexT stride)
 This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix. More...
 

Protected Attributes

Real * data_
 GPU data pointer (or regular matrix data pointer,. More...
 
MatrixIndexT num_cols_
 
MatrixIndexT num_rows_
 
MatrixIndexT stride_
 

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (CuMatrixBase)
 

Friends

class CuMatrixBase< float >
 
class CuMatrixBase< double >
 
class CuVectorBase< float >
 
class CuVectorBase< double >
 
class VectorBase< Real >
 
class CuSpMatrix< Real >
 
class CuTpMatrix< float >
 
class CuTpMatrix< double >
 
class CuVectorBase< Real >
 
class CuSubMatrix< Real >
 
class CuRand< Real >
 
class CuSubVector< Real >
 
class CuBlockMatrix< Real >
 
class CuSparseMatrix< float >
 
class CuSparseMatrix< double >
 
class CuSparseMatrix< Real >
 
Real TraceMatMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
 
Real TraceMatSmat (const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType trans)
 
void AddMatMatBatched (const Real alpha, std::vector< CuSubMatrix< Real > * > &C, const std::vector< CuSubMatrix< Real > * > &A, MatrixTransposeType transA, const std::vector< CuSubMatrix< Real > * > &B, MatrixTransposeType transB, const Real beta)
 Does multiple matrix multiplications, executing them in parallel using cuBLAS's gemmBatched if we are using a GPU. More...
 

Detailed Description

template<typename Real>
class kaldi::CuMatrixBase< Real >

Matrix for CUDA computing.

Does the computation on the CUDA card when CUDA is compiled in and we have a suitable GPU (CuDevice::Instantiate().Enabled() == true); otherwise, does it on the CPU.
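The dispatch described here combines a compile-time guard with a run-time check. A minimal, self-contained sketch of that pattern follows. GpuEnabled() and AddScalar() are hypothetical stand-ins invented for this illustration (GpuEnabled() plays the role of CuDevice::Instantiate().Enabled()), and the GPU branch is left as a comment so the sketch builds without CUDA.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CuDevice::Instantiate().Enabled(). It always
// returns false here so that the sketch runs on a machine without a GPU.
inline bool GpuEnabled() { return false; }

// Adds `value` to every element of `data`.
// The GPU branch is compiled only when HAVE_CUDA == 1 and is taken only when
// a device is enabled at run time; in every other case the CPU loop runs.
// Returns true if the GPU branch handled the call.
bool AddScalar(std::vector<float> *data, float value) {
#if HAVE_CUDA == 1
  if (GpuEnabled()) {
    // A CUDA kernel launch would go here (omitted in this sketch).
    return true;
  }
#endif
  for (float &x : *data) x += value;
  return false;
}
```

The member functions documented below follow the same shape: a block guarded by HAVE_CUDA that launches a kernel, and a fallback that forwards to the CPU matrix returned by Mat().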

Definition at line 69 of file matrix-common.h.

Constructor & Destructor Documentation

CuMatrixBase ( )
inlineprotected

Definition at line 694 of file cu-matrix.h.

CuMatrixBase(): data_(NULL), num_cols_(0), num_rows_(0), stride_(0) { }
CuMatrixBase ( Real *  data,
MatrixIndexT  num_rows,
MatrixIndexT  num_cols,
MatrixIndexT  stride 
)
inlineprotected

This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.

Definition at line 698 of file cu-matrix.h.

CuMatrixBase(Real *data,
             MatrixIndexT num_rows,
             MatrixIndexT num_cols,
             MatrixIndexT stride):
    data_(data), num_cols_(num_cols), num_rows_(num_rows), stride_(stride) { }

Member Function Documentation

void Add ( Real  value)

Definition at line 548 of file cu-matrix.cc.

References data_.

Referenced by BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), Xent::Eval(), main(), kaldi::MeanVariance(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), TanhComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAdd(), kaldi::UnitTestCuMatrixAdd2(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixSetRandUniform(), and kaldi::UnitTestCuMatrixTraceMatMat().

template<typename Real>
void CuMatrixBase<Real>::Add(Real value) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    CuTimer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_add(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().Add(value);
  }
}
void AddCols ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indices 
)

Add column indices[r] of src to column r.

As a special case, if indices[i] == -1, column i is skipped. indices.size() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().
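The semantics of AddCols can be sketched on the CPU as follows. This is an illustrative reference loop written from the description above, not the Kaldi implementation; Matrix and AddColsReference are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative row-major matrix type; not a Kaldi type.
using Matrix = std::vector<std::vector<double>>;

// For every row r and every column c of `dst`:
//   dst(r, c) += src(r, indices[c]),  unless indices[c] == -1,
// in which case column c of `dst` is left unchanged.
void AddColsReference(const Matrix &src, const std::vector<int> &indices,
                      Matrix *dst) {
  assert(dst->size() == src.size());  // same number of rows
  for (std::size_t r = 0; r < dst->size(); ++r) {
    std::vector<double> &row = (*dst)[r];
    assert(indices.size() == row.size());  // one index per destination column
    for (std::size_t c = 0; c < row.size(); ++c) {
      const int j = indices[c];
      if (j == -1) continue;  // special case: skip this column
      assert(j >= 0 && static_cast<std::size_t>(j) < src[r].size());
      row[c] += src[r][j];
    }
  }
}
```

Note that src may have a different number of columns from the destination; only the row counts have to match.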

Definition at line 2637 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by Convolutional1dComponent::Backprop(), ConvolutionalComponent::BackpropagateFnc(), ConvolutionComponent::InderivPatchesToInderiv(), and MaxpoolingComponent::InderivPatchesToInderiv().

template<typename Real>
void CuMatrixBase<Real>::AddCols(const CuMatrixBase<Real> &src,
                                 const CuArrayBase<MatrixIndexT> &indices) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(indices.Dim() == NumCols());
    KALDI_ASSERT(NumRows() == src.NumRows());
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
                  Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddCols(src.Mat(), indices.Data());
  }
}
void AddDiagVecMat ( const Real  alpha,
const CuVectorBase< Real > &  v,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
Real  beta = 1.0 
)

*this = beta * *this + alpha * diag(v) * M [or M^T].

The same as adding M but scaling each row M_i by v(i).
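As a worked illustration of the formula, the following CPU sketch computes beta * out + alpha * diag(v) * M (or M^T) element by element. It is written from the description above rather than taken from Kaldi; Matrix and AddDiagVecMatReference are hypothetical names, and a bool stands in for the MatrixTransposeType argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative row-major matrix type; not a Kaldi type.
using Matrix = std::vector<std::vector<double>>;

// out = beta * out + alpha * diag(v) * op(M), where op(M) is M^T when
// `transpose_m` is true and M otherwise. Multiplying by diag(v) on the left
// scales row r of op(M) by v[r].
void AddDiagVecMatReference(double alpha, const std::vector<double> &v,
                            const Matrix &M, bool transpose_m, double beta,
                            Matrix *out) {
  assert(v.size() == out->size());  // one scale per output row
  for (std::size_t r = 0; r < out->size(); ++r) {
    std::vector<double> &row = (*out)[r];
    for (std::size_t c = 0; c < row.size(); ++c) {
      const double m = transpose_m ? M[c][r] : M[r][c];
      row[c] = beta * row[c] + alpha * v[r] * m;
    }
  }
}
```

With v = (2, 3), M = [[1, 2], [3, 4]], alpha = 1, beta = 0.5 and an output matrix of ones, the untransposed call gives [[2.5, 4.5], [9.5, 12.5]].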

Definition at line 1369 of file cu-matrix.cc.

References CU2DBLOCK, CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by kaldi::nnet3::attention::ApplyScalesToInput(), kaldi::nnet3::attention::ApplyScalesToOutput(), HiddenSoftmax::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), MultiBasisComponent::PropagateFnc(), and kaldi::TestCuMatrixAddDiagVecMat().

template<typename Real>
void CuMatrixBase<Real>::AddDiagVecMat(
    const Real alpha, const CuVectorBase<Real> &v,
    const CuMatrixBase<Real> &M, MatrixTransposeType transM,
    Real beta) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transM == kNoTrans) {
      KALDI_ASSERT(SameDim(*this, M));
    } else {
      KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
    }
    KALDI_ASSERT(v.Dim() == this->NumRows());

    CuTimer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
    if (transM == kTrans)
      std::swap(M_row_stride, M_col_stride);
    cuda_add_diag_vec_mat(dimGrid, dimBlock, alpha, data_, Dim(),
                          v.Data(), M.Data(), M_row_stride, M_col_stride, beta);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddDiagVecMat(alpha, v.Vec(), M.Mat(), transM, beta);
  }
}
void AddElements ( Real  alpha,
const std::vector< MatrixElement< Real > > &  input 
)

Definition at line 3178 of file cu-matrix.cc.

References CU1DBLOCK, data_, rnnlm::i, and KALDI_ASSERT.

Referenced by OnlinePreconditioner::InitOrthonormalSpecial(), OnlineNaturalGradient::InitOrthonormalSpecial(), DiscriminativeComputation::ProcessPosteriors(), and kaldi::UnitTestCuMatrixAddElements().

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha,
                                     const std::vector<MatrixElement<Real> > &input) {
  // Checks the dimension.
  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
  for (int32 i = 0; i < input.size(); ++i) {
    KALDI_ASSERT(input[i].row < num_rows && input[i].row >= 0 &&
                 input[i].column < num_cols && input[i].column >= 0);
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    void *addr = CuDevice::Instantiate().Malloc(input.size() * sizeof(MatrixElement<Real>));
    CU_SAFE_CALL(cudaMemcpy(addr, input.data(),
                            input.size() * sizeof(MatrixElement<Real>),
                            cudaMemcpyHostToDevice));

    CuTimer tim;
    int dimBlock(CU1DBLOCK);
    int dimGrid(n_blocks(input.size(), CU1DBLOCK));

    cuda_matrix_add_elements(dimGrid, dimBlock, this->data_, this->Dim(),
                             alpha, (MatrixElement<Real>*)addr, input.size());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().Free(addr);
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    for (int32 i = 0; i < input.size(); i++) {
      (*this)(input[i].row, input[i].column) += alpha * input[i].weight;
    }
  }
}
void AddElements ( Real  alpha,
const CuArrayBase< Int32Pair > &  indexes,
const Real *  input 
)

Definition at line 3212 of file cu-matrix.cc.

References CU1DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), Int32Pair::first, rnnlm::i, KALDI_ASSERT, kaldi::kUndefined, and Int32Pair::second.

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha, const CuArrayBase<Int32Pair> &indexes,
                                     const Real *input) {
  if (indexes.Dim() == 0) return;
  KALDI_ASSERT(input != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CuVector<Real> tmp_vec(indexes.Dim(), kUndefined);
    CU_SAFE_CALL(cudaMemcpy(tmp_vec.Data(), input, indexes.Dim() * sizeof(Real),
                            cudaMemcpyHostToDevice));

    int dimBlock(CU1DBLOCK);
    int dimGrid = n_blocks(indexes.Dim(), CU1DBLOCK);
    cuda_matrix_add_indexed_values(dimGrid, dimBlock, this->Dim(), alpha,
                                   indexes.Data(), tmp_vec.Data(), indexes.Dim(), this->data_);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indexes.Data();
    for (int32 i = 0; i < indexes.Dim(); i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      (*this)(index[i].first, index[i].second) += alpha * input[i];
    }
  }
}
void AddMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A

Definition at line 941 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by RestrictedAttentionComponent::Add(), TimeHeightConvolutionComponent::Add(), RepeatedAffineComponent::Add(), NaturalGradientAffineComponent::Add(), AffineComponent::Add(), LinearComponent::Add(), BlockAffineComponent::Add(), Convolutional1dComponent::Add(), ConvolutionComponent::Add(), LstmNonlinearityComponent::Add(), CuRand< Real >::AddGaussNoise(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApproxEqual(), kaldi::nnet3::attention::AttentionBackward(), kaldi::nnet3::attention::AttentionForward(), SigmoidComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Splice::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), DiscriminativeComputation::Compute(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), NnetComputer::ExecuteCommand(), TimeHeightConvolutionComponent::PerturbParams(), RepeatedAffineComponent::PerturbParams(), AffineComponent::PerturbParams(), LinearComponent::PerturbParams(), BlockAffineComponent::PerturbParams(), Convolutional1dComponent::PerturbParams(), ConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), AdditiveNoiseComponent::Propagate(), Rbm::RbmUpdate(), ClipGradientComponent::RepairGradients(), RestrictedAttentionComponent::StoreStats(), kaldi::nnet3::attention::TestAttentionForwardBackward(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestLstmNonlinearity(), kaldi::nnet3::UnitTestNnetInputDerivatives(), LinearTransform::Update(), AffineTransform::Update(), 
RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), LstmProjected::Update(), BlstmProjected::Update(), Convolutional1dComponent::Update(), ConvolutionComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

942  {
943 
944 #if HAVE_CUDA == 1
945  if (CuDevice::Instantiate().Enabled()) {
946  if (transA == kNoTrans) {
947  KALDI_ASSERT(A.NumRows() == num_rows_ && A.NumCols() == num_cols_);
948  } else {
949  KALDI_ASSERT(A.NumCols() == num_rows_ && A.NumRows() == num_cols_);
950  }
951  if (num_rows_ == 0) return;
952  CuTimer tim;
953  // This block dimension seems to work better than the
954  // one from GetBlockSizesForSimpleMatrixOperation().
955  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
956  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
957  n_blocks(NumRows(), CU2DBLOCK));
958  cuda_add_mat(dimGrid, dimBlock, alpha, A.data_,
959  data_, Dim(), A.Stride(),
960  (transA == kTrans ? 1 : 0));
961  CU_SAFE_CALL(cudaGetLastError());
962 
963  CuDevice::Instantiate().AccuProfile(__func__, tim);
964  } else
965 #endif
966  {
967  Mat().AddMat(alpha, A.Mat(), transA);
968  }
969 }
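The grid sizes passed to the kernel above come from dividing each matrix dimension by the block edge and rounding up. The following host-side sketch illustrates that calculation. NumBlocksSketch, GridSketch and GridFor are hypothetical names, and the ceiling-division behaviour of n_blocks() is assumed rather than shown on this page.

```cpp
#include <cassert>

// Hypothetical stand-in for n_blocks(): number of blocks of `block_size`
// threads needed to cover `size` elements (ceiling division, assumed).
inline int NumBlocksSketch(int size, int block_size) {
  assert(size >= 0 && block_size > 0);
  return (size + block_size - 1) / block_size;
}

struct GridSketch {
  int x, y;
};

// x covers the columns and y covers the rows, mirroring dimGrid in the
// listing above.
inline GridSketch GridFor(int num_rows, int num_cols, int block_edge) {
  return GridSketch{NumBlocksSketch(num_cols, block_edge),
                    NumBlocksSketch(num_rows, block_edge)};
}
```

With an example block edge of 16 (an illustrative value, not a statement about CU2DBLOCK), GridFor(100, 33, 16) gives x = 3 and y = 7.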
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void AddMatBlock ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuBlockMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

This function is like AddMatMat but for where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks).
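As an illustration of the block-diagonal product, here is a CPU-only sketch using plain std::vector storage. Dense and AddMatBlockSketch are hypothetical names, not Kaldi types, and the sketch omits the transpose options for brevity.

```cpp
#include <cassert>
#include <vector>

// Row-major dense matrix; a stand-in for illustration, not Kaldi's CuMatrix.
struct Dense {
  int rows, cols;
  std::vector<double> d;
  Dense(int r, int c) : rows(r), cols(c), d(r * c, 0.0) {}
  double &at(int r, int c) { return d[r * cols + c]; }
  double at(int r, int c) const { return d[r * cols + c]; }
};

// C = alpha * A * B + beta * C, where B is block-diagonal and is given as
// the list of its diagonal blocks (no transposes in this sketch).
void AddMatBlockSketch(double alpha, const Dense &A,
                       const std::vector<Dense> &blocks, double beta,
                       Dense *C) {
  int b_rows = 0, b_cols = 0;
  for (const Dense &blk : blocks) {
    b_rows += blk.rows;
    b_cols += blk.cols;
  }
  assert(A.rows == C->rows && A.cols == b_rows && C->cols == b_cols);
  int row_offset = 0, col_offset = 0;  // offsets into B
  for (const Dense &blk : blocks) {
    for (int r = 0; r < C->rows; r++) {
      for (int c = 0; c < blk.cols; c++) {
        double sum = 0.0;
        for (int k = 0; k < blk.rows; k++)
          sum += A.at(r, row_offset + k) * blk.at(k, c);
        double &out = C->at(r, col_offset + c);
        out = alpha * sum + beta * out;
      }
    }
    row_offset += blk.rows;
    col_offset += blk.cols;
  }
}
```

Each block only touches the columns of A that line up with its rows, which is why a block-diagonal B is cheaper than a general dense product.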

Definition at line 3106 of file cu-matrix.cc.

References A, CuMatrixBase< Real >::AddMatMat(), CuBlockMatrix< Real >::Block(), CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and kaldi::swap().

Referenced by kaldi::UnitTestCuBlockMatrixAddMatBlock().

3110  {
3111  // Check dimensions
3112  int32 A_num_rows = A.NumRows(), A_num_cols = A.NumCols(),
3113  A_row_stride = A.Stride(), A_col_stride = 1,
3114  B_num_rows = B.NumRows(), B_num_cols = B.NumCols();
3115  if (transA == kTrans) {
3116  std::swap(A_num_rows, A_num_cols);
3117  std::swap(A_row_stride, A_col_stride);
3118  }
3119  if (transB == kTrans) {
3120  std::swap(B_num_rows, B_num_cols);
3121  }
3122  // At this point the {A,B}_{rows,cols} variables are
3123  // after any transposition.
3124  KALDI_ASSERT(NumRows() == A_num_rows && NumCols() == B_num_cols);
3125  KALDI_ASSERT(A_num_cols == B_num_rows);
3126  int32 B_num_blocks = B.NumBlocks();
3127 
3128  if (num_rows_ == 0) return;
3129 #if HAVE_CUDA == 1
3130  if (CuDevice::Instantiate().Enabled()) {
3131  CuTimer tim;
3132  MatrixDim this_dim = Dim();
3133 
3134  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
3135  // (x,y) indices will be (row of *this, block of B)
3136  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
3137  n_blocks(B_num_blocks, CU2DBLOCK));
3138 
3139  // caution: the use of x as the row-index is not good, but
3140  // this code is not much used, so I'm not updating it.
3141  cuda_add_mat_blockmat(dimGrid, dimBlock, data_, this_dim, A.Data(),
3142  A_num_rows, A_num_cols, A_row_stride, A_col_stride,
3143  B.CuData(), B_num_blocks, alpha, beta,
3144  (transB == kTrans ? 1 : 0));
3145 
3146  CU_SAFE_CALL(cudaGetLastError());
3147 
3148  CuDevice::Instantiate().AccuProfile(__func__, tim);
3149  } else
3150 #endif
3151  {
3152  // "row_offset" and "col_offset" are offsets into B (or into B^T, if
3153  // transB == kTrans).
3154  int32 row_offset = 0, col_offset = 0;
3155  for (int32 b = 0; b < B_num_blocks; b++) {
3156  const CuSubMatrix<Real> this_block = B.Block(b);
3157  int32 this_num_rows = this_block.NumRows(),
3158  this_num_cols = this_block.NumCols();
3159  if (transB == kTrans) std::swap(this_num_rows, this_num_cols);
3160  CuSubMatrix<Real> this_part(*this, 0, num_rows_,
3161  col_offset, this_num_cols);
3162  CuSubMatrix<Real> A_part = (transA == kNoTrans ?
3163  CuSubMatrix<Real>(A, 0, num_rows_,
3164  row_offset, this_num_rows) :
3165  CuSubMatrix<Real>(A, row_offset, this_num_rows,
3166  0, num_rows_));
3167  this_part.AddMatMat(alpha, A_part, transA, this_block, transB, beta);
3168  row_offset += this_num_rows;
3169  col_offset += this_num_cols;
3170  }
3171  // Note: the values being compared below are all after applying any
3172  // transposition to B.
3173  KALDI_ASSERT(row_offset == B_num_rows && col_offset == B_num_cols);
3174  }
3175 }
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:89
Structure containing size of the matrix plus stride.
Definition: cu-matrixdim.h:52
void AddMatBlocks ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

This function is like AddMat (it does *this += alpha * src), except that it supports cases where *this and src have different dimensions. Here src denotes the argument A.

There are two allowed cases:

(1) *this is larger than src; we do a broadcasting operation. *this must have NumRows() == a * src.NumRows() and NumCols() == b * src.NumCols() for integers a >= 1, b >= 1. *this will be treated as being made up of blocks with the same size as src, and to each block we'll add alpha * src. This case does not support trans == kTrans.

(2) *this is smaller than src; we sum. src.NumRows() must equal a * this->NumRows(), and src.NumCols() must equal b * this->NumCols(), for integers a >= 1, b >= 1. In this case, src will be treated as being made up of blocks with the same size as *this, and to *this we will add alpha times the sum of all of those blocks.
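The two cases can be sketched on the CPU as follows. This is a minimal illustration with plain std::vector storage; Dense and AddMatBlocksSketch are hypothetical names, and the transposed variant of the summing case is left out.

```cpp
#include <cassert>
#include <vector>

// Row-major dense matrix; a stand-in for illustration, not Kaldi's CuMatrix.
struct Dense {
  int rows, cols;
  std::vector<double> d;
  Dense(int r, int c) : rows(r), cols(c), d(r * c, 0.0) {}
  double &at(int r, int c) { return d[r * cols + c]; }
  double at(int r, int c) const { return d[r * cols + c]; }
};

// dst += alpha * src with block semantics (no transpose in this sketch):
//  - src at least as large as dst: every dst-sized block of src is summed
//    into dst;
//  - dst larger than src: src is tiled (broadcast) over dst.
void AddMatBlocksSketch(double alpha, const Dense &src, Dense *dst) {
  if (dst->rows == 0 || dst->cols == 0) return;
  if (src.rows >= dst->rows && src.cols >= dst->cols) {
    assert(src.rows % dst->rows == 0 && src.cols % dst->cols == 0);
    for (int r = 0; r < src.rows; r++)
      for (int c = 0; c < src.cols; c++)
        dst->at(r % dst->rows, c % dst->cols) += alpha * src.at(r, c);
  } else {
    assert(src.rows > 0 && src.cols > 0);
    assert(dst->rows % src.rows == 0 && dst->cols % src.cols == 0);
    for (int r = 0; r < dst->rows; r++)
      for (int c = 0; c < dst->cols; c++)
        dst->at(r, c) += alpha * src.at(r % src.rows, c % src.cols);
  }
}
```

The modulo indexing expresses both cases compactly; the listing for this function instead loops over sub-matrices, which maps more directly onto AddMat.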

Definition at line 1106 of file cu-matrix.cc.

References MatrixBase< Real >::AddMat(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), rnnlm::i, rnnlm::j, KALDI_ASSERT, KALDI_ERR, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), MatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumBlockComponent::Backprop(), SumBlockComponent::Propagate(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), Convolutional1dComponent::Update(), and ConvolutionComponent::Update().

1107  {
1108  if (num_rows_ == 0 || num_cols_ == 0) return;
1109 
1110  if (A.NumRows() >= (transA == kNoTrans ? num_rows_ : num_cols_) &&
1111  A.NumCols() >= (transA == kNoTrans ? num_cols_ : num_rows_)) {
1112  // This is the "summing", not broadcasting, version of AddMatBlocks.
1113  // It supports both regular and transposed operation.
1114  int32 num_row_blocks, num_col_blocks;
1115  if (transA == kNoTrans) {
1116  KALDI_ASSERT(A.NumRows() % num_rows_ == 0 && A.NumCols() % num_cols_ == 0);
1117  num_row_blocks = A.Mat().NumRows() / num_rows_;
1118  num_col_blocks = A.Mat().NumCols() / num_cols_;
1119  } else {
1120  KALDI_ASSERT(A.NumRows() % num_cols_ == 0 && A.NumCols() % num_rows_ == 0);
1121  num_row_blocks = A.Mat().NumRows() / num_cols_;
1122  num_col_blocks = A.Mat().NumCols() / num_rows_;
1123  }
1124 #if HAVE_CUDA == 1
1125  if (CuDevice::Instantiate().Enabled()) {
1126  CuTimer tim;
1127  dim3 dimGrid, dimBlock;
1128  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1129  &dimGrid, &dimBlock);
1130  cuda_add_mat_blocks(dimGrid, dimBlock, alpha, A.data_, num_row_blocks,
1131  num_col_blocks, data_, Dim(), A.Stride(),
1132  (transA == kTrans ? 1 : 0));
1133  CU_SAFE_CALL(cudaGetLastError());
1134 
1135  CuDevice::Instantiate().AccuProfile(__func__, tim);
1136  } else
1137 #endif
1138  {
1139  int32 nr, nc;
1140  if (transA == kNoTrans) {
1141  nr = num_rows_;
1142  nc = num_cols_;
1143  } else {
1144  nr = num_cols_;
1145  nc = num_rows_;
1146  }
1147  for (int32 i = 0; i < num_row_blocks; i++) {
1148  for (int32 j = 0; j < num_col_blocks; j++) {
1149  Mat().AddMat(alpha, SubMatrix<Real>(A.Mat(), i * nr, nr, j * nc, nc),
1150  transA);
1151  }
1152  }
1153  }
1154  } else {
1155  // This is the "broadcasting" version of AddMatBlocks, where
1156  // *this is larger than src.
1157  if (transA != kNoTrans)
1158  KALDI_ERR << "Transposed operation not supported currently.";
1159  if (!(num_rows_ % A.NumRows() == 0 && num_cols_ % A.NumCols() == 0))
1160  KALDI_ERR << "Invalid sizes of arguments";
1161 #if HAVE_CUDA == 1
1162  if (CuDevice::Instantiate().Enabled()) {
1163  CuTimer tim;
1164  dim3 dimGrid, dimBlock;
1165  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1166  &dimGrid, &dimBlock);
1167  cuda_add_mat_repeated(dimGrid, dimBlock, alpha,
1168  A.data_, A.Dim(), data_, Dim());
1169  CU_SAFE_CALL(cudaGetLastError());
1170  CuDevice::Instantiate().AccuProfile(__func__, tim);
1171  } else
1172 #endif
1173  {
1174  const MatrixBase<Real> &src_mat = A.Mat(),
1175  &this_mat = this->Mat();
1176  for (int32 row_offset = 0; row_offset < NumRows();
1177  row_offset += src_mat.NumRows()) {
1178  for (int32 col_offset = 0; col_offset < NumCols();
1179  col_offset += src_mat.NumCols()) {
1180  SubMatrix<Real> this_part(this_mat,
1181  row_offset, src_mat.NumRows(),
1182  col_offset, src_mat.NumCols());
1183  this_part.AddMat(alpha, src_mat);
1184  }
1185  }
1186  }
1187  }
1188 }
#define KALDI_ERR
Definition: kaldi-error.h:127
void AddMatDiagVec ( const Real  alpha,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
CuVectorBase< Real > &  v,
Real  beta = 1.0 
)

Definition at line 1402 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by BatchNormComponent::Backprop(), SigmoidComponent::RepairGradients(), and TanhComponent::RepairGradients().

1406  {
1407 #if HAVE_CUDA == 1
1408  if (CuDevice::Instantiate().Enabled()) {
1409  if (transM == kNoTrans) {
1410  KALDI_ASSERT(SameDim(*this, M));
1411  } else {
1412  KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
1413  }
1414  KALDI_ASSERT(v.Dim() == this->NumCols());
1415 
1416  CuTimer tim;
1417  dim3 dimGrid, dimBlock;
1418  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1419  &dimGrid, &dimBlock);
1420  MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
1421  if (transM == kTrans) std::swap(M_row_stride, M_col_stride);
1422  cuda_add_mat_diag_vec(dimGrid, dimBlock, alpha, data_, Dim(),
1423  M.Data(), M_row_stride, M_col_stride, v.Data(), beta);
1424  CU_SAFE_CALL(cudaGetLastError());
1425  CuDevice::Instantiate().AccuProfile(__func__, tim);
1426  } else
1427 #endif
1428  {
1429  Mat().AddMatDiagVec(alpha, M.Mat(), transM, v.Vec(), beta);
1430  }
1431 }
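This member carries no description on this page. Judging from the dimension checks in the listing above (v.Dim() must equal NumCols()), it appears to compute *this = beta * *this + alpha * op(M) * diag(v), i.e. column c of op(M) is scaled by v(c). The sketch below illustrates that reading for the untransposed case only; the interpretation is an assumption, and Dense and AddMatDiagVecSketch are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Row-major dense matrix; a stand-in for illustration, not Kaldi's CuMatrix.
struct Dense {
  int rows, cols;
  std::vector<double> d;
  Dense(int r, int c) : rows(r), cols(c), d(r * c, 0.0) {}
  double &at(int r, int c) { return d[r * cols + c]; }
  double at(int r, int c) const { return d[r * cols + c]; }
};

// out = beta * out + alpha * M * diag(v)  (assumed semantics, no transpose).
void AddMatDiagVecSketch(double alpha, const Dense &M,
                         const std::vector<double> &v, double beta,
                         Dense *out) {
  assert(M.rows == out->rows && M.cols == out->cols);
  assert(static_cast<int>(v.size()) == out->cols);
  for (int r = 0; r < out->rows; r++)
    for (int c = 0; c < out->cols; c++)
      out->at(r, c) = beta * out->at(r, c) + alpha * M.at(r, c) * v[c];
}
```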
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
int32 MatrixIndexT
Definition: matrix-common.h:98
void AddMatMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

C = alpha * A(^T)*B(^T) + beta * C.

Definition at line 1278 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, kaldi::kTrans, CuMatrixBase< Real >::Mat(), rnnlm::n, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by CuMatrixBase< Real >::AddMatBlock(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< BaseFloat >::AddMatSp(), CuMatrixBase< BaseFloat >::AddMatTp(), CuMatrixBase< BaseFloat >::AddSpMat(), CuMatrixBase< BaseFloat >::AddTpMat(), RepeatedAffineComponent::Backprop(), AffineComponent::Backprop(), LinearComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), LinearTransform::BackpropagateFnc(), AffineTransform::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), ModelCollapser::CollapseComponentsAffine(), AffineComponent::CollapseWithNext(), AffineComponent::CollapseWithPrevious(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::CuVectorUnitTestAddDiagMatMat(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditioner::PreconditionDirectionsInternal(), OnlineNaturalGradient::PreconditionDirectionsInternal(), AffineComponent::Propagate(), LinearComponent::Propagate(), DctComponent::Propagate(), FixedLinearComponent::Propagate(), FixedAffineComponent::Propagate(), KlHmm::PropagateFnc(), LinearTransform::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), Rbm::RbmUpdate(), Rbm::Reconstruct(), OnlinePreconditioner::ReorthogonalizeXt1(), OnlineNaturalGradient::ReorthogonalizeXt1(), kaldi::TestCuMatrixMatMat(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixSymAddMat2(), 
kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuSpMatrixInvert(), LinearTransform::Update(), AffineTransform::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), AffineComponent::UpdateSimple(), and BlockAffineComponent::UpdateSimple().

1280  {
1281 
1282 
1283  // CUBLAS is col-major, cudamatrix is row-major, how to do the mapping?
1284  // keep trans..., just swap A&B matrices: A->B B->A
1285  MatrixIndexT m = ((transB==kTrans)? B.NumRows() : B.NumCols());
1286  MatrixIndexT n = ((transA==kTrans)? A.NumCols() : A.NumRows());
1287  MatrixIndexT k = ((transB==kTrans)? B.NumCols() : B.NumRows());
1288  MatrixIndexT k1 = ((transA==kTrans)? A.NumRows() : A.NumCols());
1289 
1290  KALDI_ASSERT(m == NumCols());
1291  KALDI_ASSERT(n == NumRows());
1292  KALDI_ASSERT(k == k1);
1293 
1294  if (m == 0) return;
1295 
1296 
1297 #if HAVE_CUDA == 1
1298  if (CuDevice::Instantiate().Enabled()) {
1299  CuTimer tim;
1300  CUBLAS_SAFE_CALL(cublas_gemm(GetCublasHandle(),
1301  (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1302  (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1303  m, n, k, alpha, B.data_, B.Stride(),
1304  A.data_, A.Stride(), beta, data_, Stride()));
1305 
1306  CuDevice::Instantiate().AccuProfile(__func__, tim);
1307  } else
1308 #endif
1309  {
1310  Mat().AddMatMat(alpha, A.Mat(), transA, B.Mat(), transB, beta);
1311  }
1312 }
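The comment at lines 1283-1284 of the listing refers to a standard trick: a row-major buffer read as column-major is the transpose of the matrix, so passing the operands in swapped order to a column-major routine yields the row-major product. The sketch below demonstrates this with a naive column-major multiply standing in for the BLAS call; GemmColMajor and MatMulRowMajor are hypothetical names, and alpha, beta and the transpose flags are omitted.

```cpp
#include <cassert>
#include <vector>

// Naive column-major multiply without transposes:
// C (m x n) = A (m x k) * B (k x n), with leading dimensions lda, ldb, ldc.
void GemmColMajor(int m, int n, int k, const double *A, int lda,
                  const double *B, int ldb, double *C, int ldc) {
  for (int j = 0; j < n; j++) {
    for (int i = 0; i < m; i++) {
      double sum = 0.0;
      for (int p = 0; p < k; p++) sum += A[i + p * lda] * B[p + j * ldb];
      C[i + j * ldc] = sum;
    }
  }
}

// Row-major C (rows x cols) = A (rows x inner) * B (inner x cols), computed
// by handing the operands to the column-major routine in swapped order.
std::vector<double> MatMulRowMajor(const std::vector<double> &A,
                                   const std::vector<double> &B, int rows,
                                   int inner, int cols) {
  assert(static_cast<int>(A.size()) == rows * inner);
  assert(static_cast<int>(B.size()) == inner * cols);
  std::vector<double> C(rows * cols, 0.0);
  // Read as column-major, the buffers hold B^T and A^T, so this computes
  // C^T = B^T * A^T, which is exactly C when read back as row-major.
  GemmColMajor(cols, rows, inner, B.data(), cols, A.data(), inner, C.data(),
               cols);
  return C;
}
```

Note how m is the column count of B and n is the row count of A, matching the roles of m and n in the listing.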
MatrixIndexT Stride() const
Definition: cu-matrix.h:207
void AddMatMatElements ( const Real  alpha,
const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
const Real  beta 
)

*this = beta * *this + alpha * A .* B (.* element by element multiplication)

Definition at line 1434 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

1435  {
1436 #if HAVE_CUDA == 1
1437  if (CuDevice::Instantiate().Enabled()) {
1438  KALDI_ASSERT(SameDim(*this, A) && SameDim(A, B));
1439  CuTimer tim;
1440  dim3 dimGrid, dimBlock;
1441  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1442  &dimGrid, &dimBlock);
1443  cuda_add_mat_mat_elements(dimGrid, dimBlock, this->data_, A.Data(),
1444  B.Data(), Dim(), A.Stride(), B.Stride(), alpha, beta);
1445  CuDevice::Instantiate().AccuProfile(__func__, tim);
1446  } else
1447 #endif
1448  {
1449  Mat().AddMatMatElements(alpha, A.Mat(), B.Mat(), beta);
1450  }
1451 }
void AddMatSmat ( Real  alpha,
const CuMatrixBase< Real > &  A,
const CuSparseMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

(*this) = alpha * A * op(B) + beta * (*this), where B is sparse and op(B) is either B or trans(B) depending on the 'transB' argument.

This is multiplication of a dense by a sparse matrix. See also AddSmatMat.
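A CPU-only sketch of the dense-by-sparse product follows, with the sparse operand held in compressed-sparse-row (CSR) form. Csr and AddMatSmatSketch are hypothetical names used for illustration; this is not the cuSPARSE-based code path shown in the listing.

```cpp
#include <cassert>
#include <vector>

// Sparse matrix in CSR form; a stand-in for illustration only.
struct Csr {
  int rows, cols;
  std::vector<int> row_ptr;  // size rows + 1
  std::vector<int> col_idx;  // column of each nonzero
  std::vector<double> val;   // value of each nonzero
};

// C = alpha * A * op(B) + beta * C, with A (a_rows x a_cols) and
// C (a_rows x c_cols) dense row-major and B sparse.
void AddMatSmatSketch(double alpha, const std::vector<double> &A, int a_rows,
                      int a_cols, const Csr &B, bool trans_b, double beta,
                      std::vector<double> *C, int c_cols) {
  assert(a_cols == (trans_b ? B.cols : B.rows));
  assert(c_cols == (trans_b ? B.rows : B.cols));
  assert(static_cast<int>(A.size()) == a_rows * a_cols);
  assert(static_cast<int>(C->size()) == a_rows * c_cols);
  for (double &x : *C) x *= beta;
  for (int r = 0; r < a_rows; r++) {
    for (int i = 0; i < B.rows; i++) {
      for (int p = B.row_ptr[i]; p < B.row_ptr[i + 1]; p++) {
        int j = B.col_idx[p];
        // B(i, j) feeds C(r, j), or C(r, i) when B is transposed.
        if (!trans_b)
          (*C)[r * c_cols + j] += alpha * A[r * a_cols + i] * B.val[p];
        else
          (*C)[r * c_cols + i] += alpha * A[r * a_cols + j] * B.val[p];
      }
    }
  }
}
```

Only the stored nonzeros of B are visited, so the cost scales with the number of nonzeros times the number of rows of A.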

Definition at line 1067 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CuMatrixBase< Real >::Data(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuSparseMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuSparseMatrix< Real >::Smat(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTextCuMatrixAddMatSmat().

1069  {
1070 #if HAVE_CUDA == 1
1071  if (CuDevice::Instantiate().Enabled()) {
1072  if (transB == kNoTrans) {
1073  KALDI_ASSERT(NumRows() == A.NumRows());
1074  KALDI_ASSERT(NumCols() == B.NumCols());
1075  KALDI_ASSERT(A.NumCols() == B.NumRows());
1076  } else {
1077  KALDI_ASSERT(NumRows() == A.NumRows());
1078  KALDI_ASSERT(NumCols() == B.NumRows());
1079  KALDI_ASSERT(A.NumCols() == B.NumCols());
1080  }
1081 
1082  CuTimer tim;
1083 
1084  cusparseMatDescr_t descr;
1085  CUSPARSE_SAFE_CALL(cusparseCreateMatDescr(&descr));
1086  CU_SAFE_CALL(
1087  cusparse_csrmm(
1088  GetCusparseHandle(),
1089  transB == kNoTrans ?
1090  CUSPARSE_OPERATION_TRANSPOSE : CUSPARSE_OPERATION_NON_TRANSPOSE,
1091  B.NumRows(), NumRows(), B.NumCols(), B.NumElements(), &alpha, descr,
1092  B.CsrVal(), B.CsrRowPtr(), B.CsrColIdx(), A.Data(), A.Stride(),
1093  &beta, Data(), Stride()));
1094  CUSPARSE_SAFE_CALL(cusparseDestroyMatDescr(descr));
1095 
1096  CuDevice::Instantiate().AccuProfile(__func__, tim);
1097  } else
1098 #endif
1099  {
1100  Mat().AddMatSmat(alpha, A.Mat(), B.Smat(), transB, beta);
1101  }
1102 }
const Real * Data() const
Return data pointer (const).
Definition: cu-matrix.h:673
void AddMatSp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuSpMatrix< Real > &  B,
const Real  beta 
)
inline

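this <-- beta*this + alpha*op(A)*B, where op(A) is A or its transpose according to transA, and B is a symmetric packed matrix (CuSpMatrix).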

Definition at line 541 of file cu-matrix.h.

544  {
545  CuMatrix<Real> M(B);
546  return AddMatMat(alpha, A, transA, M, kNoTrans, beta);
547  }
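The listing first converts the packed symmetric matrix B into a full matrix and then delegates to AddMatMat. The sketch below shows what such an expansion might look like on the CPU. It assumes the packed storage holds the lower triangle row by row, so that element (r, c) with c <= r sits at index r*(r+1)/2 + c; that layout and the name ExpandPackedSymmetric are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Expands a packed lower triangle (stored row by row, assumed layout) into
// a full symmetric row-major matrix of size dim x dim.
std::vector<double> ExpandPackedSymmetric(const std::vector<double> &packed,
                                          int dim) {
  assert(static_cast<int>(packed.size()) == dim * (dim + 1) / 2);
  std::vector<double> full(dim * dim, 0.0);
  for (int r = 0; r < dim; r++) {
    for (int c = 0; c <= r; c++) {
      double v = packed[r * (r + 1) / 2 + c];
      full[r * dim + c] = v;  // lower triangle
      full[c * dim + r] = v;  // mirrored upper triangle
    }
  }
  return full;
}
```

The expansion costs an extra dim x dim temporary on every call, which is the trade-off this inline wrapper makes for simplicity.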
void AddMatTp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuTpMatrix< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

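this <-- beta*this + alpha*op(A)*op(B), where op() applies the transpose selected by transA or transB, and B is a triangular packed matrix (CuTpMatrix).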

Definition at line 568 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddMatTp().

571  {
572  CuMatrix<Real> M(B);
573  return AddMatMat(alpha, A, transA, M, transB, beta);
574  }
void AddRowRanges ( const CuMatrixBase< Real > &  src,
const CuArrayBase< Int32Pair > &  indexes 
)

For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

In general indexes must be >= 0 and < src.NumRows(); but to represent an empty range you may use the pair (-1, -1) or any pair of numbers (i, j) such that i >= j.

Definition at line 2832 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuArrayBase< T >::Dim(), CuMatrixBase< Real >::Dim(), Int32Pair::first, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), Int32Pair::second, and CuMatrixBase< Real >::stride_.

Referenced by StatisticsPoolingComponent::Backprop(), NnetComputer::ExecuteCommand(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixAddRowRanges().

2833  {
2834  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2835  KALDI_ASSERT(src.NumCols() == NumCols());
2836  if (NumRows() == 0) return;
2837 #if HAVE_CUDA == 1
2838  if (CuDevice::Instantiate().Enabled()) {
2839  CuTimer tim;
2840  dim3 dimGrid, dimBlock;
2841  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2842  &dimGrid, &dimBlock);
2843  cuda_add_row_ranges(dimGrid, dimBlock,
2844  data_, Dim(), src.Data(), src.Dim(), indexes.Data());
2845  CU_SAFE_CALL(cudaGetLastError());
2846  CuDevice::Instantiate().AccuProfile(__func__, tim);
2847  } else
2848 #endif
2849  { // Implement here for the CPU..
2850  int32 num_rows = this->num_rows_, num_cols = this->num_cols_,
2851  this_stride = this->stride_, src_stride = src.stride_;
2852  Real *data = this->data_;
2853  const Real *src_data = src.data_;
2854  const Int32Pair *indexes_data = indexes.Data();
2855  for (int32 row = 0; row < num_rows; row++) {
2856  int32 start_row = indexes_data[row].first,
2857  end_row = indexes_data[row].second;
2858  for (int32 col = 0; col < num_cols; col++) {
2859  Real sum = 0.0;
2860  for (int32 src_row = start_row; src_row < end_row; src_row++)
2861  sum += src_data[src_row * src_stride + col];
2862  data[row * this_stride + col] += sum;
2863  }
2864  }
2865  }
2866 }
const T * Data() const
Get raw pointer.
Definition: cu-array.h:52
MatrixIndexT stride_
Definition: cu-matrix.h:714
MatrixIndexT Dim() const
Return the vector dimension.
Definition: cu-array.h:49
int32_cuda second
Definition: cu-matrixdim.h:86
int32_cuda first
Definition: cu-matrixdim.h:85
void AddRows ( Real  alpha,
const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Does for each row r, this.Row(r) += alpha * src.row(indexes[r]).

If indexes[r] < 0, nothing is added to row r. src.NumCols() must equal this.NumCols().
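A CPU-only sketch of the indexed row addition, including the rule for negative indexes, might look like this. Dense and AddRowsSketch are hypothetical names, not Kaldi types.

```cpp
#include <cassert>
#include <vector>

// Row-major dense matrix; a stand-in for illustration, not Kaldi's CuMatrix.
struct Dense {
  int rows, cols;
  std::vector<double> d;
  Dense(int r, int c) : rows(r), cols(c), d(r * c, 0.0) {}
  double &at(int r, int c) { return d[r * cols + c]; }
  double at(int r, int c) const { return d[r * cols + c]; }
};

// For each row r of dst: dst.row(r) += alpha * src.row(indexes[r]),
// skipping rows whose index is negative.
void AddRowsSketch(double alpha, const Dense &src,
                   const std::vector<int> &indexes, Dense *dst) {
  assert(static_cast<int>(indexes.size()) == dst->rows);
  assert(src.cols == dst->cols);
  for (int r = 0; r < dst->rows; r++) {
    int s = indexes[r];
    if (s < 0) continue;  // negative index: leave this row untouched
    assert(s < src.rows);
    for (int c = 0; c < dst->cols; c++)
      dst->at(r, c) += alpha * src.at(s, c);
  }
}
```

The same source row may appear several times in indexes, so this acts as a gather followed by an accumulate.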

Definition at line 2702 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), and NnetComputer::ExecuteCommand().

2704  {
2705  if (NumRows() == 0) return;
2706 #if HAVE_CUDA == 1
2707  if (CuDevice::Instantiate().Enabled()) {
2708  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2709  KALDI_ASSERT(src.NumCols() == NumCols());
2710  CuTimer tim;
2711  dim3 dimGrid, dimBlock;
2712  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2713  &dimGrid, &dimBlock);
2714  cuda_add_rows(dimGrid, dimBlock, alpha,
2715  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
2716  CU_SAFE_CALL(cudaGetLastError());
2717  CuDevice::Instantiate().AccuProfile(__func__, tim);
2718  } else
2719 #endif
2720  {
2721  Mat().AddRows(alpha, src.Mat(), indexes.Data());
2722  }
2723 }
void AddRows ( Real  alpha,
const CuArrayBase< const Real * > &  src 
)

Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols().
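The pointer-based variant can be sketched on the CPU as below. AddRowsFromPointersSketch is a hypothetical name, and skipping null pointers is an assumption of this sketch that the description above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each row r: dst.row(r) += alpha * (num_cols values starting at src[r]).
// dst is row-major with src.size() rows. Null entries are skipped (assumed).
void AddRowsFromPointersSketch(double alpha,
                               const std::vector<const double *> &src,
                               std::vector<double> *dst, int num_cols) {
  assert(num_cols >= 0);
  assert(dst->size() == src.size() * static_cast<std::size_t>(num_cols));
  for (std::size_t r = 0; r < src.size(); r++) {
    if (src[r] == nullptr) continue;
    for (int c = 0; c < num_cols; c++)
      (*dst)[r * num_cols + c] += alpha * src[r][c];
  }
}
```

Each pointer must reference at least num_cols readable values; in the GPU build those values would need to live in device memory, as the class-level description of the pointer-based row functions notes.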

Definition at line 2727 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

2727  {
2728  if (NumRows() == 0) return;
2729 #if HAVE_CUDA == 1
2730  if (CuDevice::Instantiate().Enabled()) {
2731  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2732  CuTimer tim;
2733  dim3 dimGrid, dimBlock;
2734  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2735  &dimGrid, &dimBlock);
2736  cuda_add_rows(dimGrid, dimBlock, alpha, data_, src.Data(), Dim());
2737  CU_SAFE_CALL(cudaGetLastError());
2738  CuDevice::Instantiate().AccuProfile(__func__, tim);
2739  } else
2740 #endif
2741  {
2742  Mat().AddRows(alpha, src.Data());
2743  }
2744 }
void AddSmat ( Real  alpha,
const CuSparseMatrix< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A.
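The following CPU-only sketch shows the scatter-add that this operation amounts to when A is stored in CSR form, including the transposed case. Csr and AddSmatSketch are hypothetical names for illustration.

```cpp
#include <cassert>
#include <vector>

// Sparse matrix in CSR form; a stand-in for illustration only.
struct Csr {
  int rows, cols;
  std::vector<int> row_ptr;  // size rows + 1
  std::vector<int> col_idx;  // column of each nonzero
  std::vector<double> val;   // value of each nonzero
};

// dst += alpha * op(A), where dst is dense row-major with dst_cols columns
// and op(A) is A or its transpose.
void AddSmatSketch(double alpha, const Csr &A, bool trans,
                   std::vector<double> *dst, int dst_cols) {
  int dst_rows = trans ? A.cols : A.rows;
  assert(dst_cols == (trans ? A.rows : A.cols));
  assert(static_cast<int>(dst->size()) == dst_rows * dst_cols);
  for (int i = 0; i < A.rows; i++) {
    for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; p++) {
      int j = A.col_idx[p];
      int r = trans ? j : i;  // destination row
      int c = trans ? i : j;  // destination column
      (*dst)[r * dst_cols + c] += alpha * A.val[p];
    }
  }
}
```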

Definition at line 972 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CU1DBLOCK, KALDI_ASSERT, kaldi::kNoTrans, CuSparseMatrix< Real >::NumCols(), CuSparseMatrix< Real >::NumRows(), and CuSparseMatrix< Real >::Smat().

Referenced by GeneralMatrix::AddToMat(), and kaldi::UnitTextCuMatrixAddSmat().

973  {
974 #if HAVE_CUDA == 1
975  if (CuDevice::Instantiate().Enabled()) {
976  if (trans == kNoTrans) {
977  KALDI_ASSERT(NumRows() == A.NumRows());
978  KALDI_ASSERT(NumCols() == A.NumCols());
979  } else {
980  KALDI_ASSERT(NumRows() == A.NumCols());
981  KALDI_ASSERT(NumCols() == A.NumRows());
982  }
983 
984  CuTimer tim;
985 
986  // We use warpSize threads per row to access only the nonzero elements.
987  // Every CU1DBLOCK/warpSize rows share one thread block.
988  // 1D grid to cover all rows of A.
989  const int warpSize = 32;
990  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
991  dim3 dimGrid(n_blocks(A.NumRows(), dimBlock.y));
992 
993  if (trans == kNoTrans) {
994  cuda_add_smat(dimGrid, dimBlock, Data(), Dim(), alpha, A.CsrRowPtr(),
995  A.CsrColIdx(), A.CsrVal());
996  } else {
997  cuda_add_smat_trans(dimGrid, dimBlock, Data(), Dim(), alpha,
998  A.CsrRowPtr(), A.CsrColIdx(), A.CsrVal());
999  }
1000 
1001  CU_SAFE_CALL(cudaGetLastError());
1002  CuDevice::Instantiate().AccuProfile(__func__, tim);
1003  } else
1004 #endif
1005  {
1006  Mat().AddSmat(alpha, A.Smat(), trans);
1007  }
1008 }
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
void AddSmatMat ( Real  alpha,
const CuSparseMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
Real  beta 
)

(*this) = alpha * op(A) * B + beta * (*this), where A is sparse.

Multiplication of a sparse matrix by a dense matrix. See also AddMatSmat. Note: for greatest efficiency we recommend that transA be kNoTrans. Prefer AddMatSmat() where possible, since this function performs two dense-matrix transposes internally.

Definition at line 1011 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CuMatrixBase< Real >::Data(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuSparseMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuSparseMatrix< Real >::Smat(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTextCuMatrixAddSmatMat().

1013  {
1014 #if HAVE_CUDA == 1
1015  if (CuDevice::Instantiate().Enabled()) {
1016  if (transA == kNoTrans) {
1017  KALDI_ASSERT(NumRows() == A.NumRows());
1018  KALDI_ASSERT(NumCols() == B.NumCols());
1019  KALDI_ASSERT(A.NumCols() == B.NumRows());
1020  } else {
1021  KALDI_ASSERT(NumRows() == A.NumCols());
1022  KALDI_ASSERT(NumCols() == B.NumCols());
1023  KALDI_ASSERT(A.NumRows() == B.NumRows());
1024  }
1025 
1026  CuTimer tim;
1027 
1028  // We have op(A) and BT in col-major (B in row-major).
1029  // We first compute C in col-major (CT in row-major)
1030  // with C = op(A) * BT^T by cusparse_csrmm2,
1031  // then transpose CT to get C in row-major
1032  CuMatrix<Real> CT(*this, kTrans);
1033 
1034  cusparseMatDescr_t descr;
1035  CUSPARSE_SAFE_CALL(cusparseCreateMatDescr(&descr));
1036  if (transA == kTrans) {
1037  // Note: only op(A)=A is supported if op(B)=B^T according to cusparse doc
1038  // http://docs.nvidia.com/cuda/cusparse/index.html#cusparse-lt-t-gt-csrmm2
1039  CuSparseMatrix<Real> AT(A, kTrans);
1040  CU_SAFE_CALL(
1041  cusparse_csrmm2(GetCusparseHandle(), CUSPARSE_OPERATION_NON_TRANSPOSE,
1042  CUSPARSE_OPERATION_TRANSPOSE, AT.NumRows(),
1043  CT.NumRows(), AT.NumCols(), AT.NumElements(), &alpha,
1044  descr, AT.CsrVal(), AT.CsrRowPtr(), AT.CsrColIdx(),
1045  B.Data(), B.Stride(), &beta, CT.Data(), CT.Stride()));
1046  } else {
1047  CU_SAFE_CALL(
1048  cusparse_csrmm2(GetCusparseHandle(), CUSPARSE_OPERATION_NON_TRANSPOSE,
1049  CUSPARSE_OPERATION_TRANSPOSE, A.NumRows(),
1050  CT.NumRows(), A.NumCols(), A.NumElements(), &alpha,
1051  descr, A.CsrVal(), A.CsrRowPtr(), A.CsrColIdx(),
1052  B.Data(), B.Stride(), &beta, CT.Data(), CT.Stride()));
1053  }
1054  CUSPARSE_SAFE_CALL(cusparseDestroyMatDescr(descr));
1055 
1056  this->CopyFromMat(CT, kTrans);
1057 
1058  CuDevice::Instantiate().AccuProfile(__func__, tim);
1059  } else
1060 #endif
1061  {
1062  Mat().AddSmatMat(alpha, A.Smat(), transA, B.Mat(), beta);
1063  }
1064 }
void AddSpMat ( const Real  alpha,
const CuSpMatrix< Real > &  A,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

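*this <-- beta * (*this) + alpha * A * op(B), where A is a symmetric packed matrix (CuSpMatrix) and op(B) is B or its transpose, according to transB.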

Definition at line 550 of file cu-matrix.h.

{
  CuMatrix<Real> M(A);
  return AddMatMat(alpha, M, kNoTrans, B, transB, beta);
}
void AddToDiag ( Real  value)

Adds "value" to the diagonal elements of the matrix.

The matrix *this does not have to be square.
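The GPU branch in the listing for this function treats the diagonal as a one-column matrix whose stride is stride_ + 1. The following is a minimal plain-C++ sketch of that indexing idea, assuming a row-major buffer; AddToDiagSketch and its parameters are hypothetical names for illustration, not part of Kaldi.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of Kaldi. `data` holds a row-major matrix with
// `num_rows` rows, `num_cols` columns and `stride` elements between row starts.
void AddToDiagSketch(std::vector<float> &data, int num_rows, int num_cols,
                     int stride, float value) {
  if (num_rows == 0) return;
  assert(stride >= num_cols);
  assert(static_cast<int>(data.size()) >= (num_rows - 1) * stride + num_cols);
  // A non-square matrix has min(rows, cols) diagonal elements.
  const int num_diag = std::min(num_rows, num_cols);
  // Moving from element (i, i) to (i + 1, i + 1) advances one row and one column.
  const int diag_stride = stride + 1;
  for (int i = 0; i < num_diag; ++i) data[i * diag_stride] += value;
}
```

For a 2 x 3 matrix stored with a stride of 4, this touches offsets 0 and 5, which are elements (0, 0) and (1, 1).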

Definition at line 570 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, and data_.

Referenced by kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCholesky(), and kaldi::UnitTestCuMatrixAddToDiag().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    CuTimer tim;
    // We'll create a fake matrix with "num_diag" rows, one
    // column, and a stride of "this_stride".  The y-value of
    // the grid/blocks corresponds to the row, in this kernel.
    MatrixIndexT num_diag = std::min(num_rows_, num_cols_),
        this_stride = stride_ + 1;
    dim3 dimBlock(1, CU1DBLOCK);
    dim3 dimGrid(1, n_blocks(num_diag, CU1DBLOCK));
    ::MatrixDim d = { num_diag, 1, this_stride };
    cuda_add(dimGrid, dimBlock, data_, value, d);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddToDiag(value);
  }
}
void AddToElements ( Real  alpha,
const CuArrayBase< int32 > &  elements 
)

This is a rather special purpose function; we might generalize it later by adding a transpose-type option.

It expects elements.Dim() to equal NumRows(), and each elements[i] to be either -1 or to satisfy 0 <= elements[i] < NumCols(). For each row i with 0 <= i < NumRows() and elements[i] >= 0, it adds alpha to (*this)(i, elements[i]); rows with elements[i] == -1 are left unchanged.
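As an illustration of the contract described above, here is a minimal plain-C++ sketch that mirrors the CPU branch of the listing. It assumes a contiguous row-major buffer; AddToElementsSketch is a hypothetical name, not a Kaldi function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the CPU branch. `mat` is a contiguous row-major
// matrix of size num_rows x num_cols; `elements` has one entry per row.
void AddToElementsSketch(std::vector<float> &mat, int num_rows, int num_cols,
                         float alpha, const std::vector<int> &elements) {
  assert(static_cast<int>(mat.size()) == num_rows * num_cols);
  assert(static_cast<int>(elements.size()) == num_rows);
  for (int r = 0; r < num_rows; ++r) {
    const int c = elements[r];
    assert(c >= -1 && c < num_cols);
    if (c >= 0) mat[r * num_cols + c] += alpha;  // -1 means "skip this row"
  }
}
```

With a 3 x 2 zero matrix, elements = {1, -1, 0} and alpha = 0.5, only entries (0, 1) and (2, 0) change.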

Definition at line 3244 of file cu-matrix.cc.

References CU1DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, and MatrixBase< Real >::NumRows().

Referenced by kaldi::UnitTestCuMatrixAddToElements().

{
  KALDI_ASSERT(elements.Dim() == NumRows());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    dim3 dimBlock(CU1DBLOCK);
    dim3 dimGrid(n_blocks(NumRows(), CU1DBLOCK));

    cuda_matrix_add_to_elements(dimGrid, dimBlock, alpha, data_, Dim(), elements.Data());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixBase<Real> &this_mat = this->Mat();
    const int32* row_to_col = elements.Data();
    for (int32 r = 0; r < this_mat.NumRows(); r++) {
      KALDI_ASSERT(row_to_col[r] >= -1);
      if (row_to_col[r] >= 0)
        this_mat(r, row_to_col[r]) += alpha;
    }
  }
}
void AddToRows ( Real  alpha,
const CuArrayBase< MatrixIndexT > &  indexes,
CuMatrixBase< Real > *  dst 
) const

For each row i of *this, adds alpha * this->Row(i) to dst->Row(indexes[i]) if indexes[i] >= 0; otherwise does nothing for that row.

Requires that all the indexes[i] that are >= 0 be distinct, otherwise the behavior is undefined.
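The scatter-add described above can be sketched in plain C++ as follows. This is an illustration under assumptions, not Kaldi code: Matrix and AddToRowsSketch are hypothetical names, and the scaling by alpha follows the alpha argument that the listing passes through. A sequential loop like this one would simply accumulate repeated indexes, but the documented contract leaves that case undefined, so callers should not rely on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // hypothetical row-major stand-in

// Adds alpha * src.row(i) to dst.row(indexes[i]) for every i with indexes[i] >= 0.
void AddToRowsSketch(float alpha, const Matrix &src,
                     const std::vector<int> &indexes, Matrix *dst) {
  assert(indexes.size() == src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    const int j = indexes[i];
    if (j < 0) continue;  // negative index: this source row is skipped
    assert(static_cast<std::size_t>(j) < dst->size());
    std::vector<float> &out = (*dst)[j];
    assert(out.size() == src[i].size());
    for (std::size_t c = 0; c < out.size(); ++c) out[c] += alpha * src[i][c];
  }
}
```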

Definition at line 2770 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixAddToRows().

{
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
    KALDI_ASSERT(dst->NumCols() == NumCols());
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_to_rows(dimGrid, dimBlock, alpha, dst->Data(), data_, indexes.Data(), Dim(), dst->Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddToRows(alpha, indexes.Data(), &(dst->Mat()));
  }
}
void AddToRows ( Real  alpha,
const CuArrayBase< Real * > &  dst 
) const

For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, does nothing for that row. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2748 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

{
  if (NumRows() == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_to_rows(dimGrid, dimBlock, alpha, dst.Data(), data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddToRows(alpha, dst.Data());
  }
}
void AddTpMat ( const Real  alpha,
const CuTpMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

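*this <-- beta * (*this) + alpha * op(A) * op(B), where A is a triangular packed matrix (CuTpMatrix) and op() applies the transposes selected by transA and transB.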

Definition at line 559 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddTpMat().

{
  CuMatrix<Real> M(A);
  return AddMatMat(alpha, M, transA, B, transB, beta);
}
void AddVecToCols ( Real  alpha,
const CuVectorBase< Real > &  col,
Real  beta = 1.0 
)

(for each column c of *this), c = alpha * col + beta * c

Definition at line 1219 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by KlHmm::PropagateFnc(), and kaldi::UnitTestCuMatrixAddVecToCols().

{
  if (col.Dim() != NumRows()) {
    KALDI_ERR << "Non matching dimensions: Rows:" << NumRows() << " VectorDim:" << col.Dim();
  }

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_vec_to_cols(dimGrid, dimBlock, alpha, col.data_, beta,
                         data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    if (beta != 1.0) Mat().Scale(beta);
    Mat().AddVecToCols(alpha, col.Vec());
  }
}
void AddVecToRows ( Real  alpha,
const CuVectorBase< Real > &  row,
Real  beta = 1.0 
)

(for each row r of *this), r = alpha * row + beta * r
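A typical use of this update is adding a bias vector to every row of an activation matrix. The sketch below shows the per-row formula in plain C++; Matrix and AddVecToRowsSketch are hypothetical names introduced for illustration and are not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // hypothetical row-major stand-in

// For every row r of *mat: r = alpha * row + beta * r.
void AddVecToRowsSketch(float alpha, const std::vector<float> &row, float beta,
                        Matrix *mat) {
  for (std::vector<float> &r : *mat) {
    assert(r.size() == row.size());  // vector length must equal the column count
    for (std::size_t c = 0; c < r.size(); ++c) r[c] = alpha * row[c] + beta * r[c];
  }
}
```

With the default beta of 1.0 the vector is added to the existing contents; with beta set to 0.0 every row is overwritten by alpha * row.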

Definition at line 1248 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), BatchNormComponent::Backprop(), SimpleSentenceAveragingComponent::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), DecodableNnetSimple::DoNnetComputation(), BatchNormComponent::Propagate(), FixedAffineComponent::Propagate(), PerElementOffsetComponent::Propagate(), FixedBiasComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionComponent::Propagate(), SimpleSentenceAveragingComponent::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), AddShift::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), Rbm::Reconstruct(), SigmoidComponent::RepairGradients(), RectifiedLinearComponent::RepairGradients(), PdfPrior::SubtractOnLogpost(), kaldi::UnitTestCuMatrixAddVecToRows(), and SentenceAveragingComponent::Update().

{
  if (row.Dim() != NumCols()) {
    KALDI_ERR << "Non matching dimensions: Cols:" << NumCols() << " VectorDim:" << row.Dim();
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_vec_to_rows(dimGrid, dimBlock, alpha, row.data_, beta, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    if (beta != 1.0) Mat().Scale(beta);
    Mat().AddVecToRows(alpha, row.Vec());
  }
}
void AddVecVec ( Real  alpha,
const CuVectorBase< Real > &  x,
const CuVectorBase< Real > &  y 
)

A = alpha * x * y^T + A, where A is *this.
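The rank-one update can be sketched in plain C++ as below; Matrix and AddVecVecSketch are hypothetical names, not Kaldi functions. In the GPU listing the arguments to cublas_ger appear in the order (m, n, y, x), presumably because cuBLAS works in column-major order, where a row-major A looks like its transpose and A^T = alpha * y * x^T + A^T.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical row-major stand-in

// a(i, j) += alpha * x[i] * y[j]; x has one entry per row, y one per column.
void AddVecVecSketch(double alpha, const std::vector<double> &x,
                     const std::vector<double> &y, Matrix *a) {
  assert(a->size() == x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert((*a)[i].size() == y.size());
    for (std::size_t j = 0; j < y.size(); ++j) (*a)[i][j] += alpha * x[i] * y[j];
  }
}
```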

Definition at line 1316 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, rnnlm::n, and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuMatrixAddVecVec().

{

  MatrixIndexT m = y.Dim();
  MatrixIndexT n = x.Dim();
  KALDI_ASSERT(m == NumCols());
  KALDI_ASSERT(n == NumRows());

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CUBLAS_SAFE_CALL(cublas_ger(GetCublasHandle(), m, n, alpha,
                                y.Data(), 1, x.Data(), 1, data_, Stride()));

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddVecVec(alpha, x.Vec(), y.Vec());
  }
}
void ApplyCeiling ( Real  ceiling_val)

Definition at line 2541 of file cu-matrix.cc.

References data_.

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), kaldi::UnitTestCuMatrixApplyCeiling(), ParametricRelu::Update(), LstmProjected::Update(), and BlstmProjected::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_ceiling(dimGrid, dimBlock, data_, ceiling_val, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyCeiling(ceiling_val);
  }
}
void ApplyExp ( )

Definition at line 2484 of file cu-matrix.cc.

References data_.

Referenced by DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuMatrixApplyExp().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_exp(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyExp();
  }
}
void ApplyExpSpecial ( )

For each element x of the matrix, set it to (x < 0 ? exp(x) : x + 1).

This function is used in our RNNLM training.
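The element-wise rule can be sketched in plain C++ as follows, assuming a row-major buffer with a stride; ApplyExpSpecialSketch is a hypothetical name, not a Kaldi function. Both branches give 1 at x = 0 and both have slope 1 there, so the mapping is continuous with a continuous first derivative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper. Applies x -> (x < 0 ? exp(x) : x + 1) to every element
// of a row-major num_rows x num_cols matrix; padding beyond num_cols is untouched.
void ApplyExpSpecialSketch(std::vector<float> &data, int num_rows, int num_cols,
                           int stride) {
  assert(stride >= num_cols);
  assert(num_rows == 0 ||
         static_cast<int>(data.size()) >= (num_rows - 1) * stride + num_cols);
  for (int r = 0; r < num_rows; ++r) {
    for (int c = 0; c < num_cols; ++c) {
      float &x = data[r * stride + c];
      x = (x < 0.0f) ? std::exp(x) : x + 1.0f;
    }
  }
}
```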

Definition at line 2502 of file cu-matrix.cc.

References CU1DBLOCK.

Referenced by kaldi::UnitTestCuMatrixApplyExpSpecial().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    const int warpSize = 32;
    dim3 dimBlock(CU1DBLOCK / warpSize, warpSize);
    dim3 dimGrid(n_blocks(NumRows(), dimBlock.x),
                 n_blocks(NumCols(), dimBlock.y));

    cuda_apply_exp_special(dimGrid, dimBlock, Data(), Dim(), Data(), Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyExpSpecial();
  }
}
void ApplyFloor ( Real  floor_val)

Definition at line 2523 of file cu-matrix.cc.

References data_.

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), main(), StatisticsPoolingComponent::Propagate(), RectifiedLinearComponent::Propagate(), SoftmaxComponent::Propagate(), LogSoftmaxComponent::Propagate(), ClipGradientComponent::RepairGradients(), RestrictedAttentionComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyFloor(), kaldi::UnitTestCuMatrixObjfDeriv(), ParametricRelu::Update(), LstmProjected::Update(), and BlstmProjected::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_floor(dimGrid, dimBlock, data_, floor_val, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyFloor(floor_val);
  }
}
void ApplyHeaviside ( )

For each element, sets x = (x > 0 ? 1.0 : 0.0).

See also Heaviside().

Definition at line 2445 of file cu-matrix.cc.

References data_.

Referenced by BackpropTruncationComponent::Backprop(), RectifiedLinearComponent::Backprop(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), SigmoidComponent::RepairGradients(), TanhComponent::RepairGradients(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixHeaviside(), and kaldi::UnitTestCuMatrixApplyHeaviside().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_heaviside(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyHeaviside();
  }
}
void ApplyLog ( )

Definition at line 632 of file cu-matrix.cc.

References data_.

Referenced by DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), Xent::Eval(), main(), RestrictedAttentionComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyLog(), and kaldi::UnitTestCuMatrixObjfDeriv().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    CuTimer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_apply_log(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyLog();
  }
}
void ApplyLogSoftMaxPerRow ( const CuMatrixBase< Real > &  src)

LogSoftmax nonlinearity Y = LogSoftmax(X) : Yij = Xij - log(sum_k(e^Xik)), done to each row, with attention to avoiding overflow or underflow.

Supports in-place operation (i.e. this == &src).
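A common way to avoid overflow and underflow in log-softmax is to subtract the row maximum before exponentiating. The sketch below applies that standard technique to one row in place; it is an assumption about how the stated goal can be met, not a transcription of the Kaldi kernel, and LogSoftmaxRowSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: replaces each x in `row` with x - log(sum_k exp(row[k])).
void LogSoftmaxRowSketch(std::vector<double> &row) {
  assert(!row.empty());
  const double max = *std::max_element(row.begin(), row.end());
  double sum = 0.0;
  for (double x : row) sum += std::exp(x - max);  // every exponent is <= 0
  const double log_sum = max + std::log(sum);     // log(sum_k exp(row[k]))
  for (double &x : row) x -= log_sum;
}
```

Because the largest exponent is exactly zero, a row such as {1000, 1000} is handled without overflow, whereas evaluating exp(1000) directly would overflow a double.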

Definition at line 1724 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Propagate(), kaldi::TestCuMatrixLogSoftmax(), and kaldi::UnitTestCuLogSoftmax().

{
  KALDI_ASSERT(SameDim(*this, src));
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    size_t dimBlock = CU1DBLOCK;
    size_t dimGrid = src.num_rows_;
    cuda_log_softmax_reduce(dimGrid, dimBlock,
                            data_, src.data_, Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixBase<Real> &mat(this->Mat());
    mat.CopyFromMat(src.Mat());
    for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
      mat.Row(r).ApplyLogSoftMax();
    }
  }
}
void ApplyPow ( Real  power)

Definition at line 2409 of file cu-matrix.cc.

References data_.

Referenced by TanhComponent::Backprop(), kaldi::MeanVariance(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), TanhComponent::StoreStats(), kaldi::UnitTestCuMatrixApplyPow(), kaldi::UnitTestCuMatrixSetRandn(), and kaldi::UnitTestCuMatrixSetRandUniform().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_pow(dimGrid, dimBlock, data_, power, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyPow(power);
  }
}
void ApplyPowAbs ( Real  power,
bool  include_sign = false 
)

Apply power to the absolute value of each element.

If include_sign is true, the result is multiplied by the sign of the input value. If the power is negative and the input is zero, the output is set to zero.
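The per-element behaviour described above can be sketched as a scalar function in plain C++. PowAbsSketch is a hypothetical name used only for illustration; the sketch follows the documented rules rather than the Kaldi source.

```cpp
#include <cmath>

// Hypothetical helper: |x|^power, optionally carrying the sign of x.
double PowAbsSketch(double x, double power, bool include_sign) {
  // Documented special case: a zero input with a negative power yields zero
  // instead of infinity.
  if (x == 0.0 && power < 0.0) return 0.0;
  const double y = std::pow(std::fabs(x), power);
  return (include_sign && x < 0.0) ? -y : y;
}
```

For example, an input of -2 with power 2 gives 4 when include_sign is false and -4 when it is true.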

Definition at line 2427 of file cu-matrix.cc.

References data_.

Referenced by PowerComponent::Backprop(), PowerComponent::Propagate(), ClipGradientComponent::RepairGradients(), and kaldi::UnitTestCuMatrixApplyPowAbs().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_pow_abs(dimGrid, dimBlock, data_, power, include_sign, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().ApplyPowAbs(power, include_sign);
  }
}
void ApplySoftMaxPerRow ( const CuMatrixBase< Real > &  src)

Softmax nonlinearity Y = Softmax(X) : Yij = e^Xij / sum_k(e^Xik), done to each row, with attention to avoiding overflow or underflow.

Supports in-place operation (i.e. this == &src).

Definition at line 1701 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::nnet3::attention::AttentionForward(), SoftmaxComponent::Propagate(), Softmax::PropagateFnc(), HiddenSoftmax::PropagateFnc(), BlockSoftmax::PropagateFnc(), kaldi::TestCuMatrixSoftmax(), and kaldi::UnitTestCuSoftmax().

{
  KALDI_ASSERT(SameDim(*this, src));
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    size_t dimBlock = CU1DBLOCK;
    size_t dimGrid = src.num_rows_;
    cuda_softmax_reduce(dimGrid, dimBlock, data_, src.data_, Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixBase<Real> &mat(this->Mat());
    mat.CopyFromMat(src.Mat());
    for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
      mat.Row(r).ApplySoftMax();
    }
  }
}
bool ApproxEqual ( const CuMatrixBase< Real > &  other,
float  tol = 0.01 
) const

True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()
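The test is relative to the norm of *this, so it is not symmetric in its two arguments. A minimal plain-C++ sketch on flattened matrices follows; FrobeniusNormSketch and ApproxEqualSketch are hypothetical names, not Kaldi functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Square root of the sum of squared elements of a flattened matrix.
double FrobeniusNormSketch(const std::vector<double> &m) {
  double sum = 0.0;
  for (double v : m) sum += v * v;
  return std::sqrt(sum);
}

// True if ||a - b||_F <= tol * ||a||_F, where `a` plays the role of *this.
bool ApproxEqualSketch(const std::vector<double> &a,
                       const std::vector<double> &b, double tol = 0.01) {
  assert(a.size() == b.size());
  std::vector<double> diff(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) diff[i] = a[i] - b[i];
  return FrobeniusNormSketch(diff) <= tol * FrobeniusNormSketch(a);
}
```

One consequence of the relative criterion is that an all-zero matrix is approximately equal only to another all-zero matrix, however small the difference.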

Definition at line 2120 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMat(), and CuMatrixBase< Real >::FrobeniusNorm().

Referenced by kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuCopy().

{
  CuMatrix<Real> diff(*this);
  diff.AddMat(-1.0, other);
  return (diff.FrobeniusNorm() <= tol * (*this).FrobeniusNorm());
}
void Cholesky ( CuMatrixBase< Real > *  inv_cholesky = NULL)

This function sets *this to the Cholesky factor of *this (i.e. the C satisfying *this = C C^T), and sets "inv_cholesky" (if supplied) to its inverse. *this is treated as a symmetric matrix, but only the lower triangle is accessed.
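For orientation, the following is a minimal unblocked Cholesky factorization in plain C++ that, like the function documented here, reads only the lower triangle and overwrites the input with the factor. It is a textbook sketch, not the blocked recursive algorithm in the listing; CholeskySketch is a hypothetical name, and zeroing the upper triangle of the result is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: overwrites symmetric positive definite `a` with the
// lower-triangular C such that a = C * C^T. Only the lower triangle is read.
void CholeskySketch(std::vector<std::vector<double>> &a) {
  const std::size_t n = a.size();
  for (const std::vector<double> &row : a) assert(row.size() == n);
  for (std::size_t j = 0; j < n; ++j) {
    double d = a[j][j];
    for (std::size_t k = 0; k < j; ++k) d -= a[j][k] * a[j][k];
    assert(d > 0.0);  // fails if the input is not positive definite
    a[j][j] = std::sqrt(d);
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = a[i][j];
      for (std::size_t k = 0; k < j; ++k) s -= a[i][k] * a[j][k];
      a[i][j] = s / a[j][j];
    }
    // Assumption of this sketch: the strict upper triangle is set to zero.
    for (std::size_t c = j + 1; c < n; ++c) a[j][c] = 0.0;
  }
}
```

For the input [ 4 2; 2 3 ] the result is [ 2 0; 1 sqrt(2) ].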

Definition at line 1971 of file cu-matrix.cc.

References TpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), CuSpMatrix< Real >::CopyFromMat(), CuTpMatrix< Real >::CopyFromTp(), CuMatrixBase< Real >::CopyFromTp(), TpMatrix< Real >::Invert(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTakeLower, kaldi::kTrans, kaldi::kUndefined, and CuMatrixBase< Real >::SymAddMat2().

Referenced by CuTpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), kaldi::TestCuMatrixCholesky(), kaldi::UnitTestCholesky(), and kaldi::UnitTestCuCholesky().

{
  KALDI_ASSERT(this->NumRows() == this->NumCols());
  const int32 block_size = 64;  // We can tune this.
#if HAVE_CUDA == 1
  bool have_gpu = CuDevice::Instantiate().Enabled();
#else
  bool have_gpu = false;
#endif
  if (this->NumRows() == 0) {
    return;
  }
  if (inv_cholesky == NULL && this->NumRows() >= block_size * 2 && have_gpu) {
    // Even if the user did not request the inverse Cholesky, for large enough
    // matrices (on GPUs) it's going to be more efficient to compute it anyway
    // as the recursion depends on it.
    CuMatrix<Real> inv(this->NumRows(), this->NumCols());
    Cholesky(&inv);
    return;
  }
  if (this->NumRows() <= block_size || inv_cholesky == NULL || !have_gpu) {
    // Don't recurse: compute the Cholesky (and inverse Cholesky, if requested)
    // directly, on the CPU.
    int32 dim = this->NumRows();
    CuSpMatrix<Real> this_sp(dim, kUndefined);
    this_sp.CopyFromMat(*this, kTakeLower);
    SpMatrix<Real> this_sp_cpu(this_sp);
    TpMatrix<Real> C_cpu(dim);
    C_cpu.Cholesky(this_sp_cpu);
    CuTpMatrix<Real> C(C_cpu);
    this->CopyFromTp(C);
    if (inv_cholesky != NULL) {
      C_cpu.Invert();  // Get inverse Cholesky on CPU.
      C.CopyFromTp(C_cpu);
      inv_cholesky->CopyFromTp(C);  // Copy inverse Cholesky from CPU.
    }
    return;
  }
  // At this point, if none of the other cases apply, we recurse.

  // The selection of dim1 is a heuristic.  We could also just take half.
  int32 tot_dim = this->NumRows();
  int32 dim1;
  // Break it up into a whole number of blocks, for better memory alignment.
  // The line below, setting dim1 can be decided on a heuristic basis: from
  // the point of view of correctness, it can really be any value
  // 0 < dim1 < tot_dim.
  dim1 = block_size * std::max<int32>(1, tot_dim / (2 * block_size));

  int32 dim2 = tot_dim - dim1;
  CuSubMatrix<Real> this_11(*this, 0, dim1, 0, dim1),
      this_12(*this, 0, dim1, dim1, dim2),
      this_21(*this, dim1, dim2, 0, dim1),
      this_22(*this, dim1, dim2, dim1, dim2);
  CuSubMatrix<Real> inv_11(*inv_cholesky, 0, dim1, 0, dim1),
      inv_12(*inv_cholesky, 0, dim1, dim1, dim2),
      inv_21(*inv_cholesky, dim1, dim2, 0, dim1),
      inv_22(*inv_cholesky, dim1, dim2, dim1, dim2);
  /*
    Here is the math on block-wise Cholesky.  We'll use a Matlab-like notation for blocks of a matrix,
    e.g. [ A B; C D ], and also for transposes, e.g. A' is the transpose of A.
    Let A be the input matrix; we want to compute both its Cholesky L and its inverse Cholesky, which
    we'll call M.
    OK. let L = [ L11 0; L21 L22 ] be the Cholesky factor of A.
    We have A = L L' = [ L11 0; L21 L22 ] * [ L11' L21'; 0 L22' ].  Multiplying it out,
    if A = [ A11 A12; A21 A22 ]; then
    A11 = L11 L11', A21 = L21 L11', A22 = L21 L21' + L22 L22', and A12 = A21'.

    We also want an expression for the inverse of L (we call this M).
    If M = [ M11 0; M21 M22 ], then it's not hard to see that
    M11 = inv(L11), M22 = inv(L22).
    We can work out M21 as follows.  We know that [ L11 0; L21 L22 ] [ M11 0; M21 M22 ] = [ I 0; 0 I ].
    Considering the zero on the bottom of the rhs, we have: L21 M11 + L22 M21 = 0, which gives us:
    M21 = - L22^{-1} L21 M11 = - M22 L21 M11.

    Next, we want expressions for L21 and L22.  From the equation A21 = L21 L11', we have:
    L21 = A21 inv(L11') = A21 M11'
    We can compute L22 and M22 recursively by doing Cholesky (and computing the inverse Cholesky)
    on the quantity T = (A22 - L21 L21').   [we give it the name T just for easy reference.]

    Computationally, we do this as follows:
    (1) Recurse to get L11 and M11.
    (2) Compute L21 = A21 M11'
    (3) Compute T = A22 - L21 L21'
    (4) Recurse on T to get L22 and M22.
    (5) Compute M21 = -M22 L21 M11.
    Next, we have to consider the in-place nature of the computation, since L overwrites A
    [M has its own storage, in "inv_cholesky"].
2058  We address this here:
2059  (1) is in-place [L11 replaces A11, M11 has its own storage].
2060  (2) L21 gets written where M21 belongs.
2061  (3) T replaces A22.
2062  (4) is in-place [L22 replaces T where A22 was, M22 has its own storage]
2063  (5):(a) we first compute the transpose of (L21 M11) in the upper part of A/L,
2064  where A12 or L12 would be. Define a temporary expression
2065  U = (L21 M11)' = M11' L21'; this goes where A12 or L12 would be.
2066  (b) copy L21 to where it should be, in *this.
2067  (c) Compute M21 = -M22 U', in the correct place for M21.
2068  (d) zero L12 and M12. */
2069 
2070  // (1) compute L11 and M11.
2071  this_11.Cholesky(&inv_11);
2072  // (2) compute L21 = A21 M11'. For now it's in the "wrong place", where M21 should be.
2073  inv_21.AddMatMat(1.0, this_21, kNoTrans, inv_11, kTrans, 0.0);
2074  // (3) compute T = A22 - L21 L21'. Note: only the lower triangle of T will be valid, but
2075  // that's OK because Cholesky will ignore the upper part.
2076  this_22.SymAddMat2(-1.0, inv_21, kNoTrans, 1.0);
2077  // (4) Recurse to compute L22 and M22.
2078  this_22.Cholesky(&inv_22);
2079  // (5)(a) compute U = M11' L21'. We use the storage of this_12 for this. Note that L21 is
2080  // currently where M21 should be.
2081  this_12.AddMatMat(1.0, inv_11, kTrans, inv_21, kTrans, 0.0);
2082  // (5)(b) copy L21 to where it should be.
2083  this_21.CopyFromMat(inv_21);
2084  // (5)(c) compute M21 = -M22 U'.
2085  inv_21.AddMatMat(-1.0, inv_22, kNoTrans, this_12, kTrans, 0.0);
2086  // (5)(d) zero L12 and M12.
2087  this_12.SetZero();
2088  inv_12.SetZero();
2089 }
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:89
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
void Cholesky(CuMatrixBase< Real > *inv_cholesky=NULL)
This function sets *this to the Cholesky factor of *this (i.e.
Definition: cu-matrix.cc:1971
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
void CopyFromTp(const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:277
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
friend class CuSpMatrix< Real >
Definition: cu-matrix.h:85
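The five computational steps listed in the source comment of Cholesky() can be checked by hand on the smallest non-trivial case, where every block is a 1x1 scalar. The following is a minimal sketch under that assumption. The names Chol2x2 and BlockCholesky2x2 are hypothetical and not part of Kaldi, and the in-place storage scheme of the real function is not modelled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration: block-wise Cholesky of a symmetric positive
// definite matrix A = [a11 a21; a21 a22] whose "blocks" are scalars.
// L is the Cholesky factor (A = L L') and M is the inverse of L.
struct Chol2x2 {
  double l11, l21, l22;  // L = [l11 0; l21 l22]
  double m11, m21, m22;  // M = [m11 0; m21 m22]
};

Chol2x2 BlockCholesky2x2(double a11, double a21, double a22) {
  assert(a11 > 0.0);
  Chol2x2 r;
  // (1) "Recurse" on A11: for a scalar, L11 = sqrt(A11) and M11 = 1 / L11.
  r.l11 = std::sqrt(a11);
  r.m11 = 1.0 / r.l11;
  // (2) L21 = A21 M11'.
  r.l21 = a21 * r.m11;
  // (3) T = A22 - L21 L21'.
  const double t = a22 - r.l21 * r.l21;
  assert(t > 0.0);  // holds whenever A is positive definite
  // (4) "Recurse" on T to get L22 and M22.
  r.l22 = std::sqrt(t);
  r.m22 = 1.0 / r.l22;
  // (5) M21 = -M22 L21 M11.
  r.m21 = -r.m22 * r.l21 * r.m11;
  return r;
}
```

For A = [ 4 2; 2 5 ] this yields L = [ 2 0; 1 2 ] and M = [ 0.5 0; -0.25 0.5 ], and the lower-left entry of L M, namely L21 M11 + L22 M21, is zero as the derivation requires.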
CuSubMatrix<Real> ColRange ( const MatrixIndexT  col_offset,
const MatrixIndexT  num_cols 
) const
inline

Definition at line 592 of file cu-matrix.h.

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxpoolingComponent::Backprop(), BlockAffineComponent::Backprop(), Convolutional1dComponent::Backprop(), ConvolutionComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MaxPooling2DComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), MultiTaskLoss::Eval(), LstmProjected::InfoGradient(), BlstmProjected::InfoGradient(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), MaxpoolingComponent::Propagate(), BlockAffineComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionComponent::Propagate(), AveragePoolingComponent::PropagateFnc(), MaxPoolingComponent::PropagateFnc(), AveragePooling2DComponent::PropagateFnc(), MaxPooling2DComponent::PropagateFnc(), BlockSoftmax::PropagateFnc(), FramePoolingComponent::PropagateFnc(), ParallelComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), MultiBasisComponent::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::UnitTestLstmNonlinearity(), FramePoolingComponent::Update(), SentenceAveragingComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), Convolutional1dComponent::Update(), and ConvolutionComponent::Update().

593  {
594  return CuSubMatrix<Real>(*this, 0, num_rows_, col_offset, num_cols);
595  }
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:89
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
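ColRange() returns a sub-matrix rather than a copy. A rough host-side sketch of that idea follows, assuming the sub-matrix keeps the parent's stride and only advances the data pointer. MatView and ColRangeSketch are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a matrix view; it does not own its data.
struct MatView {
  float *data;             // pointer to element (0, 0) of the view
  int rows, cols, stride;  // stride = distance in elements between rows
  float &at(int r, int c) { return data[r * stride + c]; }
};

// Mirrors CuSubMatrix<Real>(*this, 0, num_rows_, col_offset, num_cols):
// all rows are kept, the stride is unchanged, and the data pointer is
// advanced by col_offset elements (an assumption about the sub-matrix).
MatView ColRangeSketch(MatView m, int col_offset, int num_cols) {
  assert(col_offset >= 0 && num_cols >= 0 && col_offset + num_cols <= m.cols);
  return MatView{m.data + col_offset, m.rows, num_cols, m.stride};
}
```

Because the view aliases the parent's storage, writing through the returned view changes the corresponding element of the original matrix.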
void CopyColFromVec ( const CuVectorBase< Real > &  v,
const MatrixIndexT  col 
)

Copies vector v into column col of the matrix. v.Dim() must equal NumRows(), and col must satisfy 0 <= col < NumCols().

Definition at line 2388 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by kaldi::cu::NormalizePerRow(), StatisticsExtractionComponent::Propagate(), DropoutMaskComponent::Propagate(), NaturalGradientRepeatedAffineComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

2389  {
2390  KALDI_ASSERT(v.Dim() == num_rows_ &&
2391  static_cast<UnsignedMatrixIndexT>(col) <
2392  static_cast<UnsignedMatrixIndexT>(num_cols_));
2393 #if HAVE_CUDA == 1
2394  if (CuDevice::Instantiate().Enabled()) {
2395  CuTimer tim;
2396  cublas_copy(GetCublasHandle(),
2397  v.Dim(), v.Data(), 1,
2398  this->data_ + col, this->stride_);
2399  CU_SAFE_CALL(cudaGetLastError());
2400  CuDevice::Instantiate().AccuProfile(__func__, tim);
2401  } else
2402 #endif
2403  {
2404  Mat().CopyColFromVec(v.Vec(), col);
2405  }
2406 }
uint32 UnsignedMatrixIndexT
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT stride_
Definition: cu-matrix.h:714
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
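In the GPU branch of CopyColFromVec(), the copy is a strided BLAS-style copy: the source is read with increment 1 and the destination, starting at data_ + col, is written with increment stride_. A minimal CPU sketch of the same access pattern is shown here. StridedCopy and CopyColFromVecSketch are hypothetical helpers, and the matrix is assumed to be a flat row-major buffer.

```cpp
#include <cassert>
#include <vector>

// Strided copy with BLAS "copy" semantics: y[i * incy] = x[i * incx].
void StridedCopy(int n, const float *x, int incx, float *y, int incy) {
  for (int i = 0; i < n; i++) y[i * incy] = x[i * incx];
}

// Hypothetical CPU analogue of CopyColFromVec for a row-major buffer.
void CopyColFromVecSketch(std::vector<float> *mat, int num_rows, int num_cols,
                          int stride, const std::vector<float> &v, int col) {
  assert(static_cast<int>(v.size()) == num_rows);
  assert(col >= 0 && col < num_cols);
  // Start at element (0, col) and advance one row (stride elements) per step.
  StridedCopy(num_rows, v.data(), 1, mat->data() + col, stride);
}
```

With a 2x3 matrix stored at stride 4, copying v = (1, 2) into column 2 writes elements 2 and 6 of the flat buffer and leaves everything else untouched.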
void CopyCols ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Copies column r from column indexes[r] of src.

As a special case, if indexes[i] == -1, sets column i to zero. indexes.size() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().

Definition at line 2592 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumGroupComponent::Backprop(), PermuteComponent::Backprop(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), PermuteComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionalComponent::PropagateFnc(), and Convolutional1dComponent::Update().

2593  {
2594 #if HAVE_CUDA == 1
2595  if (CuDevice::Instantiate().Enabled()) {
2596  KALDI_ASSERT(indices.Dim() == NumCols());
2597  KALDI_ASSERT(NumRows() == src.NumRows());
2598  CuTimer tim;
2599  dim3 dimGrid, dimBlock;
2600  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2601  &dimGrid, &dimBlock);
2602  cuda_copy_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(), Dim(), src.Stride());
2603  CU_SAFE_CALL(cudaGetLastError());
2604  CuDevice::Instantiate().AccuProfile(__func__, tim);
2605  } else
2606 #endif
2607  {
2608  Mat().CopyCols(src.Mat(), indices.Data());
2609  }
2610 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
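The documented semantics of CopyCols() can be written as a plain reference loop. This is a sketch on host memory, not the CUDA kernel. Matrix and CopyColsSketch are hypothetical names, and the matrix is modelled as a vector of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // dense, one vector per row

// Column c of dst is taken from column indexes[c] of src; an index of -1
// sets column c of dst to zero.
void CopyColsSketch(Matrix *dst, const Matrix &src,
                    const std::vector<int> &indexes) {
  assert(dst->size() == src.size());  // same number of rows
  for (std::size_t r = 0; r < dst->size(); r++) {
    std::vector<float> &row = (*dst)[r];
    assert(indexes.size() == row.size());  // one index per destination column
    for (std::size_t c = 0; c < row.size(); c++) {
      const int s = indexes[c];
      assert(s >= -1 && s < static_cast<int>(src[r].size()));
      row[c] = (s == -1) ? 0.0f : src[r][s];
    }
  }
}
```

For example, with src = [ 1 2 3; 4 5 6 ] and indexes = (2, -1), the 2x2 destination becomes [ 3 0; 6 0 ].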
void CopyColsFromVec ( const CuVectorBase< Real > &  v)

Copies vector into matrix, column-by-column.

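Note that v.Dim() must equal either NumRows()*NumCols() or NumRows(), which selects one of two modes of operation. If v.Dim() == NumRows()*NumCols(), the vector is treated as the columns of the matrix concatenated one after another; if v.Dim() == NumRows(), every column of *this is set to a copy of v. (The definition names this argument rv.)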

Definition at line 2350 of file cu-matrix.cc.

References MatrixDim_::cols, CU1DBLOCK, CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ERR, MatrixDim_::rows, and CuVectorBase< Real >::Vec().

Referenced by DropoutComponent::Propagate(), and kaldi::UnitTestCuMatrixCopyColsFromVec().

2350  {
2351 #if HAVE_CUDA == 1
2352  if (CuDevice::Instantiate().Enabled()) {
2353  CuTimer tim;
2354  if (rv.Dim() == num_rows_ * num_cols_) {
2355  // treat rv as a matrix of size (num_cols_ x num_rows_)
2356  // and use transposed copy to fill *this
2357  // see CuMatrixBase<Real>::CopyFromMat() for more detail of the impl
2358  MatrixDim rv_dim = { num_cols_, num_rows_, num_rows_ };
2359  const int32 warpSize = 32;
2360  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2361  dim3 dimGrid(n_blocks(rv_dim.cols, warpSize),
2362  n_blocks(rv_dim.rows, warpSize));
2363  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, rv.Data(), Dim(),
2364  rv_dim);
2365  CU_SAFE_CALL(cudaGetLastError());
2366  } else if (rv.Dim() == num_rows_) {
2367  // use 2D block (8x32) and large enough grid to cover matrix *this
2368  // dimBlock.x needs to be at least warpSize for coalesced memory access.
2369  const int32 warpSize = 32;
2370  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2371  dim3 dimGrid(n_blocks(num_cols_, dimBlock.x),
2372  n_blocks(num_rows_, dimBlock.y));
2373  cuda_copy_cols_from_vec(dimGrid, dimBlock, Data(), Dim(), rv.Data());
2374  CU_SAFE_CALL(cudaGetLastError());
2375  } else {
2376  KALDI_ERR<< "Wrong sized arguments";
2377  }
2378  CuDevice::Instantiate().AccuProfile(__func__, tim);
2379  } else
2380 #endif
2381  {
2382  Mat().CopyColsFromVec(rv.Vec());
2383  }
2384 }
int32_cuda rows
Definition: cu-matrixdim.h:53
Structure containing size of the matrix plus stride.
Definition: cu-matrixdim.h:52
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define KALDI_ERR
Definition: kaldi-error.h:127
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
int32_cuda cols
Definition: cu-matrixdim.h:54
::MatrixDim Dim() const
Definition: cu-matrix.h:211
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
const Real * Data() const
Return data pointer (const).
Definition: cu-matrix.h:673
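The two modes of CopyColsFromVec() can be sketched on host memory as follows. This is an illustrative reference loop under the assumption that the long-vector mode reads element (r, c) from position c * NumRows() + r, which is what the transposed copy in the GPU branch amounts to. Matrix and CopyColsFromVecSketch are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // dense, one vector per row

void CopyColsFromVecSketch(Matrix *m, const std::vector<float> &v) {
  const std::size_t rows = m->size();
  const std::size_t cols = rows ? (*m)[0].size() : 0;
  if (v.size() == rows * cols) {
    // Mode 1: v holds the columns of the matrix laid end to end.
    for (std::size_t c = 0; c < cols; c++)
      for (std::size_t r = 0; r < rows; r++)
        (*m)[r][c] = v[c * rows + r];
  } else if (v.size() == rows) {
    // Mode 2: every column becomes a copy of v.
    for (std::size_t r = 0; r < rows; r++)
      for (std::size_t c = 0; c < cols; c++)
        (*m)[r][c] = v[r];
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```

For a 2x3 matrix, v = (1, 2, 3, 4, 5, 6) produces [ 1 3 5; 2 4 6 ], while v = (7, 8) produces [ 7 7 7; 8 8 8 ].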
void CopyFromBlock ( const CuBlockMatrix< Real > &  B,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 160 of file cu-matrix.cc.

References CuBlockMatrix< Real >::Block(), CuMatrixBase< Real >::CopyFromMat(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), and CuMatrixBase< Real >::NumRows().

Referenced by CuMatrix< Real >::CuMatrix().

161  {
162  this->SetZero();
163  if (trans == kNoTrans) {
164  KALDI_ASSERT(NumRows() == B.NumRows() && NumCols() == B.NumCols());
165  int32 row_offset = 0, col_offset = 0;
166  for (int32 b = 0; b < B.NumBlocks(); b++) {
167  const CuMatrixBase<Real> &block = B.Block(b);
168  int32 num_rows = block.NumRows(), num_cols = block.NumCols();
169  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
170  col_offset, num_cols);
171  this_block.CopyFromMat(block);
172  row_offset += num_rows;
173  col_offset += num_cols;
174  }
175  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
176  } else {
177  KALDI_ASSERT(NumRows() == B.NumCols() && NumCols() == B.NumRows());
178  int32 row_offset = 0, col_offset = 0;
179  for (int32 b = 0; b < B.NumBlocks(); b++) {
180  const CuMatrixBase<Real> &block = B.Block(b);
181  int32 num_rows = block.NumCols(), num_cols = block.NumRows();
182  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
183  col_offset, num_cols);
184  this_block.CopyFromMat(block, kTrans);
185  row_offset += num_rows;
186  col_offset += num_cols;
187  }
188  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
189  }
190 }
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:89
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
void SetZero()
Math operations, some calling kernels.
Definition: cu-matrix.cc:476
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
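The kNoTrans branch of CopyFromBlock() zeroes the destination and then places each block on the diagonal, advancing the row and column offsets together. A minimal host-side sketch of that loop is given below. Matrix and CopyFromBlockSketch are hypothetical names, and the block matrix is modelled simply as a list of dense blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // dense, one vector per row

void CopyFromBlockSketch(Matrix *dst, const std::vector<Matrix> &blocks) {
  // Everything off the block diagonal ends up zero.
  for (std::vector<float> &row : *dst) std::fill(row.begin(), row.end(), 0.0f);
  std::size_t row_offset = 0, col_offset = 0;
  for (const Matrix &block : blocks) {
    const std::size_t num_rows = block.size();
    const std::size_t num_cols = num_rows ? block[0].size() : 0;
    assert(row_offset + num_rows <= dst->size());
    for (std::size_t r = 0; r < num_rows; r++) {
      std::vector<float> &row = (*dst)[row_offset + r];
      assert(col_offset + num_cols <= row.size());
      std::copy(block[r].begin(), block[r].end(), row.begin() + col_offset);
    }
    // The next block starts just below and to the right of this one.
    row_offset += num_rows;
    col_offset += num_cols;
  }
  assert(row_offset == dst->size());
  assert(dst->empty() || col_offset == (*dst)[0].size());
}
```

With a 1x2 block [ 1 2 ] followed by a 2x1 block [ 3; 4 ], a 3x3 destination becomes [ 1 2 0; 0 0 3; 0 0 4 ]. The kTrans branch differs only in swapping each block's dimensions and copying its transpose.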
void CopyFromGeneralMat ( const GeneralMatrix src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 2997 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CopyToMat(), SparseMatrix< Real >::CopyToMat(), GeneralMatrix::GetFullMatrix(), GeneralMatrix::GetMatrix(), GeneralMatrix::GetSparseMatrix(), KALDI_ERR, kaldi::kCompressedMatrix, kaldi::kFullMatrix, kaldi::kSparseMatrix, and GeneralMatrix::Type().

Referenced by NnetComputer::AcceptInputs(), and kaldi::nnet3::ComputeObjectiveFunction().

2998  {
2999  switch (src.Type()) {
3000  case kFullMatrix: {
3001  const Matrix<BaseFloat> &src_full_mat = src.GetFullMatrix();
3002  this->CopyFromMat(src_full_mat, trans);
3003  return;
3004  }
3005  case kCompressedMatrix: {
3006  Matrix<BaseFloat> mat;
3007  src.GetMatrix(&mat);
3008  this->CopyFromMat(mat, trans);
3009  return;
3010  }
3011  case kSparseMatrix: {
3012  const SparseMatrix<BaseFloat> &smat = src.GetSparseMatrix();
3013 #if HAVE_CUDA == 1
3014  if (CuDevice::Instantiate().Enabled()) {
3015  // only take this branch if we're actually using CUDA, or it would
3016  // entail a wasteful copy of the sparse matrix.
3017  CuSparseMatrix<BaseFloat> cu_smat(smat);
3018  cu_smat.CopyToMat(this, trans);
3019  return;
3020  }
3021 #endif
3022  smat.CopyToMat(&(Mat()), trans);
3023  return;
3024  }
3025  default:
3026  KALDI_ERR << "Invalid GeneralMatrix type.";
3027  }
3028 }
void CopyFromMat(const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:339
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
#define KALDI_ERR
Definition: kaldi-error.h:127
void CopyFromMat ( const MatrixBase< OtherReal > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 339 of file cu-matrix.cc.

Referenced by kaldi::nnet3::attention::AttentionForward(), ElementwiseProductComponent::Backprop(), BatchNormComponent::Backprop(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), ScaleComponent::Backprop(), FixedScaleComponent::Backprop(), FixedBiasComponent::Backprop(), NoOpComponent::Backprop(), ClipGradientComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), Softmax::BackpropagateFnc(), HiddenSoftmax::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), AddShift::BackpropagateFnc(), Rescale::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), BlockAffineComponent::BlockAffineComponent(), NnetOnlineComputer::Compute(), kaldi::nnet3::ConstrainOrthonormal(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), CuMatrixBase< Real >::CopyFromBlock(), CuBlockMatrix< Real >::CopyFromMat(), GeneralMatrix::CopyToMat(), CuMatrix< Real >::CuMatrix(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), NnetComputer::ExecuteCommand(), NnetRescaler::FormatInput(), kaldi::nnet3::attention::GetAttentionDotProducts(), OnlinePreconditioner::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), NaturalGradientAffineComponent::InitFromConfig(), main(), kaldi::nnet2::NnetComputation(), kaldi::cu::NormalizePerRow(), CuMatrix< BaseFloat >::operator=(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditionerSimple::PreconditionDirections(), OnlineNaturalGradientSimple::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), 
DropoutComponent::Propagate(), ElementwiseProductComponent::Propagate(), BatchNormComponent::Propagate(), BackpropTruncationComponent::Propagate(), PowerComponent::Propagate(), RectifiedLinearComponent::Propagate(), ScaleComponent::Propagate(), SpliceMaxComponent::Propagate(), NoOpComponent::Propagate(), ClipGradientComponent::Propagate(), PerElementScaleComponent::Propagate(), FixedScaleComponent::Propagate(), PerElementOffsetComponent::Propagate(), FixedBiasComponent::Propagate(), AdditiveNoiseComponent::Propagate(), KlHmm::PropagateFnc(), ParallelComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), AddShift::PropagateFnc(), Rescale::PropagateFnc(), BlstmProjected::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), OnlinePreconditioner::ReorthogonalizeXt1(), OnlineNaturalGradient::ReorthogonalizeXt1(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), kaldi::TestCuFindRowMaxId(), kaldi::TestCuMatrixTransposeCross(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::TestSymInvertPosDef(), kaldi::UnitInvert(), kaldi::UnitTestCheck(), kaldi::UnitTestCholesky(), kaldi::UnitTestConstructor(), kaldi::UnitTestCopyFromMat(), kaldi::UnitTestCopySp(), kaldi::UnitTestCuCopy(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffNormalizePerRow(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuFindRowMaxId(), kaldi::UnitTestCuLogSoftmax(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddMatMatBatched(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), 
kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyFromMat(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuSoftmax(), kaldi::UnitTestCuTanh(), kaldi::UnitTestCuVectorAddColSumMat(), kaldi::UnitTestCuVectorAddColSumMatLarge(), kaldi::UnitTestCuVectorAddRowSumMat(), kaldi::UnitTestCuVectorAddRowSumMatLarge(), kaldi::UnitTestInvert(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), BlockAffineComponentPreconditioned::Update(), and MatrixRandomizer::Value().

340  {
341  CuMatrix<OtherReal> temp(src);
342  this->CopyFromMat(temp, trans);
343 }
void CopyFromMat ( const MatrixBase< Real > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 311 of file cu-matrix.cc.

References MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

312  {
313 #if HAVE_CUDA == 1
314  if (CuDevice::Instantiate().Enabled()) {
315  if (trans == kNoTrans) {
316  KALDI_ASSERT(src.NumRows() == num_rows_ && src.NumCols() == num_cols_);
317  CuTimer tim;
318 
319  MatrixIndexT dst_pitch = stride_*sizeof(Real);
320  MatrixIndexT src_pitch = src.Stride()*sizeof(Real);
321  MatrixIndexT width = src.NumCols()*sizeof(Real);
322  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, src.Data(), src_pitch,
323  width, src.NumRows(), cudaMemcpyHostToDevice));
324 
325  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from CPU)", tim);
326  } else {
327  CuMatrix<Real> trans_mat(src); // Do the transpose on the GPU board.
328  this->CopyFromMat(trans_mat, kTrans);
329  }
330  } else
331 #endif
332  {
333  Mat().CopyFromMat(src, trans);
334  }
335 }
void CopyFromMat(const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:339
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT stride_
Definition: cu-matrix.h:714
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
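The non-transposed GPU path of CopyFromMat() passes byte pitches (stride times sizeof(Real)) and a row width in bytes to cudaMemcpy2D, so that source and destination may have different strides. The host-only sketch below mimics that pitched row-by-row copy with std::memcpy. Copy2DSketch is a hypothetical helper and performs no device transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies `height` rows of `width_bytes` bytes each. Consecutive rows are
// dst_pitch bytes apart in the destination and src_pitch bytes apart in the
// source; bytes between width_bytes and the pitch are padding and are skipped.
void Copy2DSketch(void *dst, std::size_t dst_pitch, const void *src,
                  std::size_t src_pitch, std::size_t width_bytes,
                  std::size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  for (std::size_t r = 0; r < height; r++)
    std::memcpy(d + r * dst_pitch, s + r * src_pitch, width_bytes);
}
```

Copying a 2x3 float matrix stored at stride 4 into a buffer at stride 5 uses pitches of 4 * sizeof(float) and 5 * sizeof(float) with a width of 3 * sizeof(float); the padding elements of the destination are left as they were.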
void CopyFromMat ( const CuMatrixBase< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 207 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

208  {
209  if (sizeof(Real) == sizeof(OtherReal) &&
210  static_cast<const void*>(M.Data()) ==
211  static_cast<const void*>(this->Data())) {
212  if (M.Data() == NULL)
213  return;
214  // CopyFromMat called on same data. Nothing to do (except sanity checks)
215  KALDI_ASSERT(trans == kNoTrans && M.NumRows() == NumRows() &&
216  M.NumCols() == NumCols() && M.Stride() == Stride());
217  return;
218  }
219 #if HAVE_CUDA == 1
220  if (CuDevice::Instantiate().Enabled()) {
221  if (trans == kNoTrans) {
222  KALDI_ASSERT(M.NumRows() == num_rows_ && M.NumCols() == num_cols_);
223  } else {
224  KALDI_ASSERT(M.NumCols() == num_rows_ && M.NumRows() == num_cols_);
225  }
226  if (M.num_rows_ == 0) return; // Nothing to do.
227  CuTimer tim;
228  if (sizeof(Real) == sizeof(OtherReal) && trans == kNoTrans ) {
229  MatrixIndexT dst_pitch = stride_ * sizeof(Real);
230  MatrixIndexT src_pitch = M.Stride() * sizeof(Real);
231  MatrixIndexT width = M.NumCols() * sizeof(Real);
232  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, M.data_, src_pitch,
233  width, M.num_rows_, cudaMemcpyDeviceToDevice));
234  } else {
235  if (trans == kNoTrans) {
236  dim3 dimGrid, dimBlock;
237  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
238  &dimGrid, &dimBlock);
239  cuda_copy_from_mat(dimGrid, dimBlock, data_, M.data_, Dim(), M.Dim());
240  } else {
241  // 2D thread block with warps (blockDim.x) along the row-dim of input M.
242  // Each (8x32) thread block will transpose (32x32) data
243  const int32 warpSize = 32;
244  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
245  dim3 dimGrid(n_blocks(M.NumCols(), warpSize),
246  n_blocks(M.NumRows(), warpSize));
247  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, M.data_, Dim(),
248  M.Dim());
249  }
250  CU_SAFE_CALL(cudaGetLastError());
251  }
252  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from other CuMatrixBase)", tim);
253  } else
254 #endif
255  {
256  Mat().CopyFromMat(M.Mat(), trans);
257  }
258 }
MatrixIndexT Stride() const
Definition: cu-matrix.h:207
void CopyFromSp ( const CuSpMatrix< Real > &  M)

Definition at line 355 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, KALDI_ASSERT, CuSpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrix< BaseFloat >::CuMatrix(), CuSpMatrix< Real >::Invert(), and kaldi::TestCuMatrixCopyFromSp().

355  {
356  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
357  if (num_rows_ == 0)
358  return;
359 #if HAVE_CUDA == 1
360  if (CuDevice::Instantiate().Enabled()) {
361  CuTimer tim;
362  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
363  dim3 dimGrid(n_blocks(NumRows(), CU2DBLOCK),
364  n_blocks(NumRows(), CU2DBLOCK));
365  cuda_copy_from_sp(dimGrid, dimBlock, M.Data(), data_, Dim());
366  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyFromSp", tim);
367  } else
368 #endif
369  {
370  Mat().CopyFromSp(M.Mat());
371  }
372 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
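CopyFromSp() expands a packed symmetric matrix into a full square matrix. The sketch below shows the expansion on host memory, assuming the packed layout stores the lower triangle row by row, so that element (r, c) with c <= r sits at index r * (r + 1) / 2 + c. That layout, and the names Matrix and CopyFromSpSketch, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // dense, one vector per row

void CopyFromSpSketch(Matrix *dst, const std::vector<float> &packed) {
  const std::size_t n = dst->size();
  assert(packed.size() == n * (n + 1) / 2);
  for (std::size_t r = 0; r < n; r++) {
    assert((*dst)[r].size() == n);  // destination must be square
    for (std::size_t c = 0; c < n; c++) {
      // Symmetry: (r, c) and (c, r) read the same packed element.
      const std::size_t hi = std::max(r, c), lo = std::min(r, c);
      (*dst)[r][c] = packed[hi * (hi + 1) / 2 + lo];
    }
  }
}
```

Under that layout, packed = (1, 2, 3, 4, 5, 6) expands to [ 1 2 4; 2 3 5; 4 5 6 ].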
template void CopyFromTp ( const CuTpMatrix< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 277 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, CuTpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrixBase< Real >::Cholesky(), CuMatrix< BaseFloat >::CuMatrix(), CuTpMatrix< Real >::Invert(), kaldi::TestCuMatrixCopyFromTp(), and kaldi::UnitTestCuMatrixCopyFromTp().

278  {
279  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
280  if (num_rows_ == 0)
281  return;
282 #if HAVE_CUDA == 1
283  if (CuDevice::Instantiate().Enabled()) {
284  CuTimer tim;
285  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
286  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
287  n_blocks(num_rows_, CU2DBLOCK));
288  if (trans == kNoTrans) {
289  cuda_copy_from_tp(dimGrid, dimBlock, data_, M.Data(), Dim());
290  } else {
291  cuda_copy_from_tp_trans(dimGrid, dimBlock, data_, M.Data(), Dim());
292  }
293  CuDevice::Instantiate().AccuProfile(__func__, tim);
294  } else
295 #endif
296  {
297  Mat().CopyFromTp(M.Mat(), trans);
298  }
299 }
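CopyFromTp() is the triangular counterpart: the packed lower triangle is written into the full matrix and the other triangle is zero, with trans selecting whether the result is lower or upper triangular. A host-side sketch follows, again assuming a row-by-row packed lower triangle. Matrix and CopyFromTpSketch are hypothetical names, and a bool stands in for MatrixTransposeType.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // dense, one vector per row

void CopyFromTpSketch(Matrix *dst, const std::vector<float> &packed,
                      bool transpose) {
  const std::size_t n = dst->size();
  assert(packed.size() == n * (n + 1) / 2);
  for (std::size_t r = 0; r < n; r++) {
    assert((*dst)[r].size() == n);  // destination must be square
    for (std::size_t c = 0; c < n; c++) {
      float value = 0.0f;  // the "other" triangle stays zero
      if (!transpose && c <= r) value = packed[r * (r + 1) / 2 + c];
      if (transpose && r <= c) value = packed[c * (c + 1) / 2 + r];
      (*dst)[r][c] = value;
    }
  }
}
```

For packed = (1, 2, 3) the non-transposed result is [ 1 0; 2 3 ] and the transposed result is [ 1 2; 0 3 ].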
void CopyLowerToUpper ( )

Definition at line 2870 of file cu-matrix.cc.

References CU2DBLOCK, data_, and KALDI_ASSERT.

Referenced by kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuMatrixCopyLowerToUpper().

2870  {
2871  KALDI_ASSERT(num_cols_ == num_rows_);
2872  if (num_rows_ == 0) return;
2873 #if HAVE_CUDA == 1
2874  if (CuDevice::Instantiate().Enabled()) {
2875  CuTimer tim;
2876  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2877  int32 dim = num_rows_;
2878  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2879  n_blocks(dim, CU2DBLOCK));
2880  cuda_copy_low_upp(dimGrid, dimBlock, data_, Dim());
2881  CU_SAFE_CALL(cudaGetLastError());
2882  CuDevice::Instantiate().AccuProfile(__func__, tim);
2883  } else
2884 #endif
2885  {
2886  Mat().CopyLowerToUpper();
2887  }
2888 }
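As the name suggests, CopyLowerToUpper() mirrors the strictly lower triangle of a square matrix into the upper triangle, which makes the matrix symmetric. A short host-side sketch of that operation is given here. Matrix and CopyLowerToUpperSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // dense, one vector per row

void CopyLowerToUpperSketch(Matrix *m) {
  const std::size_t n = m->size();
  for (std::size_t r = 0; r < n; r++) {
    assert((*m)[r].size() == n);  // matrix must be square
    // Overwrite (c, r) above the diagonal with (r, c) below it.
    for (std::size_t c = 0; c < r; c++) (*m)[c][r] = (*m)[r][c];
  }
}
```

Applied to [ 1 9; 2 3 ], the upper-right 9 is overwritten by the lower-left 2, giving [ 1 2; 2 3 ].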
void CopyRows ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Copies row r from row indexes[r] of src.

As a special case, if indexes[i] < 0, sets row i to zero. indexes.size() must equal this->NumRows(), and src.NumCols() must equal this->NumCols().

Definition at line 2614 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), SpliceComponent::Backprop(), NnetComputer::ExecuteCommand(), main(), DistributeComponent::Propagate(), and SpliceMaxComponent::Propagate().

2615  {
2616 #if HAVE_CUDA == 1
2617  if (CuDevice::Instantiate().Enabled()) {
2618  KALDI_ASSERT(static_cast<MatrixIndexT>(indices.Dim()) == NumRows());
2619  KALDI_ASSERT(NumCols() == src.NumCols());
2620 
2621  CuTimer tim;
2622  dim3 dimGrid, dimBlock;
2623  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2624  &dimGrid, &dimBlock);
2625  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
2626  Dim(), src.Stride());
2627  CU_SAFE_CALL(cudaGetLastError());
2628  CuDevice::Instantiate().AccuProfile(__func__, tim);
2629  } else
2630 #endif
2631  {
2632  Mat().CopyRows(src.Mat(), indices.Data());
2633  }
2634 }
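The row-gathering behaviour of this CopyRows() overload, including the negative-index special case, can be expressed as a simple reference loop. This is a host-side sketch only. Matrix and CopyRowsSketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // dense, one vector per row

// Row r of dst is taken from row indexes[r] of src; a negative index sets
// row r of dst to zero.
void CopyRowsSketch(Matrix *dst, const Matrix &src,
                    const std::vector<int> &indexes) {
  assert(indexes.size() == dst->size());  // one index per destination row
  for (std::size_t r = 0; r < dst->size(); r++) {
    std::vector<float> &row = (*dst)[r];
    const int s = indexes[r];
    if (s < 0) {
      std::fill(row.begin(), row.end(), 0.0f);
    } else {
      assert(static_cast<std::size_t>(s) < src.size());
      assert(src[s].size() == row.size());  // same number of columns
      row = src[s];
    }
  }
}
```

With src = [ 1 2; 3 4 ] and indexes = (1, -1, 0), a 3x2 destination becomes [ 3 4; 0 0; 1 2 ].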
void CopyRows ( const CuArrayBase< const Real * > &  src)

Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

src.Dim() must equal this.NumRows(), and if any src[r] is NULL then this.Row(r) will be set to zero.
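A host-only sketch of the pointer-array variant follows. The `Matrix` alias and the name `CopyRowsFromPointersRef` are assumptions made for illustration; the real function expects the pointers to refer to GPU memory when a GPU is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: row r of dst is read from the memory at src[r];
// a null pointer zeroes the row instead.
void CopyRowsFromPointersRef(const std::vector<const double *> &src,
                             Matrix *dst) {
  assert(src.size() == dst->size());  // one pointer per destination row
  for (std::size_t r = 0; r < dst->size(); ++r) {
    std::vector<double> &row = (*dst)[r];
    for (std::size_t c = 0; c < row.size(); ++c) {
      // Each non-null src[r] must point to at least row.size() values.
      row[c] = (src[r] == nullptr) ? 0.0 : src[r][c];
    }
  }
}
```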

Definition at line 2659 of file cu-matrix.cc.

References CU2DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

2659  {
2660  if (NumRows() == 0) return;
2661 #if HAVE_CUDA == 1
2662  if (CuDevice::Instantiate().Enabled()) {
2663  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2664  CuTimer tim;
2665  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2666  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2667  n_blocks(num_rows_, CU2DBLOCK));
2668  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), Dim());
2669  CU_SAFE_CALL(cudaGetLastError());
2670  CuDevice::Instantiate().AccuProfile(__func__, tim);
2671  } else
2672 #endif
2673  {
2674  Mat().CopyRows(src.Data());
2675  }
2676 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void CopyRowsFromVec ( const CuVectorBase< Real > &  v)

This function has two modes of operation.

If v.Dim() == NumRows() * NumCols(), it treats the vector as a row-by-row concatenation of a matrix and copies it to *this. If v.Dim() == NumCols(), it sets each row of *this to a copy of v. Any other size is an error (KALDI_ERR, "Wrong sized arguments").
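The two modes can be sketched on the CPU as below. `DenseMatrix` and `CopyRowsFromVecRef` are hypothetical names, the storage is assumed to be row-major with stride equal to the column count, and a C++ exception stands in for KALDI_ERR.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical row-major matrix with stride == cols (not a Kaldi type).
struct DenseMatrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;  // rows * cols elements
};

// Reference semantics only, mirroring the two documented modes.
void CopyRowsFromVecRef(const std::vector<double> &v, DenseMatrix *m) {
  assert(m->data.size() == m->rows * m->cols);
  if (v.size() == m->rows * m->cols) {
    m->data = v;  // mode 1: v is the matrix, rows concatenated
  } else if (v.size() == m->cols) {
    for (std::size_t r = 0; r < m->rows; ++r)  // mode 2: broadcast v to each row
      for (std::size_t c = 0; c < m->cols; ++c)
        m->data[r * m->cols + c] = v[c];
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```

When the matrix has a single row the two size conditions coincide, and both modes give the same result.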

Definition at line 2282 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by kaldi::CuVectorUnitTestCopyFromMat(), NnetOnlineComputer::Flush(), NnetRescaler::FormatInput(), TimeHeightConvolutionComponent::Propagate(), RepeatedAffineComponent::Propagate(), ConstantComponent::Propagate(), AffineComponent::Propagate(), FixedAffineComponent::Propagate(), BlockAffineComponent::Propagate(), ConstantFunctionComponent::Propagate(), LinearTransform::SetParams(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), kaldi::UnitTestCuMatrixCopyRowsFromVec(), TimeHeightConvolutionComponent::UnVectorize(), RepeatedAffineComponent::UnVectorize(), AffineComponent::UnVectorize(), LinearComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), and LstmNonlinearityComponent::UnVectorize().

2282  {
2283 #if HAVE_CUDA == 1
2284  if (CuDevice::Instantiate().Enabled()) {
2285  CuTimer tim;
2286  if (v.Dim() == num_rows_*num_cols_) {
2287  if (stride_ == num_cols_) {
2288  const Real* v_data = v.Data();
2289  CU_SAFE_CALL(cudaMemcpy(data_, v_data,
2290  sizeof(Real)*num_rows_*num_cols_,
2291  cudaMemcpyDeviceToDevice));
2292  } else {
2293  CU_SAFE_CALL(cudaMemcpy2D(data_, stride_ * sizeof(Real), v.Data(),
2294  num_cols_*sizeof(Real), num_cols_*sizeof(Real),
2295  num_rows_,
2296  cudaMemcpyDeviceToDevice));
2297  }
2298  } else if (v.Dim() == num_cols_) {
2299  dim3 dimGrid, dimBlock;
2300  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2301  &dimGrid, &dimBlock);
2302  cuda_copy_rows_from_vec(dimGrid, dimBlock, data_, this->Dim(), v.Data());
2303  CU_SAFE_CALL(cudaGetLastError());
2304  } else {
2305  KALDI_ERR << "Wrong sized arguments";
2306  }
2307  CuDevice::Instantiate().AccuProfile(__func__, tim);
2308  } else
2309 #endif
2310  {
2311  Mat().CopyRowsFromVec(v.Vec());
2312  }
2313 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT stride_
Definition: cu-matrix.h:714
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define KALDI_ERR
Definition: kaldi-error.h:127
::MatrixDim Dim() const
Definition: cu-matrix.h:211
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void CopyRowsFromVec ( const VectorBase< Real > &  v)

Version of CopyRowsFromVec() that takes a CPU-based vector.

Definition at line 2316 of file cu-matrix.cc.

References VectorBase< Real >::Data(), data_, VectorBase< Real >::Dim(), and KALDI_ERR.

2316  {
2317 #if HAVE_CUDA == 1
2318  if (CuDevice::Instantiate().Enabled()) {
2319  CuTimer tim;
2320  if (v.Dim() == num_rows_*num_cols_) {
2321  if (stride_ == num_cols_) {
2322  const Real* v_data = v.Data();
2323  cudaMemcpy(data_, v_data, sizeof(Real)*num_rows_*num_cols_, cudaMemcpyHostToDevice);
2324  } else {
2325  const Real *v_data = v.Data();
2326  for (MatrixIndexT r = 0; r < num_rows_; r++) {
2327  Real *row_data = RowData(r);
2328  cudaMemcpy(row_data, v_data, sizeof(Real)*num_cols_, cudaMemcpyHostToDevice);
2329  v_data += num_cols_;
2330  }
2331  }
2332  } else if (v.Dim() == num_cols_) {
2333  dim3 dimGrid, dimBlock;
2334  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2335  &dimGrid, &dimBlock);
2336  cuda_copy_rows_from_vec(dimGrid, dimBlock, this->data_, this->Dim(), v.Data());
2337  CU_SAFE_CALL(cudaGetLastError());
2338  } else {
2339  KALDI_ERR << "Wrong sized arguments";
2340  }
2341  CuDevice::Instantiate().AccuProfile(__func__, tim);
2342  } else
2343 #endif
2344  {
2345  Mat().CopyRowsFromVec(v);
2346  }
2347 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT stride_
Definition: cu-matrix.h:714
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define KALDI_ERR
Definition: kaldi-error.h:127
::MatrixDim Dim() const
Definition: cu-matrix.h:211
const Real * RowData(MatrixIndexT r) const
Get raw row pointer (const).
Definition: cu-matrix.h:667
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
template<typename OtherReal > void CopyToMat ( MatrixBase< OtherReal > *  dst,
MatrixTransposeType  trans = kNoTrans 
) const

Definition at line 415 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyToMat(), MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

Referenced by NnetComputerFromEg::Compute(), CuMatrixBase< Real >::CopyToMat(), kaldi::nnet1::MomentStatistics(), KlHmm::PropagateFnc(), kaldi::UnitInvert(), kaldi::UnitTestCholesky(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixAddVecVec(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuTanh(), kaldi::UnitTestInvert(), kaldi::UnitTestMatrix(), UnitTestMatrixRandomizer(), kaldi::UnitTestSetZeroAboveDiag(), kaldi::UnitTestSwapCu2Cu(), and kaldi::UnitTestSwapCu2M().

416  {
417 #if HAVE_CUDA == 1
418  if (CuDevice::Instantiate().Enabled()) {
419  if (trans == kTrans || sizeof(OtherReal) != sizeof(Real)) {
420  CuMatrix<OtherReal> this_trans(*this, trans);
421  this_trans.CopyToMat(dst, kNoTrans);
422  } else {
423  KALDI_ASSERT(dst->NumRows() == NumRows() && dst->NumCols() == NumCols());
424  if (num_rows_ == 0) return;
425  CuTimer tim;
426 
427  MatrixIndexT src_pitch = stride_*sizeof(Real);
428  MatrixIndexT dst_pitch = dst->Stride()*sizeof(Real);
429  MatrixIndexT width = NumCols()*sizeof(Real);
430  CU_SAFE_CALL(cudaMemcpy2D(dst->Data(), dst_pitch, this->data_, src_pitch,
431  width, this->num_rows_, cudaMemcpyDeviceToHost));
432 
433  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyToMatD2H", tim);
434  }
435  } else
436  #endif
437  {
438  dst->CopyFromMat(Mat(), trans);
439  }
440 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT stride_
Definition: cu-matrix.h:714
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void CopyToRows ( const CuArrayBase< Real * > &  dst) const

For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise).

If dst[r] is NULL, does not copy anywhere. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).
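As a rough host-only illustration of the scatter semantics, the sketch below uses a hypothetical `Matrix` alias and a hypothetical helper `CopyToRowsRef`. It assumes every non-null destination has room for a full row and that destinations do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: row r of m is written to the memory at dst[r];
// rows whose destination pointer is null are skipped.
void CopyToRowsRef(const Matrix &m, const std::vector<double *> &dst) {
  assert(dst.size() == m.size());  // one destination pointer per row
  for (std::size_t r = 0; r < m.size(); ++r) {
    if (dst[r] == nullptr) continue;  // NULL: do not copy anywhere
    for (std::size_t c = 0; c < m[r].size(); ++c)
      dst[r][c] = m[r][c];
  }
}
```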

Definition at line 2680 of file cu-matrix.cc.

References CU2DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

Referenced by DistributeComponent::Backprop(), NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixCopyToRows().

2680  {
2681  if (NumRows() == 0) return;
2682 #if HAVE_CUDA == 1
2683  if (CuDevice::Instantiate().Enabled()) {
2684  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2685 
2686  CuTimer tim;
2687  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2688  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2689  n_blocks(num_rows_, CU2DBLOCK));
2690  cuda_copy_to_rows(dimGrid, dimBlock, dst.Data(), data_, Dim());
2691  CU_SAFE_CALL(cudaGetLastError());
2692  CuDevice::Instantiate().AccuProfile(__func__, tim);
2693  } else
2694 #endif
2695  {
2696  Mat().CopyToRows(dst.Data());
2697  }
2698 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void CopyUpperToLower ( )

Definition at line 2891 of file cu-matrix.cc.

References CU2DBLOCK, data_, and KALDI_ASSERT.

Referenced by kaldi::TestCuMatrixCopyUpperToLower(), and kaldi::UnitTestCuMatrixCopyUpperToLower().

2891  {
2893  if (num_rows_ == 0) return;
2894 #if HAVE_CUDA == 1
2895  if (CuDevice::Instantiate().Enabled()) {
2896  CuTimer tim;
2897  int32 dim = this->num_rows_;
2898  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2899  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2900  n_blocks(dim, CU2DBLOCK));
2901  cuda_copy_upp_low(dimGrid, dimBlock, data_, Dim());
2902  CU_SAFE_CALL(cudaGetLastError());
2903  CuDevice::Instantiate().AccuProfile(__func__, tim);
2904  } else
2905 #endif
2906  {
2907  Mat().CopyUpperToLower();
2908  }
2909 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
const Real* Data ( ) const
inline

Return data pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 673 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuMatrixBase< Real >::AddMatSmat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), CuMatrixBase< Real >::AddSmatMat(), CuMatrixBase< Real >::AddToRows(), NormalizeComponent::Backprop(), BatchNormComponent::Backprop(), RepeatedAffineComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), ScaleAndOffsetComponent::Backprop(), ScaleAndOffsetComponent::BackpropInternal(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrix< Real >::CompObjfAndDeriv(), DistributeComponent::ComputeInputPointers(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuVectorBase< Real >::CopyElements(), CuTpMatrix< Real >::CopyFromMat(), CuCompressedMatrix< I >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuCompressedMatrix< I >::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), 
kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), kaldi::cu::EnsureNonzero(), CuMatrixBase< Real >::EqualElementMask(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMaxDeriv(), CuTpMatrix< Real >::Invert(), kaldi::cu::NormalizePerRow(), NormalizeComponent::Propagate(), BatchNormComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), RepeatedAffineComponent::Propagate(), PerElementOffsetComponent::Propagate(), ScaleAndOffsetComponent::Propagate(), ScaleAndOffsetComponent::PropagateInternal(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), RectifiedLinearComponent::RepairGradients(), CuBlockMatrix< Real >::SetCudaData(), kaldi::cu::Splice(), BatchNormComponent::StoreStats(), CuMatrixBase< Real >::SumColumnRanges(), CuMatrixBase< Real >::SymAddMat2(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), and TimeHeightConvolutionComponent::UpdateSimple().

673 { return data_; }
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
Real* Data ( )
inline

Return data pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 676 of file cu-matrix.h.

676 { return data_; }
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
void DiffGroupPnorm ( const CuMatrixBase< Real > &  in_value,
const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv,
Real  power 
)

Differentiate backward through the GroupPnorm function.

It is a combination of GroupPnormDeriv and MulRowsGroupMat.

Definition at line 828 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Backprop(), and kaldi::UnitTestCuMatrixDiffGroupPnorm().

831  {
832  KALDI_ASSERT(out_value.NumCols() > 0);
833  KALDI_ASSERT(out_value.NumCols() == out_deriv.NumCols());
834  int group_size = this->NumCols() / out_value.NumCols();
835  KALDI_ASSERT(this->NumCols() == out_value.NumCols() * group_size);
836 #if HAVE_CUDA == 1
837  if (CuDevice::Instantiate().Enabled()) {
838  CuTimer tim;
839  const int kWarpSize = 32;
840  dim3 dimBlock(kWarpSize, CU1DBLOCK / kWarpSize);
841  dim3 dimGrid(n_blocks(NumCols(), dimBlock.x),
842  n_blocks(NumRows(), dimBlock.y));
843  if (dimGrid.x * dimGrid.y > 1024) {
844  dimGrid.y = std::max(1024 / dimGrid.x, unsigned(1));
845  }
846  cuda_diff_group_pnorm(dimGrid, dimBlock, this->data_, in_value.Data(),
847  out_value.Data(), out_deriv.Data(), Dim(),
848  in_value.Stride(), out_value.Stride(),
849  out_deriv.Stride(), group_size, power);
850  CU_SAFE_CALL(cudaGetLastError());
851  CuDevice::Instantiate().AccuProfile(__func__, tim);
852  } else
853 #endif
854  {
855  Mat().GroupPnormDeriv(in_value.Mat(), out_value.Mat(), power);
856  MulRowsGroupMat(out_deriv);
857  }
858 }
void MulRowsGroupMat(const CuMatrixBase< Real > &src)
divide each row into src.NumCols() groups, and then scale i'th row's jth group of elements by src[i...
Definition: cu-matrix.cc:803
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void DiffLogSoftmaxPerRow ( const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv 
)

Differentiate backward through the log softmax function.

Here, "out_value" is the log softmax output. For each row i, computes *this(i) = out_deriv(i) - sum(out_deriv(i)) .* exp(out_value(i)), where xxxx(i) denotes row i of matrix xxxx as a row vector. Supports in-place operation (this == &out_deriv); "this" must not be the same matrix as out_value.
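The per-row formula can be checked with a small CPU sketch. The `Matrix` alias and the name `DiffLogSoftmaxPerRowRef` are assumptions for illustration, and the function returns a new matrix rather than writing into *this.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: for each row, d = e - exp(y) * sum(e),
// where y is the log-softmax output and e the derivative at the output.
Matrix DiffLogSoftmaxPerRowRef(const Matrix &out_value,
                               const Matrix &out_deriv) {
  assert(out_value.size() == out_deriv.size());
  Matrix in_deriv(out_value.size());
  for (std::size_t r = 0; r < out_value.size(); ++r) {
    const std::vector<double> &y = out_value[r];
    const std::vector<double> &e = out_deriv[r];
    assert(y.size() == e.size());
    double e_sum = 0.0;
    for (double x : e) e_sum += x;  // Sum(e) over the row
    in_deriv[r].resize(y.size());
    for (std::size_t c = 0; c < y.size(); ++c)
      in_deriv[r][c] = e[c] - std::exp(y[c]) * e_sum;
  }
  return in_deriv;
}
```

For a row with out_value = (log 0.5, log 0.5) and out_deriv = (1, 0), the result is approximately (0.5, -0.5).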

Definition at line 1887 of file cu-matrix.cc.

References CuVectorBase< Real >::AddColSumMat(), CuMatrixBase< Real >::AddMat(), CuMatrixBase< Real >::ApplyExp(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kUndefined, CuMatrixBase< Real >::MulRowsVec(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Scale(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Backprop(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuDiffLogSoftmax().

1888  {
1889 
1890  KALDI_ASSERT(SameDim(out_value, out_deriv) && SameDim(out_value, *this) &&
1891  this != &out_value);
1892 
1893 #if HAVE_CUDA == 1
1894  if (CuDevice::Instantiate().Enabled()) {
1895  CuTimer tim;
1896 
1897  // CUDA thread layout: one thread block per matrix-row.
1898  dim3 dimBlock(CU1DBLOCK);
1899  dim3 dimGrid(num_rows_);
1900  cuda_diff_log_softmax(dimGrid, dimBlock, this->Dim(), out_value.Data(),
1901  out_value.Stride(), out_deriv.Data(),
1902  out_deriv.Stride(), data_);
1903  CU_SAFE_CALL(cudaGetLastError());
1904 
1905  CuDevice::Instantiate().AccuProfile(__func__, tim);
1906  } else
1907 #endif
1908  {
1909  if (this == &out_deriv) {
1910  // the code below doesn't work for in-place, so make a copy and recurse.
1911  CuMatrix<Real> temp(NumRows(), NumCols(), kUndefined);
1912  temp.DiffLogSoftmaxPerRow(out_value, out_deriv);
1913  CopyFromMat(temp);
1914  return;
1915  }
1916  /*
1917  Let the output be y, then
1918  y_i = x_i - log(sum_i exp(x_i))
1919  where x_i is the input to the component. The Jacobian matrix of this
1920  function is
1921  J = I - 1 exp(y^T)
1922  where 1 is a vector of ones. Let the derivative vector at the output be e,
1923  and at the input be d, then we have
1924  d = e - exp(y) Sum(e)
1925  d_i = e_i - exp(y_i) Sum(e)
1926  */
1927  const CuMatrixBase<Real> &Y(out_value), &E(out_deriv);
1928  CuMatrixBase<Real> &D(*this);
1929 
1930  D.CopyFromMat(Y);
1931  D.ApplyExp(); // exp(y)
1932  CuVector<Real> E_sum(D.NumRows()); // Initializes to zero
1933  E_sum.AddColSumMat(1.0, E); // Sum(e)
1934  D.MulRowsVec(E_sum); // exp(y) Sum(e)
1935  D.Scale(-1.0); // - exp(y) Sum(e)
1936  D.AddMat(1.0, E, kNoTrans); // e - exp(y_i) Sum(e)
1937  }
1938 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
void CopyFromMat(const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:339
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void DiffParametricRelu ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff,
const CuVectorBase< Real > &  alpha,
const CuVectorBase< Real > &  beta 
)

Differentiate backward through the parametric relu function.

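Here, "value" is the ReLU input. Does, element-by-element, *this = diff * (value > 0 ? alpha : beta), where alpha and beta are indexed by column. Note that the CPU code path in the listing below tests value >= 0.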
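A minimal CPU sketch of the rule is shown below. The `Matrix` alias and the name `DiffParametricReluRef` are hypothetical, and the comparison follows the CPU branch of the source listing (value >= 0), which differs from the prose at exactly zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: scale each derivative by the per-column slope,
// alpha[c] for non-negative inputs and beta[c] for negative inputs.
Matrix DiffParametricReluRef(const Matrix &value, const Matrix &diff,
                             const std::vector<double> &alpha,
                             const std::vector<double> &beta) {
  assert(value.size() == diff.size());
  Matrix out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    assert(value[r].size() == alpha.size() && alpha.size() == beta.size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c)
      out[r][c] = diff[r][c] * (value[r][c] >= 0.0 ? alpha[c] : beta[c]);
  }
  return out;
}
```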

Definition at line 1488 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::BackpropagateFnc().

1492  {
1493 #if HAVE_CUDA == 1
1494  if (CuDevice::Instantiate().Enabled()) {
1495  CuTimer tim;
1496 
1497  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1498  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK), n_blocks(num_rows_, CU2DBLOCK));
1499 
1500  cuda_diff_parametric_relu(dimGrid, dimBlock, data_, diff.data_, value.data_,
1501  Dim(), diff.Stride(), value.Stride(),
1502  alpha.data_, beta.data_);
1503  CU_SAFE_CALL(cudaGetLastError());
1504 
1505  CuDevice::Instantiate().AccuProfile(__func__, tim);
1506  } else
1507 #endif
1508  {
1509  // Do it on CPU,
1510  for (MatrixIndexT r = 0; r < NumRows(); r++) {
1511  for (MatrixIndexT c = 0; c < NumCols(); c++) {
1512  Real value_elem = value.Mat()(r,c);
1513  this->Mat()(r,c) = diff.Mat()(r,c) *
1514  (value_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
1515  }
1516  }
1517  }
1518 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:211
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void DiffSigmoid ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the sigmoid function.

Here, "value" is the sigmoid output. Does, element-by-element, *this = diff * value * (1 - value).
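The elementwise rule can be sketched on the CPU as follows. The `Matrix` alias and the name `DiffSigmoidRef` are assumptions for illustration, not Kaldi API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: out = diff * value * (1 - value), elementwise,
// where value holds sigmoid outputs.
Matrix DiffSigmoidRef(const Matrix &value, const Matrix &diff) {
  assert(value.size() == diff.size());
  Matrix out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c)
      out[r][c] = diff[r][c] * value[r][c] * (1.0 - value[r][c]);
  }
  return out;
}
```

The derivative is expressed in terms of the output because the sigmoid satisfies s'(x) = s(x) (1 - s(x)), so the input is not needed during backpropagation.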

Definition at line 1748 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SigmoidComponent::Backprop(), Sigmoid::BackpropagateFnc(), and kaldi::UnitTestCuDiffSigmoid().

1749  {
1750  KALDI_ASSERT(SameDim(*this, value) && SameDim(*this, diff));
1751 #if HAVE_CUDA == 1
1752  if (CuDevice::Instantiate().Enabled()) {
1753  CuTimer tim;
1754  dim3 dimGrid, dimBlock;
1755  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1756  &dimGrid, &dimBlock);
1757  cuda_diff_sigmoid(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1758  CU_SAFE_CALL(cudaGetLastError());
1759 
1760  CuDevice::Instantiate().AccuProfile(__func__, tim);
1761  } else
1762 #endif
1763  {
1764  Mat().DiffSigmoid(value.Mat(), diff.Mat());
1765  }
1766 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void DiffSoftmaxPerRow ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the softmax function.

Here, "value" is the softmax output. For each row i, computes *this(i) = diff(i) * diag(value(i)) - diff(i) * (value(i)^T * value(i)), where xxxx(i) denotes row i of matrix xxxx as a row vector, and '*' and '-' are matrix operations. Supports in-place operation (this == &diff).
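Because diff(i) * diag(value(i)) is the elementwise product and diff(i) * (value(i)^T * value(i)) equals the dot product (diff(i) . value(i)) times value(i), the row formula reduces to d = p .* e - (p . e) p. The sketch below implements that reduced form; the `Matrix` alias and the name `DiffSoftmaxPerRowRef` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: for each row, d = p .* e - (p . e) * p,
// where p is the softmax output and e the derivative at the output.
Matrix DiffSoftmaxPerRowRef(const Matrix &value, const Matrix &diff) {
  assert(value.size() == diff.size());
  Matrix out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    const std::vector<double> &p = value[r];
    const std::vector<double> &e = diff[r];
    assert(p.size() == e.size());
    double pe = 0.0;
    for (std::size_t c = 0; c < p.size(); ++c) pe += p[c] * e[c];  // p . e
    out[r].resize(p.size());
    for (std::size_t c = 0; c < p.size(); ++c)
      out[r][c] = p[c] * e[c] - pe * p[c];
  }
  return out;
}
```

For value = (0.5, 0.5) and diff = (1, 0), the result is (0.25, -0.25).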

Definition at line 1852 of file cu-matrix.cc.

References CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::nnet3::attention::AttentionBackward(), SoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffSoftmax().

1853  {
1854 
1855  KALDI_ASSERT(SameDim(value, diff) && SameDim(value, *this) &&
1856  this != &value);
1857 
1858 #if HAVE_CUDA == 1
1859  if (CuDevice::Instantiate().Enabled()) {
1860  CuTimer tim;
1861 
1862  // CUDA thread layout: one thread block per matrix-row.
1863  dim3 dimBlock(CU1DBLOCK);
1864  dim3 dimGrid(num_rows_);
1865  cuda_diff_softmax(dimGrid, dimBlock, data_, this->Dim(), value.Data(),
1866  value.Stride(), diff.Data(), diff.Stride());
1867  CU_SAFE_CALL(cudaGetLastError());
1868 
1869  CuDevice::Instantiate().AccuProfile(__func__, tim);
1870  } else
1871 #endif
1872  {
1873  const CuMatrixBase<Real> &P(value), &E(diff);
1874  CuMatrixBase<Real> &D(*this);
1875 
1876  CuVector<Real> pe_vec(D.NumRows()); // For each row i, the dot product (p_t . e_t).
1877  pe_vec.AddDiagMatMat(1.0, P, kNoTrans, E, kTrans, 0.0);
1878 
1879  D.CopyFromMat(E);
1880  D.MulElements(P);
1881  // At this point, D = P .* E (in matlab notation)
1882  D.AddDiagVecMat(-1.0, pe_vec, P, kNoTrans, 1.0); // does D -= diag(pe_vec) * P.
1883  }
1884 }
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void DiffTanh ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the tanh function.

Here, "value" is the tanh output. Does, element-by-element, *this = diff * (1 - value^2).
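A CPU sketch of the elementwise rule follows, using a hypothetical `Matrix` alias and a hypothetical helper `DiffTanhRef` (neither is Kaldi API).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: out = diff * (1 - value^2), elementwise,
// where value holds tanh outputs.
Matrix DiffTanhRef(const Matrix &value, const Matrix &diff) {
  assert(value.size() == diff.size());
  Matrix out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c)
      out[r][c] = diff[r][c] * (1.0 - value[r][c] * value[r][c]);
  }
  return out;
}
```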

Definition at line 1793 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Mat(), and CuMatrixBase< Real >::Stride().

Referenced by TanhComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), Tanh::BackpropagateFnc(), and kaldi::UnitTestCuDiffTanh().

1794  {
1795 #if HAVE_CUDA == 1
1796  if (CuDevice::Instantiate().Enabled()) {
1797  CuTimer tim;
1798  dim3 dimGrid, dimBlock;
1799  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1800  &dimGrid, &dimBlock);
1801  cuda_diff_tanh(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1802  CU_SAFE_CALL(cudaGetLastError());
1803 
1804  CuDevice::Instantiate().AccuProfile(__func__, tim);
1805  } else
1806 #endif
1807  {
1808  Mat().DiffTanh(value.Mat(), diff.Mat());
1809  }
1810 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
void DiffXent ( const CuArrayBase< int32 > &  tgt,
CuVector< Real > *  log_post_tgt 
)

Differentiate the block [softmax+cross-entropy] : dE/da = posterior_mat - target_mat, 'E' is error function, 'a' is activation on softmax input.

Interface:
tgt ... index vector, encodes the matrix of targets
net_out_or_diff ... before invocation net output, after diff dE/da
log_post_tgt ... per-frame statistics for cross-entropy computations: log(sum_row(posterior_mat .* target_mat))
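The in-place update can be sketched on the CPU as below, following the CPU branch of the source listing. The `Matrix` alias and the name `DiffXentRef` are hypothetical, and the log-posteriors are returned rather than written through an output pointer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only. On entry, net_out_or_diff holds posteriors;
// on exit it holds posterior - target, with a one-hot target per row.
// Returns the log-posterior of the target class for each row.
std::vector<double> DiffXentRef(const std::vector<int> &tgt,
                                Matrix *net_out_or_diff) {
  assert(tgt.size() == net_out_or_diff->size());  // one target per row
  std::vector<double> log_post_tgt(tgt.size());
  for (std::size_t r = 0; r < tgt.size(); ++r) {
    double &value = (*net_out_or_diff)[r][static_cast<std::size_t>(tgt[r])];
    log_post_tgt[r] = std::log(value);
    value -= 1.0;  // subtracting the one-hot target touches only this column
  }
  return log_post_tgt;
}
```

Only the target column of each row changes, since the one-hot target is zero everywhere else.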

Definition at line 1941 of file cu-matrix.cc.

References CU2DBLOCK, CuArrayBase< T >::Data(), data_, CuVectorBase< Real >::data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, kaldi::Log(), CuVector< Real >::Resize(), and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuDiffXent().

1942  {
1943 
1944  KALDI_ASSERT(tgt.Dim() == num_rows_);
1945  log_post_tgt->Resize(tgt.Dim());
1946 
1947 #if HAVE_CUDA == 1
1948  if (CuDevice::Instantiate().Enabled()) {
1949  CuTimer tim;
1950  dim3 dimBlock(1, CU2DBLOCK*8);
1951  dim3 dimGrid(1, n_blocks(tgt.Dim(), CU2DBLOCK*8));
1952  cuda_diff_xent(dimGrid, dimBlock, tgt.Data(), data_,
1953  log_post_tgt->data_, Dim());
1954 
1955  CuDevice::Instantiate().AccuProfile(__func__, tim);
1956  } else
1957 #endif
1958  {
1959  MatrixIndexT num_rows = num_rows_;
1960  for(int32 r = 0; r < num_rows; r++) {
1961  int32 col_tgt = tgt.Data()[r];
1962  Real &value = Mat()(r, col_tgt);
1963  log_post_tgt->Vec()(r) = Log(value);
1964  value -= 1.0;
1965  }
1966  }
1967 }
const T * Data() const
Get raw pointer.
Definition: cu-array.h:52
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
int32 MatrixIndexT
Definition: matrix-common.h:98
double Log(double x)
Definition: kaldi-math.h:100
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
MatrixIndexT Dim() const
Return the vector dimension.
Definition: cu-array.h:49
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void DivElements ( const CuMatrixBase< Real > &  A)

Divides this matrix elementwise by A: *this = *this ./ A. A must have the same dimensions as *this.
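A short CPU sketch of the operation, assuming a hypothetical `Matrix` alias and a hypothetical helper `DivElementsRef` (not Kaldi API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix: one inner vector per row (not a Kaldi type).
using Matrix = std::vector<std::vector<double>>;

// Reference semantics only: each element of m is divided in place by the
// element of a at the same position.
void DivElementsRef(const Matrix &a, Matrix *m) {
  assert(a.size() == m->size());  // same number of rows
  for (std::size_t r = 0; r < m->size(); ++r) {
    assert(a[r].size() == (*m)[r].size());  // same number of columns
    for (std::size_t c = 0; c < (*m)[r].size(); ++c)
      (*m)[r][c] /= a[r][c];
  }
}
```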

Definition at line 678 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by CuVectorBase< Real >::DivElements(), kaldi::UnitTestCuMatrixDivElements(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

678  {
679  #if HAVE_CUDA == 1
680  if (CuDevice::Instantiate().Enabled()) {
681  CuTimer tim;
682 
683  KALDI_ASSERT(num_cols_ == A.NumCols());
684  KALDI_ASSERT(num_rows_ == A.NumRows());
685 
686  dim3 dimGrid, dimBlock;
687  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
688  &dimGrid, &dimBlock);
689 
690  cuda_div_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
691  CU_SAFE_CALL(cudaGetLastError());
692 
693  CuDevice::Instantiate().AccuProfile(__func__, tim);
694  } else
695  #endif
696  {
697  Mat().DivElements(A.Mat());
698  }
699 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void DivRowsVec ( const CuVectorBase< Real > &  div)

Divide the i'th row by div[i].
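
A minimal sketch of the row-wise division, assuming plain `std::vector` storage rather than Kaldi types. It follows the CPU branch shown in the listing (invert a copy of the vector, then scale the rows); `Matrix` and `DivRowsVecSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// Divides every element of row r by div[r].
void DivRowsVecSketch(const std::vector<double>& div, Matrix* m) {
  assert(div.size() == m->size());
  std::vector<double> inv(div);        // copy, so the caller's vector is untouched
  for (double& x : inv) x = 1.0 / x;   // elementwise inversion
  for (std::size_t r = 0; r < m->size(); ++r)
    for (double& x : (*m)[r]) x *= inv[r];  // scale row r by 1 / div[r]
}
```
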

Definition at line 886 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), VectorBase< Real >::InvertElements(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by StatisticsPoolingComponent::Backprop(), StatisticsPoolingComponent::Propagate(), kaldi::TestCuMatrixDivRowsVec(), and kaldi::UnitTestCuMatrixDivRowsVec().

886  {
887 #if HAVE_CUDA == 1
888  if (CuDevice::Instantiate().Enabled()) {
889  CuTimer tim;
890 
891  KALDI_ASSERT(div.Dim() == NumRows());
892 
893  dim3 dimGrid, dimBlock;
894  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
895  &dimGrid, &dimBlock);
896  // For large matrices we do more work per thread by limiting
897  // the grid size to reduce the block launching overhead.
898  if (dimGrid.x * dimGrid.y > 1024) {
899  dimGrid.x = 1024 / dimGrid.y;
900  if (dimGrid.x == 0) {
901  dimGrid.x = 1;
902  }
903  }
904  cuda_div_rows_vec(dimGrid, dimBlock, data_, div.data_, Dim());
905  CU_SAFE_CALL(cudaGetLastError());
906 
907  CuDevice::Instantiate().AccuProfile(__func__, tim);
908  } else
909 #endif
910  {
911  Vector<Real> temp(div.Vec()); // will copy.
912  temp.InvertElements();
913  Mat().MulRowsVec(temp);
914  }
915 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void EqualElementMask ( const CuMatrixBase< Real > &  mat,
CuMatrix< Real > *  mask 
) const

Definition at line 3329 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kSetZero, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrix< Real >::Resize(), and CuMatrixBase< Real >::Stride().

Referenced by MaxpoolingComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), and MaxPooling2DComponent::BackpropagateFnc().

3329  {
3330  // Check the inputs:
3331  KALDI_ASSERT(mat.NumRows() == NumRows() && mat.NumCols() == NumCols());
3332  KALDI_ASSERT(mask != NULL);
3333  // Resizes the output matrix:
3334  mask->Resize(NumRows(), NumCols(), kSetZero);
3335 
3336 #if HAVE_CUDA == 1
3337  if (CuDevice::Instantiate().Enabled()) {
3338  CuTimer tim;
3339  dim3 dimGrid, dimBlock;
3340  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
3341  &dimGrid, &dimBlock);
3342  cuda_equal_element_mask(dimGrid, dimBlock, this->data_, mat.Data(),
3343  mask->Data(), this->Dim(), mat.Stride(),
3344  mask->Stride());
3345  CU_SAFE_CALL(cudaGetLastError());
3346 
3347  CuDevice::Instantiate().AccuProfile(__func__, tim);
3348  } else
3349 #endif
3350  {
3351  for (int32 r = 0; r < NumRows(); r++) {
3352  for (int32 c = 0; c < NumCols(); c++) {
3353  (*mask)(r,c) = ((*this)(r,c) == mat(r,c) ? 1.0 : 0.0);
3354  }
3355  }
3356  }
3357 }
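
The member has no description on this page, so the sketch below only restates what the CPU branch of the listing does: the mask is 1.0 where the two matrices hold exactly equal values and 0.0 elsewhere. `Matrix` and `EqualElementMaskSketch` are hypothetical names using plain `std::vector` storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// Returns a matrix of the same shape holding 1.0 where a and b are equal.
Matrix EqualElementMaskSketch(const Matrix& a, const Matrix& b) {
  assert(a.size() == b.size());
  Matrix mask(a.size());
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == b[r].size());
    mask[r].resize(a[r].size(), 0.0);
    for (std::size_t c = 0; c < a[r].size(); ++c)
      mask[r][c] = (a[r][c] == b[r][c]) ? 1.0 : 0.0;  // exact comparison
  }
  return mask;
}
```
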
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void FindRowMaxId ( CuArray< int32 > *  id) const

Find the id of the maximal element for each row (resizes the 'id' array to the appropriate size).
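
As a rough illustration, a per-row argmax along the lines of the CPU branch in the listing might look as follows. `Matrix` and `FindRowMaxIdSketch` are hypothetical names; the strict `<` comparison means the first maximum wins on ties, and a row with no columns keeps the initial id of -1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// Returns, for each row, the column index of its largest element.
std::vector<int> FindRowMaxIdSketch(const Matrix& m) {
  std::vector<int> id(m.size(), -1);
  for (std::size_t r = 0; r < m.size(); ++r) {
    double max = -1e21;  // same starting value as the listing
    for (std::size_t c = 0; c < m[r].size(); ++c) {
      if (max < m[r][c]) {
        max = m[r][c];
        id[r] = static_cast<int>(c);
      }
    }
  }
  return id;
}
```
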

Definition at line 1813 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, CuArrayBase< T >::Data(), and data_.

Referenced by kaldi::nnet3::ComputeAccuracy(), NnetUpdater::ComputeTotAccuracy(), Xent::Eval(), kaldi::TestCuFindRowMaxId(), and kaldi::UnitTestCuFindRowMaxId().

1813  {
1814 #if HAVE_CUDA == 1
1815  if (CuDevice::Instantiate().Enabled()) {
1816  CuTimer tim;
1817  id->Resize(num_rows_);
1818  MatrixDim d = Dim();
1819 
1820  // CUDA thread layout: one thread block per matrix-row.
1821  dim3 dimBlock(CU1DBLOCK);
1822  dim3 dimGrid(num_rows_);
1823  cuda_find_row_max_id(dimGrid, dimBlock, data_, NULL, id->Data(), d);
1824  CU_SAFE_CALL(cudaGetLastError());
1825 
1826  // now we have the indices!
1827  CuDevice::Instantiate().AccuProfile(__func__, tim);
1828  } else
1829 #endif
1830  {
1831  // allocate index buffer
1832  id->Resize(num_rows_);
1833  id->Set(-1);
1834  // find maxima
1835  MatrixIndexT num_rows = num_rows_, num_cols = num_cols_;
1836  for (MatrixIndexT r = 0; r < num_rows; r++) {
1837  Real max = -1e21;
1838  int32 max_id = -1;
1839  const Real *row_data = Mat().RowData(r);
1840  for (MatrixIndexT c = 0; c < num_cols; c++) {
1841  if (max < row_data[c]) {
1842  max = row_data[c];
1843  max_id = c;
1844  }
1845  }
1846  id->Data()[r] = max_id;
1847  }
1848  }
1849 }
const T * Data() const
Get raw pointer.
Definition: cu-array.h:52
Structure containing size of the matrix plus stride.
Definition: cu-matrixdim.h:52
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
int32 MatrixIndexT
Definition: matrix-common.h:98
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:211
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
Real FrobeniusNorm ( ) const
inline

Definition at line 216 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::ApproxEqual(), kaldi::nnet3::ConstrainOrthonormalInternal(), and kaldi::UnitTestCuSparseMatrixFrobeniusNorm().

216 { return sqrt(TraceMatMat(*this, *this, kTrans)); }
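
The one-line definition relies on the identity trace(A * A^T) = sum of all squared elements of A. A standalone sketch of that identity, with hypothetical names and plain `std::vector` storage:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// trace(A * B^T) equals the sum over all (i, j) of A(i, j) * B(i, j).
double TraceMatMatTransSketch(const Matrix& a, const Matrix& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == b[r].size());
    for (std::size_t c = 0; c < a[r].size(); ++c) sum += a[r][c] * b[r][c];
  }
  return sum;
}

// Frobenius norm as the square root of trace(A * A^T).
double FrobeniusNormSketch(const Matrix& a) {
  return std::sqrt(TraceMatMatTransSketch(a, a));
}
```
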
friend Real TraceMatMat(const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
Definition: cu-matrix.cc:2128
void GroupMax ( const CuMatrixBase< Real > &  src)

Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer.

[Note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]
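
A minimal sketch of the grouping formula, assuming plain `std::vector` storage; `Matrix`, `GroupMaxSketch` and the `num_out_cols` parameter are hypothetical and stand in for the dimensions of *this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// out[r][i] = max of src[r][i*G .. (i+1)*G - 1], with G = src cols / out cols.
Matrix GroupMaxSketch(const Matrix& src, std::size_t num_out_cols) {
  assert(num_out_cols > 0);
  Matrix out(src.size(), std::vector<double>(num_out_cols));
  for (std::size_t r = 0; r < src.size(); ++r) {
    const std::size_t G = src[r].size() / num_out_cols;  // group size
    assert(G > 0 && src[r].size() == G * num_out_cols);
    for (std::size_t i = 0; i < num_out_cols; ++i) {
      double best = src[r][i * G];
      for (std::size_t j = i * G + 1; j < (i + 1) * G; ++j)
        best = std::max(best, src[r][j]);
      out[r][i] = best;
    }
  }
  return out;
}
```
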

Definition at line 1604 of file cu-matrix.cc.

References CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Propagate(), kaldi::TestCuMatrixGroupMax(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), and kaldi::UnitTestCuMatrixGroupMax().

1604  {
1605  int group_size = src.NumCols() / this->NumCols();
1606  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1607  this->NumRows() == src.NumRows());
1608 #if HAVE_CUDA == 1
1609  if (CuDevice::Instantiate().Enabled()) {
1610  CuTimer tim;
1611  // One thread block per row.
1612  // Use 2D block for small group size to simplify the calculation.
1613  // Each group is reduced by threads_per_group threads.
1614  // threads_per_group should be a power of 2 for fast tree reduction.
1615  // group size: 1 2 3 4 5 6 7 .. 12 13 .. 24 25 .. 48 ...
1616  // threads_per_group: 1 1 1 2 2 2 4 .. 4 8 .. 8 16 .. 16 ...
1617  int threads_per_group = CU1DBLOCK;
1618  while (threads_per_group * 3 / 2 >= group_size) {
1619  threads_per_group >>= 1;
1620  }
1621  if (group_size == 1) {
1622  threads_per_group = 1;
1623  }
1624  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1625  dim3 dimGrid(NumRows());
1626  cuda_group_max(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1627  src.Stride(), group_size);
1628  CU_SAFE_CALL(cudaGetLastError());
1629  CuDevice::Instantiate().AccuProfile(__func__, tim);
1630  } else
1631 #endif
1632  {
1633  Mat().GroupMax(src.Mat());
1634  }
1635 }
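
The choice of threads_per_group in the listing above can be checked in isolation. The sketch below reproduces the loop with the block size as a parameter; the default of 256 is an assumption about the value of CU1DBLOCK, and `ThreadsPerGroupSketch` is a hypothetical name.

```cpp
#include <cassert>

// Largest power of two t (starting from block_size) such that
// t * 3 / 2 < group_size, with group_size == 1 mapped to one thread.
int ThreadsPerGroupSketch(int group_size, int block_size = 256) {
  assert(group_size >= 1);  // the loop would not terminate for group_size <= 0
  int threads_per_group = block_size;
  while (threads_per_group * 3 / 2 >= group_size) {
    threads_per_group >>= 1;
  }
  if (group_size == 1) {
    threads_per_group = 1;
  }
  return threads_per_group;
}
```

For the group sizes listed in the comment table of the listing (1 to 48), this yields the same thread counts as that table.
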
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void GroupMaxDeriv ( const CuMatrixBase< Real > &  input,
const CuMatrixBase< Real > &  output 
)

Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. the "src" variable), and "output" is the result of the computation (i.e. the "this" of that function call); *this must have the same dimension as "input". Each element of *this will be set to 1 if the corresponding input equals the output of the group, and 0 otherwise. This equals the function derivative where it is defined (it's not defined where multiple inputs in the group are equal to the output).
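
A sketch of the rule described above, under the assumption of plain `std::vector` storage; `Matrix` and `GroupMaxDerivSketch` are hypothetical names. Note that ties inside a group mark every tied element with 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// deriv[r][j] = 1 if input[r][j] equals the max of its group, else 0.
Matrix GroupMaxDerivSketch(const Matrix& input, const Matrix& output) {
  assert(input.size() == output.size());
  Matrix deriv(input.size());
  for (std::size_t r = 0; r < input.size(); ++r) {
    assert(!output[r].empty());
    const std::size_t G = input[r].size() / output[r].size();  // group size
    assert(input[r].size() == G * output[r].size());
    deriv[r].resize(input[r].size());
    for (std::size_t j = 0; j < input[r].size(); ++j)
      deriv[r][j] = (input[r][j] == output[r][j / G]) ? 1.0 : 0.0;
  }
  return deriv;
}
```
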

Definition at line 861 of file cu-matrix.cc.

References CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), kaldi::TestCuMatrixGroupMaxDeriv(), and kaldi::UnitTestCuMatrixGroupMaxDeriv().

862  {
863  KALDI_ASSERT(src2.NumCols() > 0);
864  int group_size = this->NumCols() / src2.NumCols();
865  KALDI_ASSERT(this->NumCols() == src2.NumCols() * group_size);
866 #if HAVE_CUDA == 1
867  if (CuDevice::Instantiate().Enabled()) {
868  CuTimer tim;
869  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
870  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
871  n_blocks(NumRows(), CU2DBLOCK));
872  cuda_calc_group_max_deriv(dimGrid, dimBlock, this->data_, src1.Data(),
873  src2.Data(), Dim(), src1.Stride(), src2.Stride(),
874  group_size);
875  CU_SAFE_CALL(cudaGetLastError());
876 
877  CuDevice::Instantiate().AccuProfile(__func__, tim);
878  } else
879 #endif
880  {
881  Mat().GroupMaxDeriv(src1.Mat(), src2.Mat());
882  }
883 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void GroupPnorm ( const CuMatrixBase< Real > &  src,
Real  pow 
)

Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j ^ (power)) ^ (1 / power), where G = x.NumCols() / y.NumCols() must be an integer.

[Note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]
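
A minimal sketch of the group p-norm for finite, positive powers only (the special cases 0 and infinity that the listing dispatches on are not covered). Taking the absolute value of each input is an assumption made here so that the result is defined for negative inputs; `Matrix`, `GroupPnormSketch` and `num_out_cols` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// out[r][i] = (sum over group i of |src[r][j]|^power)^(1 / power).
Matrix GroupPnormSketch(const Matrix& src, std::size_t num_out_cols,
                        double power) {
  assert(num_out_cols > 0 && power > 0.0);
  Matrix out(src.size(), std::vector<double>(num_out_cols, 0.0));
  for (std::size_t r = 0; r < src.size(); ++r) {
    const std::size_t G = src[r].size() / num_out_cols;  // group size
    assert(src[r].size() == G * num_out_cols);
    for (std::size_t i = 0; i < num_out_cols; ++i) {
      double sum = 0.0;
      for (std::size_t j = i * G; j < (i + 1) * G; ++j)
        sum += std::pow(std::fabs(src[r][j]), power);  // assumption: use |x|
      out[r][i] = std::pow(sum, 1.0 / power);
    }
  }
  return out;
}
```
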

Definition at line 1563 of file cu-matrix.cc.

References CU1DBLOCK, CU2DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Propagate(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixGroupPnorm(), and kaldi::UnitTestCuMatrixGroupPnorm().

1563  {
1564  int group_size = src.NumCols() / this->NumCols();
1565  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1566  this->NumRows() == src.NumRows());
1567 #if HAVE_CUDA == 1
1568  if (CuDevice::Instantiate().Enabled()) {
1569  CuTimer tim;
1570  if (power == Real(0) || power == Real(1) || power == Real(2)
1571  || power == std::numeric_limits<Real>::infinity()) {
1572  // One thread block per row.
1573  // Use 2D block for small group size to simplify the calculation
1574  // Each group is reduced by threads_per_group threads.
1575  // threads_per_group should be a power of 2 for fast tree reduction.
1576  int threads_per_group = CU1DBLOCK;
1577  while (threads_per_group * 3 / 2 >= group_size) {
1578  threads_per_group >>= 1;
1579  }
1580  if (group_size == 1) {
1581  threads_per_group = 1;
1582  }
1583  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1584  dim3 dimGrid(NumRows());
1585  cuda_group_spec_pnorm(dimGrid, dimBlock, this->data_, src.data_,
1586  this->Dim(), src.Stride(), group_size, power);
1587  } else {
1588  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1589  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
1590  n_blocks(NumRows(), CU2DBLOCK));
1591  cuda_group_pnorm(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1592  src.Stride(), group_size, power);
1593  }
1594  CU_SAFE_CALL(cudaGetLastError());
1595  CuDevice::Instantiate().AccuProfile(__func__, tim);
1596  } else
1597 #endif
1598  {
1599  Mat().GroupPnorm(src.Mat(), power);
1600  }
1601 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void Heaviside ( const CuMatrixBase< Real > &  src)

Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0). [Note: in general, there are different ways to deal with the situation when x==0.]
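
The definition above, written out as a standalone sketch with hypothetical names and plain `std::vector` storage; with this convention an input of exactly 0 maps to 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// Returns a matrix with 1.0 where src is strictly positive, else 0.0.
Matrix HeavisideSketch(const Matrix& src) {
  Matrix out(src.size());
  for (std::size_t r = 0; r < src.size(); ++r) {
    out[r].resize(src[r].size());
    for (std::size_t c = 0; c < src[r].size(); ++c)
      out[r][c] = (src[r][c] > 0.0) ? 1.0 : 0.0;
  }
  return out;
}
```
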

Definition at line 2463 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by RectifiedLinearComponent::Backprop(), CuRand< Real >::BinarizeProbs(), kaldi::CuCompressedMatrixTestSign(), Dropout::PropagateFnc(), RectifiedLinearComponent::StoreStats(), and kaldi::UnitTestCuMatrixHeaviside().

2463  {
2464  KALDI_ASSERT(SameDim(*this, src));
2465 #if HAVE_CUDA == 1
2466  if (CuDevice::Instantiate().Enabled()) {
2467  CuTimer tim;
2468  dim3 dimGrid, dimBlock;
2469  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2470  &dimGrid, &dimBlock);
2471  cuda_heaviside(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
2472  src.Stride());
2473  CU_SAFE_CALL(cudaGetLastError());
2474 
2475  CuDevice::Instantiate().AccuProfile(__func__, tim);
2476  } else
2477  #endif
2478  {
2479  Mat().Heaviside(src.Mat());
2480  }
2481 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void InvertElements ( )

Invert the matrix elementwise, i.e. replace each element x by 1/x.

Definition at line 919 of file cu-matrix.cc.

References data_.

Referenced by kaldi::TestCuMatrixCompObjfAndDeriv(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixInvertElements(), and kaldi::UnitTestCuMatrixObjfDeriv().

919  {
920 #if HAVE_CUDA == 1
921  if (CuDevice::Instantiate().Enabled()) {
922  CuTimer tim;
923 
924  dim3 dimGrid, dimBlock;
925  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
926  &dimGrid, &dimBlock);
927 
928  cuda_invert_elements(dimGrid, dimBlock, data_, Dim());
929  CU_SAFE_CALL(cudaGetLastError());
930 
931  CuDevice::Instantiate().AccuProfile(__func__, tim);
932  } else
933 #endif
934  {
935  Mat().InvertElements();
936  }
937 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
bool IsUnit ( Real  tol = 0.001) const

Definition at line 595 of file cu-matrix.cc.

References KALDI_ASSERT, kaldi::kTrans, and kaldi::TraceMatMat().

Referenced by OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::UnitTestCuMatrixSymInvertPosDef(), and kaldi::UnitTestCuSpMatrixInvert().

595  {
596  // want to return:
597  //FrobeniusNorm(*this - I) <= tol * NumRows(), i.e.:
598  //sqrt (trace((*this - I)(*this-I)) <= tol * NumRows()
599  // trace((*this - I)(*this - I)) <= tol * NumRows()
600  // trace(*this * *this) + trace(I) - 2 * trace(*this) <= tol * NumRows()
601  // trace(*this * *this) + dim - 2*this.Trace() <= tol * NumRows()
602  KALDI_ASSERT(this->NumRows() == this->NumCols());
603  return (TraceMatMat(*this, *this, kTrans) + this->NumRows() - 2.0 * this->Trace() <=
604  tol * this->NumRows());
605 }
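
The expression returned in the listing, trace(A * A^T) + n - 2 * trace(A), is the squared Frobenius norm of (A - I), so the test compares that squared norm with tol * n. A standalone sketch of the same expression, with hypothetical names and plain `std::vector` storage:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// True if sum(A(i,j)^2) + n - 2 * trace(A) <= tol * n for a square A.
bool IsUnitSketch(const Matrix& a, double tol = 0.001) {
  const std::size_t n = a.size();
  double sum_sq = 0.0, trace = 0.0;
  for (std::size_t r = 0; r < n; ++r) {
    assert(a[r].size() == n);  // matrix must be square
    for (std::size_t c = 0; c < n; ++c) sum_sq += a[r][c] * a[r][c];
    trace += a[r][r];
  }
  const double dim = static_cast<double>(n);
  return sum_sq + dim - 2.0 * trace <= tol * dim;
}
```
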
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
friend Real TraceMatMat(const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
Definition: cu-matrix.cc:2128
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
Real Trace(bool check_square=true) const
Return the trace. If check_square = true, will crash if matrix is not square.
Definition: cu-matrix.cc:2976
KALDI_DISALLOW_COPY_AND_ASSIGN ( CuMatrixBase< Real >  )
private
void Lookup ( const std::vector< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3270 of file cu-matrix.cc.

References rnnlm::i, and KALDI_ASSERT.

Referenced by NnetDiscriminativeUpdater::LatticeComputations(), DiscriminativeComputation::LookupNnetOutput(), kaldi::TestCuMatrixLookup(), and kaldi::UnitTestCuMatrixLookup().

3271  {
3272  // Checks the dimension.
3273  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3274  for (int32 i = 0; i < indices.size(); ++i) {
3275  KALDI_ASSERT(indices[i].first < num_rows && indices[i].first >= 0 &&
3276  indices[i].second < num_cols && indices[i].second >= 0);
3277  }
3278  if (indices.size() == 0) return;
3279  KALDI_ASSERT(output != NULL);
3280 
3281 #if HAVE_CUDA == 1
3282  if (CuDevice::Instantiate().Enabled()) {
3283  CuArray<Int32Pair> cuda_indices(indices);
3284  Lookup(cuda_indices, output);
3285  } else
3286 #endif
3287  {
3288  for (int32 i = 0; i < indices.size(); i++) {
3289  output[i] = (*this)(indices[i].first, indices[i].second);
3290  }
3291  }
3292 }
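
The member has no description on this page; the CPU branch of the listing gathers one element per (row, column) pair. A sketch of that gather, using `std::pair<int, int>` as a hypothetical stand-in for Int32Pair and plain `std::vector` storage:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// output[i] = m[indices[i].first][indices[i].second], with bounds checks.
std::vector<double> LookupSketch(
    const Matrix& m, const std::vector<std::pair<int, int>>& indices) {
  std::vector<double> output(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const int row = indices[i].first, col = indices[i].second;
    assert(row >= 0 && static_cast<std::size_t>(row) < m.size());
    assert(col >= 0 && static_cast<std::size_t>(col) < m[static_cast<std::size_t>(row)].size());
    output[i] = m[static_cast<std::size_t>(row)][static_cast<std::size_t>(col)];
  }
  return output;
}
```
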
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
int32 MatrixIndexT
Definition: matrix-common.h:98
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void Lookup(const std::vector< Int32Pair > &indexes, Real *output) const
Definition: cu-matrix.cc:3270
void Lookup ( const CuArrayBase< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3295 of file cu-matrix.cc.

References CuArrayBase< T >::CopyToHost(), CU1DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), Int32Pair::first, rnnlm::i, KALDI_ASSERT, and Int32Pair::second.

3296  {
3297  int32 num_elements = indices.Dim();
3298  if (num_elements == 0) return;
3299  KALDI_ASSERT(output != NULL);
3300 
3301 #if HAVE_CUDA == 1
3302  if (CuDevice::Instantiate().Enabled()) {
3303  CuArray<Real> cuda_output(num_elements);
3304  CuTimer tim;
3305  dim3 dimBlock(CU1DBLOCK, 1);
3306  dim3 dimGrid(n_blocks(num_elements, CU1DBLOCK), 1);
3307 
3308  cuda_matrix_lookup(dimGrid, dimBlock, this->data_, this->Dim(),
3309  indices.Data(), num_elements, cuda_output.Data());
3310  CU_SAFE_CALL(cudaGetLastError());
3311 
3312  cuda_output.CopyToHost(output);
3313  CuDevice::Instantiate().AccuProfile(__func__, tim);
3314  } else
3315 #endif
3316  {
3317  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3318  const Int32Pair *index = indices.Data();
3319  for (int32 i = 0; i < num_elements; i++) {
3320  KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
3321  index[i].second < num_cols && index[i].second >= 0);
3322  output[i] = (*this)(index[i].first, index[i].second);
3323  }
3324  }
3325 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
int32 MatrixIndexT
Definition: matrix-common.h:98
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
int32_cuda second
Definition: cu-matrixdim.h:86
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
int32_cuda first
Definition: cu-matrixdim.h:85
const MatrixBase<Real>& Mat ( ) const
inline

Definition at line 682 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), CuVectorBase< Real >::AddDiagMat2(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuMatrixBase< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuMatrixBase< Real >::AddMatSmat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRows(), CuMatrixBase< Real >::AddSmatMat(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::AddToRows(), CuMatrixBase< Real >::ApplyLogSoftMaxPerRow(), CuMatrixBase< Real >::ApplySoftMaxPerRow(), kaldi::cu::BackpropLstmNonlinearity(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyElements(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), GeneralMatrix::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffParametricRelu(), CuMatrixBase< Real >::DiffSigmoid(), CuMatrixBase< Real >::DiffTanh(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), CuMatrixBase< Real >::Heaviside(), CuMatrixBase< Real >::Max(), CuMatrixBase< Real >::Min(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), CuMatrixBase< Real >::ParametricRelu(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), CuMatrixBase< Real >::SetMatMatDivMat(), CuMatrixBase< Real >::Sigmoid(), CuMatrixBase< Real >::SoftHinge(), kaldi::cu::Splice(), CuMatrixBase< Real >::SymAddMat2(), 
CuMatrixBase< Real >::Tanh(), kaldi::TraceMatMat(), and kaldi::TraceMatSmat().

682  {
683  return *(reinterpret_cast<const MatrixBase<Real>* >(this));
684  }
MatrixBase<Real>& Mat ( )
inline

Definition at line 685 of file cu-matrix.h.

685  {
686  return *(reinterpret_cast<MatrixBase<Real>* >(this));
687  }
void Max ( const CuMatrixBase< Real > &  A)

Do, elementwise, *this = max(*this, A).

Definition at line 702 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), main(), SpliceMaxComponent::Propagate(), MaxpoolingComponent::Propagate(), kaldi::TestCuMatrixMax(), kaldi::UnitTestCuMatrixMax(), and kaldi::UnitTestCuMatrixReduceMax().

702  {
703  #if HAVE_CUDA == 1
704  if (CuDevice::Instantiate().Enabled()) {
705  CuTimer tim;
706 
707  KALDI_ASSERT(num_cols_ == A.NumCols());
708  KALDI_ASSERT(num_rows_ == A.NumRows());
709 
710  dim3 dimGrid, dimBlock;
711  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
712  &dimGrid, &dimBlock);
713 
714  cuda_max(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
715  CU_SAFE_CALL(cudaGetLastError());
716 
717  CuDevice::Instantiate().AccuProfile(__func__, tim);
718  } else
719  #endif
720  {
721  Mat().Max(A.Mat());
722  }
723 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
Real Max ( ) const

Definition at line 2934 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Max().

2934  {
2935 #if HAVE_CUDA == 1
2936  if (CuDevice::Instantiate().Enabled()) {
2937  KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
2938  CuTimer tim;
2939 
2940  CuVector<Real> col_max(num_rows_, kUndefined);
2941  cuda_max_mat_cols(num_rows_, CU1DBLOCK, col_max.Data(), data_, Dim());
2942  Real ans = col_max.Max();
2943 
2944  CuDevice::Instantiate().AccuProfile(__func__, tim);
2945  return ans;
2946  } else
2947 #endif
2948  {
2949  return Mat().Max();
2950  }
2951 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void Min ( const CuMatrixBase< Real > &  A)

Do, elementwise, *this = min(*this, A).

Definition at line 727 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), main(), kaldi::TestCuMatrixMin(), kaldi::UnitTestCuMatrixMin(), and kaldi::UnitTestCuMatrixReduceMin().

727  {
728  #if HAVE_CUDA == 1
729  if (CuDevice::Instantiate().Enabled()) {
730  CuTimer tim;
731 
732  KALDI_ASSERT(num_cols_ == A.NumCols());
733  KALDI_ASSERT(num_rows_ == A.NumRows());
734 
735  dim3 dimGrid, dimBlock;
736  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
737  &dimGrid, &dimBlock);
738 
739  cuda_min(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
740  CU_SAFE_CALL(cudaGetLastError());
741 
742  CuDevice::Instantiate().AccuProfile(__func__, tim);
743  } else
744  #endif
745  {
746  Mat().Min(A.Mat());
747  }
748 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
Real Min ( ) const

Definition at line 2955 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Min().

2955  {
2956 #if HAVE_CUDA == 1
2957  if (CuDevice::Instantiate().Enabled()) {
2958  KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
2959  CuTimer tim;
2960 
2961  CuVector<Real> col_min(num_rows_, kUndefined);
2962  cuda_min_mat_cols(num_rows_, CU1DBLOCK, col_min.Data(), data_, Dim());
2963  Real ans = col_min.Min();
2964 
2965  CuDevice::Instantiate().AccuProfile(__func__, tim);
2966  return ans;
2967  } else
2968 #endif
2969  {
2970  return Mat().Min();
2971  }
2972 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:712
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:713
void MulColsVec ( const CuVectorBase< Real > &  scale)

Scale the i'th column by scale[i].
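
A minimal sketch of the column scaling, assuming plain `std::vector` storage; `Matrix` and `MulColsVecSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in type

// Multiplies every element of column c by scale[c].
void MulColsVecSketch(const std::vector<double>& scale, Matrix* m) {
  for (std::size_t r = 0; r < m->size(); ++r) {
    assert((*m)[r].size() == scale.size());
    for (std::size_t c = 0; c < scale.size(); ++c) (*m)[r][c] *= scale[c];
  }
}
```
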

Definition at line 752 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by BatchNormComponent::Backprop(), PerElementScaleComponent::Backprop(), FixedScaleComponent::Backprop(), Convolutional2DComponent::BackpropagateFnc(), Rescale::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), ModelCollapser::GetDiagonallyPreModifiedComponentIndex(), BatchNormComponent::Propagate(), PerElementScaleComponent::Propagate(), FixedScaleComponent::Propagate(), Rescale::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), kaldi::UnitTestCuMatrixAddMatDiagVec(), and kaldi::UnitTestCuMatrixMulColsVec().

752  {
753 #if HAVE_CUDA == 1
754  if (CuDevice::Instantiate().Enabled()) {
755  CuTimer tim;
756 
757  KALDI_ASSERT(scale.Dim() == NumCols());
758 
759 
760  dim3 dimGrid, dimBlock;
761  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
762  &dimGrid, &dimBlock);
763 
764  cuda_mul_cols_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
765  CU_SAFE_CALL(cudaGetLastError());
766 
767 
768  CuDevice::Instantiate().AccuProfile(__func__, tim);
769  } else
770 #endif
771  {
772  Mat().MulColsVec(scale.Vec());
773  }
774 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:206
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:682
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:205
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:704
::MatrixDim Dim() const
Definition: cu-matrix.h:211
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void MulElements ( const CuMatrixBase< Real > &  A)

Multiply two matrices elementwise: C = C .* A.

Definition at line 654 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by ElementwiseProductComponent::Backprop(), BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), MaxpoolingComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), Dropout::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), kaldi::nnet1::ComputeStdDev(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), ElementwiseProductComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixMulElements(), kaldi::nnet1::UnitTestLengthNorm(), ParametricRelu::Update(), AffineTransform::Update(), FramePoolingComponent::Update(), ConvolutionalComponent::Update(), Rescale::Update(), and NaturalGradientPerElementScaleComponent::Update().

654  {
655  #if HAVE_CUDA == 1
656  if (CuDevice::Instantiate().Enabled()) {
657  CuTimer tim;
658 
659  KALDI_ASSERT(num_cols_ == A.NumCols());
660  KALDI_ASSERT(num_rows_ == A.NumRows());
661 
662  dim3 dimGrid, dimBlock;
663  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
664  &dimGrid, &dimBlock);
665 
666  cuda_mul_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
667  CU_SAFE_CALL(cudaGetLastError());
668 
669  CuDevice::Instantiate().AccuProfile(__func__, tim);
670  } else
671  #endif
672  {
673  Mat().MulElements(A.Mat());
674  }
675 }
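The kernel call on line 666 passes both Dim() and A.Stride(), which suggests the two operands may be laid out with different row strides. A minimal CPU sketch of the same in-place elementwise product, under that assumption, might look like the following. The name MulElementsSketch and the raw-pointer interface are hypothetical and are not Kaldi's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: element (r, c) of a matrix lives at
// data[r * stride + c], where stride >= cols allows for row padding.
void MulElementsSketch(float *c_data, std::size_t c_stride,
                       const float *a_data, std::size_t a_stride,
                       std::size_t rows, std::size_t cols) {
  assert(c_stride >= cols && a_stride >= cols);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      c_data[r * c_stride + c] *= a_data[r * a_stride + c];  // C = C .* A
}
```

Keeping a separate stride per operand lets the same loop work when A is a padded matrix or a sub-matrix view, because only the first cols entries of each row are touched.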
void MulRowsGroupMat ( const CuMatrixBase< Real > &  src)

Divides each row into src.NumCols() groups of consecutive elements, then scales the j'th group of the i'th row by src(i, j).

Definition at line 803 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), and kaldi::UnitTestCuMatrixMulRowsGroupMat().

803  {
804  KALDI_ASSERT(src.NumCols() > 0);
805 #if HAVE_CUDA == 1
806  if (CuDevice::Instantiate().Enabled()) {
807  CuTimer tim;
808  int group_size = this->NumCols() / src.NumCols();
809 
810  dim3 dimGrid, dimBlock;
811  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
812  &dimGrid, &dimBlock);
813 
814  cuda_mul_rows_group_mat(dimGrid, dimBlock, this->data_, src.data_,
815  this->Dim(), src.Stride(), group_size);
816  CU_SAFE_CALL(cudaGetLastError());
817 
818  CuDevice::Instantiate().AccuProfile(__func__, tim);
819  } else
820 #endif
821  {
822  Mat().MulRowsGroupMat(src.Mat());
823  }
824 }
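The grouping arithmetic on line 808 can be illustrated with a small CPU sketch. It assumes that NumCols() is an exact multiple of src.NumCols() and that the row counts match; the listing itself only asserts src.NumCols() > 0, so both checks below are assumptions. The name MulRowsGroupMatSketch and the flat row-major vectors are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: m is rows x cols, src is rows x src_cols,
// both stored row-major in flat vectors.
void MulRowsGroupMatSketch(std::vector<float> &m, std::size_t rows,
                           std::size_t cols, const std::vector<float> &src,
                           std::size_t src_cols) {
  assert(src_cols > 0);
  assert(cols % src_cols == 0);  // assumption: groups divide the row evenly
  assert(m.size() == rows * cols && src.size() == rows * src_cols);
  const std::size_t group_size = cols / src_cols;
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t k = 0; k < cols; ++k)
      // Column k belongs to group k / group_size of row i.
      m[i * cols + k] *= src[i * src_cols + k / group_size];
}
```

With cols = 4 and src_cols = 2, group_size is 2, so columns 0 and 1 of row i are scaled by src(i, 0) and columns 2 and 3 by src(i, 1).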
void MulRowsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th row by scale[i]. The dimension of scale must equal the number of rows.

Definition at line 779 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by BackpropTruncationComponent::Backprop(), ClipGradientComponent::Backprop(), BlockSoftmax::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), Xent::Eval(), Mse::Eval(), kaldi::nnet2::PreconditionDirections(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), LengthNormComponent::PropagateFnc(), OnlinePreconditioner::ReorthogonalizeXt1(), OnlineNaturalGradient::ReorthogonalizeXt1(), kaldi::UnitTestCuMatrixMulRowsVec(), AffineTransform::Update(), ConvolutionalComponent::Update(), and kaldi::nnet3::time_height_convolution::ZeroBlankRows().

779  {
780  #if HAVE_CUDA == 1
781  if (CuDevice::Instantiate().Enabled()) {
782  CuTimer tim;
783 
784  KALDI_ASSERT(scale.Dim() == NumRows());
785 
786  dim3 dimGrid, dimBlock;
787  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
788  &dimGrid, &dimBlock);
789 
790  cuda_mul_rows_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
791  CU_SAFE_CALL(cudaGetLastError());
792 
793 
794  CuDevice::Instantiate().AccuProfile(__func__, tim);
795  } else
796  #endif
797  {
798  Mat().MulRowsVec(scale.Vec());
799  }
800 }
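The row scaling performed by either branch can be sketched on the CPU as follows. The dimension check mirrors the KALDI_ASSERT on line 784; the name MulRowsVecSketch and the flat row-major std::vector layout are hypothetical and not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: m is a rows x cols matrix stored row-major
// in a flat vector, and scale holds one factor per row.
void MulRowsVecSketch(std::vector<float> &m, std::size_t rows,
                      std::size_t cols, const std::vector<float> &scale) {
  assert(m.size() == rows * cols);
  assert(scale.size() == rows);  // same condition as scale.Dim() == NumRows()
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t k = 0; k < cols; ++k)
      m[i * cols + k] *= scale[i];  // every element of row i gets scale[i]
}
```

This is equivalent to left-multiplying the matrix by a diagonal matrix whose diagonal is scale.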
MatrixIndexT NumCols ( ) const
inline

Definition at line 206 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetLdaStatsAccumulator::AccStatsFromOutput(), RestrictedAttentionComponent::Add(), MatrixRandomizer::AddData(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuRand< Real >::AddGaussNoise(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatSmat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), CuVectorBase< Real >::AddRowSumMat(), CuMatrixBase< Real >::AddSmatMat(), CuMatrixBase< Real >::AddToRows(), kaldi::nnet3::attention::ApplyScalesToInput(), kaldi::nnet3::attention::ApplyScalesToInputSimple(), kaldi::nnet3::attention::ApplyScalesToOutput(), kaldi::nnet3::attention::ApplyScalesToOutputSimple(), kaldi::nnet3::attention::AttentionBackward(), kaldi::nnet3::attention::AttentionForward(), NnetUpdater::Backprop(), BatchNormComponent::Backprop(), StatisticsExtractionComponent::Backprop(), MaxoutComponent::Backprop(), MaxpoolingComponent::Backprop(), PnormComponent::Backprop(), RepeatedAffineComponent::Backprop(), NormalizeComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), PerElementOffsetComponent::Backprop(), DropoutComponent::Backprop(), Convolutional1dComponent::Backprop(), ScaleAndOffsetComponent::Backprop(), ConvolutionComponent::Backprop(), LstmNonlinearityComponent::Backprop(), 
CompositeComponent::Backprop(), Component::Backpropagate(), Splice::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), RestrictedAttentionComponent::BackpropOneHead(), CuRand< Real >::BinarizeProbs(), TimeHeightConvolutionComponent::Check(), ChunkInfo::CheckSize(), ModelCollapser::CollapseComponentsAffine(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), DiscriminativeComputation::Compute(), kaldi::nnet3::ComputeAccuracy(), NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), kaldi::nnet3::ConstrainOrthonormal(), kaldi::nnet3::ConstrainOrthonormalInternal(), ConvolutionComponent::ConvolutionComponent(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuVectorBase< Real >::CopyDiagFromMat(), CuVectorBase< Real >::CopyElements(), CuMatrixBase< Real >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuCompressedMatrix< I >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuCompressedMatrix< I >::CopyToMat(), CuMatrix< Real >::CuMatrix(), kaldi::CuRandGaussianMatrixBaseSpeedTest(), 
kaldi::CuRandGaussianMatrixSpeedTest(), kaldi::CuRandUniformMatrixBaseSpeedTest(), kaldi::CuRandUniformMatrixSpeedTest(), CuSubVector< Real >::CuSubVector(), CuTpMatrix< Real >::CuTpMatrix(), CuMatrixBase< Real >::DiffGroupPnorm(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DivElements(), kaldi::cu::EnsureNonzero(), CuMatrixBase< Real >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), NnetComputer::ExecuteCommand(), kaldi::nnet3::attention::GetAttentionDotProducts(), kaldi::nnet3::attention::GetAttentionDotProductsSimple(), AffineTransform::GetGradient(), RecurrentComponent::GetGradient(), LstmProjected::GetGradient(), ConvolutionalComponent::GetGradient(), Convolutional2DComponent::GetGradient(), BlstmProjected::GetGradient(), Convolutional1dComponent::GetParameterDim(), AffineTransform::GetParams(), RecurrentComponent::GetParams(), LstmProjected::GetParams(), ConvolutionalComponent::GetParams(), Convolutional2DComponent::GetParams(), BlstmProjected::GetParams(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), ConvolutionComponent::InderivPatchesToInderiv(), MaxpoolingComponent::InderivPatchesToInderiv(), RestrictedAttentionComponent::Info(), AffineComponent::Info(), AffineComponentPreconditioned::Info(), AffineComponentPreconditionedOnline::Info(), DctComponent::Info(), FixedLinearComponent::Info(), FixedAffineComponent::Info(), Convolutional1dComponent::Info(), LstmNonlinearityComponent::Info(), OnlinePreconditioner::Init(), OnlineNaturalGradient::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), LinearTransform::InitData(), TimeHeightConvolutionComponent::InitFromConfig(), NaturalGradientAffineComponent::InitFromConfig(), OnlinePreconditioner::InitOrthonormalSpecial(), 
OnlineNaturalGradient::InitOrthonormalSpecial(), RepeatedAffineComponent::InputDim(), AffineComponent::InputDim(), LinearComponent::InputDim(), BlockAffineComponent::InputDim(), FixedLinearComponent::InputDim(), FixedAffineComponent::InputDim(), Convolutional1dComponent::InputDim(), LstmNonlinearityComponent::InputDim(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< Real >::Max(), kaldi::MeanVariance(), CuMatrixBase< Real >::Min(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), TimeHeightConvolutionComponent::NumParameters(), BlockAffineComponent::NumParameters(), RepeatedAffineComponent::NumParameters(), LinearComponent::NumParameters(), ConvolutionComponent::NumParameters(), LstmNonlinearityComponent::NumParameters(), LinearTransform::NumParams(), AffineTransform::NumParams(), RecurrentComponent::NumParams(), LstmProjected::NumParams(), ConvolutionalComponent::NumParams(), Convolutional2DComponent::NumParams(), BlstmProjected::NumParams(), CuMatrix< BaseFloat >::operator=(), DctComponent::OutputDim(), LstmNonlinearityComponent::OutputDim(), CuMatrixBase< Real >::ParametricRelu(), TimeHeightConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlinePreconditioner::PreconditionDirectionsInternal(), OnlineNaturalGradient::PreconditionDirectionsInternal(), kaldi::nnet3::PrintParameterStats(), NnetComputeProb::ProcessOutputs(), DistributeComponent::Propagate(), NormalizeComponent::Propagate(), Component::Propagate(), 
ElementwiseProductComponent::Propagate(), BatchNormComponent::Propagate(), StatisticsExtractionComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), DropoutMaskComponent::Propagate(), SpliceComponent::Propagate(), SumBlockComponent::Propagate(), BlockAffineComponent::Propagate(), DctComponent::Propagate(), PerElementOffsetComponent::Propagate(), DropoutComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Convolutional1dComponent::Propagate(), ScaleAndOffsetComponent::Propagate(), ConvolutionComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(),