CuMatrixBase< Real > Class Template Reference

Matrix for CUDA computing. More...

#include <matrix-common.h>

Public Member Functions

void CopyCols (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Copies column r from column indexes[r] of src. More...
 
void AddCols (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indices)
 Add column indices[r] of src to column r. More...
 
void CopyRows (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Copies row r from row indexes[r] of src. More...
 
void CopyRows (const CuArrayBase< const Real * > &src)
 Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise). More...
 
void CopyToRows (const CuArrayBase< Real * > &dst) const
 For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. More...
 
void AddRows (Real alpha, const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Does for each row r, this.Row(r) += alpha * src.row(indexes[r]). More...
 
void MulRows (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Does for each row r, this.Row(r) *= src.row(indexes[r]), where '*=' is elementwise multiplication. More...
 
void AddRows (Real alpha, const CuArrayBase< const Real * > &src)
 Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols(). More...
 
void AddToRows (Real alpha, const CuArrayBase< MatrixIndexT > &indexes, CuMatrixBase< Real > *dst) const
 For each row i of *this, adds alpha * this->Row(i) to dst->Row(indexes(i)) if indexes(i) >= 0; otherwise does nothing. More...
 
void AddToRows (Real alpha, const CuArrayBase< Real * > &dst) const
 For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. More...
 
void SumColumnRanges (const CuMatrixBase< Real > &src, const CuArrayBase< Int32Pair > &indexes)
 For each row r of this and for each column c, sets (*this)(r, c) to the sum over j of src(r, j), where j ranges from indexes[c].first through indexes[c].second - 1. More...
 
void AddRowRanges (const CuMatrixBase< Real > &src, const CuArrayBase< Int32Pair > &indexes)
 For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1. More...
 
void AddToDiag (Real value)
 Adds "value" to the diagonal elements of the matrix. More...
 
MatrixIndexT NumRows () const
 Dimensions. More...
 
MatrixIndexT NumCols () const
 
MatrixIndexT Stride () const
 
::MatrixDim Dim () const
 
Real FrobeniusNorm () const
 
bool IsUnit (Real tol=0.001) const
 
bool ApproxEqual (const CuMatrixBase< Real > &other, float tol=0.01) const
 True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm() More...
 
MatrixIndexT SizeInBytes () const
 Get size of matrix in bytes. More...
 
template<typename OtherReal >
void CopyFromMat (const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromGeneralMat (const GeneralMatrix &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromMat (const MatrixBase< Real > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromSp (const CuSpMatrix< Real > &M)
 
template<typename OtherReal >
void CopyFromTp (const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyFromMat (const CuMatrixBase< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyToMat (MatrixBase< OtherReal > *dst, MatrixTransposeType trans=kNoTrans) const
 
void CopyRowsFromVec (const CuVectorBase< Real > &v)
 This function has two modes of operation. More...
 
void CopyRowsFromVec (const VectorBase< Real > &v)
 Version of CopyRowsFromVec() that takes a CPU-based vector. More...
 
void CopyColsFromVec (const CuVectorBase< Real > &v)
 Copies vector into matrix, column-by-column. More...
 
void CopyColFromVec (const CuVectorBase< Real > &v, const MatrixIndexT col)
 Copy vector into specific column of matrix. More...
 
void Sigmoid (const CuMatrixBase< Real > &src)
 Set each element to the sigmoid of the corresponding element of "src": element by element, x = 1 / (1 + exp(-x)) More...
 
void Heaviside (const CuMatrixBase< Real > &src)
 Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x==0. More...
 
void SoftHinge (const CuMatrixBase< Real > &src)
 Apply the function y = log(1 + exp(x)), to each element. More...
 
void GroupPnorm (const CuMatrixBase< Real > &src, Real pow)
 Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j ^ pow) ^ (1 / pow), where G = x.NumCols() / y.NumCols() must be an integer. More...
 
void DiffGroupPnorm (const CuMatrixBase< Real > &in_value, const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv, Real power)
 Differentiate backward through the GroupPnorm function. More...
 
void GroupMax (const CuMatrixBase< Real > &src)
 Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer. More...
 
void GroupMaxDeriv (const CuMatrixBase< Real > &input, const CuMatrixBase< Real > &output)
 Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. More...
 
void ParametricRelu (const CuMatrixBase< Real > &src, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Compute the parametric rectified linear unit function; element by element, *this = src * (src > 0 ? alpha : beta) More...
 
void DiffParametricRelu (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Differentiate backward through the parametric relu function. More...
 
void Tanh (const CuMatrixBase< Real > &src)
 Compute the hyperbolic tangent (tanh) function; element by element, *this = tanh(src). More...
 
void DiffSigmoid (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the sigmoid function. More...
 
void DiffTanh (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the tanh function. More...
 
void DiffSoftmaxPerRow (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the softmax function. More...
 
void DiffLogSoftmaxPerRow (const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv)
 Differentiate backward through the log softmax function. More...
 
void DiffXent (const CuArrayBase< int32 > &tgt, CuVector< Real > *log_post_tgt)
 Differentiate the block [softmax+cross-entropy] : dE/da = posterior_mat - target_mat, 'E' is error function, 'a' is activation on softmax input. More...
 
void Cholesky (CuMatrixBase< Real > *inv_cholesky=NULL)
 This function sets *this to the Cholesky factor of *this (i.e. More...
 
void SymInvertPosDef ()
 Inversion for positive definite symmetric matrices. More...
 
void ApplyPow (Real power)
 
void ApplyPowAbs (Real power, bool include_sign=false)
 Apply power to the absolute value of each element. More...
 
void ApplyHeaviside ()
 For each element, sets x = (x > 0 ? 1.0 : 0.0). More...
 
void ApplyFloor (Real floor_val)
 
void ApplyCeiling (Real ceiling_val)
 
void ApplyExp ()
 
void ApplyExpLimited (Real lower_limit, Real upper_limit)
 This is equivalent to running: ApplyFloor(lower_limit); ApplyCeiling(upper_limit); ApplyExp() More...
 
void ApplyExpSpecial ()
 For each element x of the matrix, set it to (x < 0 ? exp(x) : x + 1). More...
 
void ApplySoftMaxPerRow (const CuMatrixBase< Real > &src)
 Softmax nonlinearity Y = Softmax(X) : Yij = e^Xij / sum_k(e^Xik), done to each row, with attention to avoiding overflow or underflow. More...
 
void ApplyLogSoftMaxPerRow (const CuMatrixBase< Real > &src)
 LogSoftmax nonlinearity Y = LogSoftmax(X) : Yij = Xij - log(sum_k(e^Xik)), done to each row, with attention to avoiding overflow or underflow. More...
 
void FindRowMaxId (CuArray< int32 > *id) const
 Find the id of the maximal element for each row (resizes the 'id' array to the appropriate size). More...
 
void SetZero ()
 Math operations, some calling kernels. More...
 
void Set (Real value)
 
void Add (Real value)
 
void SetZeroAboveDiag ()
 Zeroes all elements for which col > row. More...
 
void Scale (Real value)
 
void ApplyLog ()
 
void MulElements (const CuMatrixBase< Real > &A)
 Multiply two matrices elementwise: C = C .* A. More...
 
void DivElements (const CuMatrixBase< Real > &A)
 Divide two matrices elementwise: C = C ./ A. More...
 
void Max (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = max(*this, A). More...
 
void Min (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = min(*this, A). More...
 
void MulColsVec (const CuVectorBase< Real > &scale)
 scale i'th column by scale[i] More...
 
void MulRowsVec (const CuVectorBase< Real > &scale)
 scale i'th row by scale[i] More...
 
void MulRowsGroupMat (const CuMatrixBase< Real > &src)
 divide each row into src.NumCols() groups, and then scale i'th row's jth group of elements by src[i, j]. More...
 
void DivRowsVec (const CuVectorBase< Real > &div)
 divide i'th row by div[i] More...
 
void InvertElements ()
 invert the matrix by elements. More...
 
void AddMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A More...
 
void AddSmat (Real alpha, const CuSparseMatrix< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A. More...
 
void AddSmatMat (Real alpha, const CuSparseMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, Real beta)
 (*this) = alpha * op(A) * B + beta * (*this), where A is sparse. More...
 
void AddMatSmat (Real alpha, const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 (*this) = alpha * A * op(B) + beta * (*this), where B is sparse and op(B) is either B or trans(B) depending on the 'transB' argument. More...
 
void AddToElements (Real alpha, const CuArrayBase< int32 > &elements)
 This is a rather special purpose function; we might generalize it later by adding a transpose-type option. More...
 
void AddMatBlocks (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 This function is like AddMat (it does *this += alpha * src), except that it supports cases where *this and src have different dimension. More...
 
void AddVecToCols (Real alpha, const CuVectorBase< Real > &col, Real beta=1.0)
 (for each column c of *this), c = alpha * col + beta * c More...
 
void AddVecToRows (Real alpha, const CuVectorBase< Real > &row, Real beta=1.0)
 (for each row r of *this), r = alpha * row + beta * r More...
 
void AddMatMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, Real beta)
 C = alpha * A(^T)*B(^T) + beta * C. More...
 
void AddVecVec (Real alpha, const CuVectorBase< Real > &x, const CuVectorBase< Real > &y)
 A = alpha * x * y^T + A . More...
 
void SetMatMatDivMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const CuMatrixBase< Real > &C)
 *this = A * B / C (elementwise; where C = 0, *this = A). *this may safely alias A, B or C and still gives the expected result. More...
 
void SymAddMat2 (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transA, Real beta)
 *this = beta * *this + alpha * M M^T, for symmetric matrices. More...
 
void AddMatBlock (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuBlockMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 This function is like AddMatMat but for where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks). More...
 
void AddDiagVecMat (const Real alpha, const CuVectorBase< Real > &v, const CuMatrixBase< Real > &M, MatrixTransposeType transM, Real beta=1.0)
 *this = beta * *this + alpha * diag(v) * M [or M^T]. More...
 
void AddMatDiagVec (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transM, CuVectorBase< Real > &v, Real beta=1.0)
 
void AddMatMatElements (const Real alpha, const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const Real beta)
 *this = beta * *this + alpha * A .* B (.* element by element multiplication) More...
 
void AddMatSp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuSpMatrix< Real > &B, const Real beta)
 this <- beta*this + alpha*A*B More...
 
void AddSpMat (const Real alpha, const CuSpMatrix< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <- beta*this + alpha*SpA*B More...
 
void AddTpMat (const Real alpha, const CuTpMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <- beta*this + alpha*A*B. More...
 
void AddMatTp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuTpMatrix< Real > &B, MatrixTransposeType transB, const Real beta)
 this <- beta*this + alpha*A*B. More...
 
void CopyFromBlock (const CuBlockMatrix< Real > &B, MatrixTransposeType trans=kNoTrans)
 
void CopyLowerToUpper ()
 
void CopyUpperToLower ()
 
CuSubMatrix< Real > Range (const MatrixIndexT row_offset, const MatrixIndexT num_rows, const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
CuSubMatrix< Real > RowRange (const MatrixIndexT row_offset, const MatrixIndexT num_rows) const
 
CuSubMatrix< Real > ColRange (const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
const CuSubVector< Real > Row (MatrixIndexT i) const
 
CuSubVector< Real > Row (MatrixIndexT i)
 
CuValue< Real > operator() (MatrixIndexT r, MatrixIndexT c)
 
Real operator() (MatrixIndexT r, MatrixIndexT c) const
 
Real Sum () const
 
Real Max () const
 
Real Min () const
 
Real Trace (bool check_square=true) const
 Return the trace. If check_square = true, will crash if matrix is not square. More...
 
void SetRandn ()
 
void SetRandUniform ()
 
void Write (std::ostream &os, bool binary) const
 
void AddElements (Real alpha, const std::vector< MatrixElement< Real > > &input)
 
void AddElements (Real alpha, const CuArrayBase< Int32Pair > &indexes, const Real *input)
 
void Lookup (const std::vector< Int32Pair > &indexes, Real *output) const
 
void Lookup (const CuArrayBase< Int32Pair > &indexes, Real *output) const
 
void EqualElementMask (const CuMatrixBase< Real > &mat, CuMatrix< Real > *mask) const
 
const Real * RowData (MatrixIndexT r) const
 Get raw row pointer (const). More...
 
Real * RowData (MatrixIndexT r)
 Get raw row pointer. More...
 
const Real * Data () const
 Return data pointer (const). More...
 
Real * Data ()
 Return data pointer. More...
 
const MatrixBase< Real > & Mat () const
 
MatrixBase< Real > & Mat ()
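
As a reading aid for the ApplySoftMaxPerRow and ApplyLogSoftMaxPerRow entries in the list above, here is a minimal host-side sketch of a row-wise softmax that avoids overflow by subtracting the row maximum before exponentiating. It uses plain std::vector rather than any Kaldi type. The function names LogSoftmaxRow and SoftmaxRow are hypothetical, and the max-subtraction trick is a common technique assumed here, not something this page confirms about the Kaldi implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: numerically stable log-softmax of one row.
// y[j] = x[j] - log(sum_k exp(x[k])), computed after subtracting max(x)
// so that no exponent is larger than zero.
std::vector<double> LogSoftmaxRow(const std::vector<double> &x) {
  assert(!x.empty());
  const double max_x = *std::max_element(x.begin(), x.end());
  double sum = 0.0;
  for (double v : x) sum += std::exp(v - max_x);
  const double log_norm = max_x + std::log(sum);
  std::vector<double> y(x.size());
  for (std::size_t j = 0; j < x.size(); ++j) y[j] = x[j] - log_norm;
  return y;
}

// Hypothetical helper: softmax of one row, obtained from the log-softmax.
std::vector<double> SoftmaxRow(const std::vector<double> &x) {
  std::vector<double> y = LogSoftmaxRow(x);
  for (double &v : y) v = std::exp(v);
  return y;
}
```

With inputs as large as 1000.0 a naive exp(x) would overflow, whereas the shifted form only ever evaluates exp of non-positive numbers.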
 

Protected Member Functions

 CuMatrixBase ()
 
 CuMatrixBase (Real *data, MatrixIndexT num_rows, MatrixIndexT num_cols, MatrixIndexT stride)
 This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix. More...
 

Protected Attributes

Real * data_
 GPU data pointer (or regular matrix data pointer,. More...
 
MatrixIndexT num_cols_
 
MatrixIndexT num_rows_
 
MatrixIndexT stride_
 

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (CuMatrixBase)
 

Friends

class CuMatrixBase< float >
 
class CuMatrixBase< double >
 
class CuVectorBase< float >
 
class CuVectorBase< double >
 
class VectorBase< Real >
 
class CuSpMatrix< Real >
 
class CuTpMatrix< float >
 
class CuTpMatrix< double >
 
class CuVectorBase< Real >
 
class CuSubMatrix< Real >
 
class CuRand< Real >
 
class CuSubVector< Real >
 
class CuBlockMatrix< Real >
 
class CuSparseMatrix< float >
 
class CuSparseMatrix< double >
 
class CuSparseMatrix< Real >
 
Real TraceMatMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
 
Real TraceMatSmat (const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType trans)
 
void AddMatMatBatched (const Real alpha, std::vector< CuSubMatrix< Real > * > &C, const std::vector< CuSubMatrix< Real > * > &A, MatrixTransposeType transA, const std::vector< CuSubMatrix< Real > * > &B, MatrixTransposeType transB, const Real beta)
 Does multiple matrix multiplications, executing them in parallel using cuBLAS's gemmBatched if we are using a GPU. More...
 

Detailed Description

template<typename Real>
class kaldi::CuMatrixBase< Real >

Matrix for CUDA computing.

Does the computation on the CUDA card when CUDA is compiled in and we have a suitable GPU (CuDevice::Instantiate().Enabled() == true); otherwise, does it on the CPU.

Definition at line 69 of file matrix-common.h.
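
The GPU-or-CPU behaviour described above follows a two-level pattern that recurs in the listings on this page: a compile-time guard (HAVE_CUDA) around a run-time check of whether a device is enabled, with a CPU fallback. The sketch below imitates that shape without any CUDA dependency. FakeDevice, Backend and AddScalar are hypothetical names, and the "GPU" branch is only a placeholder that does the work on the host.

```cpp
#include <cassert>
#include <vector>

#ifndef HAVE_CUDA
#define HAVE_CUDA 0  // assumption for this sketch: built without CUDA
#endif

// Hypothetical stand-in for the result of CuDevice::Instantiate().Enabled().
struct FakeDevice {
  bool enabled = false;
};

enum class Backend { kCpu, kGpu };

// Adds `value` to every element and reports which branch ran.
Backend AddScalar(const FakeDevice &dev, std::vector<float> *data, float value) {
#if HAVE_CUDA == 1
  if (dev.enabled) {
    // Placeholder for a kernel launch; done on the host here so that the
    // sketch stays runnable without a GPU.
    for (float &x : *data) x += value;
    return Backend::kGpu;
  }
#endif
  (void)dev;  // unused when compiled without CUDA
  for (float &x : *data) x += value;  // CPU fallback
  return Backend::kCpu;
}
```

When HAVE_CUDA is 0 the GPU branch is not even compiled, so the run-time flag has no effect and every call takes the CPU path.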

Constructor & Destructor Documentation

CuMatrixBase ( )
inline protected

Definition at line 710 of file cu-matrix.h.

710 : data_(NULL), num_cols_(0), num_rows_(0), stride_(0) { }
CuMatrixBase ( Real *  data,
MatrixIndexT  num_rows,
MatrixIndexT  num_cols,
MatrixIndexT  stride 
)
inline protected

This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.

Definition at line 714 of file cu-matrix.h.

717  :
718  data_(data), num_cols_(num_cols), num_rows_(num_rows), stride_(stride) { }
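
To illustrate why a sub-matrix constructor needs a stride in addition to the row and column counts, here is a small sketch of a non-owning view over row-major storage. It assumes that the stride counts elements (not bytes) between the starts of consecutive rows. StridedView and its members are hypothetical and are not part of the Kaldi API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over row-major storage with a row stride.
// Element (r, c) lives at data[r * stride + c]; stride >= num_cols.
struct StridedView {
  float *data;
  int num_rows, num_cols, stride;

  float &At(int r, int c) const {
    assert(r >= 0 && r < num_rows && c >= 0 && c < num_cols);
    return data[static_cast<std::size_t>(r) * stride + c];
  }

  // Sub-block that shares the parent's memory and keeps the parent's stride.
  StridedView Range(int row_offset, int rows, int col_offset, int cols) const {
    assert(row_offset >= 0 && row_offset + rows <= num_rows);
    assert(col_offset >= 0 && col_offset + cols <= num_cols);
    float *start =
        data + static_cast<std::size_t>(row_offset) * stride + col_offset;
    return StridedView{start, rows, cols, stride};
  }
};
```

The sub-block has fewer columns than its stride, which is exactly the situation a constructor taking only rows and columns could not describe.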

Member Function Documentation

void Add ( Real  value)

Definition at line 548 of file cu-matrix.cc.

References data_.

Referenced by BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), Xent::Eval(), GeneralDropoutComponent::GetMemo(), main(), kaldi::MeanVariance(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), TanhComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAdd(), kaldi::UnitTestCuMatrixAdd2(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixSetRandUniform(), and kaldi::UnitTestCuMatrixTraceMatMat().

548  {
549 #if HAVE_CUDA == 1
550  if (CuDevice::Instantiate().Enabled()) {
551  if (num_rows_ == 0) return;
552  CuTimer tim;
553 
554  dim3 dimGrid, dimBlock;
555  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
556  &dimGrid, &dimBlock);
557 
558  cuda_add(dimGrid, dimBlock, data_, value, Dim());
559  CU_SAFE_CALL(cudaGetLastError());
560 
561  CuDevice::Instantiate().AccuProfile(__func__, tim);
562  } else
563  #endif
564  {
565  Mat().Add(value);
566  }
567 }
void AddCols ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indices 
)

Add column indices[r] of src to column r.

As a special case, if indices[i] == -1, column i is skipped. indices.size() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().

Definition at line 2668 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by Convolutional1dComponent::Backprop(), ConvolutionalComponent::BackpropagateFnc(), ConvolutionComponent::InderivPatchesToInderiv(), and MaxpoolingComponent::InderivPatchesToInderiv().

2669  {
2670 #if HAVE_CUDA == 1
2671  if (CuDevice::Instantiate().Enabled()) {
2672  KALDI_ASSERT(indices.Dim() == NumCols());
2673  KALDI_ASSERT(NumRows() == src.NumRows());
2674  CuTimer tim;
2675  dim3 dimGrid, dimBlock;
2676  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2677  &dimGrid, &dimBlock);
2678  cuda_add_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
2679  Dim(), src.Stride());
2680  CU_SAFE_CALL(cudaGetLastError());
2681  CuDevice::Instantiate().AccuProfile(__func__, tim);
2682  } else
2683 #endif
2684  {
2685  Mat().AddCols(src.Mat(), indices.Data());
2686  }
2687 }
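
The semantics described for AddCols, including the -1 special case and the two dimension checks, can be sketched on the host as follows. DenseMat and AddColsRef are hypothetical names used only for illustration; this is a reference loop, not the CUDA kernel or Kaldi's CPU code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
struct DenseMat {
  int rows, cols;
  std::vector<float> v;  // rows * cols elements, zero-initialised
  DenseMat(int r, int c)
      : rows(r), cols(c), v(static_cast<std::size_t>(r) * c, 0.0f) {}
  float &operator()(int r, int c) {
    return v[static_cast<std::size_t>(r) * cols + c];
  }
  float operator()(int r, int c) const {
    return v[static_cast<std::size_t>(r) * cols + c];
  }
};

// dst(:, c) += src(:, indices[c]) for every c with indices[c] != -1.
void AddColsRef(DenseMat *dst, const DenseMat &src,
                const std::vector<int> &indices) {
  assert(static_cast<int>(indices.size()) == dst->cols);
  assert(src.rows == dst->rows);
  for (int c = 0; c < dst->cols; ++c) {
    const int j = indices[c];
    if (j == -1) continue;  // special case: leave column c untouched
    assert(j >= 0 && j < src.cols);
    for (int r = 0; r < dst->rows; ++r) (*dst)(r, c) += src(r, j);
  }
}
```

Note that src may have a different number of columns than the destination; only the row counts have to agree.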
void AddDiagVecMat ( const Real  alpha,
const CuVectorBase< Real > &  v,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
Real  beta = 1.0 
)

*this = beta * *this + alpha * diag(v) * M [or M^T].

The same as adding M but scaling each row M_i by v(i).

Definition at line 1369 of file cu-matrix.cc.

References CU2DBLOCK, CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by kaldi::nnet3::attention::ApplyScalesToInput(), kaldi::nnet3::attention::ApplyScalesToOutput(), HiddenSoftmax::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), MultiBasisComponent::PropagateFnc(), and kaldi::TestCuMatrixAddDiagVecMat().

1372  {
1373 #if HAVE_CUDA == 1
1374  if (CuDevice::Instantiate().Enabled()) {
1375  if (transM == kNoTrans) {
1376  KALDI_ASSERT(SameDim(*this, M));
1377  } else {
1378  KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
1379  }
1380  KALDI_ASSERT(v.Dim() == this->NumRows());
1381 
1382  CuTimer tim;
1383  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1384  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
1385  n_blocks(num_rows_, CU2DBLOCK));
1386  MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
1387  if (transM == kTrans)
1388  std::swap(M_row_stride, M_col_stride);
1389  cuda_add_diag_vec_mat(dimGrid, dimBlock, alpha, data_, Dim(),
1390  v.Data(), M.Data(), M_row_stride, M_col_stride, beta);
1391  CU_SAFE_CALL(cudaGetLastError());
1392  CuDevice::Instantiate().AccuProfile(__func__, tim);
1393  } else
1394 #endif
1395  {
1396  Mat().AddDiagVecMat(alpha, v.Vec(), M.Mat(), transM, beta);
1397  }
1398 }
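
The listing above handles the transposed case by swapping the row and column strides of M instead of materialising the transpose. The host-side sketch below shows the same idea on densely packed row-major vectors. AddDiagVecMatRef and its parameter names are hypothetical, and the code is a plain reference loop under those assumptions rather than a copy of the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// *out = beta * *out + alpha * diag(v) * op(M), where op(M) is M or M^T.
// All matrices are row-major and densely packed. Hypothetical reference code.
void AddDiagVecMatRef(std::vector<float> *out, int out_rows, int out_cols,
                      float alpha, const std::vector<float> &v,
                      const std::vector<float> &M, int m_rows, int m_cols,
                      bool transpose_m, float beta) {
  assert(M.size() == static_cast<std::size_t>(m_rows) * m_cols);
  assert(out->size() == static_cast<std::size_t>(out_rows) * out_cols);
  assert(static_cast<int>(v.size()) == out_rows);
  int row_stride = m_cols, col_stride = 1;
  if (transpose_m) {
    assert(m_rows == out_cols && m_cols == out_rows);
    std::swap(row_stride, col_stride);  // walk M as if it were transposed
  } else {
    assert(m_rows == out_rows && m_cols == out_cols);
  }
  for (int r = 0; r < out_rows; ++r) {
    for (int c = 0; c < out_cols; ++c) {
      const float m = M[static_cast<std::size_t>(r) * row_stride +
                        static_cast<std::size_t>(c) * col_stride];
      float &o = (*out)[static_cast<std::size_t>(r) * out_cols + c];
      o = beta * o + alpha * v[r] * m;  // row r of op(M) is scaled by v[r]
    }
  }
}
```

After the swap, stepping one output column moves a whole row forward in M, so the inner loop reads M(c, r) without any copy.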
void AddElements ( Real  alpha,
const std::vector< MatrixElement< Real > > &  input 
)

Definition at line 3244 of file cu-matrix.cc.

References CU1DBLOCK, data_, rnnlm::i, and KALDI_ASSERT.

Referenced by OnlinePreconditioner::InitOrthonormalSpecial(), OnlineNaturalGradient::InitOrthonormalSpecial(), DiscriminativeComputation::ProcessPosteriors(), and kaldi::UnitTestCuMatrixAddElements().

3245  {
3246  // Checks the dimension.
3247  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3248  for (int32 i = 0; i < input.size(); ++i) {
3249  KALDI_ASSERT(input[i].row < num_rows && input[i].row >= 0 &&
3250  input[i].column < num_cols && input[i].column >= 0);
3251  }
3252 #if HAVE_CUDA == 1
3253  if (CuDevice::Instantiate().Enabled()) {
3254  void *addr = CuDevice::Instantiate().Malloc(input.size() * sizeof(MatrixElement<Real>));
3255  CU_SAFE_CALL(cudaMemcpy(addr, input.data(),
3256  input.size() * sizeof(MatrixElement<Real>),
3257  cudaMemcpyHostToDevice));
3258 
3259  CuTimer tim;
3260  int dimBlock(CU1DBLOCK);
3261  int dimGrid(n_blocks(input.size(), CU1DBLOCK));
3262 
3263  cuda_matrix_add_elements(dimGrid, dimBlock, this->data_, this->Dim(),
3264  alpha, (MatrixElement<Real>*)addr, input.size());
3265  CU_SAFE_CALL(cudaGetLastError());
3266  CuDevice::Instantiate().Free(addr);
3267  CuDevice::Instantiate().AccuProfile(__func__, tim);
3268  } else
3269 #endif
3270  {
3271  for (int32 i = 0; i < input.size(); i++) {
3272  (*this)(input[i].row, input[i].column) += alpha * input[i].weight;
3273  }
3274  }
3275 }
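
The CPU branch of the listing above amounts to a scatter-add over (row, column, weight) triples, preceded by a bounds check on every entry. A self-contained sketch of that behaviour follows. Element stands in for MatrixElement<Real>, and both it and AddElementsRef are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counterpart of MatrixElement<Real>: one sparse entry.
struct Element {
  int row;
  int column;
  float weight;
};

// Validates all indices first, then applies
// mat(row, column) += alpha * weight for each entry, in order.
void AddElementsRef(std::vector<float> *mat, int num_rows, int num_cols,
                    float alpha, const std::vector<Element> &input) {
  assert(mat->size() == static_cast<std::size_t>(num_rows) * num_cols);
  for (const Element &e : input) {
    assert(e.row >= 0 && e.row < num_rows &&
           e.column >= 0 && e.column < num_cols);
  }
  for (const Element &e : input) {
    (*mat)[static_cast<std::size_t>(e.row) * num_cols + e.column] +=
        alpha * e.weight;
  }
}
```

In this host sketch, entries that name the same position simply accumulate. Whether the GPU kernel gives the same guarantee for repeated positions is not stated on this page.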
void AddElements ( Real  alpha,
const CuArrayBase< Int32Pair > &  indexes,
const Real *  input 
)

Definition at line 3278 of file cu-matrix.cc.

References CU1DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), Int32Pair::first, rnnlm::i, KALDI_ASSERT, kaldi::kUndefined, and Int32Pair::second.

3279  {
3280  if (indexes.Dim() == 0) return;
3281  KALDI_ASSERT(input != NULL);
3282 
3283 #if HAVE_CUDA == 1
3284  if (CuDevice::Instantiate().Enabled()) {
3285  CuTimer tim;
3286  CuVector<Real> tmp_vec(indexes.Dim(), kUndefined);
3287  CU_SAFE_CALL(cudaMemcpy(tmp_vec.Data(), input, indexes.Dim() * sizeof(Real),
3288  cudaMemcpyHostToDevice));
3289 
3290  int dimBlock(CU1DBLOCK);
3291  int dimGrid = n_blocks(indexes.Dim(), CU1DBLOCK);
3292  cuda_matrix_add_indexed_values(dimGrid, dimBlock, this->Dim(), alpha,
3293  indexes.Data(), tmp_vec.Data(), indexes.Dim(), this->data_);
3294  CU_SAFE_CALL(cudaGetLastError());
3295  CuDevice::Instantiate().AccuProfile(__func__, tim);
3296  } else
3297 #endif
3298  {
3299  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3300  const Int32Pair *index = indexes.Data();
3301  for (int32 i = 0; i < indexes.Dim(); i++) {
3302  KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
3303  index[i].second < num_cols && index[i].second >= 0);
3304  (*this)(index[i].first, index[i].second) += alpha * input[i];
3305  }
3306  }
3307 }
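
Both AddElements listings size their one-dimensional launch with n_blocks(count, CU1DBLOCK). The sketch below shows the usual round-up division such a helper performs and why each simulated thread still needs a bounds check. NumBlocks and CountActiveThreads are hypothetical, the real n_blocks is not shown on this page, and the block size of 256 used in the example is an assumed value chosen for illustration.

```cpp
#include <cassert>

// Smallest number of blocks of `block_size` threads that covers `size`
// work items (round-up division). Hypothetical helper.
constexpr int NumBlocks(int size, int block_size) {
  return (size + block_size - 1) / block_size;
}

// Simulates a 1-D launch on the host: every "thread" computes its global
// index and does work only if that index is inside the data.
int CountActiveThreads(int size, int block_size) {
  int active = 0;
  const int grid = NumBlocks(size, block_size);
  for (int block = 0; block < grid; ++block) {
    for (int thread = 0; thread < block_size; ++thread) {
      if (block * block_size + thread < size) ++active;
    }
  }
  return active;
}
```

For 1000 items and 256 threads per block, four blocks are launched (1024 threads), and the bounds check keeps the last 24 threads idle.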
void AddMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A

Definition at line 941 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by RestrictedAttentionComponent::Add(), TimeHeightConvolutionComponent::Add(), RepeatedAffineComponent::Add(), NaturalGradientAffineComponent::Add(), AffineComponent::Add(), LinearComponent::Add(), BlockAffineComponent::Add(), Convolutional1dComponent::Add(), ConvolutionComponent::Add(), LstmNonlinearityComponent::Add(), CuRand< Real >::AddGaussNoise(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApproxEqual(), kaldi::nnet3::attention::AttentionBackward(), kaldi::nnet3::attention::AttentionForward(), SigmoidComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Splice::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), DiscriminativeComputation::Compute(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), NnetComputer::ExecuteCommand(), TimeHeightConvolutionComponent::PerturbParams(), RepeatedAffineComponent::PerturbParams(), AffineComponent::PerturbParams(), LinearComponent::PerturbParams(), BlockAffineComponent::PerturbParams(), Convolutional1dComponent::PerturbParams(), ConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), AdditiveNoiseComponent::Propagate(), Rbm::RbmUpdate(), ClipGradientComponent::RepairGradients(), RestrictedAttentionComponent::StoreStats(), kaldi::nnet3::attention::TestAttentionForwardBackward(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestLstmNonlinearity(), kaldi::nnet3::UnitTestNnetInputDerivatives(), LinearTransform::Update(), AffineTransform::Update(), 
RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), LstmProjected::Update(), BlstmProjected::Update(), Convolutional1dComponent::Update(), ConvolutionComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

942  {
943 
944 #if HAVE_CUDA == 1
945  if (CuDevice::Instantiate().Enabled()) {
946  if (transA == kNoTrans) {
947  KALDI_ASSERT(A.NumRows() == num_rows_ && A.NumCols() == num_cols_);
948  } else {
949  KALDI_ASSERT(A.NumCols() == num_rows_ && A.NumRows() == num_cols_);
950  }
951  if (num_rows_ == 0) return;
952  CuTimer tim;
953  // This block dimension seems to work better than the
954  // one from GetBlockSizesForSimpleMatrixOperation().
955  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
956  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
957  n_blocks(NumRows(), CU2DBLOCK));
958  cuda_add_mat(dimGrid, dimBlock, alpha, A.data_,
959  data_, Dim(), A.Stride(),
960  (transA == kTrans ? 1 : 0));
961  CU_SAFE_CALL(cudaGetLastError());
962 
963  CuDevice::Instantiate().AccuProfile(__func__, tim);
964  } else
965 #endif
966  {
967  Mat().AddMat(alpha, A.Mat(), transA);
968  }
969 }
void AddMatBlock ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuBlockMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

This function is like AddMatMat but for where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks).
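As a rough illustration of the block-diagonal product described above, the following CPU-only sketch walks over the diagonal blocks of B and multiplies each one against the matching column range of A. The `Dense` struct and `AddMatBlockSketch` are hypothetical stand-ins, not Kaldi types, and transposition is left out for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical row-major dense matrix used only for this sketch; not a Kaldi type.
struct Dense {
  int rows, cols;
  std::vector<float> data;
  Dense(int r, int c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float &operator()(int r, int c) { return data[r * cols + c]; }
  float operator()(int r, int c) const { return data[r * cols + c]; }
};

// C = alpha * A * B + beta * C, with B block-diagonal and passed as the list
// of its diagonal blocks. Transposition is omitted to keep the sketch short.
void AddMatBlockSketch(float alpha, const Dense &A,
                       const std::vector<Dense> &blocks, float beta, Dense *C) {
  int b_rows = 0, b_cols = 0;
  for (const Dense &blk : blocks) {
    b_rows += blk.rows;
    b_cols += blk.cols;
  }
  assert(C->rows == A.rows && C->cols == b_cols && A.cols == b_rows);

  int row_offset = 0, col_offset = 0;  // position of the current block in B
  for (const Dense &blk : blocks) {
    for (int r = 0; r < C->rows; r++) {
      for (int c = 0; c < blk.cols; c++) {
        float sum = 0.0f;
        for (int k = 0; k < blk.rows; k++)
          sum += A(r, row_offset + k) * blk(k, c);
        float &out = (*C)(r, col_offset + c);
        out = alpha * sum + beta * out;
      }
    }
    row_offset += blk.rows;
    col_offset += blk.cols;
  }
}
```

Because everything outside the diagonal blocks of B is zero, each output column only needs the columns of A that line up with the rows of its block.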

Definition at line 3172 of file cu-matrix.cc.

References A, CuMatrixBase< Real >::AddMatMat(), CuBlockMatrix< Real >::Block(), CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and kaldi::swap().

Referenced by kaldi::UnitTestCuBlockMatrixAddMatBlock().

3176  {
3177  // Check dimensions
3178  int32 A_num_rows = A.NumRows(), A_num_cols = A.NumCols(),
3179  A_row_stride = A.Stride(), A_col_stride = 1,
3180  B_num_rows = B.NumRows(), B_num_cols = B.NumCols();
3181  if (transA == kTrans) {
3182  std::swap(A_num_rows, A_num_cols);
3183  std::swap(A_row_stride, A_col_stride);
3184  }
3185  if (transB == kTrans) {
3186  std::swap(B_num_rows, B_num_cols);
3187  }
3188  // At this point the {A,B}_{rows,cols} variables are
3189  // after any transposition.
3190  KALDI_ASSERT(NumRows() == A_num_rows && NumCols() == B_num_cols);
3191  KALDI_ASSERT(A_num_cols == B_num_rows);
3192  int32 B_num_blocks = B.NumBlocks();
3193 
3194  if (num_rows_ == 0) return;
3195 #if HAVE_CUDA == 1
3196  if (CuDevice::Instantiate().Enabled()) {
3197  CuTimer tim;
3198  MatrixDim this_dim = Dim();
3199 
3200  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
3201  // (x,y) indices will be (row of *this, block of B)
3202  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
3203  n_blocks(B_num_blocks, CU2DBLOCK));
3204 
3205  // caution: the use of x as the row-index is not good, but
3206  // this code is not much used, so I'm not updating it.
3207  cuda_add_mat_blockmat(dimGrid, dimBlock, data_, this_dim, A.Data(),
3208  A_num_rows, A_num_cols, A_row_stride, A_col_stride,
3209  B.CuData(), B_num_blocks, alpha, beta,
3210  (transB == kTrans ? 1 : 0));
3211 
3212  CU_SAFE_CALL(cudaGetLastError());
3213 
3214  CuDevice::Instantiate().AccuProfile(__func__, tim);
3215  } else
3216 #endif
3217  {
3218  // "row_offset" and "col_offset" are offsets into B (or into B^T, if
3219  // transB == kTrans).
3220  int32 row_offset = 0, col_offset = 0;
3221  for (int32 b = 0; b < B_num_blocks; b++) {
3222  const CuSubMatrix<Real> this_block = B.Block(b);
3223  int32 this_num_rows = this_block.NumRows(),
3224  this_num_cols = this_block.NumCols();
3225  if (transB == kTrans) std::swap(this_num_rows, this_num_cols);
3226  CuSubMatrix<Real> this_part(*this, 0, num_rows_,
3227  col_offset, this_num_cols);
3228  CuSubMatrix<Real> A_part = (transA == kNoTrans ?
3229  CuSubMatrix<Real>(A, 0, num_rows_,
3230  row_offset, this_num_rows) :
3231  CuSubMatrix<Real>(A, row_offset, this_num_rows,
3232  0, num_rows_));
3233  this_part.AddMatMat(alpha, A_part, transA, this_block, transB, beta);
3234  row_offset += this_num_rows;
3235  col_offset += this_num_cols;
3236  }
3237  // Note: the values being compared below are all after applying any
3238  // transposition to B.
3239  KALDI_ASSERT(row_offset == B_num_rows && col_offset == B_num_cols);
3240  }
3241 }
void AddMatBlocks ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

This function is like AddMat (it does *this += alpha * src), except that it supports cases where *this and src have different dimension.

There are two allowed cases:

(1) *this is larger than src; we do a broadcasting operation. *this must have NumRows() == a * src.NumRows() and NumCols() == b * src.NumCols() for integer a >= 1, b >= 1. *this will be treated as being made up of blocks with the same size as src, and to each block we'll add alpha * src. This case does not support trans == kTrans.

(2) *this is smaller than src; we sum. src.NumRows() must == a * this->NumRows(), and src.NumCols() must == b * this->NumCols(), for a >= 1, b >= 1. In this case, src will be treated as being made up of blocks with the same size as *this, and to *this we will add the summation of all of those blocks.
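The two cases above can be sketched on plain row-major buffers as follows. `Mat` and `AddMatBlocksSketch` are hypothetical names used only for illustration, and the sketch covers the non-transposed operation only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical row-major matrix; a stand-in for illustration, not a Kaldi class.
struct Mat {
  int rows, cols;
  std::vector<float> v;
  Mat(int r, int c, float fill = 0.0f) : rows(r), cols(c), v(r * c, fill) {}
  float &at(int r, int c) { return v[r * cols + c]; }
  float at(int r, int c) const { return v[r * cols + c]; }
};

// dst += alpha * src, where the two matrices may differ in size by integer factors.
void AddMatBlocksSketch(float alpha, const Mat &src, Mat *dst) {
  if (dst->rows == 0 || dst->cols == 0) return;
  if (src.rows >= dst->rows && src.cols >= dst->cols) {
    // Case (2): src is made up of blocks the size of dst; sum all of them into dst.
    assert(src.rows % dst->rows == 0 && src.cols % dst->cols == 0);
    for (int r = 0; r < src.rows; r++)
      for (int c = 0; c < src.cols; c++)
        dst->at(r % dst->rows, c % dst->cols) += alpha * src.at(r, c);
  } else {
    // Case (1): dst is made up of blocks the size of src; add src to each block.
    assert(src.rows > 0 && src.cols > 0);
    assert(dst->rows % src.rows == 0 && dst->cols % src.cols == 0);
    for (int r = 0; r < dst->rows; r++)
      for (int c = 0; c < dst->cols; c++)
        dst->at(r, c) += alpha * src.at(r % src.rows, c % src.cols);
  }
}
```

A mixed shape (src taller but narrower than *this, or the reverse) falls into the broadcasting branch and fails its divisibility check, which matches the "Invalid sizes of arguments" error path in the listing below.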

Definition at line 1106 of file cu-matrix.cc.

References MatrixBase< Real >::AddMat(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), rnnlm::i, rnnlm::j, KALDI_ASSERT, KALDI_ERR, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), MatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumBlockComponent::Backprop(), SumBlockComponent::Propagate(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), Convolutional1dComponent::Update(), and ConvolutionComponent::Update().

1107  {
1108  if (num_rows_ == 0 || num_cols_ == 0) return;
1109 
1110  if (A.NumRows() >= (transA == kNoTrans ? num_rows_ : num_cols_) &&
1111  A.NumCols() >= (transA == kNoTrans ? num_cols_ : num_rows_)) {
1112  // This is the "summing", not broadcasting, version of AddMatBlocks.
1113  // It supports both regular and transposed operation.
1114  int32 num_row_blocks, num_col_blocks;
1115  if (transA == kNoTrans) {
1116  KALDI_ASSERT(A.NumRows() % num_rows_ == 0 && A.NumCols() % num_cols_ == 0);
1117  num_row_blocks = A.Mat().NumRows() / num_rows_;
1118  num_col_blocks = A.Mat().NumCols() / num_cols_;
1119  } else {
1120  KALDI_ASSERT(A.NumRows() % num_cols_ == 0 && A.NumCols() % num_rows_ == 0);
1121  num_row_blocks = A.Mat().NumRows() / num_cols_;
1122  num_col_blocks = A.Mat().NumCols() / num_rows_;
1123  }
1124 #if HAVE_CUDA == 1
1125  if (CuDevice::Instantiate().Enabled()) {
1126  CuTimer tim;
1127  dim3 dimGrid, dimBlock;
1128  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1129  &dimGrid, &dimBlock);
1130  cuda_add_mat_blocks(dimGrid, dimBlock, alpha, A.data_, num_row_blocks,
1131  num_col_blocks, data_, Dim(), A.Stride(),
1132  (transA == kTrans ? 1 : 0));
1133  CU_SAFE_CALL(cudaGetLastError());
1134 
1135  CuDevice::Instantiate().AccuProfile(__func__, tim);
1136  } else
1137 #endif
1138  {
1139  int32 nr, nc;
1140  if (transA == kNoTrans) {
1141  nr = num_rows_;
1142  nc = num_cols_;
1143  } else {
1144  nr = num_cols_;
1145  nc = num_rows_;
1146  }
1147  for (int32 i = 0; i < num_row_blocks; i++) {
1148  for (int32 j = 0; j < num_col_blocks; j++) {
1149  Mat().AddMat(alpha, SubMatrix<Real>(A.Mat(), i * nr, nr, j * nc, nc),
1150  transA);
1151  }
1152  }
1153  }
1154  } else {
1155  // This is the "broadcasting" version of AddMatBlocks, where
1156  // *this is larger than src.
1157  if (transA != kNoTrans)
1158  KALDI_ERR << "Transposed operation not supported currently.";
1159  if (!(num_rows_ % A.NumRows() == 0 && num_cols_ % A.NumCols() == 0))
1160  KALDI_ERR << "Invalid sizes of arguments";
1161 #if HAVE_CUDA == 1
1162  if (CuDevice::Instantiate().Enabled()) {
1163  CuTimer tim;
1164  dim3 dimGrid, dimBlock;
1165  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1166  &dimGrid, &dimBlock);
1167  cuda_add_mat_repeated(dimGrid, dimBlock, alpha,
1168  A.data_, A.Dim(), data_, Dim());
1169  CU_SAFE_CALL(cudaGetLastError());
1170  CuDevice::Instantiate().AccuProfile(__func__, tim);
1171  } else
1172 #endif
1173  {
1174  const MatrixBase<Real> &src_mat = A.Mat(),
1175  &this_mat = this->Mat();
1176  for (int32 row_offset = 0; row_offset < NumRows();
1177  row_offset += src_mat.NumRows()) {
1178  for (int32 col_offset = 0; col_offset < NumCols();
1179  col_offset += src_mat.NumCols()) {
1180  SubMatrix<Real> this_part(this_mat,
1181  row_offset, src_mat.NumRows(),
1182  col_offset, src_mat.NumCols());
1183  this_part.AddMat(alpha, src_mat);
1184  }
1185  }
1186  }
1187  }
1188 }
void AddMatDiagVec ( const Real  alpha,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
CuVectorBase< Real > &  v,
Real  beta = 1.0 
)

Definition at line 1402 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by BatchNormComponent::Backprop(), SigmoidComponent::RepairGradients(), and TanhComponent::RepairGradients().

1406  {
1407 #if HAVE_CUDA == 1
1408  if (CuDevice::Instantiate().Enabled()) {
1409  if (transM == kNoTrans) {
1410  KALDI_ASSERT(SameDim(*this, M));
1411  } else {
1412  KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
1413  }
1414  KALDI_ASSERT(v.Dim() == this->NumCols());
1415 
1416  CuTimer tim;
1417  dim3 dimGrid, dimBlock;
1418  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1419  &dimGrid, &dimBlock);
1420  MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
1421  if (transM == kTrans) std::swap(M_row_stride, M_col_stride);
1422  cuda_add_mat_diag_vec(dimGrid, dimBlock, alpha, data_, Dim(),
1423  M.Data(), M_row_stride, M_col_stride, v.Data(), beta);
1424  CU_SAFE_CALL(cudaGetLastError());
1425  CuDevice::Instantiate().AccuProfile(__func__, tim);
1426  } else
1427 #endif
1428  {
1429  Mat().AddMatDiagVec(alpha, M.Mat(), transM, v.Vec(), beta);
1430  }
1431 }
void AddMatMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

C = alpha * A(^T)*B(^T) + beta * C.
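The listing below swaps A and B when calling the column-major GEMM. A minimal sketch of why that works, assuming no transposition: a row-major buffer read as column-major is the transpose, so computing C^T = B^T * A^T in column-major order leaves C = A * B in the row-major buffer. `ColMajorGemm` here is a hypothetical reference routine, not the cuBLAS API.

```cpp
#include <cassert>
#include <vector>

// Minimal column-major GEMM without transposition: Z = alpha * X * Y + beta * Z,
// where X is m x k (leading dimension ldx), Y is k x n (ldy), Z is m x n (ldz).
// A stand-in for a BLAS-style routine, written only for this sketch.
void ColMajorGemm(int m, int n, int k, float alpha,
                  const float *X, int ldx, const float *Y, int ldy,
                  float beta, float *Z, int ldz) {
  for (int j = 0; j < n; j++) {
    for (int i = 0; i < m; i++) {
      float sum = 0.0f;
      for (int p = 0; p < k; p++) sum += X[i + p * ldx] * Y[p + j * ldy];
      Z[i + j * ldz] = alpha * sum + beta * Z[i + j * ldz];
    }
  }
}

// Row-major C (a_rows x b_cols) = alpha * A * B + beta * C.
void RowMajorAddMatMat(float alpha,
                       const std::vector<float> &A, int a_rows, int a_cols,
                       const std::vector<float> &B, int b_rows, int b_cols,
                       float beta, std::vector<float> *C) {
  assert(a_cols == b_rows);
  assert(static_cast<int>(C->size()) == a_rows * b_cols);
  int m = b_cols, n = a_rows, k = a_cols;
  // Pass B first and A second: in column-major terms this is C^T = B^T * A^T.
  ColMajorGemm(m, n, k, alpha, B.data(), b_cols, A.data(), a_cols,
               beta, C->data(), b_cols);
}
```

The row strides of the row-major matrices play the role of the column-major leading dimensions, which is why the listing passes B.Stride(), A.Stride() and Stride() in those positions.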

Definition at line 1278 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, kaldi::kTrans, CuMatrixBase< Real >::Mat(), rnnlm::n, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by CuMatrixBase< Real >::AddMatBlock(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< BaseFloat >::AddMatSp(), CuMatrixBase< BaseFloat >::AddMatTp(), CuMatrixBase< BaseFloat >::AddSpMat(), CuMatrixBase< BaseFloat >::AddTpMat(), RepeatedAffineComponent::Backprop(), AffineComponent::Backprop(), LinearComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), LinearTransform::BackpropagateFnc(), AffineTransform::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), ModelCollapser::CollapseComponentsAffine(), AffineComponent::CollapseWithNext(), AffineComponent::CollapseWithPrevious(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::CuVectorUnitTestAddDiagMatMat(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditioner::PreconditionDirectionsInternal(), OnlineNaturalGradient::PreconditionDirectionsInternal(), AffineComponent::Propagate(), LinearComponent::Propagate(), DctComponent::Propagate(), FixedLinearComponent::Propagate(), FixedAffineComponent::Propagate(), KlHmm::PropagateFnc(), LinearTransform::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), Rbm::RbmUpdate(), Rbm::Reconstruct(), OnlinePreconditioner::ReorthogonalizeXt1(), OnlineNaturalGradient::ReorthogonalizeXt1(), kaldi::TestCuMatrixMatMat(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixSymAddMat2(), 
kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuSpMatrixInvert(), LinearTransform::Update(), AffineTransform::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), AffineComponent::UpdateSimple(), and BlockAffineComponent::UpdateSimple().

1280  {
1281 
1282 
1283  // CUBLAS is col-major, cudamatrix is row-major, how to do the mapping?
1284  // keep trans..., just swap A&B matrices: A->B B->A
1285  MatrixIndexT m = ((transB==kTrans)? B.NumRows() : B.NumCols());
1286  MatrixIndexT n = ((transA==kTrans)? A.NumCols() : A.NumRows());
1287  MatrixIndexT k = ((transB==kTrans)? B.NumCols() : B.NumRows());
1288  MatrixIndexT k1 = ((transA==kTrans)? A.NumRows() : A.NumCols());
1289 
1290  KALDI_ASSERT(m == NumCols());
1291  KALDI_ASSERT(n == NumRows());
1292  KALDI_ASSERT(k == k1);
1293 
1294  if (m == 0) return;
1295 
1296 
1297 #if HAVE_CUDA == 1
1298  if (CuDevice::Instantiate().Enabled()) {
1299  CuTimer tim;
1300  CUBLAS_SAFE_CALL(cublas_gemm(GetCublasHandle(),
1301  (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1302  (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1303  m, n, k, alpha, B.data_, B.Stride(),
1304  A.data_, A.Stride(), beta, data_, Stride()));
1305 
1306  CuDevice::Instantiate().AccuProfile(__func__, tim);
1307  } else
1308 #endif
1309  {
1310  Mat().AddMatMat(alpha, A.Mat(), transA, B.Mat(), transB, beta);
1311  }
1312 }
void AddMatMatElements ( const Real  alpha,
const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
const Real  beta 
)

*this = beta * *this + alpha * A .* B (.* element by element multiplication)
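For reference, the element-wise update can be written on flat buffers as in this small sketch. `AddMatMatElementsSketch` is a hypothetical helper, and it assumes all three matrices share the same dimensions and are stored contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C[i] = beta * C[i] + alpha * A[i] * B[i] for every element i.
void AddMatMatElementsSketch(float alpha, const std::vector<float> &A,
                             const std::vector<float> &B, float beta,
                             std::vector<float> *C) {
  assert(A.size() == C->size() && B.size() == C->size());
  for (std::size_t i = 0; i < C->size(); i++)
    (*C)[i] = beta * (*C)[i] + alpha * A[i] * B[i];
}
```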

Definition at line 1434 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

1435  {
1436 #if HAVE_CUDA == 1
1437  if (CuDevice::Instantiate().Enabled()) {
1438  KALDI_ASSERT(SameDim(*this, A) && SameDim(A, B));
1439  CuTimer tim;
1440  dim3 dimGrid, dimBlock;
1441  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1442  &dimGrid, &dimBlock);
1443  cuda_add_mat_mat_elements(dimGrid, dimBlock, this->data_, A.Data(),
1444  B.Data(), Dim(), A.Stride(), B.Stride(), alpha, beta);
1445  CuDevice::Instantiate().AccuProfile(__func__, tim);
1446  } else
1447 #endif
1448  {
1449  Mat().AddMatMatElements(alpha, A.Mat(), B.Mat(), beta);
1450  }
1451 }
void AddMatSmat ( Real  alpha,
const CuMatrixBase< Real > &  A,
const CuSparseMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

(*this) = alpha * A * op(B) + beta * (*this), where B is sparse and op(B) is either B or trans(B) depending on the 'transB' argument.

This is multiplication of a dense by a sparse matrix. See also AddSmatMat.
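A CPU-only sketch of the dense-times-sparse product, assuming B is held in CSR form. The `Csr` struct and `AddMatSmatSketch` are hypothetical and only mirror the formula above; they are not the cuSPARSE or Kaldi interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR container for this sketch: row_ptr has rows + 1 entries.
struct Csr {
  int rows, cols;
  std::vector<int> row_ptr, col_idx;
  std::vector<float> val;
};

// C = alpha * A * op(B) + beta * C, with A and C dense row-major.
void AddMatSmatSketch(float alpha, const std::vector<float> &A, int a_rows,
                      int a_cols, const Csr &B, bool trans_b, float beta,
                      std::vector<float> *C) {
  int c_cols = trans_b ? B.rows : B.cols;
  assert(a_cols == (trans_b ? B.cols : B.rows));
  assert(static_cast<int>(C->size()) == a_rows * c_cols);
  for (float &x : *C) x *= beta;
  // Visit only the stored entries of B; entry (i, j) of B is entry
  // (k, c) of op(B), with the roles of i and j swapped when transposing.
  for (int i = 0; i < B.rows; i++) {
    for (int p = B.row_ptr[i]; p < B.row_ptr[i + 1]; p++) {
      int j = B.col_idx[p];
      int k = trans_b ? j : i, c = trans_b ? i : j;
      for (int r = 0; r < a_rows; r++)
        (*C)[r * c_cols + c] += alpha * A[r * a_cols + k] * B.val[p];
    }
  }
}
```

The cost scales with the number of stored entries of B times the number of rows of A, which is the point of keeping B sparse.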

Definition at line 1067 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CuMatrixBase< Real >::Data(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuSparseMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuSparseMatrix< Real >::Smat(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTextCuMatrixAddMatSmat().

1069  {
1070 #if HAVE_CUDA == 1
1071  if (CuDevice::Instantiate().Enabled()) {
1072  if (transB == kNoTrans) {
1073  KALDI_ASSERT(NumRows() == A.NumRows());
1074  KALDI_ASSERT(NumCols() == B.NumCols());
1075  KALDI_ASSERT(A.NumCols() == B.NumRows());
1076  } else {
1077  KALDI_ASSERT(NumRows() == A.NumRows());
1078  KALDI_ASSERT(NumCols() == B.NumRows());
1079  KALDI_ASSERT(A.NumCols() == B.NumCols());
1080  }
1081 
1082  CuTimer tim;
1083 
1084  cusparseMatDescr_t descr;
1085  CUSPARSE_SAFE_CALL(cusparseCreateMatDescr(&descr));
1086  CU_SAFE_CALL(
1087  cusparse_csrmm(
1088  GetCusparseHandle(),
1089  transB == kNoTrans ?
1090  CUSPARSE_OPERATION_TRANSPOSE : CUSPARSE_OPERATION_NON_TRANSPOSE,
1091  B.NumRows(), NumRows(), B.NumCols(), B.NumElements(), &alpha, descr,
1092  B.CsrVal(), B.CsrRowPtr(), B.CsrColIdx(), A.Data(), A.Stride(),
1093  &beta, Data(), Stride()));
1094  CUSPARSE_SAFE_CALL(cusparseDestroyMatDescr(descr));
1095 
1096  CuDevice::Instantiate().AccuProfile(__func__, tim);
1097  } else
1098 #endif
1099  {
1100  Mat().AddMatSmat(alpha, A.Mat(), B.Smat(), transB, beta);
1101  }
1102 }
void AddMatSp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuSpMatrix< Real > &  B,
const Real  beta 
)
inline

this <-- beta*this + alpha*A*B

Definition at line 557 of file cu-matrix.h.

560  {
561  CuMatrix<Real> M(B);
562  return AddMatMat(alpha, A, transA, M, kNoTrans, beta);
563  }
void AddMatTp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuTpMatrix< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

this <-- beta*this + alpha*A*B.

Definition at line 584 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddMatTp().

587  {
588  CuMatrix<Real> M(B);
589  return AddMatMat(alpha, A, transA, M, transB, beta);
590  }
void AddRowRanges ( const CuMatrixBase< Real > &  src,
const CuArrayBase< Int32Pair > &  indexes 
)

For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

In general indexes must be >= 0 and < src.NumRows(); but to represent an empty range you may use the pair (-1, -1) or any pair of numbers (i, j) such that i >= j.
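The range semantics can be sketched as follows, with `std::pair<int, int>` standing in for Int32Pair. `AddRowRangesSketch` is a hypothetical helper; allowing `second` to equal the number of source rows is an assumption made here because the end of the range is exclusive.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// For each row r of dst: dst(r, c) += sum of src(j, c) for j in [first, second).
void AddRowRangesSketch(const std::vector<float> &src, int src_rows, int num_cols,
                        const std::vector<std::pair<int, int> > &indexes,
                        std::vector<float> *dst) {
  int dst_rows = static_cast<int>(indexes.size());
  assert(static_cast<int>(src.size()) == src_rows * num_cols);
  assert(static_cast<int>(dst->size()) == dst_rows * num_cols);
  for (int r = 0; r < dst_rows; r++) {
    int start = indexes[r].first, end = indexes[r].second;
    if (start >= end) continue;  // empty range, e.g. (-1, -1)
    assert(start >= 0 && end <= src_rows);  // assumption: end is exclusive
    for (int c = 0; c < num_cols; c++) {
      float sum = 0.0f;
      for (int j = start; j < end; j++) sum += src[j * num_cols + c];
      (*dst)[r * num_cols + c] += sum;
    }
  }
}
```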

Definition at line 2898 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuArrayBase< T >::Dim(), CuMatrixBase< Real >::Dim(), Int32Pair::first, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), Int32Pair::second, and CuMatrixBase< Real >::stride_.

Referenced by StatisticsPoolingComponent::Backprop(), NnetComputer::ExecuteCommand(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixAddRowRanges().

2899  {
2900  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2901  KALDI_ASSERT(src.NumCols() == NumCols());
2902  if (NumRows() == 0) return;
2903 #if HAVE_CUDA == 1
2904  if (CuDevice::Instantiate().Enabled()) {
2905  CuTimer tim;
2906  dim3 dimGrid, dimBlock;
2907  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2908  &dimGrid, &dimBlock);
2909  cuda_add_row_ranges(dimGrid, dimBlock,
2910  data_, Dim(), src.Data(), src.Dim(), indexes.Data());
2911  CU_SAFE_CALL(cudaGetLastError());
2912  CuDevice::Instantiate().AccuProfile(__func__, tim);
2913  } else
2914 #endif
2915  { // Implement here for the CPU..
2916  int32 num_rows = this->num_rows_, num_cols = this->num_cols_,
2917  this_stride = this->stride_, src_stride = src.stride_;
2918  Real *data = this->data_;
2919  const Real *src_data = src.data_;
2920  const Int32Pair *indexes_data = indexes.Data();
2921  for (int32 row = 0; row < num_rows; row++) {
2922  int32 start_row = indexes_data[row].first,
2923  end_row = indexes_data[row].second;
2924  for (int32 col = 0; col < num_cols; col++) {
2925  Real sum = 0.0;
2926  for (int32 src_row = start_row; src_row < end_row; src_row++)
2927  sum += src_data[src_row * src_stride + col];
2928  data[row * this_stride + col] += sum;
2929  }
2930  }
2931  }
2932 }
void AddRows ( Real  alpha,
const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Does for each row r, this.Row(r) += alpha * src.row(indexes[r]).

If indexes[r] < 0, does not add anything. src.NumCols() must equal this.NumCols()
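A minimal sketch of this indexed row addition on row-major buffers, including the rule that negative indexes are skipped. `AddRowsSketch` is a hypothetical name, not part of the library.

```cpp
#include <cassert>
#include <vector>

// dst.Row(r) += alpha * src.Row(indexes[r]); rows with a negative index are left alone.
void AddRowsSketch(float alpha, const std::vector<float> &src, int src_rows,
                   int num_cols, const std::vector<int> &indexes,
                   std::vector<float> *dst) {
  int dst_rows = static_cast<int>(indexes.size());
  assert(static_cast<int>(src.size()) == src_rows * num_cols);
  assert(static_cast<int>(dst->size()) == dst_rows * num_cols);
  for (int r = 0; r < dst_rows; r++) {
    int s = indexes[r];
    if (s < 0) continue;  // nothing is added for this row
    assert(s < src_rows);
    for (int c = 0; c < num_cols; c++)
      (*dst)[r * num_cols + c] += alpha * src[s * num_cols + c];
  }
}
```

The same source row may appear several times in `indexes`, so this acts as a gather followed by an accumulate.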

Definition at line 2733 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), and NnetComputer::ExecuteCommand().

2735  {
2736  if (NumRows() == 0) return;
2737 #if HAVE_CUDA == 1
2738  if (CuDevice::Instantiate().Enabled()) {
2739  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2740  KALDI_ASSERT(src.NumCols() == NumCols());
2741  CuTimer tim;
2742  dim3 dimGrid, dimBlock;
2743  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2744  &dimGrid, &dimBlock);
2745  cuda_add_rows(dimGrid, dimBlock, alpha,
2746  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
2747  CU_SAFE_CALL(cudaGetLastError());
2748  CuDevice::Instantiate().AccuProfile(__func__, tim);
2749  } else
2750 #endif
2751  {
2752  Mat().AddRows(alpha, src.Mat(), indexes.Data());
2753  }
2754 }
void AddRows ( Real  alpha,
const CuArrayBase< const Real * > &  src 
)

Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols().
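The pointer-based variant can be sketched the same way, with each `src[r]` pointing at the start of a row of `num_cols` floats. `AddRowsFromPointersSketch` is hypothetical, and skipping null pointers is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <vector>

// dst.Row(r) += alpha * (the num_cols floats starting at src[r]).
void AddRowsFromPointersSketch(float alpha,
                               const std::vector<const float *> &src,
                               int num_cols, std::vector<float> *dst) {
  int num_rows = static_cast<int>(src.size());
  assert(static_cast<int>(dst->size()) == num_rows * num_cols);
  for (int r = 0; r < num_rows; r++) {
    if (src[r] == nullptr) continue;  // assumption: null entries are skipped
    for (int c = 0; c < num_cols; c++)
      (*dst)[r * num_cols + c] += alpha * src[r][c];
  }
}
```

In the real class the pointers must refer to memory on the same device as the matrix (GPU memory when a GPU is in use), a constraint this host-only sketch does not model.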

Definition at line 2793 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

2793  {
2794  if (NumRows() == 0) return;
2795 #if HAVE_CUDA == 1
2796  if (CuDevice::Instantiate().Enabled()) {
2797  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2798  CuTimer tim;
2799  dim3 dimGrid, dimBlock;
2800  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2801  &dimGrid, &dimBlock);
2802  cuda_add_rows(dimGrid, dimBlock, alpha, data_, src.Data(), Dim());
2803  CU_SAFE_CALL(cudaGetLastError());
2804  CuDevice::Instantiate().AccuProfile(__func__, tim);
2805  } else
2806 #endif
2807  {
2808  Mat().AddRows(alpha, src.Data());
2809  }
2810 }
void AddSmat ( Real  alpha,
const CuSparseMatrix< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A.

Definition at line 972 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CU1DBLOCK, KALDI_ASSERT, kaldi::kNoTrans, CuSparseMatrix< Real >::NumCols(), CuSparseMatrix< Real >::NumRows(), and CuSparseMatrix< Real >::Smat().

Referenced by GeneralMatrix::AddToMat(), and kaldi::UnitTextCuMatrixAddSmat().

973  {
974 #if HAVE_CUDA == 1
975  if (CuDevice::Instantiate().Enabled()) {
976  if (trans == kNoTrans) {
977  KALDI_ASSERT(NumRows() == A.NumRows());
978  KALDI_ASSERT(NumCols() == A.NumCols());
979  } else {
980  KALDI_ASSERT(NumRows() == A.NumCols());
981  KALDI_ASSERT(NumCols() == A.NumRows());
982  }
983 
984  CuTimer tim;
985 
986  // We use warpSize threads per row to access only the nonzero elements.
987  // Every CU1DBLOCK/warpSize rows share one thread block.
988  // 1D grid to cover all rows of A.
989  const int warpSize = 32;
990  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
991  dim3 dimGrid(n_blocks(A.NumRows(), dimBlock.y));
992 
993  if (trans == kNoTrans) {
994  cuda_add_smat(dimGrid, dimBlock, Data(), Dim(), alpha, A.CsrRowPtr(),
995  A.CsrColIdx(), A.CsrVal());
996  } else {
997  cuda_add_smat_trans(dimGrid, dimBlock, Data(), Dim(), alpha,
998  A.CsrRowPtr(), A.CsrColIdx(), A.CsrVal());
999  }
1000 
1001  CU_SAFE_CALL(cudaGetLastError());
1002  CuDevice::Instantiate().AccuProfile(__func__, tim);
1003  } else
1004 #endif
1005  {
1006  Mat().AddSmat(alpha, A.Smat(), trans);
1007  }
1008 }
void AddSmatMat ( Real  alpha,
const CuSparseMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
Real  beta 
)

(*this) = alpha * op(A) * B + beta * (*this), where A is sparse.

Multiplication of sparse with dense matrix. See also AddMatSmat. Note: we recommend, for greatest efficiency, that transA be kNoTrans. Use AddMatSmat() for better efficiency, as 2 dense mat transpose ops are called in this API.

Definition at line 1011 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CuMatrixBase< Real >::Data(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuSparseMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuSparseMatrix< Real >::Smat(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTextCuMatrixAddSmatMat().

1013  {
1014 #if HAVE_CUDA == 1
1015  if (CuDevice::Instantiate().Enabled()) {
1016  if (transA == kNoTrans) {
1017  KALDI_ASSERT(NumRows() == A.NumRows());
1018  KALDI_ASSERT(NumCols() == B.NumCols());
1019  KALDI_ASSERT(A.NumCols() == B.NumRows());
1020  } else {
1021  KALDI_ASSERT(NumRows() == A.NumCols());
1022  KALDI_ASSERT(NumCols() == B.NumCols());
1023  KALDI_ASSERT(A.NumRows() == B.NumRows());
1024  }
1025 
1026  CuTimer tim;
1027 
1028  // We have op(A) and BT in col-major (B in row-major).
1029  // We first compute C in col-major (CT in row-major)
1030  // with C = op(A) * BT^T by cusparse_csrmm2,
1031  // then transpose CT to get C in row-major
1032  CuMatrix<Real> CT(*this, kTrans);
1033 
1034  cusparseMatDescr_t descr;
1035  CUSPARSE_SAFE_CALL(cusparseCreateMatDescr(&descr));
1036  if (transA == kTrans) {
1037  // Note: only op(A)=A is supported if op(B)=B^T according to cusparse doc
1038  // http://docs.nvidia.com/cuda/cusparse/index.html#cusparse-lt-t-gt-csrmm2
1039  CuSparseMatrix<Real> AT(A, kTrans);
1040  CU_SAFE_CALL(
1041  cusparse_csrmm2(GetCusparseHandle(), CUSPARSE_OPERATION_NON_TRANSPOSE,
1042  CUSPARSE_OPERATION_TRANSPOSE, AT.NumRows(),
1043  CT.NumRows(), AT.NumCols(), AT.NumElements(), &alpha,
1044  descr, AT.CsrVal(), AT.CsrRowPtr(), AT.CsrColIdx(),
1045  B.Data(), B.Stride(), &beta, CT.Data(), CT.Stride()));
1046  } else {
1047  CU_SAFE_CALL(
1048  cusparse_csrmm2(GetCusparseHandle(), CUSPARSE_OPERATION_NON_TRANSPOSE,
1049  CUSPARSE_OPERATION_TRANSPOSE, A.NumRows(),
1050  CT.NumRows(), A.NumCols(), A.NumElements(), &alpha,
1051  descr, A.CsrVal(), A.CsrRowPtr(), A.CsrColIdx(),
1052  B.Data(), B.Stride(), &beta, CT.Data(), CT.Stride()));
1053  }
1054  CUSPARSE_SAFE_CALL(cusparseDestroyMatDescr(descr));
1055 
1056  this->CopyFromMat(CT, kTrans);
1057 
1058  CuDevice::Instantiate().AccuProfile(__func__, tim);
1059  } else
1060 #endif
1061  {
1062  Mat().AddSmatMat(alpha, A.Smat(), transA, B.Mat(), beta);
1063  }
1064 }
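To make the formula (*this) = alpha * op(A) * B + beta * (*this) concrete, here is a minimal CPU-only sketch that applies it to a sparse matrix in CSR form and dense row-major matrices. The CsrMatrix struct and the AddSmatMatRef name are hypothetical and exist only for this illustration; they are not part of Kaldi, and the sketch does not reproduce the cuSPARSE path shown in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container used only for this sketch (not a Kaldi type).
struct CsrMatrix {
  int num_rows;
  int num_cols;
  std::vector<int> row_ptr;  // size num_rows + 1
  std::vector<int> col_idx;  // one entry per nonzero
  std::vector<float> val;    // one entry per nonzero
};

// C = alpha * op(A) * B + beta * C, where B and C are dense row-major
// matrices that both have b_cols columns, and op(A) is A or A^T.
void AddSmatMatRef(float alpha, const CsrMatrix &A, bool trans_a,
                   const std::vector<float> &B, int b_cols, float beta,
                   std::vector<float> *C) {
  const int c_rows = trans_a ? A.num_cols : A.num_rows;
  const int inner = trans_a ? A.num_rows : A.num_cols;
  assert(B.size() == static_cast<std::size_t>(inner) * b_cols);
  assert(C->size() == static_cast<std::size_t>(c_rows) * b_cols);
  for (float &c : *C) c *= beta;  // beta * C
  for (int i = 0; i < A.num_rows; ++i) {
    for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      const int j = A.col_idx[k];
      // Nonzero A(i, j) contributes to output row i (or row j if transposed).
      const int out_row = trans_a ? j : i;
      const int in_row = trans_a ? i : j;
      for (int c = 0; c < b_cols; ++c)
        (*C)[out_row * b_cols + c] += alpha * A.val[k] * B[in_row * b_cols + c];
    }
  }
}
```

The dimension checks mirror the KALDI_ASSERT conditions at the top of the listing: the rows of the result match the rows of op(A), and the inner dimensions of op(A) and B agree.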
void AddSpMat ( const Real  alpha,
const CuSpMatrix< Real > &  A,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

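this <-- beta*this + alpha*A*op(B), where A is a packed symmetric matrix (CuSpMatrix) and op(B) is B or its transpose, according to transB.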

Definition at line 566 of file cu-matrix.h.

569  {
570  CuMatrix<Real> M(A);
571  return AddMatMat(alpha, M, kNoTrans, B, transB, beta);
572  }
void AddToDiag ( Real  value)

Adds "value" to the diagonal elements of the matrix.

The matrix *this does not have to be square.

Definition at line 570 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, and data_.

Referenced by kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCholesky(), and kaldi::UnitTestCuMatrixAddToDiag().

570  {
571 #if HAVE_CUDA == 1
572  if (CuDevice::Instantiate().Enabled()) {
573  if (num_rows_ == 0) return;
574  CuTimer tim;
575  // We'll create a fake matrix with "num_diag" rows, one
576  // column, and a stride of "this_stride". The y-value of
577  // the grid/blocks corresponds to the row, in this kernel.
578  MatrixIndexT num_diag = std::min(num_rows_, num_cols_),
579  this_stride = stride_ + 1;
580  dim3 dimBlock(1, CU1DBLOCK);
581  dim3 dimGrid(1, n_blocks(num_diag, CU1DBLOCK));
582  ::MatrixDim d = { num_diag, 1, this_stride };
583  cuda_add(dimGrid, dimBlock, data_, value, d);
584  CU_SAFE_CALL(cudaGetLastError());
585 
586  CuDevice::Instantiate().AccuProfile(__func__, tim);
587  } else
588  #endif
589  {
590  Mat().AddToDiag(value);
591  }
592 }
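The "fake matrix" trick described in the comment above relies on the fact that, in a row-major matrix with row stride s, consecutive diagonal elements are s + 1 entries apart. The following CPU-only sketch illustrates that indexing; AddToDiagRef is a hypothetical helper written for this note, not a Kaldi function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Adds value to the diagonal of a row-major matrix stored with the given
// stride (stride >= num_cols). The diagonal is walked as a one-column
// matrix whose row stride is stride + 1.
void AddToDiagRef(std::vector<float> *data, int num_rows, int num_cols,
                  int stride, float value) {
  assert(stride >= num_cols);
  if (num_rows == 0) return;
  assert(data->size() >=
         static_cast<std::size_t>(num_rows - 1) * stride + num_cols);
  const int num_diag = std::min(num_rows, num_cols);
  const int diag_stride = stride + 1;
  for (int i = 0; i < num_diag; ++i)
    (*data)[static_cast<std::size_t>(i) * diag_stride] += value;
}
```

Because num_diag is the smaller of the two dimensions, the same loop works for non-square matrices, which matches the statement that *this does not have to be square.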
void AddToElements ( Real  alpha,
const CuArrayBase< int32 > &  elements 
)

This is a rather special purpose function; we might generalize it later by adding a transpose-type option.

It expects 'elements.Dim()' to equal NumRows(), and for each elements[i] to be either -1, or 0 <= elements[i] < NumCols(). It adds alpha to each element (*this)(i, elements[i]) for 0 <= i < NumRows(), skipping rows where elements[i] is -1.

Definition at line 3310 of file cu-matrix.cc.

References CU1DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, and MatrixBase< Real >::NumRows().

Referenced by kaldi::UnitTestCuMatrixAddToElements().

3310  {
3311  KALDI_ASSERT(elements.Dim() == NumRows());
3312 #if HAVE_CUDA == 1
3313  if (CuDevice::Instantiate().Enabled()) {
3314  CuTimer tim;
3315 
3316  dim3 dimBlock(CU1DBLOCK);
3317  dim3 dimGrid(n_blocks(NumRows(), CU1DBLOCK));
3318 
3319  cuda_matrix_add_to_elements(dimGrid, dimBlock, alpha, data_, Dim(), elements.Data());
3320  CU_SAFE_CALL(cudaGetLastError());
3321  CuDevice::Instantiate().AccuProfile(__func__, tim);
3322  } else
3323 #endif
3324  {
3325  MatrixBase<Real> &this_mat = this->Mat();
3326  const int32* row_to_col = elements.Data();
3327  for (int32 r = 0; r < this_mat.NumRows(); r++) {
3328  KALDI_ASSERT(row_to_col[r] >= -1);
3329  if (row_to_col[r] >= 0)
3330  this_mat(r, row_to_col[r]) += alpha;
3331  }
3332  }
3333 }
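As an illustration of the contract above, the sketch below applies the same per-row update to a dense row-major buffer. AddToElementsRef is a hypothetical stand-in for the CPU branch of the listing and assumes a stride equal to the number of columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each row r, adds alpha to mat(r, elements[r]) unless elements[r] is -1.
// mat is row-major with num_cols columns and one row per entry of elements.
void AddToElementsRef(float alpha, const std::vector<int> &elements,
                      int num_cols, std::vector<float> *mat) {
  const std::size_t num_rows = elements.size();
  assert(mat->size() == num_rows * num_cols);
  for (std::size_t r = 0; r < num_rows; ++r) {
    const int c = elements[r];
    assert(c >= -1 && c < num_cols);
    if (c >= 0) (*mat)[r * num_cols + c] += alpha;
  }
}
```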
void AddToRows ( Real  alpha,
const CuArrayBase< MatrixIndexT > &  indexes,
CuMatrixBase< Real > *  dst 
) const

For each row i of *this, adds alpha * this->Row(i) to dst->Row(indexes[i]) if indexes[i] >= 0; otherwise does nothing for that row.

Requires that all the indexes[i] that are >= 0 be distinct, otherwise the behavior is undefined.

Definition at line 2836 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixAddToRows().

2838  {
2839  if (NumRows() == 0) return;
2840 #if HAVE_CUDA == 1
2841  if (CuDevice::Instantiate().Enabled()) {
2842  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2843  KALDI_ASSERT(dst->NumCols() == NumCols());
2844  CuTimer tim;
2845  dim3 dimGrid, dimBlock;
2846  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2847  &dimGrid, &dimBlock);
2848  cuda_add_to_rows(dimGrid, dimBlock, alpha, dst->Data(), data_, indexes.Data(), Dim(), dst->Stride());
2849  CU_SAFE_CALL(cudaGetLastError());
2850  CuDevice::Instantiate().AccuProfile(__func__, tim);
2851  } else
2852 #endif
2853  {
2854  Mat().AddToRows(alpha, indexes.Data(), &(dst->Mat()));
2855  }
2856 }
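The scatter-add described above can be sketched on plain row-major buffers as follows. AddToRowsRef is a hypothetical helper for illustration only; it assumes source and destination share the same number of columns, as the KALDI_ASSERT in the listing requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each source row i with indexes[i] >= 0, does
//   dst.Row(indexes[i]) += alpha * src.Row(i).
// Rows with a negative index are skipped.
void AddToRowsRef(float alpha, const std::vector<float> &src, int num_cols,
                  const std::vector<int> &indexes, std::vector<float> *dst) {
  assert(num_cols > 0);
  assert(src.size() == indexes.size() * num_cols);
  assert(dst->size() % num_cols == 0);
  const int dst_rows = static_cast<int>(dst->size() / num_cols);
  for (std::size_t i = 0; i < indexes.size(); ++i) {
    const int d = indexes[i];
    if (d < 0) continue;
    assert(d < dst_rows);
    for (int c = 0; c < num_cols; ++c)
      (*dst)[d * num_cols + c] += alpha * src[i * num_cols + c];
  }
}
```

In this sequential sketch repeated indexes would simply accumulate; the requirement that non-negative indexes be distinct matters for the parallel GPU kernel, where two threads writing the same destination row would race.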
void AddToRows ( Real  alpha,
const CuArrayBase< Real * > &  dst 
) const

For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, does not do anything for that row. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2814 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

2815  {
2816  if (NumRows() == 0) return;
2817 #if HAVE_CUDA == 1
2818  if (CuDevice::Instantiate().Enabled()) {
2819  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2820  CuTimer tim;
2821  dim3 dimGrid, dimBlock;
2822  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2823  &dimGrid, &dimBlock);
2824  cuda_add_to_rows(dimGrid, dimBlock, alpha, dst.Data(), data_, Dim());
2825  CU_SAFE_CALL(cudaGetLastError());
2826  CuDevice::Instantiate().AccuProfile(__func__, tim);
2827  } else
2828 #endif
2829  {
2830  Mat().AddToRows(alpha, dst.Data());
2831  }
2832 }
void AddTpMat ( const Real  alpha,
const CuTpMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

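this <-- beta*this + alpha*op(A)*op(B), where A is a packed triangular matrix (CuTpMatrix) and op() is the identity or the transpose, according to transA and transB.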

Definition at line 575 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddTpMat().

578  {
579  CuMatrix<Real> M(A);
580  return AddMatMat(alpha, M, transA, B, transB, beta);
581  }
void AddVecToCols ( Real  alpha,
const CuVectorBase< Real > &  col,
Real  beta = 1.0 
)

(for each column c of *this), c = alpha * col + beta * c

Definition at line 1219 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by KlHmm::PropagateFnc(), and kaldi::UnitTestCuMatrixAddVecToCols().

1221  {
1222  if (col.Dim() != NumRows()) {
1223  KALDI_ERR << "Non matching dimensions: Rows:" << NumRows() << " VectorDim:" << col.Dim();
1224  }
1225 
1226  #if HAVE_CUDA == 1
1227  if (CuDevice::Instantiate().Enabled()) {
1228  CuTimer tim;
1229  dim3 dimGrid, dimBlock;
1230  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1231  &dimGrid, &dimBlock);
1232  cuda_add_vec_to_cols(dimGrid, dimBlock, alpha, col.data_, beta,
1233  data_, Dim());
1234  CU_SAFE_CALL(cudaGetLastError());
1235 
1236  CuDevice::Instantiate().AccuProfile(__func__, tim);
1237  } else
1238  #endif
1239  {
1240  if (beta != 1.0) Mat().Scale(beta);
1241  Mat().AddVecToCols(alpha, col.Vec());
1242  }
1243 }
void AddVecToRows ( Real  alpha,
const CuVectorBase< Real > &  row,
Real  beta = 1.0 
)

(for each row r of *this), r = alpha * row + beta * r

Definition at line 1248 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), BatchNormComponent::Backprop(), SimpleSentenceAveragingComponent::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), DecodableNnetSimple::DoNnetComputation(), BatchNormComponent::Propagate(), FixedAffineComponent::Propagate(), FixedBiasComponent::Propagate(), PerElementOffsetComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionComponent::Propagate(), SimpleSentenceAveragingComponent::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), AddShift::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), Rbm::Reconstruct(), SigmoidComponent::RepairGradients(), RectifiedLinearComponent::RepairGradients(), PdfPrior::SubtractOnLogpost(), kaldi::UnitTestCuMatrixAddVecToRows(), and SentenceAveragingComponent::Update().

1250  {
1251  if (row.Dim() != NumCols()) {
1252  KALDI_ERR << "Non matching dimensions: Cols:" << NumCols() << " VectorDim:" << row.Dim();
1253  }
1254 #if HAVE_CUDA == 1
1255  if (CuDevice::Instantiate().Enabled()) {
1256  CuTimer tim;
1257  dim3 dimGrid, dimBlock;
1258  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1259  &dimGrid, &dimBlock);
1260  cuda_add_vec_to_rows(dimGrid, dimBlock, alpha, row.data_, beta, data_, Dim());
1261  CU_SAFE_CALL(cudaGetLastError());
1262 
1263  CuDevice::Instantiate().AccuProfile(__func__, tim);
1264  } else
1265 #endif
1266  {
1267  if (beta != 1.0) Mat().Scale(beta);
1268  Mat().AddVecToRows(alpha, row.Vec());
1269  }
1270 }
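The row and column variants differ only in which index selects the vector element. The sketch below shows both updates on a dense row-major buffer; AddVecToRowsRef and AddVecToColsRef are hypothetical names for this illustration and assume a stride equal to the number of columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each row r of mat becomes alpha * row + beta * r.
void AddVecToRowsRef(float alpha, const std::vector<float> &row, float beta,
                     std::size_t num_rows, std::vector<float> *mat) {
  const std::size_t num_cols = row.size();
  assert(mat->size() == num_rows * num_cols);
  for (std::size_t r = 0; r < num_rows; ++r)
    for (std::size_t c = 0; c < num_cols; ++c)
      (*mat)[r * num_cols + c] =
          alpha * row[c] + beta * (*mat)[r * num_cols + c];
}

// Each column c of mat becomes alpha * col + beta * c.
void AddVecToColsRef(float alpha, const std::vector<float> &col, float beta,
                     std::size_t num_cols, std::vector<float> *mat) {
  const std::size_t num_rows = col.size();
  assert(mat->size() == num_rows * num_cols);
  for (std::size_t r = 0; r < num_rows; ++r)
    for (std::size_t c = 0; c < num_cols; ++c)
      (*mat)[r * num_cols + c] =
          alpha * col[r] + beta * (*mat)[r * num_cols + c];
}
```

With the default beta = 1.0 this is a plain broadcast add, which is how a bias vector is typically applied to every row of a batch of activations.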
void AddVecVec ( Real  alpha,
const CuVectorBase< Real > &  x,
const CuVectorBase< Real > &  y 
)

A = alpha * x * y^T + A.

Definition at line 1316 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, rnnlm::n, and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuMatrixAddVecVec().

1317  {
1318 
1319  MatrixIndexT m = y.Dim();
1320  MatrixIndexT n = x.Dim();
1321  KALDI_ASSERT(m == NumCols());
1322  KALDI_ASSERT(n == NumRows());
1323 
1324 #if HAVE_CUDA == 1
1325  if (CuDevice::Instantiate().Enabled()) {
1326  CuTimer tim;
1327  CUBLAS_SAFE_CALL(cublas_ger(GetCublasHandle(), m, n, alpha,
1328  y.Data(), 1, x.Data(), 1, data_, Stride()));
1329 
1330  CuDevice::Instantiate().AccuProfile(__func__, tim);
1331  } else
1332 #endif
1333  {
1334  Mat().AddVecVec(alpha, x.Vec(), y.Vec());
1335  }
1336 }
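The listing passes (m, n, y, x) to the BLAS rank-1 update rather than (n, m, x, y). This follows from storage order: cuBLAS works on column-major data, and a row-major matrix A read as column-major is A^T, so updating A^T by alpha * y * x^T is the same as updating A by alpha * x * y^T. The sketch below checks that reasoning on the CPU; GerColMajor is a hypothetical stand-in for a column-major ger routine, not the cuBLAS call itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major rank-1 update: a(i, j) += alpha * u[i] * v[j] for an
// m x n matrix with leading dimension lda (element (i, j) at a[i + j * lda]).
void GerColMajor(int m, int n, float alpha, const float *u, const float *v,
                 float *a, int lda) {
  for (int j = 0; j < n; ++j)
    for (int i = 0; i < m; ++i)
      a[i + j * lda] += alpha * u[i] * v[j];
}

// Row-major A (x.size() rows, y.size() columns, given stride) gets
// A += alpha * x * y^T by calling the column-major routine with the
// dimensions and vectors swapped.
void AddVecVecRowMajor(float alpha, const std::vector<float> &x,
                       const std::vector<float> &y, std::vector<float> *a,
                       int stride) {
  const int m = static_cast<int>(y.size());  // columns of A
  const int n = static_cast<int>(x.size());  // rows of A
  assert(stride >= m);
  assert(a->size() >= static_cast<std::size_t>(n) * stride);
  GerColMajor(m, n, alpha, y.data(), x.data(), a->data(), stride);
}
```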
void ApplyCeiling ( Real  ceiling_val)

Definition at line 2572 of file cu-matrix.cc.

References data_.

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), kaldi::UnitTestCuMatrixApplyCeiling(), ParametricRelu::Update(), LstmProjected::Update(), and BlstmProjected::Update().

2572  {
2573 #if HAVE_CUDA == 1
2574  if (CuDevice::Instantiate().Enabled()) {
2575  CuTimer tim;
2576  dim3 dimGrid, dimBlock;
2577  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2578  &dimGrid, &dimBlock);
2579  cuda_apply_ceiling(dimGrid, dimBlock, data_, ceiling_val, Dim());
2580  CU_SAFE_CALL(cudaGetLastError());
2581  CuDevice::Instantiate().AccuProfile(__func__, tim);
2582  } else
2583 #endif
2584  {
2585  Mat().ApplyCeiling(ceiling_val);
2586  }
2587 }
void ApplyExp ( )

Definition at line 2484 of file cu-matrix.cc.

References data_.

Referenced by DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuMatrixApplyExp().

2484  {
2485 #if HAVE_CUDA == 1
2486  if (CuDevice::Instantiate().Enabled()) {
2487  CuTimer tim;
2488  dim3 dimGrid, dimBlock;
2489  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2490  &dimGrid, &dimBlock);
2491  cuda_apply_exp(dimGrid, dimBlock, data_, Dim());
2492  CU_SAFE_CALL(cudaGetLastError());
2493  CuDevice::Instantiate().AccuProfile(__func__, tim);
2494  } else
2495 #endif
2496  {
2497  Mat().ApplyExp();
2498  }
2499 }
void ApplyExpLimited ( Real  lower_limit,
Real  upper_limit 
)

This is equivalent to running: ApplyFloor(lower_limit); ApplyCeiling(upper_limit); ApplyExp()

Definition at line 2502 of file cu-matrix.cc.

References data_, kaldi::Exp(), and KALDI_ASSERT.

Referenced by kaldi::UnitTestCuMatrixApplyExpLimited().

2502  {
2503  KALDI_ASSERT(upper_limit > lower_limit);
2504 #if HAVE_CUDA == 1
2505  if (CuDevice::Instantiate().Enabled()) {
2506  CuTimer tim;
2507  dim3 dimGrid, dimBlock;
2508  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2509  &dimGrid, &dimBlock);
2510  cuda_apply_exp_limited(dimGrid, dimBlock, data_, Dim(), lower_limit, upper_limit);
2511  CU_SAFE_CALL(cudaGetLastError());
2512  CuDevice::Instantiate().AccuProfile(__func__, tim);
2513  } else
2514 #endif
2515  {
2516  int32 num_rows = num_rows_, num_cols = num_cols_;
2517  for (int32 r = 0; r < num_rows; r++) {
2518  Real *row_data = this->RowData(r);
2519  for (int32 c = 0; c < num_cols; c++) {
2520  Real x = row_data[c];
2521  if (!(x >= lower_limit))
2522  x = lower_limit;
2523  if (x > upper_limit)
2524  x = upper_limit;
2525  row_data[c] = Exp(x);
2526  }
2527  }
2528  }
2529 }
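One detail of the CPU branch above is worth spelling out: the lower clamp is written as !(x >= lower_limit) rather than x < lower_limit. Every comparison with NaN is false, so the negated form also replaces a NaN input with lower_limit. The per-element sketch below illustrates this; ExpLimited is a hypothetical scalar helper written for this note.

```cpp
#include <cassert>
#include <cmath>

// Clamps x to [lower, upper] and exponentiates the result.
// The negated comparison sends NaN inputs to lower as well.
float ExpLimited(float x, float lower, float upper) {
  assert(upper > lower);
  if (!(x >= lower)) x = lower;  // true for x < lower and for NaN
  if (x > upper) x = upper;
  return std::exp(x);
}
```

Clamping before exponentiating bounds the output to [exp(lower_limit), exp(upper_limit)], which keeps the result finite for any finite limits that are small enough not to overflow.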
void ApplyExpSpecial ( )

For each element x of the matrix, set it to (x < 0 ? exp(x) : x + 1).

This function is used in our RNNLM training.

Definition at line 2533 of file cu-matrix.cc.

References CU1DBLOCK.

Referenced by kaldi::UnitTestCuMatrixApplyExpSpecial().

2533  {
2534 #if HAVE_CUDA == 1
2535  if (CuDevice::Instantiate().Enabled()) {
2536  CuTimer tim;
2537 
2538  const int warpSize = 32;
2539  dim3 dimBlock(CU1DBLOCK / warpSize, warpSize);
2540  dim3 dimGrid(n_blocks(NumRows(), dimBlock.x),
2541  n_blocks(NumCols(), dimBlock.y));
2542 
2543  cuda_apply_exp_special(dimGrid, dimBlock, Data(), Dim(), Data(), Stride());
2544  CU_SAFE_CALL(cudaGetLastError());
2545  CuDevice::Instantiate().AccuProfile(__func__, tim);
2546  } else
2547 #endif
2548  {
2549  Mat().ApplyExpSpecial();
2550  }
2551 }
void ApplyFloor ( Real  floor_val)

Definition at line 2554 of file cu-matrix.cc.

References data_.

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), main(), StatisticsPoolingComponent::Propagate(), RectifiedLinearComponent::Propagate(), SoftmaxComponent::Propagate(), LogSoftmaxComponent::Propagate(), ClipGradientComponent::RepairGradients(), RestrictedAttentionComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyFloor(), kaldi::UnitTestCuMatrixObjfDeriv(), ParametricRelu::Update(), LstmProjected::Update(), and BlstmProjected::Update().

2554  {
2555 #if HAVE_CUDA == 1
2556  if (CuDevice::Instantiate().Enabled()) {
2557  CuTimer tim;
2558  dim3 dimGrid, dimBlock;
2559  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2560  &dimGrid, &dimBlock);
2561  cuda_apply_floor(dimGrid, dimBlock, data_, floor_val, Dim());
2562  CU_SAFE_CALL(cudaGetLastError());
2563  CuDevice::Instantiate().AccuProfile(__func__, tim);
2564  } else
2565 #endif
2566  {
2567  Mat().ApplyFloor(floor_val);
2568  }
2569 }
void ApplyHeaviside ( )

For each element, sets x = (x > 0 ? 1.0 : 0.0).

See also Heaviside().

Definition at line 2445 of file cu-matrix.cc.

References data_.

Referenced by BackpropTruncationComponent::Backprop(), RectifiedLinearComponent::Backprop(), GeneralDropoutComponent::GetMemo(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), SigmoidComponent::RepairGradients(), TanhComponent::RepairGradients(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixHeaviside(), and kaldi::UnitTestCuMatrixApplyHeaviside().

2445  {
2446 #if HAVE_CUDA == 1
2447  if (CuDevice::Instantiate().Enabled()) {
2448  CuTimer tim;
2449  dim3 dimGrid, dimBlock;
2450  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2451  &dimGrid, &dimBlock);
2452  cuda_apply_heaviside(dimGrid, dimBlock, data_, Dim());
2453  CU_SAFE_CALL(cudaGetLastError());
2454  CuDevice::Instantiate().AccuProfile(__func__, tim);
2455  } else
2456 #endif
2457  {
2458  Mat().ApplyHeaviside();
2459  }
2460 }
void ApplyLog ( )

Definition at line 632 of file cu-matrix.cc.

References data_.

Referenced by DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), Xent::Eval(), main(), RestrictedAttentionComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyLog(), and kaldi::UnitTestCuMatrixObjfDeriv().

632  {
633  #if HAVE_CUDA == 1
634  if (CuDevice::Instantiate().Enabled()) {
635  if (num_rows_ == 0) return;
636  CuTimer tim;
637 
638  dim3 dimGrid, dimBlock;
639  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
640  &dimGrid, &dimBlock);
641 
642  cuda_apply_log(dimGrid, dimBlock, data_, Dim());
643  CU_SAFE_CALL(cudaGetLastError());
644 
645  CuDevice::Instantiate().AccuProfile(__func__, tim);
646  } else
647  #endif
648  {
649  Mat().ApplyLog();
650  }
651 }
void ApplyLogSoftMaxPerRow ( const CuMatrixBase< Real > &  src)

LogSoftmax nonlinearity Y = LogSoftmax(X) : Yij = Xij - log(sum_k(e^Xik)), done to each row, with attention to avoiding overflow or underflow.

Supports in-place operation (i.e. this == &src).

Definition at line 1724 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Propagate(), kaldi::TestCuMatrixLogSoftmax(), and kaldi::UnitTestCuLogSoftmax().

1724  {
1725  KALDI_ASSERT(SameDim(*this, src));
1726 #if HAVE_CUDA == 1
1727  if (CuDevice::Instantiate().Enabled()) {
1728  CuTimer tim;
1729  size_t dimBlock = CU1DBLOCK;
1730  size_t dimGrid = src.num_rows_;
1731  cuda_log_softmax_reduce(dimGrid, dimBlock,
1732  data_, src.data_, Dim(), src.Stride());
1733  CU_SAFE_CALL(cudaGetLastError());
1734 
1735  CuDevice::Instantiate().AccuProfile(__func__, tim);
1736  } else
1737 #endif
1738  {
1739  MatrixBase<Real> &mat(this->Mat());
1740  mat.CopyFromMat(src.Mat());
1741  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1742  mat.Row(r).ApplyLogSoftMax();
1743  }
1744  }
1745 }
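The phrase "with attention to avoiding overflow or underflow" usually refers to subtracting the row maximum before exponentiating. The sketch below shows that common technique for a single row; it is an assumption about the general approach, not a transcription of the Kaldi kernel, and LogSoftmaxRow is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// In-place log-softmax of one row: y_j = x_j - log(sum_k exp(x_k)).
// Subtracting the maximum first keeps every exponent <= 0, so exp()
// cannot overflow even for very large inputs.
void LogSoftmaxRow(std::vector<float> *row) {
  assert(!row->empty());
  const float max_val = *std::max_element(row->begin(), row->end());
  double sum = 0.0;
  for (float x : *row) sum += std::exp(static_cast<double>(x - max_val));
  const float log_sum = static_cast<float>(std::log(sum));
  for (float &x : *row) x = (x - max_val) - log_sum;
}
```

A naive evaluation of exp(1000) overflows in single precision, whereas the shifted form handles a row such as {1000, 1000} and yields -log(2) for both entries.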
void ApplyPow ( Real  power)

Definition at line 2409 of file cu-matrix.cc.

References data_.

Referenced by TanhComponent::Backprop(), kaldi::MeanVariance(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), TanhComponent::StoreStats(), kaldi::UnitTestCuMatrixApplyPow(), kaldi::UnitTestCuMatrixSetRandn(), and kaldi::UnitTestCuMatrixSetRandUniform().

2409  {
2410 #if HAVE_CUDA == 1
2411  if (CuDevice::Instantiate().Enabled()) {
2412  CuTimer tim;
2413  dim3 dimGrid, dimBlock;
2414  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2415  &dimGrid, &dimBlock);
2416  cuda_apply_pow(dimGrid, dimBlock, data_, power, Dim());
2417  CU_SAFE_CALL(cudaGetLastError());
2418  CuDevice::Instantiate().AccuProfile(__func__, tim);
2419  } else
2420 #endif
2421  {
2422  Mat().ApplyPow(power);
2423  }
2424 }
void ApplyPowAbs ( Real  power,
bool  include_sign = false 
)

Apply power to the absolute value of each element.

If include_sign is true, the result is multiplied by the sign of the input value. If the power is negative and the input is zero, the output is set to zero.

Definition at line 2427 of file cu-matrix.cc.

References data_.

Referenced by PowerComponent::Backprop(), PowerComponent::Propagate(), ClipGradientComponent::RepairGradients(), and kaldi::UnitTestCuMatrixApplyPowAbs().

2427  {
2428 #if HAVE_CUDA == 1
2429  if (CuDevice::Instantiate().Enabled()) {
2430  CuTimer tim;
2431  dim3 dimGrid, dimBlock;
2432  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2433  &dimGrid, &dimBlock);
2434  cuda_apply_pow_abs(dimGrid, dimBlock, data_, power, include_sign, Dim());
2435  CU_SAFE_CALL(cudaGetLastError());
2436  CuDevice::Instantiate().AccuProfile(__func__, tim);
2437  } else
2438 #endif
2439  {
2440  Mat().ApplyPowAbs(power, include_sign);
2441  }
2442 }
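The documented behavior can be summarized per element as in the sketch below. PowAbs is a hypothetical scalar helper that follows the description above (including the zero-input rule for negative powers); it is not taken from the Kaldi implementation and may differ from it in other edge cases.

```cpp
#include <cassert>
#include <cmath>

// Returns |x|^power, optionally multiplied by the sign of x.
// A zero input with a negative power yields zero instead of infinity.
float PowAbs(float x, float power, bool include_sign) {
  if (x == 0.0f && power < 0.0f) return 0.0f;
  float r = std::pow(std::fabs(x), power);
  if (include_sign && x < 0.0f) r = -r;
  return r;
}
```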
void ApplySoftMaxPerRow ( const CuMatrixBase< Real > &  src)

Softmax nonlinearity Y = Softmax(X) : Yij = e^Xij / sum_k(e^Xik), done to each row, with attention to avoiding overflow or underflow.

Supports in-place operation (i.e. this == &src).

Definition at line 1701 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::nnet3::attention::AttentionForward(), SoftmaxComponent::Propagate(), Softmax::PropagateFnc(), HiddenSoftmax::PropagateFnc(), BlockSoftmax::PropagateFnc(), kaldi::TestCuMatrixSoftmax(), and kaldi::UnitTestCuSoftmax().

1701  {
1702  KALDI_ASSERT(SameDim(*this, src));
1703 #if HAVE_CUDA == 1
1704  if (CuDevice::Instantiate().Enabled()) {
1705  CuTimer tim;
1706  size_t dimBlock = CU1DBLOCK;
1707  size_t dimGrid = src.num_rows_;
1708  cuda_softmax_reduce(dimGrid, dimBlock, data_, src.data_, Dim(), src.Stride());
1709  CU_SAFE_CALL(cudaGetLastError());
1710 
1711  CuDevice::Instantiate().AccuProfile(__func__, tim);
1712  } else
1713  #endif
1714  {
1715  MatrixBase<Real> &mat(this->Mat());
1716  mat.CopyFromMat(src.Mat());
1717  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1718  mat.Row(r).ApplySoftMax();
1719  }
1720  }
1721 }
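To illustrate why in-place operation (this == &src) is safe for a row-wise softmax, the sketch below reads each source element before overwriting the matching destination element, after the row maximum has already been computed. SoftmaxRow is a hypothetical helper; the max-subtraction step is the standard stabilization technique and is assumed here rather than copied from the Kaldi kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// dst_j = exp(src_j) / sum_k exp(src_k) for one row.
// dst may alias src: each element is read before it is written.
void SoftmaxRow(const std::vector<float> &src, std::vector<float> *dst) {
  assert(!src.empty() && dst->size() == src.size());
  const float max_val = *std::max_element(src.begin(), src.end());
  double sum = 0.0;
  for (std::size_t i = 0; i < src.size(); ++i) {
    (*dst)[i] = std::exp(src[i] - max_val);  // exponent is <= 0
    sum += (*dst)[i];
  }
  for (float &y : *dst) y = static_cast<float>(y / sum);
}
```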
bool ApproxEqual ( const CuMatrixBase< Real > &  other,
float  tol = 0.01 
) const

True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()

Definition at line 2120 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMat(), and CuMatrixBase< Real >::FrobeniusNorm().

Referenced by kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuCopy().

2121  {
2122  CuMatrix<Real> diff(*this);
2123  diff.AddMat(-1.0, other);
2124  return (diff.FrobeniusNorm() <= tol * (*this).FrobeniusNorm());
2125 }
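The test is relative to the norm of *this only, so it is not symmetric in its two arguments. The sketch below reproduces the comparison on flat buffers to make that visible; FrobeniusNorm and ApproxEqualRef are hypothetical helpers written for this note.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Square root of the sum of squared elements.
double FrobeniusNorm(const std::vector<float> &m) {
  double s = 0.0;
  for (float x : m) s += static_cast<double>(x) * x;
  return std::sqrt(s);
}

// True if ||self - other||_F <= tol * ||self||_F.
bool ApproxEqualRef(const std::vector<float> &self,
                    const std::vector<float> &other, float tol = 0.01f) {
  assert(self.size() == other.size());
  std::vector<float> diff(self);
  for (std::size_t i = 0; i < diff.size(); ++i) diff[i] -= other[i];
  return FrobeniusNorm(diff) <= tol * FrobeniusNorm(self);
}
```

For example, with tol = 0.5 a one-element matrix {2} is approximately equal to {1}, but {1} is not approximately equal to {2}. A zero matrix is approximately equal only to another zero matrix, since the right-hand side of the comparison is then zero.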
void Cholesky ( CuMatrixBase< Real > *  inv_cholesky = NULL)

This function sets *this to the Cholesky factor of *this (i.e. the C satisfying *this = C C^T), and sets "inv_cholesky" (if supplied) to the inverse of that factor. *this is treated as a symmetric matrix, but only the lower triangle is accessed.

Definition at line 1971 of file cu-matrix.cc.

References TpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), CuSpMatrix< Real >::CopyFromMat(), CuTpMatrix< Real >::CopyFromTp(), CuMatrixBase< Real >::CopyFromTp(), TpMatrix< Real >::Invert(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTakeLower, kaldi::kTrans, kaldi::kUndefined, and CuMatrixBase< Real >::SymAddMat2().

Referenced by CuTpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), kaldi::TestCuMatrixCholesky(), kaldi::UnitTestCholesky(), and kaldi::UnitTestCuCholesky().

1971  {
1972  KALDI_ASSERT(this->NumRows() == this->NumCols());
1973  const int32 block_size = 64; // We can tune this.
1974 #if HAVE_CUDA == 1
1975  bool have_gpu = CuDevice::Instantiate().Enabled();
1976 #else
1977  bool have_gpu = false;
1978 #endif
1979  if (this->NumRows() == 0) {
1980  return;
1981  }
1982  if (inv_cholesky == NULL && this->NumRows() >= block_size * 2 && have_gpu) {
1983  // Even if the user did not request the inverse Cholesky, for large enough
1984  // matrices (on GPUs) it's going to be more efficient to compute it anyway
1985  // as the recursion depends on it.
1986  CuMatrix<Real> inv(this->NumRows(), this->NumCols());
1987  Cholesky(&inv);
1988  return;
1989  }
1990  if (this->NumRows() <= block_size || inv_cholesky == NULL || !have_gpu) {
1991  // Don't recurse: compute the Cholesky (and inverse Cholesky, if requested)
1992  // directly, on the CPU.
1993  int32 dim = this->NumRows();
1994  CuSpMatrix<Real> this_sp(dim, kUndefined);
1995  this_sp.CopyFromMat(*this, kTakeLower);
1996  SpMatrix<Real> this_sp_cpu(this_sp);
1997  TpMatrix<Real> C_cpu(dim);
1998  C_cpu.Cholesky(this_sp_cpu);
1999  CuTpMatrix<Real> C(C_cpu);
2000  this->CopyFromTp(C);
2001  if (inv_cholesky != NULL) {
2002  C_cpu.Invert(); // Get inverse Cholesky on CPU.
2003  C.CopyFromTp(C_cpu);
2004  inv_cholesky->CopyFromTp(C); // Copy inverse Cholesky from CPU.
2005  }
2006  return;
2007  }
2008  // At this point, if none of the other cases apply, we recurse.
2009 
2010  // The selection of dim1 is a heuristic. We could also just take half.
2011  int32 tot_dim = this->NumRows();
2012  int32 dim1;
2013  // Break it up into a whole number of blocks, for better memory alignment.
2014  // The line below, setting dim1 can be decided on a heuristic basis: from
2015  // the point of view of correctness, it can really be any value
2016  // 0 < dim1 < tot_dim.
2017  dim1 = block_size * std::max<int32>(1, tot_dim / (2 * block_size));
2018 
2019  int32 dim2 = tot_dim - dim1;
2020  CuSubMatrix<Real> this_11(*this, 0, dim1, 0, dim1),
2021  this_12(*this, 0, dim1, dim1, dim2),
2022  this_21(*this, dim1, dim2, 0, dim1),
2023  this_22(*this, dim1, dim2, dim1, dim2);
2024  CuSubMatrix<Real> inv_11(*inv_cholesky, 0, dim1, 0, dim1),
2025  inv_12(*inv_cholesky, 0, dim1, dim1, dim2),
2026  inv_21(*inv_cholesky, dim1, dim2, 0, dim1),
2027  inv_22(*inv_cholesky, dim1, dim2, dim1, dim2);
2028  /*
2029  Here is the math on block-wise Cholesky. We'll use a Matlab-like notation for blocks of a matrix,
2030  e.g. [ A B; C D ], and also for transposes, e.g. A' is the transpose of A.
2031  Let A be the input matrix; we want to compute both its Cholesky L and its inverse Cholesky, which
2032  we'll call M.
2033  OK. let L = [ L11 0; L21 L22 ] be the Cholesky factor of A.
2034  We have A = L L' = [ L11 0; L21 L22 ] * [ L11' L21'; 0 L22' ]. Multiplying it out,
2035  if A = [ A11 A12; A21 A22 ]; then
2036  A11 = L11 L11', A21 = L21 L11', A22 = L21 L21' + L22 L22', and A12 = A21'.
2037 
2038  We also want an expression for the inverse of L (we call this M).
2039  If M = [ M11 0; M21 M22 ], then it's not hard to see that
2040  M11 = inv(L11), M22 = inv(L22).
2041  We can work out M21 as follows. We know that [ L11 0; L21 L22 ] [ M11 0; M21 M22 ] = [ I 0; 0 I ].
2042  Considering the zero on the bottom of the rhs, we have: L21 M11 + L22 M21 = 0, which gives us:
2043  M21 = - L22^{-1} L21 M11 = - M22 L21 M11.
2044 
2045  Next, we want expressions for L21 and L22. From the equation A21 = L21 L11', we have:
2046  L21 = A21 inv(L11') = A21 M11'
2047  We can compute L22 and M22 recursively by doing Cholesky (and computing the inverse Cholesky)
2048  on the quantity T = (A22 - L21 L21'). [we give it the name T just for easy reference.]
2049 
2050  Computationally, we do this as follows:
2051  (1) Recurse to get L11 and M11.
2052  (2) Compute L21 = A21 M11'
2053  (3) Compute T = A22 - L21 L21'
2054  (4) Recurse on T to get L22 and M22.
2055  (5) Compute M21 = -M22 L21 M11.
2056  Next, we have to consider the in-place nature of the computation, since L overwrites A
2057  [M has its own storage, in "inv_cholesky"].
2058  We address this here:
2059  (1) is in-place [L11 replaces A11, M11 has its own storage].
2060  (2) L21 gets written where M21 belongs.
2061  (3) T replaces A22.
2062  (4) is in-place [L22 replaces T where A22 was, M22 has its own storage]
2063  (5):(a) we first compute the transpose of (L21 M11) in the upper part of A/L,
2064  where A12 or L12 would be. Define a temporary expression
2065  U = (L21 M11)' = M11' L21'; this goes where A12 or L12 would be.
2066  (b) copy L21 to where it should be, in *this.
2067  (c) Compute M21 = -M22 U', in the correct place for M21.
2068  (d) zero L12 and M12. */
2069 
2070  // (1) compute L11 and M11.
2071  this_11.Cholesky(&inv_11);
2072  // (2) compute L21 = A21 M11'. For now it's in the "wrong place", where M21 should be.
2073  inv_21.AddMatMat(1.0, this_21, kNoTrans, inv_11, kTrans, 0.0);
2074  // (3) compute T = A22 - L21 L21'. Note: only the lower triangle of T will be valid, but
2075  // that's OK because Cholesky will ignore the upper part.
2076  this_22.SymAddMat2(-1.0, inv_21, kNoTrans, 1.0);
2077  // (4) Recurse to compute L22 and M22.
2078  this_22.Cholesky(&inv_22);
2079  // (5)(a) compute U = M11' L21'. We use the storage of this_12 for this. Note that L21 is
2080  // currently where M21 should be.
2081  this_12.AddMatMat(1.0, inv_11, kTrans, inv_21, kTrans, 0.0);
2082  // (5)(b) copy L21 to where it should be.
2083  this_21.CopyFromMat(inv_21);
2084  // (5)(c) compute M21 = -M22 U'.
2085  inv_21.AddMatMat(-1.0, inv_22, kNoTrans, this_12, kTrans, 0.0);
2086  // (5)(d) zero L12 and M12.
2087  this_12.SetZero();
2088  inv_12.SetZero();
2089 }
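
Steps (1) to (5) from the comment in the listing can be traced by hand when every block is a single scalar. The following is a sketch under that assumption only: it handles a 2x2 symmetric positive-definite matrix, replaces both recursive calls by a square root, and uses hypothetical names (Lower2, BlockCholesky2x2) that do not appear in the library.

```cpp
#include <cassert>
#include <cmath>

// Lower-triangular 2x2 matrix [ a11 0; a21 a22 ].
struct Lower2 {
  double a11, a21, a22;
};

// Computes the Cholesky factor L and its inverse M for the symmetric
// positive-definite matrix [ a11 a21; a21 a22 ], following steps (1)-(5)
// with every block being a single scalar.
void BlockCholesky2x2(double a11, double a21, double a22,
                      Lower2 *L, Lower2 *M) {
  assert(a11 > 0.0);
  // (1) Base case for the first block: L11 = sqrt(A11), M11 = 1 / L11.
  L->a11 = std::sqrt(a11);
  M->a11 = 1.0 / L->a11;
  // (2) L21 = A21 M11'.
  L->a21 = a21 * M->a11;
  // (3) T = A22 - L21 L21'.
  double t = a22 - L->a21 * L->a21;
  assert(t > 0.0);
  // (4) Base case for the second block: L22 = sqrt(T), M22 = 1 / L22.
  L->a22 = std::sqrt(t);
  M->a22 = 1.0 / L->a22;
  // (5) M21 = -M22 L21 M11.
  M->a21 = -M->a22 * L->a21 * M->a11;
}
```

For A = [ 4 2; 2 5 ] this gives L = [ 2 0; 1 2 ] and M = [ 0.5 0; -0.25 0.5 ], and the off-diagonal entry of L M is L21 M11 + L22 M21 = 0.5 - 0.5 = 0, as the derivation requires.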
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:89
MatrixIndexT NumCols() const
Definition: cu-matrix.h:215
void Cholesky(CuMatrixBase< Real > *inv_cholesky=NULL)
Sets *this to the Cholesky factor of *this (i.e.
Definition: cu-matrix.cc:1971
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:214
void CopyFromTp(const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:277
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
friend class CuSpMatrix< Real >
Definition: cu-matrix.h:85
CuSubMatrix<Real> ColRange ( const MatrixIndexT  col_offset,
const MatrixIndexT  num_cols 
) const
inline

Definition at line 608 of file cu-matrix.h.

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxpoolingComponent::Backprop(), BlockAffineComponent::Backprop(), Convolutional1dComponent::Backprop(), ConvolutionComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MaxPooling2DComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), MultiTaskLoss::Eval(), LstmProjected::InfoGradient(), BlstmProjected::InfoGradient(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), MaxpoolingComponent::Propagate(), BlockAffineComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionComponent::Propagate(), AveragePoolingComponent::PropagateFnc(), MaxPoolingComponent::PropagateFnc(), AveragePooling2DComponent::PropagateFnc(), MaxPooling2DComponent::PropagateFnc(), BlockSoftmax::PropagateFnc(), FramePoolingComponent::PropagateFnc(), ParallelComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), MultiBasisComponent::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::UnitTestLstmNonlinearity(), FramePoolingComponent::Update(), SentenceAveragingComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), Convolutional1dComponent::Update(), and ConvolutionComponent::Update().

609  {
610  return CuSubMatrix<Real>(*this, 0, num_rows_, col_offset, num_cols);
611  }
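
The returned sub-matrix covers all rows and the requested columns; as the listing suggests, it is built from *this and an offset rather than from a copy. A host-only sketch of that idea, assuming row-major storage with a row stride (MatrixView and ColRangeSketch are hypothetical names):

```cpp
#include <cassert>

// Non-owning view of a row-major matrix; rows are `stride` floats apart.
struct MatrixView {
  float *data;
  int num_rows;
  int num_cols;
  int stride;
  float &operator()(int r, int c) const { return data[r * stride + c]; }
};

// Returns a view of columns [col_offset, col_offset + num_cols) of m.
// No elements are copied: only the start pointer and the width change.
MatrixView ColRangeSketch(const MatrixView &m, int col_offset, int num_cols) {
  assert(col_offset >= 0 && num_cols >= 0 &&
         col_offset + num_cols <= m.num_cols);
  return MatrixView{m.data + col_offset, m.num_rows, num_cols, m.stride};
}
```

Since the view keeps the parent's stride, writing through it modifies the parent matrix.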
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:89
MatrixIndexT num_rows_
Definition: cu-matrix.h:729
void CopyColFromVec ( const CuVectorBase< Real > &  v,
const MatrixIndexT  col 
)

Copy vector into specific column of matrix.

Definition at line 2388 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by kaldi::cu::NormalizePerRow(), StatisticsExtractionComponent::Propagate(), DropoutMaskComponent::Propagate(), NaturalGradientRepeatedAffineComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

2389  {
2390  KALDI_ASSERT(v.Dim() == num_rows_ &&
2391  static_cast<UnsignedMatrixIndexT>(col) <
2392  static_cast<UnsignedMatrixIndexT>(num_cols_));
2393 #if HAVE_CUDA == 1
2394  if (CuDevice::Instantiate().Enabled()) {
2395  CuTimer tim;
2396  cublas_copy(GetCublasHandle(),
2397  v.Dim(), v.Data(), 1,
2398  this->data_ + col, this->stride_);
2399  CU_SAFE_CALL(cudaGetLastError());
2400  CuDevice::Instantiate().AccuProfile(__func__, tim);
2401  } else
2402 #endif
2403  {
2404  Mat().CopyColFromVec(v.Vec(), col);
2405  }
2406 }
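
In the GPU branch of the listing, the copy reads v with increment 1 and writes with increment stride_, starting at data_ + col. A host-only sketch of the same indexing, assuming row-major storage in which consecutive rows are `stride` elements apart (CopyColFromVecSketch is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes v into column `col` of a row-major matrix whose rows are
// `stride` elements apart in memory (stride >= num_cols).
void CopyColFromVecSketch(std::vector<float> *data, int num_rows,
                          int num_cols, int stride,
                          const std::vector<float> &v, int col) {
  assert(static_cast<int>(v.size()) == num_rows);
  assert(col >= 0 && col < num_cols && stride >= num_cols);
  assert(data->size() >= static_cast<std::size_t>(num_rows) * stride);
  for (int r = 0; r < num_rows; ++r)
    (*data)[static_cast<std::size_t>(r) * stride + col] = v[r];
}
```

Elements outside the chosen column, including any padding between num_cols and stride, are left untouched.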
uint32 UnsignedMatrixIndexT
MatrixIndexT num_cols_
Definition: cu-matrix.h:728
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
MatrixIndexT stride_
Definition: cu-matrix.h:730
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:720
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:729
void CopyCols ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Copies column r from column indexes[r] of src.

As a special case, if indexes[i] == -1, sets column i to zero. indexes.size() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().
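
A CPU reference sketch of these semantics, assuming matrices are held as vectors of rows (the Matrix alias and CopyColsSketch are hypothetical and only illustrate the indexing):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // Matrix[row][col]

// dst(r, c) = src(r, indexes[c]), or 0 when indexes[c] == -1.
void CopyColsSketch(Matrix *dst, const Matrix &src,
                    const std::vector<int> &indexes) {
  assert(dst->size() == src.size());
  for (std::size_t r = 0; r < dst->size(); ++r) {
    assert((*dst)[r].size() == indexes.size());
    for (std::size_t c = 0; c < indexes.size(); ++c) {
      int j = indexes[c];
      assert(j >= -1 && j < static_cast<int>(src[r].size()));
      (*dst)[r][c] = (j == -1) ? 0.0f : src[r][j];
    }
  }
}
```

The same source column may appear several times in indexes, so the operation can duplicate as well as permute or drop columns.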

Definition at line 2623 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumGroupComponent::Backprop(), PermuteComponent::Backprop(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), PermuteComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionalComponent::PropagateFnc(), and Convolutional1dComponent::Update().

2624  {
2625 #if HAVE_CUDA == 1
2626  if (CuDevice::Instantiate().Enabled()) {
2627  KALDI_ASSERT(indices.Dim() == NumCols());
2628  KALDI_ASSERT(NumRows() == src.NumRows());
2629  CuTimer tim;
2630  dim3 dimGrid, dimBlock;
2631  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2632  &dimGrid, &dimBlock);
2633  cuda_copy_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(), Dim(), src.Stride());
2634  CU_SAFE_CALL(cudaGetLastError());
2635  CuDevice::Instantiate().AccuProfile(__func__, tim);
2636  } else
2637 #endif
2638  {
2639  Mat().CopyCols(src.Mat(), indices.Data());
2640  }
2641 }
MatrixIndexT NumCols() const
Definition: cu-matrix.h:215
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:214
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:720
::MatrixDim Dim() const
Definition: cu-matrix.h:220
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void CopyColsFromVec ( const CuVectorBase< Real > &  v)

Copies vector into matrix, column-by-column.

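Note that rv.Dim() must equal either NumRows()*NumCols() or NumRows(); this gives two modes of operation. In the first, rv supplies the columns of the matrix one after another; in the second, rv is copied into every column.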
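
The two modes can be sketched on the host as follows, assuming a vector-of-rows matrix (the Matrix alias and CopyColsFromVecSketch are hypothetical names; the library version fills *this in place rather than returning a new matrix):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // Matrix[row][col]

// Builds a num_rows x num_cols matrix from rv in one of two modes.
Matrix CopyColsFromVecSketch(std::size_t num_rows, std::size_t num_cols,
                             const std::vector<float> &rv) {
  Matrix m(num_rows, std::vector<float>(num_cols, 0.0f));
  if (rv.size() == num_rows * num_cols) {
    // Mode 1: rv holds the columns one after another (column-major order).
    for (std::size_t r = 0; r < num_rows; ++r)
      for (std::size_t c = 0; c < num_cols; ++c)
        m[r][c] = rv[c * num_rows + r];
  } else if (rv.size() == num_rows) {
    // Mode 2: every column is a copy of rv.
    for (std::size_t r = 0; r < num_rows; ++r)
      for (std::size_t c = 0; c < num_cols; ++c)
        m[r][c] = rv[r];
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
  return m;
}
```

When the matrix has a single column the two size conditions coincide, and both modes produce the same result.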

Definition at line 2350 of file cu-matrix.cc.

References MatrixDim_::cols, CU1DBLOCK, CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ERR, MatrixDim_::rows, and CuVectorBase< Real >::Vec().

Referenced by DropoutComponent::Propagate(), and kaldi::UnitTestCuMatrixCopyColsFromVec().

2350  {
2351 #if HAVE_CUDA == 1
2352  if (CuDevice::Instantiate().Enabled()) {
2353  CuTimer tim;
2354  if (rv.Dim() == num_rows_ * num_cols_) {
2355  // treat rv as a matrix of the size (num_cols x num_rows_)
2356  // and use transposed copy to fill *this
2357  // see CuMatrixBase<Real>::CopyFromMat() for more detail of the impl
2358  MatrixDim rv_dim = { num_cols_, num_rows_, num_rows_ };
2359  const int32 warpSize = 32;
2360  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2361  dim3 dimGrid(n_blocks(rv_dim.cols, warpSize),
2362  n_blocks(rv_dim.rows, warpSize));
2363  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, rv.Data(), Dim(),
2364  rv_dim);
2365  CU_SAFE_CALL(cudaGetLastError());
2366  } else if (rv.Dim() == num_rows_) {
2367  // use 2D block (8x32) and large enough grid to cover matrix *this
2368  // dimBlock.x need to be at least warpSize for coalesced memory access.
2369  const int32 warpSize = 32;
2370  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2371  dim3 dimGrid(n_blocks(num_cols_, dimBlock.x),
2372  n_blocks(num_rows_, dimBlock.y));
2373  cuda_copy_cols_from_vec(dimGrid, dimBlock, Data(), Dim(), rv.Data());
2374  CU_SAFE_CALL(cudaGetLastError());
2375  } else {
2376  KALDI_ERR<< "Wrong sized arguments";
2377  }
2378  CuDevice::Instantiate().AccuProfile(__func__, tim);
2379  } else
2380 #endif
2381  {
2382  Mat().CopyColsFromVec(rv.Vec());
2383  }
2384 }
int32_cuda rows
Definition: cu-matrixdim.h:53
Structure containing size of the matrix plus stride.
Definition: cu-matrixdim.h:52
MatrixIndexT num_cols_
Definition: cu-matrix.h:728
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:720
#define KALDI_ERR
Definition: kaldi-error.h:127
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
int32_cuda cols
Definition: cu-matrixdim.h:54
::MatrixDim Dim() const
Definition: cu-matrix.h:220
MatrixIndexT num_rows_
Definition: cu-matrix.h:729
const Real * Data() const
Return data pointer (const).
Definition: cu-matrix.h:689
void CopyFromBlock ( const CuBlockMatrix< Real > &  B,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 160 of file cu-matrix.cc.

References CuBlockMatrix< Real >::Block(), CuMatrixBase< Real >::CopyFromMat(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), and CuMatrixBase< Real >::NumRows().

Referenced by CuMatrix< Real >::CuMatrix().

161  {
162  this->SetZero();
163  if (trans == kNoTrans) {
164  KALDI_ASSERT(NumRows() == B.NumRows() && NumCols() == B.NumCols());
165  int32 row_offset = 0, col_offset = 0;
166  for (int32 b = 0; b < B.NumBlocks(); b++) {
167  const CuMatrixBase<Real> &block = B.Block(b);
168  int32 num_rows = block.NumRows(), num_cols = block.NumCols();
169  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
170  col_offset, num_cols);
171  this_block.CopyFromMat(block);
172  row_offset += num_rows;
173  col_offset += num_cols;
174  }
175  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
176  } else {
177  KALDI_ASSERT(NumRows() == B.NumCols() && NumCols() == B.NumRows());
178  int32 row_offset = 0, col_offset = 0;
179  for (int32 b = 0; b < B.NumBlocks(); b++) {
180  const CuMatrixBase<Real> &block = B.Block(b);
181  int32 num_rows = block.NumCols(), num_cols = block.NumRows();
182  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
183  col_offset, num_cols);
184  this_block.CopyFromMat(block, kTrans);
185  row_offset += num_rows;
186  col_offset += num_cols;
187  }
188  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
189  }
190 }
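
The listing zeroes *this and then copies each block onto the diagonal, advancing the row and column offsets together. A host-only sketch of that layout, assuming vector-of-rows matrices; unlike the member function, this hypothetical CopyFromBlockSketch allocates its result instead of checking the dimensions of an existing matrix:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // Matrix[row][col]

// Places each block (optionally transposed) on the diagonal of a zero matrix.
Matrix CopyFromBlockSketch(const std::vector<Matrix> &blocks, bool trans) {
  std::size_t tot_rows = 0, tot_cols = 0;
  for (const Matrix &b : blocks) {
    assert(!b.empty());
    tot_rows += trans ? b[0].size() : b.size();
    tot_cols += trans ? b.size() : b[0].size();
  }
  Matrix out(tot_rows, std::vector<float>(tot_cols, 0.0f));
  std::size_t row_offset = 0, col_offset = 0;
  for (const Matrix &b : blocks) {
    for (std::size_t i = 0; i < b.size(); ++i) {
      assert(b[i].size() == b[0].size());  // blocks must be rectangular
      for (std::size_t j = 0; j < b[i].size(); ++j) {
        if (trans) out[row_offset + j][col_offset + i] = b[i][j];
        else out[row_offset + i][col_offset + j] = b[i][j];
      }
    }
    // Both offsets advance, so the next block starts below and to the right.
    row_offset += trans ? b[0].size() : b.size();
    col_offset += trans ? b.size() : b[0].size();
  }
  return out;
}
```

Blocks need not be square, so the result is block-diagonal but not necessarily square block by block.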
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:89
MatrixIndexT NumCols() const
Definition: cu-matrix.h:215
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:214
void SetZero()
Math operations, some calling kernels.
Definition: cu-matrix.cc:476
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
void CopyFromGeneralMat ( const GeneralMatrix src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 3063 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CopyToMat(), SparseMatrix< Real >::CopyToMat(), GeneralMatrix::GetFullMatrix(), GeneralMatrix::GetMatrix(), GeneralMatrix::GetSparseMatrix(), KALDI_ERR, kaldi::kCompressedMatrix, kaldi::kFullMatrix, kaldi::kSparseMatrix, and GeneralMatrix::Type().

Referenced by NnetComputer::AcceptInputs(), and kaldi::nnet3::ComputeObjectiveFunction().

3064  {
3065  switch (src.Type()) {
3066  case kFullMatrix: {
3067  const Matrix<BaseFloat> &src_full_mat = src.GetFullMatrix();
3068  this->CopyFromMat(src_full_mat, trans);
3069  return;
3070  }
3071  case kCompressedMatrix: {
3072  Matrix<BaseFloat> mat;
3073  src.GetMatrix(&mat);
3074  this->CopyFromMat(mat, trans);
3075  return;
3076  }
3077  case kSparseMatrix: {
3078  const SparseMatrix<BaseFloat> &smat = src.GetSparseMatrix();
3079 #if HAVE_CUDA == 1
3080  if (CuDevice::Instantiate().Enabled()) {
3081  // only take this branch if we're actually using CUDA, or it would
3082  // entail a wasteful copy of the sparse matrix.
3083  CuSparseMatrix<BaseFloat> cu_smat(smat);
3084  cu_smat.CopyToMat(this, trans);
3085  return;
3086  }
3087 #endif
3088  smat.CopyToMat(&(Mat()), trans);
3089  return;
3090  }
3091  default:
3092  KALDI_ERR << "Invalid GeneralMatrix type.";
3093  }
3094 }
void CopyFromMat(const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:339
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
#define KALDI_ERR
Definition: kaldi-error.h:127
void CopyFromMat ( const MatrixBase< OtherReal > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 339 of file cu-matrix.cc.

Referenced by kaldi::nnet3::attention::AttentionForward(), ElementwiseProductComponent::Backprop(), BatchNormComponent::Backprop(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), ScaleComponent::Backprop(), GeneralDropoutComponent::Backprop(), FixedScaleComponent::Backprop(), FixedBiasComponent::Backprop(), NoOpComponent::Backprop(), ClipGradientComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), Softmax::BackpropagateFnc(), HiddenSoftmax::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), AddShift::BackpropagateFnc(), Rescale::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), BlockAffineComponent::BlockAffineComponent(), NnetOnlineComputer::Compute(), kaldi::nnet3::ConstrainOrthonormal(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), CuMatrixBase< Real >::CopyFromBlock(), CuBlockMatrix< Real >::CopyFromMat(), GeneralMatrix::CopyToMat(), CuMatrix< Real >::CuMatrix(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), NnetComputer::ExecuteCommand(), NnetRescaler::FormatInput(), kaldi::nnet3::attention::GetAttentionDotProducts(), OnlinePreconditioner::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), NaturalGradientAffineComponent::InitFromConfig(), main(), kaldi::nnet2::NnetComputation(), kaldi::cu::NormalizePerRow(), CuMatrix< BaseFloat >::operator=(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditionerSimple::PreconditionDirections(), OnlineNaturalGradientSimple::PreconditionDirections(), 
kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), DropoutComponent::Propagate(), ElementwiseProductComponent::Propagate(), BatchNormComponent::Propagate(), BackpropTruncationComponent::Propagate(), PowerComponent::Propagate(), RectifiedLinearComponent::Propagate(), ScaleComponent::Propagate(), GeneralDropoutComponent::Propagate(), SpliceMaxComponent::Propagate(), NoOpComponent::Propagate(), ClipGradientComponent::Propagate(), PerElementScaleComponent::Propagate(), FixedScaleComponent::Propagate(), FixedBiasComponent::Propagate(), PerElementOffsetComponent::Propagate(), AdditiveNoiseComponent::Propagate(), KlHmm::PropagateFnc(), ParallelComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), AddShift::PropagateFnc(), Rescale::PropagateFnc(), BlstmProjected::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), OnlinePreconditioner::ReorthogonalizeXt1(), OnlineNaturalGradient::ReorthogonalizeXt1(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), kaldi::TestCuFindRowMaxId(), kaldi::TestCuMatrixTransposeCross(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::TestSymInvertPosDef(), kaldi::UnitInvert(), kaldi::UnitTestCheck(), kaldi::UnitTestCholesky(), kaldi::UnitTestConstructor(), kaldi::UnitTestCopyFromMat(), kaldi::UnitTestCopySp(), kaldi::UnitTestCuCopy(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffNormalizePerRow(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuFindRowMaxId(), kaldi::UnitTestCuLogSoftmax(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddMatMatBatched(), 
kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyFromMat(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuSoftmax(), kaldi::UnitTestCuTanh(), kaldi::UnitTestCuVectorAddColSumMat(), kaldi::UnitTestCuVectorAddColSumMatLarge(), kaldi::UnitTestCuVectorAddRowSumMat(), kaldi::UnitTestCuVectorAddRowSumMatLarge(), kaldi::UnitTestInvert(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), BlockAffineComponentPreconditioned::Update(), and MatrixRandomizer::Value().

340  {
341  CuMatrix<OtherReal> temp(src);
342  this->CopyFromMat(temp, trans);
343 }
void CopyFromMat(const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:339
void CopyFromMat ( const MatrixBase< Real > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 311 of file cu-matrix.cc.

References MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

312  {
313 #if HAVE_CUDA == 1
314  if (CuDevice::Instantiate().Enabled()) {
315  if (trans == kNoTrans) {
316  KALDI_ASSERT(src.NumRows() == num_rows_ && src.NumCols() == num_cols_);
317  CuTimer tim;
318 
319  MatrixIndexT dst_pitch = stride_*sizeof(Real);
320  MatrixIndexT src_pitch = src.Stride()*sizeof(Real);
321  MatrixIndexT width = src.NumCols()*sizeof(Real);
322  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, src.Data(), src_pitch,
323  width, src.NumRows(), cudaMemcpyHostToDevice));
324 
325  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from CPU)", tim);
326  } else {
327  CuMatrix<Real> trans_mat(src); // Do the transpose on the GPU board.
328  this->CopyFromMat(trans_mat, kTrans);
329  }
330  } else
331 #endif
332  {
333  Mat().CopyFromMat(src, trans);
334  }
335 }
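
In the non-transposed GPU branch of the listing, the strides and the row width are multiplied by sizeof(Real) because cudaMemcpy2D takes its pitches and width in bytes. A host-only sketch of such a pitched copy, with std::memcpy standing in for the CUDA call (Memcpy2DSketch is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host analogue of a 2-D pitched copy: copies `height` rows of
// `width_bytes` bytes each; consecutive rows are `src_pitch` and
// `dst_pitch` bytes apart in the source and destination respectively.
void Memcpy2DSketch(void *dst, std::size_t dst_pitch, const void *src,
                    std::size_t src_pitch, std::size_t width_bytes,
                    std::size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t r = 0; r < height; ++r)
    std::memcpy(d + r * dst_pitch, s + r * src_pitch, width_bytes);
}
```

Only the first width_bytes of each row are copied, so source and destination may use different strides and the padding in the destination is preserved.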
void CopyFromMat(const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:339
MatrixIndexT num_cols_
Definition: cu-matrix.h:728
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT stride_
Definition: cu-matrix.h:730
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:720
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:729
void CopyFromMat ( const CuMatrixBase< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 207 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

208  {
209  if (sizeof(Real) == sizeof(OtherReal) &&
210  static_cast<const void*>(M.Data()) ==
211  static_cast<const void*>(this->Data())) {
212  if (M.Data() == NULL)
213  return;
214  // CopyFromMat called on same data. Nothing to do (except sanity checks)
215  KALDI_ASSERT(trans == kNoTrans && M.NumRows() == NumRows() &&
216  M.NumCols() == NumCols() && M.Stride() == Stride());
217  return;
218  }
219 #if HAVE_CUDA == 1
220  if (CuDevice::Instantiate().Enabled()) {
221  if (trans == kNoTrans) {
222  KALDI_ASSERT(M.NumRows() == num_rows_ && M.NumCols() == num_cols_);
223  } else {
224  KALDI_ASSERT(M.NumCols() == num_rows_ && M.NumRows() == num_cols_);
225  }
226  if (M.num_rows_ == 0) return; // Nothing to do.
227  CuTimer tim;
228  if (sizeof(Real) == sizeof(OtherReal) && trans == kNoTrans ) {
229  MatrixIndexT dst_pitch = stride_ * sizeof(Real);
230  MatrixIndexT src_pitch = M.Stride() * sizeof(Real);
231  MatrixIndexT width = M.NumCols() * sizeof(Real);
232  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, M.data_, src_pitch,
233  width, M.num_rows_, cudaMemcpyDeviceToDevice));
234  } else {
235  if (trans == kNoTrans) {
236  dim3 dimGrid, dimBlock;
237  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
238  &dimGrid, &dimBlock);
239  cuda_copy_from_mat(dimGrid, dimBlock, data_, M.data_, Dim(), M.Dim());
240  } else {
241  // 2D thread block with warps (blockDim.x) along the row-dim of input M.
242  // Each (8x32) thread block will transpose (32x32) data
243  const int32 warpSize = 32;
244  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
245  dim3 dimGrid(n_blocks(M.NumCols(), warpSize),
246  n_blocks(M.NumRows(), warpSize));
247  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, M.data_, Dim(),
248  M.Dim());
249  }
250  CU_SAFE_CALL(cudaGetLastError());
251  }
252  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from other CuMatrixBase)", tim);
253  } else
254 #endif
255  {
256  Mat().CopyFromMat(M.Mat(), trans);
257  }
258 }
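
For the transposed branch in the listing, the grid is sized so that 32x32 tiles cover the source matrix M. The sketch below reproduces that arithmetic on the host under two assumptions that are not shown on this page: that n_blocks is a ceiling division, and that CU1DBLOCK is 256 (which matches the "(8x32)" thread block in the comment). Grid2, NumBlocksSketch and TransposeGridSketch are hypothetical names.

```cpp
#include <cassert>

// Ceiling division: number of blocks of `block_size` needed to cover `size`.
int NumBlocksSketch(int size, int block_size) {
  assert(size >= 0 && block_size > 0);
  return (size + block_size - 1) / block_size;
}

struct Grid2 {
  int x, y;
};

// Grid for a tiled transpose: x counts tiles along the source columns,
// y counts tiles along the source rows.
Grid2 TransposeGridSketch(int src_rows, int src_cols, int tile = 32) {
  return Grid2{NumBlocksSketch(src_cols, tile), NumBlocksSketch(src_rows, tile)};
}
```

A 100 x 33 source therefore needs a 2 x 4 grid: two tiles across the 33 columns and four down the 100 rows, with the partial tiles at the edges masked inside the kernel.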
MatrixIndexT NumCols() const
Definition: cu-matrix.h:215
MatrixIndexT num_cols_
Definition: cu-matrix.h:728
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT stride_
Definition: cu-matrix.h:730
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:214
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:720
#define CU1DBLOCK
Definition: cu-matrixdim.h:63
::MatrixDim Dim() const
Definition: cu-matrix.h:220
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:729
MatrixIndexT Stride() const
Definition: cu-matrix.h:216
const Real * Data() const
Return data pointer (const).
Definition: cu-matrix.h:689
void CopyFromSp ( const CuSpMatrix< Real > &  M)

Definition at line 355 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, KALDI_ASSERT, CuSpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrix< BaseFloat >::CuMatrix(), CuSpMatrix< Real >::Invert(), and kaldi::TestCuMatrixCopyFromSp().

355  {
356  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
357  if (num_rows_ == 0)
358  return;
359 #if HAVE_CUDA == 1
360  if (CuDevice::Instantiate().Enabled()) {
361  CuTimer tim;
362  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
363  dim3 dimGrid(n_blocks(NumRows(), CU2DBLOCK),
364  n_blocks(NumRows(), CU2DBLOCK));
365  cuda_copy_from_sp(dimGrid, dimBlock, M.Data(), data_, Dim());
366  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyFromSp", tim);
367  } else
368 #endif
369  {
370  Mat().CopyFromSp(M.Mat());
371  }
372 }
MatrixIndexT num_cols_
Definition: cu-matrix.h:728
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:214
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:720
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:220
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:729
template void CopyFromTp ( const CuTpMatrix< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 277 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, CuTpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrixBase< Real >::Cholesky(), CuMatrix< BaseFloat >::CuMatrix(), CuTpMatrix< Real >::Invert(), kaldi::TestCuMatrixCopyFromTp(), and kaldi::UnitTestCuMatrixCopyFromTp().

278  {
279  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
280  if (num_rows_ == 0)
281  return;
282 #if HAVE_CUDA == 1
283  if (CuDevice::Instantiate().Enabled()) {
284  CuTimer tim;
285  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
286  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
287  n_blocks(num_rows_, CU2DBLOCK));
288  if (trans == kNoTrans) {
289  cuda_copy_from_tp(dimGrid, dimBlock, data_, M.Data(), Dim());
290  } else {
291  cuda_copy_from_tp_trans(dimGrid, dimBlock, data_, M.Data(), Dim());
292  }
293  CuDevice::Instantiate().AccuProfile(__func__, tim);
294  } else
295 #endif
296  {
297  Mat().CopyFromTp(M.Mat(), trans);
298  }
299 }
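
The copy expands a packed triangular matrix into a full square one, optionally transposed. A host-only sketch, assuming the packed storage holds the lower triangle row by row so that element (r, c) with c <= r sits at index r*(r+1)/2 + c (this layout is an assumption here, and CopyFromTpSketch is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // Matrix[row][col]

// Expands a packed lower-triangular matrix into a full dim x dim matrix.
// With trans == true the result is upper triangular instead.
Matrix CopyFromTpSketch(const std::vector<float> &packed, std::size_t dim,
                        bool trans) {
  assert(packed.size() == dim * (dim + 1) / 2);
  Matrix out(dim, std::vector<float>(dim, 0.0f));
  for (std::size_t r = 0; r < dim; ++r) {
    for (std::size_t c = 0; c <= r; ++c) {
      float v = packed[r * (r + 1) / 2 + c];
      if (trans) out[c][r] = v;
      else out[r][c] = v;
    }
  }
  return out;
}
```

Elements outside the triangle are set to zero, which is why the recursion-free branch of Cholesky() can use this copy to write a full factor into *this.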
MatrixIndexT num_cols_
Definition: cu-matrix.h:728
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:698
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:720
#define CU2DBLOCK
Definition: cu-matrixdim.h:67
::MatrixDim Dim() const
Definition: cu-matrix.h:220
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:729
void CopyLowerToUpper ( )

Definition at line 2936 of file cu-matrix.cc.

References CU2DBLOCK, data_, and KALDI_ASSERT.

Referenced by kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuMatrixCopyLowerToUpper().

2936  {
2937  KALDI_ASSERT(num_cols_ == num_rows_);
2938  if (num_rows_ == 0) return;
2939 #if HAVE_CUDA == 1
2940  if (CuDevice::Instantiate().Enabled()) {
2941  CuTimer tim;
2942  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2943  int32 dim = num_rows_;
2944  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2945  n_blocks(dim, CU2DBLOCK));
2946  cuda_copy_low_upp(dimGrid, dimBlock, data_, Dim());
2947  CU_SAFE_CALL(cudaGetLastError());
2948  CuDevice::Instantiate().AccuProfile(__func__, tim);
2949  } else
2950 #endif
2951  {
2952  Mat().CopyLowerToUpper();
2953  }
2954 }
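The listing above only dispatches to a CUDA kernel or to Mat().CopyLowerToUpper(). As a rough CPU-side sketch of the operation (mirroring the lower triangle of a square matrix into the upper triangle), assuming plain std::vector storage instead of Kaldi types; the Matrix alias and CopyLowerToUpperSketch are hypothetical names introduced here:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper (not part of Kaldi): mirrors the strictly lower
// triangle of a square matrix into the upper triangle, in place.
void CopyLowerToUpperSketch(Matrix *m) {
  const std::size_t n = m->size();
  for (std::size_t r = 0; r < n; ++r) {
    assert((*m)[r].size() == n);  // the matrix must be square
    for (std::size_t c = 0; c < r; ++c)
      (*m)[c][r] = (*m)[r][c];    // upper(c, r) <- lower(r, c)
  }
}
```

After the call the matrix is symmetric; the diagonal and the lower triangle are left unchanged.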
void CopyRows ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Copies row r from row indexes[r] of src.

As a special case, if indexes[r] < 0, sets row r to zero. src.NumCols() must equal this.NumCols().

Definition at line 2645 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), SpliceComponent::Backprop(), NnetComputer::ExecuteCommand(), main(), DistributeComponent::Propagate(), and SpliceMaxComponent::Propagate().

2646  {
2647 #if HAVE_CUDA == 1
2648  if (CuDevice::Instantiate().Enabled()) {
2649  KALDI_ASSERT(static_cast<MatrixIndexT>(indices.Dim()) == NumRows());
2650  KALDI_ASSERT(NumCols() == src.NumCols());
2651 
2652  CuTimer tim;
2653  dim3 dimGrid, dimBlock;
2654  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2655  &dimGrid, &dimBlock);
2656  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
2657  Dim(), src.Stride());
2658  CU_SAFE_CALL(cudaGetLastError());
2659  CuDevice::Instantiate().AccuProfile(__func__, tim);
2660  } else
2661 #endif
2662  {
2663  Mat().CopyRows(src.Mat(), indices.Data());
2664  }
2665 }
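The indexing rule, including the negative-index special case, can be illustrated with a small CPU reference. This is only a sketch under the assumption of std::vector storage; the Matrix alias and CopyRowsSketch are hypothetical and are not Kaldi functions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical CPU reference (not Kaldi code): dst row r becomes a copy of
// src row indexes[r]; a negative index zeroes the destination row.
void CopyRowsSketch(const Matrix &src, const std::vector<int> &indexes,
                    Matrix *dst) {
  assert(indexes.size() == dst->size());  // one index per destination row
  for (std::size_t r = 0; r < dst->size(); ++r) {
    std::vector<double> &row = (*dst)[r];
    if (indexes[r] < 0) {
      row.assign(row.size(), 0.0);
    } else {
      assert(static_cast<std::size_t>(indexes[r]) < src.size());
      const std::vector<double> &src_row = src[indexes[r]];
      assert(src_row.size() == row.size());  // column counts must match
      row = src_row;
    }
  }
}
```

With src = [[1, 2], [3, 4]] and indexes = {1, -1, 0}, the three destination rows become [3, 4], [0, 0], and [1, 2].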
void CopyRows ( const CuArrayBase< const Real * > &  src)

Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

src.size() must equal this.NumRows(), and if any src[r] is NULL then this.Row(r) will be set to zero.

Definition at line 2690 of file cu-matrix.cc.

References CU2DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

2690  {
2691  if (NumRows() == 0) return;
2692 #if HAVE_CUDA == 1
2693  if (CuDevice::Instantiate().Enabled()) {
2694  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2695  CuTimer tim;
2696  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2697  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2698  n_blocks(num_rows_, CU2DBLOCK));
2699  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), Dim());
2700  CU_SAFE_CALL(cudaGetLastError());
2701  CuDevice::Instantiate().AccuProfile(__func__, tim);
2702  } else
2703 #endif
2704  {
2705  Mat().CopyRows(src.Data());
2706  }
2707 }
void CopyRowsFromVec ( const CuVectorBase< Real > &  v)

This function has two modes of operation.

If v.Dim() == NumRows() * NumCols(), then treats the vector as a row-by-row concatenation of a matrix and copies to *this. If v.Dim() == NumCols(), it sets each row of *this to a copy of v.

Definition at line 2282 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by kaldi::CuVectorUnitTestCopyFromMat(), NnetOnlineComputer::Flush(), NnetRescaler::FormatInput(), TimeHeightConvolutionComponent::Propagate(), RepeatedAffineComponent::Propagate(), ConstantComponent::Propagate(), AffineComponent::Propagate(), FixedAffineComponent::Propagate(), BlockAffineComponent::Propagate(), ConstantFunctionComponent::Propagate(), LinearTransform::SetParams(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), kaldi::UnitTestCuMatrixCopyRowsFromVec(), TimeHeightConvolutionComponent::UnVectorize(), RepeatedAffineComponent::UnVectorize(), AffineComponent::UnVectorize(), LinearComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), and LstmNonlinearityComponent::UnVectorize().

2282  {
2283 #if HAVE_CUDA == 1
2284  if (CuDevice::Instantiate().Enabled()) {
2285  CuTimer tim;
2286  if (v.Dim() == num_rows_*num_cols_) {
2287  if (stride_ == num_cols_) {
2288  const Real* v_data = v.Data();
2289  CU_SAFE_CALL(cudaMemcpy(data_, v_data,
2290  sizeof(Real)*num_rows_*num_cols_,
2291  cudaMemcpyDeviceToDevice));
2292  } else {
2293  CU_SAFE_CALL(cudaMemcpy2D(data_, stride_ * sizeof(Real), v.Data(),
2294  num_cols_*sizeof(Real), num_cols_*sizeof(Real),
2295  num_rows_,
2296  cudaMemcpyDeviceToDevice));
2297  }
2298  } else if (v.Dim() == num_cols_) {
2299  dim3 dimGrid, dimBlock;
2300  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2301  &dimGrid, &dimBlock);
2302  cuda_copy_rows_from_vec(dimGrid, dimBlock, data_, this->Dim(), v.Data());
2303  CU_SAFE_CALL(cudaGetLastError());
2304  } else {
2305  KALDI_ERR << "Wrong sized arguments";
2306  }
2307  CuDevice::Instantiate().AccuProfile(__func__, tim);
2308  } else
2309 #endif
2310  {
2311  Mat().CopyRowsFromVec(v.Vec());
2312  }
2313 }
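The two modes can be sketched on the CPU as follows. This is an illustrative reference under the assumption of std::vector storage, not the Kaldi implementation; the Matrix alias and CopyRowsFromVecSketch are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical CPU reference (not Kaldi code) for the two modes.
void CopyRowsFromVecSketch(const std::vector<double> &v, Matrix *m) {
  const std::size_t rows = m->size();
  const std::size_t cols = rows == 0 ? 0 : (*m)[0].size();
  if (v.size() == rows * cols) {
    // Mode 1: v holds the whole matrix, row by row.
    for (std::size_t r = 0; r < rows; ++r) {
      assert((*m)[r].size() == cols);  // all rows share one length
      for (std::size_t c = 0; c < cols; ++c)
        (*m)[r][c] = v[r * cols + c];
    }
  } else if (v.size() == cols) {
    // Mode 2: every row becomes a copy of v.
    for (std::size_t r = 0; r < rows; ++r) (*m)[r] = v;
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```

For a 2 x 3 matrix, a 6-element vector fills the matrix row by row, while a 3-element vector is copied into both rows.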
void CopyRowsFromVec ( const VectorBase< Real > &  v)

Version of CopyRowsFromVec() that takes a CPU-based vector.

Definition at line 2316 of file cu-matrix.cc.

References VectorBase< Real >::Data(), data_, VectorBase< Real >::Dim(), and KALDI_ERR.

2316  {
2317 #if HAVE_CUDA == 1
2318  if (CuDevice::Instantiate().Enabled()) {
2319  CuTimer tim;
2320  if (v.Dim() == num_rows_*num_cols_) {
2321  if (stride_ == num_cols_) {
2322  const Real* v_data = v.Data();
2323  cudaMemcpy(data_, v_data, sizeof(Real)*num_rows_*num_cols_, cudaMemcpyHostToDevice);
2324  } else {
2325  const Real *v_data = v.Data();
2326  for (MatrixIndexT r = 0; r < num_rows_; r++) {
2327  Real *row_data = RowData(r);
2328  cudaMemcpy(row_data, v_data, sizeof(Real)*num_cols_, cudaMemcpyHostToDevice);
2329  v_data += num_cols_;
2330  }
2331  }
2332  } else if (v.Dim() == num_cols_) {
2333  dim3 dimGrid, dimBlock;
2334  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2335  &dimGrid, &dimBlock);
2336  cuda_copy_rows_from_vec(dimGrid, dimBlock, this->data_, this->Dim(), v.Data());
2337  CU_SAFE_CALL(cudaGetLastError());
2338  } else {
2339  KALDI_ERR << "Wrong sized arguments";
2340  }
2341  CuDevice::Instantiate().AccuProfile(__func__, tim);
2342  } else
2343 #endif
2344  {
2345  Mat().CopyRowsFromVec(v);
2346  }
2347 }
template void CopyToMat ( MatrixBase< OtherReal > *  dst,
MatrixTransposeType  trans = kNoTrans 
) const

Definition at line 415 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyToMat(), MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

Referenced by NnetComputerFromEg::Compute(), CuMatrixBase< Real >::CopyToMat(), kaldi::nnet1::MomentStatistics(), KlHmm::PropagateFnc(), kaldi::UnitInvert(), kaldi::UnitTestCholesky(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixAddVecVec(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuTanh(), kaldi::UnitTestInvert(), kaldi::UnitTestMatrix(), UnitTestMatrixRandomizer(), kaldi::UnitTestSetZeroAboveDiag(), kaldi::UnitTestSwapCu2Cu(), and kaldi::UnitTestSwapCu2M().

416  {
417 #if HAVE_CUDA == 1
418  if (CuDevice::Instantiate().Enabled()) {
419  if (trans == kTrans || sizeof(OtherReal) != sizeof(Real)) {
420  CuMatrix<OtherReal> this_trans(*this, trans);
421  this_trans.CopyToMat(dst, kNoTrans);
422  } else {
423  KALDI_ASSERT(dst->NumRows() == NumRows() && dst->NumCols() == NumCols());
424  if (num_rows_ == 0) return;
425  CuTimer tim;
426 
427  MatrixIndexT src_pitch = stride_*sizeof(Real);
428  MatrixIndexT dst_pitch = dst->Stride()*sizeof(Real);
429  MatrixIndexT width = NumCols()*sizeof(Real);
430  CU_SAFE_CALL(cudaMemcpy2D(dst->Data(), dst_pitch, this->data_, src_pitch,
431  width, this->num_rows_, cudaMemcpyDeviceToHost));
432 
433  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyToMatD2H", tim);
434  }
435  } else
436  #endif
437  {
438  dst->CopyFromMat(Mat(), trans);
439  }
440 }
void CopyToRows ( const CuArrayBase< Real * > &  dst) const

For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise).

If dst[r] is NULL, does not copy anywhere. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2711 of file cu-matrix.cc.

References CU2DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), and KALDI_ASSERT.

Referenced by DistributeComponent::Backprop(), NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixCopyToRows().

2711  {
2712  if (NumRows() == 0) return;
2713 #if HAVE_CUDA == 1
2714  if (CuDevice::Instantiate().Enabled()) {
2715  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2716 
2717  CuTimer tim;
2718  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2719  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2720  n_blocks(num_rows_, CU2DBLOCK));
2721  cuda_copy_to_rows(dimGrid, dimBlock, dst.Data(), data_, Dim());
2722  CU_SAFE_CALL(cudaGetLastError());
2723  CuDevice::Instantiate().AccuProfile(__func__, tim);
2724  } else
2725 #endif
2726  {
2727  Mat().CopyToRows(dst.Data());
2728  }
2729 }
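A CPU-only sketch of the scatter semantics described above, including the NULL case, might look like the following. It assumes host memory and std::vector storage; the Matrix alias and CopyToRowsSketch are hypothetical and not part of Kaldi:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical CPU reference (not Kaldi code): row r of src is written to
// the buffer starting at dst[r]; a null pointer means "skip this row".
// Each non-null buffer must hold at least src[r].size() elements and the
// buffers must not overlap.
void CopyToRowsSketch(const Matrix &src, const std::vector<double *> &dst) {
  assert(dst.size() == src.size());  // one destination pointer per row
  for (std::size_t r = 0; r < src.size(); ++r) {
    if (dst[r] == nullptr) continue;
    for (std::size_t c = 0; c < src[r].size(); ++c)
      dst[r][c] = src[r][c];
  }
}
```

The non-overlap requirement matters on the GPU because rows are written concurrently; the sequential loop here hides that hazard.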
void CopyUpperToLower ( )

Definition at line 2957 of file cu-matrix.cc.

References CU2DBLOCK, data_, and KALDI_ASSERT.

Referenced by kaldi::TestCuMatrixCopyUpperToLower(), and kaldi::UnitTestCuMatrixCopyUpperToLower().

2957  {
2959  if (num_rows_ == 0) return;
2960 #if HAVE_CUDA == 1
2961  if (CuDevice::Instantiate().Enabled()) {
2962  CuTimer tim;
2963  int32 dim = this->num_rows_;
2964  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2965  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2966  n_blocks(dim, CU2DBLOCK));
2967  cuda_copy_upp_low(dimGrid, dimBlock, data_, Dim());
2968  CU_SAFE_CALL(cudaGetLastError());
2969  CuDevice::Instantiate().AccuProfile(__func__, tim);
2970  } else
2971 #endif
2972  {
2973  Mat().CopyUpperToLower();
2974  }
2975 }
const Real* Data ( ) const
inline

Return data pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 689 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuMatrixBase< Real >::AddMatSmat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), CuMatrixBase< Real >::AddSmatMat(), CuMatrixBase< Real >::AddToRows(), NormalizeComponent::Backprop(), BatchNormComponent::Backprop(), RepeatedAffineComponent::Backprop(), GeneralDropoutComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), ScaleAndOffsetComponent::Backprop(), ScaleAndOffsetComponent::BackpropInternal(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrix< Real >::CompObjfAndDeriv(), DistributeComponent::ComputeInputPointers(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuVectorBase< Real >::CopyElements(), CuTpMatrix< Real >::CopyFromMat(), CuCompressedMatrix< I >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuCompressedMatrix< I >::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), kaldi::cu::EnsureNonzero(), CuMatrixBase< Real >::EqualElementMask(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMaxDeriv(), CuTpMatrix< Real >::Invert(), CuMatrixBase< Real >::MulRows(), kaldi::cu::NormalizePerRow(), NormalizeComponent::Propagate(), BatchNormComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), RepeatedAffineComponent::Propagate(), GeneralDropoutComponent::Propagate(), PerElementOffsetComponent::Propagate(), ScaleAndOffsetComponent::Propagate(), ScaleAndOffsetComponent::PropagateInternal(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), RectifiedLinearComponent::RepairGradients(), CuBlockMatrix< Real >::SetCudaData(), kaldi::cu::Splice(), BatchNormComponent::StoreStats(), CuMatrixBase< Real >::SumColumnRanges(), CuMatrixBase< Real >::SymAddMat2(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), and TimeHeightConvolutionComponent::UpdateSimple().

689 { return data_; }
Real* Data ( )
inline

Return data pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 692 of file cu-matrix.h.

692 { return data_; }
void DiffGroupPnorm ( const CuMatrixBase< Real > &  in_value,
const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv,
Real  power 
)

Differentiate backward through the GroupPnorm function.

It is a combination of GroupPnormDeriv and MulRowsGroupMat.

Definition at line 828 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Backprop(), and kaldi::UnitTestCuMatrixDiffGroupPnorm().

831  {
832  KALDI_ASSERT(out_value.NumCols() > 0);
833  KALDI_ASSERT(out_value.NumCols() == out_deriv.NumCols());
834  int group_size = this->NumCols() / out_value.NumCols();
835  KALDI_ASSERT(this->NumCols() == out_value.NumCols() * group_size);
836 #if HAVE_CUDA == 1
837  if (CuDevice::Instantiate().Enabled()) {
838  CuTimer tim;
839  const int kWarpSize = 32;
840  dim3 dimBlock(kWarpSize, CU1DBLOCK / kWarpSize);
841  dim3 dimGrid(n_blocks(NumCols(), dimBlock.x),
842  n_blocks(NumRows(), dimBlock.y));
843  if (dimGrid.x * dimGrid.y > 1024) {
844  dimGrid.y = std::max(1024 / dimGrid.x, unsigned(1));
845  }
846  cuda_diff_group_pnorm(dimGrid, dimBlock, this->data_, in_value.Data(),
847  out_value.Data(), out_deriv.Data(), Dim(),
848  in_value.Stride(), out_value.Stride(),
849  out_deriv.Stride(), group_size, power);
850  CU_SAFE_CALL(cudaGetLastError());
851  CuDevice::Instantiate().AccuProfile(__func__, tim);
852  } else
853 #endif
854  {
855  Mat().GroupPnormDeriv(in_value.Mat(), out_value.Mat(), power);
856  MulRowsGroupMat(out_deriv);
857  }
858 }
void DiffLogSoftmaxPerRow ( const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv 
)

Differentiate backward through the log softmax function.

Here, "out_value" is the log softmax output. Does, for each row i, *this(i) = out_deriv(i) - sum(out_deriv(i)) .* exp(out_value(i)), where X(i) denotes row i of matrix X as a row vector. Supports in-place operation, this == &out_deriv.

Definition at line 1887 of file cu-matrix.cc.

References CuVectorBase< Real >::AddColSumMat(), CuMatrixBase< Real >::AddMat(), CuMatrixBase< Real >::ApplyExp(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kUndefined, CuMatrixBase< Real >::MulRowsVec(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Scale(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Backprop(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuDiffLogSoftmax().

1888  {
1889 
1890  KALDI_ASSERT(SameDim(out_value, out_deriv) && SameDim(out_value, *this) &&
1891  this != &out_value);
1892 
1893 #if HAVE_CUDA == 1
1894  if (CuDevice::Instantiate().Enabled()) {
1895  CuTimer tim;
1896 
1897  // CUDA thread layout: one thread block per matrix-row.
1898  dim3 dimBlock(CU1DBLOCK);
1899  dim3 dimGrid(num_rows_);
1900  cuda_diff_log_softmax(dimGrid, dimBlock, this->Dim(), out_value.Data(),
1901  out_value.Stride(), out_deriv.Data(),
1902  out_deriv.Stride(), data_);
1903  CU_SAFE_CALL(cudaGetLastError());
1904 
1905  CuDevice::Instantiate().AccuProfile(__func__, tim);
1906  } else
1907 #endif
1908  {
1909  if (this == &out_deriv) {
1910  // the code below doesn't work for in-place, so make a copy and recurse.
1911  CuMatrix<Real> temp(NumRows(), NumCols(), kUndefined);
1912  temp.DiffLogSoftmaxPerRow(out_value, out_deriv);
1913  CopyFromMat(temp);
1914  return;
1915  }
1916  /*
1917  Let the output be y, then
1918  y_i = x_i - log(sum_i exp(x_i))
1919  where x_i is the input to the component. The Jacobian matrix of this
1920  function is
1921  J = I - 1 exp(y^T)
1922  where 1 is a vector of ones. Let the derivative vector at the output be e,
1923  and at the input be d, then we have
1924  d = e - exp(y) Sum(e)
1925  d_i = e_i - exp(y_i) Sum(e)
1926  */
1927  const CuMatrixBase<Real> &Y(out_value), &E(out_deriv);
1928  CuMatrixBase<Real> &D(*this);
1929 
1930  D.CopyFromMat(Y);
1931  D.ApplyExp(); // exp(y)
1932  CuVector<Real> E_sum(D.NumRows()); // Initializes to zero
1933  E_sum.AddColSumMat(1.0, E); // Sum(e)
1934  D.MulRowsVec(E_sum); // exp(y) Sum(e)
1935  D.Scale(-1.0); // - exp(y) Sum(e)
1936  D.AddMat(1.0, E, kNoTrans); // e - exp(y_i) Sum(e)
1937  }
1938 }
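The comment in the listing reduces the Jacobian product to d_i = e_i - exp(y_i) Sum(e) per row. A minimal CPU sketch of that formula, assuming std::vector storage (the Matrix alias and DiffLogSoftmaxSketch are hypothetical, not Kaldi code), is:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical CPU reference (not Kaldi code). For each row,
// in_deriv[i] = out_deriv[i] - exp(out_value[i]) * sum(out_deriv).
Matrix DiffLogSoftmaxSketch(const Matrix &out_value, const Matrix &out_deriv) {
  assert(out_value.size() == out_deriv.size());
  Matrix in_deriv(out_value.size());
  for (std::size_t r = 0; r < out_value.size(); ++r) {
    const std::vector<double> &y = out_value[r], &e = out_deriv[r];
    assert(y.size() == e.size());
    double e_sum = 0.0;
    for (double x : e) e_sum += x;  // Sum(e) for this row
    in_deriv[r].resize(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
      in_deriv[r][i] = e[i] - std::exp(y[i]) * e_sum;
  }
  return in_deriv;
}
```

Because exp(y) sums to one over a row of log softmax outputs, each row of the result sums to zero, which is a convenient sanity check.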
void DiffParametricRelu ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff,
const CuVectorBase< Real > &  alpha,
const CuVectorBase< Real > &  beta 
)

Differentiate backward through the parametric relu function.

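Here, "value" is the ReLU input. Does, element-by-element, *this = diff * (value > 0 ? alpha : beta), where alpha and beta are indexed by column. Note that the CPU branch of the listing for this function tests value >= 0, so it selects alpha when value is exactly zero.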

Definition at line 1488 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::BackpropagateFnc().

1492  {
1493 #if HAVE_CUDA == 1
1494  if (CuDevice::Instantiate().Enabled()) {
1495  CuTimer tim;
1496 
1497  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1498  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK), n_blocks(num_rows_, CU2DBLOCK));
1499 
1500  cuda_diff_parametric_relu(dimGrid, dimBlock, data_, diff.data_, value.data_,
1501  Dim(), diff.Stride(), value.Stride(),
1502  alpha.data_, beta.data_);
1503  CU_SAFE_CALL(cudaGetLastError());
1504 
1505  CuDevice::Instantiate().AccuProfile(__func__, tim);
1506  } else
1507 #endif
1508  {
1509  // Do it on CPU,
1510  for (MatrixIndexT r = 0; r < NumRows(); r++) {
1511  for (MatrixIndexT c = 0; c < NumCols(); c++) {
1512  Real value_elem = value.Mat()(r,c);
1513  this->Mat()(r,c) = diff.Mat()(r,c) *
1514  (value_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
1515  }
1516  }
1517  }
1518 }
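A small CPU sketch of the documented formula, with one alpha and one beta per column as in the CPU branch above, is given here for illustration. It assumes std::vector storage and uses the documented strict test value > 0; the Matrix alias and DiffParametricReluSketch are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical CPU reference (not Kaldi code); alpha and beta hold one
// slope per column. Uses the documented test value > 0.
Matrix DiffParametricReluSketch(const Matrix &value, const Matrix &diff,
                                const std::vector<double> &alpha,
                                const std::vector<double> &beta) {
  assert(value.size() == diff.size());
  assert(alpha.size() == beta.size());
  Matrix out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    assert(value[r].size() == alpha.size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c)
      out[r][c] = diff[r][c] * (value[r][c] > 0.0 ? alpha[c] : beta[c]);
  }
  return out;
}
```

For value = [2, -3], diff = [10, 10], alpha = [1, 1], and beta = [0.1, 0.25], the first element takes the alpha slope and the second the beta slope, giving [10, 2.5].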
void DiffSigmoid ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the sigmoid function.

Here, "value" is the sigmoid output. Does, element-by-element, *this = diff * value * (1 - value).

Definition at line 1748 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SigmoidComponent::Backprop(), Sigmoid::BackpropagateFnc(), and kaldi::UnitTestCuDiffSigmoid().

1749  {
1750  KALDI_ASSERT(SameDim(*this, value) && SameDim(*this, diff));
1751 #if HAVE_CUDA == 1
1752  if (CuDevice::Instantiate().Enabled()) {
1753  CuTimer tim;
1754  dim3 dimGrid, dimBlock;
1755  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1756  &dimGrid, &dimBlock);
1757  cuda_diff_sigmoid(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1758  CU_SAFE_CALL(cudaGetLastError());
1759 
1760  CuDevice::Instantiate().AccuProfile(__func__, tim);
1761  } else
1762 #endif
1763  {
1764  Mat().DiffSigmoid(value.Mat(), diff.Mat());
1765  }
1766 }
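The formula *this = diff * value * (1 - value) follows from the derivative of the sigmoid. As a short derivation, where x (the sigmoid input), y (its output, i.e. "value"), and E (the objective, so that "diff" is the derivative of E with respect to y) are notation introduced here rather than names from the source:

```latex
y = \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\frac{dy}{dx} = \frac{e^{-x}}{(1 + e^{-x})^{2}} = y\,(1 - y), \qquad
\frac{\partial E}{\partial x} = \frac{\partial E}{\partial y}\; y\,(1 - y)
```

This is why the function needs only the sigmoid output and not the original input.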
void DiffSoftmaxPerRow ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the softmax function.

Here, "value" is the softmax output. Does, for each row i, *this(i) = diff(i) * diag(value(i)) - diff(i) * (value(i)^T * value(i)), where X(i) denotes row i of matrix X as a row vector, and '*' and '-' are matrix operations. Supports in-place operation, this == &diff.

Definition at line 1852 of file cu-matrix.cc.

References CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::nnet3::attention::AttentionBackward(), SoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffSoftmax().

1853  {
1854 
1855  KALDI_ASSERT(SameDim(value, diff) && SameDim(value, *this) &&
1856  this != &value);
1857 
1858 #if HAVE_CUDA == 1
1859  if (CuDevice::Instantiate().Enabled()) {
1860  CuTimer tim;
1861 
1862  // CUDA thread layout: one thread block per matrix-row.
1863  dim3 dimBlock(CU1DBLOCK);
1864  dim3 dimGrid(num_rows_);
1865  cuda_diff_softmax(dimGrid, dimBlock, data_, this->Dim(), value.Data(),
1866  value.Stride(), diff.Data(), diff.Stride());
1867  CU_SAFE_CALL(cudaGetLastError());
1868 
1869  CuDevice::Instantiate().AccuProfile(__func__, tim);
1870  } else
1871 #endif
1872  {
1873  const CuMatrixBase<Real> &P(value), &E(diff);
1874  CuMatrixBase<Real> &D(*this);
1875 
1876  CuVector<Real> pe_vec(D.NumRows()); // For each row i, the dot product (p_t . e_t).
1877  pe_vec.AddDiagMatMat(1.0, P, kNoTrans, E, kTrans, 0.0);
1878 
1879  D.CopyFromMat(E);
1880  D.MulElements(P);
1881  // At this point, D = P .* E (in matlab notation)
1882  D.AddDiagVecMat(-1.0, pe_vec, P, kNoTrans, 1.0); // does D -= diag(pe_vec) * P.
1883  }
1884 }
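The CPU branch above computes, for each row, D = P .* E minus the dot product (p . e) times P. The same per-row arithmetic can be sketched without Kaldi types; this assumes std::vector storage, and the Matrix alias and DiffSoftmaxSketch are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical CPU reference (not Kaldi code). For each row, with p the
// softmax output and e the derivative at the output:
//   d = p .* e - (p . e) p
Matrix DiffSoftmaxSketch(const Matrix &value, const Matrix &diff) {
  assert(value.size() == diff.size());
  Matrix in_deriv(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    const std::vector<double> &p = value[r], &e = diff[r];
    assert(p.size() == e.size());
    double pe = 0.0;  // dot product p . e for this row
    for (std::size_t i = 0; i < p.size(); ++i) pe += p[i] * e[i];
    in_deriv[r].resize(p.size());
    for (std::size_t i = 0; i < p.size(); ++i)
      in_deriv[r][i] = p[i] * e[i] - pe * p[i];
  }
  return in_deriv;
}
```

For p = [0.5, 0.5] and e = [1, 0], the dot product is 0.5 and the result is [0.25, -0.25].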
void DiffTanh ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the tanh function.

Here, "value" is the tanh output. Does, element-by-element, *this = diff * (1 - value^2).

Definition at line 1793 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Mat(), and CuMatrixBase< Real >::Stride().

Referenced by TanhComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), Tanh::BackpropagateFnc(), and kaldi::UnitTestCuDiffTanh().

1794  {
1795 #if HAVE_CUDA == 1
1796  if (CuDevice::Instantiate().Enabled()) {
1797  CuTimer tim;
1798  dim3 dimGrid, dimBlock;
1799  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1800  &dimGrid, &dimBlock);
1801  cuda_diff_tanh(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1802  CU_SAFE_CALL(cudaGetLastError());
1803 
1804  CuDevice::Instantiate().AccuProfile(__func__, tim);
1805  } else
1806 #endif
1807  {
1808  Mat().DiffTanh(value.Mat(), diff.Mat());
1809  }
1810 }
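The elementwise rule *this = diff * (1 - value^2) can be checked against a numerical derivative of tanh. The sketch below works on a single flat row and assumes std::vector storage; DiffTanhSketch is a hypothetical name, not a Kaldi function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not Kaldi code), applied to one flat row:
// in_deriv[i] = diff[i] * (1 - value[i]^2), where value[i] = tanh(x[i]).
std::vector<double> DiffTanhSketch(const std::vector<double> &value,
                                   const std::vector<double> &diff) {
  assert(value.size() == diff.size());
  std::vector<double> in_deriv(value.size());
  for (std::size_t i = 0; i < value.size(); ++i)
    in_deriv[i] = diff[i] * (1.0 - value[i] * value[i]);
  return in_deriv;
}
```

Passing diff = 1 returns the derivative of tanh itself, which should agree with a central finite difference of tanh at the same input.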
void DiffXent ( const CuArrayBase< int32 > &  tgt,
CuVector< Real > *  log_post_tgt 
)

Differentiates the block [softmax + cross-entropy]: dE/da = posterior_mat - target_mat, where 'E' is the error function and 'a' is the activation at the softmax input.

Interface:
tgt ... index vector, encodes the matrix of targets.
net_out_or_diff ... (this matrix) before invocation, the net output; after invocation, the diff dE/da.
log_post_tgt ... per-frame statistics for cross-entropy computations: log(sum_row(posterior_mat .* target_mat)).

Definition at line 1941 of file cu-matrix.cc.

References CU2DBLOCK, CuArrayBase< T >::Data(), data_, CuVectorBase< Real >::data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, kaldi::Log(), CuVector< Real >::Resize(), and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuDiffXent().

1942  {
1943 
1944  KALDI_ASSERT(tgt.Dim() == num_rows_);
1945  log_post_tgt->Resize(tgt.Dim());
1946 
1947 #if HAVE_CUDA == 1
1948  if (CuDevice::Instantiate().Enabled()) {
1949  CuTimer tim;
1950  dim3 dimBlock(1, CU2DBLOCK*8);
1951  dim3 dimGrid(1, n_blocks(tgt.Dim(), CU2DBLOCK*8));
1952  cuda_diff_xent(dimGrid, dimBlock, tgt.Data(), data_,
1953  log_post_tgt->data_, Dim());
1954 
1955  CuDevice::Instantiate().AccuProfile(__func__, tim);
1956  } else
1957 #endif
1958  {
1959  MatrixIndexT num_rows = num_rows_;
1960  for(int32 r = 0; r < num_rows; r++) {
1961  int32 col_tgt = tgt.Data()[r];
1962  Real &value = Mat()(r, col_tgt);
1963  log_post_tgt->Vec()(r) = Log(value);
1964  value -= 1.0;
1965  }
1966  }
1967 }
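The CPU branch above can be mirrored in a small standalone sketch: for each row, record the log posterior of the target class, then subtract 1 at the target column so the row becomes posterior minus one-hot target. Matrix and DiffXentSketch are hypothetical names, and plain std::vector replaces the Kaldi types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// Modifies "post" in place (posteriors become dE/da) and returns,
// per row, the log posterior of the target class.
std::vector<double> DiffXentSketch(const std::vector<int> &tgt, Matrix *post) {
  assert(post != nullptr && tgt.size() == post->size());
  std::vector<double> log_post_tgt(tgt.size());
  for (std::size_t r = 0; r < tgt.size(); ++r) {
    assert(tgt[r] >= 0 &&
           static_cast<std::size_t>(tgt[r]) < (*post)[r].size());
    double &value = (*post)[r][tgt[r]];
    log_post_tgt[r] = std::log(value);
    value -= 1.0;  // posterior minus the one-hot target entry
  }
  return log_post_tgt;
}
```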
void DivElements ( const CuMatrixBase< Real > &  A)

Divide two matrices elementwise: C = C ./ A, where C is *this.

Definition at line 678 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by CuVectorBase< Real >::DivElements(), kaldi::UnitTestCuMatrixDivElements(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

678  {
679  #if HAVE_CUDA == 1
680  if (CuDevice::Instantiate().Enabled()) {
681  CuTimer tim;
682 
683  KALDI_ASSERT(num_cols_ == A.NumCols());
684  KALDI_ASSERT(num_rows_ == A.NumRows());
685 
686  dim3 dimGrid, dimBlock;
687  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
688  &dimGrid, &dimBlock);
689 
690  cuda_div_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
691  CU_SAFE_CALL(cudaGetLastError());
692 
693  CuDevice::Instantiate().AccuProfile(__func__, tim);
694  } else
695  #endif
696  {
697  Mat().DivElements(A.Mat());
698  }
699 }
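As a minimal illustration of the in-place elementwise division, the following standalone sketch uses hypothetical names (Matrix, DivElementsSketch) and a nested std::vector instead of the Kaldi matrix classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// In place: c(r, col) /= a(r, col) for every element; dimensions must match.
void DivElementsSketch(const Matrix &a, Matrix *c) {
  assert(c != nullptr && a.size() == c->size());
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == (*c)[r].size());
    for (std::size_t col = 0; col < a[r].size(); ++col)
      (*c)[r][col] /= a[r][col];
  }
}
```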
void DivRowsVec ( const CuVectorBase< Real > &  div)

Divide the i'th row by div[i].

Definition at line 886 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), VectorBase< Real >::InvertElements(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by StatisticsPoolingComponent::Backprop(), StatisticsPoolingComponent::Propagate(), kaldi::TestCuMatrixDivRowsVec(), and kaldi::UnitTestCuMatrixDivRowsVec().

886  {
887 #if HAVE_CUDA == 1
888  if (CuDevice::Instantiate().Enabled()) {
889  CuTimer tim;
890 
891  KALDI_ASSERT(div.Dim() == NumRows());
892 
893  dim3 dimGrid, dimBlock;
894  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
895  &dimGrid, &dimBlock);
896  // For large matrix we do more work per thread by limiting the
897  // the grid size to reduce the block launching overhead.
898  if (dimGrid.x * dimGrid.y > 1024) {
899  dimGrid.x = 1024 / dimGrid.y;
900  if (dimGrid.x == 0) {
901  dimGrid.x = 1;
902  }
903  }
904  cuda_div_rows_vec(dimGrid, dimBlock, data_, div.data_, Dim());
905  CU_SAFE_CALL(cudaGetLastError());
906 
907  CuDevice::Instantiate().AccuProfile(__func__, tim);
908  } else
909 #endif
910  {
911  Vector<Real> temp(div.Vec()); // will copy.
912  temp.InvertElements();
913  Mat().MulRowsVec(temp);
914  }
915 }
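The CPU branch above divides rows by first inverting the divisor vector and then scaling each row. A standalone sketch of that idea, with hypothetical names (Matrix, DivRowsVecSketch) and no Kaldi dependency, might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// In place: row r of "m" is divided by div[r], implemented as a
// multiplication by the reciprocal, as in the CPU branch.
void DivRowsVecSketch(const std::vector<double> &div, Matrix *m) {
  assert(m != nullptr && div.size() == m->size());
  for (std::size_t r = 0; r < m->size(); ++r) {
    const double inv = 1.0 / div[r];
    for (double &x : (*m)[r]) x *= inv;
  }
}
```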
void EqualElementMask ( const CuMatrixBase< Real > &  mat,
CuMatrix< Real > *  mask 
) const

Definition at line 3395 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kSetZero, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrix< Real >::Resize(), and CuMatrixBase< Real >::Stride().

Referenced by MaxpoolingComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), and MaxPooling2DComponent::BackpropagateFnc().

3395  {
3396  // Check the inputs:
3397  KALDI_ASSERT(mat.NumRows() == NumRows() && mat.NumCols() == NumCols());
3398  KALDI_ASSERT(mask != NULL);
3399  // Resizes the output matrix:
3400  mask->Resize(NumRows(), NumCols(), kSetZero);
3401 
3402 #if HAVE_CUDA == 1
3403  if (CuDevice::Instantiate().Enabled()) {
3404  CuTimer tim;
3405  dim3 dimGrid, dimBlock;
3406  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
3407  &dimGrid, &dimBlock);
3408  cuda_equal_element_mask(dimGrid, dimBlock, this->data_, mat.Data(),
3409  mask->Data(), this->Dim(), mat.Stride(),
3410  mask->Stride());
3411  CU_SAFE_CALL(cudaGetLastError());
3412 
3413  CuDevice::Instantiate().AccuProfile(__func__, tim);
3414  } else
3415 #endif
3416  {
3417  for (int32 r = 0; r < NumRows(); r++) {
3418  for (int32 c = 0; c < NumCols(); c++) {
3419  (*mask)(r,c) = ((*this)(r,c) == mat(r,c) ? 1.0 : 0.0);
3420  }
3421  }
3422  }
3423 }
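Judging from the CPU branch above, the mask is 1.0 where the two matrices hold exactly equal values and 0.0 elsewhere. A standalone sketch of that behaviour, using the hypothetical names Matrix and EqualElementMaskSketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// Returns mask(r, c) = 1.0 if a(r, c) == b(r, c) (exact comparison), else 0.0.
Matrix EqualElementMaskSketch(const Matrix &a, const Matrix &b) {
  assert(a.size() == b.size());
  Matrix mask(a.size());
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == b[r].size());
    mask[r].resize(a[r].size());
    for (std::size_t c = 0; c < a[r].size(); ++c)
      mask[r][c] = (a[r][c] == b[r][c]) ? 1.0 : 0.0;
  }
  return mask;
}
```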
void FindRowMaxId ( CuArray< int32 > *  id) const

Find the id of the maximal element for each row (resizes the 'id' array to the appropriate size).

Definition at line 1813 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, CuArrayBase< T >::Data(), and data_.

Referenced by kaldi::nnet3::ComputeAccuracy(), NnetUpdater::ComputeTotAccuracy(), Xent::Eval(), kaldi::TestCuFindRowMaxId(), and kaldi::UnitTestCuFindRowMaxId().

1813  {
1814 #if HAVE_CUDA == 1
1815  if (CuDevice::Instantiate().Enabled()) {
1816  CuTimer tim;
1817  id->Resize(num_rows_);
1818  MatrixDim d = Dim();
1819 
1820  // CUDA thread layout: one thread block per matrix-row.
1821  dim3 dimBlock(CU1DBLOCK);
1822  dim3 dimGrid(num_rows_);
1823  cuda_find_row_max_id(dimGrid, dimBlock, data_, NULL, id->Data(), d);
1824  CU_SAFE_CALL(cudaGetLastError());
1825 
1826  // now we have the indices!
1827  CuDevice::Instantiate().AccuProfile(__func__, tim);
1828  } else
1829 #endif
1830  {
1831  // allocate index buffer
1832  id->Resize(num_rows_);
1833  id->Set(-1);
1834  // find maxima
1835  MatrixIndexT num_rows = num_rows_, num_cols = num_cols_;
1836  for (MatrixIndexT r = 0; r < num_rows; r++) {
1837  Real max = -1e21;
1838  int32 max_id = -1;
1839  const Real *row_data = Mat().RowData(r);
1840  for (MatrixIndexT c = 0; c < num_cols; c++) {
1841  if (max < row_data[c]) {
1842  max = row_data[c];
1843  max_id = c;
1844  }
1845  }
1846  id->Data()[r] = max_id;
1847  }
1848  }
1849 }
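The CPU branch above amounts to a per-row argmax with a strict comparison, so ties resolve to the lowest column index, and a row whose values never exceed the -1e21 starting value keeps the id -1. A standalone sketch with hypothetical names (Matrix, FindRowMaxIdSketch):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// Returns, for each row, the column index of its largest element.
// Mirrors the CPU branch: starts from max = -1e21 and id = -1.
std::vector<int> FindRowMaxIdSketch(const Matrix &m) {
  std::vector<int> id(m.size(), -1);
  for (std::size_t r = 0; r < m.size(); ++r) {
    double max = -1e21;
    for (std::size_t c = 0; c < m[r].size(); ++c) {
      if (max < m[r][c]) {  // strict: the first maximum wins on ties
        max = m[r][c];
        id[r] = static_cast<int>(c);
      }
    }
  }
  return id;
}
```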
Real FrobeniusNorm ( ) const
inline

Definition at line 225 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::ApproxEqual(), kaldi::nnet3::ConstrainOrthonormalInternal(), and kaldi::UnitTestCuSparseMatrixFrobeniusNorm().

225 { return sqrt(TraceMatMat(*this, *this, kTrans)); }
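The one-liner above relies on the identity trace(A A^T) = sum of squared elements, so its square root is the Frobenius norm. A standalone sketch of the same quantity, with hypothetical names (Matrix, FrobeniusNormSketch):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// sqrt(trace(A * A^T)), computed directly as the root of the sum of squares.
double FrobeniusNormSketch(const Matrix &a) {
  double sum_sq = 0.0;
  for (std::size_t r = 0; r < a.size(); ++r)
    for (std::size_t c = 0; c < a[r].size(); ++c)
      sum_sq += a[r][c] * a[r][c];
  return std::sqrt(sum_sq);
}
```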
void GroupMax ( const CuMatrixBase< Real > &  src)

Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1604 of file cu-matrix.cc.

References CU1DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Propagate(), kaldi::TestCuMatrixGroupMax(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), and kaldi::UnitTestCuMatrixGroupMax().

1604  {
1605  int group_size = src.NumCols() / this->NumCols();
1606  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1607  this->NumRows() == src.NumRows());
1608 #if HAVE_CUDA == 1
1609  if (CuDevice::Instantiate().Enabled()) {
1610  CuTimer tim;
1611  // One thread block per row.
1612  // Use 2D block for small group size to simplify the calculation.
1613  // Each group is reduced by threads_per_group threads.
1614  // threads_per_group should be a power of 2 for fast tree reduction.
1615  // group size: 1 2 3 4 5 6 7 .. 12 13 .. 24 25 .. 48 ...
1616  // threads_per_group: 1 1 1 2 2 2 4 .. 4 8 .. 8 16 .. 16 ...
1617  int threads_per_group = CU1DBLOCK;
1618  while (threads_per_group * 3 / 2 >= group_size) {
1619  threads_per_group >>= 1;
1620  }
1621  if (group_size == 1) {
1622  threads_per_group = 1;
1623  }
1624  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1625  dim3 dimGrid(NumRows());
1626  cuda_group_max(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1627  src.Stride(), group_size);
1628  CU_SAFE_CALL(cudaGetLastError());
1629  CuDevice::Instantiate().AccuProfile(__func__, tim);
1630  } else
1631 #endif
1632  {
1633  Mat().GroupMax(src.Mat());
1634  }
1635 }
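The thread-count selection in the CUDA branch above can be checked in isolation against the table in its comments. In this sketch ThreadsPerGroupSketch is a hypothetical name, and the default block_size of 256 is an assumed value standing in for CU1DBLOCK, whose actual value is not shown on this page.

```cpp
#include <cassert>

// Largest power of two (obtained by halving block_size) such that
// threads_per_group * 3 / 2 < group_size, with group_size == 1 mapped to 1.
// block_size plays the role of CU1DBLOCK; 256 is an assumption.
int ThreadsPerGroupSketch(int group_size, int block_size = 256) {
  assert(group_size >= 1 && block_size >= 1);
  int threads_per_group = block_size;
  while (threads_per_group * 3 / 2 >= group_size) {
    threads_per_group >>= 1;  // reaches 0 only when group_size == 1
  }
  if (group_size == 1) {
    threads_per_group = 1;
  }
  return threads_per_group;
}
```

With these assumptions, group sizes 4 to 6 map to 2 threads and 7 to 12 map to 4 threads, matching the table in the source comments.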
void GroupMaxDeriv ( const CuMatrixBase< Real > &  input,
const CuMatrixBase< Real > &  output 
)

Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. the "src" variable), and "output" is the result of the computation (i.e. the "this" of that function call), and *this must have the same dimension as "input". Each element of *this will be set to 1 if the corresponding input equals the output of the group, and 0 otherwise. This equals the function derivative where it is defined (it's not defined where multiple inputs in the group are equal to the output).

Definition at line 861 of file cu-matrix.cc.

References CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), kaldi::TestCuMatrixGroupMaxDeriv(), and kaldi::UnitTestCuMatrixGroupMaxDeriv().

862  {
863  KALDI_ASSERT(src2.NumCols() > 0);
864  int group_size = this->NumCols() / src2.NumCols();
865  KALDI_ASSERT(this->NumCols() == src2.NumCols() * group_size);
866 #if HAVE_CUDA == 1
867  if (CuDevice::Instantiate().Enabled()) {
868  CuTimer tim;
869  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
870  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
871  n_blocks(NumRows(), CU2DBLOCK));
872  cuda_calc_group_max_deriv(dimGrid, dimBlock, this->data_, src1.Data(),
873  src2.Data(), Dim(), src1.Stride(), src2.Stride(),
874  group_size);
875  CU_SAFE_CALL(cudaGetLastError());
876 
877  CuDevice::Instantiate().AccuProfile(__func__, tim);
878  } else
879 #endif
880  {
881  Mat().GroupMaxDeriv(src1.Mat(), src2.Mat());
882  }
883 }
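The rule described above (1 where an input equals its group's maximum, 0 otherwise) can be sketched on plain vectors. Matrix and GroupMaxDerivSketch are hypothetical names; note that tied inputs within a group all receive 1, which is the case where the true derivative is undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// deriv(r, j) = 1.0 if input(r, j) equals output(r, j / group_size), else 0.0,
// where group_size = input columns / output columns.
Matrix GroupMaxDerivSketch(const Matrix &input, const Matrix &output) {
  assert(input.size() == output.size());
  Matrix deriv(input.size());
  for (std::size_t r = 0; r < input.size(); ++r) {
    assert(!output[r].empty() && input[r].size() % output[r].size() == 0);
    const std::size_t group_size = input[r].size() / output[r].size();
    deriv[r].resize(input[r].size());
    for (std::size_t j = 0; j < input[r].size(); ++j)
      deriv[r][j] = (input[r][j] == output[r][j / group_size]) ? 1.0 : 0.0;
  }
  return deriv;
}
```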
void GroupPnorm ( const CuMatrixBase< Real > &  src,
Real  pow 
)

Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j ^ p) ^ (1 / p), where p is the power argument and G = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1563 of file cu-matrix.cc.

References CU1DBLOCK, CU2DBLOCK, data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Propagate(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixGroupPnorm(), and kaldi::UnitTestCuMatrixGroupPnorm().

1563  {
1564  int group_size = src.NumCols() / this->NumCols();
1565  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1566  this->NumRows() == src.NumRows());
1567 #if HAVE_CUDA == 1
1568  if (CuDevice::Instantiate().Enabled()) {
1569  CuTimer tim;
1570  if (power == Real(0) || power == Real(1) || power == Real(2)
1571  || power == std::numeric_limits<Real>::infinity()) {
1572  // One thread block per row.
1573  // Use 2D block for small group size to simplify the calculation
1574  // Each group is reduced by threads_per_group threads.
1575  // threads_per_group should be a power of 2 for fast tree reduction.
1576  int threads_per_group = CU1DBLOCK;
1577  while (threads_per_group * 3 / 2 >= group_size) {
1578  threads_per_group >>= 1;
1579  }
1580  if (group_size == 1) {
1581  threads_per_group = 1;
1582  }
1583  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1584  dim3 dimGrid(NumRows());
1585  cuda_group_spec_pnorm(dimGrid, dimBlock, this->data_, src.data_,
1586  this->Dim(), src.Stride(), group_size, power);
1587  } else {
1588  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1589  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
1590  n_blocks(NumRows(), CU2DBLOCK));
1591  cuda_group_pnorm(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1592  src.Stride(), group_size, power);
1593  }
1594  CU_SAFE_CALL(cudaGetLastError());
1595  CuDevice::Instantiate().AccuProfile(__func__, tim);
1596  } else
1597 #endif
1598  {
1599  Mat().GroupPnorm(src.Mat(), power);
1600  }
1601 }
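A standalone sketch of the group p-norm for finite p > 0 follows. It is an illustration only: Matrix and GroupPnormSketch are hypothetical names, and taking the absolute value of each element before raising it to the power is an assumption based on the usual definition of a p-norm; the special cases that the CUDA branch routes to a separate kernel (0, 1, 2, infinity) are not treated specially here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// out(r, i) = (sum over group i of |src(r, j)|^p)^(1/p), for finite p > 0.
// The absolute value is an assumption (standard p-norm definition).
Matrix GroupPnormSketch(const Matrix &src, std::size_t num_out_cols, double p) {
  assert(num_out_cols > 0 && p > 0.0);
  Matrix out(src.size(), std::vector<double>(num_out_cols, 0.0));
  for (std::size_t r = 0; r < src.size(); ++r) {
    assert(src[r].size() % num_out_cols == 0);
    const std::size_t group_size = src[r].size() / num_out_cols;
    for (std::size_t i = 0; i < num_out_cols; ++i) {
      double sum = 0.0;
      for (std::size_t j = i * group_size; j < (i + 1) * group_size; ++j)
        sum += std::pow(std::fabs(src[r][j]), p);
      out[r][i] = std::pow(sum, 1.0 / p);
    }
  }
  return out;
}
```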
void Heaviside ( const CuMatrixBase< Real > &  src)

Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x==0.]

Definition at line 2463 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by RectifiedLinearComponent::Backprop(), CuRand< Real >::BinarizeProbs(), kaldi::CuCompressedMatrixTestSign(), Dropout::PropagateFnc(), RectifiedLinearComponent::StoreStats(), and kaldi::UnitTestCuMatrixHeaviside().

2463  {
2464  KALDI_ASSERT(SameDim(*this, src));
2465 #if HAVE_CUDA == 1
2466  if (CuDevice::Instantiate().Enabled()) {
2467  CuTimer tim;
2468  dim3 dimGrid, dimBlock;
2469  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2470  &dimGrid, &dimBlock);
2471  cuda_heaviside(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
2472  src.Stride());
2473  CU_SAFE_CALL(cudaGetLastError());
2474 
2475  CuDevice::Instantiate().AccuProfile(__func__, tim);
2476  } else
2477  #endif
2478  {
2479  Mat().Heaviside(src.Mat());
2480  }
2481 }
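The definition above, (x > 0 ? 1.0 : 0.0), maps zero to 0.0. A standalone sketch with hypothetical names (Matrix, HeavisideSketch) makes that convention explicit:

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// out(r, c) = 1.0 if src(r, c) > 0, else 0.0 (so x == 0 maps to 0.0).
Matrix HeavisideSketch(const Matrix &src) {
  Matrix out(src.size());
  for (std::size_t r = 0; r < src.size(); ++r) {
    out[r].resize(src[r].size());
    for (std::size_t c = 0; c < src[r].size(); ++c)
      out[r][c] = (src[r][c] > 0.0) ? 1.0 : 0.0;
  }
  return out;
}
```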
void InvertElements ( )

Invert the matrix elementwise: each element x is replaced by 1/x.

Definition at line 919 of file cu-matrix.cc.

References data_.

Referenced by kaldi::TestCuMatrixCompObjfAndDeriv(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixInvertElements(), and kaldi::UnitTestCuMatrixObjfDeriv().

919  {
920 #if HAVE_CUDA == 1
921  if (CuDevice::Instantiate().Enabled()) {
922  CuTimer tim;
923 
924  dim3 dimGrid, dimBlock;
925  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
926  &dimGrid, &dimBlock);
927 
928  cuda_invert_elements(dimGrid, dimBlock, data_, Dim());
929  CU_SAFE_CALL(cudaGetLastError());
930 
931  CuDevice::Instantiate().AccuProfile(__func__, tim);
932  } else
933 #endif
934  {
935  Mat().InvertElements();
936  }
937 }
bool IsUnit ( Real  tol = 0.001) const

Definition at line 595 of file cu-matrix.cc.

References KALDI_ASSERT, kaldi::kTrans, and kaldi::TraceMatMat().

Referenced by OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::UnitTestCuMatrixSymInvertPosDef(), and kaldi::UnitTestCuSpMatrixInvert().

595  {
596  // want to return:
597  //FrobeniusNorm(*this - I) <= tol * NumRows(), i.e.:
598  //sqrt (trace((*this - I)(*this-I)) <= tol * NumRows()
599  // trace((*this - I)(*this - I)) <= tol * NumRows()
600  // trace(*this * *this) + trace(I) - 2 * trace(*this) <= tol * NumRows()
601  // trace(*this * *this) + dim - 2*this.Trace() <= tol * NumRows()
602  KALDI_ASSERT(this->NumRows() == this->NumCols());
603  return (TraceMatMat(*this, *this, kTrans) + this->NumRows() - 2.0 * this->Trace() <=
604  tol * this->NumRows());
605 }
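The quantity in the return statement above, trace(A^T A) + n - 2 * trace(A), equals the squared Frobenius norm of (A - I) for an n-by-n matrix A. As the return statement is written, this squared quantity is compared directly against tol * NumRows(). A standalone sketch with hypothetical names (Matrix, SquaredDistFromIdentity, IsUnitSketch):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// trace(A^T A) + n - 2 * trace(A), i.e. the squared Frobenius norm of (A - I).
double SquaredDistFromIdentity(const Matrix &a) {
  const std::size_t n = a.size();
  double trace_ata = 0.0, trace_a = 0.0;
  for (std::size_t r = 0; r < n; ++r) {
    assert(a[r].size() == n);  // the matrix must be square
    trace_a += a[r][r];
    for (std::size_t c = 0; c < n; ++c) trace_ata += a[r][c] * a[r][c];
  }
  return trace_ata + static_cast<double>(n) - 2.0 * trace_a;
}

// Same comparison as the return statement above: squared distance <= tol * n.
bool IsUnitSketch(const Matrix &a, double tol = 0.001) {
  return SquaredDistFromIdentity(a) <= tol * static_cast<double>(a.size());
}
```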
KALDI_DISALLOW_COPY_AND_ASSIGN ( CuMatrixBase< Real >  )
private
void Lookup ( const std::vector< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3336 of file cu-matrix.cc.

References rnnlm::i, and KALDI_ASSERT.

Referenced by NnetDiscriminativeUpdater::LatticeComputations(), DiscriminativeComputation::LookupNnetOutput(), kaldi::TestCuMatrixLookup(), and kaldi::UnitTestCuMatrixLookup().

3337  {
3338  // Checks the dimension.
3339  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3340  for (int32 i = 0; i < indices.size(); ++i) {
3341  KALDI_ASSERT(indices[i].first < num_rows && indices[i].first >= 0 &&
3342  indices[i].second < num_cols && indices[i].second >= 0);
3343  }
3344  if (indices.size() == 0) return;
3345  KALDI_ASSERT(output != NULL);
3346 
3347 #if HAVE_CUDA == 1
3348  if (CuDevice::Instantiate().Enabled()) {
3349  CuArray<Int32Pair> cuda_indices(indices);
3350  Lookup(cuda_indices, output);
3351  } else
3352 #endif
3353  {
3354  for (int32 i = 0; i < indices.size(); i++) {
3355  output[i] = (*this)(indices[i].first, indices[i].second);
3356  }
3357  }
3358 }
void Lookup ( const CuArrayBase< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3361 of file cu-matrix.cc.

References CuArrayBase< T >::CopyToHost(), CU1DBLOCK, CuArrayBase< T >::Data(), data_, CuArrayBase< T >::Dim(), Int32Pair::first, rnnlm::i, KALDI_ASSERT, and Int32Pair::second.

3362  {
3363  int32 num_elements = indices.Dim();
3364  if (num_elements == 0) return;
3365  KALDI_ASSERT(output != NULL);
3366 
3367 #if HAVE_CUDA == 1
3368  if (CuDevice::Instantiate().Enabled()) {
3369  CuArray<Real> cuda_output(num_elements);
3370  CuTimer tim;
3371  dim3 dimBlock(CU1DBLOCK, 1);
3372  dim3 dimGrid(n_blocks(num_elements, CU1DBLOCK), 1);
3373 
3374  cuda_matrix_lookup(dimGrid, dimBlock, this->data_, this->Dim(),
3375  indices.Data(), num_elements, cuda_output.Data());
3376  CU_SAFE_CALL(cudaGetLastError());
3377 
3378  cuda_output.CopyToHost(output);
3379  CuDevice::Instantiate().AccuProfile(__func__, tim);
3380  } else
3381 #endif
3382  {
3383  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3384  const Int32Pair *index = indices.Data();
3385  for (int32 i = 0; i < num_elements; i++) {
3386  KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
3387  index[i].second < num_cols && index[i].second >= 0);
3388  output[i] = (*this)(index[i].first, index[i].second);
3389  }
3390  }
3391 }
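Both overloads above gather individual elements: output[i] receives the element at (row, column) = (first, second) of the i'th index pair. A standalone sketch follows, where Matrix and LookupSketch are hypothetical names and std::pair<int, int> stands in for Int32Pair.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// Returns output[i] = m(indices[i].first, indices[i].second).
// std::pair<int, int> stands in for Int32Pair (first = row, second = column).
std::vector<double> LookupSketch(
    const Matrix &m, const std::vector<std::pair<int, int>> &indices) {
  std::vector<double> output(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const int r = indices[i].first, c = indices[i].second;
    assert(r >= 0 && static_cast<std::size_t>(r) < m.size());
    assert(c >= 0 && static_cast<std::size_t>(c) < m[r].size());
    output[i] = m[r][c];
  }
  return output;
}
```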
const MatrixBase<Real>& Mat ( ) const
inline

Definition at line 698 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), CuVectorBase< Real >::AddDiagMat2(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuMatrixBase< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuMatrixBase< Real >::AddMatSmat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRows(), CuMatrixBase< Real >::AddSmatMat(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::AddToRows(), CuMatrixBase< Real >::ApplyLogSoftMaxPerRow(), CuMatrixBase< Real >::ApplySoftMaxPerRow(), kaldi::cu::BackpropLstmNonlinearity(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyElements(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), GeneralMatrix::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffParametricRelu(), CuMatrixBase< Real >::DiffSigmoid(), CuMatrixBase< Real >::DiffTanh(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), CuMatrixBase< Real >::Heaviside(), CuMatrixBase< Real >::Max(), CuMatrixBase< Real >::Min(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRows(), CuMatrixBase< Real >::MulRowsGroupMat(), CuMatrixBase< Real >::ParametricRelu(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), CuMatrixBase< Real >::SetMatMatDivMat(), CuMatrixBase< Real >::Sigmoid(), CuMatrixBase< Real >::SoftHinge(), kaldi::cu::Splice(), 
CuMatrixBase< Real >::SymAddMat2(), CuMatrixBase< Real >::Tanh(), kaldi::TraceMatMat(), and kaldi::TraceMatSmat().

698  {
699  return *(reinterpret_cast<const MatrixBase<Real>* >(this));
700  }
MatrixBase<Real>& Mat ( )
inline

Definition at line 701 of file cu-matrix.h.

701  {
702  return *(reinterpret_cast<MatrixBase<Real>* >(this));
703  }
void Max ( const CuMatrixBase< Real > &  A)

Do, elementwise, *this = max(*this, A).

Definition at line 702 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), main(), SpliceMaxComponent::Propagate(), MaxpoolingComponent::Propagate(), kaldi::TestCuMatrixMax(), kaldi::UnitTestCuMatrixMax(), and kaldi::UnitTestCuMatrixReduceMax().

702  {
703  #if HAVE_CUDA == 1
704  if (CuDevice::Instantiate().Enabled()) {
705  CuTimer tim;
706 
707  KALDI_ASSERT(num_cols_ == A.NumCols());
708  KALDI_ASSERT(num_rows_ == A.NumRows());
709 
710  dim3 dimGrid, dimBlock;
711  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
712  &dimGrid, &dimBlock);
713 
714  cuda_max(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
715  CU_SAFE_CALL(cudaGetLastError());
716 
717  CuDevice::Instantiate().AccuProfile(__func__, tim);
718  } else
719  #endif
720  {
721  Mat().Max(A.Mat());
722  }
723 }
Real Max ( ) const

Definition at line 3000 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Max().

3000  {
3001 #if HAVE_CUDA == 1
3002  if (CuDevice::Instantiate().Enabled()) {
3003  KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
3004  CuTimer tim;
3005 
3006  CuVector<Real> col_max(num_rows_, kUndefined);
3007  cuda_max_mat_cols(num_rows_, CU1DBLOCK, col_max.Data(), data_, Dim());
3008  Real ans = col_max.Max();
3009 
3010  CuDevice::Instantiate().AccuProfile(__func__, tim);
3011  return ans;
3012  } else
3013 #endif
3014  {
3015  return Mat().Max();
3016  }
3017 }
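The CUDA branch above reduces in two stages: one maximum per row (stored in a vector of length num_rows_), then the maximum of that vector. A standalone sketch of the same two-stage reduction, with hypothetical names (Matrix, MaxSketch):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in

// Stage 1: maximum of each row. Stage 2: maximum over the per-row maxima.
double MaxSketch(const Matrix &m) {
  assert(!m.empty());
  std::vector<double> row_max(m.size());
  for (std::size_t r = 0; r < m.size(); ++r) {
    assert(!m[r].empty());
    row_max[r] = *std::max_element(m[r].begin(), m[r].end());
  }
  return *std::max_element(row_max.begin(), row_max.end());
}
```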
void Min ( const CuMatrixBase< Real > &  A)

Do, elementwise, *this = min(*this, A).

Definition at line 727 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), main(), kaldi::TestCuMatrixMin(), kaldi::UnitTestCuMatrixMin(), and kaldi::UnitTestCuMatrixReduceMin().

727  {
728  #if HAVE_CUDA == 1
729  if (CuDevice::Instantiate().Enabled()) {
730  CuTimer tim;
731 
732  KALDI_ASSERT(num_cols_ == A.NumCols());
733  KALDI_ASSERT(num_rows_ == A.NumRows());
734 
735  dim3 dimGrid, dimBlock;
736  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
737  &dimGrid, &dimBlock);
738 
739  cuda_min(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
740  CU_SAFE_CALL(cudaGetLastError());
741 
742  CuDevice::Instantiate().AccuProfile(__func__, tim);
743  } else
744  #endif
745  {
746  Mat().Min(A.Mat());
747  }
748 }
Real Min ( ) const

Definition at line 3021 of file cu-matrix.cc.

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Min().

3021  {
3022 #if HAVE_CUDA == 1
3023  if (CuDevice::Instantiate().Enabled()) {
3024  KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
3025  CuTimer tim;
3026 
3027  CuVector<Real> col_min(num_rows_, kUndefined);
3028  cuda_min_mat_cols(num_rows_, CU1DBLOCK, col_min.Data(), data_, Dim());
3029  Real ans = col_min.Min();
3030 
3031  CuDevice::Instantiate().AccuProfile(__func__, tim);
3032  return ans;
3033  } else
3034 #endif
3035  {
3036  return Mat().Min();
3037  }
3038 }
void MulColsVec ( const CuVectorBase< Real > &  scale)

Scale the i'th column by scale[i].

Definition at line 752 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by BatchNormComponent::Backprop(), PerElementScaleComponent::Backprop(), FixedScaleComponent::Backprop(), Convolutional2DComponent::BackpropagateFnc(), Rescale::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), ModelCollapser::GetDiagonallyPreModifiedComponentIndex(), BatchNormComponent::Propagate(), PerElementScaleComponent::Propagate(), FixedScaleComponent::Propagate(), Rescale::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), kaldi::UnitTestCuMatrixAddMatDiagVec(), and kaldi::UnitTestCuMatrixMulColsVec().

void CuMatrixBase<Real>::MulColsVec(const CuVectorBase<Real> &scale) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(scale.Dim() == NumCols());


    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_cols_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());


    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().MulColsVec(scale.Vec());
  }
}
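As a rough illustration of what "scale i'th column by scale[i]" means element by element, here is a minimal CPU sketch on a row-major buffer. It is not the Kaldi kernel. ScaleColumns and its parameters are hypothetical names, and the stride argument only mimics the idea that rows may be padded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Kaldi: multiplies element (r, c) of a
// row-major matrix by scale[c], for every row r and column c.
void ScaleColumns(std::vector<float> &data, std::size_t num_rows,
                  std::size_t num_cols, std::size_t stride,
                  const std::vector<float> &scale) {
  assert(scale.size() == num_cols);  // mirrors scale.Dim() == NumCols()
  assert(stride >= num_cols);
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (std::size_t c = 0; c < num_cols; ++c) {
      data[r * stride + c] *= scale[c];
    }
  }
}
```

In matrix terms this is a right-multiplication by the diagonal matrix built from scale.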
void MulElements ( const CuMatrixBase< Real > &  A)

Multiplies this matrix elementwise by A: C = C .* A, where C is this matrix. A must have the same number of rows and columns as this matrix.

Definition at line 654 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by ElementwiseProductComponent::Backprop(), BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), MaxpoolingComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), Dropout::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), kaldi::nnet1::ComputeStdDev(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), ElementwiseProductComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixMulElements(), kaldi::nnet1::UnitTestLengthNorm(), ParametricRelu::Update(), AffineTransform::Update(), FramePoolingComponent::Update(), ConvolutionalComponent::Update(), Rescale::Update(), and NaturalGradientPerElementScaleComponent::Update().

void CuMatrixBase<Real>::MulElements(const CuMatrixBase<Real> &A) {
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
  #endif
  {
    Mat().MulElements(A.Mat());
  }
}
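The kernel call above receives A.Stride() separately from Dim(), which suggests the two matrices may have different row strides even though their dimensions match. A minimal CPU sketch of an elementwise product that handles that case could look like the following. MulElementsStrided is a hypothetical name, and the flat std::vector layout is an assumption for illustration, not Kaldi's storage class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Kaldi: c(r, col) *= a(r, col) for two
// row-major matrices of equal dimensions but possibly different strides.
void MulElementsStrided(std::vector<float> &c, std::size_t c_stride,
                        const std::vector<float> &a, std::size_t a_stride,
                        std::size_t num_rows, std::size_t num_cols) {
  assert(c_stride >= num_cols && a_stride >= num_cols);
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (std::size_t col = 0; col < num_cols; ++col) {
      // Each matrix is indexed with its own stride.
      c[r * c_stride + col] *= a[r * a_stride + col];
    }
  }
}
```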
void MulRows ( const CuMatrixBase< Real > &  src, const CuArrayBase< MatrixIndexT > &  indexes )

Does for each row r, this.Row(r) *= src.row(indexes[r]), where '*=' is elementwise multiplication.

If indexes[r] < 0, row r is left unchanged. src.NumCols() must equal this.NumCols(), and indexes.Dim() must equal this.NumRows().

Definition at line 2757 of file cu-matrix.cc.

References CuArrayBase< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArrayBase< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by GeneralDropoutComponent::Backprop(), and GeneralDropoutComponent::Propagate().

void CuMatrixBase<Real>::MulRows(const CuMatrixBase<Real> &src,
                                 const CuArrayBase<MatrixIndexT> &indexes) {
  if (NumRows() == 0) return;
  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(src.NumCols() == NumCols());
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_mul_rows(dimGrid, dimBlock,
                  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixBase<Real> &this_mat(Mat());
    const MatrixBase<Real> &src_mat(src.Mat());
    int32 num_rows = NumRows();
    const MatrixIndexT *index_ptr = indexes.Data();
    for (int32 r = 0; r < num_rows; r++) {
      int32 src_r = index_ptr[r];
      if (src_r < 0)
        continue;
      SubVector<Real> this_row(this_mat, r),
          src_row(src_mat, src_r);
      this_row.MulElements(src_row);
    }
  }
}
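The CPU branch above shows the indexing rule directly: row r is multiplied by the source row named in indexes[r], and a negative index skips the row. A self-contained sketch of the same rule on plain buffers follows. MulRowsIndexed is a hypothetical name, and the contiguous storage without stride padding is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Kaldi: for each row r of dst, multiplies
// it elementwise by row indexes[r] of src; negative indexes skip the row.
void MulRowsIndexed(std::vector<float> &dst, const std::vector<float> &src,
                    const std::vector<std::int32_t> &indexes,
                    std::size_t num_cols) {
  assert(num_cols > 0 && dst.size() == indexes.size() * num_cols);
  for (std::size_t r = 0; r < indexes.size(); ++r) {
    if (indexes[r] < 0) continue;  // leave this row unchanged
    const std::size_t src_r = static_cast<std::size_t>(indexes[r]);
    assert((src_r + 1) * num_cols <= src.size());
    for (std::size_t c = 0; c < num_cols; ++c) {
      dst[r * num_cols + c] *= src[src_r * num_cols + c];
    }
  }
}
```

The same source row may appear several times in indexes, which lets a small matrix of per-row factors be applied to a larger one.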
void MulRowsGroupMat ( const CuMatrixBase< Real > &  src)

Divides each row into src.NumCols() groups of NumCols() / src.NumCols() consecutive elements, then scales the j'th group of the i'th row by src[i, j].

Definition at line 803 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), and kaldi::UnitTestCuMatrixMulRowsGroupMat().

void CuMatrixBase<Real>::MulRowsGroupMat(const CuMatrixBase<Real> &src) {
  KALDI_ASSERT(src.NumCols() > 0);
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    int group_size = this->NumCols() / src.NumCols();

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_group_mat(dimGrid, dimBlock, this->data_, src.data_,
                            this->Dim(), src.Stride(), group_size);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().MulRowsGroupMat(src.Mat());
  }
}
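To make the grouping concrete, the sketch below applies the rule described above on plain buffers: column j of the large matrix belongs to group j / group_size, and that group's factor comes from the smaller matrix. MulRowsGroup is a hypothetical name, and the sketch assumes the column count divides evenly into groups, which the listing above does not itself assert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Kaldi: scales the g'th group of columns
// in row i of `data` by src(i, g). Both matrices are contiguous, row-major.
void MulRowsGroup(std::vector<float> &data, std::size_t num_rows,
                  std::size_t num_cols, const std::vector<float> &src,
                  std::size_t src_cols) {
  // Assumption of this sketch: the groups tile each row exactly.
  assert(src_cols > 0 && num_cols % src_cols == 0);
  assert(data.size() == num_rows * num_cols);
  assert(src.size() == num_rows * src_cols);
  const std::size_t group_size = num_cols / src_cols;
  for (std::size_t i = 0; i < num_rows; ++i) {
    for (std::size_t j = 0; j < num_cols; ++j) {
      data[i * num_cols + j] *= src[i * src_cols + j / group_size];
    }
  }
}
```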
void MulRowsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th row of this matrix by scale[i]. scale.Dim() must equal NumRows().

Definition at line 779 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by BackpropTruncationComponent::Backprop(), ClipGradientComponent::Backprop(), BlockSoftmax::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), Xent::Eval(), Mse::Eval(), kaldi::nnet2::PreconditionDirections(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), LengthNormComponent::PropagateFnc(), OnlinePreconditioner::ReorthogonalizeXt1(), OnlineNaturalGradient::ReorthogonalizeXt1(), kaldi::UnitTestCuMatrixMulRowsVec(), AffineTransform::Update(), ConvolutionalComponent::Update(), and kaldi::nnet3::time_height_convolution::ZeroBlankRows().

void CuMatrixBase<Real>::MulRowsVec(const CuVectorBase<Real> &scale) {
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    KALDI_ASSERT(scale.Dim() == NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());


    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
  #endif
  {
    Mat().MulRowsVec(scale.Vec());
  }
}
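Row scaling is the mirror image of column scaling: element (r, c) is multiplied by scale[r]. One plausible use, shown here purely as an illustration, is a 0/1 mask that zeroes selected rows while leaving the others intact. ScaleRows is a hypothetical name, and the contiguous row-major layout is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Kaldi: multiplies every element of row r
// of a contiguous row-major matrix by scale[r].
void ScaleRows(std::vector<float> &data, std::size_t num_cols,
               const std::vector<float> &scale) {
  // One scale factor per row, mirroring scale.Dim() == NumRows().
  assert(data.size() == scale.size() * num_cols);
  for (std::size_t r = 0; r < scale.size(); ++r) {
    for (std::size_t c = 0; c < num_cols; ++c) {
      data[r * num_cols + c] *= scale[r];
    }
  }
}
```

In matrix terms this is a left-multiplication by the diagonal matrix built from scale.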