CuMatrixBase< Real > Class Template Reference

Matrix for CUDA computing. More...

#include <matrix-common.h>

Public Member Functions

void CopyCols (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Copies column r from column indexes[r] of src. More...
 
void AddCols (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indices)
 Add column indices[r] of src to column r. More...
 
void CopyRows (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Copies row r from row indexes[r] of src. More...
 
void CopyRows (const CuArrayBase< const Real *> &src)
 Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise). More...
 
void CopyToRows (const CuArrayBase< Real *> &dst) const
 For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. More...
 
void AddRows (Real alpha, const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Does for each row r, this.Row(r) += alpha * src.row(indexes[r]). More...
 
void MulRows (const CuMatrixBase< Real > &src, const CuArrayBase< MatrixIndexT > &indexes)
 Does for each row r, this.Row(r) *= src.row(indexes[r]), where '*=' is elementwise multiplication. More...
 
void AddRows (Real alpha, const CuArrayBase< const Real *> &src)
 Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols(). More...
 
void AddToRows (Real alpha, const CuArrayBase< MatrixIndexT > &indexes, CuMatrixBase< Real > *dst) const
 For each row i of *this, adds alpha * this->Row(i) to dst->Row(indexes(i)) if indexes(i) >= 0, else does nothing. More...
 
void AddToRows (Real alpha, const CuArrayBase< Real *> &dst) const
 For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. More...
 
void SumColumnRanges (const CuMatrixBase< Real > &src, const CuArrayBase< Int32Pair > &indexes)
 For each row r of this and for each column c, sets (*this)(r, c) to the sum over j of src(r, j), where j ranges from indexes[c].first through indexes[c].second - 1. More...
 
void AddRowRanges (const CuMatrixBase< Real > &src, const CuArrayBase< Int32Pair > &indexes)
 For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1. More...
 
void AddToDiag (Real value)
 Adds "value" to the diagonal elements of the matrix. More...
 
MatrixIndexT NumRows () const
 Dimensions. More...
 
MatrixIndexT NumCols () const
 
MatrixIndexT Stride () const
 
::MatrixDim Dim () const
 
Real FrobeniusNorm () const
 
bool IsUnit (Real tol=0.001) const
 
bool ApproxEqual (const CuMatrixBase< Real > &other, float tol=0.01) const
 True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm() More...
 
MatrixIndexT SizeInBytes () const
 Get size of matrix in bytes. More...
 
template<typename OtherReal >
void CopyFromMat (const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromGeneralMat (const GeneralMatrix &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromMat (const MatrixBase< Real > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromSp (const CuSpMatrix< Real > &M)
 
template<typename OtherReal >
void CopyFromTp (const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
void CopyRangeFromMatClamped (const CuMatrixBase< Real > &src, int32_t start_range, int32_t end_range, int32_t clamp_low, int32_t clamp_high)
 
template<typename OtherReal >
void CopyFromMat (const CuMatrixBase< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyToMat (MatrixBase< OtherReal > *dst, MatrixTransposeType trans=kNoTrans) const
 
void CopyRowsFromVec (const CuVectorBase< Real > &v)
 This function has two modes of operation. More...
 
void CopyRowsFromVec (const VectorBase< Real > &v)
 Version of CopyRowsFromVec() that takes a CPU-based vector. More...
 
void CopyColsFromVec (const CuVectorBase< Real > &v)
 Copies vector into matrix, column-by-column. More...
 
void CopyColFromVec (const CuVectorBase< Real > &v, const MatrixIndexT col)
 Copy vector into specific column of matrix. More...
 
void Sigmoid (const CuMatrixBase< Real > &src)
 Set each element to the sigmoid of the corresponding element of "src": element by element, x = 1 / (1 + exp(-x)) More...
 
void Heaviside (const CuMatrixBase< Real > &src)
 Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x==0. More...
 
void Exp (const CuMatrixBase< Real > &src)
 
void Log (const CuMatrixBase< Real > &src)
 
void Pow (const CuMatrixBase< Real > &src, Real power)
 
void PowAbs (const CuMatrixBase< Real > &src, Real power, bool include_sign=false)
 Apply power to the absolute value of each element. More...
 
void Floor (const CuMatrixBase< Real > &src, Real floor_val)
 
void Ceiling (const CuMatrixBase< Real > &src, Real ceiling_val)
 
void ExpLimited (const CuMatrixBase< Real > &src, Real lower_limit, Real upper_limit)
 This is equivalent to running: Floor(src, lower_limit); Ceiling(src, upper_limit); Exp(src) More...
 
void ExpSpecial (const CuMatrixBase< Real > &src)
 For each element x of the matrix, set it to (x < 0 ? exp(x) : x + 1). More...
 
void SoftMaxPerRow (const CuMatrixBase< Real > &src)
 Softmax nonlinearity Y = Softmax(X) : Yij = e^Xij / sum_k(e^Xik), done to each row, with attention to avoiding overflow or underflow. More...
 
void LogSoftMaxPerRow (const CuMatrixBase< Real > &src)
 LogSoftmax nonlinearity Y = LogSoftmax(X) : Yij = Xij - log(sum_k(e^Xik)), done to each row, with attention to avoiding overflow or underflow. More...
 
void SoftHinge (const CuMatrixBase< Real > &src)
 Apply the function y = log(1 + exp(x)), to each element. More...
 
void GroupPnorm (const CuMatrixBase< Real > &src, Real pow)
 Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j ^ pow) ^ (1 / pow), where G = x.NumCols() / y.NumCols() must be an integer. More...
 
void DiffGroupPnorm (const CuMatrixBase< Real > &in_value, const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv, Real power)
 Differentiate backward through the GroupPnorm function. More...
 
void GroupMax (const CuMatrixBase< Real > &src)
 Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer. More...
 
void GroupMaxDeriv (const CuMatrixBase< Real > &input, const CuMatrixBase< Real > &output)
 Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. More...
 
void ParametricRelu (const CuMatrixBase< Real > &src, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Compute the parametric rectified linear unit function; element by element, *this = src * (src > 0 ? alpha : beta) More...
 
void DiffParametricRelu (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Differentiate backward through the parametric relu function. More...
 
void Tanh (const CuMatrixBase< Real > &src)
 Compute the hyperbolic tangent (tanh) function; element by element, *this = tanh(src). More...
 
void DiffSigmoid (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the sigmoid function. More...
 
void DiffTanh (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the tanh function. More...
 
void DiffSoftmaxPerRow (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the softmax function. More...
 
void DiffLogSoftmaxPerRow (const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv)
 Differentiate backward through the log softmax function. More...
 
void DiffXent (const CuArrayBase< int32 > &tgt, CuVector< Real > *log_post_tgt)
 Differentiate the block [softmax+cross-entropy] : dE/da = posterior_mat - target_mat, 'E' is error function, 'a' is activation on softmax input. More...
 
void Cholesky (CuMatrixBase< Real > *inv_cholesky=NULL)
 This function sets *this to the Cholesky factor of *this (i.e. More...
 
void SymInvertPosDef ()
 Inversion for positive definite symmetric matrices. More...
 
void ApplyPow (Real power)
 
void ApplyPowAbs (Real power, bool include_sign=false)
 
void ApplyHeaviside ()
 
void ApplyFloor (Real floor_val)
 
void ApplyCeiling (Real ceiling_val)
 
void ApplyExp ()
 
void ApplyExpLimited (Real lower_limit, Real upper_limit)
 
void ApplyExpSpecial ()
 
void ApplySoftMaxPerRow ()
 
void ApplyLogSoftMaxPerRow ()
 
void ApplyLog ()
 
void FindRowMaxId (CuArray< int32 > *id) const
 Find the id of the maximal element for each row (resizes the 'id' array to the appropriate size). More...
 
void SetZero ()
 Math operations, some calling kernels. More...
 
void Set (Real value)
 
void Add (Real value)
 
void SetZeroAboveDiag ()
 Zeroes all elements for which col > row. More...
 
void Scale (Real value)
 
void MulElements (const CuMatrixBase< Real > &A)
 Multiply two matrices elementwise: C = C .* A. More...
 
void DivElements (const CuMatrixBase< Real > &A)
 Divide two matrices elementwise: C = C ./ A. More...
 
void Max (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = max(*this, A). More...
 
void Min (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = min(*this, A). More...
 
void MulColsVec (const CuVectorBase< Real > &scale)
 scale i'th column by scale[i] More...
 
void MulRowsVec (const CuVectorBase< Real > &scale)
 scale i'th row by scale[i] More...
 
void MulRowsGroupMat (const CuMatrixBase< Real > &src)
 divide each row into src.NumCols() groups, and then scale i'th row's jth group of elements by src[i, j]. More...
 
void DivRowsVec (const CuVectorBase< Real > &div)
 divide i'th row by div[i] More...
 
void InvertElements ()
 invert the matrix by elements. More...
 
void AddMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A More...
 
void AddSmat (Real alpha, const CuSparseMatrix< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A. More...
 
void AddSmatMat (Real alpha, const CuSparseMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, Real beta)
 (*this) = alpha * op(A) * B + beta * (*this), where A is sparse. More...
 
void AddMatSmat (Real alpha, const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 (*this) = alpha * A * op(B) + beta * (*this), where B is sparse and op(B) is either B or trans(B) depending on the 'transB' argument. More...
 
void AddToElements (Real alpha, const CuArrayBase< int32 > &elements)
 This is a rather special purpose function; we might generalize it later by adding a transpose-type option. More...
 
void AddMatBlocks (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 This function is like AddMat (it does *this += alpha * src), except that it supports cases where *this and src have different dimension. More...
 
void AddVecToCols (Real alpha, const CuVectorBase< Real > &col, Real beta=1.0)
 (for each column c of *this), c = alpha * col + beta * c More...
 
void AddVecToRows (Real alpha, const CuVectorBase< Real > &row, Real beta=1.0)
 (for each row r of *this), r = alpha * row + beta * r More...
 
void AddMatMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, Real beta)
 C = alpha * A(^T)*B(^T) + beta * C. More...
 
void AddVecVec (Real alpha, const CuVectorBase< Real > &x, const CuVectorBase< Real > &y)
 A = alpha * x * y^T + A . More...
 
void SetMatMatDivMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const CuMatrixBase< Real > &C)
 *this = A .* B ./ C, element by element; where an element of C is 0, the corresponding element of *this is set to that of A. *this may safely alias A, B or C. More...
 
void SymAddMat2 (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transA, Real beta)
 *this = beta * *this + alpha * M M^T, for symmetric matrices. More...
 
void AddMatBlock (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuBlockMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 This function is like AddMatMat, but the second matrix argument is of type CuBlockMatrix (a block-diagonal matrix). More...
 
void AddDiagVecMat (const Real alpha, const CuVectorBase< Real > &v, const CuMatrixBase< Real > &M, MatrixTransposeType transM, Real beta=1.0)
 *this = beta * *this + alpha * diag(v) * M [or M^T]. More...
 
void AddMatDiagVec (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transM, CuVectorBase< Real > &v, Real beta=1.0)
 
void AddMatMatElements (const Real alpha, const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const Real beta)
 *this = beta * *this + alpha * A .* B (.* element by element multiplication) More...
 
void AddMatSp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuSpMatrix< Real > &B, const Real beta)
 this <– beta*this + alpha*A*B More...
 
void AddSpMat (const Real alpha, const CuSpMatrix< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <– beta*this + alpha*SpA*B More...
 
void AddTpMat (const Real alpha, const CuTpMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <– beta*this + alpha*A*B. More...
 
void AddMatTp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuTpMatrix< Real > &B, MatrixTransposeType transB, const Real beta)
 this <– beta*this + alpha*A*B. More...
 
void CopyFromBlock (const CuBlockMatrix< Real > &B, MatrixTransposeType trans=kNoTrans)
 
void CopyLowerToUpper ()
 
void CopyUpperToLower ()
 
CuSubMatrix< Real > Range (const MatrixIndexT row_offset, const MatrixIndexT num_rows, const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
CuSubMatrix< Real > RowRange (const MatrixIndexT row_offset, const MatrixIndexT num_rows) const
 
CuSubMatrix< Real > ColRange (const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
const CuSubVector< Real > Row (MatrixIndexT i) const
 
CuSubVector< Real > Row (MatrixIndexT i)
 
CuValue< Real > operator() (MatrixIndexT r, MatrixIndexT c)
 
Real operator() (MatrixIndexT r, MatrixIndexT c) const
 
Real Sum () const
 
Real Max () const
 
Real Min () const
 
Real Trace (bool check_square=true) const
 Return the trace. If check_square = true, will crash if matrix is not square. More...
 
void SetRandn ()
 
void SetRandUniform ()
 
void Write (std::ostream &os, bool binary) const
 
void AddElements (Real alpha, const std::vector< MatrixElement< Real > > &input)
 
void AddElements (Real alpha, const CuArrayBase< Int32Pair > &indexes, const Real *input)
 
void Lookup (const std::vector< Int32Pair > &indexes, Real *output) const
 
void Lookup (const CuArrayBase< Int32Pair > &indexes, Real *output) const
 
void EqualElementMask (const CuMatrixBase< Real > &mat, CuMatrix< Real > *mask) const
 
const Real * RowData (MatrixIndexT r) const
 Get raw row pointer (const). More...
 
Real * RowData (MatrixIndexT r)
 Get raw row pointer. More...
 
const Real * Data () const
 Return data pointer (const). More...
 
Real * Data ()
 Return data pointer. More...
 
const MatrixBase< Real > & Mat () const
 
MatrixBase< Real > & Mat ()
 

Protected Member Functions

 CuMatrixBase ()
 
 CuMatrixBase (Real *data, MatrixIndexT num_rows, MatrixIndexT num_cols, MatrixIndexT stride)
 This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix. More...
 

Protected Attributes

Real * data_
 GPU data pointer (or regular matrix data pointer,. More...
 
MatrixIndexT num_cols_
 
MatrixIndexT num_rows_
 
MatrixIndexT stride_
 

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (CuMatrixBase)
 

Friends

class CuMatrixBase< float >
 
class CuMatrixBase< double >
 
class CuVectorBase< float >
 
class CuVectorBase< double >
 
class VectorBase< Real >
 
class CuSpMatrix< Real >
 
class CuTpMatrix< float >
 
class CuTpMatrix< double >
 
class CuVectorBase< Real >
 
class CuSubMatrix< Real >
 
class CuRand< Real >
 
class CuSubVector< Real >
 
class CuBlockMatrix< Real >
 
class CuSparseMatrix< float >
 
class CuSparseMatrix< double >
 
class CuSparseMatrix< Real >
 
Real TraceMatMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
 
Real TraceMatSmat (const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType trans)
 
void AddMatMatBatched (const Real alpha, std::vector< CuSubMatrix< Real > * > &C, const std::vector< CuSubMatrix< Real > * > &A, MatrixTransposeType transA, const std::vector< CuSubMatrix< Real > * > &B, MatrixTransposeType transB, const Real beta)
 Does multiple matrix multiplications, executing them in parallel using cuBLAS's gemmBatched if we are using a GPU. More...
 

Detailed Description

template<typename Real>
class kaldi::CuMatrixBase< Real >

Matrix for CUDA computing.

Does the computation on the CUDA card when CUDA is compiled in and we have a suitable GPU (CuDevice::Instantiate().Enabled() == true); otherwise, does it on the CPU.
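
The GPU-or-CPU choice described above appears in every listing on this page as a preprocessor guard around a run-time check. The following is a minimal standalone sketch of that control flow, not Kaldi code: GpuEnabled() is a hypothetical stand-in for CuDevice::Instantiate().Enabled(), and ScaleInPlace() is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CuDevice::Instantiate().Enabled(); this sketch
// always reports that no GPU is available.
inline bool GpuEnabled() { return false; }

// Scales every element in place. The GPU branch exists only when the code is
// compiled with HAVE_CUDA == 1 and is taken only when a device is enabled;
// in every other case the CPU block runs.
inline void ScaleInPlace(std::vector<float> *data, float factor) {
  assert(data != nullptr);
#if HAVE_CUDA == 1
  if (GpuEnabled()) {
    // A real implementation would launch a CUDA kernel here.
  } else
#endif
  {
    for (std::size_t i = 0; i < data->size(); ++i) (*data)[i] *= factor;
  }
}
```

When HAVE_CUDA is undefined, the preprocessor removes both the if and the dangling else, leaving only the CPU block.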

Definition at line 69 of file matrix-common.h.
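
The member summary notes that SoftMaxPerRow() is computed "with attention to avoiding overflow or underflow". One common way to get that property is to subtract each row's maximum before exponentiating. The sketch below illustrates that technique on a plain row-major buffer; whether Kaldi's kernels use exactly this approach is an assumption, and the function name and layout are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise softmax of a contiguous row-major matrix with `cols` columns.
inline std::vector<double> SoftmaxPerRowSketch(const std::vector<double> &x,
                                               std::size_t cols) {
  assert(cols > 0 && x.size() % cols == 0);
  std::vector<double> y(x.size());
  for (std::size_t start = 0; start < x.size(); start += cols) {
    // Subtracting the row maximum keeps every exponent <= 0, so exp() cannot
    // overflow, and the largest term is exactly exp(0) = 1.
    const double row_max =
        *std::max_element(x.begin() + start, x.begin() + start + cols);
    double sum = 0.0;
    for (std::size_t c = 0; c < cols; ++c) {
      y[start + c] = std::exp(x[start + c] - row_max);
      sum += y[start + c];
    }
    for (std::size_t c = 0; c < cols; ++c) y[start + c] /= sum;
  }
  return y;
}
```

Because the shift cancels in the ratio, the result equals e^Xij / sum_k(e^Xik) mathematically, while inputs as large as 1000 no longer produce infinities.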

Constructor & Destructor Documentation

◆ CuMatrixBase() [1/2]

CuMatrixBase ( )
inline protected

Definition at line 767 of file cu-matrix.h.

CuMatrixBase(): data_(NULL), num_cols_(0), num_rows_(0), stride_(0) { }

◆ CuMatrixBase() [2/2]

CuMatrixBase ( Real *  data,
MatrixIndexT  num_rows,
MatrixIndexT  num_cols,
MatrixIndexT  stride 
)
inline protected

This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.

Definition at line 771 of file cu-matrix.h.

CuMatrixBase(Real *data,
             MatrixIndexT num_rows,
             MatrixIndexT num_cols,
             MatrixIndexT stride):
    data_(data), num_cols_(num_cols), num_rows_(num_rows), stride_(stride) { }
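
A pointer, a row count, a column count and a stride are enough to describe a sub-block that shares memory with a larger matrix, which is presumably why the CuSubMatrix constructor calls this one. The sketch below shows the idea with a hypothetical MatrixView type; it is not the Kaldi class, and all names in it are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view holding the same four quantities as the
// protected constructor: data pointer, rows, columns and stride.
struct MatrixView {
  float *data;
  int num_rows;
  int num_cols;
  int stride;  // elements between the starts of two consecutive rows

  float &At(int r, int c) const {
    assert(r >= 0 && r < num_rows && c >= 0 && c < num_cols);
    return data[static_cast<std::size_t>(r) * stride + c];
  }

  // A sub-block keeps the parent's stride and only shifts the pointer, so it
  // aliases the parent's memory instead of copying it.
  MatrixView Range(int row_offset, int rows, int col_offset, int cols) const {
    assert(row_offset >= 0 && rows >= 0 && row_offset + rows <= num_rows);
    assert(col_offset >= 0 && cols >= 0 && col_offset + cols <= num_cols);
    return MatrixView{
        data + static_cast<std::size_t>(row_offset) * stride + col_offset,
        rows, cols, stride};
  }
};
```

Writing through the sub-view changes the parent, which is the behaviour a stride-based view is meant to provide.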

Member Function Documentation

◆ Add()

void Add ( Real  value)

Definition at line 582 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), LstmNonlinearityComponent::ConsolidateMemory(), kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), GeneralDropoutComponent::GetMemo(), main(), kaldi::MeanVariance(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), ClipGradientComponent::RepairGradients(), TanhComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAdd(), kaldi::UnitTestCuMatrixAdd2(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixSetRandUniform(), and kaldi::UnitTestCuMatrixTraceMatMat().

template<typename Real>
void CuMatrixBase<Real>::Add(Real value) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    CuTimer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_add(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().Add(value);
  }
}
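
Because a matrix carries a stride as well as a column count, an elementwise operation such as Add() has to walk rows by stride while touching only the first NumCols() entries of each row. The following CPU sketch shows that indexing on a plain buffer. It assumes the padding between num_cols and stride is left untouched, and AddScalarStrided() is an illustrative name, not a Kaldi function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds `value` to the num_rows x num_cols logical elements of a row-major
// buffer whose rows start `stride` elements apart (stride >= num_cols).
// Elements in the padding region [num_cols, stride) are not modified.
inline void AddScalarStrided(std::vector<float> *data, int num_rows,
                             int num_cols, int stride, float value) {
  assert(data != nullptr && num_rows >= 0 && num_cols >= 0);
  assert(stride >= num_cols);
  assert(data->size() >= static_cast<std::size_t>(num_rows) * stride);
  for (int r = 0; r < num_rows; ++r)
    for (int c = 0; c < num_cols; ++c)
      (*data)[static_cast<std::size_t>(r) * stride + c] += value;
}
```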

◆ AddCols()

void AddCols ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indices 
)

Add column indices[r] of src to column r.

As a special case, if indices[i] == -1, column i is skipped. indices.Dim() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().

Definition at line 2701 of file cu-matrix.cc.

Referenced by Convolutional1dComponent::Backprop(), ConvolutionalComponent::BackpropagateFnc(), ConvolutionComponent::InderivPatchesToInderiv(), and MaxpoolingComponent::InderivPatchesToInderiv().

template<typename Real>
void CuMatrixBase<Real>::AddCols(const CuMatrixBase<Real> &src,
                                 const CuArrayBase<MatrixIndexT> &indices) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(indices.Dim() == NumCols());
    KALDI_ASSERT(NumRows() == src.NumRows());
    CuTimer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
                  Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddCols(src.Mat(), indices.Data());
  }
}
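
The indexing rule for AddCols(), including the -1 special case, can be sketched on the CPU as follows. This is an illustration of the documented semantics only: the nested std::vector storage and the name AddColsSketch() are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // m[row][col], illustrative

// For each column c of dst: dst(:, c) += src(:, indices[c]),
// leaving column c unchanged when indices[c] == -1.
inline void AddColsSketch(Matrix *dst, const Matrix &src,
                          const std::vector<int> &indices) {
  assert(dst != nullptr && dst->size() == src.size());  // same number of rows
  for (std::size_t r = 0; r < dst->size(); ++r) {
    assert(indices.size() == (*dst)[r].size());  // one index per dst column
    for (std::size_t c = 0; c < indices.size(); ++c) {
      const int j = indices[c];
      if (j == -1) continue;  // special case: skip this column
      assert(j >= 0 && static_cast<std::size_t>(j) < src[r].size());
      (*dst)[r][c] += src[r][static_cast<std::size_t>(j)];
    }
  }
}
```

Note that src may have a different number of columns from *this; only the row counts must match.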

◆ AddDiagVecMat()

void AddDiagVecMat ( const Real  alpha,
const CuVectorBase< Real > &  v,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
Real  beta = 1.0 
)

*this = beta * *this + alpha * diag(v) * M [or M^T].

The same as adding M but scaling each row M_i by v(i).

Definition at line 1382 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), kaldi::nnet3::attention::ApplyScalesToInput(), kaldi::nnet3::attention::ApplyScalesToOutput(), HiddenSoftmax::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< float >::DiffSoftmaxPerRow(), MultiBasisComponent::PropagateFnc(), and kaldi::TestCuMatrixAddDiagVecMat().

template<typename Real>
void CuMatrixBase<Real>::AddDiagVecMat(
    const Real alpha, const CuVectorBase<Real> &v,
    const CuMatrixBase<Real> &M, MatrixTransposeType transM,
    Real beta) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transM == kNoTrans) {
      KALDI_ASSERT(SameDim(*this, M));
    } else {
      KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
    }
    KALDI_ASSERT(v.Dim() == this->NumRows());

    CuTimer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
    if (transM == kTrans)
      std::swap(M_row_stride, M_col_stride);
    cuda_add_diag_vec_mat(dimGrid, dimBlock, alpha, data_, Dim(),
                          v.Data(), M.Data(), M_row_stride, M_col_stride, beta);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddDiagVecMat(alpha, v.Vec(), M.Mat(), transM, beta);
  }
}
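
The listing handles the transposed case by swapping M's row and column strides instead of materialising M^T. A CPU sketch of the same formula and the same stride trick is given below. It assumes contiguous row-major storage, and AddDiagVecMatSketch() with its parameter list is illustrative rather than part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// out = beta * out + alpha * diag(v) * op(M), where out is rows x cols and
// op(M) is M (transpose == false) or M^T (transpose == true).
// M is stored row-major with m_cols columns per row.
inline void AddDiagVecMatSketch(std::vector<double> *out, int rows, int cols,
                                double alpha, const std::vector<double> &v,
                                const std::vector<double> &M, int m_cols,
                                bool transpose, double beta) {
  assert(out != nullptr);
  assert(out->size() == static_cast<std::size_t>(rows) * cols);
  assert(v.size() == static_cast<std::size_t>(rows));
  assert(M.size() == out->size());
  assert(m_cols == (transpose ? rows : cols));
  // Reading M(c, r) in place of M(r, c) is the same as swapping the strides.
  int row_stride = m_cols, col_stride = 1;
  if (transpose) std::swap(row_stride, col_stride);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      double &o = (*out)[static_cast<std::size_t>(r) * cols + c];
      const double m =
          M[static_cast<std::size_t>(r) * row_stride + c * col_stride];
      o = beta * o + alpha * v[r] * m;  // row r of op(M) is scaled by v[r]
    }
  }
}
```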

◆ AddElements() [1/2]

void AddElements ( Real  alpha,
const std::vector< MatrixElement< Real > > &  input 
)

Definition at line 3277 of file cu-matrix.cc.

Referenced by OnlinePreconditioner::InitOrthonormalSpecial(), OnlineNaturalGradient::InitOrthonormalSpecial(), CuMatrixBase< float >::operator()(), DiscriminativeComputation::ProcessPosteriors(), and kaldi::UnitTestCuMatrixAddElements().

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha,
                                     const std::vector<MatrixElement<Real> > &input) {
  // Checks the dimension.
  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
  for (int32 i = 0; i < input.size(); ++i) {
    KALDI_ASSERT(input[i].row < num_rows && input[i].row >= 0 &&
                 input[i].column < num_cols && input[i].column >= 0);
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    void *addr = CuDevice::Instantiate().Malloc(input.size() * sizeof(MatrixElement<Real>));
    CU_SAFE_CALL(cudaMemcpyAsync(addr, input.data(),
                                 input.size() * sizeof(MatrixElement<Real>),
                                 cudaMemcpyHostToDevice, cudaStreamPerThread));

    CuTimer tim;
    int dimBlock(CU1DBLOCK);
    int dimGrid(n_blocks(input.size(), CU1DBLOCK));

    cuda_matrix_add_elements(dimGrid, dimBlock, this->data_, this->Dim(),
                             alpha, (MatrixElement<Real>*)addr, input.size());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().Free(addr);
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    for (int32 i = 0; i < input.size(); i++) {
      (*this)(input[i].row, input[i].column) += alpha * input[i].weight;
    }
  }
}
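
The CPU branch of the listing amounts to a scatter-add over (row, column, weight) triples. A standalone sketch of that loop follows. Element is a hypothetical stand-in for MatrixElement<Real> that reuses the field names seen in the listing, and the contiguous row-major buffer is an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MatrixElement<Real>.
struct Element {
  int row;
  int column;
  float weight;
};

// Adds alpha * weight to mat(row, column) for every listed element.
// mat is a contiguous row-major buffer of size num_rows * num_cols.
inline void AddElementsSketch(std::vector<float> *mat, int num_rows,
                              int num_cols, float alpha,
                              const std::vector<Element> &input) {
  assert(mat != nullptr);
  assert(mat->size() == static_cast<std::size_t>(num_rows) * num_cols);
  for (const Element &e : input) {
    // Same bounds check as the KALDI_ASSERT at the top of the listing.
    assert(e.row >= 0 && e.row < num_rows);
    assert(e.column >= 0 && e.column < num_cols);
    (*mat)[static_cast<std::size_t>(e.row) * num_cols + e.column] +=
        alpha * e.weight;
  }
}
```

In this sequential form, positions that appear more than once simply accumulate.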

◆ AddElements() [2/2]

void AddElements ( Real  alpha,
const CuArrayBase< Int32Pair > &  indexes,
const Real *  input 
)

Definition at line 3311 of file cu-matrix.cc.

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha, const CuArrayBase<Int32Pair> &indexes,
                                     const Real *input) {
  if (indexes.Dim() == 0) return;
  KALDI_ASSERT(input != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CuVector<Real> tmp_vec(indexes.Dim(), kUndefined);
    CU_SAFE_CALL(cudaMemcpyAsync(tmp_vec.Data(), input,
                                 indexes.Dim() * sizeof(Real),
                                 cudaMemcpyHostToDevice, cudaStreamPerThread));

    int dimBlock(CU1DBLOCK);
    int dimGrid = n_blocks(indexes.Dim(), CU1DBLOCK);
    cuda_matrix_add_indexed_values(dimGrid, dimBlock, this->Dim(), alpha,
                                   indexes.Data(), tmp_vec.Data(), indexes.Dim(), this->data_);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indexes.Data();
    for (int32 i = 0; i < indexes.Dim(); i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      (*this)(index[i].first, index[i].second) += alpha * input[i];
    }
  }
}

◆ AddMat()

void AddMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A

Definition at line 954 of file cu-matrix.cc.

Referenced by RestrictedAttentionComponent::Add(), CuRand< float >::AddGaussNoise(), GeneralMatrix::AddToMat(), CuMatrixBase< float >::ApplyLog(), CuMatrixBase< float >::ApproxEqual(), kaldi::nnet3::attention::AttentionBackward(), kaldi::nnet3::attention::AttentionForward(), SigmoidComponent::Backprop(), Splice::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), LstmNonlinearityComponent::ConsolidateMemory(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), NnetComputer::ExecuteCommand(), AdditiveNoiseComponent::Propagate(), ClipGradientComponent::RepairGradients(), RestrictedAttentionComponent::StoreStats(), kaldi::nnet3::attention::TestAttentionForwardBackward(), NoOpTransform::TrainingBackward(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestLstmNonlinearity(), and kaldi::nnet3::UnitTestNnetInputDerivatives().

template<typename Real>
void CuMatrixBase<Real>::AddMat(Real alpha, const CuMatrixBase<Real> &A,
                                MatrixTransposeType transA) {

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transA == kNoTrans) {
      KALDI_ASSERT(A.NumRows() == num_rows_ && A.NumCols() == num_cols_);
    } else {
      KALDI_ASSERT(A.NumCols() == num_rows_ && A.NumRows() == num_cols_);
    }
    if (num_rows_ == 0) return;
    CuTimer tim;
    // This block dimension seems to work better than the
    // one from GetBlockSizesForSimpleMatrixOperation().
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
                 n_blocks(NumRows(), CU2DBLOCK));
    cuda_add_mat(dimGrid, dimBlock, alpha, A.data_,
                 data_, Dim(), A.Stride(),
                 (transA == kTrans ? 1 : 0));
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    Mat().AddMat(alpha, A.Mat(), transA);
  }
}
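
The listing sizes its launch grid with n_blocks() so that square thread blocks cover the whole matrix, with columns on the x axis and rows on the y axis. The sketch below assumes n_blocks() is a ceiling division, which is the usual meaning of such a helper but is not shown on this page. NumBlocks(), Grid2D and GridFor() are illustrative names, and the block edge is passed in rather than taken from CU2DBLOCK.

```cpp
#include <cassert>

// Assumed behaviour of n_blocks(): the number of blocks of `block_size`
// needed to cover `size` items (ceiling division).
inline int NumBlocks(int size, int block_size) {
  assert(size >= 0 && block_size > 0);
  return (size + block_size - 1) / block_size;
}

struct Grid2D {
  int x;  // blocks along the column direction
  int y;  // blocks along the row direction
};

// Mirrors the argument order in the listing: columns first, then rows.
inline Grid2D GridFor(int num_rows, int num_cols, int block_edge) {
  return Grid2D{NumBlocks(num_cols, block_edge),
                NumBlocks(num_rows, block_edge)};
}
```

When a dimension is not a multiple of the block edge, the last block along that axis is only partly used, so a kernel launched this way has to check its indices against the matrix dimensions.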

◆ AddMatBlock()

void AddMatBlock ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuBlockMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

This function is like AddMatMat, except that the second matrix argument B is of type CuBlockMatrix (a block-diagonal matrix). It computes *this = alpha * op(A) * op(B) + beta * *this, where op() is an optional transpose.
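As a rough illustration of the block-diagonal product, the sketch below walks the diagonal blocks of B and updates the matching column range of the output, much like the CPU branch of the listing. It is not the Kaldi implementation: `Mat` and `AddMatBlockSketch` are hypothetical names, B is passed as a plain list of non-empty blocks, and transposes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

// C = alpha * A * B + beta * C, where B is block-diagonal and is given as
// the list of its diagonal blocks (assumed non-empty; no transposes here).
void AddMatBlockSketch(float alpha, const Mat &A,
                       const std::vector<Mat> &blocks, float beta, Mat *C) {
  std::size_t row_offset = 0, col_offset = 0;  // offsets into B
  for (const Mat &blk : blocks) {
    const std::size_t br = blk.size(), bc = blk[0].size();
    for (std::size_t i = 0; i < C->size(); ++i) {
      for (std::size_t j = 0; j < bc; ++j) {
        float sum = 0.0f;
        // Only rows [row_offset, row_offset + br) of B are nonzero
        // in this column range, so only those columns of A matter.
        for (std::size_t k = 0; k < br; ++k)
          sum += A[i][row_offset + k] * blk[k][j];
        float &c = (*C)[i][col_offset + j];
        c = alpha * sum + beta * c;
      }
    }
    row_offset += br;
    col_offset += bc;
  }
}
```

Each block touches a disjoint set of output columns, so the blocks can be processed independently.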

Definition at line 3205 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and kaldi::UnitTestCuBlockMatrixAddMatBlock().

3209  {
3210  // Check dimensions
3211  int32 A_num_rows = A.NumRows(), A_num_cols = A.NumCols(),
3212  A_row_stride = A.Stride(), A_col_stride = 1,
3213  B_num_rows = B.NumRows(), B_num_cols = B.NumCols();
3214  if (transA == kTrans) {
3215  std::swap(A_num_rows, A_num_cols);
3216  std::swap(A_row_stride, A_col_stride);
3217  }
3218  if (transB == kTrans) {
3219  std::swap(B_num_rows, B_num_cols);
3220  }
3221  // At this point the {A,B}_{rows,cols} variables are
3222  // after any transposition.
3223  KALDI_ASSERT(NumRows() == A_num_rows && NumCols() == B_num_cols);
3224  KALDI_ASSERT(A_num_cols == B_num_rows);
3225  int32 B_num_blocks = B.NumBlocks();
3226 
3227  if (num_rows_ == 0) return;
3228 #if HAVE_CUDA == 1
3229  if (CuDevice::Instantiate().Enabled()) {
3230  CuTimer tim;
3231  MatrixDim this_dim = Dim();
3232 
3233  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
3234  // (x,y) indices will be (row of *this, block of B)
3235  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
3236  n_blocks(B_num_blocks, CU2DBLOCK));
3237 
3238  // caution: the use of x as the row-index is not good, but
3239  // this code is not much used, so I'm not updating it.
3240  cuda_add_mat_blockmat(dimGrid, dimBlock, data_, this_dim, A.Data(),
3241  A_num_rows, A_num_cols, A_row_stride, A_col_stride,
3242  B.CuData(), B_num_blocks, alpha, beta,
3243  (transB == kTrans ? 1 : 0));
3244 
3245  CU_SAFE_CALL(cudaGetLastError());
3246 
3247  CuDevice::Instantiate().AccuProfile(__func__, tim);
3248  } else
3249 #endif
3250  {
3251  // "row_offset" and "col_offset" are offsets into B (or into B^T, if
3252  // transB == kTrans).
3253  int32 row_offset = 0, col_offset = 0;
3254  for (int32 b = 0; b < B_num_blocks; b++) {
3255  const CuSubMatrix<Real> this_block = B.Block(b);
3256  int32 this_num_rows = this_block.NumRows(),
3257  this_num_cols = this_block.NumCols();
3258  if (transB == kTrans) std::swap(this_num_rows, this_num_cols);
3259  CuSubMatrix<Real> this_part(*this, 0, num_rows_,
3260  col_offset, this_num_cols);
3261  CuSubMatrix<Real> A_part = (transA == kNoTrans ?
3262  CuSubMatrix<Real>(A, 0, num_rows_,
3263  row_offset, this_num_rows) :
3264  CuSubMatrix<Real>(A, row_offset, this_num_rows,
3265  0, num_rows_));
3266  this_part.AddMatMat(alpha, A_part, transA, this_block, transB, beta);
3267  row_offset += this_num_rows;
3268  col_offset += this_num_cols;
3269  }
3270  // Note: the values being compared below are all after applying any
3271  // transposition to B.
3272  KALDI_ASSERT(row_offset == B_num_rows && col_offset == B_num_cols);
3273  }
3274 }

◆ AddMatBlocks()

void AddMatBlocks ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

This function is like AddMat (it does *this += alpha * src), except that it supports cases where *this and src have different dimensions.

There are two allowed cases:

(1) *this is larger than src; we do a broadcasting operation. *this must have NumRows() == a * src.NumRows() and NumCols() == b * src.NumCols() for integers a >= 1, b >= 1. *this will be treated as being made up of blocks with the same size as src, and to each block we'll add alpha * src. This case does not support trans == kTrans.

(2) *this is smaller than src; we sum. src.NumRows() must equal a * this->NumRows(), and src.NumCols() must equal b * this->NumCols(), for integers a >= 1, b >= 1. In this case, src will be treated as being made up of blocks with the same size as *this, and to *this we will add alpha times the sum of all of those blocks.
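The two cases can be sketched on plain nested vectors as follows. This is a minimal illustration under stated assumptions, not the Kaldi code: `Mat` and `AddMatBlocksSketch` are hypothetical names, and only the non-transposed form is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

// Sketch of the AddMatBlocks semantics for trans == kNoTrans only.
void AddMatBlocksSketch(float alpha, const Mat &src, Mat *dst) {
  if (dst->empty() || src.empty()) return;
  const std::size_t dr = dst->size(), dc = (*dst)[0].size();
  const std::size_t sr = src.size(), sc = src[0].size();
  if (sr >= dr && sc >= dc) {
    // Case (2): src is tiled by blocks of dst's size; sum every block in.
    assert(sr % dr == 0 && sc % dc == 0);
    for (std::size_t i = 0; i < sr; ++i)
      for (std::size_t j = 0; j < sc; ++j)
        (*dst)[i % dr][j % dc] += alpha * src[i][j];
  } else {
    // Case (1): dst is tiled by blocks of src's size; broadcast src.
    assert(dr % sr == 0 && dc % sc == 0);
    for (std::size_t i = 0; i < dr; ++i)
      for (std::size_t j = 0; j < dc; ++j)
        (*dst)[i][j] += alpha * src[i % sr][j % sc];
  }
}
```

When both matrices have the same size the two cases coincide and the call reduces to an ordinary AddMat.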

Definition at line 1119 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), SumBlockComponent::Backprop(), SumBlockComponent::Propagate(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

1120  {
1121  if (num_rows_ == 0 || num_cols_ == 0) return;
1122 
1123  if (A.NumRows() >= (transA == kNoTrans ? num_rows_ : num_cols_) &&
1124  A.NumCols() >= (transA == kNoTrans ? num_cols_ : num_rows_)) {
1125  // This is the "summing", not broadcasting, version of AddMatBlocks.
1126  // It supports both regular and transposed operation.
1127  int32 num_row_blocks, num_col_blocks;
1128  if (transA == kNoTrans) {
1129  KALDI_ASSERT(A.NumRows() % num_rows_ == 0 && A.NumCols() % num_cols_ == 0);
1130  num_row_blocks = A.Mat().NumRows() / num_rows_;
1131  num_col_blocks = A.Mat().NumCols() / num_cols_;
1132  } else {
1133  KALDI_ASSERT(A.NumRows() % num_cols_ == 0 && A.NumCols() % num_rows_ == 0);
1134  num_row_blocks = A.Mat().NumRows() / num_cols_;
1135  num_col_blocks = A.Mat().NumCols() / num_rows_;
1136  }
1137 #if HAVE_CUDA == 1
1138  if (CuDevice::Instantiate().Enabled()) {
1139  CuTimer tim;
1140  dim3 dimGrid, dimBlock;
1141  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1142  &dimGrid, &dimBlock);
1143  cuda_add_mat_blocks(dimGrid, dimBlock, alpha, A.data_, num_row_blocks,
1144  num_col_blocks, data_, Dim(), A.Stride(),
1145  (transA == kTrans ? 1 : 0));
1146  CU_SAFE_CALL(cudaGetLastError());
1147 
1148  CuDevice::Instantiate().AccuProfile(__func__, tim);
1149  } else
1150 #endif
1151  {
1152  int32 nr, nc;
1153  if (transA == kNoTrans) {
1154  nr = num_rows_;
1155  nc = num_cols_;
1156  } else {
1157  nr = num_cols_;
1158  nc = num_rows_;
1159  }
1160  for (int32 i = 0; i < num_row_blocks; i++) {
1161  for (int32 j = 0; j < num_col_blocks; j++) {
1162  Mat().AddMat(alpha, SubMatrix<Real>(A.Mat(), i * nr, nr, j * nc, nc),
1163  transA);
1164  }
1165  }
1166  }
1167  } else {
1168  // This is the "broadcasting" version of AddMatBlocks, where
1169  // *this is larger than src.
1170  if (transA != kNoTrans)
1171  KALDI_ERR << "Transposed operation not supported currently.";
1172  if (!(num_rows_ % A.NumRows() == 0 && num_cols_ % A.NumCols() == 0))
1173  KALDI_ERR << "Invalid sizes of arguments";
1174 #if HAVE_CUDA == 1
1175  if (CuDevice::Instantiate().Enabled()) {
1176  CuTimer tim;
1177  dim3 dimGrid, dimBlock;
1178  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1179  &dimGrid, &dimBlock);
1180  cuda_add_mat_repeated(dimGrid, dimBlock, alpha,
1181  A.data_, A.Dim(), data_, Dim());
1182  CU_SAFE_CALL(cudaGetLastError());
1183  CuDevice::Instantiate().AccuProfile(__func__, tim);
1184  } else
1185 #endif
1186  {
1187  const MatrixBase<Real> &src_mat = A.Mat(),
1188  &this_mat = this->Mat();
1189  for (int32 row_offset = 0; row_offset < NumRows();
1190  row_offset += src_mat.NumRows()) {
1191  for (int32 col_offset = 0; col_offset < NumCols();
1192  col_offset += src_mat.NumCols()) {
1193  SubMatrix<Real> this_part(this_mat,
1194  row_offset, src_mat.NumRows(),
1195  col_offset, src_mat.NumCols());
1196  this_part.AddMat(alpha, src_mat);
1197  }
1198  }
1199  }
1200  }
1201 }

◆ AddMatDiagVec()

void AddMatDiagVec ( const Real  alpha,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
CuVectorBase< Real > &  v,
Real  beta = 1.0 
)

Definition at line 1415 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), BatchNormComponent::Backprop(), LstmNonlinearityComponent::ConsolidateMemory(), SigmoidComponent::RepairGradients(), and TanhComponent::RepairGradients().

1419  {
1420 #if HAVE_CUDA == 1
1421  if (CuDevice::Instantiate().Enabled()) {
1422  if (transM == kNoTrans) {
1423  KALDI_ASSERT(SameDim(*this, M));
1424  } else {
1425  KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
1426  }
1427  KALDI_ASSERT(v.Dim() == this->NumCols());
1428 
1429  CuTimer tim;
1430  dim3 dimGrid, dimBlock;
1431  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1432  &dimGrid, &dimBlock);
1433  MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
1434  if (transM == kTrans) std::swap(M_row_stride, M_col_stride);
1435  cuda_add_mat_diag_vec(dimGrid, dimBlock, alpha, data_, Dim(),
1436  M.Data(), M_row_stride, M_col_stride, v.Data(), beta);
1437  CU_SAFE_CALL(cudaGetLastError());
1438  CuDevice::Instantiate().AccuProfile(__func__, tim);
1439  } else
1440 #endif
1441  {
1442  Mat().AddMatDiagVec(alpha, M.Mat(), transM, v.Vec(), beta);
1443  }
1444 }

◆ AddMatMat()

void AddMatMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

C = alpha * A(^T)*B(^T) + beta * C.
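A reference version of this update, written with plain loops, may help when checking dimensions. It is only a sketch under assumptions: `Mat`, `At` and `AddMatMatSketch` are hypothetical names, and the transpose flags are plain booleans rather than MatrixTransposeType.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

// Reads element (i, j) of op(M), where op is the identity or a transpose.
float At(const Mat &M, bool trans, std::size_t i, std::size_t j) {
  return trans ? M[j][i] : M[i][j];
}

// C = alpha * op(A) * op(B) + beta * C
void AddMatMatSketch(float alpha, const Mat &A, bool transA,
                     const Mat &B, bool transB, float beta, Mat *C) {
  if (C->empty() || A.empty() || B.empty()) return;
  const std::size_t rows = C->size(), cols = (*C)[0].size();
  // Inner dimension: columns of op(A), which must equal rows of op(B).
  const std::size_t inner = transA ? A.size() : A[0].size();
  assert(inner == (transB ? B[0].size() : B.size()));
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < inner; ++k)
        sum += At(A, transA, i, k) * At(B, transB, k, j);
      (*C)[i][j] = alpha * sum + beta * (*C)[i][j];
    }
  }
}
```

The GPU branch in the listing passes B before A to the GEMM call because a row-major matrix read as column-major is its transpose, and (A B)^T = B^T A^T.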

Definition at line 1291 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::AddMatBlock(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< float >::AddMatSp(), CuMatrixBase< float >::AddMatTp(), CuMatrixBase< float >::AddSpMat(), CuMatrixBase< float >::AddTpMat(), CuMatrixBase< float >::ApplyLog(), TdnnComponent::Backprop(), RepeatedAffineComponent::Backprop(), AffineComponent::Backprop(), LinearComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), LinearTransform::BackpropagateFnc(), AffineTransform::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), ModelCollapser::CollapseComponentsAffine(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), LstmNonlinearityComponent::ConsolidateMemory(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::CuVectorUnitTestAddDiagMatMat(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditioner::PreconditionDirectionsInternal(), OnlineNaturalGradient::PreconditionDirectionsInternal(), TdnnComponent::Propagate(), AffineComponent::Propagate(), LinearComponent::Propagate(), DctComponent::Propagate(), FixedLinearComponent::Propagate(), FixedAffineComponent::Propagate(), KlHmm::PropagateFnc(), LinearTransform::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), Rbm::Reconstruct(), OnlineNaturalGradient::ReorthogonalizeRt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::TestCuMatrixMatMat(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMatrixAddMatMat(), 
kaldi::UnitTestCuMatrixSymAddMat2(), kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuSpMatrixInvert(), BlockAffineComponentPreconditioned::Update(), TdnnComponent::UpdateSimple(), and BlockAffineComponent::UpdateSimple().

1293  {
1294 
1295 
1296  // CUBLAS is col-major, cudamatrix is row-major, how to do the mapping?
1297  // keep trans..., just swap A&B matrices: A->B B->A
1298  MatrixIndexT m = ((transB==kTrans)? B.NumRows() : B.NumCols());
1299  MatrixIndexT n = ((transA==kTrans)? A.NumCols() : A.NumRows());
1300  MatrixIndexT k = ((transB==kTrans)? B.NumCols() : B.NumRows());
1301  MatrixIndexT k1 = ((transA==kTrans)? A.NumRows() : A.NumCols());
1302 
1303  KALDI_ASSERT(m == NumCols());
1304  KALDI_ASSERT(n == NumRows());
1305  KALDI_ASSERT(k == k1);
1306 
1307  if (m == 0) return;
1308 
1309 
1310 #if HAVE_CUDA == 1
1311  if (CuDevice::Instantiate().Enabled()) {
1312  CuTimer tim;
1313  CUBLAS_SAFE_CALL(cublas_gemm(GetCublasHandle(),
1314  (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1315  (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1316  m, n, k, alpha, B.data_, B.Stride(),
1317  A.data_, A.Stride(), beta, data_, Stride()));
1318 
1319  CuDevice::Instantiate().AccuProfile(__func__, tim);
1320  } else
1321 #endif
1322  {
1323  Mat().AddMatMat(alpha, A.Mat(), transA, B.Mat(), transB, beta);
1324  }
1325 }

◆ AddMatMatElements()

void AddMatMatElements ( const Real  alpha,
const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
const Real  beta 
)

*this = beta * *this + alpha * A .* B, where .* denotes element-by-element multiplication. A and B must have the same dimensions as *this.
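A minimal sketch of this element-wise update, assuming all three matrices have equal dimensions; `Mat` and `AddMatMatElementsSketch` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

// C = beta * C + alpha * (A .* B), element by element.
void AddMatMatElementsSketch(float alpha, const Mat &A, const Mat &B,
                             float beta, Mat *C) {
  assert(A.size() == C->size() && B.size() == C->size());
  for (std::size_t i = 0; i < C->size(); ++i) {
    assert(A[i].size() == (*C)[i].size() && B[i].size() == (*C)[i].size());
    for (std::size_t j = 0; j < (*C)[i].size(); ++j)
      (*C)[i][j] = beta * (*C)[i][j] + alpha * A[i][j] * B[i][j];
  }
}
```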

Definition at line 1447 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), StatisticsExtractionComponent::Backprop(), LstmNonlinearityComponent::ConsolidateMemory(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

1448  {
1449 #if HAVE_CUDA == 1
1450  if (CuDevice::Instantiate().Enabled()) {
1451  KALDI_ASSERT(SameDim(*this, A) && SameDim(A, B));
1452  CuTimer tim;
1453  dim3 dimGrid, dimBlock;
1454  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1455  &dimGrid, &dimBlock);
1456  cuda_add_mat_mat_elements(dimGrid, dimBlock, this->data_, A.Data(),
1457  B.Data(), Dim(), A.Stride(), B.Stride(), alpha, beta);
1458  CuDevice::Instantiate().AccuProfile(__func__, tim);
1459  } else
1460 #endif
1461  {
1462  Mat().AddMatMatElements(alpha, A.Mat(), B.Mat(), beta);
1463  }
1464 }

◆ AddMatSmat()

void AddMatSmat ( Real  alpha,
const CuMatrixBase< Real > &  A,
const CuSparseMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

(*this) = alpha * A * op(B) + beta * (*this), where B is sparse and op(B) is either B or trans(B) depending on the 'transB' argument.

This is multiplication of a dense by a sparse matrix. See also AddSmatMat.

Definition at line 1080 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and kaldi::UnitTextCuMatrixAddMatSmat().

1082  {
1083 #if HAVE_CUDA == 1
1084  if (CuDevice::Instantiate().Enabled()) {
1085  if (transB == kNoTrans) {
1086  KALDI_ASSERT(NumRows() == A.NumRows());
1087  KALDI_ASSERT(NumCols() == B.NumCols());
1088  KALDI_ASSERT(A.NumCols() == B.NumRows());
1089  } else {
1090  KALDI_ASSERT(NumRows() == A.NumRows());
1091  KALDI_ASSERT(NumCols() == B.NumRows());
1092  KALDI_ASSERT(A.NumCols() == B.NumCols());
1093  }
1094 
1095  CuTimer tim;
1096 
1097  cusparseMatDescr_t descr;
1098  CUSPARSE_SAFE_CALL(cusparseCreateMatDescr(&descr));
1099  CU_SAFE_CALL(
1100  cusparse_csrmm(
1101  GetCusparseHandle(),
1102  transB == kNoTrans ?
1103  CUSPARSE_OPERATION_TRANSPOSE : CUSPARSE_OPERATION_NON_TRANSPOSE,
1104  B.NumRows(), NumRows(), B.NumCols(), B.NumElements(), &alpha, descr,
1105  B.CsrVal(), B.CsrRowPtr(), B.CsrColIdx(), A.Data(), A.Stride(),
1106  &beta, Data(), Stride()));
1107  CUSPARSE_SAFE_CALL(cusparseDestroyMatDescr(descr));
1108 
1109  CuDevice::Instantiate().AccuProfile(__func__, tim);
1110  } else
1111 #endif
1112  {
1113  Mat().AddMatSmat(alpha, A.Mat(), B.Smat(), transB, beta);
1114  }
1115 }

◆ AddMatSp()

void AddMatSp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuSpMatrix< Real > &  B,
const Real  beta 
)
inline

this <-- beta*this + alpha*op(A)*B, where B is a symmetric packed matrix and op(A) is A or its transpose depending on transA.

Definition at line 614 of file cu-matrix.h.

617  {
618  CuMatrix<Real> M(B);
619  return AddMatMat(alpha, A, transA, M, kNoTrans, beta);
620  }

◆ AddMatTp()

void AddMatTp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuTpMatrix< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

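this <-- beta*this + alpha*op(A)*op(B), where B is a triangular packed matrix and op() is an optional transpose selected by transA and transB.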

Definition at line 641 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddMatTp().

644  {
645  CuMatrix<Real> M(B);
646  return AddMatMat(alpha, A, transA, M, transB, beta);
647  }

◆ AddRowRanges()

void AddRowRanges ( const CuMatrixBase< Real > &  src,
const CuArrayBase< Int32Pair > &  indexes 
)

For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

In general indexes must be >= 0 and < src.NumRows(); but to represent an empty range you may use the pair (-1, -1) or any pair of numbers (i, j) such that i >= j.
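The range semantics, including the empty-range convention, can be sketched as below. `Mat`, `RowRange` and `AddRowRangesSketch` are hypothetical names; `RowRange` only mirrors the `first`/`second` fields of Int32Pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

struct RowRange {  // stand-in for Int32Pair: half-open range [first, second)
  int first;
  int second;
};

// For each row r of dst, adds the sum of src rows first .. second - 1.
void AddRowRangesSketch(const Mat &src, const std::vector<RowRange> &indexes,
                        Mat *dst) {
  assert(indexes.size() == dst->size());
  for (std::size_t r = 0; r < dst->size(); ++r) {
    // An empty range (first >= second, e.g. (-1, -1)) skips this loop.
    for (int j = indexes[r].first; j < indexes[r].second; ++j)
      for (std::size_t c = 0; c < (*dst)[r].size(); ++c)
        (*dst)[r][c] += src[j][c];
  }
}
```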

Definition at line 2931 of file cu-matrix.cc.

Referenced by StatisticsPoolingComponent::Backprop(), NnetComputer::ExecuteCommand(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixAddRowRanges().

2932  {
2933  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2934  KALDI_ASSERT(src.NumCols() == NumCols());
2935  if (NumRows() == 0) return;
2936 #if HAVE_CUDA == 1
2937  if (CuDevice::Instantiate().Enabled()) {
2938  CuTimer tim;
2939  dim3 dimGrid, dimBlock;
2940  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2941  &dimGrid, &dimBlock);
2942  cuda_add_row_ranges(dimGrid, dimBlock,
2943  data_, Dim(), src.Data(), src.Dim(), indexes.Data());
2944  CU_SAFE_CALL(cudaGetLastError());
2945  CuDevice::Instantiate().AccuProfile(__func__, tim);
2946  } else
2947 #endif
2948  { // Implement here for the CPU..
2949  int32 num_rows = this->num_rows_, num_cols = this->num_cols_,
2950  this_stride = this->stride_, src_stride = src.stride_;
2951  Real *data = this->data_;
2952  const Real *src_data = src.data_;
2953  const Int32Pair *indexes_data = indexes.Data();
2954  for (int32 row = 0; row < num_rows; row++) {
2955  int32 start_row = indexes_data[row].first,
2956  end_row = indexes_data[row].second;
2957  for (int32 col = 0; col < num_cols; col++) {
2958  Real sum = 0.0;
2959  for (int32 src_row = start_row; src_row < end_row; src_row++)
2960  sum += src_data[src_row * src_stride + col];
2961  data[row * this_stride + col] += sum;
2962  }
2963  }
2964  }
2965 }

◆ AddRows() [1/2]

void AddRows ( Real  alpha,
const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

For each row r, does this.Row(r) += alpha * src.Row(indexes[r]).

If indexes[r] < 0, nothing is added to row r. indexes.Dim() must equal this.NumRows(), and src.NumCols() must equal this.NumCols().
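A small sketch of this gather-and-add, including the negative-index rule; `Mat` and `AddRowsSketch` are hypothetical names and the code is illustrative rather than the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

// dst.Row(r) += alpha * src.Row(indexes[r]); negative indexes are skipped.
void AddRowsSketch(float alpha, const Mat &src,
                   const std::vector<int> &indexes, Mat *dst) {
  assert(indexes.size() == dst->size());
  for (std::size_t r = 0; r < dst->size(); ++r) {
    if (indexes[r] < 0) continue;  // nothing to add for this row
    const std::vector<float> &src_row = src[indexes[r]];
    assert(src_row.size() == (*dst)[r].size());
    for (std::size_t c = 0; c < src_row.size(); ++c)
      (*dst)[r][c] += alpha * src_row[c];
  }
}
```

Unlike AddToRows, several output rows may read from the same source row, since each output row is written only once.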

Definition at line 2766 of file cu-matrix.cc.

Referenced by StatisticsExtractionComponent::Backprop(), and NnetComputer::ExecuteCommand().

2768  {
2769  if (NumRows() == 0) return;
2770 #if HAVE_CUDA == 1
2771  if (CuDevice::Instantiate().Enabled()) {
2772  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2773  KALDI_ASSERT(src.NumCols() == NumCols());
2774  CuTimer tim;
2775  dim3 dimGrid, dimBlock;
2776  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2777  &dimGrid, &dimBlock);
2778  cuda_add_rows(dimGrid, dimBlock, alpha,
2779  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
2780  CU_SAFE_CALL(cudaGetLastError());
2781  CuDevice::Instantiate().AccuProfile(__func__, tim);
2782  } else
2783 #endif
2784  {
2785  Mat().AddRows(alpha, src.Mat(), indexes.Data());
2786  }
2787 }

◆ AddRows() [2/2]

void AddRows ( Real  alpha,
const CuArrayBase< const Real *> &  src 
)

For each row r, does this.Row(r) += alpha * src[r], treating src[r] as a pointer to the start of a region of memory holding a vector of Real values of length this.NumCols(). src.Dim() must equal this.NumRows().

Definition at line 2826 of file cu-matrix.cc.

2826  {
2827  if (NumRows() == 0) return;
2828 #if HAVE_CUDA == 1
2829  if (CuDevice::Instantiate().Enabled()) {
2830  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2831  CuTimer tim;
2832  dim3 dimGrid, dimBlock;
2833  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2834  &dimGrid, &dimBlock);
2835  cuda_add_rows(dimGrid, dimBlock, alpha, data_, src.Data(), Dim());
2836  CU_SAFE_CALL(cudaGetLastError());
2837  CuDevice::Instantiate().AccuProfile(__func__, tim);
2838  } else
2839 #endif
2840  {
2841  Mat().AddRows(alpha, src.Data());
2842  }
2843 }

◆ AddSmat()

void AddSmat ( Real  alpha,
const CuSparseMatrix< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * op(A), where A is sparse and op(A) is either A or its transpose depending on the 'trans' argument.
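The listing reads the sparse matrix through CsrRowPtr(), CsrColIdx() and CsrVal(), i.e. in compressed sparse row form. The sketch below shows how such a layout can be added into a dense matrix; `Mat`, `Csr` and `AddSmatSketch` are hypothetical names, not Kaldi types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

struct Csr {                   // hypothetical compressed-sparse-row container
  int num_rows;
  std::vector<int> row_ptr;    // size num_rows + 1; row r spans [row_ptr[r], row_ptr[r+1])
  std::vector<int> col_idx;    // column of each stored value
  std::vector<float> val;      // the stored (nonzero) values
};

// dst += alpha * A, or dst += alpha * A^T when trans is true.
void AddSmatSketch(float alpha, const Csr &A, bool trans, Mat *dst) {
  for (int r = 0; r < A.num_rows; ++r) {
    for (int k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k) {
      const int c = A.col_idx[k];
      if (trans)
        (*dst)[c][r] += alpha * A.val[k];  // element (r, c) of A lands at (c, r)
      else
        (*dst)[r][c] += alpha * A.val[k];
    }
  }
}
```

Only the stored values are visited, so the cost scales with the number of nonzeros rather than with the dense size.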

Definition at line 985 of file cu-matrix.cc.

Referenced by GeneralMatrix::AddToMat(), CuMatrixBase< float >::ApplyLog(), and kaldi::UnitTextCuMatrixAddSmat().

986  {
987 #if HAVE_CUDA == 1
988  if (CuDevice::Instantiate().Enabled()) {
989  if (trans == kNoTrans) {
990  KALDI_ASSERT(NumRows() == A.NumRows());
991  KALDI_ASSERT(NumCols() == A.NumCols());
992  } else {
993  KALDI_ASSERT(NumRows() == A.NumCols());
994  KALDI_ASSERT(NumCols() == A.NumRows());
995  }
996 
997  CuTimer tim;
998 
999  // We use warpSize threads per row to access only the nonzero elements.
1000  // Every CU1DBLOCK/warpSize rows share one thread block.
1001  // 1D grid to cover all rows of A.
1002  const int warpSize = 32;
1003  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
1004  dim3 dimGrid(n_blocks(A.NumRows(), dimBlock.y));
1005 
1006  if (trans == kNoTrans) {
1007  cuda_add_smat(dimGrid, dimBlock, Data(), Dim(), alpha, A.CsrRowPtr(),
1008  A.CsrColIdx(), A.CsrVal());
1009  } else {
1010  cuda_add_smat_trans(dimGrid, dimBlock, Data(), Dim(), alpha,
1011  A.CsrRowPtr(), A.CsrColIdx(), A.CsrVal());
1012  }
1013 
1014  CU_SAFE_CALL(cudaGetLastError());
1015  CuDevice::Instantiate().AccuProfile(__func__, tim);
1016  } else
1017 #endif
1018  {
1019  Mat().AddSmat(alpha, A.Smat(), trans);
1020  }
1021 }

◆ AddSmatMat()

void AddSmatMat ( Real  alpha,
const CuSparseMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
Real  beta 
)

(*this) = alpha * op(A) * B + beta * (*this), where A is sparse.

Multiplies a sparse matrix by a dense matrix. See also AddMatSmat. Note: for greatest efficiency we recommend that transA be kNoTrans. Prefer AddMatSmat() where possible, since this function performs two dense matrix transposes.

Definition at line 1024 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and kaldi::UnitTextCuMatrixAddSmatMat().

1026  {
1027 #if HAVE_CUDA == 1
1028  if (CuDevice::Instantiate().Enabled()) {
1029  if (transA == kNoTrans) {
1030  KALDI_ASSERT(NumRows() == A.NumRows());
1031  KALDI_ASSERT(NumCols() == B.NumCols());
1032  KALDI_ASSERT(A.NumCols() == B.NumRows());
1033  } else {
1034  KALDI_ASSERT(NumRows() == A.NumCols());
1035  KALDI_ASSERT(NumCols() == B.NumCols());
1036  KALDI_ASSERT(A.NumRows() == B.NumRows());
1037  }
1038 
1039  CuTimer tim;
1040 
1041  // We have op(A) and BT in col-major (B in row-major).
1042  // We first compute C in col-major (CT in row-major)
1043  // with C = op(A) * BT^T by cusparse_csrmm2,
1044  // then transpose CT to get C in row-major
1045  CuMatrix<Real> CT(*this, kTrans);
1046 
1047  cusparseMatDescr_t descr;
1048  CUSPARSE_SAFE_CALL(cusparseCreateMatDescr(&descr));
1049  if (transA == kTrans) {
1050  // Note: only op(A)=A is supported if op(B)=B^T according to cusparse doc
1051  // http://docs.nvidia.com/cuda/cusparse/index.html#cusparse-lt-t-gt-csrmm2
1052  CuSparseMatrix<Real> AT(A, kTrans);
1053  CU_SAFE_CALL(
1054  cusparse_csrmm2(GetCusparseHandle(), CUSPARSE_OPERATION_NON_TRANSPOSE,
1055  CUSPARSE_OPERATION_TRANSPOSE, AT.NumRows(),
1056  CT.NumRows(), AT.NumCols(), AT.NumElements(), &alpha,
1057  descr, AT.CsrVal(), AT.CsrRowPtr(), AT.CsrColIdx(),
1058  B.Data(), B.Stride(), &beta, CT.Data(), CT.Stride()));
1059  } else {
1060  CU_SAFE_CALL(
1061  cusparse_csrmm2(GetCusparseHandle(), CUSPARSE_OPERATION_NON_TRANSPOSE,
1062  CUSPARSE_OPERATION_TRANSPOSE, A.NumRows(),
1063  CT.NumRows(), A.NumCols(), A.NumElements(), &alpha,
1064  descr, A.CsrVal(), A.CsrRowPtr(), A.CsrColIdx(),
1065  B.Data(), B.Stride(), &beta, CT.Data(), CT.Stride()));
1066  }
1067  CUSPARSE_SAFE_CALL(cusparseDestroyMatDescr(descr));
1068 
1069  this->CopyFromMat(CT, kTrans);
1070 
1071  CuDevice::Instantiate().AccuProfile(__func__, tim);
1072  } else
1073 #endif
1074  {
1075  Mat().AddSmatMat(alpha, A.Smat(), transA, B.Mat(), beta);
1076  }
1077 }

◆ AddSpMat()

void AddSpMat ( const Real  alpha,
const CuSpMatrix< Real > &  A,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

this <-- beta*this + alpha*A*op(B), where A is a symmetric packed matrix and op(B) is B or its transpose depending on transB.

Definition at line 623 of file cu-matrix.h.

626  {
627  CuMatrix<Real> M(A);
628  return AddMatMat(alpha, M, kNoTrans, B, transB, beta);
629  }

◆ AddToDiag()

void AddToDiag ( Real  value)

Adds "value" to the diagonal elements of the matrix.

The matrix *this does not have to be square.
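For a row-major buffer whose rows are `stride` elements apart, consecutive diagonal elements are `stride + 1` elements apart, which is how the GPU branch in the listing addresses them. A minimal sketch of that idea on a flat buffer follows; `AddToDiagSketch` is a hypothetical name, not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Adds value to the diagonal of a row-major matrix stored in a flat buffer
// whose consecutive rows are `stride` elements apart (stride >= num_cols).
void AddToDiagSketch(float value, int num_rows, int num_cols, int stride,
                     std::vector<float> *data) {
  const int num_diag = std::min(num_rows, num_cols);  // non-square is fine
  const int diag_stride = stride + 1;  // step from (d, d) to (d + 1, d + 1)
  for (int d = 0; d < num_diag; ++d)
    (*data)[d * diag_stride] += value;
}
```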

Definition at line 604 of file cu-matrix.cc.

Referenced by kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCholesky(), and kaldi::UnitTestCuMatrixAddToDiag().

604  {
605 #if HAVE_CUDA == 1
606  if (CuDevice::Instantiate().Enabled()) {
607  if (num_rows_ == 0) return;
608  CuTimer tim;
609  // We'll create a fake matrix with "num_diag" rows, one
610  // column, and a stride of "this_stride". The y-value of
611  // the grid/blocks corresponds to the row, in this kernel.
612  MatrixIndexT num_diag = std::min(num_rows_, num_cols_),
613  this_stride = stride_ + 1;
614  dim3 dimBlock(1, CU1DBLOCK);
615  dim3 dimGrid(1, n_blocks(num_diag, CU1DBLOCK));
616  ::MatrixDim d = { num_diag, 1, this_stride };
617  cuda_add(dimGrid, dimBlock, data_, value, d);
618  CU_SAFE_CALL(cudaGetLastError());
619 
620  CuDevice::Instantiate().AccuProfile(__func__, tim);
621  } else
622  #endif
623  {
624  Mat().AddToDiag(value);
625  }
626 }

◆ AddToElements()

void AddToElements ( Real  alpha,
const CuArrayBase< int32 > &  elements 
)

This is a rather special purpose function; we might generalize it later by adding a transpose-type option.

It expects 'elements.Dim()' to equal NumRows(), and each elements[i] to be either -1 or in the range 0 <= elements[i] < NumCols(). For each i with 0 <= i < NumRows() and elements[i] >= 0, it adds alpha to the element (*this)(i, elements[i]); rows with elements[i] == -1 are left unchanged.
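A short sketch of this per-row update, mirroring the CPU branch of the listing; `Mat` and `AddToElementsSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // hypothetical row-major matrix

// Adds alpha to m(r, elements[r]) for every row r whose entry is not -1.
void AddToElementsSketch(float alpha, const std::vector<int> &elements,
                         Mat *m) {
  assert(elements.size() == m->size());
  for (std::size_t r = 0; r < m->size(); ++r) {
    const int c = elements[r];
    assert(c >= -1 && c < static_cast<int>((*m)[r].size()));
    if (c >= 0) (*m)[r][c] += alpha;  // -1 means "skip this row"
  }
}
```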

Definition at line 3344 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and kaldi::UnitTestCuMatrixAddToElements().

3344  {
3345  KALDI_ASSERT(elements.Dim() == NumRows());
3346 #if HAVE_CUDA == 1
3347  if (CuDevice::Instantiate().Enabled()) {
3348  CuTimer tim;
3349 
3350  dim3 dimBlock(CU1DBLOCK);
3351  dim3 dimGrid(n_blocks(NumRows(), CU1DBLOCK));
3352 
3353  cuda_matrix_add_to_elements(dimGrid, dimBlock, alpha, data_, Dim(), elements.Data());
3354  CU_SAFE_CALL(cudaGetLastError());
3355  CuDevice::Instantiate().AccuProfile(__func__, tim);
3356  } else
3357 #endif
3358  {
3359  MatrixBase<Real> &this_mat = this->Mat();
3360  const int32* row_to_col = elements.Data();
3361  for (int32 r = 0; r < this_mat.NumRows(); r++) {
3362  KALDI_ASSERT(row_to_col[r] >= -1);
3363  if (row_to_col[r] >= 0)
3364  this_mat(r, row_to_col[r]) += alpha;
3365  }
3366  }
3367 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
kaldi::int32 int32
const T * Data() const
Get raw pointer.
Definition: cu-array.h:52
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT Dim() const
Return the vector dimension.
Definition: cu-array.h:49

◆ AddToRows() [1/2]

void AddToRows ( Real  alpha,
const CuArrayBase< MatrixIndexT > &  indexes,
CuMatrixBase< Real > *  dst 
) const

For each row i of *this, adds alpha * this->Row(i) to dst->Row(indexes[i]) if indexes[i] >= 0; otherwise does nothing for that row.

Requires that all the indexes[i] that are >= 0 be distinct, otherwise the behavior is undefined.
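
The scatter-add described here can be sketched on the CPU as follows. This is only an illustration under the assumption that alpha scales each source row, as the calls in the definition suggest; AddToRowsSketch and Matrix are hypothetical names, not Kaldi API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): for each row r of src, adds
// alpha * src[r] to dst[indexes[r]]; a negative index skips the row.
// The non-negative indexes are expected to be distinct.
void AddToRowsSketch(const Matrix &src, float alpha,
                     const std::vector<int> &indexes, Matrix &dst) {
  assert(indexes.size() == src.size());
  for (std::size_t r = 0; r < src.size(); ++r) {
    const int d = indexes[r];
    if (d < 0) continue;
    assert(static_cast<std::size_t>(d) < dst.size());
    assert(dst[d].size() == src[r].size());
    for (std::size_t c = 0; c < src[r].size(); ++c)
      dst[d][c] += alpha * src[r][c];
  }
}
```

In a sequential loop like this one, repeated indexes would simply accumulate; the distinctness requirement matters for the parallel GPU kernel, where concurrent writes to the same destination row could race.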

Definition at line 2869 of file cu-matrix.cc.

Referenced by NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixAddToRows().

2871  {
2872  if (NumRows() == 0) return;
2873 #if HAVE_CUDA == 1
2874  if (CuDevice::Instantiate().Enabled()) {
2875  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2876  KALDI_ASSERT(dst->NumCols() == NumCols());
2877  CuTimer tim;
2878  dim3 dimGrid, dimBlock;
2879  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2880  &dimGrid, &dimBlock);
2881  cuda_add_to_rows(dimGrid, dimBlock, alpha, dst->Data(), data_, indexes.Data(), Dim(), dst->Stride());
2882  CU_SAFE_CALL(cudaGetLastError());
2883  CuDevice::Instantiate().AccuProfile(__func__, tim);
2884  } else
2885 #endif
2886  {
2887  Mat().AddToRows(alpha, indexes.Data(), &(dst->Mat()));
2888  }
2889 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ AddToRows() [2/2]

void AddToRows ( Real  alpha,
const CuArrayBase< Real *> &  dst 
) const

For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, does not do anything for that row. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2847 of file cu-matrix.cc.

2848  {
2849  if (NumRows() == 0) return;
2850 #if HAVE_CUDA == 1
2851  if (CuDevice::Instantiate().Enabled()) {
2852  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2853  CuTimer tim;
2854  dim3 dimGrid, dimBlock;
2855  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2856  &dimGrid, &dimBlock);
2857  cuda_add_to_rows(dimGrid, dimBlock, alpha, dst.Data(), data_, Dim());
2858  CU_SAFE_CALL(cudaGetLastError());
2859  CuDevice::Instantiate().AccuProfile(__func__, tim);
2860  } else
2861 #endif
2862  {
2863  Mat().AddToRows(alpha, dst.Data());
2864  }
2865 }

◆ AddTpMat()

void AddTpMat ( const Real  alpha,
const CuTpMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

this <- beta*this + alpha*A*B, where A and B are each optionally transposed according to transA and transB.

Definition at line 632 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddTpMat().

635  {
636  CuMatrix<Real> M(A);
637  return AddMatMat(alpha, M, transA, B, transB, beta);
638  }
void AddMatMat(Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, Real beta)
C = alpha * A(^T)*B(^T) + beta * C.
Definition: cu-matrix.cc:1291

◆ AddVecToCols()

void AddVecToCols ( Real  alpha,
const CuVectorBase< Real > &  col,
Real  beta = 1.0 
)

(for each column c of *this), c = alpha * col + beta * c

Definition at line 1232 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), KlHmm::PropagateFnc(), and kaldi::UnitTestCuMatrixAddVecToCols().

1234  {
1235  if (col.Dim() != NumRows()) {
1236  KALDI_ERR << "Non matching dimensions: Rows:" << NumRows() << " VectorDim:" << col.Dim();
1237  }
1238 
1239  #if HAVE_CUDA == 1
1240  if (CuDevice::Instantiate().Enabled()) {
1241  CuTimer tim;
1242  dim3 dimGrid, dimBlock;
1243  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1244  &dimGrid, &dimBlock);
1245  cuda_add_vec_to_cols(dimGrid, dimBlock, alpha, col.data_, beta,
1246  data_, Dim());
1247  CU_SAFE_CALL(cudaGetLastError());
1248 
1249  CuDevice::Instantiate().AccuProfile(__func__, tim);
1250  } else
1251  #endif
1252  {
1253  if (beta != 1.0) Mat().Scale(beta);
1254  Mat().AddVecToCols(alpha, col.Vec());
1255  }
1256 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define KALDI_ERR
Definition: kaldi-error.h:147
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ AddVecToRows()

void AddVecToRows ( Real  alpha,
const CuVectorBase< Real > &  row,
Real  beta = 1.0 
)

(for each row r of *this), r = alpha * row + beta * r
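
A minimal CPU sketch of this update is given below; it is the familiar "add a bias vector to every row" operation. AddVecToRowsSketch and Matrix are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): replaces every row r of m with
// alpha * row + beta * r. With the default beta = 1 this adds alpha * row.
void AddVecToRowsSketch(Matrix &m, float alpha, const std::vector<float> &row,
                        float beta = 1.0f) {
  for (std::size_t r = 0; r < m.size(); ++r) {
    assert(m[r].size() == row.size());
    for (std::size_t c = 0; c < row.size(); ++c)
      m[r][c] = alpha * row[c] + beta * m[r][c];
  }
}
```

The CPU branch of the definition obtains the same result in two steps: it scales the matrix by beta when beta != 1.0 and then adds alpha * row to each row.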

Definition at line 1261 of file cu-matrix.cc.

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), CuMatrixBase< float >::ApplyLog(), BatchNormComponent::Backprop(), SimpleSentenceAveragingComponent::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), NnetBatchComputer::Compute(), DecodableNnet2Online::ComputeForFrame(), DecodableNnetSimple::DoNnetComputation(), SingleUtteranceNnet2DecoderThreaded::ProcessLoglikes(), ConvolutionComponent::Propagate(), BatchNormComponent::Propagate(), FixedAffineComponent::Propagate(), FixedBiasComponent::Propagate(), PerElementOffsetComponent::Propagate(), Convolutional1dComponent::Propagate(), SimpleSentenceAveragingComponent::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), AddShift::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), Rbm::Reconstruct(), SigmoidComponent::RepairGradients(), RectifiedLinearComponent::RepairGradients(), PdfPrior::SubtractOnLogpost(), kaldi::UnitTestCuMatrixAddVecToRows(), and SentenceAveragingComponent::Update().

1263  {
1264  if (row.Dim() != NumCols()) {
1265  KALDI_ERR << "Non matching dimensions: Cols:" << NumCols() << " VectorDim:" << row.Dim();
1266  }
1267 #if HAVE_CUDA == 1
1268  if (CuDevice::Instantiate().Enabled()) {
1269  CuTimer tim;
1270  dim3 dimGrid, dimBlock;
1271  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1272  &dimGrid, &dimBlock);
1273  cuda_add_vec_to_rows(dimGrid, dimBlock, alpha, row.data_, beta, data_, Dim());
1274  CU_SAFE_CALL(cudaGetLastError());
1275 
1276  CuDevice::Instantiate().AccuProfile(__func__, tim);
1277  } else
1278 #endif
1279  {
1280  if (beta != 1.0) Mat().Scale(beta);
1281  Mat().AddVecToRows(alpha, row.Vec());
1282  }
1283 }

◆ AddVecVec()

void AddVecVec ( Real  alpha,
const CuVectorBase< Real > &  x,
const CuVectorBase< Real > &  y 
)

A = alpha * x * y^T + A, where A is *this. x.Dim() must equal NumRows() and y.Dim() must equal NumCols().
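
The rank-one update can be sketched on the CPU as shown below. AddVecVecSketch and Matrix are hypothetical names, not Kaldi API. The GPU branch in the listing passes y before x to cublas_ger; a plausible reading (an inference, not stated in the source) is that cuBLAS assumes column-major storage, so the row-major matrix is seen as its transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): a(i, j) += alpha * x[i] * y[j].
void AddVecVecSketch(Matrix &a, float alpha, const std::vector<float> &x,
                     const std::vector<float> &y) {
  assert(x.size() == a.size());  // one entry of x per row
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(y.size() == a[i].size());  // one entry of y per column
    for (std::size_t j = 0; j < y.size(); ++j)
      a[i][j] += alpha * x[i] * y[j];
  }
}
```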

Definition at line 1329 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and kaldi::UnitTestCuMatrixAddVecVec().

1330  {
1331 
1332  MatrixIndexT m = y.Dim();
1333  MatrixIndexT n = x.Dim();
1334  KALDI_ASSERT(m == NumCols());
1335  KALDI_ASSERT(n == NumRows());
1336 
1337 #if HAVE_CUDA == 1
1338  if (CuDevice::Instantiate().Enabled()) {
1339  CuTimer tim;
1340  CUBLAS_SAFE_CALL(cublas_ger(GetCublasHandle(), m, n, alpha,
1341  y.Data(), 1, x.Data(), 1, data_, Stride()));
1342 
1343  CuDevice::Instantiate().AccuProfile(__func__, tim);
1344  } else
1345 #endif
1346  {
1347  Mat().AddVecVec(alpha, x.Vec(), y.Vec());
1348  }
1349 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
MatrixIndexT Stride() const
Definition: cu-matrix.h:217
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ ApplyCeiling()

void ApplyCeiling ( Real  ceiling_val)
inline

Definition at line 455 of file cu-matrix.h.

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), and kaldi::UnitTestCuMatrixApplyCeiling().

455  {
456  this -> Ceiling(*this, ceiling_val);
457  };
void Ceiling(const CuMatrixBase< Real > &src, Real ceiling_val)
Definition: cu-matrix.cc:2601

◆ ApplyExp()

void ApplyExp ( )
inline

Definition at line 459 of file cu-matrix.h.

Referenced by DiscriminativeComputation::Compute(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuMatrixApplyExp().

459  {
460  this -> Exp(*this);
461  };
void Exp(const CuMatrixBase< Real > &src)
Definition: cu-matrix.cc:2456

◆ ApplyExpLimited()

void ApplyExpLimited ( Real  lower_limit,
Real  upper_limit 
)
inline

Definition at line 464 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixApplyExpLimited().

464  {
465  this -> ExpLimited(*this, lower_limit, upper_limit);
466  };
void ExpLimited(const CuMatrixBase< Real > &src, Real lower_limit, Real upper_limit)
This is equivalent to running: Floor(src, lower_limit); Ceiling(src, upper_limit); Exp(src) ...
Definition: cu-matrix.cc:2541
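
The stated equivalence (floor, then ceiling, then exponentiate) can be illustrated element by element. The sketch below is an assumption-laden stand-in that works on a flat std::vector; ExpLimitedSketch is a hypothetical name, not Kaldi API.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not Kaldi API): clamps each element to the range
// [lower_limit, upper_limit] and then replaces it with its exponential.
void ExpLimitedSketch(std::vector<float> &v, float lower_limit,
                      float upper_limit) {
  for (float &x : v) {
    x = std::max(x, lower_limit);  // Floor
    x = std::min(x, upper_limit);  // Ceiling
    x = std::exp(x);               // Exp
  }
}
```

Clamping before the exponential bounds the output between exp(lower_limit) and exp(upper_limit), which keeps very large inputs from overflowing.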

◆ ApplyExpSpecial()

void ApplyExpSpecial ( )
inline

Definition at line 468 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixApplyExpSpecial().

468  {
469  this -> ExpSpecial(*this);
470  };
void ExpSpecial(const CuMatrixBase< Real > &src)
For each element x of the matrix, set it to (x < 0 ? exp(x) : x + 1).
Definition: cu-matrix.cc:2563

◆ ApplyFloor()

◆ ApplyHeaviside()

◆ ApplyLog()

◆ ApplyLogSoftMaxPerRow()

void ApplyLogSoftMaxPerRow ( )
inline

Definition at line 476 of file cu-matrix.h.

476  {
477  this -> LogSoftMaxPerRow(*this);
478  };
void LogSoftMaxPerRow(const CuMatrixBase< Real > &src)
LogSoftmax nonlinearity Y = LogSoftmax(X) : Yij = Xij - log(sum_k(e^Xik)), done to each row...
Definition: cu-matrix.cc:1740

◆ ApplyPow()

◆ ApplyPowAbs()

void ApplyPowAbs ( Real  power,
bool  include_sign = false 
)
inline

Definition at line 443 of file cu-matrix.h.

Referenced by PowerComponent::Backprop(), PowerComponent::Propagate(), ClipGradientComponent::RepairGradients(), and kaldi::UnitTestCuMatrixApplyPowAbs().

443  {
444  this -> PowAbs(*this, power, include_sign);
445  };
void PowAbs(const CuMatrixBase< Real > &src, Real power, bool include_sign=false)
Apply power to the absolute value of each element.
Definition: cu-matrix.cc:2521
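
A per-element sketch of this operation is shown below. The handling of include_sign (multiplying the result by the sign of the input) is an assumption about the intended semantics, since this page does not spell it out; PowAbsSketch is a hypothetical name, not Kaldi API.

```cpp
#include <cmath>

// Hypothetical helper (not Kaldi API): returns |x|^power. When include_sign
// is true, the result is assumed to carry the sign of the input x.
float PowAbsSketch(float x, float power, bool include_sign = false) {
  const float r = std::pow(std::fabs(x), power);
  return (include_sign && x < 0.0f) ? -r : r;
}
```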

◆ ApplySoftMaxPerRow()

void ApplySoftMaxPerRow ( )
inline

Definition at line 472 of file cu-matrix.h.

472  {
473  this -> SoftMaxPerRow(*this);
474  };
void SoftMaxPerRow(const CuMatrixBase< Real > &src)
Softmax nonlinearity Y = Softmax(X) : Yij = e^Xij / sum_k(e^Xik), done to each row, with attention to avoiding overflow or underflow.
Definition: cu-matrix.cc:1717
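
The formula above can be sketched on the CPU as follows. Subtracting the row maximum before exponentiating is one common way to avoid overflow; it is used here as an assumption and is not confirmed to be what the Kaldi kernel does. SoftMaxPerRowSketch and Matrix are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): in-place softmax over each row,
// Yij = e^Xij / sum_k(e^Xik), computed after shifting by the row maximum.
void SoftMaxPerRowSketch(Matrix &m) {
  for (auto &row : m) {
    if (row.empty()) continue;
    const float max_val = *std::max_element(row.begin(), row.end());
    float sum = 0.0f;
    for (float &x : row) {
      x = std::exp(x - max_val);  // largest exponent is exactly 0
      sum += x;
    }
    for (float &x : row) x /= sum;
  }
}
```

The shift leaves the result unchanged mathematically, because the factor e^(-max) cancels between numerator and denominator.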

◆ ApproxEqual()

bool ApproxEqual ( const CuMatrixBase< Real > &  other,
float  tol = 0.01 
) const

True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()
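
The criterion can be sketched on the CPU as below; note that the tolerance is relative to the norm of *this, so the test is not symmetric in its two arguments. FrobeniusNormSketch, ApproxEqualSketch and Matrix are hypothetical names, not Kaldi API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper: square root of the sum of squared elements.
double FrobeniusNormSketch(const Matrix &m) {
  double sum = 0.0;
  for (const auto &row : m)
    for (float x : row) sum += static_cast<double>(x) * x;
  return std::sqrt(sum);
}

// Hypothetical helper: true if ||a - b||_F <= tol * ||a||_F.
bool ApproxEqualSketch(const Matrix &a, const Matrix &b, float tol = 0.01f) {
  assert(a.size() == b.size());
  Matrix diff = a;
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == b[r].size());
    for (std::size_t c = 0; c < a[r].size(); ++c) diff[r][c] -= b[r][c];
  }
  return FrobeniusNormSketch(diff) <= tol * FrobeniusNormSketch(a);
}
```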

Definition at line 2137 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::FrobeniusNorm(), kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuCopy().

2138  {
2139  CuMatrix<Real> diff(*this);
2140  diff.AddMat(-1.0, other);
2141  return (diff.FrobeniusNorm() <= tol * (*this).FrobeniusNorm());
2142 }

◆ Ceiling()

void Ceiling ( const CuMatrixBase< Real > &  src,
Real  ceiling_val 
)

Definition at line 2601 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyCeiling(), and CuMatrixBase< float >::SizeInBytes().

2601  {
2602  KALDI_ASSERT(SameDim(*this, src));
2603 #if HAVE_CUDA == 1
2604  if (CuDevice::Instantiate().Enabled()) {
2605  CuTimer tim;
2606  dim3 dimGrid, dimBlock;
2607  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2608  &dimGrid, &dimBlock);
2609  cuda_ceiling(dimGrid, dimBlock, this->data_, src.data_, ceiling_val, this->Dim(), src.Stride());
2610  CU_SAFE_CALL(cudaGetLastError());
2611  CuDevice::Instantiate().AccuProfile(__func__, tim);
2612  } else
2613 #endif
2614  {
2615  Mat().Ceiling(src.Mat(), ceiling_val);
2616  }
2617 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ Cholesky()

void Cholesky ( CuMatrixBase< Real > *  inv_cholesky = NULL)

Sets *this to the Cholesky factor of *this (i.e. the lower-triangular C satisfying *this = C C^T), and sets "inv_cholesky" (if supplied) to the inverse of C. *this is treated as a symmetric matrix, but only the lower triangle is accessed.
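
For orientation, a textbook unblocked Cholesky factorization is sketched below. It is not the blocked, recursive GPU scheme used in the definition; it only illustrates the relation *this = C C^T and the fact that the upper triangle of the input is never read. CholeskySketch and MatrixD are hypothetical names, not Kaldi API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using MatrixD = std::vector<std::vector<double>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): returns the lower-triangular C with
// a = C C^T, reading only the lower triangle of the square input a.
// The input is assumed to be symmetric positive definite.
MatrixD CholeskySketch(const MatrixD &a) {
  const std::size_t n = a.size();
  MatrixD c(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    assert(a[i].size() == n);
    for (std::size_t j = 0; j <= i; ++j) {
      double sum = a[i][j];
      for (std::size_t k = 0; k < j; ++k) sum -= c[i][k] * c[j][k];
      if (i == j) {
        assert(sum > 0.0);  // fails if a is not positive definite
        c[i][i] = std::sqrt(sum);
      } else {
        c[i][j] = sum / c[j][j];
      }
    }
  }
  return c;
}
```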

Definition at line 1987 of file cu-matrix.cc.

Referenced by CuTpMatrix< Real >::Cholesky(), CuMatrixBase< float >::Cholesky(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixCholesky(), kaldi::UnitTestCholesky(), and kaldi::UnitTestCuCholesky().

1987  {
1988  KALDI_ASSERT(this->NumRows() == this->NumCols());
1989  const int32 block_size = 64; // We can tune this.
1990 #if HAVE_CUDA == 1
1991  bool have_gpu = CuDevice::Instantiate().Enabled();
1992 #else
1993  bool have_gpu = false;
1994 #endif
1995  if (this->NumRows() == 0) {
1996  return;
1997  }
1998  if (inv_cholesky == NULL && this->NumRows() >= block_size * 2 && have_gpu) {
1999  // Even if the user did not request the inverse Cholesky, for large enough
2000  // matrices (on GPUs) it's going to be more efficient to compute it anyway
2001  // as the recursion depends on it.
2002  CuMatrix<Real> inv(this->NumRows(), this->NumCols());
2003  Cholesky(&inv);
2004  return;
2005  }
2006  if (this->NumRows() <= block_size || inv_cholesky == NULL || !have_gpu) {
2007  // Don't recurse: compute the Cholesky (and inverse Cholesky, if requested)
2008  // directly, on the CPU.
2009  int32 dim = this->NumRows();
2010  CuSpMatrix<Real> this_sp(dim, kUndefined);
2011  this_sp.CopyFromMat(*this, kTakeLower);
2012  SpMatrix<Real> this_sp_cpu(this_sp);
2013  TpMatrix<Real> C_cpu(dim);
2014  C_cpu.Cholesky(this_sp_cpu);
2015  CuTpMatrix<Real> C(C_cpu);
2016  this->CopyFromTp(C);
2017  if (inv_cholesky != NULL) {
2018  C_cpu.Invert(); // Get inverse Cholesky on CPU.
2019  C.CopyFromTp(C_cpu);
2020  inv_cholesky->CopyFromTp(C); // Copy inverse Cholesky from CPU.
2021  }
2022  return;
2023  }
2024  // At this point, if none of the other cases apply, we recurse.
2025 
2026  // The selection of dim1 is a heuristic. We could also just take half.
2027  int32 tot_dim = this->NumRows();
2028  int32 dim1;
2029  // Break it up into a whole number of blocks, for better memory alignment.
2030  // The line below, setting dim1 can be decided on a heuristic basis: from
2031  // the point of view of correctness, it can really be any value
2032  // 0 < dim1 < tot_dim.
2033  dim1 = block_size * std::max<int32>(1, tot_dim / (2 * block_size));
2034 
2035  int32 dim2 = tot_dim - dim1;
2036  CuSubMatrix<Real> this_11(*this, 0, dim1, 0, dim1),
2037  this_12(*this, 0, dim1, dim1, dim2),
2038  this_21(*this, dim1, dim2, 0, dim1),
2039  this_22(*this, dim1, dim2, dim1, dim2);
2040  CuSubMatrix<Real> inv_11(*inv_cholesky, 0, dim1, 0, dim1),
2041  inv_12(*inv_cholesky, 0, dim1, dim1, dim2),
2042  inv_21(*inv_cholesky, dim1, dim2, 0, dim1),
2043  inv_22(*inv_cholesky, dim1, dim2, dim1, dim2);
2044  /*
2045  Here is the math on block-wise Cholesky. We'll use a Matlab-like notation for blocks of a matrix,
2046  e.g. [ A B; C D ], and also for transposes, e.g. A' is the transpose of A.
2047  Let A be the input matrix; we want to compute both its Cholesky L and its inverse Cholesky, which
2048  we'll call M.
2049  OK. let L = [ L11 0; L21 L22 ] be the Cholesky factor of A.
2050  We have A = L L' = [ L11 0; L21 L22 ] * [ L11' L21'; 0 L22' ]. Multiplying it out,
2051  if A = [ A11 A12; A21 A22 ]; then
2052  A11 = L11 L11', A21 = L21 L11', A22 = L21 L21' + L22 L22', and A12 = A21'.
2053 
2054  We also want an expression for the inverse of L (we call this M).
2055  If M = [ M11 0; M21 M22 ], then it's not hard to see that
2056  M11 = inv(L11), M22 = inv(L22).
2057  We can work out M21 as follows. We know that [ L11 0; L21 L22 ] [ M11 0; M21 M22 ] = [ I 0; 0 I ].
2058  Considering the zero on the bottom of the rhs, we have: L21 M11 + L22 M21 = 0, which gives us:
2059  M21 = - L22^{-1} L21 M11 = - M22 L21 M11.
2060 
2061  Next, we want expressions for L21 and L22. From the equation A21 = L21 L11', we have:
2062  L21 = A21 inv(L11') = A21 M11'
2063  We can compute L22 and M22 recursively by doing Cholesky (and computing the inverse Cholesky)
2064  on the quantity T = (A22 - L21 L21'). [we give it the name T just for easy reference.]
2065 
2066  Computationally, we do this as follows:
2067  (1) Recurse to get L11 and M11.
2068  (2) Compute L21 = A21 M11'
2069  (3) Compute T = A22 - L21 L21'
2070  (4) Recurse on T to get L22 and M22.
2071  (5) Compute M21 = -M22 L21 M11.
2072  Next, we have to consider the in-place nature of the computation, since L overwrites A
2073  [M has its own storage, in "inv_cholesky"].
2074  We address this here:
2075  (1) is in-place [L11 replaces A11, M11 has its own storage].
2076  (2) L21 gets written where M21 belongs.
2077  (3) T replaces A22.
2078  (4) is in-place [L22 replaces T where A22 was, M22 has its own storage]
2079  (5):(a) we first compute the transpose of (L21 M11) in the upper part of A/L,
2080  where A12 or L12 would be. Define a temporary expression
2081  U = (L21 M11)' = M11' L21'; this goes where A12 or L12 would be.
2082  (b) copy L21 to where it should be, in *this.
2083  (c) Compute M21 = -M22 U', in the correct place for M21.
2084  (d) zero L12 and M12. */
2085 
2086  // (1) compute L11 and M11.
2087  this_11.Cholesky(&inv_11);
2088  // (2) compute L21 = A21 M11'. For now it's in the "wrong place", where M21 should be.
2089  inv_21.AddMatMat(1.0, this_21, kNoTrans, inv_11, kTrans, 0.0);
2090  // (3) compute T = A22 - L21 L21'. Note: only the lower triangle of T will be valid, but
2091  // that's OK because Cholesky will ignore the upper part.
2092  this_22.SymAddMat2(-1.0, inv_21, kNoTrans, 1.0);
2093  // (4) Recurse to compute L22 and M22.
2094  this_22.Cholesky(&inv_22);
2095  // (5)(a) compute U = M11' L21'. We use the storage of this_12 for this. Note that L21 is
2096  // currently where M21 should be.
2097  this_12.AddMatMat(1.0, inv_11, kTrans, inv_21, kTrans, 0.0);
2098  // (5)(b) copy L21 to where it should be.
2099  this_21.CopyFromMat(inv_21);
2100  // (5)(c) compute M21 = -M22 U'.
2101  inv_21.AddMatMat(-1.0, inv_22, kNoTrans, this_12, kTrans, 0.0);
2102  // (5)(d) zero L12 and M12.
2103  this_12.SetZero();
2104  inv_12.SetZero();
2105 
2106 }
friend class CuSpMatrix< Real >
Definition: cu-matrix.h:86
kaldi::int32 int32
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:90
void Cholesky(CuMatrixBase< Real > *inv_cholesky=NULL)
This function does sets *this to the Cholesky factor of *this (i.e.
Definition: cu-matrix.cc:1987
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
void CopyFromTp(const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:280
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ ColRange()

CuSubMatrix<Real> ColRange ( const MatrixIndexT  col_offset,
const MatrixIndexT  num_cols 
) const
inline

Definition at line 665 of file cu-matrix.h.

Referenced by ConvolutionComponent::Backprop(), StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxpoolingComponent::Backprop(), BlockAffineComponent::Backprop(), Convolutional1dComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), MultiTaskLoss::Eval(), ConvolutionComponent::Propagate(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), MaxpoolingComponent::Propagate(), BlockAffineComponent::Propagate(), Convolutional1dComponent::Propagate(), MaxPoolingComponent::PropagateFnc(), AveragePoolingComponent::PropagateFnc(), BlockSoftmax::PropagateFnc(), FramePoolingComponent::PropagateFnc(), ParallelComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), MultiBasisComponent::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::UnitTestLstmNonlinearity(), ConvolutionComponent::Update(), FramePoolingComponent::Update(), SentenceAveragingComponent::Update(), ConvolutionalComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), and Convolutional1dComponent::Update().

666  {
667  return CuSubMatrix<Real>(*this, 0, num_rows_, col_offset, num_cols);
668  }
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:90
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ CopyColFromVec()

void CopyColFromVec ( const CuVectorBase< Real > &  v,
const MatrixIndexT  col 
)

Copy vector into specific column of matrix.

Definition at line 2414 of file cu-matrix.cc.

Referenced by kaldi::cu::NormalizePerRow(), StatisticsExtractionComponent::Propagate(), DropoutMaskComponent::Propagate(), CuMatrixBase< float >::SizeInBytes(), NaturalGradientRepeatedAffineComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

2415  {
2416  KALDI_ASSERT(v.Dim() == num_rows_ &&
2417  static_cast<UnsignedMatrixIndexT>(col) <
2418  static_cast<UnsignedMatrixIndexT>(num_cols_));
2419 #if HAVE_CUDA == 1
2420  if (CuDevice::Instantiate().Enabled()) {
2421  CuTimer tim;
2422  cublas_copy(GetCublasHandle(),
2423  v.Dim(), v.Data(), 1,
2424  this->data_ + col, this->stride_);
2425  CU_SAFE_CALL(cudaGetLastError());
2426  CuDevice::Instantiate().AccuProfile(__func__, tim);
2427  } else
2428 #endif
2429  {
2430  Mat().CopyColFromVec(v.Vec(), col);
2431  }
2432 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
uint32 UnsignedMatrixIndexT
MatrixIndexT stride_
Definition: cu-matrix.h:787
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ CopyCols()

void CopyCols ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Copies column r from column indexes[r] of src.

As a special case, if indexes[i] == -1, sets column i to zero. indexes.Dim() must equal this->NumCols(), and src.NumRows() must equal this->NumRows().
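
A CPU sketch of this column gather, including the -1 special case, is given below. CopyColsSketch and Matrix are hypothetical names, not Kaldi API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): column c of dst is copied from column
// indexes[c] of src, or set to zero when indexes[c] == -1.
void CopyColsSketch(Matrix &dst, const Matrix &src,
                    const std::vector<int> &indexes) {
  assert(dst.size() == src.size());
  for (std::size_t r = 0; r < dst.size(); ++r) {
    assert(indexes.size() == dst[r].size());
    for (std::size_t c = 0; c < indexes.size(); ++c) {
      const int idx = indexes[c];
      if (idx < 0) {
        dst[r][c] = 0.0f;
      } else {
        assert(static_cast<std::size_t>(idx) < src[r].size());
        dst[r][c] = src[r][idx];
      }
    }
  }
}
```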

Definition at line 2656 of file cu-matrix.cc.

Referenced by SumGroupComponent::Backprop(), PermuteComponent::Backprop(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), PermuteComponent::Propagate(), Convolutional1dComponent::Propagate(), and Convolutional1dComponent::Update().

2657  {
2658 #if HAVE_CUDA == 1
2659  if (CuDevice::Instantiate().Enabled()) {
2660  KALDI_ASSERT(indices.Dim() == NumCols());
2661  KALDI_ASSERT(NumRows() == src.NumRows());
2662  CuTimer tim;
2663  dim3 dimGrid, dimBlock;
2664  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2665  &dimGrid, &dimBlock);
2666  cuda_copy_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(), Dim(), src.Stride());
2667  CU_SAFE_CALL(cudaGetLastError());
2668  CuDevice::Instantiate().AccuProfile(__func__, tim);
2669  } else
2670 #endif
2671  {
2672  Mat().CopyCols(src.Mat(), indices.Data());
2673  }
2674 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ CopyColsFromVec()

void CopyColsFromVec ( const CuVectorBase< Real > &  v)

Copies vector into matrix, column-by-column.

Note that v.Dim() must equal either NumRows()*NumCols() or NumRows(); the function has two modes of operation, one for each case.
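
The two modes can be sketched on the CPU as follows, based on a reading of the definition below: a full-length vector is consumed column by column, while a vector of length NumRows() is copied into every column. CopyColsFromVecSketch and Matrix are hypothetical names, not Kaldi API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): fills m from v in one of two modes,
// chosen by the length of v. All rows of m are assumed to have equal length.
void CopyColsFromVecSketch(Matrix &m, const std::vector<float> &v) {
  const std::size_t rows = m.size();
  const std::size_t cols = rows ? m[0].size() : 0;
  if (v.size() == rows * cols) {
    // Mode 1: v holds the columns of m one after another.
    for (std::size_t c = 0; c < cols; ++c)
      for (std::size_t r = 0; r < rows; ++r) m[r][c] = v[c * rows + r];
  } else if (v.size() == rows) {
    // Mode 2: every column of m becomes a copy of v.
    for (std::size_t r = 0; r < rows; ++r)
      for (std::size_t c = 0; c < cols; ++c) m[r][c] = v[r];
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```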

Definition at line 2376 of file cu-matrix.cc.

Referenced by DropoutComponent::Propagate(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuMatrixCopyColsFromVec().

2376  {
2377 #if HAVE_CUDA == 1
2378  if (CuDevice::Instantiate().Enabled()) {
2379  CuTimer tim;
2380  if (rv.Dim() == num_rows_ * num_cols_) {
2381  // treat rv as a matrix of the size (num_cols x num_rows_)
2382  // and use transposed copy to fill *this
2383  // see CuMatrixBase<Real>::CopyFromMat() for more detail of the impl
2384  MatrixDim rv_dim = { num_cols_, num_rows_, num_rows_ };
2385  const int32 warpSize = 32;
2386  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2387  dim3 dimGrid(n_blocks(rv_dim.cols, warpSize),
2388  n_blocks(rv_dim.rows, warpSize));
2389  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, rv.Data(), Dim(),
2390  rv_dim);
2391  CU_SAFE_CALL(cudaGetLastError());
2392  } else if (rv.Dim() == num_rows_) {
2393  // use 2D block (8x32) and large enough grid to cover matrix *this
2394  // dimBlock.x need to be at least warpSize for coalesced memory access.
2395  const int32 warpSize = 32;
2396  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
2397  dim3 dimGrid(n_blocks(num_cols_, dimBlock.x),
2398  n_blocks(num_rows_, dimBlock.y));
2399  cuda_copy_cols_from_vec(dimGrid, dimBlock, Data(), Dim(), rv.Data());
2400  CU_SAFE_CALL(cudaGetLastError());
2401  } else {
2402  KALDI_ERR<< "Wrong sized arguments";
2403  }
2404  CuDevice::Instantiate().AccuProfile(__func__, tim);
2405  } else
2406 #endif
2407  {
2408  Mat().CopyColsFromVec(rv.Vec());
2409  }
2410 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
int32_cuda rows
Definition: cu-matrixdim.h:47
Structure containing size of the matrix plus stride.
Definition: cu-matrixdim.h:46
kaldi::int32 int32
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define KALDI_ERR
Definition: kaldi-error.h:147
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
int32_cuda cols
Definition: cu-matrixdim.h:48
const Real * Data() const
Return data pointer (const).
Definition: cu-matrix.h:746
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ CopyFromBlock()

void CopyFromBlock ( const CuBlockMatrix< Real > &  B,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 161 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::AddMatTp().

162  {
163  this->SetZero();
164  if (trans == kNoTrans) {
165  KALDI_ASSERT(NumRows() == B.NumRows() && NumCols() == B.NumCols());
166  int32 row_offset = 0, col_offset = 0;
167  for (int32 b = 0; b < B.NumBlocks(); b++) {
168  const CuMatrixBase<Real> &block = B.Block(b);
169  int32 num_rows = block.NumRows(), num_cols = block.NumCols();
170  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
171  col_offset, num_cols);
172  this_block.CopyFromMat(block);
173  row_offset += num_rows;
174  col_offset += num_cols;
175  }
176  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
177  } else {
178  KALDI_ASSERT(NumRows() == B.NumCols() && NumCols() == B.NumRows());
179  int32 row_offset = 0, col_offset = 0;
180  for (int32 b = 0; b < B.NumBlocks(); b++) {
181  const CuMatrixBase<Real> &block = B.Block(b);
182  int32 num_rows = block.NumCols(), num_cols = block.NumRows();
183  CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
184  col_offset, num_cols);
185  this_block.CopyFromMat(block, kTrans);
186  row_offset += num_rows;
187  col_offset += num_cols;
188  }
189  KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
190  }
191 }
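
The kNoTrans branch of the listing above lays the blocks out along the diagonal of a zeroed matrix. A CPU sketch of that layout follows; the transposed case is omitted for brevity. CopyFromBlockSketch and Matrix are hypothetical names, not Kaldi API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative stand-in

// Hypothetical helper (not Kaldi API): zeroes dst, then copies each block
// into place so that the blocks form a block-diagonal matrix.
void CopyFromBlockSketch(Matrix &dst, const std::vector<Matrix> &blocks) {
  for (auto &row : dst) std::fill(row.begin(), row.end(), 0.0f);
  std::size_t row_offset = 0, col_offset = 0;
  for (const Matrix &block : blocks) {
    const std::size_t num_rows = block.size();
    const std::size_t num_cols = num_rows ? block[0].size() : 0;
    assert(row_offset + num_rows <= dst.size());
    for (std::size_t r = 0; r < num_rows; ++r) {
      assert(block[r].size() == num_cols);
      assert(col_offset + num_cols <= dst[row_offset + r].size());
      for (std::size_t c = 0; c < num_cols; ++c)
        dst[row_offset + r][col_offset + c] = block[r][c];
    }
    // Each block starts below and to the right of the previous one.
    row_offset += num_rows;
    col_offset += num_cols;
  }
  assert(row_offset == dst.size());
  assert(dst.empty() || col_offset == dst[0].size());
}
```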
kaldi::int32 int32
friend class CuSubMatrix< Real >
Definition: cu-matrix.h:90
void SetZero()
Math operations, some calling kernels.
Definition: cu-matrix.cc:509
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ CopyFromGeneralMat()

void CopyFromGeneralMat ( const GeneralMatrix src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 3096 of file cu-matrix.cc.

Referenced by NnetComputer::AcceptInputs(), kaldi::nnet3::ComputeObjectiveFunction(), and CuMatrixBase< float >::SizeInBytes().

3097  {
3098  switch (src.Type()) {
3099  case kFullMatrix: {
3100  const Matrix<BaseFloat> &src_full_mat = src.GetFullMatrix();
3101  this->CopyFromMat(src_full_mat, trans);
3102  return;
3103  }
3104  case kCompressedMatrix: {
3105  Matrix<BaseFloat> mat;
3106  src.GetMatrix(&mat);
3107  this->CopyFromMat(mat, trans);
3108  return;
3109  }
3110  case kSparseMatrix: {
3111  const SparseMatrix<BaseFloat> &smat = src.GetSparseMatrix();
3112 #if HAVE_CUDA == 1
3113  if (CuDevice::Instantiate().Enabled()) {
3114  // only take this branch if we're actually using CUDA, or it would
3115  // entail a wasteful copy of the sparse matrix.
3116  CuSparseMatrix<BaseFloat> cu_smat(smat);
3117  cu_smat.CopyToMat(this, trans);
3118  return;
3119  }
3120 #endif
3121  smat.CopyToMat(&(Mat()), trans);
3122  return;
3123  }
3124  default:
3125  KALDI_ERR << "Invalid GeneralMatrix type.";
3126  }
3127 }

◆ CopyFromMat() [1/3]

template<typename OtherReal> void CopyFromMat ( const MatrixBase< OtherReal > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 344 of file cu-matrix.cc.

Referenced by kaldi::nnet3::attention::AttentionForward(), ElementwiseProductComponent::Backprop(), BatchNormComponent::Backprop(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), ScaleComponent::Backprop(), GeneralDropoutComponent::Backprop(), SpecAugmentTimeMaskComponent::Backprop(), FixedScaleComponent::Backprop(), FixedBiasComponent::Backprop(), NoOpComponent::Backprop(), ClipGradientComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), Softmax::BackpropagateFnc(), HiddenSoftmax::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), AddShift::BackpropagateFnc(), Rescale::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), BlockAffineComponent::BlockAffineComponent(), NnetOnlineComputer::Compute(), LstmNonlinearityComponent::ConsolidateMemory(), kaldi::nnet3::ConstrainOrthonormal(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), CuMatrixBase< float >::CopyFromBlock(), CuBlockMatrix< Real >::CopyFromMat(), GeneralMatrix::CopyToMat(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), CuMatrixBase< float >::DiffSoftmaxPerRow(), NnetComputer::ExecuteCommand(), NnetRescaler::FormatInput(), NnetBatchComputer::FormatInputs(), kaldi::nnet3::attention::GetAttentionDotProducts(), GeneralDropoutComponent::GetMemo(), main(), kaldi::nnet2::NnetComputation(), kaldi::nnet2::NnetComputationChunked(), kaldi::cu::NormalizePerRow(), CuMatrix< float >::operator=(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditionerSimple::PreconditionDirections(), OnlineNaturalGradientSimple::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), DropoutComponent::Propagate(), 
ElementwiseProductComponent::Propagate(), BatchNormComponent::Propagate(), BackpropTruncationComponent::Propagate(), PowerComponent::Propagate(), RectifiedLinearComponent::Propagate(), ScaleComponent::Propagate(), GeneralDropoutComponent::Propagate(), SpecAugmentTimeMaskComponent::Propagate(), SpliceMaxComponent::Propagate(), NoOpComponent::Propagate(), ClipGradientComponent::Propagate(), FixedScaleComponent::Propagate(), PerElementScaleComponent::Propagate(), FixedBiasComponent::Propagate(), PerElementOffsetComponent::Propagate(), AdditiveNoiseComponent::Propagate(), KlHmm::PropagateFnc(), ParallelComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), AddShift::PropagateFnc(), Rescale::PropagateFnc(), BlstmProjected::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), kaldi::nnet1::RandGauss(), CuRand< float >::RandGaussian(), CuRand< float >::RandUniform(), kaldi::nnet1::RandUniform(), OnlineNaturalGradient::ReorthogonalizeRt1(), OnlinePreconditioner::ReorthogonalizeXt1(), CuMatrixBase< float >::SizeInBytes(), NnetBatchComputer::SplitUtteranceIntoTasks(), kaldi::TestCuFindRowMaxId(), kaldi::TestCuMatrixTransposeCross(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::TestSymInvertPosDef(), NoOpTransform::TrainingForward(), AppendTransform::TrainingForward(), SimpleMeanTransform::TrainingForward(), kaldi::UnitInvert(), kaldi::UnitTestCheck(), kaldi::UnitTestCholesky(), kaldi::UnitTestConstructor(), kaldi::UnitTestCopyFromMat(), kaldi::UnitTestCopySp(), kaldi::UnitTestCuCopy(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffNormalizePerRow(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuFindRowMaxId(), kaldi::UnitTestCuLogSoftmax(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMathNormalizePerRow_v2(), kaldi::UnitTestCuMatrixAddMat(), 
kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddMatMatBatched(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyFromMat(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuSoftmax(), kaldi::UnitTestCuTanh(), kaldi::UnitTestCuVectorAddColSumMat(), kaldi::UnitTestCuVectorAddColSumMatLarge(), kaldi::UnitTestCuVectorAddRowSumMat(), kaldi::UnitTestCuVectorAddRowSumMatLarge(), kaldi::UnitTestInvert(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), and BlockAffineComponentPreconditioned::Update().

345  {
346  CuMatrix<OtherReal> temp(src);
347  this->CopyFromMat(temp, trans);
348 }

◆ CopyFromMat() [2/3]

void CopyFromMat ( const MatrixBase< Real > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 314 of file cu-matrix.cc.

315  {
316 #if HAVE_CUDA == 1
317  if (CuDevice::Instantiate().Enabled()) {
318  if (trans == kNoTrans) {
319  KALDI_ASSERT(src.NumRows() == num_rows_ && src.NumCols() == num_cols_);
320  CuTimer tim;
321 
322  MatrixIndexT dst_pitch = stride_*sizeof(Real);
323  MatrixIndexT src_pitch = src.Stride()*sizeof(Real);
324  MatrixIndexT width = src.NumCols()*sizeof(Real);
325  CU_SAFE_CALL(cudaMemcpy2DAsync(data_, dst_pitch, src.Data(), src_pitch,
326  width, src.NumRows(), cudaMemcpyHostToDevice,
327  cudaStreamPerThread));
328  CU_SAFE_CALL(cudaStreamSynchronize(cudaStreamPerThread));
329 
330  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from CPU)", tim);
331  } else {
332  CuMatrix<Real> trans_mat(src); // Do the transpose on the GPU board.
333  this->CopyFromMat(trans_mat, kTrans);
334  }
335  } else
336 #endif
337  {
338  Mat().CopyFromMat(src, trans);
339  }
340 }
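
The non-transposed GPU branch above passes two pitches (row sizes in bytes, derived from each matrix's stride) and a width in bytes to `cudaMemcpy2DAsync`. A host-only sketch of the same addressing may make the roles of those three values clearer. `PitchedCopySketch` is a hypothetical name and plain `memcpy` stands in for the CUDA call; this is an illustration, not Kaldi code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only analogue of a pitched 2-D copy: copies `height` rows of
// `width_bytes` bytes each, advancing each buffer by its own pitch
// (the distance in bytes between the starts of consecutive rows).
void PitchedCopySketch(void *dst, std::size_t dst_pitch,
                       const void *src, std::size_t src_pitch,
                       std::size_t width_bytes, std::size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  auto *d = static_cast<unsigned char*>(dst);
  const auto *s = static_cast<const unsigned char*>(src);
  for (std::size_t r = 0; r < height; ++r)
    std::memcpy(d + r * dst_pitch, s + r * src_pitch, width_bytes);
}
```

Source and destination may use different strides, so padding elements at the end of each row are never read or written.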

◆ CopyFromMat() [3/3]

template<typename OtherReal> void CopyFromMat ( const CuMatrixBase< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 208 of file cu-matrix.cc.

209  {
210  if (sizeof(Real) == sizeof(OtherReal) &&
211  static_cast<const void*>(M.Data()) ==
212  static_cast<const void*>(this->Data())) {
213  if (M.Data() == NULL)
214  return;
215  // CopyFromMat called on same data. Nothing to do (except sanity checks)
216  KALDI_ASSERT(trans == kNoTrans && M.NumRows() == NumRows() &&
217  M.NumCols() == NumCols() && M.Stride() == Stride());
218  return;
219  }
220 #if HAVE_CUDA == 1
221  if (CuDevice::Instantiate().Enabled()) {
222  if (trans == kNoTrans) {
223  KALDI_ASSERT(M.NumRows() == num_rows_ && M.NumCols() == num_cols_);
224  } else {
225  KALDI_ASSERT(M.NumCols() == num_rows_ && M.NumRows() == num_cols_);
226  }
227  if (M.num_rows_ == 0) return; // Nothing to do.
228  CuTimer tim;
229  if (sizeof(Real) == sizeof(OtherReal) && trans == kNoTrans ) {
230  MatrixIndexT dst_pitch = stride_ * sizeof(Real);
231  MatrixIndexT src_pitch = M.Stride() * sizeof(Real);
232  MatrixIndexT width = M.NumCols() * sizeof(Real);
233  CU_SAFE_CALL(
234  cudaMemcpy2DAsync(data_, dst_pitch, M.data_, src_pitch,
235  width, M.num_rows_, cudaMemcpyDeviceToDevice,
236  cudaStreamPerThread));
237  } else {
238  if (trans == kNoTrans) {
239  dim3 dimGrid, dimBlock;
240  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
241  &dimGrid, &dimBlock);
242  cuda_copy_from_mat(dimGrid, dimBlock, data_, M.data_, Dim(), M.Dim());
243  } else {
244  // 2D thread block with warps (blockDim.x) along the row-dim of input M.
245  // Each (8x32) thread block will transpose (32x32) data
246  const int32 warpSize = 32;
247  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
248  dim3 dimGrid(n_blocks(M.NumCols(), warpSize),
249  n_blocks(M.NumRows(), warpSize));
250  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, M.data_, Dim(),
251  M.Dim());
252  }
253  CU_SAFE_CALL(cudaGetLastError());
254  }
255  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from other CuMatrixBase)", tim);
256  } else
257 #endif
258  {
259  Mat().CopyFromMat(M.Mat(), trans);
260  }
261 }
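
The transposed branch above sizes its launch grid so that each thread block covers one 32x32 tile of the source `M`. The arithmetic can be sketched on the host as follows. `CeilDivSketch`, `Dim2`, and `TransposeGridSketch` are hypothetical stand-ins for `n_blocks()`, `dim3`, and the inline grid setup; it is an assumption that `n_blocks()` is a round-up division.

```cpp
#include <cassert>

// Hypothetical stand-in for the n_blocks() helper used in the listing:
// number of blocks of block_size needed to cover size elements.
int CeilDivSketch(int size, int block_size) {
  assert(size >= 0 && block_size > 0);
  return (size + block_size - 1) / block_size;
}

struct Dim2 { int x, y; };  // minimal stand-in for CUDA's dim3

// Grid for the transposed copy: x counts tiles along the columns of the
// source, y counts tiles along its rows.
Dim2 TransposeGridSketch(int src_rows, int src_cols, int tile = 32) {
  return Dim2{CeilDivSketch(src_cols, tile), CeilDivSketch(src_rows, tile)};
}
```

The "(8x32) thread block" comment in the listing implies that `CU1DBLOCK / warpSize` is 8, i.e. that `CU1DBLOCK` is 256; this page does not show that value.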

◆ CopyFromSp()

void CopyFromSp ( const CuSpMatrix< Real > &  M)

Definition at line 360 of file cu-matrix.cc.

Referenced by CuMatrix< float >::CuMatrix(), CuSpMatrix< Real >::Invert(), CuMatrixBase< float >::SizeInBytes(), and kaldi::TestCuMatrixCopyFromSp().

360  {
361  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
362  if (num_rows_ == 0)
363  return;
364 #if HAVE_CUDA == 1
365  if (CuDevice::Instantiate().Enabled()) {
366  CuTimer tim;
367  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
368  dim3 dimGrid(n_blocks(NumRows(), CU2DBLOCK),
369  n_blocks(NumRows(), CU2DBLOCK));
370  cuda_copy_from_sp(dimGrid, dimBlock, M.Data(), data_, Dim());
371  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyFromSp", tim);
372  } else
373 #endif
374  {
375  Mat().CopyFromSp(M.Mat());
376  }
377 }
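
`CopyFromSp()` expands a packed symmetric matrix into a full square matrix. The sketch below shows that expansion on the CPU. It assumes the packed storage holds the lower triangle row by row, so element (r, c) with c <= r sits at index r*(r+1)/2 + c. `UnpackSymmetricSketch` is a hypothetical name, not a Kaldi function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// packed holds the lower triangle row by row (assumed layout):
// element (r, c), c <= r, is at index r*(r+1)/2 + c.
// Returns the full dim x dim matrix in row-major order.
std::vector<float> UnpackSymmetricSketch(const std::vector<float> &packed, int dim) {
  assert(packed.size() == static_cast<std::size_t>(dim) * (dim + 1) / 2);
  std::vector<float> full(static_cast<std::size_t>(dim) * dim, 0.0f);
  for (int r = 0; r < dim; ++r) {
    for (int c = 0; c <= r; ++c) {
      const float v = packed[static_cast<std::size_t>(r) * (r + 1) / 2 + c];
      full[static_cast<std::size_t>(r) * dim + c] = v;
      full[static_cast<std::size_t>(c) * dim + r] = v;  // symmetric counterpart
    }
  }
  return full;
}
```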

◆ CopyFromTp()

template<typename OtherReal> void CopyFromTp ( const CuTpMatrix< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 280 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::Cholesky(), CuMatrix< float >::CuMatrix(), CuTpMatrix< Real >::Invert(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixCopyFromTp(), and kaldi::UnitTestCuMatrixCopyFromTp().

281  {
282  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
283  if (num_rows_ == 0)
284  return;
285 #if HAVE_CUDA == 1
286  if (CuDevice::Instantiate().Enabled()) {
287  CuTimer tim;
288  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
289  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
290  n_blocks(num_rows_, CU2DBLOCK));
291  if (trans == kNoTrans) {
292  cuda_copy_from_tp(dimGrid, dimBlock, data_, M.Data(), Dim());
293  } else {
294  cuda_copy_from_tp_trans(dimGrid, dimBlock, data_, M.Data(), Dim());
295  }
296  CuDevice::Instantiate().AccuProfile(__func__, tim);
297  } else
298 #endif
299  {
300  Mat().CopyFromTp(M.Mat(), trans);
301  }
302 }

◆ CopyLowerToUpper()

void CopyLowerToUpper ( )

Definition at line 2969 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::AddMatTp(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuMatrixCopyLowerToUpper().

2969  {
2971  if (num_rows_ == 0) return;
2972 #if HAVE_CUDA == 1
2973  if (CuDevice::Instantiate().Enabled()) {
2974  CuTimer tim;
2975  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2976  int32 dim = num_rows_;
2977  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2978  n_blocks(dim, CU2DBLOCK));
2979  cuda_copy_low_upp(dimGrid, dimBlock, data_, Dim());
2980  CU_SAFE_CALL(cudaGetLastError());
2981  CuDevice::Instantiate().AccuProfile(__func__, tim);
2982  } else
2983 #endif
2984  {
2985  Mat().CopyLowerToUpper();
2986  }
2987 }
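
The effect of `CopyLowerToUpper()` on a square matrix can be sketched on the CPU as below. `CopyLowerToUpperSketch` and the flat row-major `std::vector<float>` are assumptions for illustration, not the Kaldi kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the strict lower triangle of a square row-major matrix into the
// upper triangle, leaving the diagonal and lower triangle unchanged.
void CopyLowerToUpperSketch(std::vector<float> *m, int dim) {
  assert(m->size() == static_cast<std::size_t>(dim) * dim);
  for (int r = 0; r < dim; ++r)
    for (int c = 0; c < r; ++c)
      (*m)[static_cast<std::size_t>(c) * dim + r] =
          (*m)[static_cast<std::size_t>(r) * dim + c];
}
```

After the call the matrix is symmetric, which is useful when only one triangle has been computed.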

◆ CopyRangeFromMatClamped()

void CopyRangeFromMatClamped ( const CuMatrixBase< Real > &  src,
int32_t  start_range,
int32_t  end_range,
int32_t  clamp_low,
int32_t  clamp_high 
)

Definition at line 419 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::SizeInBytes().

421  {
422 
423  KALDI_ASSERT(NumCols() == this->NumCols());
424  KALDI_ASSERT(NumRows() == end_range-start_range);
425 
426 #if HAVE_CUDA == 1
427  if (CuDevice::Instantiate().Enabled()) {
428  cuda_mat_copy_range_clamped(start_range, end_range, NumCols(),
429  src.Data(), src.Stride(), clamp_low, clamp_high,
430  Data(), Stride());
431  } else
432 #endif
433  {
434  for (int32 t = start_range; t < end_range; t++) {
435  int32 t_clamped = t;
436  if (t_clamped < clamp_low) t_clamped = clamp_low;
437  if (t_clamped >= clamp_high) t_clamped = clamp_high;
438  CuSubVector<Real> dest_row=this->Row(t - start_range);
439  const CuSubVector<Real> src_row=src.Row(t_clamped);
440  dest_row.CopyFromVec(src_row);
441  }
442  }
443 }
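
The CPU branch above copies row `t - start_range` of the destination from row `t` of `src`, after clamping `t` into `[clamp_low, clamp_high]` (both ends inclusive, as the listing assigns `clamp_high` itself). A self-contained sketch of that rule follows. `CopyRangeClampedSketch` and the flat row-major vectors are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// dst row (t - start_range) <- src row clamp(t, clamp_low, clamp_high),
// for t in [start_range, end_range). Both buffers are row-major with
// num_cols columns and no padding.
void CopyRangeClampedSketch(const std::vector<float> &src, int num_cols,
                            int start_range, int end_range,
                            int clamp_low, int clamp_high,
                            std::vector<float> *dst) {
  assert(0 <= clamp_low && clamp_low <= clamp_high);
  assert(static_cast<std::size_t>(clamp_high + 1) * num_cols <= src.size());
  assert(dst->size() ==
         static_cast<std::size_t>(end_range - start_range) * num_cols);
  for (int t = start_range; t < end_range; ++t) {
    const int t_clamped = std::min(std::max(t, clamp_low), clamp_high);
    std::copy_n(src.begin() + static_cast<std::ptrdiff_t>(t_clamped) * num_cols,
                num_cols,
                dst->begin() + static_cast<std::ptrdiff_t>(t - start_range) * num_cols);
  }
}
```

With `start_range` below `clamp_low` or `end_range` above `clamp_high`, the first or last permitted source row is repeated, for example when a window extends past either end of a sequence.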

◆ CopyRows() [1/2]

void CopyRows ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Copies row r from row indexes[r] of src.

As a special case, if indexes[r] < 0, sets row r to zero. src.NumCols() must equal this.NumCols().
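
The indexing rule, including the negative-index case, can be illustrated with a CPU-only sketch. `DenseMat` and `CopyRowsSketch` are hypothetical names introduced for this example and are not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix used only for this sketch (not a Kaldi type).
struct DenseMat {
  int rows, cols;
  std::vector<float> data;  // rows * cols elements, row-major
  DenseMat(int r, int c)
      : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0.0f) {}
  float &at(int r, int c) { return data[static_cast<std::size_t>(r) * cols + c]; }
  float at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
};

// dst row r <- src row indexes[r]; a negative index zeroes dst row r.
void CopyRowsSketch(const DenseMat &src, const std::vector<int> &indexes,
                    DenseMat *dst) {
  assert(static_cast<int>(indexes.size()) == dst->rows);
  assert(src.cols == dst->cols);
  for (int r = 0; r < dst->rows; ++r) {
    const int s = indexes[r];
    assert(s < src.rows);
    for (int c = 0; c < dst->cols; ++c)
      dst->at(r, c) = (s < 0) ? 0.0f : src.at(s, c);
  }
}
```

The same source row may appear several times in `indexes`, so the call can duplicate, reorder, or drop rows in one pass.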

Definition at line 2678 of file cu-matrix.cc.

Referenced by StatisticsExtractionComponent::Backprop(), SpliceComponent::Backprop(), NnetComputer::ExecuteCommand(), main(), DistributeComponent::Propagate(), and SpliceMaxComponent::Propagate().

2679  {
2680 #if HAVE_CUDA == 1
2681  if (CuDevice::Instantiate().Enabled()) {
2682  KALDI_ASSERT(static_cast<MatrixIndexT>(indices.Dim()) == NumRows());
2683  KALDI_ASSERT(NumCols() == src.NumCols());
2684 
2685  CuTimer tim;
2686  dim3 dimGrid, dimBlock;
2687  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2688  &dimGrid, &dimBlock);
2689  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
2690  Dim(), src.Stride());
2691  CU_SAFE_CALL(cudaGetLastError());
2692  CuDevice::Instantiate().AccuProfile(__func__, tim);
2693  } else
2694 #endif
2695  {
2696  Mat().CopyRows(src.Mat(), indices.Data());
2697  }
2698 }

◆ CopyRows() [2/2]

void CopyRows ( const CuArrayBase< const Real *> &  src)

Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

src.size() must equal this.NumRows(), and if any src[r] is NULL then this.Row(r) will be set to zero.
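
A CPU-only sketch of the pointer-array variant follows, with host pointers standing in for whatever memory space the real call uses. `CopyRowsFromPointersSketch` is a hypothetical name, and the explicit `stride` argument is an assumption that mirrors how a strided destination matrix would be addressed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dst is row-major with num_rows rows, num_cols used columns and a row
// stride of `stride` elements. src[r] points at num_cols floats, or is
// nullptr to request a zero row.
void CopyRowsFromPointersSketch(const std::vector<const float*> &src,
                                float *dst, int num_rows, int num_cols,
                                int stride) {
  assert(static_cast<int>(src.size()) == num_rows);
  assert(stride >= num_cols);
  for (int r = 0; r < num_rows; ++r) {
    float *dst_row = dst + static_cast<std::size_t>(r) * stride;
    for (int c = 0; c < num_cols; ++c)
      dst_row[c] = (src[r] != nullptr) ? src[r][c] : 0.0f;
  }
}
```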

Definition at line 2723 of file cu-matrix.cc.

2723  {
2724  if (NumRows() == 0) return;
2725 #if HAVE_CUDA == 1
2726  if (CuDevice::Instantiate().Enabled()) {
2727  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2728  CuTimer tim;
2729  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2730  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2731  n_blocks(num_rows_, CU2DBLOCK));
2732  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), Dim());
2733  CU_SAFE_CALL(cudaGetLastError());
2734  CuDevice::Instantiate().AccuProfile(__func__, tim);
2735  } else
2736 #endif
2737  {
2738  Mat().CopyRows(src.Data());
2739  }
2740 }

◆ CopyRowsFromVec() [1/2]

void CopyRowsFromVec ( const CuVectorBase< Real > &  v)

This function has two modes of operation.

If v.Dim() == NumRows() * NumCols(), treats the vector as a row-by-row concatenation of a matrix and copies it to *this. If v.Dim() == NumCols(), sets each row of *this to a copy of v.
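
The two modes can be sketched on the CPU as below. `CopyRowsFromVecSketch` is a hypothetical name, the matrix is assumed to be an unpadded row-major buffer, and `std::invalid_argument` stands in for the `KALDI_ERR` raised on a size mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// mat is a row-major num_rows x num_cols buffer with no padding.
void CopyRowsFromVecSketch(const std::vector<float> &v,
                           std::vector<float> *mat, int num_rows, int num_cols) {
  assert(mat->size() == static_cast<std::size_t>(num_rows) * num_cols);
  if (v.size() == mat->size()) {
    // Mode 1: v is the row-by-row concatenation of the whole matrix.
    *mat = v;
  } else if (static_cast<int>(v.size()) == num_cols) {
    // Mode 2: every row becomes a copy of v.
    for (int r = 0; r < num_rows; ++r)
      for (int c = 0; c < num_cols; ++c)
        (*mat)[static_cast<std::size_t>(r) * num_cols + c] = v[c];
  } else {
    throw std::invalid_argument("Wrong sized arguments");
  }
}
```

When the matrix has a single row the two conditions coincide and both modes give the same result.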

Definition at line 2301 of file cu-matrix.cc.

Referenced by kaldi::CuVectorUnitTestCopyFromMat(), NnetOnlineComputer::Flush(), NnetRescaler::FormatInput(), TimeHeightConvolutionComponent::Propagate(), TdnnComponent::Propagate(), RepeatedAffineComponent::Propagate(), ConstantComponent::Propagate(), AffineComponent::Propagate(), FixedAffineComponent::Propagate(), BlockAffineComponent::Propagate(), ConstantFunctionComponent::Propagate(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuMatrixCopyRowsFromVec().

2301  {
2302 #if HAVE_CUDA == 1
2303  if (CuDevice::Instantiate().Enabled()) {
2304  CuTimer tim;
2305  if (v.Dim() == num_rows_*num_cols_) {
2306  if (stride_ == num_cols_) {
2307  const Real* v_data = v.Data();
2308  CU_SAFE_CALL(
2309  cudaMemcpyAsync(data_, v_data, sizeof(Real)*num_rows_*num_cols_,
2310  cudaMemcpyDeviceToDevice, cudaStreamPerThread));
2311  } else {
2312  CU_SAFE_CALL(
2313  cudaMemcpy2DAsync(data_, stride_ * sizeof(Real), v.Data(),
2314  num_cols_*sizeof(Real), num_cols_*sizeof(Real),
2315  num_rows_, cudaMemcpyDeviceToDevice,
2316  cudaStreamPerThread));
2317  }
2318  } else if (v.Dim() == num_cols_) {
2319  dim3 dimGrid, dimBlock;
2320  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2321  &dimGrid, &dimBlock);
2322  cuda_copy_rows_from_vec(dimGrid, dimBlock, data_, this->Dim(), v.Data());
2323  CU_SAFE_CALL(cudaGetLastError());
2324  } else {
2325  KALDI_ERR << "Wrong sized arguments";
2326  }
2327  CuDevice::Instantiate().AccuProfile(__func__, tim);
2328  } else
2329 #endif
2330  {
2331  Mat().CopyRowsFromVec(v.Vec());
2332  }
2333 }

◆ CopyRowsFromVec() [2/2]

void CopyRowsFromVec ( const VectorBase< Real > &  v)

Version of CopyRowsFromVec() that takes a CPU-based vector.

Definition at line 2336 of file cu-matrix.cc.

2336  {
2337 #if HAVE_CUDA == 1
2338  if (CuDevice::Instantiate().Enabled()) {
2339  CuTimer tim;
2340  if (v.Dim() == num_rows_*num_cols_) {
2341  if (stride_ == num_cols_) {
2342  const Real* v_data = v.Data();
2343  CU_SAFE_CALL(cudaMemcpyAsync(data_, v_data,
2344  sizeof(Real)*num_rows_*num_cols_,
2345  cudaMemcpyHostToDevice,
2346  cudaStreamPerThread));
2347  } else {
2348  const Real *v_data = v.Data();
2349  for (MatrixIndexT r = 0; r < num_rows_; r++) {
2350  Real *row_data = RowData(r);
2351  CU_SAFE_CALL(cudaMemcpyAsync(row_data, v_data, sizeof(Real)*num_cols_,
2352  cudaMemcpyHostToDevice,
2353  cudaStreamPerThread));
2354  v_data += num_cols_;
2355  }
2356  }
2357  CU_SAFE_CALL(cudaStreamSynchronize(cudaStreamPerThread));
2358  } else if (v.Dim() == num_cols_) {
2359  dim3 dimGrid, dimBlock;
2360  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2361  &dimGrid, &dimBlock);
2362  cuda_copy_rows_from_vec(dimGrid, dimBlock, this->data_, this->Dim(), v.Data());
2363  CU_SAFE_CALL(cudaGetLastError());
2364  } else {
2365  KALDI_ERR << "Wrong sized arguments";
2366  }
2367  CuDevice::Instantiate().AccuProfile(__func__, tim);
2368  } else
2369 #endif
2370  {
2371  Mat().CopyRowsFromVec(v);
2372  }
2373 }

◆ CopyToMat()

template<typename OtherReal> void CopyToMat ( MatrixBase< OtherReal > *  dst,
MatrixTransposeType  trans = kNoTrans 
) const

Definition at line 447 of file cu-matrix.cc.

Referenced by NnetComputerFromEg::Compute(), CuMatrixBase< float >::CopyToMat(), kaldi::nnet1::MomentStatistics(), kaldi::operator<<(), KlHmm::PropagateFnc(), CuMatrixBase< float >::SizeInBytes(), kaldi::UnitInvert(), kaldi::UnitTestCholesky(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixAddVecVec(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMin(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuTanh(), kaldi::UnitTestInvert(), kaldi::UnitTestMatrix(), UnitTestMatrixRandomizer(), kaldi::UnitTestSetZeroAboveDiag(), kaldi::UnitTestSwapCu2Cu(), and kaldi::UnitTestSwapCu2M().

448  {
449 #if HAVE_CUDA == 1
450  if (CuDevice::Instantiate().Enabled()) {
451  if (trans == kTrans || sizeof(OtherReal) != sizeof(Real)) {
452  CuMatrix<OtherReal> this_trans(*this, trans);
453  this_trans.CopyToMat(dst, kNoTrans);
454  } else {
455  KALDI_ASSERT(dst->NumRows() == NumRows() && dst->NumCols() == NumCols());
456  if (num_rows_ == 0) return;
457  CuTimer tim;
458 
459  MatrixIndexT src_pitch = stride_*sizeof(Real);
460  MatrixIndexT dst_pitch = dst->Stride()*sizeof(Real);
461  MatrixIndexT width = NumCols()*sizeof(Real);
462  CU_SAFE_CALL(cudaMemcpy2DAsync(dst->Data(), dst_pitch, this->data_,
463  src_pitch, width, this->num_rows_,
464  cudaMemcpyDeviceToHost, cudaStreamPerThread));
465  CU_SAFE_CALL(cudaStreamSynchronize(cudaStreamPerThread));
466  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyToMatD2H", tim);
467  }
468  } else
469  #endif
470  {
471  dst->CopyFromMat(Mat(), trans);
472  }
473 }

◆ CopyToRows()

void CopyToRows ( const CuArrayBase< Real *> &  dst) const

For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise).

If dst[r] is NULL, does not copy anywhere. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).
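
The scatter behaviour, including the NULL case, can be sketched with host pointers. `CopyToRowsSketch` is a hypothetical name and the explicit `stride` argument is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// src is row-major with a row stride of `stride` elements. dst[r] receives
// row r (num_cols floats) unless it is nullptr. Destination regions must
// not overlap.
void CopyToRowsSketch(const float *src, int num_rows, int num_cols, int stride,
                      const std::vector<float*> &dst) {
  assert(static_cast<int>(dst.size()) == num_rows);
  assert(stride >= num_cols);
  for (int r = 0; r < num_rows; ++r) {
    if (dst[r] == nullptr) continue;  // nullptr means "do not copy this row"
    const float *src_row = src + static_cast<std::size_t>(r) * stride;
    for (int c = 0; c < num_cols; ++c)
      dst[r][c] = src_row[c];
  }
}
```

The no-overlap requirement matters on the GPU, where rows are written concurrently and two rows aimed at the same destination would race.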

Definition at line 2744 of file cu-matrix.cc.

Referenced by DistributeComponent::Backprop(), NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixCopyToRows().

2744  {
2745  if (NumRows() == 0) return;
2746 #if HAVE_CUDA == 1
2747  if (CuDevice::Instantiate().Enabled()) {
2748  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2749 
2750  CuTimer tim;
2751  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2752  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2753  n_blocks(num_rows_, CU2DBLOCK));
2754  cuda_copy_to_rows(dimGrid, dimBlock, dst.Data(), data_, Dim());
2755  CU_SAFE_CALL(cudaGetLastError());
2756  CuDevice::Instantiate().AccuProfile(__func__, tim);
2757  } else
2758 #endif
2759  {
2760  Mat().CopyToRows(dst.Data());
2761  }
2762 }

◆ CopyUpperToLower()

void CopyUpperToLower ( )

Definition at line 2990 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::AddMatTp(), kaldi::TestCuMatrixCopyUpperToLower(), and kaldi::UnitTestCuMatrixCopyUpperToLower().

2990  {
2992  if (num_rows_ == 0) return;
2993 #if HAVE_CUDA == 1
2994  if (CuDevice::Instantiate().Enabled()) {
2995  CuTimer tim;
2996  int32 dim = this->num_rows_;
2997  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2998  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2999  n_blocks(dim, CU2DBLOCK));
3000  cuda_copy_upp_low(dimGrid, dimBlock, data_, Dim());
3001  CU_SAFE_CALL(cudaGetLastError());
3002  CuDevice::Instantiate().AccuProfile(__func__, tim);
3003  } else
3004 #endif
3005  {
3006  Mat().CopyUpperToLower();
3007  }
3008 }

◆ Data() [1/2]

const Real* Data ( ) const
inline

Return data pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 746 of file cu-matrix.h.

Referenced by CuMatrixBase< float >::AddCols(), CuVectorBase< float >::AddColSumMat(), CuVectorBase< float >::AddDiagMatMat(), CuMatrixBase< float >::AddDiagVecMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< float >::AddMatBlock(), CuMatrixBase< float >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< float >::AddMatMatElements(), CuMatrixBase< float >::AddMatSmat(), CuVectorBase< float >::AddMatVec(), CuMatrixBase< float >::AddRowRanges(), CuMatrixBase< float >::AddRows(), CuVectorBase< float >::AddRowSumMat(), CuMatrixBase< float >::AddSmatMat(), CuMatrixBase< float >::AddToRows(), NormalizeComponent::Backprop(), BatchNormComponent::Backprop(), RepeatedAffineComponent::Backprop(), GeneralDropoutComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), ScaleAndOffsetComponent::Backprop(), ScaleAndOffsetComponent::BackpropInternal(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrix< float >::CompObjfAndDeriv(), DistributeComponent::ComputeInputPointers(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< float >::CopyColFromMat(), CuMatrixBase< float >::CopyCols(), CuVectorBase< float >::CopyDiagFromMat(), CuVectorBase< float >::CopyElements(), CuTpMatrix< Real >::CopyFromMat(), CuCompressedMatrix< I >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< float >::CopyFromMat(), CuMatrixBase< float >::CopyRangeFromMatClamped(), CuMatrixBase< float >::CopyRows(), CuVectorBase< float >::CopyRowsFromMat(), VectorBase< float >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), 
CuCompressedMatrix< I >::CopyToMat(), CuMatrixBase< float >::DiffGroupPnorm(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< float >::DiffSoftmaxPerRow(), kaldi::cu::EnsureNonzero(), CuMatrixBase< float >::EqualElementMask(), NnetBatchComputer::FormatInputs(), NnetBatchComputer::FormatOutputs(), TdnnComponent::GetInputPart(), NnetComputer::GetPointers(), CuMatrixBase< float >::GroupMaxDeriv(), CuTpMatrix< Real >::Invert(), kaldi::nnet3::MergeTaskOutput(), CuMatrixBase< float >::MulRows(), kaldi::cu::NormalizePerRow(), NormalizeComponent::Propagate(), BatchNormComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), RepeatedAffineComponent::Propagate(), GeneralDropoutComponent::Propagate(), PerElementOffsetComponent::Propagate(), ScaleAndOffsetComponent::Propagate(), ScaleAndOffsetComponent::PropagateInternal(), CuRand< float >::RandGaussian(), kaldi::cu::Randomize(), CuRand< float >::RandUniform(), kaldi::cu::RegularizeL1(), RectifiedLinearComponent::RepairGradients(), CuBlockMatrix< Real >::SetCudaData(), kaldi::cu::Splice(), BatchNormComponent::StoreStats(), CuMatrixBase< float >::SumColumnRanges(), CuMatrixBase< float >::SymAddMat2(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), and TimeHeightConvolutionComponent::UpdateSimple().

746 { return data_; }
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777

◆ Data() [2/2]

Real* Data ( )
inline

Return data pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 749 of file cu-matrix.h.

749 { return data_; }
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777

◆ DiffGroupPnorm()

void DiffGroupPnorm ( const CuMatrixBase< Real > &  in_value,
const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv,
Real  power 
)

Differentiate backward through the GroupPnorm function.

It is a combination of GroupPnormDeriv and MulRowsGroupMat.

Definition at line 841 of file cu-matrix.cc.

Referenced by PnormComponent::Backprop(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuMatrixDiffGroupPnorm().

844  {
845  KALDI_ASSERT(out_value.NumCols() > 0);
846  KALDI_ASSERT(out_value.NumCols() == out_deriv.NumCols());
847  int group_size = this->NumCols() / out_value.NumCols();
848  KALDI_ASSERT(this->NumCols() == out_value.NumCols() * group_size);
849 #if HAVE_CUDA == 1
850  if (CuDevice::Instantiate().Enabled()) {
851  CuTimer tim;
852  const int kWarpSize = 32;
853  dim3 dimBlock(kWarpSize, CU1DBLOCK / kWarpSize);
854  dim3 dimGrid(n_blocks(NumCols(), dimBlock.x),
855  n_blocks(NumRows(), dimBlock.y));
856  if (dimGrid.x * dimGrid.y > 1024) {
857  dimGrid.y = std::max(1024 / dimGrid.x, unsigned(1));
858  }
859  cuda_diff_group_pnorm(dimGrid, dimBlock, this->data_, in_value.Data(),
860  out_value.Data(), out_deriv.Data(), Dim(),
861  in_value.Stride(), out_value.Stride(),
862  out_deriv.Stride(), group_size, power);
863  CU_SAFE_CALL(cudaGetLastError());
864  CuDevice::Instantiate().AccuProfile(__func__, tim);
865  } else
866 #endif
867  {
868  Mat().GroupPnormDeriv(in_value.Mat(), out_value.Mat(), power);
869  MulRowsGroupMat(out_deriv);
870  }
871 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
void MulRowsGroupMat(const CuMatrixBase< Real > &src)
divide each row into src.NumCols() groups, and then scale i'th row's jth group of elements by src[i...
Definition: cu-matrix.cc:816
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ DiffLogSoftmaxPerRow()

void DiffLogSoftmaxPerRow ( const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv 
)

Differentiate backward through the log softmax function.

Here, "out_value" is the log softmax output. For each row i, this does *this(i) = out_deriv(i) - sum(out_deriv(i)) .* exp(out_value(i)), where M(i) denotes row i of a matrix M, treated as a row vector. Supports in-place operation, this == &out_deriv.
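The per-row formula above can be sketched in plain C++ without any Kaldi types. This is only an illustrative reference, not the library's implementation; the `Matrix` alias and the name `DiffLogSoftmaxPerRowRef` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// d(i,j) = e(i,j) - exp(y(i,j)) * sum_k e(i,k), where y is the log-softmax
// output (out_value) and e is the derivative at that output (out_deriv).
Matrix DiffLogSoftmaxPerRowRef(const Matrix &out_value, const Matrix &out_deriv) {
  assert(out_value.size() == out_deriv.size());
  Matrix in_deriv(out_value.size());
  for (std::size_t i = 0; i < out_value.size(); ++i) {
    assert(out_value[i].size() == out_deriv[i].size());
    double e_sum = 0.0;  // Sum(e) for this row
    for (double e : out_deriv[i]) e_sum += e;
    in_deriv[i].resize(out_value[i].size());
    for (std::size_t j = 0; j < out_value[i].size(); ++j)
      in_deriv[i][j] = out_deriv[i][j] - std::exp(out_value[i][j]) * e_sum;
  }
  return in_deriv;
}
```

Because the sketch returns a fresh matrix, it sidesteps the in-place case that the CPU path in the listing handles by copying.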

Definition at line 1903 of file cu-matrix.cc.

Referenced by LogSoftmaxComponent::Backprop(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuDiffLogSoftmax().

1904  {
1905 
1906  KALDI_ASSERT(SameDim(out_value, out_deriv) && SameDim(out_value, *this) &&
1907  this != &out_value);
1908 
1909 #if HAVE_CUDA == 1
1910  if (CuDevice::Instantiate().Enabled()) {
1911  CuTimer tim;
1912 
1913  // CUDA thread layout: one thread block per matrix-row.
1914  dim3 dimBlock(CU1DBLOCK);
1915  dim3 dimGrid(num_rows_);
1916  cuda_diff_log_softmax(dimGrid, dimBlock, this->Dim(), out_value.Data(),
1917  out_value.Stride(), out_deriv.Data(),
1918  out_deriv.Stride(), data_);
1919  CU_SAFE_CALL(cudaGetLastError());
1920 
1921  CuDevice::Instantiate().AccuProfile(__func__, tim);
1922  } else
1923 #endif
1924  {
1925  if (this == &out_deriv) {
1926  // the code below doesn't work for in-place, so make a copy and recurse.
1927  CuMatrix<Real> temp(NumRows(), NumCols(), kUndefined);
1928  temp.DiffLogSoftmaxPerRow(out_value, out_deriv);
1929  CopyFromMat(temp);
1930  return;
1931  }
1932  /*
1933  Let the output be y, then
1934  y_i = x_i - log(sum_j exp(x_j))
1935  where x_i is the input to the component. The Jacobian matrix of this
1936  function is
1937  J = I - 1 exp(y^T)
1938  where 1 is a vector of ones. Let the derivative vector at the output be e,
1939  and at the input be d, then we have
1940  d = e - exp(y) Sum(e)
1941  d_i = e_i - exp(y_i) Sum(e)
1942  */
1943  const CuMatrixBase<Real> &Y(out_value), &E(out_deriv);
1944  CuMatrixBase<Real> &D(*this);
1945 
1946  D.CopyFromMat(Y);
1947  D.ApplyExp(); // exp(y)
1948  CuVector<Real> E_sum(D.NumRows()); // Initializes to zero
1949  E_sum.AddColSumMat(1.0, E); // Sum(e)
1950  D.MulRowsVec(E_sum); // exp(y) Sum(e)
1951  D.Scale(-1.0); // - exp(y) Sum(e)
1952  D.AddMat(1.0, E, kNoTrans); // e - exp(y_i) Sum(e)
1953  }
1954 }
void CopyFromMat(const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
Definition: cu-matrix.cc:344
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ DiffParametricRelu()

void DiffParametricRelu ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff,
const CuVectorBase< Real > &  alpha,
const CuVectorBase< Real > &  beta 
)

Differentiate backward through the parametric relu function.

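Here, "value" is the ReLU input. Does, element-by-element, *this = diff * (value > 0 ? alpha : beta), where alpha and beta are indexed by column. Note that the CPU path in the listing below tests value >= 0 rather than value > 0.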
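A minimal plain C++ sketch of this element-wise rule follows. It mirrors the CPU branch of the listing (which uses `>= 0` and indexes alpha and beta by column); the `Matrix` alias and the function name are hypothetical and not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// out(r,c) = diff(r,c) * (value(r,c) >= 0 ? alpha(c) : beta(c)).
Matrix DiffParametricReluRef(const Matrix &value, const Matrix &diff,
                             const std::vector<double> &alpha,
                             const std::vector<double> &beta) {
  assert(value.size() == diff.size());
  assert(alpha.size() == beta.size());
  Matrix out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size());
    assert(value[r].size() == alpha.size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c)
      out[r][c] = diff[r][c] * (value[r][c] >= 0.0 ? alpha[c] : beta[c]);
  }
  return out;
}
```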

Definition at line 1501 of file cu-matrix.cc.

Referenced by ParametricRelu::BackpropagateFnc(), and CuMatrixBase< float >::SizeInBytes().

1505  {
1506 #if HAVE_CUDA == 1
1507  if (CuDevice::Instantiate().Enabled()) {
1508  CuTimer tim;
1509 
1510  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1511  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK), n_blocks(num_rows_, CU2DBLOCK));
1512 
1513  cuda_diff_parametric_relu(dimGrid, dimBlock, data_, diff.data_, value.data_,
1514  Dim(), diff.Stride(), value.Stride(),
1515  alpha.data_, beta.data_);
1516  CU_SAFE_CALL(cudaGetLastError());
1517 
1518  CuDevice::Instantiate().AccuProfile(__func__, tim);
1519  } else
1520 #endif
1521  {
1522  // Do it on CPU,
1523  for (MatrixIndexT r = 0; r < NumRows(); r++) {
1524  for (MatrixIndexT c = 0; c < NumCols(); c++) {
1525  Real value_elem = value.Mat()(r,c);
1526  this->Mat()(r,c) = diff.Mat()(r,c) *
1527  (value_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
1528  }
1529  }
1530  }
1531 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
int32 MatrixIndexT
Definition: matrix-common.h:98
#define CU2DBLOCK
Definition: cu-matrixdim.h:61
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ DiffSigmoid()

void DiffSigmoid ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the sigmoid function.

Here, "value" is the sigmoid output. Does, element-by-element, *this = diff * value * (1 - value).
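As a rough illustration of the formula, the sketch below applies it to flat arrays. It assumes "value" already holds sigmoid outputs; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[i] = diff[i] * value[i] * (1 - value[i]), with value[i] = sigmoid(x[i]).
std::vector<double> DiffSigmoidRef(const std::vector<double> &value,
                                   const std::vector<double> &diff) {
  assert(value.size() == diff.size());
  std::vector<double> out(value.size());
  for (std::size_t i = 0; i < value.size(); ++i)
    out[i] = diff[i] * value[i] * (1.0 - value[i]);
  return out;
}
```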

Definition at line 1764 of file cu-matrix.cc.

Referenced by SigmoidComponent::Backprop(), Sigmoid::BackpropagateFnc(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuDiffSigmoid().

1765  {
1766  KALDI_ASSERT(SameDim(*this, value) && SameDim(*this, diff));
1767 #if HAVE_CUDA == 1
1768  if (CuDevice::Instantiate().Enabled()) {
1769  CuTimer tim;
1770  dim3 dimGrid, dimBlock;
1771  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1772  &dimGrid, &dimBlock);
1773  cuda_diff_sigmoid(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1774  CU_SAFE_CALL(cudaGetLastError());
1775 
1776  CuDevice::Instantiate().AccuProfile(__func__, tim);
1777  } else
1778 #endif
1779  {
1780  Mat().DiffSigmoid(value.Mat(), diff.Mat());
1781  }
1782 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ DiffSoftmaxPerRow()

void DiffSoftmaxPerRow ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the softmax function.

Here, "value" is the softmax output. For each row i, this does *this(i) = diff(i) * diag(value(i)) - diff(i) * (value(i)^T * value(i)), where M(i) denotes row i of a matrix M, treated as a row vector, and '*' and '-' are matrix operations. Supports in-place operation, this == &diff.
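Written out per element, the matrix expression above reduces to d(i,j) = p(i,j) * (e(i,j) - sum_k p(i,k) e(i,k)), which is also what the CPU branch of the listing computes. A plain C++ sketch, with a hypothetical `Matrix` alias and function name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// p is the softmax output ("value"), e is the derivative at the output ("diff").
Matrix DiffSoftmaxPerRowRef(const Matrix &p, const Matrix &e) {
  assert(p.size() == e.size());
  Matrix d(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) {
    assert(p[i].size() == e[i].size());
    double pe = 0.0;  // dot product p(i) . e(i)
    for (std::size_t k = 0; k < p[i].size(); ++k) pe += p[i][k] * e[i][k];
    d[i].resize(p[i].size());
    for (std::size_t j = 0; j < p[i].size(); ++j)
      d[i][j] = p[i][j] * (e[i][j] - pe);
  }
  return d;
}
```

When each row of p sums to one, each row of the result sums to zero, which makes a handy sanity check.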

Definition at line 1868 of file cu-matrix.cc.

Referenced by kaldi::nnet3::attention::AttentionBackward(), SoftmaxComponent::Backprop(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuDiffSoftmax().

1869  {
1870 
1871  KALDI_ASSERT(SameDim(value, diff) && SameDim(value, *this) &&
1872  this != &value);
1873 
1874 #if HAVE_CUDA == 1
1875  if (CuDevice::Instantiate().Enabled()) {
1876  CuTimer tim;
1877 
1878  // CUDA thread layout: one thread block per matrix-row.
1879  dim3 dimBlock(CU1DBLOCK);
1880  dim3 dimGrid(num_rows_);
1881  cuda_diff_softmax(dimGrid, dimBlock, data_, this->Dim(), value.Data(),
1882  value.Stride(), diff.Data(), diff.Stride());
1883  CU_SAFE_CALL(cudaGetLastError());
1884 
1885  CuDevice::Instantiate().AccuProfile(__func__, tim);
1886  } else
1887 #endif
1888  {
1889  const CuMatrixBase<Real> &P(value), &E(diff);
1890  CuMatrixBase<Real> &D(*this);
1891 
1892  CuVector<Real> pe_vec(D.NumRows()); // For each row i, the dot product (p_t . e_t).
1893  pe_vec.AddDiagMatMat(1.0, P, kNoTrans, E, kTrans, 0.0);
1894 
1895  D.CopyFromMat(E);
1896  D.MulElements(P);
1897  // At this point, D = P .* E (in matlab notation)
1898  D.AddDiagVecMat(-1.0, pe_vec, P, kNoTrans, 1.0); // does D -= diag(pe_vec) * P.
1899  }
1900 }
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ DiffTanh()

void DiffTanh ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the tanh function.

Here, "value" is the tanh output. Does, element-by-element, *this = diff * (1 - value^2).
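One way to convince yourself of the formula is to compare it with a numerical derivative of tanh. The sketch below is plain C++ with a hypothetical function name, not Kaldi code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// out[i] = diff[i] * (1 - value[i]^2), with value[i] = tanh(x[i]).
std::vector<double> DiffTanhRef(const std::vector<double> &value,
                                const std::vector<double> &diff) {
  assert(value.size() == diff.size());
  std::vector<double> out(value.size());
  for (std::size_t i = 0; i < value.size(); ++i)
    out[i] = diff[i] * (1.0 - value[i] * value[i]);
  return out;
}
```

With diff set to 1, the result should agree closely with a central difference (tanh(x + h) - tanh(x - h)) / (2h) for small h.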

Definition at line 1809 of file cu-matrix.cc.

Referenced by TanhComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), Tanh::BackpropagateFnc(), LstmNonlinearityComponent::ConsolidateMemory(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuDiffTanh().

1810  {
1811 #if HAVE_CUDA == 1
1812  if (CuDevice::Instantiate().Enabled()) {
1813  CuTimer tim;
1814  dim3 dimGrid, dimBlock;
1815  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1816  &dimGrid, &dimBlock);
1817  cuda_diff_tanh(dimGrid, dimBlock, data_, diff.data_, value.data_, Dim(), diff.Stride(), value.Stride());
1818  CU_SAFE_CALL(cudaGetLastError());
1819 
1820  CuDevice::Instantiate().AccuProfile(__func__, tim);
1821  } else
1822 #endif
1823  {
1824  Mat().DiffTanh(value.Mat(), diff.Mat());
1825  }
1826 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ DiffXent()

void DiffXent ( const CuArrayBase< int32 > &  tgt,
CuVector< Real > *  log_post_tgt 
)

Differentiate the block [softmax+cross-entropy]: dE/da = posterior_mat - target_mat, where 'E' is the error function and 'a' is the activation at the softmax input.

Interface:
tgt ... index vector, encodes the matrix of targets
net_out_or_diff ... before invocation the net output, after invocation the diff dE/da (the matrix *this)
log_post_tgt ... per-frame statistics for cross-entropy computations: log(sum_row(posterior_mat .* target_mat))
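The CPU branch of the listing can be paraphrased in plain C++ as follows. This is a sketch under the assumption that each row holds posteriors and each target is a single column index; the `Matrix` alias and function name are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// For each row r: record log(posterior of the target), then subtract 1 at the
// target column, which turns the posteriors into posterior_mat - target_mat.
std::vector<double> DiffXentRef(const std::vector<int> &tgt, Matrix *net_out_or_diff) {
  assert(net_out_or_diff != nullptr);
  assert(tgt.size() == net_out_or_diff->size());
  std::vector<double> log_post_tgt(tgt.size());
  for (std::size_t r = 0; r < tgt.size(); ++r) {
    std::vector<double> &row = (*net_out_or_diff)[r];
    assert(tgt[r] >= 0 && static_cast<std::size_t>(tgt[r]) < row.size());
    double &value = row[static_cast<std::size_t>(tgt[r])];
    log_post_tgt[r] = std::log(value);
    value -= 1.0;
  }
  return log_post_tgt;
}
```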

Definition at line 1957 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuDiffXent().

1958  {
1959 
1960  KALDI_ASSERT(tgt.Dim() == num_rows_);
1961  log_post_tgt->Resize(tgt.Dim());
1962 
1963 #if HAVE_CUDA == 1
1964  if (CuDevice::Instantiate().Enabled()) {
1965  CuTimer tim;
1966  dim3 dimBlock(1, CU2DBLOCK*8);
1967  dim3 dimGrid(1, n_blocks(tgt.Dim(), CU2DBLOCK*8));
1968  cuda_diff_xent(dimGrid, dimBlock, tgt.Data(), data_,
1969  log_post_tgt->data_, Dim());
1970 
1971  CuDevice::Instantiate().AccuProfile(__func__, tim);
1972  } else
1973 #endif
1974  {
1975  MatrixIndexT num_rows = num_rows_;
1976  for(int32 r = 0; r < num_rows; r++) {
1977  int32 col_tgt = tgt.Data()[r];
1978  Real &value = Mat()(r, col_tgt);
1979  log_post_tgt->Vec()(r) = kaldi::Log(value);
1980  value -= 1.0;
1981  }
1982  }
1983 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
kaldi::int32 int32
const T * Data() const
Get raw pointer.
Definition: cu-array.h:52
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
int32 MatrixIndexT
Definition: matrix-common.h:98
double Log(double x)
Definition: kaldi-math.h:100
#define CU2DBLOCK
Definition: cu-matrixdim.h:61
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT Dim() const
Return the vector dimension.
Definition: cu-array.h:49
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Dim()

::MatrixDim Dim ( ) const
inline

◆ DivElements()

void DivElements ( const CuMatrixBase< Real > &  A)

Divide this matrix elementwise by A: *this = *this ./ A.

Definition at line 691 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), CuVectorBase< float >::DivElements(), kaldi::UnitTestCuMatrixDivElements(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

691  {
692  #if HAVE_CUDA == 1
693  if (CuDevice::Instantiate().Enabled()) {
694  CuTimer tim;
695 
696  KALDI_ASSERT(num_cols_ == A.NumCols());
697  KALDI_ASSERT(num_rows_ == A.NumRows());
698 
699  dim3 dimGrid, dimBlock;
700  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
701  &dimGrid, &dimBlock);
702 
703  cuda_div_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
704  CU_SAFE_CALL(cudaGetLastError());
705 
706  CuDevice::Instantiate().AccuProfile(__func__, tim);
707  } else
708  #endif
709  {
710  Mat().DivElements(A.Mat());
711  }
712 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ DivRowsVec()

void DivRowsVec ( const CuVectorBase< Real > &  div)

Divide the i'th row by div[i].
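A small plain C++ sketch of the row-wise division, for illustration only. It divides directly, whereas the CPU branch in the listing inverts the vector and multiplies; the `Matrix` alias and function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// In place: every element of row i is divided by div[i].
void DivRowsVecRef(Matrix *m, const std::vector<double> &div) {
  assert(m != nullptr && m->size() == div.size());
  for (std::size_t i = 0; i < m->size(); ++i)
    for (double &x : (*m)[i]) x /= div[i];
}
```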

Definition at line 899 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), StatisticsPoolingComponent::Backprop(), StatisticsPoolingComponent::Propagate(), kaldi::TestCuMatrixDivRowsVec(), and kaldi::UnitTestCuMatrixDivRowsVec().

899  {
900 #if HAVE_CUDA == 1
901  if (CuDevice::Instantiate().Enabled()) {
902  CuTimer tim;
903 
904  KALDI_ASSERT(div.Dim() == NumRows());
905 
906  dim3 dimGrid, dimBlock;
907  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
908  &dimGrid, &dimBlock);
909  // For large matrix we do more work per thread by limiting the
910  // grid size to reduce the block launching overhead.
911  if (dimGrid.x * dimGrid.y > 1024) {
912  dimGrid.x = 1024 / dimGrid.y;
913  if (dimGrid.x == 0) {
914  dimGrid.x = 1;
915  }
916  }
917  cuda_div_rows_vec(dimGrid, dimBlock, data_, div.data_, Dim());
918  CU_SAFE_CALL(cudaGetLastError());
919 
920  CuDevice::Instantiate().AccuProfile(__func__, tim);
921  } else
922 #endif
923  {
924  Vector<Real> temp(div.Vec()); // will copy.
925  temp.InvertElements();
926  Mat().MulRowsVec(temp);
927  }
928 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ EqualElementMask()

void EqualElementMask ( const CuMatrixBase< Real > &  mat,
CuMatrix< Real > *  mask 
) const

Definition at line 3429 of file cu-matrix.cc.

Referenced by MaxpoolingComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), and CuMatrixBase< float >::operator()().

3429  {
3430  // Check the inputs:
3431  KALDI_ASSERT(mat.NumRows() == NumRows() && mat.NumCols() == NumCols());
3432  KALDI_ASSERT(mask != NULL);
3433  // Resizes the output matrix:
3434  mask->Resize(NumRows(), NumCols(), kSetZero);
3435 
3436 #if HAVE_CUDA == 1
3437  if (CuDevice::Instantiate().Enabled()) {
3438  CuTimer tim;
3439  dim3 dimGrid, dimBlock;
3440  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
3441  &dimGrid, &dimBlock);
3442  cuda_equal_element_mask(dimGrid, dimBlock, this->data_, mat.Data(),
3443  mask->Data(), this->Dim(), mat.Stride(),
3444  mask->Stride());
3445  CU_SAFE_CALL(cudaGetLastError());
3446 
3447  CuDevice::Instantiate().AccuProfile(__func__, tim);
3448  } else
3449 #endif
3450  {
3451  for (int32 r = 0; r < NumRows(); r++) {
3452  for (int32 c = 0; c < NumCols(); c++) {
3453  (*mask)(r,c) = ((*this)(r,c) == mat(r,c) ? 1.0 : 0.0);
3454  }
3455  }
3456  }
3457 }
kaldi::int32 int32
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ Exp()

void Exp ( const CuMatrixBase< Real > &  src)

Definition at line 2456 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyExp(), and CuMatrixBase< float >::SizeInBytes().

2456  {
2457  KALDI_ASSERT(SameDim(*this, src));
2458 #if HAVE_CUDA == 1
2459  if (CuDevice::Instantiate().Enabled()) {
2460  CuTimer tim;
2461  dim3 dimGrid, dimBlock;
2462  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2463  &dimGrid, &dimBlock);
2464  cuda_exp(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
2465  src.Stride());
2466  CU_SAFE_CALL(cudaGetLastError());
2467 
2468  CuDevice::Instantiate().AccuProfile(__func__, tim);
2469  } else
2470  #endif
2471  {
2472  Mat().Exp(src.Mat());
2473  }
2474 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ ExpLimited()

void ExpLimited ( const CuMatrixBase< Real > &  src,
Real  lower_limit,
Real  upper_limit 
)

This is equivalent to running: Floor(src, lower_limit); Ceiling(src, upper_limit); Exp(src)
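Read element by element, the description amounts to clamping each input to [lower_limit, upper_limit] and then exponentiating. A plain C++ sketch of that reading, with a hypothetical function name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// out[i] = exp(min(max(src[i], lower_limit), upper_limit)).
std::vector<double> ExpLimitedRef(const std::vector<double> &src,
                                  double lower_limit, double upper_limit) {
  assert(upper_limit > lower_limit);  // same precondition as the listing
  std::vector<double> out(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = std::exp(std::min(std::max(src[i], lower_limit), upper_limit));
  return out;
}
```

Clamping before the exponential bounds the output between exp(lower_limit) and exp(upper_limit), so very large inputs cannot overflow.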

Definition at line 2541 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyExpLimited(), and CuMatrixBase< float >::SizeInBytes().

2541  {
2542  KALDI_ASSERT(SameDim(*this, src));
2543  KALDI_ASSERT(upper_limit > lower_limit);
2544 #if HAVE_CUDA == 1
2545  if (CuDevice::Instantiate().Enabled()) {
2546  CuTimer tim;
2547  dim3 dimGrid, dimBlock;
2548  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2549  &dimGrid, &dimBlock);
2550  cuda_exp_limited(dimGrid, dimBlock, this->data_, src.data_, lower_limit, upper_limit,
2551  this->Dim(), src.Stride());
2552  CU_SAFE_CALL(cudaGetLastError());
2553  CuDevice::Instantiate().AccuProfile(__func__, tim);
2554  } else
2555 #endif
2556  {
2557  Mat().ExpLimited(src.Mat(), lower_limit, upper_limit);
2558  }
2559 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ ExpSpecial()

void ExpSpecial ( const CuMatrixBase< Real > &  src)

For each element x of the matrix, set it to (x < 0 ? exp(x) : x + 1).

This function is used in our RNNLM training.
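For illustration, the piecewise function can be written in plain C++ as below (hypothetical function name). The two pieces meet at x = 0, where both give 1, so the function is continuous and grows only linearly for positive inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// out[i] = (src[i] < 0 ? exp(src[i]) : src[i] + 1).
std::vector<double> ExpSpecialRef(const std::vector<double> &src) {
  std::vector<double> out(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = (src[i] < 0.0) ? std::exp(src[i]) : src[i] + 1.0;
  return out;
}
```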

Definition at line 2563 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyExpSpecial(), and CuMatrixBase< float >::SizeInBytes().

2563  {
2564  KALDI_ASSERT(SameDim(*this, src));
2565 #if HAVE_CUDA == 1
2566  if (CuDevice::Instantiate().Enabled()) {
2567  CuTimer tim;
2568  dim3 dimGrid, dimBlock;
2569  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2570  &dimGrid, &dimBlock);
2571  cuda_exp_special(dimGrid, dimBlock, this->data_, src.data_, Dim(), src.Stride());
2572  CU_SAFE_CALL(cudaGetLastError());
2573  CuDevice::Instantiate().AccuProfile(__func__, tim);
2574  } else
2575 #endif
2576  {
2577  Mat().ExpSpecial(src.Mat());
2578  }
2579 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ FindRowMaxId()

void FindRowMaxId ( CuArray< int32 > *  id) const

Find the id of the maximal element for each row (resizes the 'id' array to the appropriate size).
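The CPU branch of the listing is a per-row argmax. The plain C++ sketch below follows the same conventions as that branch (strict comparison, so the first maximum wins on ties, and an id of -1 when no element exceeds the -1e21 starting value); the `Matrix` alias and function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// Returns, for each row, the column index of its largest element.
std::vector<int> FindRowMaxIdRef(const Matrix &m) {
  std::vector<int> id(m.size(), -1);
  for (std::size_t r = 0; r < m.size(); ++r) {
    double max = -1e21;  // same starting value as the CPU path in the listing
    for (std::size_t c = 0; c < m[r].size(); ++c) {
      if (max < m[r][c]) {
        max = m[r][c];
        id[r] = static_cast<int>(c);
      }
    }
  }
  return id;
}
```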

Definition at line 1829 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), kaldi::nnet3::ComputeAccuracy(), NnetUpdater::ComputeTotAccuracy(), Xent::Eval(), kaldi::TestCuFindRowMaxId(), and kaldi::UnitTestCuFindRowMaxId().

1829  {
1830 #if HAVE_CUDA == 1
1831  if (CuDevice::Instantiate().Enabled()) {
1832  CuTimer tim;
1833  id->Resize(num_rows_);
1834  MatrixDim d = Dim();
1835 
1836  // CUDA thread layout: one thread block per matrix-row.
1837  dim3 dimBlock(CU1DBLOCK);
1838  dim3 dimGrid(num_rows_);
1839  cuda_find_row_max_id(dimGrid, dimBlock, data_, NULL, id->Data(), d);
1840  CU_SAFE_CALL(cudaGetLastError());
1841 
1842  // now we have the indices!
1843  CuDevice::Instantiate().AccuProfile(__func__, tim);
1844  } else
1845 #endif
1846  {
1847  // allocate index buffer
1848  id->Resize(num_rows_);
1849  id->Set(-1);
1850  // find maxima
1851  MatrixIndexT num_rows = num_rows_, num_cols = num_cols_;
1852  for (MatrixIndexT r = 0; r < num_rows; r++) {
1853  Real max = -1e21;
1854  int32 max_id = -1;
1855  const Real *row_data = Mat().RowData(r);
1856  for (MatrixIndexT c = 0; c < num_cols; c++) {
1857  if (max < row_data[c]) {
1858  max = row_data[c];
1859  max_id = c;
1860  }
1861  }
1862  id->Data()[r] = max_id;
1863  }
1864  }
1865 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Structure containing size of the matrix plus stride.
Definition: cu-matrixdim.h:46
kaldi::int32 int32
const T * Data() const
Get raw pointer.
Definition: cu-array.h:52
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
int32 MatrixIndexT
Definition: matrix-common.h:98
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Floor()

void Floor ( const CuMatrixBase< Real > &  src,
Real  floor_val 
)

Definition at line 2582 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyFloor(), and CuMatrixBase< float >::SizeInBytes().

2582  {
2583  KALDI_ASSERT(SameDim(*this, src));
2584 #if HAVE_CUDA == 1
2585  if (CuDevice::Instantiate().Enabled()) {
2586  CuTimer tim;
2587  dim3 dimGrid, dimBlock;
2588  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2589  &dimGrid, &dimBlock);
2590  cuda_floor(dimGrid, dimBlock, data_, src.data_, floor_val, this->Dim(), src.Stride());
2591  CU_SAFE_CALL(cudaGetLastError());
2592  CuDevice::Instantiate().AccuProfile(__func__, tim);
2593  } else
2594 #endif
2595  {
2596  Mat().Floor(src.Mat(), floor_val);
2597  }
2598 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ FrobeniusNorm()

Real FrobeniusNorm ( ) const
inline

Definition at line 226 of file cu-matrix.h.

Referenced by CuMatrixBase< float >::ApproxEqual(), kaldi::nnet3::ConstrainOrthonormalInternal(), and kaldi::UnitTestCuSparseMatrixFrobeniusNorm().

226 { return sqrt(TraceMatMat(*this, *this, kTrans)); }
friend Real TraceMatMat(const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
Definition: cu-matrix.cc:2145
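The one-line definition relies on the identity ||A||_F = sqrt(trace(A A^T)), and trace(A A^T) is just the sum of the squared elements. A plain C++ sketch of that identity, with a hypothetical `Matrix` alias and function name:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// trace(A A^T) = sum_i (row_i . row_i), i.e. the sum of all squared elements.
double FrobeniusNormRef(const Matrix &a) {
  double trace = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (double x : a[i]) trace += x * x;
  return std::sqrt(trace);
}
```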

◆ GroupMax()

void GroupMax ( const CuMatrixBase< Real > &  src)

Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]
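A plain C++ sketch of the grouping, for illustration. Here the caller passes the number of output columns and the group size G is derived from it; the `Matrix` alias and function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// out(r,i) = max over j in [i*G, (i+1)*G) of src(r,j), with G = cols / num_groups.
Matrix GroupMaxRef(const Matrix &src, std::size_t num_groups) {
  assert(num_groups > 0);
  Matrix out(src.size());
  for (std::size_t r = 0; r < src.size(); ++r) {
    assert(!src[r].empty() && src[r].size() % num_groups == 0);
    const std::size_t G = src[r].size() / num_groups;
    out[r].resize(num_groups);
    for (std::size_t i = 0; i < num_groups; ++i) {
      double best = src[r][i * G];
      for (std::size_t j = i * G + 1; j < (i + 1) * G; ++j)
        if (src[r][j] > best) best = src[r][j];
      out[r][i] = best;
    }
  }
  return out;
}
```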

Definition at line 1617 of file cu-matrix.cc.

Referenced by MaxoutComponent::Propagate(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixGroupMax(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), and kaldi::UnitTestCuMatrixGroupMax().

1617  {
1618  int group_size = src.NumCols() / this->NumCols();
1619  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1620  this->NumRows() == src.NumRows());
1621 #if HAVE_CUDA == 1
1622  if (CuDevice::Instantiate().Enabled()) {
1623  CuTimer tim;
1624  // One thread block per row.
1625  // Use 2D block for small group size to simplify the calculation.
1626  // Each group is reduced by threads_per_group threads.
1627  // threads_per_group should be a power of 2 for fast tree reduction.
1628  // group size: 1 2 3 4 5 6 7 .. 12 13 .. 24 25 .. 48 ...
1629  // threads_per_group: 1 1 1 2 2 2 4 .. 4 8 .. 8 16 .. 16 ...
1630  int threads_per_group = CU1DBLOCK;
1631  while (threads_per_group * 3 / 2 >= group_size) {
1632  threads_per_group >>= 1;
1633  }
1634  if (group_size == 1) {
1635  threads_per_group = 1;
1636  }
1637  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1638  dim3 dimGrid(NumRows());
1639  cuda_group_max(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1640  src.Stride(), group_size);
1641  CU_SAFE_CALL(cudaGetLastError());
1642  CuDevice::Instantiate().AccuProfile(__func__, tim);
1643  } else
1644 #endif
1645  {
1646  Mat().GroupMax(src.Mat());
1647  }
1648 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
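The threads_per_group loop in the listing above can be checked against the table in its comments with a small host-side sketch. The value 256 for CU1DBLOCK is an assumption here (the page only shows that the macro is defined in cu-matrixdim.h), and the function name is hypothetical.

```cpp
#include <cassert>

constexpr int kCu1DBlock = 256;  // assumed value of CU1DBLOCK

// Largest power of two t with t * 3 / 2 < group_size, forced to 1 for group_size 1.
int ThreadsPerGroup(int group_size) {
  assert(group_size >= 1);
  int threads_per_group = kCu1DBlock;
  while (threads_per_group * 3 / 2 >= group_size) {
    threads_per_group >>= 1;
  }
  if (group_size == 1) {
    threads_per_group = 1;  // the loop alone would end at 0 here
  }
  return threads_per_group;
}
```

For example, group sizes 7 through 12 map to 4 threads per group and 13 through 24 map to 8, matching the comment table.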

◆ GroupMaxDeriv()

void GroupMaxDeriv ( const CuMatrixBase< Real > &  input,
const CuMatrixBase< Real > &  output 
)

Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function above (i.e. the "src" variable), and "output" is the result of the computation (i.e. the "this" of that function call), and *this must have the same dimension as "input". Each element of *this will be set to 1 if the corresponding input equals the output of the group, and 0 otherwise. This equals the function derivative where it is defined (it's not defined where multiple inputs in the group are equal to the output).
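The rule described above is an equality mask against each group's maximum. A plain C++ sketch, with a hypothetical `Matrix` alias and function name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// deriv(r,j) = 1 if input(r,j) equals output(r, j / G), else 0,
// where G = input columns / output columns.
Matrix GroupMaxDerivRef(const Matrix &input, const Matrix &output) {
  assert(input.size() == output.size());
  Matrix deriv(input.size());
  for (std::size_t r = 0; r < input.size(); ++r) {
    assert(!output[r].empty() && input[r].size() % output[r].size() == 0);
    const std::size_t G = input[r].size() / output[r].size();
    deriv[r].resize(input[r].size());
    for (std::size_t j = 0; j < input[r].size(); ++j)
      deriv[r][j] = (input[r][j] == output[r][j / G]) ? 1.0 : 0.0;
  }
  return deriv;
}
```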

Definition at line 874 of file cu-matrix.cc.

Referenced by MaxoutComponent::Backprop(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixGroupMaxDeriv(), and kaldi::UnitTestCuMatrixGroupMaxDeriv().

875  {
876  KALDI_ASSERT(src2.NumCols() > 0);
877  int group_size = this->NumCols() / src2.NumCols();
878  KALDI_ASSERT(this->NumCols() == src2.NumCols() * group_size);
879 #if HAVE_CUDA == 1
880  if (CuDevice::Instantiate().Enabled()) {
881  CuTimer tim;
882  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
883  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
884  n_blocks(NumRows(), CU2DBLOCK));
885  cuda_calc_group_max_deriv(dimGrid, dimBlock, this->data_, src1.Data(),
886  src2.Data(), Dim(), src1.Stride(), src2.Stride(),
887  group_size);
888  CU_SAFE_CALL(cudaGetLastError());
889 
890  CuDevice::Instantiate().AccuProfile(__func__, tim);
891  } else
892 #endif
893  {
894  Mat().GroupMaxDeriv(src1.Mat(), src2.Mat());
895  }
896 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU2DBLOCK
Definition: cu-matrixdim.h:61
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ GroupPnorm()

void GroupPnorm ( const CuMatrixBase< Real > &  src,
Real  pow 
)

Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j ^ power) ^ (1 / power), where power is the 'pow' argument and G = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]
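A plain C++ sketch of the group p-norm for a finite, positive power. Taking the absolute value of each element before raising it to the power is an assumption made here so that negative inputs are well defined; the formula as written does not show it. The `Matrix` alias and function name are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical stand-in for CuMatrix

// out(r,i) = (sum over j in [i*G, (i+1)*G) of |src(r,j)|^power)^(1/power).
Matrix GroupPnormRef(const Matrix &src, std::size_t num_groups, double power) {
  assert(num_groups > 0 && power > 0.0);
  Matrix out(src.size());
  for (std::size_t r = 0; r < src.size(); ++r) {
    assert(src[r].size() % num_groups == 0);
    const std::size_t G = src[r].size() / num_groups;
    out[r].resize(num_groups);
    for (std::size_t i = 0; i < num_groups; ++i) {
      double sum = 0.0;
      for (std::size_t j = i * G; j < (i + 1) * G; ++j)
        sum += std::pow(std::abs(src[r][j]), power);
      out[r][i] = std::pow(sum, 1.0 / power);
    }
  }
  return out;
}
```

The special powers handled separately on the GPU path in the listing (0 and infinity) are outside the scope of this sketch.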

Definition at line 1576 of file cu-matrix.cc.

Referenced by PnormComponent::Propagate(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixGroupPnorm(), and kaldi::UnitTestCuMatrixGroupPnorm().

1576  {
1577  int group_size = src.NumCols() / this->NumCols();
1578  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
1579  this->NumRows() == src.NumRows());
1580 #if HAVE_CUDA == 1
1581  if (CuDevice::Instantiate().Enabled()) {
1582  CuTimer tim;
1583  if (power == Real(0) || power == Real(1) || power == Real(2)
1584  || power == std::numeric_limits<Real>::infinity()) {
1585  // One thread block per row.
1586  // Use 2D block for small group size to simplify the calculation
1587  // Each group is reduced by threads_per_group threads.
1588  // threads_per_group should be a power of 2 for fast tree reduction.
1589  int threads_per_group = CU1DBLOCK;
1590  while (threads_per_group * 3 / 2 >= group_size) {
1591  threads_per_group >>= 1;
1592  }
1593  if (group_size == 1) {
1594  threads_per_group = 1;
1595  }
1596  dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
1597  dim3 dimGrid(NumRows());
1598  cuda_group_spec_pnorm(dimGrid, dimBlock, this->data_, src.data_,
1599  this->Dim(), src.Stride(), group_size, power);
1600  } else {
1601  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1602  dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
1603  n_blocks(NumRows(), CU2DBLOCK));
1604  cuda_group_pnorm(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1605  src.Stride(), group_size, power);
1606  }
1607  CU_SAFE_CALL(cudaGetLastError());
1608  CuDevice::Instantiate().AccuProfile(__func__, tim);
1609  } else
1610 #endif
1611  {
1612  Mat().GroupPnorm(src.Mat(), power);
1613  }
1614 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define CU2DBLOCK
Definition: cu-matrixdim.h:61
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
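
The choice of threads_per_group in the listing above can be traced in isolation. The sketch below copies that loop into a hypothetical free function, ThreadsPerGroup; the default of 256 for the block size is an assumption standing in for CU1DBLOCK, whose value is not shown on this page.

```cpp
#include <cassert>

// Mirrors the selection loop from the GroupPnorm listing.
// block_1d stands in for CU1DBLOCK (assumed to be 256 here).
int ThreadsPerGroup(int group_size, int block_1d = 256) {
  assert(group_size >= 1);
  int threads_per_group = block_1d;
  // Halve until 1.5 * threads_per_group drops below the group size,
  // so the result stays a power of two.
  while (threads_per_group * 3 / 2 >= group_size) {
    threads_per_group >>= 1;
  }
  // For group_size == 1 the loop ends at 0, so it is reset to 1.
  if (group_size == 1) {
    threads_per_group = 1;
  }
  return threads_per_group;
}
```

Under that assumption, a group size of 10 yields 4 threads per group, so the block shape in the listing would be (4, 64).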

◆ Heaviside()

void Heaviside ( const CuMatrixBase< Real > &  src)

Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0). [Note: in general, there are different ways to deal with the situation when x == 0.]
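
A minimal sketch of this convention on plain vectors, to make the x == 0 case concrete. HeavisideSketch is a hypothetical name and std::vector stands in for the matrix types; the size check plays the role of the SameDim assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch: dst[i] = (src[i] > 0 ? 1.0 : 0.0).
void HeavisideSketch(const std::vector<double> &src,
                     std::vector<double> *dst) {
  assert(dst != nullptr && dst->size() == src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    // Under this definition an input of exactly zero maps to 0.0.
    (*dst)[i] = src[i] > 0.0 ? 1.0 : 0.0;
  }
}
```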

Definition at line 2435 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyHeaviside(), RectifiedLinearComponent::Backprop(), CuRand< float >::BinarizeProbs(), kaldi::CuCompressedMatrixTestSign(), CuMatrixBase< float >::SizeInBytes(), RectifiedLinearComponent::StoreStats(), and kaldi::UnitTestCuMatrixHeaviside().

2435  {
2436  KALDI_ASSERT(SameDim(*this, src));
2437 #if HAVE_CUDA == 1
2438  if (CuDevice::Instantiate().Enabled()) {
2439  CuTimer tim;
2440  dim3 dimGrid, dimBlock;
2441  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2442  &dimGrid, &dimBlock);
2443  cuda_heaviside(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
2444  src.Stride());
2445  CU_SAFE_CALL(cudaGetLastError());
2446 
2447  CuDevice::Instantiate().AccuProfile(__func__, tim);
2448  } else
2449  #endif
2450  {
2451  Mat().Heaviside(src.Mat());
2452  }
2453 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ InvertElements()

void InvertElements ( )

Inverts each element of the matrix in place, i.e. replaces every element x with 1/x.

Definition at line 932 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), kaldi::TestCuMatrixCompObjfAndDeriv(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixInvertElements(), and kaldi::UnitTestCuMatrixObjfDeriv().

932  {
933 #if HAVE_CUDA == 1
934  if (CuDevice::Instantiate().Enabled()) {
935  CuTimer tim;
936 
937  dim3 dimGrid, dimBlock;
938  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
939  &dimGrid, &dimBlock);
940 
941  cuda_invert_elements(dimGrid, dimBlock, data_, Dim());
942  CU_SAFE_CALL(cudaGetLastError());
943 
944  CuDevice::Instantiate().AccuProfile(__func__, tim);
945  } else
946 #endif
947  {
948  Mat().InvertElements();
949  }
950 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ IsUnit()

bool IsUnit ( Real  tol = 0.001) const

Definition at line 629 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::FrobeniusNorm(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::UnitTestCuMatrixSymInvertPosDef(), and kaldi::UnitTestCuSpMatrixInvert().

629  {
630  // want to return:
631  //FrobeniusNorm(*this - I) <= tol * NumRows(), i.e.:
632  //sqrt (trace((*this - I)(*this-I)) <= tol * NumRows()
633  // trace((*this - I)(*this - I)) <= tol * NumRows()
634  // trace(*this * *this) + trace(I) - 2 * trace(*this) <= tol * NumRows()
635  // trace(*this * *this) + dim - 2*this.Trace() <= tol * NumRows()
636  KALDI_ASSERT(this->NumRows() == this->NumCols());
637  return (TraceMatMat(*this, *this, kTrans) + this->NumRows() - 2.0 * this->Trace() <=
638  tol * this->NumRows());
639 }
Real Trace(bool check_square=true) const
Return the trace. If check_square = true, will crash if matrix is not square.
Definition: cu-matrix.cc:3075
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
friend Real TraceMatMat(const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
Definition: cu-matrix.cc:2145
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
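
The comments in the listing rely on the identity ||A - I||_F^2 = trace(A^T A) + n - 2 trace(A). The sketch below checks that identity on plain nested vectors; SquareMat, DirectDistSq, TraceDistSq and IsUnitSketch are hypothetical names, not Kaldi functions. Like the return statement in the listing, IsUnitSketch compares the squared distance against tol * n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using SquareMat = std::vector<std::vector<double>>;

// Squared Frobenius distance ||A - I||_F^2, summed element by element.
double DirectDistSq(const SquareMat &a) {
  double sum = 0.0;
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == a.size());  // matrix must be square
    for (std::size_t c = 0; c < a.size(); ++c) {
      const double d = a[r][c] - (r == c ? 1.0 : 0.0);
      sum += d * d;
    }
  }
  return sum;
}

// The same quantity via trace(A^T A) + n - 2 * trace(A).
// trace(A^T A) equals the sum of squares of all elements.
double TraceDistSq(const SquareMat &a) {
  double sum_sq = 0.0, trace = 0.0;
  for (std::size_t r = 0; r < a.size(); ++r) {
    assert(a[r].size() == a.size());
    trace += a[r][r];
    for (std::size_t c = 0; c < a.size(); ++c) sum_sq += a[r][c] * a[r][c];
  }
  return sum_sq + static_cast<double>(a.size()) - 2.0 * trace;
}

// Same comparison as the return statement in the listing.
bool IsUnitSketch(const SquareMat &a, double tol = 0.001) {
  return TraceDistSq(a) <= tol * static_cast<double>(a.size());
}
```

The trace form needs only one pass over the elements and no temporary for A - I, which is presumably why the listing uses it.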

◆ KALDI_DISALLOW_COPY_AND_ASSIGN()

KALDI_DISALLOW_COPY_AND_ASSIGN ( CuMatrixBase< Real >  )
private

◆ Log()

void Log ( const CuMatrixBase< Real > &  src)

Definition at line 2477 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and CuMatrixBase< float >::SizeInBytes().

2477  {
2478  KALDI_ASSERT(SameDim(*this, src));
2479 #if HAVE_CUDA == 1
2480  if (CuDevice::Instantiate().Enabled()) {
2481  if (num_rows_ == 0) return;
2482  CuTimer tim;
2483  dim3 dimGrid, dimBlock;
2484  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2485  &dimGrid, &dimBlock);
2486 
2487  cuda_log(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
2488  src.Stride());
2489  CU_SAFE_CALL(cudaGetLastError());
2490 
2491  CuDevice::Instantiate().AccuProfile(__func__, tim);
2492  } else
2493  #endif
2494  {
2495  Mat().Log(src.Mat());
2496  }
2497 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ LogSoftMaxPerRow()

void LogSoftMaxPerRow ( const CuMatrixBase< Real > &  src)

LogSoftmax nonlinearity Y = LogSoftmax(X) : Yij = Xij - log(sum_k(e^Xik)), done to each row, with attention to avoiding overflow or underflow.

Supports in-place operation (i.e. this == &src).
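
One common way to avoid overflow in this formula is to subtract the row maximum before exponentiating. The sketch below shows that technique for a single row, in place; LogSoftMaxRow is a hypothetical helper, and whether the CUDA kernel or ApplyLogSoftMax uses exactly this scheme is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical in-place log-softmax of one row:
// row[j] <- row[j] - log(sum_k exp(row[k])).
void LogSoftMaxRow(std::vector<double> *row) {
  assert(row != nullptr && !row->empty());
  // Shifting by the maximum keeps every exponent <= 0, so exp() cannot
  // overflow, and the largest term contributes exactly 1 to the sum.
  const double max_val = *std::max_element(row->begin(), row->end());
  double sum = 0.0;
  for (double v : *row) sum += std::exp(v - max_val);
  const double log_z = max_val + std::log(sum);
  for (double &v : *row) v -= log_z;
}
```

A row such as {1000, 1000} would overflow with a naive exp(), but stays finite here.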

Definition at line 1740 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLogSoftMaxPerRow(), LogSoftmaxComponent::Propagate(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixLogSoftmax(), and kaldi::UnitTestCuLogSoftmax().

1740  {
1741  KALDI_ASSERT(SameDim(*this, src));
1742 #if HAVE_CUDA == 1
1743  if (CuDevice::Instantiate().Enabled()) {
1744  CuTimer tim;
1745  size_t dimBlock = CU1DBLOCK;
1746  size_t dimGrid = src.num_rows_;
1747  cuda_log_softmax_reduce(dimGrid, dimBlock,
1748  data_, src.data_, Dim(), src.Stride());
1749  CU_SAFE_CALL(cudaGetLastError());
1750 
1751  CuDevice::Instantiate().AccuProfile(__func__, tim);
1752  } else
1753 #endif
1754  {
1755  MatrixBase<Real> &mat(this->Mat());
1756  mat.CopyFromMat(src.Mat());
1757  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1758  mat.Row(r).ApplyLogSoftMax();
1759  }
1760  }
1761 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
int32 MatrixIndexT
Definition: matrix-common.h:98
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221

◆ Lookup() [1/2]

void Lookup ( const std::vector< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3370 of file cu-matrix.cc.

Referenced by NnetDiscriminativeUpdater::LatticeComputations(), CuMatrixBase< float >::operator()(), kaldi::TestCuMatrixLookup(), and kaldi::UnitTestCuMatrixLookup().

3371  {
3372  // Checks the dimension.
3373  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3374  for (int32 i = 0; i < indices.size(); ++i) {
3375  KALDI_ASSERT(indices[i].first < num_rows && indices[i].first >= 0 &&
3376  indices[i].second < num_cols && indices[i].second >= 0);
3377  }
3378  if (indices.size() == 0) return;
3379  KALDI_ASSERT(output != NULL);
3380 
3381 #if HAVE_CUDA == 1
3382  if (CuDevice::Instantiate().Enabled()) {
3383  CuArray<Int32Pair> cuda_indices(indices);
3384  Lookup(cuda_indices, output);
3385  } else
3386 #endif
3387  {
3388  for (int32 i = 0; i < indices.size(); i++) {
3389  output[i] = (*this)(indices[i].first, indices[i].second);
3390  }
3391  }
3392 }
kaldi::int32 int32
void Lookup(const std::vector< Int32Pair > &indexes, Real *output) const
Definition: cu-matrix.cc:3370
int32 MatrixIndexT
Definition: matrix-common.h:98
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Lookup() [2/2]

void Lookup ( const CuArrayBase< Int32Pair > &  indexes,
Real *  output 
) const

Definition at line 3395 of file cu-matrix.cc.

3396  {
3397  int32 num_elements = indices.Dim();
3398  if (num_elements == 0) return;
3399  KALDI_ASSERT(output != NULL);
3400 
3401 #if HAVE_CUDA == 1
3402  if (CuDevice::Instantiate().Enabled()) {
3403  CuArray<Real> cuda_output(num_elements);
3404  CuTimer tim;
3405  dim3 dimBlock(CU1DBLOCK, 1);
3406  dim3 dimGrid(n_blocks(num_elements, CU1DBLOCK), 1);
3407 
3408  cuda_matrix_lookup(dimGrid, dimBlock, this->data_, this->Dim(),
3409  indices.Data(), num_elements, cuda_output.Data());
3410  CU_SAFE_CALL(cudaGetLastError());
3411 
3412  cuda_output.CopyToHost(output);
3413  CuDevice::Instantiate().AccuProfile(__func__, tim);
3414  } else
3415 #endif
3416  {
3417  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
3418  const Int32Pair *index = indices.Data();
3419  for (int32 i = 0; i < num_elements; i++) {
3420  KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
3421  index[i].second < num_cols && index[i].second >= 0);
3422  output[i] = (*this)(index[i].first, index[i].second);
3423  }
3424  }
3425 }
kaldi::int32 int32
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
int32 MatrixIndexT
Definition: matrix-common.h:98
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
int32_cuda second
Definition: cu-matrixdim.h:80
int32_cuda first
Definition: cu-matrixdim.h:79
MatrixIndexT num_rows_
Definition: cu-matrix.h:786
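
Both Lookup() overloads gather one matrix element per (row, column) pair into a flat output array. A CPU-only sketch of that gather follows; IndexPair is a hypothetical stand-in for Int32Pair and LookupSketch is not a Kaldi function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct IndexPair {  // hypothetical stand-in for Int32Pair
  int first;        // row index
  int second;       // column index
};

// output[i] = mat(indices[i].first, indices[i].second)
std::vector<double> LookupSketch(
    const std::vector<std::vector<double>> &mat,
    const std::vector<IndexPair> &indices) {
  std::vector<double> output;
  output.reserve(indices.size());
  for (const IndexPair &p : indices) {
    // Same bounds checks as the assertions in the listings.
    assert(p.first >= 0 && static_cast<std::size_t>(p.first) < mat.size());
    assert(p.second >= 0 &&
           static_cast<std::size_t>(p.second) < mat[p.first].size());
    output.push_back(mat[p.first][p.second]);
  }
  return output;
}
```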

◆ Mat() [1/2]

const MatrixBase<Real>& Mat ( ) const
inline

Definition at line 755 of file cu-matrix.h.

Referenced by CuMatrixBase< float >::AddCols(), CuVectorBase< float >::AddColSumMat(), CuVectorBase< float >::AddDiagMat2(), CuVectorBase< float >::AddDiagMatMat(), CuMatrixBase< float >::AddDiagVecMat(), CuMatrixBase< float >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< float >::AddMatBlocks(), CuMatrixBase< float >::AddMatDiagVec(), CuMatrixBase< float >::AddMatMat(), CuMatrixBase< float >::AddMatMatElements(), CuMatrixBase< float >::AddMatSmat(), CuVectorBase< float >::AddMatVec(), CuMatrixBase< float >::AddRows(), CuVectorBase< float >::AddRowSumMat(), CuMatrixBase< float >::AddSmatMat(), GeneralMatrix::AddToMat(), CuMatrixBase< float >::AddToRows(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrixBase< float >::Ceiling(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< float >::CopyColFromMat(), CuMatrixBase< float >::CopyCols(), CuVectorBase< float >::CopyElements(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< float >::CopyFromMat(), CuMatrixBase< float >::CopyRows(), CuVectorBase< float >::CopyRowsFromMat(), VectorBase< float >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), GeneralMatrix::CopyToMat(), CuMatrixBase< float >::DiffGroupPnorm(), CuMatrixBase< float >::DiffParametricRelu(), CuMatrixBase< float >::DiffSigmoid(), CuMatrixBase< float >::DiffTanh(), CuMatrixBase< float >::DivElements(), CuMatrixBase< float >::Exp(), CuMatrixBase< float >::ExpLimited(), CuMatrixBase< float >::ExpSpecial(), CuMatrixBase< float >::Floor(), CuMatrixBase< float >::GroupMax(), CuMatrixBase< float >::GroupMaxDeriv(), CuMatrixBase< float >::GroupPnorm(), CuMatrixBase< float >::Heaviside(), CuMatrixBase< float >::Log(), CuMatrixBase< float >::LogSoftMaxPerRow(), CuMatrixBase< float >::Max(), CuMatrixBase< float >::Min(), CuMatrixBase< float >::MulElements(), CuMatrixBase< float >::MulRows(), CuMatrixBase< float >::MulRowsGroupMat(), CuMatrixBase< float >::ParametricRelu(), CuMatrixBase< float 
>::Pow(), CuMatrixBase< float >::PowAbs(), CuRand< float >::RandGaussian(), kaldi::cu::Randomize(), CuRand< float >::RandUniform(), kaldi::cu::RegularizeL1(), CuMatrixBase< float >::SetMatMatDivMat(), CuMatrixBase< float >::Sigmoid(), CuMatrixBase< float >::SoftHinge(), CuMatrixBase< float >::SoftMaxPerRow(), kaldi::cu::Splice(), CuMatrixBase< float >::SymAddMat2(), CuMatrixBase< float >::Tanh(), kaldi::TraceMatMat(), and kaldi::TraceMatSmat().

755  {
756  return *(reinterpret_cast<const MatrixBase<Real>* >(this));
757  }

◆ Mat() [2/2]

MatrixBase<Real>& Mat ( )
inline

Definition at line 758 of file cu-matrix.h.

758  {
759  return *(reinterpret_cast<MatrixBase<Real>* >(this));
760  }

◆ Max() [1/2]

void Max ( const CuMatrixBase< Real > &  A)

Do, elementwise, *this = max(*this, A).

Definition at line 715 of file cu-matrix.cc.

Referenced by kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), main(), MaxpoolingComponent::Propagate(), SpliceMaxComponent::Propagate(), kaldi::TestCuMatrixMax(), kaldi::UnitTestCuMatrixMax(), and kaldi::UnitTestCuMatrixReduceMax().

715  {
716  #if HAVE_CUDA == 1
717  if (CuDevice::Instantiate().Enabled()) {
718  CuTimer tim;
719 
720  KALDI_ASSERT(num_cols_ == A.NumCols());
721  KALDI_ASSERT(num_rows_ == A.NumRows());
722 
723  dim3 dimGrid, dimBlock;
724  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
725  &dimGrid, &dimBlock);
726 
727  cuda_max(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
728  CU_SAFE_CALL(cudaGetLastError());
729 
730  CuDevice::Instantiate().AccuProfile(__func__, tim);
731  } else
732  #endif
733  {
734  Mat().Max(A.Mat());
735  }
736 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Max() [2/2]

Real Max ( ) const

Definition at line 3033 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and CuMatrixBase< float >::operator()().

3033  {
3034 #if HAVE_CUDA == 1
3035  if (CuDevice::Instantiate().Enabled()) {
3036  KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
3037  CuTimer tim;
3038 
3039  CuVector<Real> col_max(num_rows_, kUndefined);
3040  cuda_max_mat_cols(num_rows_, CU1DBLOCK, col_max.Data(), data_, Dim());
3041  Real ans = col_max.Max();
3042 
3043  CuDevice::Instantiate().AccuProfile(__func__, tim);
3044  return ans;
3045  } else
3046 #endif
3047  {
3048  return Mat().Max();
3049  }
3050 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT num_rows_
Definition: cu-matrix.h:786
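
The GPU branch of the listing reduces in two stages: it first fills a vector with one maximum per row (col_max has num_rows_ entries), then takes the maximum of that vector. A CPU sketch of the same two-stage reduction, with the hypothetical name MaxTwoStage:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical two-stage maximum over a non-empty matrix.
double MaxTwoStage(const std::vector<std::vector<double>> &mat) {
  assert(!mat.empty() && !mat[0].empty());
  // Stage 1: one maximum per row.
  std::vector<double> row_max;
  row_max.reserve(mat.size());
  for (const std::vector<double> &row : mat) {
    assert(!row.empty());
    row_max.push_back(*std::max_element(row.begin(), row.end()));
  }
  // Stage 2: maximum over the per-row results.
  return *std::max_element(row_max.begin(), row_max.end());
}
```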

◆ Min() [1/2]

void Min ( const CuMatrixBase< Real > &  A)

Do, elementwise, *this = min(*this, A).

Definition at line 740 of file cu-matrix.cc.

Referenced by kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), main(), kaldi::TestCuMatrixMin(), kaldi::UnitTestCuMatrixMin(), and kaldi::UnitTestCuMatrixReduceMin().

740  {
741  #if HAVE_CUDA == 1
742  if (CuDevice::Instantiate().Enabled()) {
743  CuTimer tim;
744 
745  KALDI_ASSERT(num_cols_ == A.NumCols());
746  KALDI_ASSERT(num_rows_ == A.NumRows());
747 
748  dim3 dimGrid, dimBlock;
749  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
750  &dimGrid, &dimBlock);
751 
752  cuda_min(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
753  CU_SAFE_CALL(cudaGetLastError());
754 
755  CuDevice::Instantiate().AccuProfile(__func__, tim);
756  } else
757  #endif
758  {
759  Mat().Min(A.Mat());
760  }
761 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Min() [2/2]

Real Min ( ) const

Definition at line 3054 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and CuMatrixBase< float >::operator()().

3054  {
3055 #if HAVE_CUDA == 1
3056  if (CuDevice::Instantiate().Enabled()) {
3057  KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
3058  CuTimer tim;
3059 
3060  CuVector<Real> col_min(num_rows_, kUndefined);
3061  cuda_min_mat_cols(num_rows_, CU1DBLOCK, col_min.Data(), data_, Dim());
3062  Real ans = col_min.Min();
3063 
3064  CuDevice::Instantiate().AccuProfile(__func__, tim);
3065  return ans;
3066  } else
3067 #endif
3068  {
3069  return Mat().Min();
3070  }
3071 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ MulColsVec()

void MulColsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th column by scale[i].
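
Column scaling multiplies element (r, c) by scale[c], which is the same as right-multiplying by diag(scale). A minimal sketch on nested vectors follows; MulColsVecSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: mat(r, c) *= scale[c] for every row r.
void MulColsVecSketch(std::vector<std::vector<double>> *mat,
                      const std::vector<double> &scale) {
  assert(mat != nullptr);
  for (std::vector<double> &row : *mat) {
    assert(row.size() == scale.size());  // scale.Dim() == NumCols()
    for (std::size_t c = 0; c < row.size(); ++c) row[c] *= scale[c];
  }
}
```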

Definition at line 765 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), BatchNormComponent::Backprop(), FixedScaleComponent::Backprop(), PerElementScaleComponent::Backprop(), Rescale::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), LstmNonlinearityComponent::ConsolidateMemory(), ModelCollapser::PreMultiplyAffineParameters(), BatchNormComponent::Propagate(), FixedScaleComponent::Propagate(), PerElementScaleComponent::Propagate(), Rescale::PropagateFnc(), ScaleAndOffsetComponent::PropagateInternal(), kaldi::UnitTestCuMatrixAddMatDiagVec(), and kaldi::UnitTestCuMatrixMulColsVec().

765  {
766 #if HAVE_CUDA == 1
767  if (CuDevice::Instantiate().Enabled()) {
768  CuTimer tim;
769 
770  KALDI_ASSERT(scale.Dim() == NumCols());
771 
772 
773  dim3 dimGrid, dimBlock;
774  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
775  &dimGrid, &dimBlock);
776 
777  cuda_mul_cols_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
778  CU_SAFE_CALL(cudaGetLastError());
779 
780 
781  CuDevice::Instantiate().AccuProfile(__func__, tim);
782  } else
783 #endif
784  {
785  Mat().MulColsVec(scale.Vec());
786  }
787 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ MulElements()

void MulElements ( const CuMatrixBase< Real > &  A)

Multiplies this matrix elementwise by A: *this = *this .* A.

Definition at line 667 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), ElementwiseProductComponent::Backprop(), BackpropTruncationComponent::Backprop(), MaxpoolingComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), Dropout::BackpropagateFnc(), ScaleAndOffsetComponent::BackpropInternal(), kaldi::nnet1::ComputeStdDev(), LstmNonlinearityComponent::ConsolidateMemory(), CuMatrixBase< float >::DiffSoftmaxPerRow(), Mse::Eval(), ElementwiseProductComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixMulElements(), kaldi::nnet1::UnitTestLengthNorm(), AffineTransform::Update(), FramePoolingComponent::Update(), ConvolutionalComponent::Update(), Rescale::Update(), and NaturalGradientPerElementScaleComponent::Update().

667  {
668  #if HAVE_CUDA == 1
669  if (CuDevice::Instantiate().Enabled()) {
670  CuTimer tim;
671 
672  KALDI_ASSERT(num_cols_ == A.NumCols());
673  KALDI_ASSERT(num_rows_ == A.NumRows());
674 
675  dim3 dimGrid, dimBlock;
676  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
677  &dimGrid, &dimBlock);
678 
679  cuda_mul_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
680  CU_SAFE_CALL(cudaGetLastError());
681 
682  CuDevice::Instantiate().AccuProfile(__func__, tim);
683  } else
684  #endif
685  {
686  Mat().MulElements(A.Mat());
687  }
688 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ MulRows()

void MulRows ( const CuMatrixBase< Real > &  src,
const CuArrayBase< MatrixIndexT > &  indexes 
)

Does for each row r, this.Row(r) *= src.row(indexes[r]), where '*=' is elementwise multiplication.

If indexes[r] < 0, row r is left unchanged. src.NumCols() must equal this.NumCols().
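
The row-gather semantics, including the handling of negative indexes, can be sketched on the CPU as follows. MulRowsSketch is a hypothetical name and nested std::vector stands in for the matrix types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using RowMat = std::vector<std::vector<double>>;

// Hypothetical sketch: self.Row(r) *= src.Row(indexes[r]) elementwise;
// rows whose index is negative are skipped.
void MulRowsSketch(RowMat *self, const RowMat &src,
                   const std::vector<int> &indexes) {
  assert(self != nullptr && indexes.size() == self->size());
  for (std::size_t r = 0; r < self->size(); ++r) {
    const int src_r = indexes[r];
    if (src_r < 0) continue;  // negative index: leave this row unchanged
    assert(static_cast<std::size_t>(src_r) < src.size());
    const std::vector<double> &src_row = src[src_r];
    std::vector<double> &this_row = (*self)[r];
    assert(src_row.size() == this_row.size());  // column counts must match
    for (std::size_t c = 0; c < this_row.size(); ++c)
      this_row[c] *= src_row[c];
  }
}
```

Note that the same source row may be referenced by several destination rows, and source rows that no index mentions are simply unused.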

Definition at line 2790 of file cu-matrix.cc.

Referenced by GeneralDropoutComponent::Backprop(), and GeneralDropoutComponent::Propagate().

2791  {
2792  if (NumRows() == 0) return;
2793  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2794 #if HAVE_CUDA == 1
2795  if (CuDevice::Instantiate().Enabled()) {
2796  KALDI_ASSERT(src.NumCols() == NumCols());
2797  CuTimer tim;
2798  dim3 dimGrid, dimBlock;
2799  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2800  &dimGrid, &dimBlock);
2801  cuda_mul_rows(dimGrid, dimBlock,
2802  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
2803  CU_SAFE_CALL(cudaGetLastError());
2804  CuDevice::Instantiate().AccuProfile(__func__, tim);
2805  } else
2806 #endif
2807  {
2808  MatrixBase<Real> &this_mat(Mat());
2809  const MatrixBase<Real> &src_mat(src.Mat());
2810  int32 num_rows = NumRows();
2811  const MatrixIndexT *index_ptr = indexes.Data();
2812  for (int32 r = 0; r < num_rows; r++) {
2813  int32 src_r = index_ptr[r];
2814  if (src_r < 0)
2815  continue;
2816  SubVector<Real> this_row(this_mat, r),
2817  src_row(src_mat, src_r);
2818  this_row.MulElements(src_row);
2819  }
2820  }
2821 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
kaldi::int32 int32
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ MulRowsGroupMat()

void MulRowsGroupMat ( const CuMatrixBase< Real > &  src)

Divides each row into src.NumCols() groups, then scales the j'th group of elements in the i'th row by src(i, j).
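
In index terms, with group_size = this->NumCols() / src.NumCols(), element (i, c) is scaled by src(i, c / group_size). A CPU sketch under that reading follows; MulRowsGroupMatSketch is a hypothetical name, and the divisibility and row-count checks are assumptions added for safety rather than assertions taken from the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using GroupMat = std::vector<std::vector<double>>;

// Hypothetical sketch: self(i, c) *= src(i, c / group_size).
void MulRowsGroupMatSketch(GroupMat *self, const GroupMat &src) {
  assert(self != nullptr && self->size() == src.size());
  for (std::size_t i = 0; i < self->size(); ++i) {
    std::vector<double> &row = (*self)[i];
    const std::size_t num_groups = src[i].size();
    assert(num_groups > 0 && row.size() % num_groups == 0);
    const std::size_t group_size = row.size() / num_groups;
    for (std::size_t c = 0; c < row.size(); ++c)
      row[c] *= src[i][c / group_size];
  }
}
```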

Definition at line 816 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), MaxoutComponent::Backprop(), and kaldi::UnitTestCuMatrixMulRowsGroupMat().

816  {
817  KALDI_ASSERT(src.NumCols() > 0);
818 #if HAVE_CUDA == 1
819  if (CuDevice::Instantiate().Enabled()) {
820  CuTimer tim;
821  int group_size = this->NumCols() / src.NumCols();
822 
823  dim3 dimGrid, dimBlock;
824  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
825  &dimGrid, &dimBlock);
826 
827  cuda_mul_rows_group_mat(dimGrid, dimBlock, this->data_, src.data_,
828  this->Dim(), src.Stride(), group_size);
829  CU_SAFE_CALL(cudaGetLastError());
830 
831  CuDevice::Instantiate().AccuProfile(__func__, tim);
832  } else
833 #endif
834  {
835  Mat().MulRowsGroupMat(src.Mat());
836  }
837 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ MulRowsVec()

void MulRowsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th row by scale[i].
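
Row scaling multiplies element (r, c) by scale[r], which is the same as left-multiplying by diag(scale). A minimal sketch on nested vectors follows; MulRowsVecSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: mat(r, c) *= scale[r] for every column c.
void MulRowsVecSketch(std::vector<std::vector<double>> *mat,
                      const std::vector<double> &scale) {
  assert(mat != nullptr && mat->size() == scale.size());  // Dim() == NumRows()
  for (std::size_t r = 0; r < mat->size(); ++r) {
    for (double &v : (*mat)[r]) v *= scale[r];
  }
}
```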

Definition at line 792 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), BackpropTruncationComponent::Backprop(), SpecAugmentTimeMaskComponent::Backprop(), ClipGradientComponent::Backprop(), BlockSoftmax::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), Xent::Eval(), Mse::Eval(), kaldi::nnet2::PreconditionDirections(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), SpecAugmentTimeMaskComponent::Propagate(), LengthNormComponent::PropagateFnc(), OnlineNaturalGradient::ReorthogonalizeRt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::UnitTestCuMatrixMulRowsVec(), and kaldi::nnet3::time_height_convolution::ZeroBlankRows().

792  {
793  #if HAVE_CUDA == 1
794  if (CuDevice::Instantiate().Enabled()) {
795  CuTimer tim;
796 
797  KALDI_ASSERT(scale.Dim() == NumRows());
798 
799  dim3 dimGrid, dimBlock;
800  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
801  &dimGrid, &dimBlock);
802 
803  cuda_mul_rows_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
804  CU_SAFE_CALL(cudaGetLastError());
805 
806 
807  CuDevice::Instantiate().AccuProfile(__func__, tim);
808  } else
809  #endif
810  {
811  Mat().MulRowsVec(scale.Vec());
812  }
813 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ NumCols()

MatrixIndexT NumCols ( ) const
inline

Definition at line 216 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetLdaStatsAccumulator::AccStatsFromOutput(), RestrictedAttentionComponent::Add(), MatrixRandomizer::AddData(), CuVectorBase< float >::AddDiagMat2(), CuVectorBase< float >::AddDiagMatMat(), CuMatrixBase< float >::AddDiagVecMat(), CuRand< float >::AddGaussNoise(), CuMatrixBase< float >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< float >::AddMatBlock(), CuMatrixBase< float >::AddMatBlocks(), CuMatrixBase< float >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< float >::AddMatMat(), kaldi::AddMatMatBatched(), CuMatrixBase< float >::AddMatSmat(), CuVectorBase< float >::AddMatVec(), CuMatrixBase< float >::AddRowRanges(), CuMatrixBase< float >::AddRows(), CuVectorBase< float >::AddRowSumMat(), CuMatrixBase< float >::AddSmatMat(), CuMatrixBase< float >::AddToRows(), kaldi::nnet3::attention::ApplyScalesToInput(), kaldi::nnet3::attention::ApplyScalesToInputSimple(), kaldi::nnet3::attention::ApplyScalesToOutput(), kaldi::nnet3::attention::ApplyScalesToOutputSimple(), kaldi::nnet3::attention::AttentionBackward(), kaldi::nnet3::attention::AttentionForward(), NnetUpdater::Backprop(), ConvolutionComponent::Backprop(), BatchNormComponent::Backprop(), StatisticsExtractionComponent::Backprop(), MaxoutComponent::Backprop(), MaxpoolingComponent::Backprop(), PnormComponent::Backprop(), NormalizeComponent::Backprop(), RepeatedAffineComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), GeneralDropoutComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), DropoutComponent::Backprop(), PerElementOffsetComponent::Backprop(), Convolutional1dComponent::Backprop(), 
ScaleAndOffsetComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), Splice::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), RestrictedAttentionComponent::BackpropOneHead(), CuRand< float >::BinarizeProbs(), ChunkInfo::CheckSize(), ModelCollapser::CollapseComponentsAffine(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), kaldi::nnet3::ComputeAccuracy(), NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), OnlinePreconditioner::ComputeWt1(), OnlineNaturalGradient::ComputeWt1(), LstmNonlinearityComponent::ConsolidateMemory(), kaldi::nnet3::ConstrainOrthonormal(), kaldi::nnet3::ConstrainOrthonormalInternal(), ConvolutionComponent::ConvolutionComponent(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), kaldi::cu::Copy(), CuVectorBase< float >::CopyColFromMat(), CuVectorBase< float >::CopyDiagFromMat(), CuVectorBase< float >::CopyElements(), CuMatrixBase< float >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuCompressedMatrix< I >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< float >::CopyFromMat(), CuMatrixBase< float >::CopyRows(), CuVectorBase< float >::CopyRowsFromMat(), VectorBase< float >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuCompressedMatrix< I >::CopyToMat(), CuMatrix< float >::CuMatrix(), kaldi::CuRandGaussianMatrixBaseSpeedTest(), kaldi::CuRandGaussianMatrixSpeedTest(), kaldi::CuRandUniformMatrixBaseSpeedTest(), 
kaldi::CuRandUniformMatrixSpeedTest(), CuSubVector< Real >::CuSubVector(), CuTpMatrix< Real >::CuTpMatrix(), CuMatrixBase< float >::DiffGroupPnorm(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< float >::DivElements(), kaldi::cu::EnsureNonzero(), CuMatrixBase< float >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), NnetComputer::ExecuteCommand(), NnetBatchComputer::FormatOutputs(), kaldi::nnet3::attention::GetAttentionDotProducts(), kaldi::nnet3::attention::GetAttentionDotProductsSimple(), TdnnComponent::GetInputPart(), NnetComputer::GetPointers(), CuMatrixBase< float >::GroupMax(), CuMatrixBase< float >::GroupMaxDeriv(), CuMatrixBase< float >::GroupPnorm(), ConvolutionComponent::InderivPatchesToInderiv(), MaxpoolingComponent::InderivPatchesToInderiv(), RestrictedAttentionComponent::Info(), ConvolutionComponent::Init(), OnlinePreconditioner::Init(), OnlineNaturalGradient::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), Convolutional1dComponent::Init(), NaturalGradientAffineComponent::InitFromConfig(), OnlinePreconditioner::InitOrthonormalSpecial(), OnlineNaturalGradient::InitOrthonormalSpecial(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< float >::Max(), kaldi::MeanVariance(), kaldi::nnet3::MergeTaskOutput(), CuMatrixBase< float >::Min(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< float >::MulElements(), CuMatrixBase< float >::MulRows(), CuMatrixBase< float >::MulRowsGroupMat(), kaldi::nnet2::NnetComputationChunked(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), kaldi::operator<<(), CuMatrix< float >::operator=(), CuMatrixBase< float >::ParametricRelu(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), 
OnlineNaturalGradient::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlinePreconditioner::PreconditionDirectionsInternal(), OnlineNaturalGradient::PreconditionDirectionsInternal(), ModelCollapser::PreMultiplyAffineParameters(), kaldi::nnet3::PrintParameterStats(), NnetComputeProb::ProcessOutputs(), DistributeComponent::Propagate(), NormalizeComponent::Propagate(), Component::Propagate(), ConvolutionComponent::Propagate(), ElementwiseProductComponent::Propagate(), BatchNormComponent::Propagate(), StatisticsExtractionComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), DropoutMaskComponent::Propagate(), GeneralDropoutComponent::Propagate(), SpliceComponent::Propagate(), BlockAffineComponent::Propagate(), SumBlockComponent::Propagate(), DctComponent::Propagate(), DropoutComponent::Propagate(), PerElementOffsetComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Convolutional1dComponent::Propagate(), ScaleAndOffsetComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), FramePoolingComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), Dropout::PropagateFnc(), RestrictedAttentionComponent::PropagateOneHead(), kaldi::nnet1::RandGauss(), CuRand< float >::RandGaussian(), kaldi::cu::Randomize(), CuRand< float >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), Rbm::Reconstruct(), kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeRt1(), OnlinePreconditioner::ReorthogonalizeXt1(), RectifiedLinearComponent::RepairGradients(), NnetRescaler::RescaleComponent(), CuMatrixBase< float >::Row(), kaldi::nnet3::RunNnetComputation(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), kaldi::cu::Splice(), kaldi::nnet3::utterance_splitting::SplitInputToTasks(), 
NnetBatchComputer::SplitUtteranceIntoTasks(), NonlinearComponent::StoreBackpropStats(), RestrictedAttentionComponent::StoreStats(), BatchNormComponent::StoreStats(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), NonlinearComponent::StoreStatsInternal(), PdfPrior::SubtractOnLogpost(), kaldi::nnet3::attention::TestAttentionForwardBackward(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), ConvolutionComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), Convolutional1dComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), TdnnComponent::UpdateNaturalGradient(), TimeHeightConvolutionComponent::UpdateSimple(), TdnnComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), and kaldi::VecMatVec().

216 { return num_cols_; }

◆ NumRows()

MatrixIndexT NumRows ( ) const
inline

Dimensions.

Definition at line 215 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetLdaStatsAccumulator::AccStatsFromOutput(), RestrictedAttentionComponent::Add(), CuMatrixBase< float >::AddCols(), CuVectorBase< float >::AddColSumMat(), MatrixRandomizer::AddData(), CuVectorBase< float >::AddDiagMat2(), CuVectorBase< float >::AddDiagMatMat(), CuMatrixBase< float >::AddDiagVecMat(), CuRand< float >::AddGaussNoise(), CuMatrixBase< float >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< float >::AddMatBlock(), CuMatrixBase< float >::AddMatBlocks(), CuMatrixBase< float >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< float >::AddMatMat(), kaldi::AddMatMatBatched(), CuMatrixBase< float >::AddMatSmat(), CuVectorBase< float >::AddMatVec(), kaldi::nnet3::utterance_splitting::AddOnlineIvectorsToTasks(), CuMatrixBase< float >::AddSmatMat(), AffineComponent::AffineComponent(), kaldi::nnet3::attention::ApplyScalesToInput(), kaldi::nnet3::attention::ApplyScalesToInputSimple(), kaldi::nnet3::attention::ApplyScalesToOutput(), kaldi::nnet3::attention::ApplyScalesToOutputSimple(), kaldi::nnet3::attention::AttentionBackward(), kaldi::nnet3::attention::AttentionForward(), DistributeComponent::Backprop(), NnetUpdater::Backprop(), RestrictedAttentionComponent::Backprop(), ConvolutionComponent::Backprop(), ElementwiseProductComponent::Backprop(), BatchNormComponent::Backprop(), StatisticsExtractionComponent::Backprop(), LstmNonlinearityComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxoutComponent::Backprop(), TdnnComponent::Backprop(), MaxpoolingComponent::Backprop(), BackpropTruncationComponent::Backprop(), PnormComponent::Backprop(), NormalizeComponent::Backprop(), RepeatedAffineComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), AffineComponent::Backprop(), 
GeneralDropoutComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), SumGroupComponent::Backprop(), PermuteComponent::Backprop(), ClipGradientComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), DropoutComponent::Backprop(), PerElementOffsetComponent::Backprop(), Convolutional1dComponent::Backprop(), ScaleAndOffsetComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), HiddenSoftmax::BackpropagateFnc(), Splice::BackpropagateFnc(), SimpleSentenceAveragingComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), RestrictedAttentionComponent::BackpropOneHead(), CuRand< float >::BinarizeProbs(), ChunkInfo::CheckSize(), ModelCollapser::CollapseComponentsAffine(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), NnetBatchComputer::Compute(), kaldi::nnet3::ComputeAccuracy(), NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), BatchedXvectorComputer::ComputeOneBatch(), kaldi::nnet1::ComputeStdDev(), NnetUpdater::ComputeTotAccuracy(), LstmNonlinearityComponent::ConsolidateMemory(), kaldi::nnet3::ConstrainOrthonormal(), kaldi::nnet3::ConstrainOrthonormalInternal(), Convolutional1dComponent::Convolutional1dComponent(), ConvolutionComponent::ConvolutionComponent(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), 
kaldi::cu::Copy(), CuVectorBase< float >::CopyColFromMat(), CuMatrixBase< float >::CopyCols(), CuVectorBase< float >::CopyDiagFromMat(), CuVectorBase< float >::CopyElements(), CuMatrixBase< float >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuCompressedMatrix< I >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< float >::CopyFromMat(), CuVectorBase< float >::CopyRowsFromMat(), VectorBase< float >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuCompressedMatrix< I >::CopyToMat(), CuMatrix< float >::CuMatrix(), kaldi::CuRandGaussianMatrixBaseSpeedTest(), kaldi::CuRandGaussianMatrixSpeedTest(), kaldi::CuRandUniformMatrixBaseSpeedTest(), kaldi::CuRandUniformMatrixSpeedTest(), CuTpMatrix< Real >::CuTpMatrix(), DecodableAmNnet::DecodableAmNnet(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< float >::DiffSoftmaxPerRow(), CuMatrixBase< float >::DivElements(), kaldi::cu::EnsureNonzero(), CuMatrixBase< float >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), NnetBatchComputer::FormatOutputs(), kaldi::nnet3::attention::GetAttentionDotProducts(), kaldi::nnet3::attention::GetAttentionDotProductsSimple(), TdnnComponent::GetInputPart(), NnetComputer::GetPointers(), AffineComponentPreconditioned::GetScalingFactor(), CuMatrixBase< float >::GroupMax(), CuMatrixBase< float >::GroupPnorm(), ConvolutionComponent::Init(), OnlinePreconditioner::Init(), OnlineNaturalGradient::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), Convolutional1dComponent::Init(), NaturalGradientAffineComponent::InitFromConfig(), FixedAffineComponent::InitFromConfig(), FixedLinearComponent::InitFromString(), FixedAffineComponent::InitFromString(), OnlinePreconditioner::InitOrthonormalSpecial(), OnlineNaturalGradient::InitOrthonormalSpecial(), 
NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< float >::Max(), kaldi::MeanVariance(), kaldi::nnet3::MergeTaskOutput(), CuMatrixBase< float >::Min(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< float >::MulElements(), NaturalGradientAffineComponent::NaturalGradientAffineComponent(), kaldi::nnet2::NnetComputationChunked(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), kaldi::operator<<(), CuMatrix< float >::operator=(), CuMatrixBase< float >::ParametricRelu(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlinePreconditioner::PreconditionDirectionsInternal(), OnlineNaturalGradient::PreconditionDirectionsInternal(), ModelCollapser::PreMultiplyAffineParameters(), kaldi::nnet3::PrintParameterStats(), SingleUtteranceNnet2DecoderThreaded::ProcessLoglikes(), DistributeComponent::Propagate(), NnetOnlineComputer::Propagate(), NormalizeComponent::Propagate(), DropoutComponent::Propagate(), RestrictedAttentionComponent::Propagate(), Component::Propagate(), ConvolutionComponent::Propagate(), ElementwiseProductComponent::Propagate(), BatchNormComponent::Propagate(), StatisticsExtractionComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), TdnnComponent::Propagate(), MaxpoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), DropoutMaskComponent::Propagate(), GeneralDropoutComponent::Propagate(), SpliceComponent::Propagate(), BlockAffineComponent::Propagate(), SumBlockComponent::Propagate(), DctComponent::Propagate(), PerElementOffsetComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Convolutional1dComponent::Propagate(), ScaleAndOffsetComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), SimpleSentenceAveragingComponent::PropagateFnc(), 
RecurrentComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), RestrictedAttentionComponent::PropagateOneHead(), kaldi::nnet1::RandGauss(), CuRand< float >::RandGaussian(), kaldi::cu::Randomize(), CuRand< float >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), Rbm::Reconstruct(), kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeRt1(), OnlinePreconditioner::ReorthogonalizeXt1(), RectifiedLinearComponent::RepairGradients(), ClipGradientComponent::RepairGradients(), NnetRescaler::RescaleComponent(), SingleUtteranceNnet2DecoderThreaded::RunNnetEvaluationInternal(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), kaldi::cu::Splice(), kaldi::nnet3::utterance_splitting::SplitInputToTasks(), NnetBatchComputer::SplitUtteranceIntoTasks(), NonlinearComponent::StoreBackpropStats(), RestrictedAttentionComponent::StoreStats(), BatchNormComponent::StoreStats(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), NonlinearComponent::StoreStatsInternal(), CuMatrixBase< float >::SumColumnRanges(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCheck(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::nnet3::UnitTestPreconditionDirectionsOnline(), kaldi::nnet2::UnitTestPreconditionDirectionsOnline(), kaldi::nnet1::UnitTestSimpleSentenceAveragingComponent(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), ConvolutionComponent::Update(), LinearTransform::Update(), AffineTransform::Update(), SentenceAveragingComponent::Update(), RecurrentComponent::Update(), RepeatedAffineComponent::Update(), 
NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), Convolutional1dComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), TdnnComponent::UpdateNaturalGradient(), TimeHeightConvolutionComponent::UpdateSimple(), TdnnComponent::UpdateSimple(), BlockAffineComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), kaldi::VecMatVec(), and kaldi::nnet3::time_height_convolution::ZeroBlankRows().

215 { return num_rows_; }

◆ operator()() [1/2]

CuValue<Real> operator() ( MatrixIndexT  r,
MatrixIndexT  c 
)
inline

Definition at line 682 of file cu-matrix.h.

682  {
683  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
684  static_cast<UnsignedMatrixIndexT>(num_rows_) &&
685  static_cast<UnsignedMatrixIndexT>(c) <
686  static_cast<UnsignedMatrixIndexT>(num_cols_));
687  return CuValue<Real>(data_ + r * stride_ + c);
688  }
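
The accessor above combines two ideas: row-major addressing with a row stride (element (r, c) lives at data_ + r * stride_ + c), and a single unsigned comparison per index that rejects negative and too-large values alike. The following is a minimal standalone sketch of both on plain host memory. `StridedView`, `InBounds`, and `At` are hypothetical names used for illustration only and are not part of Kaldi; the sketch also returns a plain reference rather than a CuValue proxy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the (data_, num_rows_, num_cols_, stride_) members.
struct StridedView {
  float *data;
  int32_t num_rows;
  int32_t num_cols;
  int32_t stride;  // elements between the starts of consecutive rows
};

// Casting to unsigned makes a negative index wrap to a very large value,
// so one comparison covers both "index < 0" and "index >= dim".
inline bool InBounds(int32_t index, int32_t dim) {
  return static_cast<uint32_t>(index) < static_cast<uint32_t>(dim);
}

// Row-major element access: element (r, c) is at data + r * stride + c.
inline float &At(const StridedView &m, int32_t r, int32_t c) {
  assert(InBounds(r, m.num_rows) && InBounds(c, m.num_cols));
  return m.data[static_cast<std::size_t>(r) * m.stride + c];
}
```

When the stride is larger than the column count, the extra elements at the end of each row are padding and are never addressed through `At`.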

◆ operator()() [2/2]

Real operator() ( MatrixIndexT  r,
MatrixIndexT  c 
) const
inline

Definition at line 690 of file cu-matrix.h.

690  {
691  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
692  static_cast<UnsignedMatrixIndexT>(num_rows_) &&
693  static_cast<UnsignedMatrixIndexT>(c) <
694  static_cast<UnsignedMatrixIndexT>(num_cols_));
695  return CuValue<Real>(data_ + r * stride_ + c); // will be casted to Real.
696  }

◆ ParametricRelu()

void ParametricRelu ( const CuMatrixBase< Real > &  src,
const CuVectorBase< Real > &  alpha,
const CuVectorBase< Real > &  beta 
)

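Computes the parametric rectified linear unit function element by element: (*this)(r, c) = src(r, c) * (src(r, c) > 0 ? alpha(c) : beta(c)). alpha and beta hold one value per column, so each must have dimension NumCols(), and src must have the same dimensions as *this.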

Definition at line 1467 of file cu-matrix.cc.

Referenced by ParametricRelu::PropagateFnc(), and CuMatrixBase< float >::SizeInBytes().

1470  {
1471  KALDI_ASSERT(src.NumRows() == this->NumRows());
1472  KALDI_ASSERT(src.NumCols() == this->NumCols());
1473  KALDI_ASSERT(alpha.Dim() == this->NumCols());
1474  KALDI_ASSERT(beta.Dim() == this->NumCols());
1475 #if HAVE_CUDA == 1
1476  if (CuDevice::Instantiate().Enabled()) {
1477  CuTimer tim;
1478 
1479  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
1480  dim3 dimGrid(n_blocks(src.NumCols(), CU2DBLOCK), n_blocks(src.NumRows(), CU2DBLOCK));
1481 
1482  cuda_parametric_relu(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1483  src.Stride(), alpha.data_, beta.data_);
1484  CU_SAFE_CALL(cudaGetLastError());
1485 
1486  CuDevice::Instantiate().AccuProfile(__func__, tim);
1487  } else
1488 #endif
1489  {
1490  // Do it on CPU,
1491  for (MatrixIndexT r = 0; r < NumRows(); r++) {
1492  for (MatrixIndexT c = 0; c < NumCols(); c++) {
1493  Real src_elem = src.Mat()(r,c);
1494  this->Mat()(r,c) = src_elem * (src_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
1495  }
1496  }
1497  }
1498 }
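
The CPU branch above can be read as a per-column choice of slope. The following is a minimal standalone sketch of that computation, assuming dense row-major storage with no row padding. `ParametricReluRef` is a hypothetical helper written for illustration; it is not a Kaldi function and does not cover the CUDA path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for the element-wise rule:
//   out(r, c) = src(r, c) * (src(r, c) >= 0 ? alpha[c] : beta[c])
// Matrices are dense and row-major (stride == num_cols) in this sketch.
std::vector<float> ParametricReluRef(const std::vector<float> &src,
                                     std::size_t num_rows,
                                     std::size_t num_cols,
                                     const std::vector<float> &alpha,
                                     const std::vector<float> &beta) {
  assert(src.size() == num_rows * num_cols);
  assert(alpha.size() == num_cols && beta.size() == num_cols);
  std::vector<float> out(src.size());
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (std::size_t c = 0; c < num_cols; ++c) {
      const float x = src[r * num_cols + c];
      // The slope depends on the column and on the sign of the input.
      out[r * num_cols + c] = x * (x >= 0.0f ? alpha[c] : beta[c]);
    }
  }
  return out;
}
```

Whether the comparison is written as > 0 or >= 0 makes no difference to the result, because an input of exactly zero yields zero with either slope.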

◆ Pow()

void Pow ( const CuMatrixBase< Real > &  src,
Real  power 
)

Definition at line 2500 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyPow(), and CuMatrixBase< float >::SizeInBytes().

2500  {
2501  KALDI_ASSERT(SameDim(*this, src));
2502 #if HAVE_CUDA == 1
2503  if (CuDevice::Instantiate().Enabled()) {
2504  CuTimer tim;
2505  dim3 dimGrid, dimBlock;
2506  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2507  &dimGrid, &dimBlock);
2508  cuda_pow(dimGrid, dimBlock, this->data_, src.data_, power, this->Dim(),
2509  src.Stride());
2510  CU_SAFE_CALL(cudaGetLastError());
2511 
2512  CuDevice::Instantiate().AccuProfile(__func__, tim);
2513  } else
2514  #endif
2515  {
2516  Mat().Pow(src.Mat(), power);
2517  }
2518 }

◆ PowAbs()

void PowAbs ( const CuMatrixBase< Real > &  src,
Real  power,
bool  include_sign = false 
)

Apply power to the absolute value of each element.

If include_sign is true, the result is multiplied by the sign of the input value. If the power is negative and the input is zero, the output is set to zero.

Definition at line 2521 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyPowAbs(), and CuMatrixBase< float >::SizeInBytes().

2521  {
2522  KALDI_ASSERT(SameDim(*this, src));
2523 #if HAVE_CUDA == 1
2524  if (CuDevice::Instantiate().Enabled()) {
2525  CuTimer tim;
2526  dim3 dimGrid, dimBlock;
2527  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2528  &dimGrid, &dimBlock);
2529  cuda_pow_abs(dimGrid, dimBlock, this->data_, src.data_, power, include_sign,
2530  this->Dim(), src.Stride());
2531  CU_SAFE_CALL(cudaGetLastError());
2532  CuDevice::Instantiate().AccuProfile(__func__, tim);
2533  } else
2534 #endif
2535  {
2536  Mat().PowAbs(src.Mat(), power, include_sign);
2537  }
2538 }
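
The per-element rule described for PowAbs() can be sketched as a scalar helper. This follows the prose description only: `PowAbsElem` is a hypothetical name, and neither the CUDA kernel nor MatrixBase::PowAbs() is shown here, so their behaviour in edge cases may differ from this sketch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar helper following the documented behaviour:
//   |x|^power, optionally multiplied by the sign of x,
//   with 0 returned when x == 0 and power < 0.
inline float PowAbsElem(float x, float power, bool include_sign) {
  if (x == 0.0f && power < 0.0f) return 0.0f;  // documented special case
  const float y = std::pow(std::fabs(x), power);
  return (include_sign && x < 0.0f) ? -y : y;
}
```

Without the special case, a zero input with a negative power would evaluate to infinity under std::pow.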

◆ Range()

◆ Row() [1/2]

const CuSubVector<Real> Row ( MatrixIndexT  i) const
inline

Definition at line 670 of file cu-matrix.h.

Referenced by kaldi::nnet3::utterance_splitting::AddOnlineIvectorsToTasks(), DecodableNnetSimpleLooped::AdvanceChunk(), MultiBasisComponent::BackpropagateFnc(), NnetOnlineComputer::Compute(), CuMatrixBase< float >::CopyRangeFromMatClamped(), DecodableNnetSimple::DoNnetComputation(), NnetBatchComputer::FormatInputs(), kaldi::nnet2::NnetComputationChunked(), NnetComputer::NnetComputer(), RecurrentComponent::PropagateFnc(), kaldi::nnet3::RunNnetComputation(), kaldi::UnitTestCuMatrixAddDiagVecMat(), kaldi::nnet2::UnitTestGenericComponentInternal(), kaldi::UnitTestLstmNonlinearity(), kaldi::nnet1::UnitTestSimpleSentenceAveragingComponent(), NaturalGradientRepeatedAffineComponent::Update(), and TimeHeightConvolutionComponent::UpdateNaturalGradient().

670  {
671  KALDI_ASSERT(static_cast<UnsignedMatrixIndexT>(i) <
672  static_cast<UnsignedMatrixIndexT>(num_rows_));
673  return CuSubVector<Real>(data_ + (i * stride_), NumCols());
674  }
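
Row() returns a view rather than a copy: the sub-vector is built from a pointer to the start of row i and the column count. The following is a minimal sketch of that idea on host memory. `RowView` and `RowOf` are hypothetical names for illustration and are not Kaldi types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of one row: a pointer plus a length.
struct RowView {
  float *data;
  int dim;
  float &operator()(int c) const {
    assert(c >= 0 && c < dim);
    return data[c];
  }
};

// Mirrors the pointer arithmetic above: row i starts at data + i * stride,
// and the view spans num_cols elements (padding is excluded).
inline RowView RowOf(float *data, int num_rows, int num_cols, int stride,
                     int i) {
  assert(i >= 0 && i < num_rows);
  return RowView{data + static_cast<std::ptrdiff_t>(i) * stride, num_cols};
}
```

Because the view does not own its memory, writes through it modify the underlying matrix, and it must not outlive the matrix it refers to.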

◆ Row() [2/2]

CuSubVector<Real> Row ( MatrixIndexT  i)
inline

Definition at line 676 of file cu-matrix.h.

676  {
677  KALDI_ASSERT(static_cast<UnsignedMatrixIndexT>(i) <
678  static_cast<UnsignedMatrixIndexT>(num_rows_));
679  return CuSubVector<Real>(data_ + (i * stride_), NumCols());
680  }

◆ RowData() [1/2]

◆ RowData() [2/2]

Real* RowData ( MatrixIndexT  r)
inline

Get raw row pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 743 of file cu-matrix.h.

743 { return data_ + r * stride_; }

◆ RowRange()

◆ Scale()

void Scale ( Real  value)

Definition at line 644 of file cu-matrix.cc.

Referenced by DecodableNnetLoopedOnlineBase::AdvanceChunk(), DecodableNnetSimpleLooped::AdvanceChunk(), CuMatrixBase< float >::ApplyLog(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), ScaleComponent::Backprop(), NoOpComponent::Backprop(), AveragePoolingComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), NnetBatchComputer::Compute(), DecodableNnet2Online::ComputeForFrame(), LstmNonlinearityComponent::ConsolidateMemory(), kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), DecodableNnetSimple::DoNnetComputation(), NnetComputer::ExecuteCommand(), GeneralDropoutComponent::GetMemo(), main(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), SingleUtteranceNnet2DecoderThreaded::ProcessLoglikes(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), BackpropTruncationComponent::Propagate(), ScaleComponent::Propagate(), DropoutMaskComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), RestrictedAttentionComponent::Scale(), TanhComponent::StoreStats(), kaldi::nnet3::attention::TestAttentionForwardBackward(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixScale(), and kaldi::UnitTestLstmNonlinearity().

644  {
645 #if HAVE_CUDA == 1
646  if (CuDevice::Instantiate().Enabled()) {
647  if (num_rows_ == 0) return;
648  CuTimer tim;
649 
650  dim3 dimGrid, dimBlock;
651  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
652  &dimGrid, &dimBlock);
653 
654  cuda_scale(dimGrid, dimBlock, data_, value, Dim());
655  CU_SAFE_CALL(cudaGetLastError());
656 
657  CuDevice::Instantiate().AccuProfile(__func__, tim);
658  } else
659 #endif
660  {
661  Mat().Scale(value);
662  }
663 }
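
Scale() shows the dispatch layout shared by most methods on this page: the GPU branch and its trailing `else` sit inside the preprocessor guard, so a build without CUDA compiles only the braced CPU block, while a CUDA build uses that same block as the `else` branch. The following is a minimal sketch of the layout. `SKETCH_HAVE_GPU`, `GpuEnabled`, and `ScaleSketch` are hypothetical stand-ins for illustration, and no real kernel is launched.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for HAVE_CUDA; set to 1 to compile the "GPU" branch.
#ifndef SKETCH_HAVE_GPU
#define SKETCH_HAVE_GPU 0
#endif

// Hypothetical stand-in for CuDevice::Instantiate().Enabled().
inline bool GpuEnabled() { return false; }

// Scales a num_rows x num_cols row-major matrix whose rows are `stride`
// elements apart, leaving any padding between rows untouched.
void ScaleSketch(float *data, int num_rows, int num_cols, int stride,
                 float value) {
#if SKETCH_HAVE_GPU == 1
  if (GpuEnabled()) {
    if (num_rows == 0) return;
    // A real build would launch a kernel here; this sketch has none.
  } else
#endif
  {
    // CPU fallback: the only code compiled when SKETCH_HAVE_GPU is 0.
    for (int r = 0; r < num_rows; ++r)
      for (int c = 0; c < num_cols; ++c)
        data[r * stride + c] *= value;
  }
}
```

The design keeps a single copy of the CPU code path regardless of how the library was built, at the cost of a slightly unusual `else` that straddles the `#endif`.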

◆ Set()

void Set ( Real  value)

Definition at line 531 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), NnetComputer::ExecuteCommand(), MaxpoolingComponent::Propagate(), DropoutMaskComponent::Propagate(), MaxPoolingComponent::PropagateFnc(), SigmoidComponent::StoreStats(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixSet(), kaldi::nnet1::UnitTestDropoutComponent(), and kaldi::nnet1::UnitTestMaxPoolingComponent().

531  {
532  #if HAVE_CUDA == 1
533  if (CuDevice::Instantiate().Enabled()) {
534  if (num_rows_ == 0) return;
535  CuTimer tim;
536 
537  dim3 dimGrid, dimBlock;
538  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
539  &dimGrid, &dimBlock);
540 
541  cuda_set_const(dimGrid, dimBlock, data_, value, Dim());
542  CU_SAFE_CALL(cudaGetLastError());
543 
544  CuDevice::Instantiate().AccuProfile(__func__, tim);
545  } else
546  #endif
547  {
548  Mat().Set(value);
549  }
550 }

◆ SetMatMatDivMat()

void SetMatMatDivMat ( const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
const CuMatrixBase< Real > &  C 
)

Sets *this = A * B / C element by element; where an element of C is zero, the corresponding element of *this is set to that of A. *this may safely alias A, B, or C and still give the expected result.

Definition at line 1206 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), and DropoutComponent::Backprop().

1207  {
1208 #if HAVE_CUDA == 1
1209  if (CuDevice::Instantiate().Enabled()) {
1210  CuTimer tim;
1211 
1212  KALDI_ASSERT(num_rows_ == A.num_rows_ && num_cols_ == A.num_cols_);
1213  KALDI_ASSERT(num_rows_ == B.num_rows_ && num_cols_ == B.num_cols_);
1214  KALDI_ASSERT(num_rows_ == C.num_rows_ && num_cols_ == C.num_cols_);
1215  if (num_rows_ == 0) return;
1216  dim3 dimGrid, dimBlock;
1217  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1218  &dimGrid, &dimBlock);
1219  cuda_set_mat_mat_div_mat(dimGrid, dimBlock, A.data_, B.data_, C.data_,
1220  data_, Dim(), A.Stride(), B.Stride(), C.Stride());
1221  CU_SAFE_CALL(cudaGetLastError());
1222 
1223  CuDevice::Instantiate().AccuProfile(__func__, tim);
1224  } else
1225 #endif
1226  {
1227  Mat().SetMatMatDivMat(A.Mat(), B.Mat(), C.Mat());
1228  }
1229 }
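
The element-wise rule and the aliasing guarantee can be illustrated with a small host-side sketch. `SetMatMatDivMatRef` is a hypothetical helper operating on flat vectors; it is not the Kaldi implementation, and it assumes all four arrays have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: dst[i] = a[i] * b[i] / c[i], or a[i] when c[i] == 0.
void SetMatMatDivMatRef(const std::vector<float> &a,
                        const std::vector<float> &b,
                        const std::vector<float> &c,
                        std::vector<float> *dst) {
  assert(a.size() == b.size() && a.size() == c.size() &&
         a.size() == dst->size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Read all three inputs before writing, so dst may alias a, b, or c.
    const float ai = a[i], bi = b[i], ci = c[i];
    (*dst)[i] = (ci == 0.0f) ? ai : ai * bi / ci;
  }
}
```

Aliasing is safe in this sketch because each output element depends only on the input elements at the same position, and those are read into locals before the write.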

◆ SetRandn()

void SetRandn ( )

Definition at line 3132 of file cu-matrix.cc.

Referenced by LstmNonlinearityComponent::ConsolidateMemory(), kaldi::CuCompressedMatrixTestSign(), kaldi::CuVectorUnitTestAddColSumMat(), kaldi::CuVectorUnitTestAddDiagMat2(), kaldi::CuVectorUnitTestAddDiagMatMat(), kaldi::CuVectorUnitTestAddMatVec(), kaldi::CuVectorUnitTestAddRowSumMat(), kaldi::CuVectorUnitTestCopyElements(), kaldi::CuVectorUnitTestCopyFromMat(), FixedAffineComponent::InitFromConfig(), CuMatrixBase< float >::operator()(), ConvolutionComponent::PerturbParams(), TimeHeightConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), TdnnComponent::PerturbParams(), RepeatedAffineComponent::PerturbParams(), AffineComponent::PerturbParams(), LinearComponent::PerturbParams(), BlockAffineComponent::PerturbParams(), Convolutional1dComponent::PerturbParams(), kaldi::nnet3::attention::TestAttentionForwardBackward(), kaldi::TestCuMatrixAddMat(), kaldi::TestCuMatrixAddMatBlocks(), kaldi::TestCuMatrixAddRowRanges(), kaldi::TestCuMatrixAddRows1(), kaldi::TestCuMatrixAddRows2(), kaldi::TestCuMatrixAddToRows(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::TestCuMatrixCopyRows1(), kaldi::TestCuMatrixCopyRows2(), kaldi::TestCuMatrixCopyToRows(), kaldi::TestCuMatrixCopyUpperToLower(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixDiffLogSoftmax(), kaldi::TestCuMatrixDiffSoftmax(), kaldi::TestCuMatrixDivRowsVec(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), kaldi::TestCuMatrixHeaviside(), kaldi::TestCuMatrixLogSoftmax(), kaldi::TestCuMatrixLookup(), kaldi::TestCuMatrixMatMatBatched(), kaldi::TestCuMatrixMax(), kaldi::TestCuMatrixMin(), kaldi::TestCuMatrixMulRowsGroupMat(), kaldi::TestCuMatrixSetZeroAboveDiag(), kaldi::TestCuMatrixSigmoid(), kaldi::TestCuMatrixSoftmax(), kaldi::TestCuMatrixSum(), kaldi::TestCuMatrixTraceMatMat(), kaldi::TestCuMatrixTransposeNS(), kaldi::TestCuMatrixTransposeS(), kaldi::TestCuSparseMatrixTraceMatSmat(), kaldi::TestCuVectorAddColSumMat(), 
kaldi::TestCuVectorAddDiagMat2(), kaldi::TestCuVectorAddDiagMat2OnVariousShapes(), kaldi::TestCuVectorAddDiagMatMat(), kaldi::TestCuVectorAddDiagMatMatShape(), kaldi::TestCuVectorAddRowSumMat(), kaldi::nnet3::time_height_convolution::TestDataBackprop(), kaldi::nnet3::time_height_convolution::TestParamsBackprop(), kaldi::nnet3::time_height_convolution::TestRunningComputation(), kaldi::nnet3::TestSimpleComponentDataDerivative(), kaldi::nnet3::TestSimpleComponentModelDerivative(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::nnet3::attention::UnitTestAttentionDotProductAndAddScales(), kaldi::UnitTestBackpropLstmNonlinearity(), kaldi::UnitTestCuApproxEqual(), kaldi::UnitTestCuBlockMatrixAddMatBlock(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuDiffNormalizePerRow(), kaldi::UnitTestCuMathComputeLstmNonlinearity(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuMatrixAddCols(), kaldi::UnitTestCuMatrixAddDiagVecMat(), kaldi::UnitTestCuMatrixAddElements(), kaldi::UnitTestCuMatrixAddMatBlocks1(), kaldi::UnitTestCuMatrixAddMatBlocks1Trans(), kaldi::UnitTestCuMatrixAddMatBlocks2(), kaldi::UnitTestCuMatrixAddRows(), kaldi::UnitTestCuMatrixAddToElements(), kaldi::UnitTestCuMatrixAddToRows(), kaldi::UnitTestCuMatrixCopyCols(), kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuMatrixCopyRows(), kaldi::UnitTestCuMatrixCopyToRows(), kaldi::UnitTestCuMatrixCopyUpperToLower(), kaldi::UnitTestCuMatrixIO(), kaldi::UnitTestCuMatrixLookup(), kaldi::UnitTestCuMatrixMulRows(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixReduceMax(), kaldi::UnitTestCuMatrixReduceMin(), kaldi::UnitTestCuMatrixReduceSum(), kaldi::UnitTestCuMatrixSetMatMatDivMat(), kaldi::UnitTestCuMatrixSetRandn(), kaldi::UnitTestCuMatrixSetZeroAboveDiag(), 
kaldi::UnitTestCuMatrixSymAddMat2(), kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuMatrixTraceMatMat(), kaldi::UnitTestCuMatrixTranspose(), kaldi::UnitTestCuSparseMatrixFrobeniusNorm(), kaldi::UnitTestCuSparseMatrixSum(), kaldi::UnitTestCuSparseMatrixTraceMatSmat(), kaldi::UnitTestCuSpMatrixCopyFromMat(), kaldi::UnitTestCuSubMatrix(), kaldi::UnitTestCuTpMatrixCopyFromMat(), kaldi::UnitTestEnsureNonzero(), kaldi::nnet2::UnitTestFixedAffineComponent(), kaldi::nnet2::UnitTestFixedLinearComponent(), kaldi::nnet2::UnitTestGenericComponentInternal(), kaldi::UnitTestMatrix(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::nnet3::UnitTestNnetCompute(), kaldi::nnet2::UnitTestNnetComputeChunked(), kaldi::nnet3::UnitTestNnetInputDerivatives(), kaldi::nnet3::UnitTestNnetModelDerivatives(), kaldi::nnet3::UnitTestNnetOptimizeWithOptions(), kaldi::nnet2::UnitTestPreconditionDirections(), and kaldi::UnitTestVecMatVec().

3132  {
3133  if (num_rows_ == 0) return;
3134 #if HAVE_CUDA == 1
3135  if (CuDevice::Instantiate().Enabled()) {
3136  CuRand<Real> tmp;
3137  tmp.RandGaussian(this);
3138  } else
3139 #endif
3140  {
3141  Mat().SetRandn();
3142  }
3143 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
friend class CuRand< Real >
Definition: cu-matrix.h:91
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ SetRandUniform()

void SetRandUniform ( )

Definition at line 3146 of file cu-matrix.cc.

Referenced by kaldi::CuCompressedMatrixTestNonnegative(), kaldi::CuCompressedMatrixTestSymmetric(), CuMatrixBase< float >::operator()(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixSetRandUniform(), and kaldi::UnitTestCuMatrixTraceMatMat().

3146  {
3147  if (num_rows_ == 0) return;
3148 #if HAVE_CUDA == 1
3149  if (CuDevice::Instantiate().Enabled()) {
3150  CuRand<Real> tmp;
3151  tmp.RandUniform(this);
3152  } else
3153 #endif
3154  {
3155  Mat().SetRandUniform();
3156  }
3157 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
friend class CuRand< Real >
Definition: cu-matrix.h:91
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ SetZero()

void SetZero ( )

Math operations, some calling kernels.

Definition at line 509 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), DistributeComponent::Backprop(), StatisticsExtractionComponent::Backprop(), ClipGradientComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), CopyComponent::BackpropagateFnc(), LstmNonlinearityComponent::ConsolidateMemory(), CuSparseMatrix< Real >::CopyToMat(), NnetComputer::ExecuteCommand(), OnlinePreconditioner::InitOrthonormalSpecial(), OnlineNaturalGradient::InitOrthonormalSpecial(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), AveragePoolingComponent::PropagateFnc(), FramePoolingComponent::PropagateFnc(), kaldi::nnet3::attention::TestAttentionForwardBackward(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixSetMatMatDivMat(), and RestrictedAttentionComponent::ZeroStats().

509  {
510 #if HAVE_CUDA == 1
511  if (CuDevice::Instantiate().Enabled()) {
512  CuTimer tim;
513  CU_SAFE_CALL(cudaMemset2DAsync(data_, stride_ * sizeof(Real), 0,
514  num_cols_ * sizeof(Real), num_rows_ ,
515  cudaStreamPerThread));
516  CuDevice::Instantiate().AccuProfile("CuMatrix::SetZero", tim);
517  } else
518 #endif
519  {
520  Mat().SetZero();
521  }
522 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT stride_
Definition: cu-matrix.h:787
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ SetZeroAboveDiag()

void SetZeroAboveDiag ( )

Zeroes all elements for which col > row.
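As a rough illustration of the rule above, the following standalone sketch applies it to a plain row-major buffer. `ZeroAboveDiagSketch` and its parameters are hypothetical names used only here; this is not Kaldi's code, and it assumes consecutive rows are `stride` elements apart, as in the CPU branch of the listing below.

```cpp
#include <cassert>

// Hypothetical helper (not part of Kaldi): zero every element with col > row
// in a row-major buffer whose rows are `stride` elements apart.
template <typename Real>
void ZeroAboveDiagSketch(Real *data, int num_rows, int num_cols, int stride) {
  assert(stride >= num_cols);
  for (int r = 0; r < num_rows; ++r) {
    // Columns r + 1 .. num_cols - 1 lie strictly above the diagonal.
    for (int c = r + 1; c < num_cols; ++c)
      data[r * stride + c] = Real(0);
  }
}
```

Padding elements beyond `num_cols` in each row are left untouched, and the diagonal itself is preserved.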

Definition at line 554 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), kaldi::TestCuMatrixSetZeroAboveDiag(), kaldi::UnitTestCuMatrixSetZeroAboveDiag(), and kaldi::UnitTestSetZeroAboveDiag().

554  {
555 #if HAVE_CUDA == 1
556  if (CuDevice::Instantiate().Enabled()) {
557  if (num_rows_ == 0) return;
558  CuTimer tim;
559 
560  dim3 dimGrid, dimBlock;
561  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
562  &dimGrid, &dimBlock);
563 
564  cuda_set_zero_above_diag(dimGrid, dimBlock, data_, Dim());
565  CU_SAFE_CALL(cudaGetLastError());
566 
567  CuDevice::Instantiate().AccuProfile(__func__, tim);
568  } else
569 #endif
570  {
571  MatrixBase<Real> &mat = Mat();
572  int32 num_rows = mat.NumRows(), num_cols = mat.NumCols();
573  for (int32 r = 0; r + 1 < num_rows; r++) {
574  SubVector<Real> vec(mat, r),
575  vec_part(vec, r + 1, num_cols - (r + 1));
576  vec_part.SetZero();
577  }
578  }
579 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
kaldi::int32 int32
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Sigmoid()

void Sigmoid ( const CuMatrixBase< Real > &  src)

Sets each element to the sigmoid of the corresponding element of "src": element by element, y = 1 / (1 + exp(-x)), where x is the element of src and y is the element of *this.
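The element-wise operation can be sketched on plain row-major buffers as follows. `SigmoidSketch` and `SigmoidMatSketch` are hypothetical names, not Kaldi functions; the branch on the sign of x is an assumption of this sketch (a common way to keep exp() from overflowing), not something this page states about the CUDA kernel.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar helper (not a Kaldi function): 1 / (1 + exp(-x)),
// arranged so that exp() is only ever called on a non-positive argument.
template <typename Real>
Real SigmoidSketch(Real x) {
  if (x >= Real(0)) return Real(1) / (Real(1) + std::exp(-x));
  const Real e = std::exp(x);
  return e / (Real(1) + e);
}

// Hypothetical matrix version: dst and src share dimensions but may have
// different strides, which is why both strides are passed separately.
template <typename Real>
void SigmoidMatSketch(Real *dst, int dst_stride, const Real *src,
                      int src_stride, int num_rows, int num_cols) {
  assert(dst_stride >= num_cols && src_stride >= num_cols);
  for (int r = 0; r < num_rows; ++r)
    for (int c = 0; c < num_cols; ++c)
      dst[r * dst_stride + c] = SigmoidSketch(src[r * src_stride + c]);
}
```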

Definition at line 1534 of file cu-matrix.cc.

Referenced by SoftHingeComponent::Backprop(), SigmoidComponent::Propagate(), Sigmoid::PropagateFnc(), Rbm::PropagateFnc(), Rbm::Reconstruct(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixSigmoid(), kaldi::UnitTestCuMatrixSigmoid(), and kaldi::UnitTestCuSigmoid().

1534  {
1535  KALDI_ASSERT(SameDim(*this, src));
1536 #if HAVE_CUDA == 1
1537  if (CuDevice::Instantiate().Enabled()) {
1538  CuTimer tim;
1539  dim3 dimGrid, dimBlock;
1540  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1541  &dimGrid, &dimBlock);
1542  cuda_sigmoid(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1543  src.Stride());
1544  CU_SAFE_CALL(cudaGetLastError());
1545 
1546  CuDevice::Instantiate().AccuProfile(__func__, tim);
1547  } else
1548  #endif
1549  {
1550  Mat().Sigmoid(src.Mat());
1551  }
1552 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ SizeInBytes()

MatrixIndexT SizeInBytes ( ) const
inline

Get size of matrix in bytes.
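Because the definition below multiplies by the stride rather than by the number of columns, the result includes any per-row padding. A minimal sketch of the difference, using hypothetical free functions (`SizeInBytesSketch` and `PayloadBytesSketch` are illustration-only names) and `std::size_t` instead of `MatrixIndexT`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the formula num_rows * stride * sizeof(Real).
template <typename Real>
std::size_t SizeInBytesSketch(int num_rows, int stride) {
  return static_cast<std::size_t>(num_rows) * stride * sizeof(Real);
}

// Bytes occupied by the matrix elements alone, ignoring row padding.
template <typename Real>
std::size_t PayloadBytesSketch(int num_rows, int num_cols) {
  return static_cast<std::size_t>(num_rows) * num_cols * sizeof(Real);
}
```

For a 3 x 3 matrix stored with a stride of 4, the first function counts 12 elements while the second counts 9.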

Definition at line 234 of file cu-matrix.h.

234 { return num_rows_*stride_*sizeof(Real); }
MatrixIndexT stride_
Definition: cu-matrix.h:787
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ SoftHinge()

void SoftHinge ( const CuMatrixBase< Real > &  src)

Applies the function y = log(1 + exp(x)) to each element of src and stores the result in *this.

Note: the derivative of this function is the sigmoid function. This is like a soft ReLU.
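The function and the derivative relationship noted above can be checked numerically with a small sketch. `SoftHingeSketch` is a hypothetical helper; the rewriting log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)) is a choice made here for numerical stability and is not claimed to be how the Kaldi kernel evaluates it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not a Kaldi function): y = log(1 + exp(x)),
// rewritten so that exp() never receives a large positive argument.
template <typename Real>
Real SoftHingeSketch(Real x) {
  return std::max(x, Real(0)) + std::log1p(std::exp(-std::fabs(x)));
}

// Central-difference estimate of d/dx SoftHingeSketch(x); it should be
// close to the sigmoid 1 / (1 + exp(-x)).
inline double SoftHingeSlopeSketch(double x, double h = 1e-5) {
  return (SoftHingeSketch(x + h) - SoftHingeSketch(x - h)) / (2.0 * h);
}
```

For large positive x the result approaches x and for large negative x it approaches 0, which is the "soft ReLU" behaviour mentioned above.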

Definition at line 1555 of file cu-matrix.cc.

Referenced by SoftHingeComponent::Propagate(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuMatrixSoftHinge().

1555  {
1556  KALDI_ASSERT(SameDim(*this, src));
1557 #if HAVE_CUDA == 1
1558  if (CuDevice::Instantiate().Enabled()) {
1559  CuTimer tim;
1560  dim3 dimGrid, dimBlock;
1561  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1562  &dimGrid, &dimBlock);
1563  cuda_soft_hinge(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
1564  src.Stride());
1565  CU_SAFE_CALL(cudaGetLastError());
1566 
1567  CuDevice::Instantiate().AccuProfile(__func__, tim);
1568  } else
1569  #endif
1570  {
1571  Mat().SoftHinge(src.Mat());
1572  }
1573 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ SoftMaxPerRow()

void SoftMaxPerRow ( const CuMatrixBase< Real > &  src)

Softmax nonlinearity Y = Softmax(X) : Yij = e^Xij / sum_k(e^Xik), done to each row, with attention to avoiding overflow or underflow.

Supports in-place operation (i.e. this == &src).
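A minimal CPU sketch of a row-wise softmax that avoids overflow by subtracting the row maximum before exponentiating. `SoftMaxPerRowSketch` is a hypothetical name, and the max-subtraction is a standard technique assumed here; this page does not say which method the CUDA kernel uses. The sketch also works in place (dst == src) because each row's maximum is found before any element of that row is overwritten.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not a Kaldi function): for each row,
// dst(r, c) = exp(src(r, c)) / sum_k exp(src(r, k)).
template <typename Real>
void SoftMaxPerRowSketch(Real *dst, int dst_stride, const Real *src,
                         int src_stride, int num_rows, int num_cols) {
  if (num_cols == 0) return;
  for (int r = 0; r < num_rows; ++r) {
    const Real *in = src + r * src_stride;
    Real *out = dst + r * dst_stride;
    // Subtracting the row maximum keeps every exponent <= 0.
    Real row_max = in[0];
    for (int c = 1; c < num_cols; ++c) row_max = std::max(row_max, in[c]);
    Real sum = Real(0);
    for (int c = 0; c < num_cols; ++c) {
      out[c] = std::exp(in[c] - row_max);
      sum += out[c];
    }
    for (int c = 0; c < num_cols; ++c) out[c] /= sum;
  }
}
```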

Definition at line 1717 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplySoftMaxPerRow(), kaldi::nnet3::attention::AttentionForward(), SoftmaxComponent::Propagate(), Softmax::PropagateFnc(), HiddenSoftmax::PropagateFnc(), BlockSoftmax::PropagateFnc(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestCuMatrixSoftmax(), and kaldi::UnitTestCuSoftmax().

1717  {
1718  KALDI_ASSERT(SameDim(*this, src));
1719 #if HAVE_CUDA == 1
1720  if (CuDevice::Instantiate().Enabled()) {
1721  CuTimer tim;
1722  size_t dimBlock = CU1DBLOCK;
1723  size_t dimGrid = src.num_rows_;
1724  cuda_softmax_reduce(dimGrid, dimBlock, data_, src.data_, Dim(), src.Stride());
1725  CU_SAFE_CALL(cudaGetLastError());
1726 
1727  CuDevice::Instantiate().AccuProfile(__func__, tim);
1728  } else
1729  #endif
1730  {
1731  MatrixBase<Real> &mat(this->Mat());
1732  mat.CopyFromMat(src.Mat());
1733  for(MatrixIndexT r = 0; r < mat.NumRows(); r++) {
1734  mat.Row(r).ApplySoftMax();
1735  }
1736  }
1737 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
int32 MatrixIndexT
Definition: matrix-common.h:98
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
::MatrixDim Dim() const
Definition: cu-matrix.h:221

◆ Stride()

MatrixIndexT Stride ( ) const
inline

Definition at line 217 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), CuMatrixBase< float >::AddCols(), CuVectorBase< float >::AddDiagMatMat(), CuMatrixBase< float >::AddDiagVecMat(), CuMatrixBase< float >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< float >::AddMatBlock(), CuMatrixBase< float >::AddMatBlocks(), CuMatrixBase< float >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< float >::AddMatMat(), CuMatrixBase< float >::AddMatMatElements(), CuMatrixBase< float >::AddMatSmat(), CuVectorBase< float >::AddMatVec(), CuMatrixBase< float >::AddRows(), CuMatrixBase< float >::AddSmatMat(), CuMatrixBase< float >::AddToRows(), NormalizeComponent::Backprop(), BatchNormComponent::Backprop(), RepeatedAffineComponent::Backprop(), GeneralDropoutComponent::Backprop(), PerElementOffsetComponent::Backprop(), ScaleAndOffsetComponent::Backprop(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrixBase< float >::Ceiling(), DistributeComponent::ComputeInputPointers(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::time_height_convolution::ConvolveBackwardData(), kaldi::nnet3::time_height_convolution::ConvolveBackwardDataInternal(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParams(), kaldi::nnet3::time_height_convolution::ConvolveBackwardParamsInternal(), kaldi::nnet3::time_height_convolution::ConvolveForward(), kaldi::nnet3::time_height_convolution::ConvolveForwardInternal(), CuVectorBase< float >::CopyColFromMat(), CuMatrixBase< float >::CopyCols(), CuVectorBase< float >::CopyDiagFromMat(), CuVectorBase< float >::CopyElements(), CuMatrixBase< float >::CopyFromMat(), CuMatrixBase< float >::CopyRangeFromMatClamped(), CuMatrixBase< float >::CopyRows(), CuVectorBase< float >::CopyRowsFromMat(), VectorBase< float >::CopyRowsFromMat(), CuMatrixBase< float >::DiffGroupPnorm(), CuMatrixBase< float >::DiffLogSoftmaxPerRow(), kaldi::cu::DiffNormalizePerRow(), CuMatrixBase< float >::DiffParametricRelu(), CuMatrixBase< float >::DiffSigmoid(), CuMatrixBase< float 
>::DiffSoftmaxPerRow(), CuMatrixBase< float >::DiffTanh(), CuMatrixBase< float >::DivElements(), kaldi::cu::EnsureNonzero(), CuMatrixBase< float >::EqualElementMask(), CuMatrixBase< float >::Exp(), CuMatrixBase< float >::ExpLimited(), CuMatrixBase< float >::ExpSpecial(), CuMatrixBase< float >::Floor(), NnetBatchComputer::FormatInputs(), NnetBatchComputer::FormatOutputs(), TdnnComponent::GetInputPart(), NnetComputer::GetPointers(), CuMatrixBase< float >::GroupMax(), CuMatrixBase< float >::GroupMaxDeriv(), CuMatrixBase< float >::GroupPnorm(), CuMatrixBase< float >::Heaviside(), CuMatrixBase< float >::Log(), CuMatrixBase< float >::LogSoftMaxPerRow(), CuMatrixBase< float >::Max(), kaldi::nnet3::MergeTaskOutput(), CuMatrixBase< float >::Min(), CuMatrixBase< float >::MulElements(), CuMatrixBase< float >::MulRows(), CuMatrixBase< float >::MulRowsGroupMat(), kaldi::cu::NormalizePerRow(), CuMatrixBase< float >::ParametricRelu(), CuMatrixBase< float >::Pow(), CuMatrixBase< float >::PowAbs(), NormalizeComponent::Propagate(), BatchNormComponent::Propagate(), TimeHeightConvolutionComponent::Propagate(), RepeatedAffineComponent::Propagate(), GeneralDropoutComponent::Propagate(), PerElementOffsetComponent::Propagate(), ScaleAndOffsetComponent::Propagate(), CuRand< float >::RandGaussian(), CuRand< float >::RandUniform(), kaldi::cu::RegularizeL1(), RectifiedLinearComponent::RepairGradients(), CuMatrixBase< float >::SetMatMatDivMat(), CuMatrixBase< float >::Sigmoid(), CuMatrixBase< float >::SoftHinge(), CuMatrixBase< float >::SoftMaxPerRow(), BatchNormComponent::StoreStats(), CuMatrixBase< float >::SymAddMat2(), CuMatrixBase< float >::Tanh(), kaldi::TraceMatMat(), kaldi::UnitTestCopySp(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), and TimeHeightConvolutionComponent::UpdateSimple().

217 { return stride_; }
MatrixIndexT stride_
Definition: cu-matrix.h:787

◆ Sum()

Real Sum ( ) const

Definition at line 3012 of file cu-matrix.cc.

Referenced by BackpropTruncationComponent::Backprop(), kaldi::nnet1::CheckNanInf(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), Xent::Eval(), Mse::Eval(), main(), kaldi::MeanVariance(), CuMatrixBase< float >::operator()(), kaldi::nnet3::PrintParameterStats(), NnetRescaler::RescaleComponent(), kaldi::TestCuMatrixSum(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddDiagVecMat(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixReduceSum(), kaldi::UnitTestCuMatrixSetRandn(), kaldi::UnitTestCuMatrixSetRandUniform(), kaldi::UnitTestCuMatrixTraceMatMat(), kaldi::UnitTestCuSparseMatrixSum(), kaldi::nnet1::UnitTestDropoutComponent(), kaldi::nnet3::UnitTestNnetCompute(), kaldi::nnet3::UnitTestNnetOptimizeWithOptions(), FramePoolingComponent::Update(), and NaturalGradientRepeatedAffineComponent::Update().

3012  {
3013 #if HAVE_CUDA == 1
3014  if (CuDevice::Instantiate().Enabled()) {
3015  KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
3016  CuTimer tim;
3017 
3018  CuVector<Real> col_sum(num_rows_, kUndefined);
3019  cuda_sum_mat_cols(num_rows_, CU1DBLOCK, col_sum.Data(), data_, Dim());
3020  Real ans = col_sum.Sum();
3021 
3022  CuDevice::Instantiate().AccuProfile(__func__, tim);
3023  return ans;
3024  } else
3025 #endif
3026  {
3027  return Mat().Sum();
3028  }
3029 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ SumColumnRanges()

void SumColumnRanges ( const CuMatrixBase< Real > &  src,
const CuArrayBase< Int32Pair > &  indexes 
)

For each row r of this and for each column c, sets (*this)(r, c) to the sum of src(r, j) over j, where j ranges from indexes[c].first through indexes[c].second - 1.
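The following standalone sketch mirrors that description on row-major buffers. `IntPairSketch` and `SumColumnRangesSketch` are hypothetical names standing in for `Int32Pair` and the member function; an empty range (first == second) is assumed to yield 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Int32Pair: a half-open column range [first, second).
struct IntPairSketch {
  int first;
  int second;
};

// Hypothetical helper (not a Kaldi function):
// dst(r, c) = sum of src(r, j) for j in [indexes[c].first, indexes[c].second).
template <typename Real>
void SumColumnRangesSketch(Real *dst, int dst_stride, int num_rows,
                           const Real *src, int src_stride,
                           const std::vector<IntPairSketch> &indexes) {
  const int num_cols = static_cast<int>(indexes.size());
  assert(dst_stride >= num_cols);
  for (int r = 0; r < num_rows; ++r) {
    for (int c = 0; c < num_cols; ++c) {
      Real sum = Real(0);
      for (int j = indexes[c].first; j < indexes[c].second; ++j)
        sum += src[r * src_stride + j];
      dst[r * dst_stride + c] = sum;
    }
  }
}
```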

Definition at line 2893 of file cu-matrix.cc.

Referenced by SumGroupComponent::Propagate(), and kaldi::UnitTestCuMatrixSumColumnRanges().

2894  {
2895  KALDI_ASSERT(static_cast<MatrixIndexT>(indices.Dim()) == NumCols());
2896  KALDI_ASSERT(NumRows() == src.NumRows());
2897  if (NumRows() == 0) return;
2898 #if HAVE_CUDA == 1
2899  if (CuDevice::Instantiate().Enabled()) {
2900  CuTimer tim;
2901  dim3 dimGrid, dimBlock;
2902  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2903  &dimGrid, &dimBlock);
2904  cuda_sum_column_ranges(dimGrid, dimBlock, data_, Dim(), src.Data(),
2905  src.Dim(), indices.Data());
2906  CU_SAFE_CALL(cudaGetLastError());
2907  CuDevice::Instantiate().AccuProfile(__func__, tim);
2908  } else
2909 #endif
2910  {
2911  int32 num_rows = this->num_rows_, num_cols = this->num_cols_,
2912  this_stride = this->stride_, src_stride = src.stride_;
2913  Real *data = this->data_;
2914  const Real *src_data = src.data_;
2915  const Int32Pair *indices_data = indices.Data();
2916  for (int32 row = 0; row < num_rows; row++) {
2917  for (int32 col = 0; col < num_cols; col++) {
2918  int32 start_col = indices_data[col].first,
2919  end_col = indices_data[col].second;
2920  Real sum = 0.0;
2921  for (int32 src_col = start_col; src_col < end_col; src_col++)
2922  sum += src_data[row * src_stride + src_col];
2923  data[row * this_stride + col] = sum;
2924  }
2925  }
2926  }
2927 }
kaldi::int32 int32
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT stride_
Definition: cu-matrix.h:787
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
::MatrixDim Dim() const
Definition: cu-matrix.h:221
int32_cuda second
Definition: cu-matrixdim.h:80
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215
int32_cuda first
Definition: cu-matrixdim.h:79
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ SymAddMat2()

void SymAddMat2 ( const Real  alpha,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transA,
Real  beta 
)

*this = beta * *this + alpha * M M^T (or alpha * M^T M if transA == kTrans), for symmetric matrices.

It only updates the lower triangle of *this. It will leave the matrix asymmetric; if you need it symmetric as a regular matrix, do CopyLowerToUpper().
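The lower-triangle-only behaviour can be pictured with a small CPU sketch of the non-transposed case. `SymAddMat2LowerSketch` and `CopyLowerToUpperSketch` are hypothetical helpers written for this illustration; they are not the cuBLAS-based implementation shown in the listing below.

```cpp
#include <cassert>

// Hypothetical helper: S = beta * S + alpha * M * M^T, written only for
// entries with col <= row. S is m_rows x m_rows; M is m_rows x m_cols.
template <typename Real>
void SymAddMat2LowerSketch(Real alpha, const Real *M, int m_rows, int m_cols,
                           int m_stride, Real beta, Real *S, int s_stride) {
  assert(m_stride >= m_cols && s_stride >= m_rows);
  for (int r = 0; r < m_rows; ++r) {
    for (int c = 0; c <= r; ++c) {
      Real dot = Real(0);
      for (int k = 0; k < m_cols; ++k)
        dot += M[r * m_stride + k] * M[c * m_stride + k];
      S[r * s_stride + c] = beta * S[r * s_stride + c] + alpha * dot;
    }
  }
}

// Hypothetical helper: mirror the lower triangle into the upper triangle.
template <typename Real>
void CopyLowerToUpperSketch(Real *S, int dim, int stride) {
  for (int r = 0; r < dim; ++r)
    for (int c = 0; c < r; ++c)
      S[c * stride + r] = S[r * stride + c];
}
```

After the first call the upper triangle still holds its old values; only the second call makes the full matrix symmetric.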

Definition at line 1353 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::ApplyLog(), CuMatrixBase< float >::Cholesky(), kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::ReorthogonalizeRt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::TestSymInvertPosDef(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMatrixSymAddMat2(), and kaldi::UnitTestCuMatrixSymInvertPosDef().

1355  {
1356  KALDI_ASSERT(num_rows_ == num_cols_ &&
1357  ((transA == kNoTrans && A.num_rows_ == num_rows_) ||
1358  (transA == kTrans && A.num_cols_ == num_cols_)));
1359  if (num_rows_ == 0) return;
1360  KALDI_ASSERT(A.data_ != data_);
1361 
1362 #if HAVE_CUDA == 1
1363  if (CuDevice::Instantiate().Enabled()) {
1364  CuTimer tim;
1365  cublasOperation_t trans = (transA == kTrans ? CUBLAS_OP_N : CUBLAS_OP_T);
1366  MatrixIndexT A_other_dim = (transA == kNoTrans ? A.num_cols_ : A.num_rows_);
1367  CUBLAS_SAFE_CALL(cublas_syrk(GetCublasHandle(), CUBLAS_FILL_MODE_UPPER,
1368  trans, num_rows_, A_other_dim,
1369  alpha, A.Data(), A.Stride(),
1370  beta, this->data_, this->stride_));
1371 
1372  CuDevice::Instantiate().AccuProfile(__func__, tim);
1373  } else
1374 #endif
1375  {
1376  Mat().SymAddMat2(alpha, A.Mat(), transA, beta);
1377  }
1378 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
MatrixIndexT stride_
Definition: cu-matrix.h:787
int32 MatrixIndexT
Definition: matrix-common.h:98
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ SymInvertPosDef()

void SymInvertPosDef ( )

Inversion for positive definite symmetric matrices.

Treats the input as symmetric but only reads the lower triangle. The output is symmetric.
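One way to picture the computation is: factor A = L L^T, invert the triangular factor, and form A^{-1} = (L^{-1})^T L^{-1}. The dense sketch below follows those three steps on a small row-major matrix. `SymInvertPosDefSketch` is a hypothetical function, written under the assumption that the approach matches the Cholesky-based listing below in spirit; it makes no claim about Kaldi's numerical details.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: invert a symmetric positive definite n x n matrix
// (row-major, stride n), reading only its lower triangle.
inline std::vector<double> SymInvertPosDefSketch(const std::vector<double> &a,
                                                 int n) {
  assert(static_cast<int>(a.size()) == n * n);
  // Step 1: Cholesky factor L (lower triangular) with A = L L^T.
  std::vector<double> L(n * n, 0.0);
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j <= i; ++j) {
      double s = a[i * n + j];
      for (int k = 0; k < j; ++k) s -= L[i * n + k] * L[j * n + k];
      if (i == j) {
        assert(s > 0.0);  // fails if the input is not positive definite
        L[i * n + i] = std::sqrt(s);
      } else {
        L[i * n + j] = s / L[j * n + j];
      }
    }
  }
  // Step 2: Li = L^{-1} by forward substitution, row by row.
  std::vector<double> Li(n * n, 0.0);
  for (int i = 0; i < n; ++i) {
    Li[i * n + i] = 1.0 / L[i * n + i];
    for (int j = 0; j < i; ++j) {
      double s = 0.0;
      for (int k = j; k < i; ++k) s += L[i * n + k] * Li[k * n + j];
      Li[i * n + j] = -s / L[i * n + i];
    }
  }
  // Step 3: A^{-1} = Li^T * Li, a full symmetric matrix.
  std::vector<double> inv(n * n, 0.0);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < n; ++k)
        inv[i * n + j] += Li[k * n + i] * Li[k * n + j];
  return inv;
}
```

Because only entries with col <= row are read, whatever is stored above the diagonal of the input has no effect on the result.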

Definition at line 2111 of file cu-matrix.cc.

Referenced by CuSpMatrix< Real >::Invert(), kaldi::nnet2::PreconditionDirections(), CuMatrixBase< float >::SizeInBytes(), kaldi::TestSymInvertPosDef(), kaldi::UnitInvert(), kaldi::UnitTestCuMatrixSymInvertPosDef(), and kaldi::UnitTestInvert().

2111  {
2112  KALDI_ASSERT(num_rows_ == num_cols_);
2113  if (num_rows_ == 0) return;
2114 #if HAVE_CUDA == 1
2115  if (CuDevice::Instantiate().Enabled()) {
2116  CuTimer tim;
2117  CuMatrix<Real> inv_cholesky(num_rows_, num_rows_);
2118  this->Cholesky(&inv_cholesky);
2119  // note: SymAddMat2 only updates lower part of *this.
2120  this->SymAddMat2(1.0, inv_cholesky, kTrans, 0.0);
2121  this->CopyLowerToUpper();
2122  CuDevice::Instantiate().AccuProfile(__func__, tim);
2123  } else
2124 #endif
2125  {
2126  SpMatrix<Real> temp_sp(this->Mat(), kTakeLower);
2127  TpMatrix<Real> C(temp_sp.NumRows(), kUndefined);
2128  C.Cholesky(temp_sp);
2129  C.Invert();
2130  temp_sp.AddTp2(1.0, C, kTrans, 0.0);
2131  this->Mat().CopyFromSp(temp_sp);
2132  // was previously just: CuSpMatrix::Invert().
2133  }
2134 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
void SymAddMat2(const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transA, Real beta)
*this = beta * *this + alpha * M M^T, for symmetric matrices.
Definition: cu-matrix.cc:1353
void Cholesky(CuMatrixBase< Real > *inv_cholesky=NULL)
This function sets *this to the Cholesky factor of *this (i.e.
Definition: cu-matrix.cc:1987
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Tanh()

void Tanh ( const CuMatrixBase< Real > &  src)

Compute the hyperbolic tangent (tanh) function; element by element, *this = tanh(src).

Definition at line 1786 of file cu-matrix.cc.

Referenced by LstmNonlinearityComponent::ConsolidateMemory(), TanhComponent::Propagate(), Tanh::PropagateFnc(), CuMatrixBase< float >::SizeInBytes(), and kaldi::UnitTestCuTanh().

1786  {
1787  KALDI_ASSERT(SameDim(*this, src));
1788 #if HAVE_CUDA == 1
1789  if (CuDevice::Instantiate().Enabled()) {
1790  CuTimer tim;
1791  dim3 dimGrid, dimBlock;
1792  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1793  &dimGrid, &dimBlock);
1794 
1795  cuda_tanh(dimGrid, dimBlock, this->data_, src.data_, this->Dim(), src.Stride());
1796  CU_SAFE_CALL(cudaGetLastError());
1797 
1798  CuDevice::Instantiate().AccuProfile(__func__, tim);
1799  } else
1800 #endif
1801  {
1802  Mat().Tanh(src.Mat());
1803  }
1804 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
bool SameDim(const MatrixBase< Real > &M, const MatrixBase< Real > &N)
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ Trace()

Real Trace ( bool  check_square = true) const

Returns the trace. If check_square is true, the function will crash if the matrix is not square.
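In a row-major buffer, consecutive diagonal elements are `stride + 1` positions apart, which is why the GPU branch in the listing below passes `Stride() + 1` as the increment. A minimal CPU sketch of the same idea, with `TraceSketch` as a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not a Kaldi function): sum of data(i, i) for
// i < min(num_rows, num_cols), stepping through memory by stride + 1.
template <typename Real>
Real TraceSketch(const Real *data, int num_rows, int num_cols, int stride,
                 bool check_square = true) {
  if (check_square) assert(num_rows == num_cols);
  const int dim = std::min(num_rows, num_cols);
  Real sum = Real(0);
  for (int i = 0; i < dim; ++i) sum += data[i * (stride + 1)];
  return sum;
}
```

With `check_square` set to false, a non-square matrix yields the sum over the leading diagonal of its largest square block.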

Definition at line 3075 of file cu-matrix.cc.

Referenced by kaldi::nnet3::ConstrainOrthonormalInternal(), kaldi::CuVectorUnitTestCopyDiagFromMat(), and CuMatrixBase< float >::operator()().

3075  {
3076 #if HAVE_CUDA == 1
3077  if (CuDevice::Instantiate().Enabled()) {
3078  CuTimer tim;
3079  if (check_square) KALDI_ASSERT(this->num_rows_ == this->num_cols_);
3080  MatrixIndexT dim = std::min(this->num_rows_, this->num_cols_);
3081  CuVector<Real> tmp(1, kUndefined); // for result.
3082  int dimBlock(CU1DBLOCK);
3083  int dimGrid = 1;// only 1 block here. we have loops in each thread //(n_blocks(dim_, CU1DBLOCK));
3084  cuda_vec_sum(dimGrid, dimBlock, data_, tmp.Data(), dim, Stride() + 1);
3085  CU_SAFE_CALL(cudaGetLastError());
3086  CuDevice::Instantiate().AccuProfile("CuVectorBase::Sum", tim);
3087  return tmp(0);
3088  } else
3089 #endif
3090  {
3091  return Mat().Trace(check_square);
3092  }
3093 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
MatrixIndexT Stride() const
Definition: cu-matrix.h:217
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:777
int32 MatrixIndexT
Definition: matrix-common.h:98
#define CU1DBLOCK
Definition: cu-matrixdim.h:57
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

◆ Write()

void Write ( std::ostream &  os,
bool  binary 
) const

Definition at line 502 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::operator()(), kaldi::UnitTestCuMatrixIO(), and RestrictedAttentionComponent::Write().

502  {
503  Matrix<Real> temp(this->num_rows_, this->num_cols_, kUndefined);
504  this->CopyToMat(&temp);
505  temp.Write(os, binary);
506 }
void CopyToMat(MatrixBase< OtherReal > *dst, MatrixTransposeType trans=kNoTrans) const
Definition: cu-matrix.cc:447
MatrixIndexT num_cols_
Definition: cu-matrix.h:785
MatrixIndexT num_rows_
Definition: cu-matrix.h:786

Friends And Related Function Documentation

◆ AddMatMatBatched

void AddMatMatBatched ( const Real  alpha,
std::vector< CuSubMatrix< Real > * > &  C,
const std::vector< CuSubMatrix< Real > * > &  A,
MatrixTransposeType  transA,
const std::vector< CuSubMatrix< Real > * > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
friend

Does multiple matrix multiplications, executing them in parallel using cuBLAS's gemmBatched if we are using a GPU.

Vectors A, B and C must have the same length; for each i, this function executes the matrix operation C[i] = alpha * A[i](^T)*B[i](^T) + beta * C[i].
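The listing below notes that cuBLAS is column-major while these matrices are row-major, and handles this by swapping A and B. A small CPU sketch of why that works for the non-transposed case: a row-major buffer read as column-major is the transpose, and (A B)^T = B^T A^T. Both function names here are hypothetical, and `ColMajorGemmSketch` is a plain loop standing in for a BLAS routine, not a real cuBLAS call.

```cpp
#include <cassert>

// Hypothetical column-major GEMM without transposes:
// C (m x n) = alpha * A (m x k) * B (k x n) + beta * C.
inline void ColMajorGemmSketch(int m, int n, int k, double alpha,
                               const double *A, int lda, const double *B,
                               int ldb, double beta, double *C, int ldc) {
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      double s = 0.0;
      for (int p = 0; p < k; ++p) s += A[i + p * lda] * B[p + j * ldb];
      C[i + j * ldc] = alpha * s + beta * C[i + j * ldc];
    }
  }
}

// Row-major C (a_rows x b_cols) = alpha * A * B + beta * C, computed by
// handing the buffers to the column-major routine in swapped order.
inline void RowMajorGemmViaSwapSketch(double alpha, const double *A,
                                      int a_rows, int a_cols, int a_stride,
                                      const double *B, int b_cols,
                                      int b_stride, double beta, double *C,
                                      int c_stride) {
  const int m = b_cols, n = a_rows, k = a_cols;
  ColMajorGemmSketch(m, n, k, alpha, B, b_stride, A, a_stride, beta, C,
                     c_stride);
}
```

A batched version would simply repeat this call for each index i of the three vectors, which is what the CPU branch of the listing does with AddMatMat.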

Definition at line 2207 of file cu-matrix.cc.

2211  {
2212  KALDI_ASSERT(A.size() == B.size() && B.size() == C.size());
2213  int32 size = A.size();
2214 
2215  if (size == 0) return;
2216 
2217  // all elements must have the same num-rows, num-cols and stride
2218  for (int32 i = 0; i + 1 < size; i++) {
2219  KALDI_ASSERT(A[i]->NumRows() == A[i+1]->NumRows());
2220  KALDI_ASSERT(A[i]->NumCols() == A[i+1]->NumCols());
2221  KALDI_ASSERT(A[i]->Stride() == A[i+1]->Stride());
2222  KALDI_ASSERT(B[i]->NumRows() == B[i+1]->NumRows());
2223  KALDI_ASSERT(B[i]->NumCols() == B[i+1]->NumCols());
2224  KALDI_ASSERT(B[i]->Stride() == B[i+1]->Stride());
2225  KALDI_ASSERT(C[i]->NumRows() == C[i+1]->NumRows());
2226  KALDI_ASSERT(C[i]->NumCols() == C[i+1]->NumCols());
2227  KALDI_ASSERT(C[i]->Stride() == C[i+1]->Stride());
2228  }
2229  // CUBLAS is col-major, cudamatrix is row-major, how to do the mapping?
2230  // keep trans..., just swap A&B matrices: A->B B->A
2231  MatrixIndexT m = ((transB==kTrans)? B[0]->NumRows() : B[0]->NumCols());
2232  MatrixIndexT n = ((transA==kTrans)? A[0]->NumCols() : A[0]->NumRows());
2233  MatrixIndexT k = ((transB==kTrans)? B[0]->NumCols() : B[0]->NumRows());
2234  MatrixIndexT k1 = ((transA==kTrans)? A[0]->NumRows() : A[0]->NumCols());
2235 
2236  KALDI_ASSERT(m == C[0]->NumCols());
2237  KALDI_ASSERT(n == C[0]->NumRows());
2238  KALDI_ASSERT(k == k1);
2239 
2240  if (m == 0) return;
2241 
2242 #if HAVE_CUDA == 1
2243  if (CuDevice::Instantiate().Enabled()) {
2244  CuTimer tim;
2245  Real **device_abc_array =
2246  static_cast<Real**>(CuDevice::Instantiate().Malloc(3 * size * sizeof(Real*)));
2247  const Real **device_a_array = const_cast<const Real**>(device_abc_array);
2248  const Real **device_b_array = const_cast<const Real**>(device_abc_array) + size;
2249  Real **device_c_array = device_abc_array + 2 * size;
2250  const Real **host_abc_array = new const Real*[3*size];
2251  const Real **host_a_array = host_abc_array;
2252  const Real **host_b_array = host_abc_array + size;
2253  const Real **host_c_array = host_abc_array + 2 * size;
2254 
2255  for (int32 i = 0; i < size; i++) {
2256  host_a_array[i] = A[i]->data_;
2257  host_b_array[i] = B[i]->data_;
2258  host_c_array[i] = C[i]->data_;
2259  }
2260 
2261  CU_SAFE_CALL(cudaMemcpyAsync(device_abc_array, host_abc_array,
2262  3*size*sizeof(Real*), cudaMemcpyHostToDevice,
2263  cudaStreamPerThread));
2264 
2265  CUBLAS_SAFE_CALL(cublas_gemmBatched(GetCublasHandle(),
2266  (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
2267  (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
2268  m, n, k, alpha, device_b_array,
2269  B[0]->Stride(), device_a_array,
2270  A[0]->Stride(), beta, device_c_array,
2271  C[0]->Stride(), size));
2272 
2273  CuDevice::Instantiate().Free(device_abc_array);
2274  delete[] host_abc_array;
2275 
2276  CuDevice::Instantiate().AccuProfile(__func__, tim);
2277  } else
2278 #endif
2279  {
2280  for (int32 i = 0; i < size; i++) {
2281  C[i]->Mat().AddMatMat(alpha, A[i]->Mat(), transA, B[i]->Mat(), transB, beta);
2282  }
2283  }
2284 }
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:755
MatrixIndexT Stride() const
Definition: cu-matrix.h:217
kaldi::int32 int32
int32 MatrixIndexT
Definition: matrix-common.h:98
MatrixIndexT NumCols() const
Definition: cu-matrix.h:216
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:185
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:215

◆ CuBlockMatrix< Real >

friend class CuBlockMatrix< Real >
friend

Definition at line 93 of file cu-matrix.h.

◆ CuMatrixBase< double >

friend class CuMatrixBase< double >
friend

Definition at line 82 of file cu-matrix.h.

◆ CuMatrixBase< float >

friend class CuMatrixBase< float >
friend

Definition at line 81 of file cu-matrix.h.

◆ CuRand< Real >

friend class CuRand< Real >
friend

Definition at line 91 of file cu-matrix.h.

◆ CuSparseMatrix< double >

friend class CuSparseMatrix< double >
friend

Definition at line 95 of file cu-matrix.h.

◆ CuSparseMatrix< float >

friend class CuSparseMatrix< float >
friend

Definition at line 94 of file cu-matrix.h.

◆ CuSparseMatrix< Real >

friend class CuSparseMatrix< Real >
friend

Definition at line 96 of file cu-matrix.h.

◆ CuSpMatrix< Real >

friend class CuSpMatrix< Real >
friend

Definition at line 86 of file cu-matrix.h.

◆ CuSubMatrix< Real >

friend class CuSubMatrix< Real >
friend

Definition at line 90 of file cu-matrix.h.

◆ CuSubVector< Real >

friend class CuSubVector< Real >
friend

Definition at line 92 of file cu-matrix.h.

◆ CuTpMatrix< double >

friend class CuTpMatrix< double >
friend

Definition at line 88 of file cu-matrix.h.

◆ CuTpMatrix< float >

friend class CuTpMatrix< float >
friend

Definition at line 87 of file cu-matrix.h.

◆ CuVectorBase< double >

friend class CuVectorBase< double >
friend

Definition at line 84 of file cu-matrix.h.

◆ CuVectorBase< float >

friend class CuVectorBase< float >
friend

Definition at line 83 of file cu-matrix.h.

◆ CuVectorBase< Real >

friend class CuVectorBase< Real >
friend

Definition at line 89 of file cu-matrix.h.

◆ TraceMatMat

Real TraceMatMat ( const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  trans 
)
friend

Definition at line 2145 of file cu-matrix.cc.

Referenced by CuMatrixBase< float >::FrobeniusNorm().

{
  if (A.num_rows_ == 0) {
    KALDI_ASSERT(B.num_rows_ == 0);
    return 0.0;
  }
  Real result = 0;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (trans == kNoTrans) {
      KALDI_ASSERT(A.NumRows() == B.NumCols() && A.NumCols() == B.NumRows());
    } else {
      KALDI_ASSERT(A.NumRows() == B.NumRows() && A.NumCols() == B.NumCols());
    }
    CuTimer tim;
    // 2D blocks: each (8x32) block sums up (32x32) elements.
    // 2D grid: try to cover all the matrix A unless it is too big.
    // Kernel will reduce to ~256 elements with good performance,
    // if the matrix is not in a very bad shape.
    // (wider or taller than 32x8192)
    // CPU will then reduce to 1 element.
    const int kWarpSize = 32;
    dim3 dimBlock(kWarpSize, CU1DBLOCK / kWarpSize);
    dim3 dimGrid(n_blocks(A.NumCols(), kWarpSize),
                 n_blocks(A.NumRows(), kWarpSize));
    if (dimGrid.x * dimGrid.y > 256) {
      dimGrid.y = 256 / dimGrid.x;
      if (dimGrid.y == 0) {
        dimGrid.y = 1;
      }
    }
    CuVector<Real> result_vec(dimGrid.x * dimGrid.y, kUndefined);
    if (trans == kNoTrans) {
      cuda_trace_mat_mat(dimGrid, dimBlock, A.Data(), B.Data(), A.Dim(),
                         B.Stride(), result_vec.Data());
    } else {
      cuda_trace_mat_mat_trans(dimGrid, dimBlock, A.Data(), B.Data(), A.Dim(),
                               B.Stride(), result_vec.Data());
    }
    CU_SAFE_CALL(cudaGetLastError());
    Vector<Real> result_cpu(result_vec); // copying from CUDA faster than summing in CUDA.
    result = result_cpu.Sum();
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    result = TraceMatMat(A.Mat(), B.Mat(), trans);
  }
  return result;
}
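
The listing above never forms the product matrix: trace(A * B) equals the sum over all (i, j) of A(i, j) * B(j, i), and with a transposed B it is the sum of A(i, j) * B(i, j). The GPU path writes one partial sum per block and finishes the reduction on the CPU. A rough CPU sketch of the same two-stage idea follows; DenseMat, TraceOfProduct and the tile_rows parameter are hypothetical names for illustration, the matrices are assumed row-major without padding, and the tiling only imitates the per-block partial sums rather than reproducing the Kaldi kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Illustrative row-major dense matrix with stride equal to cols.
struct DenseMat {
  int rows;
  int cols;
  std::vector<double> data;
  double operator()(int r, int c) const { return data[r * cols + c]; }
};

// Returns trace(A * B) if trans_b is false, trace(A * B^T) otherwise.
// Stage 1: one partial sum per tile of rows. Stage 2: sum the partials.
double TraceOfProduct(const DenseMat &a, const DenseMat &b, bool trans_b,
                      int tile_rows = 32) {
  assert(tile_rows > 0);
  if (trans_b) {
    assert(a.rows == b.rows && a.cols == b.cols);
  } else {
    assert(a.rows == b.cols && a.cols == b.rows);
  }
  std::vector<double> partial;
  for (int r0 = 0; r0 < a.rows; r0 += tile_rows) {
    int r1 = std::min(r0 + tile_rows, a.rows);
    double sum = 0.0;
    for (int i = r0; i < r1; i++) {
      for (int j = 0; j < a.cols; j++) {
        sum += a(i, j) * (trans_b ? b(i, j) : b(j, i));
      }
    }
    partial.push_back(sum);
  }
  return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

With A = {1,2,3,4} and B = {5,6,7,8} as 2x2 matrices, this gives 69 for trace(A * B) and 70 for trace(A * B^T), regardless of the tile size.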

◆ TraceMatSmat

Real TraceMatSmat ( const CuMatrixBase< Real > &  A,
const CuSparseMatrix< Real > &  B,
MatrixTransposeType  trans 
)
friend

Definition at line 524 of file cu-sparse-matrix.cc.

{
  if (A.NumCols() == 0) {
    KALDI_ASSERT(B.NumCols() == 0);
    return 0.0;
  }
  if (B.NumElements() == 0) {
    return 0.0;
  }
  Real result = 0;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (trans == kTrans) {
      KALDI_ASSERT(A.NumRows() == B.NumRows() && A.NumCols() == B.NumCols());
    } else {
      KALDI_ASSERT(A.NumCols() == B.NumRows() && A.NumRows() == B.NumCols());
    }

    // The Sum() method in CuVector handles a bunch of logic, we use that to
    // compute the trace.
    CuVector<Real> sum_vec(B.NumElements());
    CuTimer tim;

    // We use warpSize threads per row to access only the nnz elements.
    // Every CU1DBLOCK/warpSize rows share one thread block.
    // 1D grid to cover all rows of B.
    const int warpSize = 32;
    dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
    dim3 dimGrid(n_blocks(B.NumRows(), dimBlock.y));

    if (trans == kNoTrans) {
      cuda_trace_mat_smat(dimGrid, dimBlock, A.Data(), A.Dim(), B.CsrRowPtr(),
                          B.CsrColIdx(), B.CsrVal(), sum_vec.Data());
    } else {
      cuda_trace_mat_smat_trans(dimGrid, dimBlock, A.Data(), A.Dim(),
                                B.CsrRowPtr(), B.CsrColIdx(), B.CsrVal(),
                                sum_vec.Data());
    }
    result = sum_vec.Sum();
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    result = TraceMatSmat(A.Mat(), B.Smat(), trans);
  }
  return result;
}
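
Because B is sparse, only its nonzero entries contribute to the trace: each stored B(r, c) is multiplied by A(c, r) for trace(A * B), or by A(r, c) for trace(A * B^T), which matches the dimension checks in the listing. The sketch below walks a CSR structure on the CPU to show this; CsrMat and TraceDenseTimesCsr are hypothetical names rather than Kaldi types, and A is assumed to be a row-major buffer without padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CSR matrix: row_ptr has rows + 1 entries; col_idx and val
// hold one entry per nonzero.
struct CsrMat {
  int rows;
  int cols;
  std::vector<int> row_ptr;
  std::vector<int> col_idx;
  std::vector<double> val;
};

// Returns trace(A * B) if trans_b is false, trace(A * B^T) otherwise,
// where A is dense row-major (a_rows x a_cols) and B is CSR.
double TraceDenseTimesCsr(const std::vector<double> &a, int a_rows, int a_cols,
                          const CsrMat &b, bool trans_b) {
  assert(a.size() == static_cast<std::size_t>(a_rows) * a_cols);
  assert(b.row_ptr.size() == static_cast<std::size_t>(b.rows) + 1);
  if (trans_b) {
    assert(a_rows == b.rows && a_cols == b.cols);
  } else {
    assert(a_cols == b.rows && a_rows == b.cols);
  }
  double result = 0.0;
  for (int r = 0; r < b.rows; r++) {
    // Visit only the stored nonzeros of row r of B.
    for (int k = b.row_ptr[r]; k < b.row_ptr[r + 1]; k++) {
      int c = b.col_idx[k];
      double a_elem = trans_b ? a[r * a_cols + c] : a[c * a_cols + r];
      result += b.val[k] * a_elem;
    }
  }
  return result;
}
```

For A = {1,2,3,4} (2x2) and a sparse B with B(0,1) = 5 and B(1,0) = 6, the function returns 27 for trace(A * B) and 28 for trace(A * B^T).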

◆ VectorBase< Real >

friend class VectorBase< Real >
friend

Definition at line 85 of file cu-matrix.h.

Member Data Documentation

◆ data_

Real* data_
protected

GPU data pointer (or regular matrix data pointer, if either CUDA was not compiled in or we could not acquire the device).

Definition at line 777 of file cu-matrix.h.

Referenced by CuMatrixBase< float >::AddMat(), CuMatrixBase< float >::AddMatBlocks(), CuMatrixBase< float >::AddMatMat(), kaldi::AddMatMatBatched(), CuMatrixBase< float >::AddRowRanges(), CuMatrixBase< float >::Ceiling(), CuMatrixBase< float >::CopyFromMat(), CuSubMatrix< Real >::CuSubMatrix(), CuMatrixBase< float >::Data(), CuMatrixBase< float >::DiffParametricRelu(), CuMatrixBase< float >::DiffSigmoid(), CuMatrixBase< float >::DiffTanh(), CuMatrixBase< float >::DivElements(), CuMatrixBase< float >::Exp(), CuMatrixBase< float >::ExpLimited(), CuMatrixBase< float >::ExpSpecial(), CuMatrixBase< float >::Floor(), CuMatrixBase< float >::GroupMax(), CuMatrixBase< float >::GroupPnorm(), CuMatrixBase< float >::Heaviside(), CuMatrixBase< float >::Log(), CuMatrixBase< float >::LogSoftMaxPerRow(), CuMatrixBase< float >::Max(), CuMatrixBase< float >::Min(), CuMatrixBase< float >::MulElements(), CuMatrixBase< float >::MulRowsGroupMat(), CuMatrixBase< float >::operator()(), CuMatrixBase< float >::ParametricRelu(), CuMatrixBase< float >::Pow(), CuMatrixBase< float >::PowAbs(), CuMatrixBase< float >::Row(), CuMatrixBase< float >::RowData(), CuMatrixBase< float >::SetMatMatDivMat(), CuMatrixBase< float >::Sigmoid(), CuMatrixBase< float >::SoftHinge(), CuMatrixBase< float >::SoftMaxPerRow(), CuMatrixBase< float >::SumColumnRanges(), CuMatrix< float >::Swap(), CuMatrixBase< float >::SymAddMat2(), and CuMatrixBase< float >::Tanh().

◆ num_cols_

◆ num_rows_

◆ stride_


The documentation for this class was generated from the following files: