CuMatrixBase< Real > Class Template Reference

Matrix for CUDA computing.

#include <matrix-common.h>


Public Member Functions

void CopyCols (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Copies column r from column indexes[r] of src.
 
void AddCols (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indices)
 Adds column indices[r] of src to column r.
 
void CopyRows (const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 Copies row r from row indexes[r] of src.
 
void CopyRows (const CuArray< const Real * > &src)
 Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).
 
void CopyToRows (const CuArray< Real * > &dst) const
 For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. the memory it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).
 
void AddRows (Real alpha, const CuMatrixBase< Real > &src, const CuArray< MatrixIndexT > &indexes)
 For each row r, does this.Row(r) += alpha * src.Row(indexes[r]).
 
void AddRows (Real alpha, const CuArray< const Real * > &src)
 For each row r, does this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols().
 
void AddToRows (Real alpha, const CuArray< Real * > &dst) const
 For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. the memory it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).
 
void SumColumnRanges (const CuMatrixBase< Real > &src, const CuArray< Int32Pair > &indexes)
 For each row r of this and for each column c, sets (*this)(r, c) to the sum of src(r, j), where j ranges from indexes[c].first through indexes[c].second - 1.
 
void AddRowRanges (const CuMatrixBase< Real > &src, const CuArray< Int32Pair > &indexes)
 For each row r of this and for each column c, does (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.
 
void AddToDiag (Real value)
 Adds "value" to the diagonal elements of the matrix.
 
MatrixIndexT NumRows () const
 Dimensions.
 
MatrixIndexT NumCols () const
 
MatrixIndexT Stride () const
 
::MatrixDim Dim () const
 
Real FrobeniusNorm () const
 
bool IsUnit (Real tol=0.001) const
 
bool ApproxEqual (const CuMatrixBase< Real > &other, float tol=0.01) const
 True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()
 
MatrixIndexT SizeInBytes () const
 Get size of matrix in bytes.
 
template<typename OtherReal >
void CopyFromMat (const MatrixBase< OtherReal > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromGeneralMat (const GeneralMatrix &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromMat (const MatrixBase< Real > &src, MatrixTransposeType trans=kNoTrans)
 
void CopyFromSp (const CuSpMatrix< Real > &M)
 
template<typename OtherReal >
void CopyFromTp (const CuTpMatrix< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyFromMat (const CuMatrixBase< OtherReal > &M, MatrixTransposeType trans=kNoTrans)
 
template<typename OtherReal >
void CopyToMat (MatrixBase< OtherReal > *dst, MatrixTransposeType trans=kNoTrans) const
 
void CopyRowsFromVec (const CuVectorBase< Real > &v)
 This function has two modes of operation.
 
void CopyRowsFromVec (const VectorBase< Real > &v)
 Version of CopyRowsFromVec() that takes a CPU-based vector.
 
void CopyColFromVec (const CuVectorBase< Real > &v, const MatrixIndexT col)
 Copies a vector into the specified column of the matrix.
 
void Sigmoid (const CuMatrixBase< Real > &src)
 Set each element to the sigmoid of the corresponding element of "src": element by element, *this = 1 / (1 + exp(-src)).
 
void Heaviside (const CuMatrixBase< Real > &src)
 Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0) [note: in general, there are different ways to deal with the situation when x == 0.]
 
void SoftHinge (const CuMatrixBase< Real > &src)
 Apply the function y = log(1 + exp(x)) to each element.
 
void GroupPnorm (const CuMatrixBase< Real > &src, Real pow)
 Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} x_j^p)^(1/p), where p = pow and G = x.NumCols() / y.NumCols() must be an integer.
 
void DiffGroupPnorm (const CuMatrixBase< Real > &in_value, const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv, Real power)
 Differentiate backward through the GroupPnorm function.
 
void GroupMax (const CuMatrixBase< Real > &src)
 Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer.
 
void GroupMaxDeriv (const CuMatrixBase< Real > &input, const CuMatrixBase< Real > &output)
 Calculate derivatives for the GroupMax function above, where "input" is the input to that function (i.e. its "src" argument).
 
void ParametricRelu (const CuMatrixBase< Real > &src, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Compute the parametric rectified linear unit function; element by element, *this = src * (src > 0 ? alpha : beta).
 
void DiffParametricRelu (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff, const CuVectorBase< Real > &alpha, const CuVectorBase< Real > &beta)
 Differentiate backward through the parametric relu function.
 
void Tanh (const CuMatrixBase< Real > &src)
 Compute the hyperbolic tangent (tanh) function; element by element, *this = tanh(src).
 
void DiffSigmoid (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the sigmoid function.
 
void DiffTanh (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the tanh function.
 
void DiffSoftmaxPerRow (const CuMatrixBase< Real > &value, const CuMatrixBase< Real > &diff)
 Differentiate backward through the softmax function.
 
void DiffLogSoftmaxPerRow (const CuMatrixBase< Real > &out_value, const CuMatrixBase< Real > &out_deriv)
 Differentiate backward through the log softmax function.
 
void DiffXent (const CuArray< int32 > &tgt, CuVector< Real > *log_post_tgt)
 Differentiate the block [softmax + cross-entropy]: dE/da = posterior_mat - target_mat, where 'E' is the error function and 'a' is the activation at the softmax input.
 
void Cholesky (CuMatrixBase< Real > *inv_cholesky=NULL)
 This function sets *this to the Cholesky factor of *this (i.e. the lower-triangular C satisfying *this = C C^T).
 
void SymInvertPosDef ()
 Inversion for positive definite symmetric matrices.
 
void ApplyPow (Real power)
 Apply power to the absolute value of each element.
 
void ApplyPowAbs (Real power, bool include_sign=false)
 
void ApplyHeaviside ()
 For each element, sets x = (x > 0 ? 1.0 : 0.0).
 
void ApplyFloor (Real floor_val)
 
void ApplyCeiling (Real ceiling_val)
 
void ApplyExp ()
 
void ApplySoftMaxPerRow (const CuMatrixBase< Real > &src)
 Softmax nonlinearity Y = Softmax(X): Y_ij = e^X_ij / sum_k(e^X_ik), applied to each row; for each row, the maximum value is first subtracted for numerical stability.
 
void ApplyLogSoftMaxPerRow (const CuMatrixBase< Real > &src)
 LogSoftmax nonlinearity Y = LogSoftmax(X): Y_ij = X_ij - log(sum_k(e^X_ik)), applied to each row; for each row, the maximum value is first subtracted for numerical stability.
 
void FindRowMaxId (CuArray< int32 > *id) const
 Find the id of the maximal element for each row.
 
void SetZero ()
 Math operations, some calling kernels.
 
void Set (Real value)
 
void Add (Real value)
 
void SetZeroAboveDiag ()
 Zeroes all elements for which col > row.
 
void Scale (Real value)
 
void ApplyLog ()
 
void MulElements (const CuMatrixBase< Real > &A)
 Multiply two matrices elementwise: C = C .* A.
 
void DivElements (const CuMatrixBase< Real > &A)
 Divide two matrices elementwise: C = C ./ A.
 
void Max (const CuMatrixBase< Real > &A)
 Do, elementwise, *this = max(*this, A).
 
void MulColsVec (const CuVectorBase< Real > &scale)
 scale i'th column by scale[i]
 
void MulRowsVec (const CuVectorBase< Real > &scale)
 scale i'th row by scale[i]
 
void MulRowsGroupMat (const CuMatrixBase< Real > &src)
 divide each row into src.NumCols() groups, and then scale i'th row's jth group of elements by src[i, j].
 
void DivRowsVec (const CuVectorBase< Real > &div)
 divide i'th row by div[i]
 
void InvertElements ()
 invert each element of the matrix (x = 1 / x).
 
void AddMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 *this += alpha * A
 
void AddMatBlocks (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType trans=kNoTrans)
 if A.NumRows() is a multiple of this->NumRows() and A.NumCols() is a multiple of this->NumCols(), divide A into blocks of the same size as *this and add them to *this (times alpha)
 
void AddVecToCols (Real alpha, const CuVectorBase< Real > &col, Real beta=1.0)
 (for each column c of *this), c = alpha * col + beta * c
 
void AddVecToRows (Real alpha, const CuVectorBase< Real > &row, Real beta=1.0)
 (for each row r of *this), r = alpha * row + beta * r
 
void AddMatMat (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, Real beta)
 C = alpha * A(^T)*B(^T) + beta * C.
 
void AddVecVec (Real alpha, const CuVectorBase< Real > &x, const CuVectorBase< Real > &y)
 A = alpha * x * y^T + A.
 
void SetMatMatDivMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const CuMatrixBase< Real > &C)
 *this = A .* B ./ C (elementwise; where an element of C is 0, the result is the corresponding element of A). *this may safely alias A, B or C and still get the expected result.
 
void SymAddMat2 (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transA, Real beta)
 *this = beta * *this + alpha * M M^T, for symmetric matrices.
 
void AddMatBlock (Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuBlockMatrix< Real > &B, MatrixTransposeType transB, Real beta)
 This function is like AddMatMat but for the case where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks).
 
void AddDiagVecMat (const Real alpha, const CuVectorBase< Real > &v, const CuMatrixBase< Real > &M, MatrixTransposeType transM, Real beta=1.0)
 *this = beta * *this + alpha * diag(v) * M [or M^T].
 
void AddMatDiagVec (const Real alpha, const CuMatrixBase< Real > &M, MatrixTransposeType transM, CuVectorBase< Real > &v, Real beta=1.0)
 
void AddMatMatElements (const Real alpha, const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, const Real beta)
 *this = beta * *this + alpha * A .* B (.* element by element multiplication)
 
void AddMatSp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuSpMatrix< Real > &B, const Real beta)
 this <-- beta*this + alpha*A*B
 
void AddSpMat (const Real alpha, const CuSpMatrix< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <-- beta*this + alpha*SpA*B
 
void AddTpMat (const Real alpha, const CuTpMatrix< Real > &A, MatrixTransposeType transA, const CuMatrixBase< Real > &B, MatrixTransposeType transB, const Real beta)
 this <-- beta*this + alpha*A*B.
 
void AddMatTp (const Real alpha, const CuMatrixBase< Real > &A, MatrixTransposeType transA, const CuTpMatrix< Real > &B, MatrixTransposeType transB, const Real beta)
 this <-- beta*this + alpha*A*B.
 
void CopyFromBlock (const CuBlockMatrix< Real > &B, MatrixTransposeType trans=kNoTrans)
 
void CopyLowerToUpper ()
 
void CopyUpperToLower ()
 
CuSubMatrix< Real > Range (const MatrixIndexT row_offset, const MatrixIndexT num_rows, const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
CuSubMatrix< Real > RowRange (const MatrixIndexT row_offset, const MatrixIndexT num_rows) const
 
CuSubMatrix< Real > ColRange (const MatrixIndexT col_offset, const MatrixIndexT num_cols) const
 
const CuSubVector< Real > Row (MatrixIndexT i) const
 
CuSubVector< Real > Row (MatrixIndexT i)
 
CuValue< Real > operator() (MatrixIndexT r, MatrixIndexT c)
 
Real operator() (MatrixIndexT r, MatrixIndexT c) const
 
Real Sum () const
 
Real Max () const
 
Real Min () const
 
Real Trace (bool check_square=true) const
 Return the trace. If check_square = true, will crash if matrix is not square.
 
void SetRandn ()
 
void SetRandUniform ()
 
void Write (std::ostream &os, bool binary) const
 
void AddElements (Real alpha, const std::vector< MatrixElement< Real > > &input)
 
void AddElements (Real alpha, const CuArray< Int32Pair > &indexes, const Real *input)
 
void Lookup (const std::vector< Int32Pair > &indexes, Real *output) const
 
void Lookup (const CuArray< Int32Pair > &indexes, Real *output) const
 
void EqualElementMask (const CuMatrixBase< Real > &mat, CuMatrix< Real > *mask) const
 
const Real * RowData (MatrixIndexT r) const
 Get raw row pointer (const).
 
Real * RowData (MatrixIndexT r)
 Get raw row pointer.
 
const Real * Data () const
 Return data pointer (const).
 
Real * Data ()
 Return data pointer.
 
const MatrixBase< Real > & Mat () const
 
MatrixBase< Real > & Mat ()
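The softmax entries above say the row maximum is subtracted first "for good numerical stability". The following is a minimal CPU sketch of why that works, assuming a plain row-major std::vector rather than a CuMatrix; SoftmaxPerRow() and LogSoftmaxPerRow() are hypothetical helpers, not Kaldi functions, and make no claim about how the CUDA kernels are written.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical CPU helpers (not part of Kaldi) operating on a row-major
// num_rows x num_cols matrix stored contiguously in a std::vector.

// Row-wise softmax: y_ij = exp(x_ij - m_i) / sum_k exp(x_ik - m_i), where m_i
// is the maximum of row i. Subtracting m_i leaves the result unchanged
// mathematically, but keeps every exponent <= 0 so exp() cannot overflow.
std::vector<double> SoftmaxPerRow(const std::vector<double>& x,
                                  int num_rows, int num_cols) {
  assert(num_cols > 0 && static_cast<int>(x.size()) == num_rows * num_cols);
  std::vector<double> y(x.size());
  for (int r = 0; r < num_rows; ++r) {
    const double* in = x.data() + r * num_cols;
    double* out = y.data() + r * num_cols;
    const double row_max = *std::max_element(in, in + num_cols);
    double sum = 0.0;
    for (int c = 0; c < num_cols; ++c) {
      out[c] = std::exp(in[c] - row_max);
      sum += out[c];
    }
    for (int c = 0; c < num_cols; ++c) out[c] /= sum;
  }
  return y;
}

// Row-wise log-softmax: y_ij = (x_ij - m_i) - log(sum_k exp(x_ik - m_i)),
// which equals x_ij - log(sum_k exp(x_ik)) without forming exp(x_ik) directly.
std::vector<double> LogSoftmaxPerRow(const std::vector<double>& x,
                                     int num_rows, int num_cols) {
  assert(num_cols > 0 && static_cast<int>(x.size()) == num_rows * num_cols);
  std::vector<double> y(x.size());
  for (int r = 0; r < num_rows; ++r) {
    const double* in = x.data() + r * num_cols;
    double* out = y.data() + r * num_cols;
    const double row_max = *std::max_element(in, in + num_cols);
    double sum = 0.0;
    for (int c = 0; c < num_cols; ++c) sum += std::exp(in[c] - row_max);
    const double log_sum = std::log(sum);
    for (int c = 0; c < num_cols; ++c) out[c] = in[c] - row_max - log_sum;
  }
  return y;
}
```

With inputs such as 1000.0, exp(x) alone would overflow a double, while the shifted version returns 0.5 for each of two equal entries.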
 

Protected Member Functions

 CuMatrixBase ()
 
 CuMatrixBase (Real *data, MatrixIndexT num_rows, MatrixIndexT num_cols, MatrixIndexT stride)
 This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.
 

Protected Attributes

Real * data_
 GPU data pointer (or regular matrix data pointer, if CUDA is not compiled in or no suitable GPU is available).
 
MatrixIndexT num_cols_
 
MatrixIndexT num_rows_
 
MatrixIndexT stride_
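The four protected attributes describe a strided, row-major layout, and the protected constructor lets CuSubMatrix build a view from just (data, num_rows, num_cols, stride). The following is a minimal sketch of that idea, assuming element (r, c) lives at data[r * stride + c] with stride >= num_cols; MatrixView is a hypothetical type for illustration, not Kaldi's CuSubMatrix.

```cpp
#include <cassert>
#include <vector>

// Hypothetical view type mirroring (data_, num_rows_, num_cols_, stride_).
// Assumed layout: row-major, element (r, c) at data[r * stride + c], and
// stride >= num_cols (any extra columns are padding).
struct MatrixView {
  float* data;
  int num_rows, num_cols, stride;

  // Pointer to the start of row r, in the spirit of RowData().
  float* RowData(int r) const { return data + r * stride; }

  float& operator()(int r, int c) const {
    assert(r >= 0 && r < num_rows && c >= 0 && c < num_cols);
    return data[r * stride + c];
  }

  // In the spirit of Range(): the sub-matrix shares memory with its parent.
  // Only the data pointer and dimensions change; the stride is inherited.
  MatrixView Range(int row_offset, int rows, int col_offset, int cols) const {
    assert(row_offset >= 0 && rows >= 0 && row_offset + rows <= num_rows);
    assert(col_offset >= 0 && cols >= 0 && col_offset + cols <= num_cols);
    return MatrixView{data + row_offset * stride + col_offset, rows, cols,
                      stride};
  }
};
```

Because a view only carries a pointer plus three integers, writing through a sub-matrix updates the parent's storage directly, with no copy.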
 

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (CuMatrixBase)
 

Friends

class CuMatrixBase< float >
 
class CuMatrixBase< double >
 
class CuVectorBase< float >
 
class CuVectorBase< double >
 
class VectorBase< Real >
 
class CuSpMatrix< Real >
 
class CuTpMatrix< float >
 
class CuTpMatrix< double >
 
class CuVectorBase< Real >
 
class CuSubMatrix< Real >
 
class CuRand< Real >
 
class CuSubVector< Real >
 
class CuBlockMatrix< Real >
 
class CuSparseMatrix< float >
 
class CuSparseMatrix< double >
 
class CuSparseMatrix< Real >
 
Real TraceMatMat (const CuMatrixBase< Real > &A, const CuMatrixBase< Real > &B, MatrixTransposeType trans)
 
Real TraceMatSmat (const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType trans)
 
void AddMatMatBatched (const Real alpha, std::vector< CuSubMatrix< Real > * > &C, const std::vector< CuSubMatrix< Real > * > &A, MatrixTransposeType transA, const std::vector< CuSubMatrix< Real > * > &B, MatrixTransposeType transB, const Real beta)
 Does multiple matrix multiplications, executing them in parallel using cuBLAS's gemmBatched if we are using a GPU.
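AddMatMatBatched applies the AddMatMat update to a whole list of (C, A, B) triples. The following is a sequential CPU reference of that per-item update, assuming no transposes; Dense and AddMatMatBatchedRef() are hypothetical names, and the loop only shows the semantics, not the concurrency that a batched GEMM call provides on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix (hypothetical helper type for this sketch).
struct Dense {
  int rows, cols;
  std::vector<double> v;
  double& at(int r, int c) { return v[r * cols + c]; }
  double at(int r, int c) const { return v[r * cols + c]; }
};

// For every i: C[i] = alpha * A[i] * B[i] + beta * C[i] (transposes omitted).
// Sequential reference loop; the real function may instead issue a single
// batched GEMM so that all products run concurrently.
void AddMatMatBatchedRef(double alpha, std::vector<Dense*>& C,
                         const std::vector<const Dense*>& A,
                         const std::vector<const Dense*>& B, double beta) {
  assert(A.size() == C.size() && B.size() == C.size());
  for (std::size_t i = 0; i < C.size(); ++i) {
    const Dense& a = *A[i];
    const Dense& b = *B[i];
    Dense& c = *C[i];
    assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
    for (int r = 0; r < c.rows; ++r) {
      for (int col = 0; col < c.cols; ++col) {
        double sum = 0.0;
        for (int k = 0; k < a.cols; ++k) sum += a.at(r, k) * b.at(k, col);
        c.at(r, col) = alpha * sum + beta * c.at(r, col);
      }
    }
  }
}
```

The items in a batch may have different shapes here; a batched GEMM API typically requires all items in one call to share dimensions, which is a constraint this sketch does not model.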
 

Detailed Description

template<typename Real>
class kaldi::CuMatrixBase< Real >

Matrix for CUDA computing.

Does the computation on the CUDA card when CUDA is compiled in and we have a suitable GPU (CuDevice::Instantiate().Enabled() == true); otherwise, does it on the CPU.

Definition at line 67 of file matrix-common.h.
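Every member listed in this section follows the same two-level dispatch: a compile-time HAVE_CUDA check, then a run-time check that a GPU is enabled, with the CPU path as the fallback. The following is a minimal sketch of that control flow, assuming HAVE_CUDA is a preprocessor macro and using a hypothetical GpuAvailable() in place of CuDevice::Instantiate().Enabled(); no kernel is launched here.

```cpp
#include <cassert>
#include <vector>

// HAVE_CUDA and GpuAvailable() are stand-ins for illustration only.
#ifndef HAVE_CUDA
#define HAVE_CUDA 0
#endif

bool GpuAvailable() { return false; }  // hypothetical: no GPU in this sketch

// Adds `value` to every element; returns true if the GPU path was taken.
bool AddScalar(std::vector<float>* data, float value) {
#if HAVE_CUDA == 1
  if (GpuAvailable()) {
    // ... a kernel launch would go here ...
    return true;
  } else
#endif
  {
    // CPU fallback: reached when CUDA is not compiled in, or when it is
    // compiled in but no GPU is enabled at run time.
    for (float& x : *data) x += value;
  }
  return false;
}
```

The dangling "} else" before "#endif" is what lets one brace-enclosed CPU block serve both as the else-branch (CUDA builds) and as plain unconditional code (non-CUDA builds).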

Constructor & Destructor Documentation

CuMatrixBase ( )
inline, protected

Definition at line 624 of file cu-matrix.h.

CuMatrixBase() : data_(NULL), num_cols_(0), num_rows_(0), stride_(0) { }
CuMatrixBase ( Real *  data,
MatrixIndexT  num_rows,
MatrixIndexT  num_cols,
MatrixIndexT  stride 
)
inline, protected

This constructor takes the #rows, #cols and stride; it's called from the constructor of CuSubMatrix.

Definition at line 628 of file cu-matrix.h.

CuMatrixBase(Real *data,
             MatrixIndexT num_rows,
             MatrixIndexT num_cols,
             MatrixIndexT stride):
    data_(data), num_cols_(num_cols), num_rows_(num_rows), stride_(stride) { }

Member Function Documentation

void Add ( Real  value)

Definition at line 546 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), Xent::Eval(), main(), kaldi::MeanVariance(), DropoutComponent::Propagate(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), TanhComponent::StoreStats(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAdd(), kaldi::UnitTestCuMatrixAdd2(), kaldi::UnitTestCuMatrixEqualElementMask(), kaldi::UnitTestCuMatrixObjfDeriv(), and kaldi::UnitTestCuMatrixSetRandUniform().

template<typename Real>
void CuMatrixBase<Real>::Add(Real value) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_add(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Add(value);
  }
}
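The GPU branch of Add() launches a 2-D grid of thread blocks sized from NumRows() and NumCols(). The following CPU emulation shows why such kernels need a bounds check: the grid is rounded up to whole blocks, so some threads fall outside the matrix, and elements in the stride padding must not be touched. It assumes n_blocks() is a ceiling division, that x indexes columns and y indexes rows (as in AddMat's launch), and a square block_dim; GetBlockSizesForSimpleMatrixOperation() may choose a different shape. NumBlocks() and EmulatedAddKernel() are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Ceiling division: how many blocks of `block_size` are needed to cover
// `size` (assumed to match what n_blocks() computes).
int NumBlocks(int size, int block_size) {
  return (size + block_size - 1) / block_size;
}

// CPU emulation of a 2-D "add scalar" kernel launch. Each (block, thread)
// pair maps to one candidate element; threads outside the matrix (because
// the grid was rounded up) skip the write, so padding columns stay untouched.
void EmulatedAddKernel(float* data, int rows, int cols, int stride,
                       float value, int block_dim) {
  const int grid_x = NumBlocks(cols, block_dim);  // x covers columns
  const int grid_y = NumBlocks(rows, block_dim);  // y covers rows
  for (int by = 0; by < grid_y; ++by)
    for (int bx = 0; bx < grid_x; ++bx)
      for (int ty = 0; ty < block_dim; ++ty)
        for (int tx = 0; tx < block_dim; ++tx) {
          const int c = bx * block_dim + tx;
          const int r = by * block_dim + ty;
          if (r < rows && c < cols) data[r * stride + c] += value;
        }
}
```

On a real GPU the four loops disappear: each (bx, by, tx, ty) combination is a separate thread, and only the bounds check remains in the kernel body.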
void AddCols ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indices 
)

Add column indices[r] of src to column r.

As a special case, if indices[r] == -1, column r is skipped. indices.Dim() must equal this->NumCols(), all elements of indices must be in [-1, src.NumCols()-1], and src.NumRows() must equal this->NumRows().

Definition at line 2369 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by Convolutional1dComponent::Backprop(), ConvolutionalComponent::BackpropagateFnc(), ConvolutionComponent::InderivPatchesToInderiv(), and MaxpoolingComponent::InderivPatchesToInderiv().

template<typename Real>
void CuMatrixBase<Real>::AddCols(const CuMatrixBase<Real> &src,
                                 const CuArray<MatrixIndexT> &indices) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(indices.Dim() == NumCols());
    KALDI_ASSERT(NumRows() == src.NumRows());
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_add_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
                  Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddCols(src.Mat(), indices.Data());
  }
}
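The contract for AddCols, including the -1 "skip" convention, can be stated as a short CPU reference. This sketch assumes unpadded row-major storage (stride equal to the number of columns); AddColsRef() is a hypothetical name, not a Kaldi function.

```cpp
#include <cassert>
#include <vector>

// Reference semantics of AddCols on unpadded row-major matrices:
//   for every row r and column c: dst(r, c) += src(r, indices[c]),
// except that column c is left unchanged when indices[c] == -1.
void AddColsRef(std::vector<double>* dst, int rows, int dst_cols,
                const std::vector<double>& src, int src_cols,
                const std::vector<int>& indices) {
  assert(static_cast<int>(indices.size()) == dst_cols);
  assert(static_cast<int>(dst->size()) == rows * dst_cols);
  assert(static_cast<int>(src.size()) == rows * src_cols);
  for (int c = 0; c < dst_cols; ++c)
    assert(indices[c] >= -1 && indices[c] < src_cols);
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < dst_cols; ++c)
      if (indices[c] != -1)
        (*dst)[r * dst_cols + c] += src[r * src_cols + indices[c]];
}
```

Because each output column reads exactly one source column, the same source column may be added to several output columns without any write conflicts.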
void AddDiagVecMat ( const Real  alpha,
const CuVectorBase< Real > &  v,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
Real  beta = 1.0 
)

*this = beta * *this + alpha * diag(v) * M [or M^T].

The same as adding M but scaling each row M_i by v(i).

Definition at line 1167 of file cu-matrix.cc.

References CU2DBLOCK, CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by NormalizeComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), MultiBasisComponent::PropagateFnc(), and kaldi::TestCuMatrixAddDiagVecMat().

template<typename Real>
void CuMatrixBase<Real>::AddDiagVecMat(
    const Real alpha, const CuVectorBase<Real> &v,
    const CuMatrixBase<Real> &M, MatrixTransposeType transM,
    Real beta) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transM == kNoTrans) {
      KALDI_ASSERT(SameDim(*this, M));
    } else {
      KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
    }
    KALDI_ASSERT(v.Dim() == this->NumRows());

    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
                 n_blocks(num_rows_, CU2DBLOCK));
    MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
    if (transM == kTrans)
      std::swap(M_row_stride, M_col_stride);
    cuda_add_diag_vec_mat(dimGrid, dimBlock, alpha, data_, Dim(),
                          v.Data(), M.Data(), M_row_stride, M_col_stride, beta);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddDiagVecMat(alpha, v.Vec(), M.Mat(), transM, beta);
  }
}
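The listing handles M^T without copying it: it swaps M_row_stride and M_col_stride, so the kernel reads M(j, i) wherever it would have read M(i, j). The following is a minimal CPU sketch of the same trick, assuming row-major storage; AddDiagVecMatRef() and its argument list are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// out = beta * out + alpha * diag(v) * M   (or M^T when transM is true).
// M is row-major with row stride M_stride. Transposition is handled by
// swapping the row and column strides, so the element read at (i, j) is
// M(i, j) normally and M(j, i) when transM is true.
void AddDiagVecMatRef(double alpha, const std::vector<double>& v,
                      const std::vector<double>& M, int M_stride, bool transM,
                      double beta, std::vector<double>* out,
                      int rows, int cols, int out_stride) {
  assert(static_cast<int>(v.size()) == rows);
  int M_row_stride = M_stride, M_col_stride = 1;
  if (transM) std::swap(M_row_stride, M_col_stride);
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      const double m_ij = M[i * M_row_stride + j * M_col_stride];
      double& y = (*out)[i * out_stride + j];
      y = beta * y + alpha * v[i] * m_ij;  // row i scaled by v(i)
    }
  }
}
```

For example, with M = [[1, 2, 3], [4, 5, 6]], v = (1, 10, 100), transM = true, alpha = 1 and beta = 0, the result is the 3 x 2 matrix [[1, 4], [20, 50], [300, 600]].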
void AddElements ( Real  alpha,
const std::vector< MatrixElement< Real > > &  input 
)

Definition at line 2886 of file cu-matrix.cc.

References CU1DBLOCK, data_, rnnlm::i, and KALDI_ASSERT.

Referenced by OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), DiscriminativeComputation::ProcessPosteriors(), and kaldi::UnitTestCuMatrixAddElements().

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha,
                                     const std::vector<MatrixElement<Real> >& input) {
  // Checks the dimension.
  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
  for (int32 i = 0; i < input.size(); ++i) {
    KALDI_ASSERT(input[i].row < num_rows && input[i].row >= 0 &&
                 input[i].column < num_cols && input[i].column >= 0);
  }
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    void *addr = CuDevice::Instantiate().Malloc(input.size() * sizeof(MatrixElement<Real>));
    CU_SAFE_CALL(cudaMemcpy(addr, input.data(),
                            input.size() * sizeof(MatrixElement<Real>),
                            cudaMemcpyHostToDevice));

    Timer tim;
    int dimBlock(CU1DBLOCK);
    int dimGrid(n_blocks(input.size(), CU1DBLOCK));

    cuda_matrix_add_elements(dimGrid, dimBlock, this->data_, this->Dim(),
                             alpha, (MatrixElement<Real>*)addr, input.size());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().Free(addr);
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    for (int32 i = 0; i < input.size(); i++) {
      (*this)(input[i].row, input[i].column) += alpha * input[i].weight;
    }
  }
}
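The CPU branch above amounts to "validate every (row, column) first, then accumulate alpha * weight". The following is a minimal reference of that logic, assuming a row-major buffer; Element is a hypothetical stand-in for MatrixElement<Real> that reuses the field names the listing reads (row, column, weight), and AddElementsRef() is likewise hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of MatrixElement<Real>, with the field names used above.
struct Element {
  int row;
  int column;
  double weight;
};

// Validates every index before touching the matrix (as the listing does
// ahead of the GPU/CPU branch), then accumulates. In this sequential
// version, duplicate (row, column) entries simply add up.
void AddElementsRef(double alpha, const std::vector<Element>& input,
                    std::vector<double>* mat, int rows, int cols, int stride) {
  for (const Element& e : input)
    assert(e.row >= 0 && e.row < rows && e.column >= 0 && e.column < cols);
  for (const Element& e : input)
    (*mat)[e.row * stride + e.column] += alpha * e.weight;
}
```

Checking all indices up front means an invalid entry is rejected before any element of the matrix has been modified.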
void AddElements ( Real  alpha,
const CuArray< Int32Pair > &  indexes,
const Real *  input 
)

Definition at line 2920 of file cu-matrix.cc.

References CU1DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), Timer::Elapsed(), Int32Pair::first, rnnlm::i, KALDI_ASSERT, kaldi::kUndefined, and Int32Pair::second.

template<typename Real>
void CuMatrixBase<Real>::AddElements(Real alpha,
                                     const CuArray<Int32Pair> &indexes,
                                     const Real *input) {
  if (indexes.Dim() == 0) return;
  KALDI_ASSERT(input != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    CuVector<Real> tmp_vec(indexes.Dim(), kUndefined);
    CU_SAFE_CALL(cudaMemcpy(tmp_vec.Data(), input, indexes.Dim() * sizeof(Real),
                            cudaMemcpyHostToDevice));

    int dimBlock(CU1DBLOCK);
    int dimGrid = n_blocks(indexes.Dim(), CU1DBLOCK);
    cuda_matrix_add_indexed_values(dimGrid, dimBlock, this->Dim(), alpha,
                                   indexes.Data(), tmp_vec.Data(), indexes.Dim(),
                                   this->data_);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indexes.Data();
    for (int32 i = 0; i < indexes.Dim(); i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      (*this)(index[i].first, index[i].second) += alpha * input[i];
    }
  }
}
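In this overload, Int32Pair::first is the row and Int32Pair::second is the column, and input[i] is a host-side array (the GPU branch copies it with cudaMemcpyHostToDevice). The following is a minimal CPU reference of the same update, assuming a row-major buffer; IndexPair and AddIndexedValuesRef() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Int32Pair: `first` is the row, `second` the
// column, matching how the CPU branch above indexes the matrix.
struct IndexPair {
  int first;
  int second;
};

// For every i: mat(indexes[i].first, indexes[i].second) += alpha * input[i].
void AddIndexedValuesRef(double alpha, const std::vector<IndexPair>& indexes,
                         const double* input, std::vector<double>* mat,
                         int rows, int cols, int stride) {
  if (indexes.empty()) return;  // same early exit as the listing
  assert(input != nullptr);
  for (std::size_t i = 0; i < indexes.size(); ++i) {
    const IndexPair& p = indexes[i];
    assert(p.first >= 0 && p.first < rows && p.second >= 0 && p.second < cols);
    (*mat)[p.first * stride + p.second] += alpha * input[i];
  }
}
```

Note the early return: with no indexes, input is never read, so passing a null pointer is allowed in that case only.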
void AddMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

*this += alpha * A

Definition at line 913 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by RepeatedAffineComponent::Add(), NaturalGradientAffineComponent::Add(), AffineComponent::Add(), BlockAffineComponent::Add(), ConvolutionComponent::Add(), LstmNonlinearityComponent::Add(), Convolutional1dComponent::Add(), CuRand< Real >::AddGaussNoise(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApproxEqual(), SigmoidComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Splice::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), NnetComputer::ExecuteCommand(), RepeatedAffineComponent::PerturbParams(), AffineComponent::PerturbParams(), BlockAffineComponent::PerturbParams(), ConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), Convolutional1dComponent::PerturbParams(), SumReduceComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Rbm::RbmUpdate(), ClipGradientComponent::RepairGradients(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestLstmNonlinearity(), kaldi::nnet3::UnitTestNnetInputDerivatives(), LinearTransform::Update(), AffineTransform::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), LstmProjected::Update(), BlstmProjected::Update(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

template<typename Real>
void CuMatrixBase<Real>::AddMat(Real alpha, const CuMatrixBase<Real>& A,
                                MatrixTransposeType transA) {
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (transA == kNoTrans) {
      KALDI_ASSERT(A.NumRows() == num_rows_ && A.NumCols() == num_cols_);
    } else {
      KALDI_ASSERT(A.NumCols() == num_rows_ && A.NumRows() == num_cols_);
    }
    if (num_rows_ == 0) return;
    Timer tim;
    // This block dimension seems to work better than the
    // one from GetBlockSizesForSimpleMatrixOperation().
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
                 n_blocks(NumRows(), CU2DBLOCK));
    cuda_add_mat(dimGrid, dimBlock, alpha, A.data_,
                 data_, Dim(), A.Stride(),
                 (transA == kTrans ? 1 : 0));
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().AddMat(alpha, A.Mat(), transA);
  }
}
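With trans == kTrans, AddMat requires A to have this matrix's dimensions swapped, and element (i, j) of *this receives alpha * A(j, i). The following is a minimal CPU reference of that rule, assuming unpadded row-major storage; AddMatRef() is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// out += alpha * A (or A^T). With the transpose, A must be cols x rows,
// mirroring the KALDI_ASSERTs in the listing above.
void AddMatRef(double alpha, const std::vector<double>& A, int A_rows,
               int A_cols, bool transA, std::vector<double>* out,
               int rows, int cols) {
  if (!transA) {
    assert(A_rows == rows && A_cols == cols);
  } else {
    assert(A_cols == rows && A_rows == cols);
  }
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      // A(i, j) normally, A(j, i) when transposed.
      const double a = transA ? A[j * A_cols + i] : A[i * A_cols + j];
      (*out)[i * cols + j] += alpha * a;
    }
  }
}
```

For example, adding 10 * A^T for A = [[1, 2, 3], [4, 5, 6]] into a zero 3 x 2 matrix gives [[10, 40], [20, 50], [30, 60]].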
void AddMatBlock ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuBlockMatrix< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

This function is like AddMatMat but for the case where the second argument is of type CuBlockMatrix (a block-diagonal matrix of blocks).

Definition at line 2814 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMatMat(), CuBlockMatrix< Real >::Block(), CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and kaldi::swap().

Referenced by kaldi::UnitTestCuBlockMatrixAddMatBlock().

2818  {
2819  // Check dimensions
2820  int32 A_num_rows = A.NumRows(), A_num_cols = A.NumCols(),
2821  A_row_stride = A.Stride(), A_col_stride = 1,
2822  B_num_rows = B.NumRows(), B_num_cols = B.NumCols();
2823  if (transA == kTrans) {
2824  std::swap(A_num_rows, A_num_cols);
2825  std::swap(A_row_stride, A_col_stride);
2826  }
2827  if (transB == kTrans) {
2828  std::swap(B_num_rows, B_num_cols);
2829  }
2830  // At this point the {A,B}_{rows,cols} variables are
2831  // after any transposition.
2832  KALDI_ASSERT(NumRows() == A_num_rows && NumCols() == B_num_cols);
2833  KALDI_ASSERT(A_num_cols == B_num_rows);
2834  int32 B_num_blocks = B.NumBlocks();
2835 
2836  if (num_rows_ == 0) return;
2837 #if HAVE_CUDA == 1
2838  if (CuDevice::Instantiate().Enabled()) {
2839  Timer tim;
2840  MatrixDim this_dim = Dim();
2841 
2842  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2843  // (x,y) indices will be (row of *this, block of B)
2844  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
2845  n_blocks(B_num_blocks, CU2DBLOCK));
2846 
2847  // caution: the use of x as the row-index is not good, but
2848  // this code is not much used, so I'm not updating it.
2849  cuda_add_mat_blockmat(dimGrid, dimBlock, data_, this_dim, A.Data(),
2850  A_num_rows, A_num_cols, A_row_stride, A_col_stride,
2851  B.CuData(), B_num_blocks, alpha, beta,
2852  (transB == kTrans ? 1 : 0));
2853 
2854  CU_SAFE_CALL(cudaGetLastError());
2855 
2856  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2857  } else
2858 #endif
2859  {
2860  // "row_offset" and "col_offset" are offsets into B (or into B^T, if
2861  // transB == kTrans).
2862  int32 row_offset = 0, col_offset = 0;
2863  for (int32 b = 0; b < B_num_blocks; b++) {
2864  const CuSubMatrix<Real> this_block = B.Block(b);
2865  int32 this_num_rows = this_block.NumRows(),
2866  this_num_cols = this_block.NumCols();
2867  if (transB == kTrans) std::swap(this_num_rows, this_num_cols);
2868  CuSubMatrix<Real> this_part(*this, 0, num_rows_,
2869  col_offset, this_num_cols);
2870  CuSubMatrix<Real> A_part = (transA == kNoTrans ?
2871  CuSubMatrix<Real>(A, 0, num_rows_,
2872  row_offset, this_num_rows) :
2873  CuSubMatrix<Real>(A, row_offset, this_num_rows,
2874  0, num_rows_));
2875  this_part.AddMatMat(alpha, A_part, transA, this_block, transB, beta);
2876  row_offset += this_num_rows;
2877  col_offset += this_num_cols;
2878  }
2879  // Note: the values being compared below are all after applying any
2880  // transposition to B.
2881  KALDI_ASSERT(row_offset == B_num_rows && col_offset == B_num_cols);
2882  }
2883 }
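The CPU branch above pairs each diagonal block of B with a column slice of A (starting at row_offset) and a column slice of *this (starting at col_offset). The following is a minimal sketch of that bookkeeping on plain row-major arrays, assuming no transposes and representing the block-diagonal B as a list of dense blocks; `DenseBlock` and `AddMatBlockDiag` are hypothetical names, not part of Kaldi.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense block (not a Kaldi type): rows x cols, row-major, contiguous.
struct DenseBlock {
  int rows, cols;
  std::vector<float> data;
};

// C (n x c_cols) = beta * C + alpha * A (n x a_cols) * blockdiag(blocks).
// Sketch of the kNoTrans/kNoTrans case of the CPU branch; contiguous rows.
void AddMatBlockDiag(float alpha, const std::vector<float> &A, int n, int a_cols,
                     const std::vector<DenseBlock> &blocks, float beta,
                     std::vector<float> &C, int c_cols) {
  int row_offset = 0, col_offset = 0;  // position of the current block in B
  for (const DenseBlock &b : blocks) {
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < b.cols; j++) {
        float sum = 0.0f;
        // Only columns [row_offset, row_offset + b.rows) of A meet this block.
        for (int k = 0; k < b.rows; k++)
          sum += A[i * a_cols + row_offset + k] * b.data[k * b.cols + j];
        float &c = C[i * c_cols + col_offset + j];
        c = beta * c + alpha * sum;  // each column of C is touched by one block
      }
    }
    row_offset += b.rows;
    col_offset += b.cols;
  }
  // Mirrors the KALDI_ASSERT at the end of the CPU branch.
  assert(row_offset == a_cols && col_offset == c_cols);
}
```

Because every column of *this belongs to exactly one block, applying beta inside each per-block product scales the whole matrix exactly once, which is why the CPU branch can simply call AddMatMat per block.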
void AddMatBlocks ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  trans = kNoTrans 
)

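If A.NumRows() is a multiple of this->NumRows() and A.NumCols() is a multiple of this->NumCols(), divides A into blocks of the same size as *this and adds all of them to *this (times alpha). If the transpose argument is kTrans, A is instead divided into blocks the size of the transpose of *this, and each block is transposed before it is added.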

Definition at line 944 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), rnnlm::i, rnnlm::j, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTestCuMatrixAddMatBlocks(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

945  {
946  if (num_rows_ == 0 || num_cols_ == 0) return;
947  int32 num_row_blocks, num_col_blocks;
948  if (transA == kNoTrans) {
949  KALDI_ASSERT(A.NumRows() % num_rows_ == 0 && A.NumCols() % num_cols_ == 0);
950  num_row_blocks = A.Mat().NumRows() / num_rows_;
951  num_col_blocks = A.Mat().NumCols() / num_cols_;
952  } else {
953  KALDI_ASSERT(A.NumRows() % num_cols_ == 0 && A.NumCols() % num_rows_ == 0);
954  num_row_blocks = A.Mat().NumRows() / num_cols_;
955  num_col_blocks = A.Mat().NumCols() / num_rows_;
956  }
957 #if HAVE_CUDA == 1
958  if (CuDevice::Instantiate().Enabled()) {
959  Timer tim;
960  dim3 dimGrid, dimBlock;
961  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
962  &dimGrid, &dimBlock);
963  cuda_add_mat_blocks(dimGrid, dimBlock, alpha, A.data_, num_row_blocks,
964  num_col_blocks, data_, Dim(), A.Stride(),
965  (transA == kTrans ? 1 : 0));
966  CU_SAFE_CALL(cudaGetLastError());
967 
968  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
969  } else
970 #endif
971  {
972  int32 nr, nc;
973  if (transA == kNoTrans) {
974  nr = num_rows_;
975  nc = num_cols_;
976  } else {
977  nr = num_cols_;
978  nc = num_rows_;
979  }
980  for (int32 i = 0; i < num_row_blocks; i++) {
981  for (int32 j = 0; j < num_col_blocks; j++) {
982  Mat().AddMat(alpha, SubMatrix<Real>(A.Mat(), i * nr, nr, j * nc, nc),
983  transA);
984  }
985  }
986  }
987 }
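The CPU branch above is a double loop over the tiles of A. A minimal sketch of the kNoTrans case on contiguous row-major arrays follows; `AddMatBlocksDense` is a hypothetical name, not Kaldi API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: M (rows x cols) += alpha * sum of every rows x cols tile
// of A (a_rows x a_cols). Row-major, contiguous, no transpose.
void AddMatBlocksDense(float alpha, const std::vector<float> &A, int a_rows,
                       int a_cols, std::vector<float> &M, int rows, int cols) {
  if (rows == 0 || cols == 0) return;  // same early exit as the listing
  assert(a_rows % rows == 0 && a_cols % cols == 0);
  int num_row_blocks = a_rows / rows, num_col_blocks = a_cols / cols;
  for (int bi = 0; bi < num_row_blocks; bi++)
    for (int bj = 0; bj < num_col_blocks; bj++)
      for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
          M[r * cols + c] += alpha * A[(bi * rows + r) * a_cols + bj * cols + c];
}
```

For example, with A = [[1, 2, 3, 4], [5, 6, 7, 8]] and a 1 x 2 target, the four tiles [1, 2], [3, 4], [5, 6], [7, 8] sum to [16, 20].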
MatrixIndexT NumCols() const
Definition: cu-matrix.h:201
MatrixIndexT num_cols_
Definition: cu-matrix.h:642
const MatrixBase< Real > & Mat() const
Definition: cu-matrix.h:612
MatrixIndexT NumRows() const
Dimensions.
Definition: cu-matrix.h:200
Real * data_
GPU data pointer (or regular matrix data pointer,.
Definition: cu-matrix.h:634
::MatrixDim Dim() const
Definition: cu-matrix.h:206
#define KALDI_ASSERT(cond)
Definition: kaldi-error.h:169
MatrixIndexT num_rows_
Definition: cu-matrix.h:643
void AddMatDiagVec ( const Real  alpha,
const CuMatrixBase< Real > &  M,
MatrixTransposeType  transM,
CuVectorBase< Real > &  v,
Real  beta = 1.0 
)

Definition at line 1200 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Stride(), kaldi::swap(), and CuVectorBase< Real >::Vec().

Referenced by SigmoidComponent::RepairGradients(), and TanhComponent::RepairGradients().

1204  {
1205 #if HAVE_CUDA == 1
1206  if (CuDevice::Instantiate().Enabled()) {
1207  if (transM == kNoTrans) {
1208  KALDI_ASSERT(SameDim(*this, M));
1209  } else {
1210  KALDI_ASSERT(M.NumRows() == NumCols() && M.NumCols() == NumRows());
1211  }
1212  KALDI_ASSERT(v.Dim() == this->NumCols());
1213 
1214  Timer tim;
1215  dim3 dimGrid, dimBlock;
1216  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1217  &dimGrid, &dimBlock);
1218  MatrixIndexT M_row_stride = M.Stride(), M_col_stride = 1;
1219  if (transM == kTrans) std::swap(M_row_stride, M_col_stride);
1220  cuda_add_mat_diag_vec(dimGrid, dimBlock, alpha, data_, Dim(),
1221  M.Data(), M_row_stride, M_col_stride, v.Data(), beta);
1222  CU_SAFE_CALL(cudaGetLastError());
1223  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1224  } else
1225 #endif
1226  {
1227  Mat().AddMatDiagVec(alpha, M.Mat(), transM, v.Vec(), beta);
1228  }
1229 }
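No description is given for AddMatDiagVec. Judging from the checks in the listing (v.Dim() must equal NumCols(), and M must match *this up to transposition) and the CPU fallback to MatrixBase::AddMatDiagVec, it appears to compute *this = beta * *this + alpha * M(^T) * diag(v), i.e. column j of M(^T) is scaled by v(j). A sketch of that reading for transM == kNoTrans; `AddMatDiagVecDense` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: out = beta * out + alpha * M * diag(v).
// M and out are rows x cols, row-major, contiguous; v has one entry per column.
void AddMatDiagVecDense(float alpha, const std::vector<float> &M,
                        const std::vector<float> &v, float beta,
                        std::vector<float> &out, int rows, int cols) {
  assert(static_cast<int>(v.size()) == cols);
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++) {
      float &o = out[r * cols + c];
      o = beta * o + alpha * M[r * cols + c] * v[c];  // column c scaled by v[c]
    }
}
```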
void AddMatMat ( Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
Real  beta 
)

C = alpha * A(^T)*B(^T) + beta * C.

Definition at line 1077 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kTrans, CuMatrixBase< Real >::Mat(), rnnlm::n, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by CuMatrixBase< Real >::AddMatBlock(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< BaseFloat >::AddMatSp(), CuMatrixBase< BaseFloat >::AddMatTp(), CuMatrixBase< BaseFloat >::AddSpMat(), CuMatrixBase< BaseFloat >::AddTpMat(), RepeatedAffineComponent::Backprop(), AffineComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), LinearTransform::BackpropagateFnc(), AffineTransform::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), AffineComponent::CollapseWithNext(), AffineComponent::CollapseWithPrevious(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), kaldi::CuVectorUnitTestAddDiagMatMat(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), AffineComponent::Propagate(), DctComponent::Propagate(), FixedLinearComponent::Propagate(), FixedAffineComponent::Propagate(), KlHmm::PropagateFnc(), LinearTransform::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), Rbm::RbmUpdate(), Rbm::Reconstruct(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::TestCuMatrixMatMat(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixSymAddMat2(), kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuSpMatrixInvert(), LinearTransform::Update(), AffineTransform::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), AffineComponent::UpdateSimple(), and BlockAffineComponent::UpdateSimple().

1079  {
1080 
1081 
1082  // CUBLAS is col-major, cudamatrix is row-major, how to do the mapping?
1083  // keep trans..., just swap A&B matrices: A->B B->A
1084  MatrixIndexT m = ((transB==kTrans)? B.NumRows() : B.NumCols());
1085  MatrixIndexT n = ((transA==kTrans)? A.NumCols() : A.NumRows());
1086  MatrixIndexT k = ((transB==kTrans)? B.NumCols() : B.NumRows());
1087  MatrixIndexT k1 = ((transA==kTrans)? A.NumRows() : A.NumCols());
1088 
1089  KALDI_ASSERT(m == NumCols());
1090  KALDI_ASSERT(n == NumRows());
1091  KALDI_ASSERT(k == k1);
1092 
1093  if (m == 0) return;
1094 
1095 
1096 #if HAVE_CUDA == 1
1097  if (CuDevice::Instantiate().Enabled()) {
1098  Timer tim;
1099  CU_SAFE_CALL(cublas_gemm(GetCublasHandle(),
1100  (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1101  (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N),
1102  m, n, k, alpha, B.data_, B.Stride(),
1103  A.data_, A.Stride(), beta, data_, Stride()));
1104 
1105  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1106  } else
1107 #endif
1108  {
1109  Mat().AddMatMat(alpha, A.Mat(), transA, B.Mat(), transB, beta);
1110  }
1111 }
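The comment in the listing ("just swap A&B matrices") relies on the fact that a row-major buffer, read as column-major, holds the transpose of the matrix. Since C^T = B^T * A^T, asking a column-major GEMM for "B times A" on the raw buffers produces the row-major C = A * B, which is why the cuBLAS call passes B before A with m = NumCols() and n = NumRows(). The sketch below demonstrates this with a naive column-major GEMM standing in for cublas_gemm; `ColMajorGemm` and `RowMajorAddMatMat` are hypothetical names, and only the no-transpose case is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive column-major GEMM (no transposes), a stand-in for cuBLAS:
// C = alpha * X * Y + beta * C, X is m x k (ldx), Y is k x n (ldy), C is m x n (ldc).
void ColMajorGemm(int m, int n, int k, float alpha, const float *X, int ldx,
                  const float *Y, int ldy, float beta, float *C, int ldc) {
  for (int j = 0; j < n; j++)
    for (int i = 0; i < m; i++) {
      float sum = 0.0f;
      for (int p = 0; p < k; p++) sum += X[i + p * ldx] * Y[p + j * ldy];
      C[i + j * ldc] = beta * C[i + j * ldc] + alpha * sum;
    }
}

// Row-major C (rows x cols) = alpha * A (rows x inner) * B (inner x cols) + beta * C,
// computed with swapped operands, as in the CUDA branch of AddMatMat.
void RowMajorAddMatMat(float alpha, const std::vector<float> &A,
                       const std::vector<float> &B, float beta,
                       std::vector<float> &C, int rows, int inner, int cols) {
  assert(A.size() == static_cast<std::size_t>(rows * inner) &&
         B.size() == static_cast<std::size_t>(inner * cols) &&
         C.size() == static_cast<std::size_t>(rows * cols));
  // Read column-major, B's buffer is B^T (cols x inner, ld = cols) and A's buffer
  // is A^T (inner x rows, ld = inner); their product B^T * A^T is C^T, which is
  // exactly the row-major C when read back.
  ColMajorGemm(cols, rows, inner, alpha, B.data(), cols, A.data(), inner,
               beta, C.data(), cols);
}
```

With A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]], the result is [[58, 64], [139, 154]], the ordinary row-major product.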
void AddMatMatElements ( const Real  alpha,
const CuMatrixBase< Real > &  A,
const CuMatrixBase< Real > &  B,
const Real  beta 
)

*this = beta * *this + alpha * A .* B, where .* denotes element-by-element multiplication.

Definition at line 1232 of file cu-matrix.cc.

References CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

1233  {
1234 #if HAVE_CUDA == 1
1235  if (CuDevice::Instantiate().Enabled()) {
1236  KALDI_ASSERT(SameDim(*this, A) && SameDim(A, B));
1237  Timer tim;
1238  dim3 dimGrid, dimBlock;
1239  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1240  &dimGrid, &dimBlock);
1241  cuda_add_mat_mat_elements(dimGrid, dimBlock, this->data_, A.Data(),
1242  B.Data(), Dim(), A.Stride(), B.Stride(), alpha, beta);
1243  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1244  } else
1245 #endif
1246  {
1247  Mat().AddMatMatElements(alpha, A.Mat(), B.Mat(), beta);
1248  }
1249 }
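A minimal sketch of the element-wise update on contiguous buffers of equal size; the real function honors separate strides for *this, A and B, which this sketch omits. `AddMatMatElementsDense` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: out = beta * out + alpha * (A .* B), element by element.
void AddMatMatElementsDense(float alpha, const std::vector<float> &A,
                            const std::vector<float> &B, float beta,
                            std::vector<float> &out) {
  assert(A.size() == B.size() && A.size() == out.size());  // cf. SameDim checks
  for (std::size_t i = 0; i < out.size(); i++)
    out[i] = beta * out[i] + alpha * A[i] * B[i];
}
```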
void AddMatSp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuSpMatrix< Real > &  B,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A(^T) * B, where B is a symmetric packed matrix.

Definition at line 471 of file cu-matrix.h.

474  {
475  CuMatrix<Real> M(B);
476  return AddMatMat(alpha, A, transA, M, kNoTrans, beta);
477  }
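AddMatSp (like AddSpMat, AddMatTp and AddTpMat) does no special arithmetic: it copies the packed operand into a dense CuMatrix and forwards to AddMatMat. The sketch below shows that expansion for a symmetric matrix, assuming the lower triangle is stored packed row by row, so that element (i, j) with i >= j sits at index i*(i+1)/2 + j; treat this layout as an assumption of the sketch. `UnpackSymmetric` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: expand a packed symmetric n x n matrix (lower triangle,
// stored row by row) into a dense row-major n x n matrix.
std::vector<float> UnpackSymmetric(const std::vector<float> &packed, int n) {
  assert(packed.size() == static_cast<std::size_t>(n) * (n + 1) / 2);
  std::vector<float> dense(static_cast<std::size_t>(n) * n);
  for (int i = 0; i < n; i++)
    for (int j = 0; j <= i; j++) {
      float v = packed[static_cast<std::size_t>(i) * (i + 1) / 2 + j];
      dense[i * n + j] = v;  // lower triangle
      dense[j * n + i] = v;  // mirrored upper triangle
    }
  return dense;
}
```

The dense result can then go through any ordinary matrix product, which is the trade-off these inline wrappers make: an extra copy of the packed operand in exchange for reusing the optimized AddMatMat path.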
void AddMatTp ( const Real  alpha,
const CuMatrixBase< Real > &  A,
MatrixTransposeType  transA,
const CuTpMatrix< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A(^T) * B(^T), where B is a triangular packed matrix.

Definition at line 498 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddMatTp().

501  {
502  CuMatrix<Real> M(B);
503  return AddMatMat(alpha, A, transA, M, transB, beta);
504  }
void AddRowRanges ( const CuMatrixBase< Real > &  src,
const CuArray< Int32Pair > &  indexes 
)

For each row r of this and for each column c, do (*this)(r, c) += src(j, c), where j ranges from indexes[r].first through indexes[r].second - 1.

In general, the indexes must be >= 0, with indexes[r].first < src.NumRows() and indexes[r].second <= src.NumRows() (the end of each range is exclusive); to represent an empty range you may use the pair (-1, -1) or any pair of numbers (i, j) such that i >= j.

Definition at line 2540 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuArray< T >::Dim(), CuMatrixBase< Real >::Dim(), Timer::Elapsed(), Int32Pair::first, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), Int32Pair::second, and CuMatrixBase< Real >::stride_.

Referenced by StatisticsPoolingComponent::Backprop(), NnetComputer::ExecuteCommand(), StatisticsPoolingComponent::Propagate(), and kaldi::UnitTestCuMatrixAddRowRanges().

2541  {
2542  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2543  KALDI_ASSERT(src.NumCols() == NumCols());
2544  if (NumRows() == 0) return;
2545 #if HAVE_CUDA == 1
2546  if (CuDevice::Instantiate().Enabled()) {
2547  Timer tim;
2548  dim3 dimGrid, dimBlock;
2549  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2550  &dimGrid, &dimBlock);
2551  cuda_add_row_ranges(dimGrid, dimBlock,
2552  data_, Dim(), src.Data(), src.Dim(), indexes.Data());
2553  CU_SAFE_CALL(cudaGetLastError());
2554  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2555  } else
2556 #endif
2557  { // Implement here for the CPU.
2558  int32 num_rows = this->num_rows_, num_cols = this->num_cols_,
2559  this_stride = this->stride_, src_stride = src.stride_;
2560  Real *data = this->data_;
2561  const Real *src_data = src.data_;
2562  const Int32Pair *indexes_data = indexes.Data();
2563  for (int32 row = 0; row < num_rows; row++) {
2564  int32 start_row = indexes_data[row].first,
2565  end_row = indexes_data[row].second;
2566  for (int32 col = 0; col < num_cols; col++) {
2567  Real sum = 0.0;
2568  for (int32 src_row = start_row; src_row < end_row; src_row++)
2569  sum += src_data[src_row * src_stride + col];
2570  data[row * this_stride + col] += sum;
2571  }
2572  }
2573  }
2574 }
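The CPU branch above can be reproduced on a contiguous row-major buffer as follows, using std::pair<int, int> as a stand-in for Int32Pair; `AddRowRangesDense` is a hypothetical name. Empty ranges such as (-1, -1) are skipped before the bounds check, so they never touch src.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: for each row r of M (rows x cols), adds the sum of
// src rows [ranges[r].first, ranges[r].second) to M.row(r).
void AddRowRangesDense(const std::vector<float> &src, int src_rows,
                       const std::vector<std::pair<int, int>> &ranges,
                       std::vector<float> &M, int cols) {
  int rows = static_cast<int>(ranges.size());
  assert(static_cast<int>(M.size()) == rows * cols);
  for (int r = 0; r < rows; r++) {
    int start = ranges[r].first, end = ranges[r].second;
    if (start >= end) continue;  // empty range, e.g. (-1, -1)
    assert(start >= 0 && end <= src_rows);
    for (int c = 0; c < cols; c++) {
      float sum = 0.0f;
      for (int j = start; j < end; j++) sum += src[j * cols + c];
      M[r * cols + c] += sum;
    }
  }
}
```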
void AddRows ( Real  alpha,
const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Does for each row r, this.Row(r) += alpha * src.row(indexes[r]).

If indexes[r] < 0, nothing is added to row r. indexes.Dim() must equal this->NumRows(), every non-negative element of indexes must be less than src.NumRows(), and src.NumCols() must equal this->NumCols().

Definition at line 2434 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), and NnetComputer::ExecuteCommand().

2436  {
2437  if (NumRows() == 0) return;
2438 #if HAVE_CUDA == 1
2439  if (CuDevice::Instantiate().Enabled()) {
2440  KALDI_ASSERT(static_cast<MatrixIndexT>(indexes.Dim()) == NumRows());
2441  KALDI_ASSERT(src.NumCols() == NumCols());
2442  Timer tim;
2443  dim3 dimGrid, dimBlock;
2444  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2445  &dimGrid, &dimBlock);
2446  cuda_add_rows(dimGrid, dimBlock, alpha,
2447  data_, src.Data(), indexes.Data(), Dim(), src.Stride());
2448  CU_SAFE_CALL(cudaGetLastError());
2449  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2450  } else
2451 #endif
2452  {
2453  Mat().AddRows(alpha, src.Mat(), indexes.Data());
2454  }
2455 }
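This overload is a gather-and-add: a source row may be used by several destination rows, and a negative index leaves the destination row unchanged. A minimal sketch on contiguous row-major buffers; `AddRowsIndexed` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each row r of M (indexes.size() x cols),
// M.row(r) += alpha * src.row(indexes[r]); a negative index skips the row.
void AddRowsIndexed(float alpha, const std::vector<float> &src, int src_rows,
                    const std::vector<int> &indexes, std::vector<float> &M,
                    int cols) {
  int rows = static_cast<int>(indexes.size());
  assert(static_cast<int>(M.size()) == rows * cols);
  for (int r = 0; r < rows; r++) {
    int i = indexes[r];
    if (i < 0) continue;
    assert(i < src_rows);
    for (int c = 0; c < cols; c++) M[r * cols + c] += alpha * src[i * cols + c];
  }
}
```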
void AddRows ( Real  alpha,
const CuArray< const Real * > &  src 
)

Does for each row r, this.Row(r) += alpha * src[r], treating src[r] as the beginning of a region of memory representing a vector of floats, of the same length as this.NumCols().

Definition at line 2459 of file cu-matrix.cc.

References CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

2459  {
2460  if (NumRows() == 0) return;
2461 #if HAVE_CUDA == 1
2462  if (CuDevice::Instantiate().Enabled()) {
2463  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2464  Timer tim;
2465  dim3 dimGrid, dimBlock;
2466  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2467  &dimGrid, &dimBlock);
2468  cuda_add_rows(dimGrid, dimBlock, alpha, data_, src.Data(), Dim());
2469  CU_SAFE_CALL(cudaGetLastError());
2470  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2471  } else
2472 #endif
2473  {
2474  Mat().AddRows(alpha, src.Data());
2475  }
2476 }
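In this overload each row has its own source pointer, so the sources can live in different matrices or vectors. A minimal CPU sketch, assuming every pointer is valid and points at NumCols() floats (null handling is not described here, so the sketch does not attempt it); `AddRowsFromPointers` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each row r of M (src.size() x cols),
// M.row(r) += alpha * (the cols floats starting at src[r]).
void AddRowsFromPointers(float alpha, const std::vector<const float *> &src,
                         std::vector<float> &M, int cols) {
  int rows = static_cast<int>(src.size());
  assert(static_cast<int>(M.size()) == rows * cols);
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++) M[r * cols + c] += alpha * src[r][c];
}
```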
void AddSpMat ( const Real  alpha,
const CuSpMatrix< Real > &  A,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A * B(^T), where A is a symmetric packed matrix.

Definition at line 480 of file cu-matrix.h.

483  {
484  CuMatrix<Real> M(A);
485  return AddMatMat(alpha, M, kNoTrans, B, transB, beta);
486  }
void AddToDiag ( Real  value)

Adds "value" to the diagonal elements of the matrix.

The matrix *this does not have to be square.

Definition at line 568 of file cu-matrix.cc.

References CU1DBLOCK, rnnlm::d, data_, and Timer::Elapsed().

Referenced by kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCholesky(), and kaldi::UnitTestCuMatrixAddToDiag().

568  {
569 #if HAVE_CUDA == 1
570  if (CuDevice::Instantiate().Enabled()) {
571  if (num_rows_ == 0) return;
572  Timer tim;
573  // We'll create a fake matrix with "num_diag" rows, one
574  // column, and a stride of "this_stride". The y-value of
575  // the grid/blocks corresponds to the row, in this kernel.
576  MatrixIndexT num_diag = std::min(num_rows_, num_cols_),
577  this_stride = stride_ + 1;
578  dim3 dimBlock(1, CU1DBLOCK);
579  dim3 dimGrid(1, n_blocks(num_diag, CU1DBLOCK));
580  ::MatrixDim d = { num_diag, 1, this_stride };
581  cuda_add(dimGrid, dimBlock, data_, value, d);
582  CU_SAFE_CALL(cudaGetLastError());
583 
584  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
585  } else
586  #endif
587  {
588  Mat().AddToDiag(value);
589  }
590 }
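The CUDA branch needs no special kernel: it views the diagonal as a one-column matrix of min(NumRows(), NumCols()) rows with stride stride_ + 1, because element (i, i) lives at offset i * stride_ + i = i * (stride_ + 1). The same trick on a CPU buffer, as a sketch; `AddToDiagStrided` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: adds value to M(i, i) for i < min(rows, cols), where M is
// row-major with the given stride (stride >= cols, padding allowed).
void AddToDiagStrided(float value, std::vector<float> &M, int rows, int cols,
                      int stride) {
  assert(stride >= cols);
  int num_diag = std::min(rows, cols);  // works for non-square matrices too
  int diag_stride = stride + 1;         // offset of (i, i) is i * stride + i
  for (int i = 0; i < num_diag; i++) M[i * diag_stride] += value;
}
```

For a 2 x 3 matrix stored with stride 4, the diagonal entries are at offsets 0 and 5.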
void AddToRows ( Real  alpha,
const CuArray< Real * > &  dst 
) const

For each row r of this matrix, adds it (times alpha) to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, does not do anything for that row. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2480 of file cu-matrix.cc.

References CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

Referenced by NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixAddToRows().

2481  {
2482  if (NumRows() == 0) return;
2483 #if HAVE_CUDA == 1
2484  if (CuDevice::Instantiate().Enabled()) {
2485  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2486  Timer tim;
2487  dim3 dimGrid, dimBlock;
2488  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2489  &dimGrid, &dimBlock);
2490  cuda_add_to_rows(dimGrid, dimBlock, alpha, dst.Data(), data_, Dim());
2491  CU_SAFE_CALL(cudaGetLastError());
2492  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2493  } else
2494 #endif
2495  {
2496  Mat().AddToRows(alpha, dst.Data());
2497  }
2498 }
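AddToRows is the scatter counterpart of AddRows: each row of *this is added into its own destination. A serial CPU sketch, including the documented NULL skip; `AddToRowsDense` is a hypothetical name. The no-overlap requirement is presumably there because the GPU kernel may update rows concurrently (an inference, not stated in the source); this serial loop would simply accumulate overlapping destinations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each row r of M (dst.size() x cols), adds
// alpha * M.row(r) into the cols floats at dst[r]; a null dst[r] skips the row.
void AddToRowsDense(float alpha, const std::vector<float> &M, int cols,
                    const std::vector<float *> &dst) {
  int rows = static_cast<int>(dst.size());
  assert(static_cast<int>(M.size()) == rows * cols);
  for (int r = 0; r < rows; r++) {
    if (dst[r] == nullptr) continue;
    for (int c = 0; c < cols; c++) dst[r][c] += alpha * M[r * cols + c];
  }
}
```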
void AddTpMat ( const Real  alpha,
const CuTpMatrix< Real > &  A,
MatrixTransposeType  transA,
const CuMatrixBase< Real > &  B,
MatrixTransposeType  transB,
const Real  beta 
)
inline

*this <- beta * *this + alpha * A(^T) * B(^T), where A is a triangular packed matrix.

Definition at line 489 of file cu-matrix.h.

Referenced by kaldi::UnitTestCuMatrixAddTpMat().

492  {
493  CuMatrix<Real> M(A);
494  return AddMatMat(alpha, M, transA, B, transB, beta);
495  }
void AddVecToCols ( Real  alpha,
const CuVectorBase< Real > &  col,
Real  beta = 1.0 
)

(for each column c of *this), c = alpha * col + beta * c

Definition at line 1018 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by KlHmm::PropagateFnc(), and kaldi::UnitTestCuMatrixAddVecToCols().

1020  {
1021  if (col.Dim() != NumRows()) {
1022  KALDI_ERR << "Non matching dimensions: Rows:" << NumRows() << " VectorDim:" << col.Dim();
1023  }
1024 
1025  #if HAVE_CUDA == 1
1026  if (CuDevice::Instantiate().Enabled()) {
1027  Timer tim;
1028  dim3 dimGrid, dimBlock;
1029  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1030  &dimGrid, &dimBlock);
1031  cuda_add_vec_to_cols(dimGrid, dimBlock, alpha, col.data_, beta,
1032  data_, Dim());
1033  CU_SAFE_CALL(cudaGetLastError());
1034 
1035  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1036  } else
1037  #endif
1038  {
1039  if (beta != 1.0) Mat().Scale(beta);
1040  Mat().AddVecToCols(alpha, col.Vec());
1041  }
1042 }
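Here col has one entry per row of *this, and every column receives the same vector. The CPU fallback does this in two passes (Scale(beta), then AddVecToCols); the sketch below fuses them into one loop over a contiguous row-major buffer. `AddVecToColsDense` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each column c of M (rows x cols),
// M(:, c) = alpha * col + beta * M(:, c).
void AddVecToColsDense(float alpha, const std::vector<float> &col, float beta,
                       std::vector<float> &M, int rows, int cols) {
  assert(static_cast<int>(col.size()) == rows);  // cf. the KALDI_ERR check
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
      M[r * cols + c] = alpha * col[r] + beta * M[r * cols + c];
}
```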
void AddVecToRows ( Real  alpha,
const CuVectorBase< Real > &  row,
Real  beta = 1.0 
)

(for each row r of *this), r = alpha * row + beta * r

Definition at line 1047 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by SimpleSentenceAveragingComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), DecodableNnet3SimpleOnline::DoNnetComputation(), DecodableNnetSimple::DoNnetComputation(), PerElementOffsetComponent::Propagate(), FixedAffineComponent::Propagate(), ConvolutionComponent::Propagate(), FixedBiasComponent::Propagate(), Convolutional1dComponent::Propagate(), SimpleSentenceAveragingComponent::PropagateFnc(), AffineTransform::PropagateFnc(), RecurrentComponent::PropagateFnc(), Rbm::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), AddShift::PropagateFnc(), Rbm::Reconstruct(), SigmoidComponent::RepairGradients(), RectifiedLinearComponent::RepairGradients(), PdfPrior::SubtractOnLogpost(), kaldi::UnitTestCuMatrixAddVecToRows(), and SentenceAveragingComponent::Update().

1049  {
1050  if (row.Dim() != NumCols()) {
1051  KALDI_ERR << "Non matching dimensions: Cols:" << NumCols() << " VectorDim:" << row.Dim();
1052  }
1053 #if HAVE_CUDA == 1
1054  if (CuDevice::Instantiate().Enabled()) {
1055  Timer tim;
1056  dim3 dimGrid, dimBlock;
1057  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
1058  &dimGrid, &dimBlock);
1059  cuda_add_vec_to_rows(dimGrid, dimBlock, alpha, row.data_, beta, data_, Dim());
1060  CU_SAFE_CALL(cudaGetLastError());
1061 
1062  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1063  } else
1064 #endif
1065  {
1066  if (beta != 1.0) Mat().Scale(beta);
1067  Mat().AddVecToRows(alpha, row.Vec());
1068  }
1069 }
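With alpha = beta = 1 this adds the same vector to every row, which matches bias-addition callers such as AffineTransform::PropagateFnc in the reference list (an illustration of the pattern, not their actual code). A row-major sketch; `AddVecToRowsDense` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each row r of M (rows x cols),
// M(r, :) = alpha * row + beta * M(r, :).
void AddVecToRowsDense(float alpha, const std::vector<float> &row, float beta,
                       std::vector<float> &M, int rows, int cols) {
  assert(static_cast<int>(row.size()) == cols);  // cf. the KALDI_ERR check
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
      M[r * cols + c] = alpha * row[c] + beta * M[r * cols + c];
}
```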
void AddVecVec ( Real  alpha,
const CuVectorBase< Real > &  x,
const CuVectorBase< Real > &  y 
)

A = alpha * x * y^T + A, where A is *this (a rank-one update).

Definition at line 1115 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, rnnlm::n, and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuMatrixAddVecVec().

1116  {
1117 
1118  MatrixIndexT m = y.Dim();
1119  MatrixIndexT n = x.Dim();
1120  KALDI_ASSERT(m == NumCols());
1121  KALDI_ASSERT(n == NumRows());
1122 
1123 #if HAVE_CUDA == 1
1124  if (CuDevice::Instantiate().Enabled()) {
1125  Timer tim;
1126  CU_SAFE_CALL(cublas_ger(GetCublasHandle(), m, n, alpha,
1127  y.Data(), 1, x.Data(), 1, data_, Stride()));
1128 
1129  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1130  } else
1131 #endif
1132  {
1133  Mat().AddVecVec(alpha, x.Vec(), y.Vec());
1134  }
1135 }
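The CUDA branch passes y before x, with m = y.Dim(), to cublas_ger. This is the same row-major/column-major swap as in AddMatMat: the row-major buffer read column-major is A^T, and A^T += alpha * y * x^T is the same update. The plain row-major rank-one update, as a sketch; `AddVecVecDense` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: M (x.size() x y.size(), row-major) += alpha * x * y^T.
void AddVecVecDense(float alpha, const std::vector<float> &x,
                    const std::vector<float> &y, std::vector<float> &M) {
  int rows = static_cast<int>(x.size()), cols = static_cast<int>(y.size());
  assert(static_cast<int>(M.size()) == rows * cols);
  for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++) M[r * cols + c] += alpha * x[r] * y[c];
}
```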
void ApplyCeiling ( Real  ceiling_val)

Definition at line 2273 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), kaldi::UnitTestCuMatrixApplyCeiling(), ParametricRelu::Update(), LstmProjected::Update(), and BlstmProjected::Update().

2273  {
2274 #if HAVE_CUDA == 1
2275  if (CuDevice::Instantiate().Enabled()) {
2276  Timer tim;
2277  dim3 dimGrid, dimBlock;
2278  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2279  &dimGrid, &dimBlock);
2280  cuda_apply_ceiling(dimGrid, dimBlock, data_, ceiling_val, Dim());
2281  CU_SAFE_CALL(cudaGetLastError());
2282  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2283  } else
2284 #endif
2285  {
2286  Mat().ApplyCeiling(ceiling_val);
2287  }
2288 }
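No description is given for ApplyCeiling; by its name and its callers it replaces each element x with min(x, ceiling_val), the mirror image of ApplyFloor. Several callers (e.g. ClipGradientComponent::Backprop) use both, which amounts to clipping. A sketch of that combination on a plain buffer; all three function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: element-wise ceiling, x <- min(x, ceiling_val).
void ApplyCeilingDense(std::vector<float> &M, float ceiling_val) {
  for (float &x : M) x = std::min(x, ceiling_val);
}

// Hypothetical helper: element-wise floor, x <- max(x, floor_val).
void ApplyFloorDense(std::vector<float> &M, float floor_val) {
  for (float &x : M) x = std::max(x, floor_val);
}

// Clips every element to [-threshold, threshold]: a floor followed by a ceiling.
void ClipSymmetric(std::vector<float> &M, float threshold) {
  assert(threshold >= 0.0f);
  ApplyFloorDense(M, -threshold);
  ApplyCeilingDense(M, threshold);
}
```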
void ApplyExp ( )

Definition at line 2236 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by DiscriminativeComputation::Compute(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), and kaldi::UnitTestCuMatrixApplyExp().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_exp(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyExp();
  }
}
void ApplyFloor ( Real  floor_val)

Definition at line 2255 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by ClipGradientComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), main(), StatisticsPoolingComponent::Propagate(), RectifiedLinearComponent::Propagate(), SoftmaxComponent::Propagate(), LogSoftmaxComponent::Propagate(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyFloor(), kaldi::UnitTestCuMatrixObjfDeriv(), ParametricRelu::Update(), LstmProjected::Update(), BlstmProjected::Update(), and NnetLogprobTask::~NnetLogprobTask().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_floor(dimGrid, dimBlock, data_, floor_val, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyFloor(floor_val);
  }
}
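
ApplyFloor is assumed to be the mirror image of ApplyCeiling, setting x = max(x, floor_val). Applying a floor of -t followed by a ceiling of t clips every element into [-t, t], which is presumably why ClipGradientComponent::Backprop() and several Update() methods appear among the callers of both functions. A minimal sketch on a flat std::vector; FloorInPlace, CeilingInPlace, and ClipInPlace are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element-wise references for the two operations.
void FloorInPlace(std::vector<float> &v, float floor_val) {
  for (float &x : v) x = std::max(x, floor_val);  // ApplyFloor
}
void CeilingInPlace(std::vector<float> &v, float ceiling_val) {
  for (float &x : v) x = std::min(x, ceiling_val);  // ApplyCeiling
}

// Clipping to [-t, t]: a floor at -t followed by a ceiling at t.
void ClipInPlace(std::vector<float> &v, float t) {
  assert(t >= 0.0f);
  FloorInPlace(v, -t);
  CeilingInPlace(v, t);
}
```

The order of the two calls does not matter for t >= 0, since the floor and the ceiling never move a value past the other bound.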
void ApplyHeaviside ( )

For each element, sets x = (x > 0 ? 1.0 : 0.0).

See also Heaviside().

Definition at line 2197 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by BackpropTruncationComponent::Backprop(), RectifiedLinearComponent::Backprop(), DropoutComponent::Propagate(), SigmoidComponent::RepairGradients(), TanhComponent::RepairGradients(), ClipGradientComponent::RepairGradients(), kaldi::TestCuMatrixHeaviside(), and kaldi::UnitTestCuMatrixApplyHeaviside().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_heaviside(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyHeaviside();
  }
}
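
The documented rule is x = (x > 0 ? 1.0 : 0.0), so an input of exactly zero maps to 0. One plausible use, suggested by RectifiedLinearComponent::Backprop() appearing among the callers, is turning a ReLU output into a derivative mask and multiplying it element-wise into the incoming derivative. The sketch below assumes that usage; HeavisideRef and ReluBackpropRef are hypothetical names working on flat std::vector buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference for ApplyHeaviside: x = (x > 0 ? 1 : 0); zero maps to 0.
void HeavisideRef(std::vector<float> &v) {
  for (float &x : v) x = (x > 0.0f ? 1.0f : 0.0f);
}

// Hypothetical ReLU backprop: mask = Heaviside(out_value), then
// in_deriv = mask * out_deriv element-wise.
std::vector<float> ReluBackpropRef(const std::vector<float> &out_value,
                                   const std::vector<float> &out_deriv) {
  assert(out_value.size() == out_deriv.size());
  std::vector<float> mask(out_value);
  HeavisideRef(mask);
  std::vector<float> in_deriv(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    in_deriv[i] = mask[i] * out_deriv[i];
  return in_deriv;
}
```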
void ApplyLog ( )

Definition at line 630 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), Xent::Eval(), main(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::UnitTestCuMatrixApplyLog(), kaldi::UnitTestCuMatrixObjfDeriv(), and NnetLogprobTask::~NnetLogprobTask().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_apply_log(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyLog();
  }
}
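
Several callers listed above (for example DecodableAmNnetParallel::Compute() and NnetLogprobTask::~NnetLogprobTask()) also appear among the callers of ApplyFloor(), which suggests the common pattern of flooring probabilities before taking logs so that exact zeros do not become -infinity. That reading of the callers is an assumption; the sketch below only shows the arithmetic, using a hypothetical SafeLogInPlace function and an arbitrary floor value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical: floor each probability, then take its natural log, so a zero
// becomes log(floor_val) instead of -infinity.
void SafeLogInPlace(std::vector<double> &probs, double floor_val) {
  assert(floor_val > 0.0);
  for (double &p : probs) {
    p = std::max(p, floor_val);  // the ApplyFloor step
    p = std::log(p);             // the ApplyLog step
  }
}
```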
void ApplyLogSoftMaxPerRow ( const CuMatrixBase< Real > &  src)

LogSoftmax nonlinearity Y = LogSoftmax(X): Yij = Xij - log(sum_k e^Xik), applied to each row. For each row, the max value is first subtracted for numerical stability.

Definition at line 1522 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Propagate(), kaldi::TestCuMatrixLogSoftmax(), and kaldi::UnitTestCuLogSoftmax().

{
  KALDI_ASSERT(SameDim(*this, src));
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    size_t dimBlock = CU1DBLOCK;
    size_t dimGrid = src.num_rows_;
    cuda_log_softmax_reduce(dimGrid, dimBlock,
                            data_, src.data_, Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    MatrixBase<Real> &mat(this->Mat());
    mat.CopyFromMat(src.Mat());
    for (MatrixIndexT r = 0; r < mat.NumRows(); r++) {
      mat.Row(r).ApplyLogSoftMax();
    }
  }
}
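
The CPU branch copies src and calls ApplyLogSoftMax() on each row; the GPU branch launches one block of CU1DBLOCK threads per row (dimGrid = src.num_rows_) to do a per-row reduction. The sketch below computes one row in double precision following the formula in the description, with m = max_k Xik subtracted first: Yij = (Xij - m) - log(sum_k e^(Xik - m)), which equals Xij - log(sum_k e^Xik) exactly but never exponentiates a positive number. LogSoftmaxRow is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of LogSoftmax with max subtraction. The largest term inside the sum
// is exp(0) = 1, so the sum cannot overflow even for very large inputs.
std::vector<double> LogSoftmaxRow(const std::vector<double> &x) {
  assert(!x.empty());
  const double m = *std::max_element(x.begin(), x.end());
  double sum = 0.0;
  for (double v : x) sum += std::exp(v - m);
  const double log_sum = std::log(sum);
  std::vector<double> y(x.size());
  for (std::size_t j = 0; j < x.size(); ++j) y[j] = (x[j] - m) - log_sum;
  return y;
}
```

With inputs of 1000, a naive exp(1000) would overflow to infinity; here each output of LogSoftmaxRow({1000.0, 1000.0}) is -log(2).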
void ApplyPow ( Real  power)

Raises each element to the given power.

See ApplyPowAbs() for the variant that applies the power to the absolute value of each element: if include_sign is true, the result is multiplied by the sign of the input value, and if the power is negative and the input is zero, the output is set to zero.

Definition at line 2161 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by TanhComponent::Backprop(), kaldi::MeanVariance(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), TanhComponent::StoreStats(), kaldi::UnitTestCuMatrixApplyPow(), kaldi::UnitTestCuMatrixSetRandn(), and kaldi::UnitTestCuMatrixSetRandUniform().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_pow(dimGrid, dimBlock, data_, power, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyPow(power);
  }
}
void ApplyPowAbs ( Real  power,
bool  include_sign = false 
)

Definition at line 2179 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by PowerComponent::Backprop(), PowerComponent::Propagate(), ClipGradientComponent::RepairGradients(), and kaldi::UnitTestCuMatrixApplyPowAbs().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_apply_pow_abs(dimGrid, dimBlock, data_, power, include_sign, Dim());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().ApplyPowAbs(power, include_sign);
  }
}
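
A CPU reference sketch of the behavior documented for ApplyPowAbs: each element becomes |x|^power; if include_sign is true, the result is negated for negative inputs; and when power is negative, a zero input yields zero rather than infinity. PowAbsRef is a hypothetical name, and the sketch follows the documentation rather than the actual CUDA kernel, whose special cases it does not reproduce.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical reference: x = |x|^power, negated when include_sign is true
// and x < 0. A zero input with a negative power gives 0 instead of infinity.
void PowAbsRef(std::vector<double> &v, double power, bool include_sign = false) {
  for (double &x : v) {
    if (power < 0.0 && x == 0.0) {
      x = 0.0;
      continue;
    }
    double y = std::pow(std::fabs(x), power);
    if (include_sign && x < 0.0) y = -y;
    x = y;
  }
}
```

With power = 2, PowAbsRef maps {-2, 0, 3} to {4, 0, 9}, or to {-4, 0, 9} when include_sign is true.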
void ApplySoftMaxPerRow ( const CuMatrixBase< Real > &  src)

Softmax nonlinearity Y = Softmax(X): Yij = e^Xij / sum_k e^Xik, applied to each row. For each row, the max value is first subtracted for numerical stability.

Definition at line 1499 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CU1DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, MatrixBase< Real >::NumRows(), MatrixBase< Real >::Row(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SoftmaxComponent::Propagate(), Softmax::PropagateFnc(), HiddenSoftmax::PropagateFnc(), BlockSoftmax::PropagateFnc(), kaldi::TestCuMatrixSoftmax(), and kaldi::UnitTestCuSoftmax().

{
  KALDI_ASSERT(SameDim(*this, src));
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    size_t dimBlock = CU1DBLOCK;
    size_t dimGrid = src.num_rows_;
    cuda_softmax_reduce(dimGrid, dimBlock, data_, src.data_, Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    MatrixBase<Real> &mat(this->Mat());
    mat.CopyFromMat(src.Mat());
    for (MatrixIndexT r = 0; r < mat.NumRows(); r++) {
      mat.Row(r).ApplySoftMax();
    }
  }
}
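
Subtracting the row maximum m does not change the result, because e^(Xij - m) / sum_k e^(Xik - m) = e^Xij / sum_k e^Xik (the factor e^-m cancels), but it keeps every exponent at or below zero. A one-row sketch in double precision; SoftmaxRow is a hypothetical name, not part of Kaldi.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of Softmax with max subtraction:
// y_j = exp(x_j - m) / sum_k exp(x_k - m), where m = max_k x_k.
std::vector<double> SoftmaxRow(const std::vector<double> &x) {
  assert(!x.empty());
  const double m = *std::max_element(x.begin(), x.end());
  std::vector<double> y(x.size());
  double sum = 0.0;
  for (std::size_t j = 0; j < x.size(); ++j) {
    y[j] = std::exp(x[j] - m);  // exponent <= 0, so 0 < y[j] <= 1
    sum += y[j];
  }
  for (double &v : y) v /= sum;  // sum >= 1 because the max term is exp(0)
  return y;
}
```

For example, inputs {1000, 1000 + log 3} give {0.25, 0.75}, whereas evaluating e^1000 directly would overflow.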
bool ApproxEqual ( const CuMatrixBase< Real > &  other,
float  tol = 0.01 
) const

True if ((*this)-other).FrobeniusNorm() <= tol * this->FrobeniusNorm()

Definition at line 1909 of file cu-matrix.cc.

References CuMatrixBase< Real >::AddMat(), and CuMatrixBase< Real >::FrobeniusNorm().

Referenced by kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuCopy().

{
  CuMatrix<Real> diff(*this);
  diff.AddMat(-1.0, other);
  return (diff.FrobeniusNorm() <= tol * (*this).FrobeniusNorm());
}
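
Note that the tolerance is relative to the norm of *this, not of other, so the test is not symmetric: if *this is all zeros, it passes only when other is all zeros too. A sketch of the same comparison on flat std::vector storage (the Frobenius norm of a matrix is the Euclidean norm of its elements); ApproxEqualRef is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Frobenius norm: square root of the sum of squared elements.
double FrobeniusNormRef(const std::vector<double> &m) {
  double s = 0.0;
  for (double v : m) s += v * v;
  return std::sqrt(s);
}

// Mirrors the listing: diff = a - b, then ||diff||_F <= tol * ||a||_F.
bool ApproxEqualRef(const std::vector<double> &a, const std::vector<double> &b,
                    double tol = 0.01) {
  assert(a.size() == b.size());
  std::vector<double> diff(a);
  for (std::size_t i = 0; i < diff.size(); ++i) diff[i] -= b[i];
  return FrobeniusNormRef(diff) <= tol * FrobeniusNormRef(a);
}
```

With a = {3, 4} (norm 5) and the default tol = 0.01, the allowed difference norm is 0.05, so {3, 4.04} passes and {3, 4.1} does not.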
void Cholesky ( CuMatrixBase< Real > *  inv_cholesky = NULL)

This function sets *this to the Cholesky factor of *this (i.e. the C satisfying *this = C C^T), and sets "inv_cholesky" (if supplied) to its inverse. *this is treated as a symmetric matrix, but only the lower triangle is accessed.

Definition at line 1760 of file cu-matrix.cc.

References TpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), CuSpMatrix< Real >::CopyFromMat(), CuTpMatrix< Real >::CopyFromTp(), CuMatrixBase< Real >::CopyFromTp(), TpMatrix< Real >::Invert(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTakeLower, kaldi::kTrans, kaldi::kUndefined, and CuMatrixBase< Real >::SymAddMat2().

Referenced by CuTpMatrix< Real >::Cholesky(), CuMatrixBase< Real >::Cholesky(), kaldi::TestCuMatrixCholesky(), kaldi::UnitTestCholesky(), and kaldi::UnitTestCuCholesky().

{
  KALDI_ASSERT(this->NumRows() == this->NumCols());
  const int32 block_size = 64;  // We can tune this.
#if HAVE_CUDA == 1
  bool have_gpu = CuDevice::Instantiate().Enabled();
#else
  bool have_gpu = false;
#endif
  if (this->NumRows() == 0) {
    return;
  }
  if (inv_cholesky == NULL && this->NumRows() >= block_size * 2 && have_gpu) {
    // Even if the user did not request the inverse Cholesky, for large enough
    // matrices (on GPUs) it's going to be more efficient to compute it anyway
    // as the recursion depends on it.
    CuMatrix<Real> inv(this->NumRows(), this->NumCols());
    Cholesky(&inv);
    return;
  }
  if (this->NumRows() <= block_size || inv_cholesky == NULL || !have_gpu) {
    // Don't recurse: compute the Cholesky (and inverse Cholesky, if requested)
    // directly, on the CPU.
    int32 dim = this->NumRows();
    CuSpMatrix<Real> this_sp(dim, kUndefined);
    this_sp.CopyFromMat(*this, kTakeLower);
    SpMatrix<Real> this_sp_cpu(this_sp);
    TpMatrix<Real> C_cpu(dim);
    C_cpu.Cholesky(this_sp_cpu);
    CuTpMatrix<Real> C(C_cpu);
    this->CopyFromTp(C);
    if (inv_cholesky != NULL) {
      C_cpu.Invert();  // Get inverse Cholesky on CPU.
      C.CopyFromTp(C_cpu);
      inv_cholesky->CopyFromTp(C);  // Copy inverse Cholesky from CPU.
    }
    return;
  }
  // At this point, if none of the other cases apply, we recurse.

  // The selection of dim1 is a heuristic. We could also just take half.
  int32 tot_dim = this->NumRows();
  int32 dim1;
  // Break it up into a whole number of blocks, for better memory alignment.
  // The value of dim1 set below is a heuristic choice: from the point of view
  // of correctness, it can really be any value 0 < dim1 < tot_dim.
  dim1 = block_size * std::max<int32>(1, tot_dim / (2 * block_size));

  int32 dim2 = tot_dim - dim1;
  CuSubMatrix<Real> this_11(*this, 0, dim1, 0, dim1),
      this_12(*this, 0, dim1, dim1, dim2),
      this_21(*this, dim1, dim2, 0, dim1),
      this_22(*this, dim1, dim2, dim1, dim2);
  CuSubMatrix<Real> inv_11(*inv_cholesky, 0, dim1, 0, dim1),
      inv_12(*inv_cholesky, 0, dim1, dim1, dim2),
      inv_21(*inv_cholesky, dim1, dim2, 0, dim1),
      inv_22(*inv_cholesky, dim1, dim2, dim1, dim2);
  /*
    Here is the math on block-wise Cholesky. We'll use a Matlab-like notation for blocks of a
    matrix, e.g. [ A B; C D ], and also for transposes, e.g. A' is the transpose of A.
    Let A be the input matrix; we want to compute both its Cholesky factor L and its inverse
    Cholesky factor, which we'll call M.
    Let L = [ L11 0; L21 L22 ] be the Cholesky factor of A.
    We have A = L L' = [ L11 0; L21 L22 ] * [ L11' L21'; 0 L22' ]. Multiplying it out,
    if A = [ A11 A12; A21 A22 ], then
    A11 = L11 L11', A21 = L21 L11', A22 = L21 L21' + L22 L22', and A12 = A21'.

    We also want an expression for the inverse of L (we call this M).
    If M = [ M11 0; M21 M22 ], then it's not hard to see that
    M11 = inv(L11), M22 = inv(L22).
    We can work out M21 as follows. We know that [ L11 0; L21 L22 ] [ M11 0; M21 M22 ] = [ I 0; 0 I ].
    Considering the zero on the bottom of the rhs, we have: L21 M11 + L22 M21 = 0, which gives us:
    M21 = - L22^{-1} L21 M11 = - M22 L21 M11.

    Next, we want expressions for L21 and L22. From the equation A21 = L21 L11', we have:
    L21 = A21 inv(L11') = A21 M11'
    We can compute L22 and M22 recursively by doing Cholesky (and computing the inverse Cholesky)
    on the quantity T = (A22 - L21 L21'). [We give it the name T just for easy reference.]

    Computationally, we do this as follows:
    (1) Recurse to get L11 and M11.
    (2) Compute L21 = A21 M11'.
    (3) Compute T = A22 - L21 L21'.
    (4) Recurse on T to get L22 and M22.
    (5) Compute M21 = -M22 L21 M11.
    Next, we have to consider the in-place nature of the computation, since L overwrites A
    [M has its own storage, in "inv_cholesky"].
    We address this here:
    (1) is in-place [L11 replaces A11, M11 has its own storage].
    (2) L21 gets written where M21 belongs.
    (3) T replaces A22.
    (4) is in-place [L22 replaces T where A22 was, M22 has its own storage].
    (5):(a) We first compute the transpose of (L21 M11) in the upper part of A/L,
            where A12 or L12 would be. Define a temporary expression
            U = (L21 M11)' = M11' L21'; this goes where A12 or L12 would be.
        (b) Copy L21 to where it should be, in *this.
        (c) Compute M21 = -M22 U', in the correct place for M21.
        (d) Zero L12 and M12. */

  // (1) compute L11 and M11.
  this_11.Cholesky(&inv_11);
  // (2) compute L21 = A21 M11'. For now it's in the "wrong place", where M21 should be.
  inv_21.AddMatMat(1.0, this_21, kNoTrans, inv_11, kTrans, 0.0);
  // (3) compute T = A22 - L21 L21'. Note: only the lower triangle of T will be valid, but
  // that's OK because Cholesky will ignore the upper part.
  this_22.SymAddMat2(-1.0, inv_21, kNoTrans, 1.0);
  // (4) Recurse to compute L22 and M22.
  this_22.Cholesky(&inv_22);
  // (5)(a) compute U = M11' L21'. We use the storage of this_12 for this. Note that L21 is
  // currently where M21 should be.
  this_12.AddMatMat(1.0, inv_11, kTrans, inv_21, kTrans, 0.0);
  // (5)(b) copy L21 to where it should be.
  this_21.CopyFromMat(inv_21);
  // (5)(c) compute M21 = -M22 U'.
  inv_21.AddMatMat(-1.0, inv_22, kNoTrans, this_12, kTrans, 0.0);
  // (5)(d) zero L12 and M12.
  this_12.SetZero();
  inv_12.SetZero();
}
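
As a sanity check on the block algebra in the comment, here is a dense CPU sketch of steps (1)-(5). It is not Kaldi code: it uses a hypothetical Mat type, splits at dim1 = n/2 down to a 1x1 base case instead of using block_size and the TpMatrix path, and allocates fresh blocks instead of working in place, so the storage juggling of step (5)(a)-(d) is unnecessary.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
struct Mat {
  int rows = 0, cols = 0;
  std::vector<double> v;
  Mat() = default;
  Mat(int r, int c) : rows(r), cols(c), v(static_cast<std::size_t>(r) * c, 0.0) {}
  double &operator()(int i, int j) { return v[static_cast<std::size_t>(i) * cols + j]; }
  double operator()(int i, int j) const { return v[static_cast<std::size_t>(i) * cols + j]; }
};

// Copies the nr x nc block of a whose top-left corner is (r0, c0).
Mat Block(const Mat &a, int r0, int c0, int nr, int nc) {
  Mat b(nr, nc);
  for (int i = 0; i < nr; ++i)
    for (int j = 0; j < nc; ++j) b(i, j) = a(r0 + i, c0 + j);
  return b;
}

// Returns alpha * a * b, or alpha * a * b' if trans_b is true.
Mat MatMul(const Mat &a, const Mat &b, bool trans_b, double alpha = 1.0) {
  const int n = trans_b ? b.rows : b.cols;
  assert(a.cols == (trans_b ? b.cols : b.rows));
  Mat c(a.rows, n);
  for (int i = 0; i < a.rows; ++i)
    for (int j = 0; j < n; ++j) {
      double s = 0.0;
      for (int t = 0; t < a.cols; ++t) s += a(i, t) * (trans_b ? b(j, t) : b(t, j));
      c(i, j) = alpha * s;
    }
  return c;
}

// Sets *L to the lower-triangular Cholesky factor of the symmetric
// positive-definite A (entries above A's diagonal never affect the result)
// and *M to inv(L), following steps (1)-(5).
void BlockCholesky(const Mat &A, Mat *L, Mat *M) {
  const int n = A.rows;
  *L = Mat(n, n);
  *M = Mat(n, n);
  if (n == 1) {  // Base case: 1x1 Cholesky and its inverse.
    (*L)(0, 0) = std::sqrt(A(0, 0));
    (*M)(0, 0) = 1.0 / (*L)(0, 0);
    return;
  }
  const int d1 = n / 2, d2 = n - d1;  // Any 0 < d1 < n would be correct.
  Mat L11, M11, L22, M22;
  BlockCholesky(Block(A, 0, 0, d1, d1), &L11, &M11);            // (1)
  Mat L21 = MatMul(Block(A, d1, 0, d2, d1), M11, true);         // (2) L21 = A21 M11'
  Mat T = Block(A, d1, d1, d2, d2);                             // (3) T = A22 - L21 L21'
  Mat P = MatMul(L21, L21, true);
  for (std::size_t i = 0; i < T.v.size(); ++i) T.v[i] -= P.v[i];
  BlockCholesky(T, &L22, &M22);                                 // (4)
  Mat M21 = MatMul(MatMul(M22, L21, false), M11, false, -1.0);  // (5) M21 = -M22 L21 M11
  // Assemble L = [L11 0; L21 L22] and M = [M11 0; M21 M22].
  for (int i = 0; i < d1; ++i)
    for (int j = 0; j < d1; ++j) {
      (*L)(i, j) = L11(i, j);
      (*M)(i, j) = M11(i, j);
    }
  for (int i = 0; i < d2; ++i) {
    for (int j = 0; j < d1; ++j) {
      (*L)(d1 + i, j) = L21(i, j);
      (*M)(d1 + i, j) = M21(i, j);
    }
    for (int j = 0; j < d2; ++j) {
      (*L)(d1 + i, d1 + j) = L22(i, j);
      (*M)(d1 + i, d1 + j) = M22(i, j);
    }
  }
}
```

Checking L L' = A and L M = I on a small positive-definite matrix exercises all five steps, since a 3 x 3 input recurses on both a 1 x 1 and a 2 x 2 block.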
CuSubMatrix<Real> ColRange ( const MatrixIndexT  col_offset,
const MatrixIndexT  num_cols 
) const
inline

Definition at line 522 of file cu-matrix.h.

Referenced by StatisticsExtractionComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxpoolingComponent::Backprop(), BlockAffineComponent::Backprop(), ConvolutionComponent::Backprop(), Convolutional1dComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MaxPooling2DComponent::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), MultiTaskLoss::Eval(), LstmProjected::InfoGradient(), BlstmProjected::InfoGradient(), StatisticsExtractionComponent::Propagate(), StatisticsPoolingComponent::Propagate(), MaxpoolingComponent::Propagate(), BlockAffineComponent::Propagate(), ConvolutionComponent::Propagate(), Convolutional1dComponent::Propagate(), AveragePoolingComponent::PropagateFnc(), MaxPoolingComponent::PropagateFnc(), AveragePooling2DComponent::PropagateFnc(), MaxPooling2DComponent::PropagateFnc(), BlockSoftmax::PropagateFnc(), FramePoolingComponent::PropagateFnc(), ParallelComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), MultiBasisComponent::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::UnitTestLstmNonlinearity(), FramePoolingComponent::Update(), SentenceAveragingComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), ConvolutionComponent::Update(), and Convolutional1dComponent::Update().

{
  return CuSubMatrix<Real>(*this, 0, num_rows_, col_offset, num_cols);
}
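
ColRange() returns a CuSubMatrix covering all num_rows_ rows and num_cols columns starting at col_offset, without copying data. A hypothetical view struct makes the pointer arithmetic explicit, assuming (as is usual for such views) that the sub-matrix keeps the parent's row stride and that writes through it modify the parent's storage. MatView and ColRangeView are illustrative names, not Kaldi types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-owning view onto row-major storage: pointer to element
// (0, 0), dimensions, and the row stride shared with the parent.
struct MatView {
  float *data;
  int num_rows, num_cols, stride;
  float &operator()(int r, int c) const { return data[r * stride + c]; }
};

// Column range: all rows, same stride, data pointer advanced by col_offset.
MatView ColRangeView(const MatView &m, int col_offset, int num_cols) {
  assert(col_offset >= 0 && num_cols >= 0 &&
         col_offset + num_cols <= m.num_cols);
  return MatView{m.data + col_offset, m.num_rows, num_cols, m.stride};
}
```

Element (r, c) of the view is element (r, col_offset + c) of the parent; because the stride is unchanged, each view row still starts one full parent row after the previous one.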
void CopyColFromVec ( const CuVectorBase< Real > &  v,
const MatrixIndexT  col 
)

Copy vector into specific column of matrix.

Definition at line 2140 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by kaldi::cu::NormalizePerRow(), StatisticsExtractionComponent::Propagate(), and NaturalGradientRepeatedAffineComponent::Update().

{
  KALDI_ASSERT(v.Dim() == num_rows_ &&
               static_cast<UnsignedMatrixIndexT>(col) <
               static_cast<UnsignedMatrixIndexT>(num_cols_));
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    cublas_copy(GetCublasHandle(),
                v.Dim(), v.Data(), 1,
                this->data_ + col, this->stride_);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyColFromVec(v.Vec(), col);
  }
}
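
The GPU branch is a single BLAS-style strided copy: the source vector is read with increment 1, and the destination starts at data_ + col and advances by stride_ elements, i.e. one matrix row, per element. The sketch below spells out that indexing on the CPU; StridedCopy and CopyColFromVecRef are hypothetical names, and StridedCopy only models the positive-increment case of a BLAS copy.

```cpp
#include <cassert>
#include <vector>

// BLAS-style copy with positive increments: y[i * incy] = x[i * incx].
void StridedCopy(int n, const float *x, int incx, float *y, int incy) {
  for (int i = 0; i < n; ++i) y[i * incy] = x[i * incx];
}

// Copies v into column `col` of a row-major matrix with the given stride.
void CopyColFromVecRef(const std::vector<float> &v, std::vector<float> &data,
                       int num_rows, int num_cols, int stride, int col) {
  assert(static_cast<int>(v.size()) == num_rows && col >= 0 && col < num_cols);
  StridedCopy(num_rows, v.data(), 1, data.data() + col, stride);
}
```

The unsigned cast in the listing's assertion is a compact way to reject both col < 0 and col >= num_cols_ in one comparison; the sketch checks the two bounds explicitly.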
void CopyCols ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Copies column r from column indexes[r] of src.

As a special case, if indexes[r] == -1, column r is set to zero. indexes.Dim() must equal this->NumCols(), all elements of indexes must be in [-1, src.NumCols()-1], and src.NumRows() must equal this->NumRows().

Definition at line 2324 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by SumGroupComponent::Backprop(), PermuteComponent::Backprop(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), PermuteComponent::Propagate(), Convolutional1dComponent::Propagate(), ConvolutionalComponent::PropagateFnc(), and Convolutional1dComponent::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(indices.Dim() == NumCols());
    KALDI_ASSERT(NumRows() == src.NumRows());
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_copy_cols(dimGrid, dimBlock, data_, src.Data(), indices.Data(), Dim(), src.Stride());
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().CopyCols(src.Mat(), indices.Data());
  }
}
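
A CPU reference for the documented semantics, including the -1 special case: destination column c is source column indexes[c], or zero when indexes[c] == -1. CopyColsRef is a hypothetical helper working on row-major buffers with explicit strides; it is not the MatrixBase::CopyCols() that the CPU branch calls.

```cpp
#include <cassert>
#include <vector>

// dst(r, c) = src(r, indexes[c]), or 0 when indexes[c] == -1.
// Both matrices have num_rows rows; dst has indexes.size() columns.
void CopyColsRef(const std::vector<float> &src, int src_cols, int src_stride,
                 const std::vector<int> &indexes, std::vector<float> &dst,
                 int num_rows, int dst_stride) {
  const int dst_cols = static_cast<int>(indexes.size());
  for (int r = 0; r < num_rows; ++r) {
    for (int c = 0; c < dst_cols; ++c) {
      const int i = indexes[c];
      assert(i >= -1 && i < src_cols);
      dst[r * dst_stride + c] = (i == -1) ? 0.0f : src[r * src_stride + i];
    }
  }
}
```

With indexes {2, -1, 0}, each output row is (third source column, zero, first source column), so the same call both permutes and masks columns.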
void CopyFromBlock ( const CuBlockMatrix< Real > &  B,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 158 of file cu-matrix.cc.

References CuBlockMatrix< Real >::Block(), CuMatrixBase< Real >::CopyFromMat(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuBlockMatrix< Real >::NumBlocks(), CuBlockMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuBlockMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), and kaldi::nnet3::SetZero().

Referenced by CuMatrix< Real >::CuMatrix().

{
  this->SetZero();
  if (trans == kNoTrans) {
    KALDI_ASSERT(NumRows() == B.NumRows() && NumCols() == B.NumCols());
    int32 row_offset = 0, col_offset = 0;
    for (int32 b = 0; b < B.NumBlocks(); b++) {
      const CuMatrixBase<Real> &block = B.Block(b);
      int32 num_rows = block.NumRows(), num_cols = block.NumCols();
      CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
                                   col_offset, num_cols);
      this_block.CopyFromMat(block);
      row_offset += num_rows;
      col_offset += num_cols;
    }
    KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
  } else {
    KALDI_ASSERT(NumRows() == B.NumCols() && NumCols() == B.NumRows());
    int32 row_offset = 0, col_offset = 0;
    for (int32 b = 0; b < B.NumBlocks(); b++) {
      const CuMatrixBase<Real> &block = B.Block(b);
      int32 num_rows = block.NumCols(), num_cols = block.NumRows();
      CuSubMatrix<Real> this_block(*this, row_offset, num_rows,
                                   col_offset, num_cols);
      this_block.CopyFromMat(block, kTrans);
      row_offset += num_rows;
      col_offset += num_cols;
    }
    KALDI_ASSERT(row_offset == NumRows() && col_offset == NumCols());
  }
}
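
The kNoTrans branch zeroes the output, then places each block at (row_offset, col_offset) and advances both offsets by that block's size, so blocks need not be square and the result is block-diagonal. A CPU sketch of that placement, using a hypothetical Dense type and FromBlocksRef function with the output sized from the blocks rather than asserted against an existing matrix:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix with stride == cols.
struct Dense {
  int rows, cols;
  std::vector<double> v;
  double at(int r, int c) const { return v[r * cols + c]; }
};

// Places each block on the "diagonal", advancing the row and column offsets
// by the block's own dimensions; everything off the blocks stays zero.
Dense FromBlocksRef(const std::vector<Dense> &blocks) {
  int rows = 0, cols = 0;
  for (const Dense &b : blocks) {
    rows += b.rows;
    cols += b.cols;
  }
  Dense out{rows, cols, std::vector<double>(static_cast<std::size_t>(rows) * cols, 0.0)};
  int row_offset = 0, col_offset = 0;
  for (const Dense &b : blocks) {
    assert(b.v.size() == static_cast<std::size_t>(b.rows) * b.cols);
    for (int r = 0; r < b.rows; ++r)
      for (int c = 0; c < b.cols; ++c)
        out.v[(row_offset + r) * cols + col_offset + c] = b.at(r, c);
    row_offset += b.rows;
    col_offset += b.cols;
  }
  return out;
}
```

For a 1 x 2 block followed by a 2 x 1 block, the result is 3 x 3 with the first block in row 0, columns 0-1, and the second in rows 1-2, column 2. The kTrans branch does the same with each block's dimensions swapped.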
void CopyFromGeneralMat ( const GeneralMatrix src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 2705 of file cu-matrix.cc.

References CuSparseMatrix< Real >::CopyToMat(), SparseMatrix< Real >::CopyToMat(), GeneralMatrix::GetFullMatrix(), GeneralMatrix::GetMatrix(), GeneralMatrix::GetSparseMatrix(), KALDI_ERR, kaldi::kCompressedMatrix, kaldi::kFullMatrix, kaldi::kSparseMatrix, and GeneralMatrix::Type().

Referenced by NnetComputer::AcceptInputs(), and kaldi::nnet3::ComputeObjectiveFunction().

{
  switch (src.Type()) {
    case kFullMatrix: {
      const Matrix<BaseFloat> &src_full_mat = src.GetFullMatrix();
      this->CopyFromMat(src_full_mat, trans);
      return;
    }
    case kCompressedMatrix: {
      Matrix<BaseFloat> mat;
      src.GetMatrix(&mat);
      this->CopyFromMat(mat, trans);
      return;
    }
    case kSparseMatrix: {
      const SparseMatrix<BaseFloat> &smat = src.GetSparseMatrix();
#if HAVE_CUDA == 1
      if (CuDevice::Instantiate().Enabled()) {
        // only take this branch if we're actually using CUDA, or it would
        // entail a wasteful copy of the sparse matrix.
        CuSparseMatrix<BaseFloat> cu_smat(smat);
        cu_smat.CopyToMat(this, trans);
        return;
      }
#endif
      smat.CopyToMat(&(Mat()), trans);
      return;
    }
    default:
      KALDI_ERR << "Invalid GeneralMatrix type.";
  }
}
void CopyFromMat ( const MatrixBase< OtherReal > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 337 of file cu-matrix.cc.

Referenced by ElementwiseProductComponent::Backprop(), SumReduceComponent::Backprop(), BackpropTruncationComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), ScaleComponent::Backprop(), FixedScaleComponent::Backprop(), FixedBiasComponent::Backprop(), NoOpComponent::Backprop(), ClipGradientComponent::Backprop(), PerElementScaleComponent::Backprop(), PerElementOffsetComponent::Backprop(), Softmax::BackpropagateFnc(), HiddenSoftmax::BackpropagateFnc(), BlockSoftmax::BackpropagateFnc(), ParallelComponent::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), AddShift::BackpropagateFnc(), Rescale::BackpropagateFnc(), BlockAffineComponent::BlockAffineComponent(), NnetOnlineComputer::Compute(), CuMatrixBase< Real >::CopyFromBlock(), CuBlockMatrix< Real >::CopyFromMat(), GeneralMatrix::CopyToMat(), CuMatrix< Real >::CuMatrix(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), NnetComputer::ExecuteCommand(), NnetRescaler::FormatInput(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), main(), kaldi::nnet2::NnetComputation(), kaldi::cu::NormalizePerRow(), CuMatrix< BaseFloat >::operator=(), kaldi::nnet2::PreconditionDirections(), OnlinePreconditionerSimple::PreconditionDirections(), OnlineNaturalGradientSimple::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), ElementwiseProductComponent::Propagate(), SumReduceComponent::Propagate(), BackpropTruncationComponent::Propagate(), PowerComponent::Propagate(), RectifiedLinearComponent::Propagate(), ScaleComponent::Propagate(), NoOpComponent::Propagate(), 
ClipGradientComponent::Propagate(), SpliceMaxComponent::Propagate(), PerElementScaleComponent::Propagate(), PerElementOffsetComponent::Propagate(), FixedScaleComponent::Propagate(), FixedBiasComponent::Propagate(), AdditiveNoiseComponent::Propagate(), KlHmm::PropagateFnc(), ParallelComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), AddShift::PropagateFnc(), Rescale::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), kaldi::TestCuFindRowMaxId(), kaldi::TestCuMatrixTransposeCross(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::TestSymInvertPosDef(), kaldi::UnitInvert(), kaldi::UnitTestCheck(), kaldi::UnitTestCholesky(), kaldi::UnitTestConstructor(), kaldi::UnitTestCopyFromMat(), kaldi::UnitTestCopySp(), kaldi::UnitTestCuCopy(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuFindRowMaxId(), kaldi::UnitTestCuLogSoftmax(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddMatMatBatched(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyFromMat(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), 
kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuSoftmax(), kaldi::UnitTestCuTanh(), kaldi::UnitTestCuVectorAddColSumMat(), kaldi::UnitTestCuVectorAddColSumMatLarge(), kaldi::UnitTestCuVectorAddRowSumMat(), kaldi::UnitTestCuVectorAddRowSumMatLarge(), kaldi::UnitTestInvert(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), BlockAffineComponentPreconditioned::Update(), and MatrixRandomizer::Value().

338  {
339  CuMatrix<OtherReal> temp(src);
340  this->CopyFromMat(temp, trans);
341 }
void CopyFromMat ( const MatrixBase< Real > &  src,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 309 of file cu-matrix.cc.

References MatrixBase< Real >::Data(), data_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

310  {
311 #if HAVE_CUDA == 1
312  if (CuDevice::Instantiate().Enabled()) {
313  if (trans == kNoTrans) {
314  KALDI_ASSERT(src.NumRows() == num_rows_ && src.NumCols() == num_cols_);
315  Timer tim;
316 
317  MatrixIndexT dst_pitch = stride_*sizeof(Real);
318  MatrixIndexT src_pitch = src.Stride()*sizeof(Real);
319  MatrixIndexT width = src.NumCols()*sizeof(Real);
320  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, src.Data(), src_pitch,
321  width, src.NumRows(), cudaMemcpyHostToDevice));
322 
323  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from CPU)",tim.Elapsed());
324  } else {
325  CuMatrix<Real> trans_mat(src); // Do the transpose on the GPU board.
326  this->CopyFromMat(trans_mat, kTrans);
327  }
328  } else
329 #endif
330  {
331  Mat().CopyFromMat(src, trans);
332  }
333 }
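The kNoTrans GPU branch above copies a host matrix with cudaMemcpy2D. It moves NumRows() rows of NumCols() * sizeof(Real) bytes each, and the source and destination rows start Stride() * sizeof(Real) bytes apart (the "pitch"). As a rough illustration only, the CPU sketch below mimics those pitched-copy semantics with std::memcpy. CopyPitched is a hypothetical helper, not a CUDA or Kaldi function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical CPU analogue of cudaMemcpy2D: copies `height` rows of
// `width_bytes` bytes each. Consecutive source rows start `src_pitch` bytes
// apart and destination rows `dst_pitch` bytes apart; any padding bytes past
// `width_bytes` in the destination rows are left untouched.
void CopyPitched(void *dst, std::size_t dst_pitch, const void *src,
                 std::size_t src_pitch, std::size_t width_bytes,
                 std::size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  for (std::size_t r = 0; r < height; ++r)
    std::memcpy(d + r * dst_pitch, s + r * src_pitch, width_bytes);
}
```

Copying a contiguous 2x3 float matrix (pitch 12 bytes) into a buffer with stride 4 (pitch 16 bytes) leaves the fourth element of each destination row unchanged.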
void CopyFromMat ( const CuMatrixBase< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 205 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, CuMatrixBase< Real >::data_, CuMatrixBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_rows_, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

206  {
207  if (sizeof(Real) == sizeof(OtherReal) &&
208  static_cast<const void*>(M.Data()) ==
209  static_cast<const void*>(this->Data())) {
210  if (M.Data() == NULL)
211  return;
212  // CopyFromMat called on same data. Nothing to do (except sanity checks)
213  KALDI_ASSERT(trans == kNoTrans && M.NumRows() == NumRows() &&
214  M.NumCols() == NumCols() && M.Stride() == Stride());
215  return;
216  }
217 #if HAVE_CUDA == 1
218  if (CuDevice::Instantiate().Enabled()) {
219  if (trans == kNoTrans) {
220  KALDI_ASSERT(M.NumRows() == num_rows_ && M.NumCols() == num_cols_);
221  } else {
222  KALDI_ASSERT(M.NumCols() == num_rows_ && M.NumRows() == num_cols_);
223  }
224  if (M.num_rows_ == 0) return; // Nothing to do.
225  Timer tim;
226  if (sizeof(Real) == sizeof(OtherReal) && trans == kNoTrans ) {
227  MatrixIndexT dst_pitch = stride_ * sizeof(Real);
228  MatrixIndexT src_pitch = M.Stride() * sizeof(Real);
229  MatrixIndexT width = M.NumCols() * sizeof(Real);
230  CU_SAFE_CALL(cudaMemcpy2D(data_, dst_pitch, M.data_, src_pitch,
231  width, M.num_rows_, cudaMemcpyDeviceToDevice));
232  } else {
233  if (trans == kNoTrans) {
234  dim3 dimGrid, dimBlock;
235  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
236  &dimGrid, &dimBlock);
237  cuda_copy_from_mat(dimGrid, dimBlock, data_, M.data_, Dim(), M.Dim());
238  } else {
239  // 2D thread block with warps (blockDim.x) along the row-dim of input M.
240  // Each (8x32) thread block will transpose (32x32) data
241  const int32 warpSize = 32;
242  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
243  dim3 dimGrid(n_blocks(M.NumCols(), warpSize),
244  n_blocks(M.NumRows(), warpSize));
245  cuda_copy_from_mat_trans(dimGrid, dimBlock, data_, M.data_, Dim(),
246  M.Dim());
247  }
248  CU_SAFE_CALL(cudaGetLastError());
249  }
250  CuDevice::Instantiate().AccuProfile("CuMatrixBase::CopyFromMat(from other CuMatrixBase)", tim.Elapsed());
251  } else
252 #endif
253  {
254  Mat().CopyFromMat(M.Mat(), trans);
255  }
256 }
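For the transposed device-to-device case, the launch above uses a grid of n_blocks(M.NumCols(), 32) by n_blocks(M.NumRows(), 32) blocks. Each block therefore owns one 32x32 tile of M, with partial tiles at the right and bottom edges. The CPU sketch below reproduces only that tile decomposition. It assumes n_blocks() is ceiling division, does not model how threads are assigned inside a tile, and NumBlocks and TiledTranspose are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Ceiling division; assumed to match what n_blocks() computes.
int NumBlocks(int size, int block_size) {
  return (size + block_size - 1) / block_size;
}

// CPU sketch of a tiled transpose. `src` is rows x cols, row-major with
// stride src_stride; `dst` receives the cols x rows transpose with stride
// dst_stride. Each (bx, by) pair stands for one CUDA thread block and covers
// one kTile x kTile tile of src; std::min clips the partial edge tiles.
void TiledTranspose(const std::vector<float> &src, int rows, int cols,
                    int src_stride, std::vector<float> *dst, int dst_stride) {
  const int kTile = 32;
  for (int by = 0; by < NumBlocks(rows, kTile); ++by) {
    for (int bx = 0; bx < NumBlocks(cols, kTile); ++bx) {
      const int r_end = std::min(rows, (by + 1) * kTile);
      const int c_end = std::min(cols, (bx + 1) * kTile);
      for (int r = by * kTile; r < r_end; ++r)
        for (int c = bx * kTile; c < c_end; ++c)
          (*dst)[c * dst_stride + r] = src[r * src_stride + c];
    }
  }
}
```

A 40x70 input needs a 3x2 grid of tiles, and the clipping at the edges is what lets the grid round up without writing out of bounds.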
void CopyFromSp ( const CuSpMatrix< Real > &  M)

Definition at line 353 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuSpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrix< BaseFloat >::CuMatrix(), CuSpMatrix< Real >::Invert(), and kaldi::TestCuMatrixCopyFromSp().

353  {
354  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
355  if (num_rows_ == 0)
356  return;
357 #if HAVE_CUDA == 1
358  if (CuDevice::Instantiate().Enabled()) {
359  Timer tim;
360  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
361  dim3 dimGrid(n_blocks(NumRows(), CU2DBLOCK),
362  n_blocks(NumRows(), CU2DBLOCK));
363  cuda_copy_from_sp(dimGrid, dimBlock, M.Data(), data_, Dim());
364  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyFromSp",tim.Elapsed());
365  } else
366 #endif
367  {
368  Mat().CopyFromSp(M.Mat());
369  }
370 }
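CopyFromSp() expands a packed symmetric matrix into a full square one. A minimal CPU sketch follows. It assumes the packed layout stores the lower triangle row by row, so that element (r, c) with r >= c lives at offset r*(r+1)/2 + c. PackedIndex and UnpackSymmetric are illustrative names, not Kaldi functions.

```cpp
#include <cassert>
#include <vector>

// Offset of element (r, c), r >= c, in a lower triangle packed row by row:
// (0,0), (1,0), (1,1), (2,0), ... (assumed layout).
inline int PackedIndex(int r, int c) { return r * (r + 1) / 2 + c; }

// Expands a packed symmetric n x n matrix into a full row-major matrix with
// the given stride, mirroring each lower-triangle entry into the upper one.
void UnpackSymmetric(const std::vector<double> &packed, int n,
                     std::vector<double> *full, int stride) {
  assert(static_cast<int>(packed.size()) == n * (n + 1) / 2);
  for (int r = 0; r < n; ++r) {
    for (int c = 0; c <= r; ++c) {
      const double v = packed[PackedIndex(r, c)];
      (*full)[r * stride + c] = v;
      (*full)[c * stride + r] = v;
    }
  }
}
```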
template<typename OtherReal> void CopyFromTp ( const CuTpMatrix< OtherReal > &  M,
MatrixTransposeType  trans = kNoTrans 
)

Definition at line 275 of file cu-matrix.cc.

References CU2DBLOCK, CuPackedMatrix< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, CuTpMatrix< Real >::Mat(), and CuPackedMatrix< Real >::NumRows().

Referenced by CuMatrixBase< Real >::Cholesky(), CuMatrix< BaseFloat >::CuMatrix(), CuTpMatrix< Real >::Invert(), kaldi::TestCuMatrixCopyFromTp(), and kaldi::UnitTestCuMatrixCopyFromTp().

276  {
277  KALDI_ASSERT(num_rows_ == M.NumRows() && num_cols_ == num_rows_);
278  if (num_rows_ == 0)
279  return;
280 #if HAVE_CUDA == 1
281  if (CuDevice::Instantiate().Enabled()) {
282  Timer tim;
283  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
284  dim3 dimGrid(n_blocks(num_rows_, CU2DBLOCK),
285  n_blocks(num_rows_, CU2DBLOCK));
286  if (trans == kNoTrans) {
287  cuda_copy_from_tp(dimGrid, dimBlock, data_, M.Data(), Dim());
288  } else {
289  cuda_copy_from_tp_trans(dimGrid, dimBlock, data_, M.Data(), Dim());
290  }
291  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
292  } else
293 #endif
294  {
295  Mat().CopyFromTp(M.Mat(), trans);
296  }
297 }
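CopyFromTp() does the same job for a packed triangular matrix, with trans choosing between the lower-triangular matrix and its upper-triangular transpose. The sketch below assumes the lower triangle is packed row by row (element (r, c), r >= c, at offset r*(r+1)/2 + c) and that every entry outside the triangle is set to zero. UnpackTriangular is a hypothetical helper, and the bool flag stands in for MatrixTransposeType.

```cpp
#include <cassert>
#include <vector>

// Expands a packed lower-triangular n x n matrix into a full row-major
// matrix. transpose == false yields the lower-triangular matrix; true yields
// its upper-triangular transpose. Entries outside the triangle are zeroed.
void UnpackTriangular(const std::vector<double> &packed, int n, bool transpose,
                      std::vector<double> *full, int stride) {
  assert(static_cast<int>(packed.size()) == n * (n + 1) / 2);
  for (int r = 0; r < n; ++r)
    for (int c = 0; c < n; ++c)
      (*full)[r * stride + c] = 0.0;
  for (int r = 0; r < n; ++r) {
    for (int c = 0; c <= r; ++c) {
      const double v = packed[r * (r + 1) / 2 + c];
      if (transpose)
        (*full)[c * stride + r] = v;  // mirror into the upper triangle
      else
        (*full)[r * stride + c] = v;  // keep in the lower triangle
    }
  }
}
```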
void CopyLowerToUpper ( )

Definition at line 2578 of file cu-matrix.cc.

References CU2DBLOCK, data_, Timer::Elapsed(), and KALDI_ASSERT.

Referenced by kaldi::nnet2::PreconditionDirections(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuCholesky(), and kaldi::UnitTestCuMatrixCopyLowerToUpper().

2578  {
2579  KALDI_ASSERT(num_cols_ == num_rows_);
2580  if (num_rows_ == 0) return;
2581 #if HAVE_CUDA == 1
2582  if (CuDevice::Instantiate().Enabled()) {
2583  Timer tim;
2584  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2585  int32 dim = num_rows_;
2586  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2587  n_blocks(dim, CU2DBLOCK));
2588  cuda_copy_low_upp(dimGrid, dimBlock, data_, Dim());
2589  CU_SAFE_CALL(cudaGetLastError());
2590  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2591  } else
2592 #endif
2593  {
2594  Mat().CopyLowerToUpper();
2595  }
2596 }
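CopyLowerToUpper() makes a square matrix symmetric by mirroring its lower triangle into the upper one; CopyUpperToLower() is the mirror image of this operation. A plain CPU loop with the same effect, assuming row-major storage with an explicit stride (the free function here is illustrative, not the Kaldi member):

```cpp
#include <cassert>
#include <vector>

// CPU sketch of CopyLowerToUpper() on a square n x n row-major matrix: each
// element above the diagonal is overwritten with its mirror from below.
void CopyLowerToUpper(std::vector<float> *a, int n, int stride) {
  for (int r = 0; r < n; ++r)
    for (int c = r + 1; c < n; ++c)
      (*a)[r * stride + c] = (*a)[c * stride + r];
}
```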
void CopyRows ( const CuMatrixBase< Real > &  src,
const CuArray< MatrixIndexT > &  indexes 
)

Copies row r from row indexes[r] of src.

As a special case, if indexes[r] < 0, row r is set to zero. indexes.Dim() must equal this->NumRows(), and src.NumCols() must equal this->NumCols().

Definition at line 2346 of file cu-matrix.cc.

References CuArray< T >::Data(), CuMatrixBase< Real >::Data(), data_, CuArray< T >::Dim(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by StatisticsExtractionComponent::Backprop(), SpliceComponent::Backprop(), NnetComputer::ExecuteCommand(), main(), DistributeComponent::Propagate(), and SpliceMaxComponent::Propagate().

2347  {
2348 #if HAVE_CUDA == 1
2349  if (CuDevice::Instantiate().Enabled()) {
2350  KALDI_ASSERT(static_cast<MatrixIndexT>(indices.Dim()) == NumRows());
2351  KALDI_ASSERT(NumCols() == src.NumCols());
2352 
2353  Timer tim;
2354  dim3 dimGrid, dimBlock;
2355  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2356  &dimGrid, &dimBlock);
2357  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), indices.Data(),
2358  Dim(), src.Stride());
2359  CU_SAFE_CALL(cudaGetLastError());
2360  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2361  } else
2362 #endif
2363  {
2364  Mat().CopyRows(src.Mat(), indices.Data());
2365  }
2366 }
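The row-gather semantics of CopyRows(src, indexes), including the rule that a negative index produces a zero row, can be sketched on the CPU as follows. CopyRowsByIndex is a hypothetical name, and plain std::vector storage stands in for CuMatrixBase and CuArray.

```cpp
#include <cassert>
#include <vector>

// CPU sketch of CopyRows(src, indexes): for each r, row r of *dst becomes
// row indexes[r] of src, or all zeros when indexes[r] < 0. Both matrices are
// row-major with `cols` columns and their own strides.
void CopyRowsByIndex(const std::vector<float> &src, int src_stride,
                     const std::vector<int> &indexes, int cols,
                     std::vector<float> *dst, int dst_stride) {
  for (int r = 0; r < static_cast<int>(indexes.size()); ++r) {
    const int i = indexes[r];
    for (int c = 0; c < cols; ++c)
      (*dst)[r * dst_stride + c] = (i < 0) ? 0.0f : src[i * src_stride + c];
  }
}
```

The same source row may be selected more than once, as the repeated index 2 in the test shows.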
void CopyRows ( const CuArray< const Real * > &  src)

Copies row r of this matrix from an array of floats at the location given by src[r], where src[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (the point is: the data it points to should be on the GPU if we're using a GPU, and on a CPU otherwise).

src.Dim() must equal this->NumRows(); if any src[r] is NULL, row r of this matrix is set to zero.

Definition at line 2391 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

2391  {
2392  if (NumRows() == 0) return;
2393 #if HAVE_CUDA == 1
2394  if (CuDevice::Instantiate().Enabled()) {
2395  KALDI_ASSERT(static_cast<MatrixIndexT>(src.Dim()) == NumRows());
2396  Timer tim;
2397  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2398  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2399  n_blocks(num_rows_, CU2DBLOCK));
2400  cuda_copy_rows(dimGrid, dimBlock, data_, src.Data(), Dim());
2401  CU_SAFE_CALL(cudaGetLastError());
2402  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2403  } else
2404 #endif
2405  {
2406  Mat().CopyRows(src.Data());
2407  }
2408 }
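The pointer-array overload of CopyRows() gathers each row from an arbitrary memory location instead of from a row of a single source matrix. A minimal CPU sketch of that behaviour, with a null pointer producing a zero row, follows. CopyRowsFromPointers is an illustrative name; in a GPU build every pointer would have to refer to device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU sketch of CopyRows(src) with an array of row pointers: row r of the
// destination is filled from src[r] (at least `cols` floats), or zeroed when
// src[r] is a null pointer.
void CopyRowsFromPointers(const std::vector<const float *> &src, int cols,
                          float *dst, int dst_stride) {
  for (std::size_t r = 0; r < src.size(); ++r) {
    float *row = dst + r * dst_stride;
    for (int c = 0; c < cols; ++c)
      row[c] = (src[r] != nullptr) ? src[r][c] : 0.0f;
  }
}
```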
void CopyRowsFromVec ( const CuVectorBase< Real > &  v)

This function has two modes of operation.

If v.Dim() == NumRows() * NumCols(), it treats v as the row-by-row concatenation of a matrix and copies that matrix into *this. If v.Dim() == NumCols(), it sets each row of *this to a copy of v. Any other size is an error.

Definition at line 2071 of file cu-matrix.cc.

References CuVectorBase< Real >::Data(), data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ERR, and CuVectorBase< Real >::Vec().

Referenced by kaldi::CuVectorUnitTestCopyFromMat(), NnetOnlineComputer::Flush(), NnetRescaler::FormatInput(), RepeatedAffineComponent::Propagate(), FixedAffineComponent::Propagate(), AffineComponent::Propagate(), BlockAffineComponent::Propagate(), ConstantFunctionComponent::Propagate(), LinearTransform::SetParams(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), kaldi::UnitTestCuMatrixCopyRowsFromVec(), RepeatedAffineComponent::UnVectorize(), AffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), and LstmNonlinearityComponent::UnVectorize().

2071  {
2072 #if HAVE_CUDA == 1
2073  if (CuDevice::Instantiate().Enabled()) {
2074  Timer tim;
2075  if (v.Dim() == num_rows_*num_cols_) {
2076  if (stride_ == num_cols_) {
2077  const Real* v_data = v.Data();
2078  CU_SAFE_CALL(cudaMemcpy(data_, v_data,
2079  sizeof(Real)*num_rows_*num_cols_,
2080  cudaMemcpyDeviceToDevice));
2081  } else {
2082  CU_SAFE_CALL(cudaMemcpy2D(data_, stride_ * sizeof(Real), v.Data(),
2083  num_cols_*sizeof(Real), num_cols_*sizeof(Real),
2084  num_rows_,
2085  cudaMemcpyDeviceToDevice));
2086  }
2087  } else if (v.Dim() == num_cols_) {
2088  dim3 dimGrid, dimBlock;
2089  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2090  &dimGrid, &dimBlock);
2091  cuda_copy_rows_from_vec(dimGrid, dimBlock, data_, this->Dim(), v.Data());
2092  CU_SAFE_CALL(cudaGetLastError());
2093  } else {
2094  KALDI_ERR << "Wrong sized arguments";
2095  }
2096  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2097  } else
2098 #endif
2099  {
2100  Mat().CopyRowsFromVec(v.Vec());
2101  }
2102 }
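The two modes of CopyRowsFromVec() can be summarised with a CPU sketch over a strided row-major buffer. CopyRowsFromVec here is a hypothetical free function, and throwing std::invalid_argument stands in for KALDI_ERR.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// CPU sketch of CopyRowsFromVec(v) for a rows x cols row-major matrix:
//  - v.size() == rows * cols: v holds the rows concatenated in order;
//  - v.size() == cols: every row becomes a copy of v;
//  - anything else is an error.
void CopyRowsFromVec(const std::vector<float> &v, int rows, int cols,
                     int stride, std::vector<float> *m) {
  const int n = static_cast<int>(v.size());
  if (n != rows * cols && n != cols)
    throw std::invalid_argument("Wrong sized arguments");
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
      (*m)[r * stride + c] = (n == cols) ? v[c] : v[r * cols + c];
}
```

When the matrix has a single row, both conditions hold at once, and both modes produce the same result, so the check order does not matter.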
void CopyRowsFromVec ( const VectorBase< Real > &  v)

Version of CopyRowsFromVec() that takes a CPU-based vector.

Definition at line 2105 of file cu-matrix.cc.

References VectorBase< Real >::Data(), data_, VectorBase< Real >::Dim(), Timer::Elapsed(), and KALDI_ERR.

2105  {
2106 #if HAVE_CUDA == 1
2107  if (CuDevice::Instantiate().Enabled()) {
2108  Timer tim;
2109  if (v.Dim() == num_rows_*num_cols_) {
2110  if (stride_ == num_cols_) {
2111  const Real* v_data = v.Data();
2112  CU_SAFE_CALL(cudaMemcpy(data_, v_data, sizeof(Real)*num_rows_*num_cols_, cudaMemcpyHostToDevice));
2113  } else {
2114  const Real *v_data = v.Data();
2115  for (MatrixIndexT r = 0; r < num_rows_; r++) {
2116  Real *row_data = RowData(r);
2117  CU_SAFE_CALL(cudaMemcpy(row_data, v_data, sizeof(Real)*num_cols_, cudaMemcpyHostToDevice));
2118  v_data += num_cols_;
2119  }
2120  }
2121  } else if (v.Dim() == num_cols_) {
2122  dim3 dimGrid, dimBlock;
2123  GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
2124  &dimGrid, &dimBlock);
2125  cuda_copy_rows_from_vec(dimGrid, dimBlock, this->data_, this->Dim(), CuVector<Real>(v).Data()); // v is in host memory, so the kernel receives a temporary device copy
2126  CU_SAFE_CALL(cudaGetLastError());
2127  } else {
2128  KALDI_ERR << "Wrong sized arguments";
2129  }
2130  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2131  } else
2132 #endif
2133  {
2134  Mat().CopyRowsFromVec(v);
2135  }
2136 }
template<typename OtherReal> void CopyToMat ( MatrixBase< OtherReal > *  dst,
MatrixTransposeType  trans = kNoTrans 
) const

Definition at line 413 of file cu-matrix.cc.

References MatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyToMat(), MatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), and MatrixBase< Real >::Stride().

Referenced by NnetComputerFromEg::Compute(), CuMatrixBase< Real >::CopyToMat(), kaldi::nnet1::MomentStatistics(), KlHmm::PropagateFnc(), kaldi::UnitInvert(), kaldi::UnitTestCholesky(), kaldi::UnitTestCuDiffLogSoftmax(), kaldi::UnitTestCuDiffSigmoid(), kaldi::UnitTestCuDiffSoftmax(), kaldi::UnitTestCuDiffTanh(), kaldi::UnitTestCuDiffXent(), kaldi::UnitTestCuMatrixAddMat(), kaldi::UnitTestCuMatrixAddMatBlocks(), kaldi::UnitTestCuMatrixAddMatMat(), kaldi::UnitTestCuMatrixAddVecToCols(), kaldi::UnitTestCuMatrixAddVecToRows(), kaldi::UnitTestCuMatrixAddVecVec(), kaldi::UnitTestCuMatrixDiffGroupPnorm(), kaldi::UnitTestCuMatrixDivElements(), kaldi::UnitTestCuMatrixDivRowsVec(), kaldi::UnitTestCuMatrixGroupMaxDeriv(), kaldi::UnitTestCuMatrixInvertElements(), kaldi::UnitTestCuMatrixMax(), kaldi::UnitTestCuMatrixMulColsVec(), kaldi::UnitTestCuMatrixMulElements(), kaldi::UnitTestCuMatrixMulRowsGroupMat(), kaldi::UnitTestCuMatrixMulRowsVec(), kaldi::UnitTestCuSigmoid(), kaldi::UnitTestCuTanh(), kaldi::UnitTestInvert(), kaldi::UnitTestMatrix(), UnitTestMatrixRandomizer(), kaldi::UnitTestSetZeroAboveDiag(), kaldi::UnitTestSwapCu2Cu(), and kaldi::UnitTestSwapCu2M().

414  {
415 #if HAVE_CUDA == 1
416  if (CuDevice::Instantiate().Enabled()) {
417  if (trans == kTrans || sizeof(OtherReal) != sizeof(Real)) {
418  CuMatrix<OtherReal> this_trans(*this, trans);
419  this_trans.CopyToMat(dst, kNoTrans);
420  } else {
421  KALDI_ASSERT(dst->NumRows() == NumRows() && dst->NumCols() == NumCols());
422  if (num_rows_ == 0) return;
423  Timer tim;
424 
425  MatrixIndexT src_pitch = stride_*sizeof(Real);
426  MatrixIndexT dst_pitch = dst->Stride()*sizeof(Real);
427  MatrixIndexT width = NumCols()*sizeof(Real);
428  CU_SAFE_CALL(cudaMemcpy2D(dst->Data(), dst_pitch, this->data_, src_pitch,
429  width, this->num_rows_, cudaMemcpyDeviceToHost));
430 
431  CuDevice::Instantiate().AccuProfile("CuMatrix::CopyToMatD2H",tim.Elapsed());
432  }
433  } else
434  #endif
435  {
436  dst->CopyFromMat(Mat(), trans);
437  }
438 }
void CopyToRows ( const CuArray< Real * > &  dst) const

For each row r of this matrix, copies it to the array of floats at the location given by dst[r], where dst[r] is assumed to be obtained from the RowData() function of another CuMatrix, or from CuVector::Data() (i.e. it should point to memory on the GPU if we're using a GPU, or on the CPU otherwise). If dst[r] is NULL, row r is not copied anywhere. Requires that none of the memory regions pointed to by the pointers in "dst" overlap (e.g. none of the pointers should be the same).

Definition at line 2412 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), and KALDI_ASSERT.

Referenced by DistributeComponent::Backprop(), NnetComputer::ExecuteCommand(), and kaldi::UnitTestCuMatrixCopyToRows().

2412  {
2413  if (NumRows() == 0) return;
2414 #if HAVE_CUDA == 1
2415  if (CuDevice::Instantiate().Enabled()) {
2416  KALDI_ASSERT(static_cast<MatrixIndexT>(dst.Dim()) == NumRows());
2417 
2418  Timer tim;
2419  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2420  dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK),
2421  n_blocks(num_rows_, CU2DBLOCK));
2422  cuda_copy_to_rows(dimGrid, dimBlock, dst.Data(), data_, Dim());
2423  CU_SAFE_CALL(cudaGetLastError());
2424  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2425  } else
2426 #endif
2427  {
2428  Mat().CopyToRows(dst.Data());
2429  }
2430 }
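CopyToRows() is the scatter counterpart of the pointer-array CopyRows(). The CPU sketch below writes each row to its destination pointer and skips null entries; CopyRowsToPointers is a hypothetical helper. The no-overlap requirement presumably matters because the GPU kernel writes elements concurrently, so overlapping destinations would give order-dependent results.

```cpp
#include <cassert>
#include <vector>

// CPU sketch of CopyToRows(dst): row r of the rows x cols source matrix is
// written to dst[r] when that pointer is non-null; null entries are skipped.
void CopyRowsToPointers(const float *src, int rows, int cols, int src_stride,
                        const std::vector<float *> &dst) {
  assert(static_cast<int>(dst.size()) == rows);
  for (int r = 0; r < rows; ++r) {
    if (dst[r] == nullptr) continue;  // nothing to copy for this row
    for (int c = 0; c < cols; ++c)
      dst[r][c] = src[r * src_stride + c];
  }
}
```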
void CopyUpperToLower ( )

Definition at line 2599 of file cu-matrix.cc.

References CU2DBLOCK, data_, Timer::Elapsed(), and KALDI_ASSERT.

Referenced by kaldi::TestCuMatrixCopyUpperToLower(), and kaldi::UnitTestCuMatrixCopyUpperToLower().

2599  {
2600  KALDI_ASSERT(num_cols_ == num_rows_);
2601  if (num_rows_ == 0) return;
2602 #if HAVE_CUDA == 1
2603  if (CuDevice::Instantiate().Enabled()) {
2604  Timer tim;
2605  int32 dim = this->num_rows_;
2606  dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
2607  dim3 dimGrid(n_blocks(dim, CU2DBLOCK),
2608  n_blocks(dim, CU2DBLOCK));
2609  cuda_copy_upp_low(dimGrid, dimBlock, data_, Dim());
2610  CU_SAFE_CALL(cudaGetLastError());
2611  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
2612  } else
2613 #endif
2614  {
2615  Mat().CopyUpperToLower();
2616  }
2617 }
const Real* Data ( ) const
inline

Return data pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 603 of file cu-matrix.h.

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), NormalizeComponent::Backprop(), RepeatedAffineComponent::Backprop(), kaldi::cu::BackpropLstmNonlinearity(), CuMatrix< Real >::CompObjfAndDeriv(), DistributeComponent::ComputeInputPointers(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), CuMatrixBase< Real >::EqualElementMask(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMaxDeriv(), CuTpMatrix< Real >::Invert(), kaldi::cu::NormalizePerRow(), RepeatedAffineComponent::Propagate(), kaldi::cu::Randomize(), kaldi::cu::RegularizeL1(), CuBlockMatrix< Real >::SetCudaData(), kaldi::cu::Splice(), CuMatrixBase< Real >::SumColumnRanges(), CuMatrixBase< Real >::SymAddMat2(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), RepeatedAffineComponent::Update(), and NaturalGradientRepeatedAffineComponent::Update().

603 { return data_; }
Real* Data ( )
inline

Return data pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 606 of file cu-matrix.h.

606 { return data_; }
void DiffGroupPnorm ( const CuMatrixBase< Real > &  in_value,
const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv,
Real  power 
)

Differentiate backward through the GroupPnorm function.

It is a combination of GroupPnormDeriv and MulRowsGroupMat.

Definition at line 801 of file cu-matrix.cc.

References CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Backprop(), and kaldi::UnitTestCuMatrixDiffGroupPnorm().

804  {
805  KALDI_ASSERT(out_value.NumCols() > 0);
806  KALDI_ASSERT(out_value.NumCols() == out_deriv.NumCols());
807  int group_size = this->NumCols() / out_value.NumCols();
808  KALDI_ASSERT(this->NumCols() == out_value.NumCols() * group_size);
809 #if HAVE_CUDA == 1
810  if (CuDevice::Instantiate().Enabled()) {
811  Timer tim;
812  const int kWarpSize = 32;
813  dim3 dimBlock(kWarpSize, CU1DBLOCK / kWarpSize);
814  dim3 dimGrid(n_blocks(NumCols(), dimBlock.x),
815  n_blocks(NumRows(), dimBlock.y));
816  if (dimGrid.x * dimGrid.y > 1024) {
817  dimGrid.y = std::max(1024 / dimGrid.x, unsigned(1));
818  }
819  cuda_diff_group_pnorm(dimGrid, dimBlock, this->data_, in_value.Data(),
820  out_value.Data(), out_deriv.Data(), Dim(),
821  in_value.Stride(), out_value.Stride(),
822  out_deriv.Stride(), group_size, power);
823  CU_SAFE_CALL(cudaGetLastError());
824  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
825  } else
826 #endif
827  {
828  Mat().GroupPnormDeriv(in_value.Mat(), out_value.Mat(), power);
829  MulRowsGroupMat(out_deriv);
830  }
831 }
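For a finite power p >= 1, each group output is y_j = (sum over i in group j of |x_i|^p)^(1/p), so dy_j/dx_i = sign(x_i) |x_i|^(p-1) / y_j^(p-1). DiffGroupPnorm then scales this by out_deriv(j), which is the MulRowsGroupMat step in the CPU branch. The sketch below applies that formula to a single row. It assumes consecutive columns form each group, treats y_j == 0 as giving a zero derivative, and does not cover p = infinity. DiffGroupPnormRow is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU sketch of DiffGroupPnorm for one row and finite power p >= 1.
// Group j covers in_value[j * group_size .. (j + 1) * group_size - 1].
// Each input derivative is out_deriv[j] * sign(x_i) * |x_i|^(p-1) / y_j^(p-1),
// taken as 0 when y_j == 0 (assumed convention).
std::vector<double> DiffGroupPnormRow(const std::vector<double> &in_value,
                                      const std::vector<double> &out_value,
                                      const std::vector<double> &out_deriv,
                                      double p) {
  assert(!out_value.empty() && out_value.size() == out_deriv.size());
  const std::size_t group_size = in_value.size() / out_value.size();
  assert(group_size * out_value.size() == in_value.size());
  std::vector<double> in_deriv(in_value.size(), 0.0);
  for (std::size_t i = 0; i < in_value.size(); ++i) {
    const std::size_t j = i / group_size;  // group that column i belongs to
    const double x = in_value[i], y = out_value[j];
    if (y == 0.0) continue;
    const double sign = (x >= 0.0) ? 1.0 : -1.0;
    in_deriv[i] = out_deriv[j] * sign *
                  std::pow(std::abs(x), p - 1.0) / std::pow(y, p - 1.0);
  }
  return in_deriv;
}
```

For p = 2 this reduces to out_deriv(j) * x_i / y_j, e.g. inputs (3, 4) with output 5 give derivatives (0.6, 0.8).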
void DiffLogSoftmaxPerRow ( const CuMatrixBase< Real > &  out_value,
const CuMatrixBase< Real > &  out_deriv 
)

Differentiate backward through the log softmax function.

Here, "out_value" is the log-softmax output. For each row i, computes *this(i) = out_deriv(i) - sum(out_deriv(i)) .* exp(out_value(i)), where xxxx(i) denotes row i of matrix xxxx (a row vector) and sum(out_deriv(i)) is the scalar sum of that row.

Definition at line 1684 of file cu-matrix.cc.

References CuVectorBase< Real >::AddColSumMat(), CuMatrixBase< Real >::AddMat(), CuMatrixBase< Real >::ApplyExp(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::MulRowsVec(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), CuMatrixBase< Real >::Scale(), and CuMatrixBase< Real >::Stride().

Referenced by LogSoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffLogSoftmax().

1685  {
1686 
1687  KALDI_ASSERT(SameDim(out_value, out_deriv) && SameDim(out_value, *this));
1688 
1689 #if HAVE_CUDA == 1
1690  if (CuDevice::Instantiate().Enabled()) {
1691  Timer tim;
1692 
1693  // CUDA thread layout: one thread block per matrix-row.
1694  dim3 dimBlock(CU1DBLOCK);
1695  dim3 dimGrid(num_rows_);
1696  cuda_diff_log_softmax(dimGrid, dimBlock, this->Dim(), out_value.Data(),
1697  out_value.Stride(), out_deriv.Data(),
1698  out_deriv.Stride(), data_);
1699  CU_SAFE_CALL(cudaGetLastError());
1700 
1701  CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
1702  } else
1703 #endif
1704  {
1705  /*
1706  Let the output be y, then
1707  y_i = x_i - log(sum_j exp(x_j))
1708  where x_i is the input to the component. The Jacobian matrix of this
1709  function is
1710  J = I - 1 exp(y^T)
1711  where 1 is a vector of ones. Let the derivative vector at the output be e,
1712  and at the input be d, then we have
1713  d = e - exp(y) Sum(e)
1714  d_i = e_i - exp(y_i) Sum(e)
1715  */
1716  const CuMatrixBase<Real> &Y(out_value), &E(out_deriv);
1717  CuMatrixBase<Real> &D(*this);
1718 
1719  D.CopyFromMat(Y);
1720  D.ApplyExp(); // exp(y)
1721  CuVector<Real> E_sum(D.NumRows()); // Initializes to zero
1722  E_sum.AddColSumMat(1.0, E); // Sum(e)
1723  D.MulRowsVec(E_sum); // exp(y) Sum(e)
1724  D.Scale(-1.0); // - exp(y) Sum(e)
1725  D.AddMat(1.0, E, kNoTrans); // e - exp(y_i) Sum(e)
1726  }
1727 }
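The derivation in the CPU branch reduces to d = e - exp(y) * sum(e) for each row. A single-row CPU sketch of that formula follows (DiffLogSoftmaxRow is an illustrative name). Because exp(y) sums to 1 for a log-softmax output, the entries of d always sum to zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU sketch of DiffLogSoftmaxPerRow for one row: y is the log-softmax
// output and e the derivative w.r.t. y. Returns d = e - exp(y) * sum(e).
std::vector<double> DiffLogSoftmaxRow(const std::vector<double> &y,
                                      const std::vector<double> &e) {
  assert(y.size() == e.size());
  double e_sum = 0.0;
  for (double v : e) e_sum += v;  // Sum(e)
  std::vector<double> d(y.size());
  for (std::size_t i = 0; i < y.size(); ++i)
    d[i] = e[i] - std::exp(y[i]) * e_sum;  // e_i - exp(y_i) Sum(e)
  return d;
}
```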
void DiffParametricRelu ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff,
const CuVectorBase< Real > &  alpha,
const CuVectorBase< Real > &  beta 
)

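Differentiate backward through the parametric ReLU function.

Here, "value" is the ReLU input. Element by element, computes *this = diff * (value > 0 ? alpha : beta), where alpha and beta are vectors holding one slope per column (alpha for positive inputs, beta for negative ones). The CPU branch of the code tests value >= 0, so there an input of exactly zero takes the alpha slope.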

Definition at line 1286 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::BackpropagateFnc().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(num_cols_, CU2DBLOCK), n_blocks(num_rows_, CU2DBLOCK));

    cuda_diff_parametric_relu(dimGrid, dimBlock, data_, diff.data_, value.data_,
                              Dim(), diff.Stride(), value.Stride(),
                              alpha.data_, beta.data_);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    // Do it on CPU,
    for (MatrixIndexT r = 0; r < NumRows(); r++) {
      for (MatrixIndexT c = 0; c < NumCols(); c++) {
        Real value_elem = value.Mat()(r,c);
        this->Mat()(r,c) = diff.Mat()(r,c) *
            (value_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
      }
    }
  }
}
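A minimal host-side sketch of the CPU branch above, assuming a row-major std::vector<std::vector<double>> instead of CuMatrixBase<Real> and plain vectors for alpha and beta (one slope per column). DiffParametricReluRef is a hypothetical name; it copies the CPU test value >= 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference mirroring the CPU fallback loop:
// out(r, c) = diff(r, c) * (value(r, c) >= 0 ? alpha[c] : beta[c]).
std::vector<std::vector<double>> DiffParametricReluRef(
    const std::vector<std::vector<double>> &value,  // ReLU input
    const std::vector<std::vector<double>> &diff,   // derivative at the output
    const std::vector<double> &alpha,               // slope for value >= 0
    const std::vector<double> &beta) {              // slope for value < 0
  assert(value.size() == diff.size() && alpha.size() == beta.size());
  std::vector<std::vector<double>> out(value.size());
  for (std::size_t r = 0; r < value.size(); ++r) {
    assert(value[r].size() == diff[r].size() && value[r].size() == alpha.size());
    out[r].resize(value[r].size());
    for (std::size_t c = 0; c < value[r].size(); ++c)
      out[r][c] = diff[r][c] * (value[r][c] >= 0.0 ? alpha[c] : beta[c]);
  }
  return out;
}
```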
void DiffSigmoid ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the sigmoid function.

Here, "value" is the sigmoid output. Does, element-by-element, *this = diff * value * (1 - value).

Definition at line 1546 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SigmoidComponent::Backprop(), Sigmoid::BackpropagateFnc(), and kaldi::UnitTestCuDiffSigmoid().

{
  KALDI_ASSERT(SameDim(*this, value) && SameDim(*this, diff));
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_diff_sigmoid(dimGrid, dimBlock, data_, diff.data_, value.data_,
                      Dim(), diff.Stride(), value.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().DiffSigmoid(value.Mat(), diff.Mat());
  }
}
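Because the sigmoid derivative can be written in terms of its output, ds/dx = s (1 - s), the backward pass only needs "value" (the output), not the input. A minimal element-wise sketch on flat storage, assuming std::vector<double> in place of the CUDA matrices; DiffSigmoidRef is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise reference: out = diff * value * (1 - value),
// where value holds sigmoid outputs.
std::vector<double> DiffSigmoidRef(const std::vector<double> &value,
                                   const std::vector<double> &diff) {
  assert(value.size() == diff.size());
  std::vector<double> out(value.size());
  for (std::size_t i = 0; i < value.size(); ++i)
    out[i] = diff[i] * value[i] * (1.0 - value[i]);
  return out;
}
```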
void DiffSoftmaxPerRow ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the softmax function.

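Here, "value" is the softmax output. Does, for each row i, *this(i) = diff(i) * diag(value(i)) - diff(i) * (value(i)^T * value(i)), where xxx(i) denotes row i treated as a row vector and '*' and '-' are matrix operations. Equivalently, element by element, *this(i, j) = value(i, j) * (diff(i, j) - dot(value(i), diff(i))).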

Definition at line 1650 of file cu-matrix.cc.

References CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::CopyFromMat(), CU1DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::NumRows(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by SoftmaxComponent::Backprop(), and kaldi::UnitTestCuDiffSoftmax().

{
  KALDI_ASSERT(SameDim(value, diff) && SameDim(value, *this));

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    // CUDA thread layout: one thread block per matrix-row.
    dim3 dimBlock(CU1DBLOCK);
    dim3 dimGrid(num_rows_);
    cuda_diff_softmax(dimGrid, dimBlock, data_, this->Dim(), value.Data(),
                      value.Stride(), diff.Data(), diff.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    const CuMatrixBase<Real> &P(value), &E(diff);
    CuMatrixBase<Real> &D(*this);

    D.CopyFromMat(P);
    D.MulElements(E);
    // At this point, D = P .* E (in matlab notation)
    CuVector<Real> pe_vec(D.NumRows()); // For each row i, the dot product (p_i . e_i).
    pe_vec.AddDiagMatMat(1.0, P, kNoTrans, E, kTrans, 0.0);

    D.AddDiagVecMat(-1.0, pe_vec, P, kNoTrans, 1.0); // does D -= diag(pe_vec) * P.
  }
}
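The CPU branch above reduces to d_ij = p_ij * (e_ij - p_i . e_i) per row. A minimal host-side sketch of that formula, assuming row-major std::vector<std::vector<double>> matrices; DiffSoftmaxRowsRef is a hypothetical name, not a Kaldi function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-row reference following the CPU branch:
// D = P .* E, then D -= diag(pe) * P with pe(i) = p(i) . e(i).
std::vector<std::vector<double>> DiffSoftmaxRowsRef(
    const std::vector<std::vector<double>> &value,   // softmax output P
    const std::vector<std::vector<double>> &diff) {  // output derivative E
  assert(value.size() == diff.size());
  std::vector<std::vector<double>> out(value.size());
  for (std::size_t i = 0; i < value.size(); ++i) {
    assert(value[i].size() == diff[i].size());
    double pe = 0.0;  // dot product p(i) . e(i)
    for (std::size_t j = 0; j < value[i].size(); ++j)
      pe += value[i][j] * diff[i][j];
    out[i].resize(value[i].size());
    for (std::size_t j = 0; j < value[i].size(); ++j)
      out[i][j] = value[i][j] * diff[i][j] - pe * value[i][j];
  }
  return out;
}
```

Each output row sums to zero when the corresponding softmax row sums to one, since sum_j p_ij e_ij - (p_i . e_i) sum_j p_ij = 0.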
void DiffTanh ( const CuMatrixBase< Real > &  value,
const CuMatrixBase< Real > &  diff 
)

Differentiate backward through the tanh function.

Here, "value" is the tanh output. Does, element-by-element, *this = diff * (1 - value^2).

Definition at line 1591 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), CuMatrixBase< Real >::Mat(), and CuMatrixBase< Real >::Stride().

Referenced by TanhComponent::Backprop(), RecurrentComponent::BackpropagateFnc(), Tanh::BackpropagateFnc(), and kaldi::UnitTestCuDiffTanh().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_diff_tanh(dimGrid, dimBlock, data_, diff.data_, value.data_,
                   Dim(), diff.Stride(), value.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().DiffTanh(value.Mat(), diff.Mat());
  }
}
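As with the sigmoid, the tanh derivative is expressed through the output: d tanh(x)/dx = 1 - tanh(x)^2. A minimal element-wise sketch on flat std::vector<double> storage; DiffTanhRef is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise reference: out = diff * (1 - value^2),
// where value holds tanh outputs.
std::vector<double> DiffTanhRef(const std::vector<double> &value,
                                const std::vector<double> &diff) {
  assert(value.size() == diff.size());
  std::vector<double> out(value.size());
  for (std::size_t i = 0; i < value.size(); ++i)
    out[i] = diff[i] * (1.0 - value[i] * value[i]);
  return out;
}
```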
void DiffXent ( const CuArray< int32 > &  tgt,
CuVector< Real > *  log_post_tgt 
)

Differentiate the block [softmax+cross-entropy]: dE/da = posterior_mat - target_mat, where 'E' is the error function and 'a' is the activation at the softmax input.

Interface:
 tgt ... index vector encoding the matrix of targets (tgt[r] is the target column of row r)
 net_out_or_diff ... (*this) the network output before the call, the derivative dE/da after it
 log_post_tgt ... per-frame statistics for cross-entropy computations: log(sum_row(posterior_mat .* target_mat))

Definition at line 1730 of file cu-matrix.cc.

References CU2DBLOCK, CuArray< T >::Data(), data_, CuVectorBase< Real >::data_, CuArray< T >::Dim(), Timer::Elapsed(), KALDI_ASSERT, kaldi::Log(), CuVector< Real >::Resize(), and CuVectorBase< Real >::Vec().

Referenced by kaldi::UnitTestCuDiffXent().

{
  KALDI_ASSERT(tgt.Dim() == num_rows_);
  log_post_tgt->Resize(tgt.Dim());

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimBlock(1, CU2DBLOCK*8);
    dim3 dimGrid(1, n_blocks(tgt.Dim(), CU2DBLOCK*8));
    cuda_diff_xent(dimGrid, dimBlock, tgt.Data(), data_,
                   log_post_tgt->data_, Dim());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    MatrixIndexT num_rows = num_rows_;
    for (int32 r = 0; r < num_rows; r++) {
      int32 col_tgt = tgt.Data()[r];
      Real &value = Mat()(r, col_tgt);
      log_post_tgt->Vec()(r) = Log(value);
      value -= 1.0;
    }
  }
}
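With one-hot targets, posterior_mat - target_mat only changes one element per row, which is why the CPU branch touches a single column per row. A host-side sketch of that branch, assuming std::vector containers in place of CuArray, CuMatrixBase and CuVector; DiffXentRef is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference for the CPU branch. On entry net_out holds softmax
// posteriors; on exit it holds dE/da = posterior - one_hot(tgt), and
// log_post_tgt[r] holds log of the posterior of row r's target column.
void DiffXentRef(const std::vector<int> &tgt,
                 std::vector<std::vector<double>> *net_out,
                 std::vector<double> *log_post_tgt) {
  assert(net_out != nullptr && log_post_tgt != nullptr);
  assert(tgt.size() == net_out->size());
  log_post_tgt->assign(tgt.size(), 0.0);
  for (std::size_t r = 0; r < tgt.size(); ++r) {
    std::vector<double> &row = (*net_out)[r];
    assert(tgt[r] >= 0 && static_cast<std::size_t>(tgt[r]) < row.size());
    double &value = row[tgt[r]];
    (*log_post_tgt)[r] = std::log(value);
    value -= 1.0;  // subtract the one-hot target
  }
}
```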
void DivElements ( const CuMatrixBase< Real > &  A)

Divide this matrix elementwise by A: *this = *this ./ A.

Definition at line 676 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by kaldi::UnitTestCuMatrixDivElements(), and kaldi::UnitTestCuMatrixSetMatMatDivMat().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_div_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().DivElements(A.Mat());
  }
}
void DivRowsVec ( const CuVectorBase< Real > &  div)

Divide the i'th row by div(i).

Definition at line 859 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), VectorBase< Real >::InvertElements(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by StatisticsPoolingComponent::Backprop(), StatisticsPoolingComponent::Propagate(), kaldi::TestCuMatrixDivRowsVec(), and kaldi::UnitTestCuMatrixDivRowsVec().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(div.Dim() == NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    // For a large matrix, we do more work per thread by limiting the
    // grid size, to reduce the block-launch overhead.
    if (dimGrid.x * dimGrid.y > 1024) {
      dimGrid.x = 1024 / dimGrid.y;
      if (dimGrid.x == 0) {
        dimGrid.x = 1;
      }
    }
    cuda_div_rows_vec(dimGrid, dimBlock, data_, div.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Vector<Real> temp(div.Vec()); // will copy.
    temp.InvertElements();
    Mat().MulRowsVec(temp);
  }
}
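A minimal host-side sketch of the operation, assuming a row-major std::vector<std::vector<double>> matrix; DivRowsVecRef is a hypothetical name. It divides directly, whereas the CPU branch above inverts div and calls MulRowsVec, so results may differ in the last bit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: row i of *m is divided by div[i].
void DivRowsVecRef(const std::vector<double> &div,
                   std::vector<std::vector<double>> *m) {
  assert(m != nullptr && div.size() == m->size());
  for (std::size_t i = 0; i < div.size(); ++i)
    for (double &x : (*m)[i]) x /= div[i];
}
```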
void EqualElementMask ( const CuMatrixBase< Real > &  mat,
CuMatrix< Real > *  mask 
) const

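Sets *mask to a matrix with the same dimensions as *this whose elements are 1.0 where the corresponding elements of *this and mat are equal and 0.0 elsewhere. The comparison is exact.

Definition at line 3011 of file cu-matrix.cc.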

References CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kSetZero, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrix< Real >::Resize(), and CuMatrixBase< Real >::Stride().

Referenced by MaxpoolingComponent::Backprop(), MaxPoolingComponent::BackpropagateFnc(), and MaxPooling2DComponent::BackpropagateFnc().

{
  // Check the inputs:
  KALDI_ASSERT(mat.NumRows() == NumRows() && mat.NumCols() == NumCols());
  KALDI_ASSERT(mask != NULL);
  // Resizes the output matrix:
  mask->Resize(NumRows(), NumCols(), kSetZero);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_equal_element_mask(dimGrid, dimBlock, this->data_, mat.Data(),
                            mask->Data(), this->Dim(), mat.Stride(),
                            mask->Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    for (int32 r = 0; r < NumRows(); r++) {
      for (int32 c = 0; c < NumCols(); c++) {
        (*mask)(r,c) = ((*this)(r,c) == mat(r,c) ? 1.0 : 0.0);
      }
    }
  }
}
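A minimal element-wise sketch on flat, equally sized std::vector<double> storage; EqualElementMaskRef is a hypothetical name. The max-pooling Backprop functions listed above call EqualElementMask; one plausible use is marking which inputs equal the pooled maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: mask[i] = 1.0 where a[i] == b[i] (exact), else 0.0.
std::vector<double> EqualElementMaskRef(const std::vector<double> &a,
                                        const std::vector<double> &b) {
  assert(a.size() == b.size());
  std::vector<double> mask(a.size(), 0.0);
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] == b[i]) mask[i] = 1.0;
  return mask;
}
```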
void FindRowMaxId ( CuArray< int32 > *  id) const

Find, for each row, the column index of its maximal element; *id is resized to NumRows().

Definition at line 1611 of file cu-matrix.cc.

References CU1DBLOCK, CuArray< T >::Data(), data_, and Timer::Elapsed().

Referenced by kaldi::nnet3::ComputeAccuracy(), NnetUpdater::ComputeTotAccuracy(), Xent::Eval(), kaldi::TestCuFindRowMaxId(), and kaldi::UnitTestCuFindRowMaxId().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    id->Resize(num_rows_);
    MatrixDim d = Dim();

    // CUDA thread layout: one thread block per matrix-row.
    dim3 dimBlock(CU1DBLOCK);
    dim3 dimGrid(num_rows_);
    cuda_find_row_max_id(dimGrid, dimBlock, data_, NULL, id->Data(), d);
    CU_SAFE_CALL(cudaGetLastError());

    // now we have the indices!
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    // allocate index buffer
    id->Resize(num_rows_);
    id->Set(-1);
    // find maxima
    MatrixIndexT num_rows = num_rows_, num_cols = num_cols_;
    for (MatrixIndexT r = 0; r < num_rows; r++) {
      Real max = -1e21;
      int32 max_id = -1;
      const Real *row_data = Mat().RowData(r);
      for (MatrixIndexT c = 0; c < num_cols; c++) {
        if (max < row_data[c]) {
          max = row_data[c];
          max_id = c;
        }
      }
      id->Data()[r] = max_id;
    }
  }
}
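A host-side sketch of the CPU branch, assuming a row-major std::vector<std::vector<double>>; FindRowMaxIdRef is a hypothetical name. Like the CPU loop it uses a strict '<', so ties keep the first column; unlike that loop it starts from -infinity rather than -1e21, and an empty row yields -1. The GPU kernel's tie-breaking is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference: ids[r] = column index of the largest element of row r.
std::vector<int> FindRowMaxIdRef(const std::vector<std::vector<double>> &m) {
  std::vector<int> ids(m.size(), -1);
  for (std::size_t r = 0; r < m.size(); ++r) {
    double best = -std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < m[r].size(); ++c) {
      if (best < m[r][c]) {  // strict: first maximum wins on ties
        best = m[r][c];
        ids[r] = static_cast<int>(c);
      }
    }
  }
  return ids;
}
```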
Real FrobeniusNorm ( ) const
inline

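Returns the Frobenius norm, sqrt(trace(*this * *this^T)), i.e. the square root of the sum of the squared elements.

Definition at line 211 of file cu-matrix.h.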

Referenced by CuMatrixBase< Real >::ApproxEqual(), and kaldi::UnitTestCuSparseMatrixFrobeniusNorm().

{ return sqrt(TraceMatMat(*this, *this, kTrans)); }
void GroupMax ( const CuMatrixBase< Real > &  src)

Apply the function y(i) = max_{j = i*G}^{(i+1)*G-1} x_j, where G = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1402 of file cu-matrix.cc.

References CU1DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Propagate(), kaldi::TestCuMatrixGroupMax(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), and kaldi::UnitTestCuMatrixGroupMax().

{
  int group_size = src.NumCols() / this->NumCols();
  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
               this->NumRows() == src.NumRows());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    // One thread block per row.
    // Use 2D block for small group size to simplify the calculation.
    // Each group is reduced by threads_per_group threads.
    // threads_per_group should be a power of 2 for fast tree reduction.
    // group size:        1 2 3 4 5 6 7 .. 12 13 .. 24 25 .. 48 ...
    // threads_per_group: 1 1 1 2 2 2 4 ..  4  8 ..  8 16 .. 16 ...
    int threads_per_group = CU1DBLOCK;
    while (threads_per_group * 3 / 2 >= group_size) {
      threads_per_group >>= 1;
    }
    if (group_size == 1) {
      threads_per_group = 1;
    }
    dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
    dim3 dimGrid(NumRows());
    cuda_group_max(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
                   src.Stride(), group_size);
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().GroupMax(src.Mat());
  }
}
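The threads_per_group heuristic in the listing above can be reproduced on the host to check the table in its comment. This standalone sketch assumes CU1DBLOCK is 256 (passed as block_size); ThreadsPerGroup is a hypothetical helper, not part of Kaldi.

```cpp
#include <cassert>

// Hypothetical helper copying the block-shape heuristic: start from block_size
// and halve while threads_per_group * 3 / 2 >= group_size.
int ThreadsPerGroup(int group_size, int block_size = 256) {
  assert(group_size >= 1 && block_size >= 1);
  int threads_per_group = block_size;
  while (threads_per_group * 3 / 2 >= group_size) {
    threads_per_group >>= 1;
  }
  // For group_size == 1 the loop only stops at 0 (since 1 * 3 / 2 == 1 >= 1),
  // which is why the listing special-cases it.
  if (group_size == 1) threads_per_group = 1;
  return threads_per_group;
}
```

Since dimBlock is (threads_per_group, block_size / threads_per_group), every block still has block_size threads in total; only their arrangement changes with the group size.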
void GroupMaxDeriv ( const CuMatrixBase< Real > &  input,
const CuMatrixBase< Real > &  output 
)

Calculate derivatives for the GroupMax function above, where "input" is the input to the GroupMax function (i.e. the "src" variable), "output" is the result of that computation (i.e. the "this" of that function call), and *this must have the same dimensions as "input". Each element of *this is set to 1 if the corresponding input equals the output of its group, and 0 otherwise. This equals the derivative of the function where it is defined (it is not defined where multiple inputs in a group are equal to the output). In the definition below, "input" and "output" are named src1 and src2.

Definition at line 834 of file cu-matrix.cc.

References CU2DBLOCK, CuMatrixBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), kaldi::TestCuMatrixGroupMaxDeriv(), and kaldi::UnitTestCuMatrixGroupMaxDeriv().

{
  KALDI_ASSERT(src2.NumCols() > 0);
  int group_size = this->NumCols() / src2.NumCols();
  KALDI_ASSERT(this->NumCols() == src2.NumCols() * group_size);
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
                 n_blocks(NumRows(), CU2DBLOCK));
    cuda_calc_group_max_deriv(dimGrid, dimBlock, this->data_, src1.Data(),
                              src2.Data(), Dim(), src1.Stride(), src2.Stride(),
                              group_size);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().GroupMaxDeriv(src1.Mat(), src2.Mat());
  }
}
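A single-row sketch of GroupMax and GroupMaxDeriv together, assuming std::vector<double> rows instead of CUDA matrices; both function names are hypothetical. The example input contains a tie, showing that every input equal to the group maximum receives 1, which is the case where the true derivative is undefined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-row GroupMax: x has y_dim * G elements and y[i] is the
// maximum of x[i*G] .. x[(i+1)*G - 1].
std::vector<double> GroupMaxRowRef(const std::vector<double> &x,
                                   std::size_t y_dim) {
  assert(y_dim > 0 && x.size() >= y_dim && x.size() % y_dim == 0);
  const std::size_t G = x.size() / y_dim;  // group size
  std::vector<double> y(y_dim);
  for (std::size_t i = 0; i < y_dim; ++i) {
    auto first = x.begin() + static_cast<std::ptrdiff_t>(i * G);
    y[i] = *std::max_element(first, first + static_cast<std::ptrdiff_t>(G));
  }
  return y;
}

// Hypothetical single-row GroupMaxDeriv: deriv[j] = 1.0 if x[j] equals the
// maximum y[j / G] of its group, else 0.0.
std::vector<double> GroupMaxDerivRowRef(const std::vector<double> &x,
                                        const std::vector<double> &y) {
  assert(!y.empty() && x.size() >= y.size() && x.size() % y.size() == 0);
  const std::size_t G = x.size() / y.size();
  std::vector<double> deriv(x.size(), 0.0);
  for (std::size_t j = 0; j < x.size(); ++j)
    if (x[j] == y[j / G]) deriv[j] = 1.0;
  return deriv;
}
```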
void GroupPnorm ( const CuMatrixBase< Real > &  src,
Real  pow 
)

Apply the function y(i) = (sum_{j = i*G}^{(i+1)*G-1} |x_j|^p)^(1/p), where p = pow (named power in the definition below) and G = x.NumCols() / y.NumCols() must be an integer.

[note: y corresponds to *this and x to src, so src.NumCols() / this->NumCols() must be an integer.]

Definition at line 1361 of file cu-matrix.cc.

References CU1DBLOCK, CU2DBLOCK, data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by PnormComponent::Propagate(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixGroupPnorm(), and kaldi::UnitTestCuMatrixGroupPnorm().

{
  int group_size = src.NumCols() / this->NumCols();
  KALDI_ASSERT(src.NumCols() == this->NumCols() * group_size &&
               this->NumRows() == src.NumRows());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    if (power == Real(0) || power == Real(1) || power == Real(2)
        || power == std::numeric_limits<Real>::infinity()) {
      // One thread block per row.
      // Use 2D block for small group size to simplify the calculation
      // Each group is reduced by threads_per_group threads.
      // threads_per_group should be a power of 2 for fast tree reduction.
      int threads_per_group = CU1DBLOCK;
      while (threads_per_group * 3 / 2 >= group_size) {
        threads_per_group >>= 1;
      }
      if (group_size == 1) {
        threads_per_group = 1;
      }
      dim3 dimBlock(threads_per_group, CU1DBLOCK / threads_per_group);
      dim3 dimGrid(NumRows());
      cuda_group_spec_pnorm(dimGrid, dimBlock, this->data_, src.data_,
                            this->Dim(), src.Stride(), group_size, power);
    } else {
      dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
      dim3 dimGrid(n_blocks(NumCols(), CU2DBLOCK),
                   n_blocks(NumRows(), CU2DBLOCK));
      cuda_group_pnorm(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
                       src.Stride(), group_size, power);
    }
    CU_SAFE_CALL(cudaGetLastError());
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().GroupPnorm(src.Mat(), power);
  }
}
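A single-row sketch of the group p-norm, assuming std::vector<double> rows; GroupPnormRowRef is a hypothetical name. It handles finite p > 0 and p = +infinity (maximum absolute value); p = 0, which the listing also special-cases, is deliberately left out of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical single-row reference: y[i] = (sum over group i of |x_j|^p)^(1/p).
std::vector<double> GroupPnormRowRef(const std::vector<double> &x,
                                     std::size_t y_dim, double p) {
  assert(y_dim > 0 && x.size() >= y_dim && x.size() % y_dim == 0 && p > 0.0);
  const std::size_t G = x.size() / y_dim;  // group size
  const bool is_max = (p == std::numeric_limits<double>::infinity());
  std::vector<double> y(y_dim, 0.0);
  for (std::size_t i = 0; i < y_dim; ++i) {
    double acc = 0.0;
    for (std::size_t j = i * G; j < (i + 1) * G; ++j) {
      const double a = std::fabs(x[j]);
      if (is_max)
        acc = (a > acc) ? a : acc;  // running max of |x_j|
      else
        acc += std::pow(a, p);      // running sum of |x_j|^p
    }
    y[i] = is_max ? acc : std::pow(acc, 1.0 / p);
  }
  return y;
}
```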
void Heaviside ( const CuMatrixBase< Real > &  src)

Set each element to the Heaviside function of the corresponding element of "src", which we define as the function (x > 0 ? 1.0 : 0.0). [note: in general, there are different ways to deal with the situation when x == 0.]

Definition at line 2215 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), kaldi::SameDim(), and CuMatrixBase< Real >::Stride().

Referenced by RectifiedLinearComponent::Backprop(), CuRand< Real >::BinarizeProbs(), Dropout::PropagateFnc(), RectifiedLinearComponent::StoreStats(), and kaldi::UnitTestCuMatrixHeaviside().

{
  KALDI_ASSERT(SameDim(*this, src));
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_heaviside(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
                   src.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Heaviside(src.Mat());
  }
}
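A minimal element-wise sketch of the convention stated above, on flat std::vector<double> storage; HeavisideRef is a hypothetical name. Under this convention x == 0 maps to 0.0; applied to a ReLU output, it gives 1.0 exactly where the unit was active.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: out[i] = (src[i] > 0 ? 1.0 : 0.0).
std::vector<double> HeavisideRef(const std::vector<double> &src) {
  std::vector<double> out(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = src[i] > 0.0 ? 1.0 : 0.0;
  return out;
}
```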
void InvertElements ( )

Invert the matrix elementwise: each element x is replaced by 1 / x (this is not a matrix inverse).

Definition at line 891 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by kaldi::TestCuMatrixCompObjfAndDeriv(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixInvertElements(), and kaldi::UnitTestCuMatrixObjfDeriv().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_invert_elements(dimGrid, dimBlock, data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().InvertElements();
  }
}
bool IsUnit ( Real  tol = 0.001) const

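Returns true if this square matrix is close to the identity I: specifically, if ||*this - I||_F^2 = trace(*this * *this^T) + NumRows() - 2 * Trace() <= tol * NumRows(). Note that the threshold applies to the squared Frobenius norm.

Definition at line 593 of file cu-matrix.cc.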

References KALDI_ASSERT, kaldi::kTrans, and kaldi::TraceMatMat().

Referenced by OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), kaldi::UnitTestCuMatrixSymInvertPosDef(), and kaldi::UnitTestCuSpMatrixInvert().

{
  // want to return:
  // FrobeniusNorm(*this - I) <= tol * NumRows(), i.e.:
  // sqrt(trace((*this - I)(*this - I)^T)) <= tol * NumRows()
  // The check below drops the sqrt, i.e. it compares the squared norm:
  // trace((*this - I)(*this - I)^T) <= tol * NumRows()
  // trace(*this * *this^T) + trace(I) - 2 * trace(*this) <= tol * NumRows()
  // trace(*this * *this^T) + dim - 2 * this->Trace() <= tol * NumRows()
  KALDI_ASSERT(this->NumRows() == this->NumCols());
  return (TraceMatMat(*this, *this, kTrans) + this->NumRows() - 2.0 * this->Trace() <=
          tol * this->NumRows());
}
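The identity behind the test, ||M - I||_F^2 = tr(M M^T) + n - 2 tr(M), can be checked with a standalone sketch on a dense std::vector matrix. IsUnitRef is a hypothetical name; the default tol = 0.001 mirrors the signature above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for the IsUnit test on a dense square matrix m:
// tr(m m^T) + n - 2 tr(m) equals ||m - I||_F^2, compared against tol * n.
bool IsUnitRef(const std::vector<std::vector<double>> &m, double tol = 0.001) {
  const std::size_t n = m.size();
  double tr_mmt = 0.0, tr_m = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    assert(m[i].size() == n);  // must be square
    tr_m += m[i][i];
    for (std::size_t j = 0; j < n; ++j)
      tr_mmt += m[i][j] * m[i][j];  // tr(m m^T) = sum of squares
  }
  const double dn = static_cast<double>(n);
  return tr_mmt + dn - 2.0 * tr_m <= tol * dn;
}
```

For example, {{1.1, 0}, {0, 1}} has squared distance 0.01 from I, which exceeds 0.001 * 2, so it is rejected.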
KALDI_DISALLOW_COPY_AND_ASSIGN ( CuMatrixBase< Real >  )
private
void Lookup ( const std::vector< Int32Pair > &  indexes,
Real *  output 
) const

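For each i, sets output[i] = (*this)(indexes[i].first, indexes[i].second), after checking that every pair is a valid (row, column) index; output must have room for indexes.size() values. In the definition below the parameter is named indices.

Definition at line 2952 of file cu-matrix.cc.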

References KALDI_ASSERT.

Referenced by NnetDiscriminativeUpdater::LatticeComputations(), DiscriminativeComputation::LookupNnetOutput(), kaldi::TestCuMatrixLookup(), and kaldi::UnitTestCuMatrixLookup().

{
  // Checks the dimension.
  MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
  for (int32 i = 0; i < indices.size(); ++i) {
    KALDI_ASSERT(indices[i].first < num_rows && indices[i].first >= 0 &&
                 indices[i].second < num_cols && indices[i].second >= 0);
  }
  if (indices.size() == 0) return;
  KALDI_ASSERT(output != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuArray<Int32Pair> cuda_indices(indices);
    Lookup(cuda_indices, output);
  } else
#endif
  {
    for (int32 i = 0; i < indices.size(); i++) {
      output[i] = (*this)(indices[i].first, indices[i].second);
    }
  }
}
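A host-side sketch of the lookup, assuming a row-major std::vector<std::vector<double>> matrix and std::pair<int, int> as a stand-in for Int32Pair (first = row, second = column); LookupRef is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical reference: output[i] = m(indices[i].first, indices[i].second),
// with bounds checks on every pair.
void LookupRef(const std::vector<std::vector<double>> &m,
               const std::vector<std::pair<int, int>> &indices,
               double *output) {
  if (indices.empty()) return;
  assert(output != nullptr);
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const int r = indices[i].first, c = indices[i].second;
    assert(r >= 0 && static_cast<std::size_t>(r) < m.size());
    assert(c >= 0 && static_cast<std::size_t>(c) < m[r].size());
    output[i] = m[r][c];
  }
}
```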
void Lookup ( const CuArray< Int32Pair > &  indexes,
Real *  output 
) const

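Same as the overload above, but takes the (row, column) pairs as a CuArray. In the CUDA branch the looked-up values are copied back to output with CopyToHost().

Definition at line 2977 of file cu-matrix.cc.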

References CuArray< T >::CopyToHost(), CU1DBLOCK, CuArray< T >::Data(), data_, CuArray< T >::Dim(), Timer::Elapsed(), Int32Pair::first, KALDI_ASSERT, and Int32Pair::second.

{
  int32 num_elements = indices.Dim();
  if (num_elements == 0) return;
  KALDI_ASSERT(output != NULL);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuArray<Real> cuda_output(num_elements);
    Timer tim;
    dim3 dimBlock(CU1DBLOCK, 1);
    dim3 dimGrid(n_blocks(num_elements, CU1DBLOCK), 1);

    cuda_matrix_lookup(dimGrid, dimBlock, this->data_, this->Dim(),
                       indices.Data(), num_elements, cuda_output.Data());
    CU_SAFE_CALL(cudaGetLastError());

    cuda_output.CopyToHost(output);
    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    MatrixIndexT num_rows = this->num_rows_, num_cols = this->num_cols_;
    const Int32Pair *index = indices.Data();
    for (int32 i = 0; i < num_elements; i++) {
      KALDI_ASSERT(index[i].first < num_rows && index[i].first >= 0 &&
                   index[i].second < num_cols && index[i].second >= 0);
      output[i] = (*this)(index[i].first, index[i].second);
    }
  }
}
const MatrixBase<Real>& Mat ( ) const
inline

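Returns this matrix reinterpreted as a regular MatrixBase<Real>. As in the CPU branches of the definitions above, it is only meaningful when the data is in CPU memory.

Definition at line 612 of file cu-matrix.h.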

Referenced by CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddDiagMat2(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuMatrixBase< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMatElements(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRows(), GeneralMatrix::AddToMat(), CuMatrixBase< Real >::ApplyLogSoftMaxPerRow(), CuMatrixBase< Real >::ApplySoftMaxPerRow(), kaldi::cu::BackpropLstmNonlinearity(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), GeneralMatrix::CopyToMat(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DiffParametricRelu(), CuMatrixBase< Real >::DiffSigmoid(), CuMatrixBase< Real >::DiffTanh(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), CuMatrixBase< Real >::Heaviside(), CuMatrixBase< Real >::Max(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), CuMatrixBase< Real >::ParametricRelu(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::cu::RegularizeL1(), CuMatrixBase< Real >::SetMatMatDivMat(), CuMatrixBase< Real >::Sigmoid(), CuMatrixBase< Real >::SoftHinge(), kaldi::cu::Splice(), CuMatrixBase< Real >::SymAddMat2(), CuMatrixBase< Real >::Tanh(), kaldi::TraceMatMat(), and kaldi::TraceMatSmat().

{
  return *(reinterpret_cast<const MatrixBase<Real>* >(this));
}
MatrixBase<Real>& Mat ( )
inline

Non-const version of Mat(); likewise only valid when the matrix data lives in CPU memory.

Definition at line 615 of file cu-matrix.h.

{
  return *(reinterpret_cast<MatrixBase<Real>* >(this));
}
void Max ( const CuMatrixBase< Real > &  A)

Elementwise maximum: *this = max(*this, A), where A must have the same dimensions as *this.

Definition at line 700 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by main(), SpliceMaxComponent::Propagate(), MaxpoolingComponent::Propagate(), kaldi::TestCuMatrixMax(), kaldi::UnitTestCuMatrixMax(), and kaldi::UnitTestCuMatrixReduceMax().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_max(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
  #endif
  {
    Mat().Max(A.Mat());
  }
}
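The GPU call passes Dim() for *this and A.Stride() separately because the two matrices may have different row pitches. The following is a minimal CPU-only sketch of the same semantics on row-major storage with explicit strides; the helper name and signature are hypothetical and are not part of Kaldi.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not Kaldi API): elementwise dst = max(dst, src) for
// two row-major matrices of identical shape whose strides (row pitches)
// may differ. Padding entries beyond num_cols in each row are untouched.
void ElementwiseMaxInPlace(std::vector<float>& dst, int dst_stride,
                           const std::vector<float>& src, int src_stride,
                           int num_rows, int num_cols) {
  assert(dst_stride >= num_cols && src_stride >= num_cols);
  for (int r = 0; r < num_rows; ++r) {
    for (int c = 0; c < num_cols; ++c) {
      float& d = dst[r * dst_stride + c];
      d = std::max(d, src[r * src_stride + c]);
    }
  }
}
```

Indexing each operand with its own stride is what lets a padded matrix and a tightly packed one be combined without copying.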
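Real Max ( ) const

Returns the largest element of the matrix. On the GPU, each row is first reduced to its maximum in a temporary vector of length NumRows(), and the result is the maximum of that vector; the GPU branch asserts that the matrix is non-empty.

Definition at line 2642 of file cu-matrix.cc.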

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Max().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
    Timer tim;

    CuVector<Real> col_max(num_rows_, kUndefined);
    cuda_max_mat_cols(num_rows_, CU1DBLOCK, col_max.Data(), data_, Dim());
    Real ans = col_max.Max();

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
    return ans;
  } else
#endif
  {
    return Mat().Max();
  }
}
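The GPU branch above is a two-stage reduction: per-row maxima into a vector of length NumRows(), then the maximum of that vector. A minimal CPU-only sketch of that structure, using a hypothetical helper on strided row-major storage (not Kaldi's API), might look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch (not Kaldi API) of the two-stage reduction used by
// the GPU branch of Max(): reduce each row to its maximum, then take the
// maximum of the per-row values. Padding beyond num_cols is ignored.
float MatrixMax(const std::vector<float>& data, int num_rows, int num_cols,
                int stride) {
  assert(num_rows > 0 && num_cols > 0);  // mirrors the KALDI_ASSERT
  std::vector<float> row_max(num_rows);
  for (int r = 0; r < num_rows; ++r) {
    const float* row = data.data() + r * stride;
    row_max[r] = *std::max_element(row, row + num_cols);
  }
  return *std::max_element(row_max.begin(), row_max.end());
}
```

Splitting the work this way lets the first stage run one independent reduction per row, which parallelizes naturally; the second stage only touches NumRows() values.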
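Real Min ( ) const

Returns the smallest element of the matrix. On the GPU, each row is first reduced to its minimum in a temporary vector of length NumRows(), and the result is the minimum of that vector; the GPU branch asserts that the matrix is non-empty.

Definition at line 2663 of file cu-matrix.cc.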

References CU1DBLOCK, CuVectorBase< Real >::Data(), data_, Timer::Elapsed(), KALDI_ASSERT, kaldi::kUndefined, and CuVectorBase< Real >::Min().

Referenced by main(), kaldi::TestCuMatrixMin(), and kaldi::UnitTestCuMatrixReduceMin().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    KALDI_ASSERT(num_rows_ > 0 && num_cols_ > 0);
    Timer tim;

    CuVector<Real> col_min(num_rows_, kUndefined);
    cuda_min_mat_cols(num_rows_, CU1DBLOCK, col_min.Data(), data_, Dim());
    Real ans = col_min.Min();

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
    return ans;
  } else
#endif
  {
    return Mat().Min();
  }
}
void MulColsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th column by scale[i]; scale.Dim() must equal NumCols().

Definition at line 725 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by PerElementScaleComponent::Backprop(), FixedScaleComponent::Backprop(), Convolutional2DComponent::BackpropagateFnc(), Rescale::BackpropagateFnc(), PerElementScaleComponent::Propagate(), FixedScaleComponent::Propagate(), Rescale::PropagateFnc(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixMulColsVec(), and NnetLogprobTask::~NnetLogprobTask().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(scale.Dim() == NumCols());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_cols_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().MulColsVec(scale.Vec());
  }
}
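In matrix terms, MulColsVec computes M := M * diag(scale). A minimal CPU-only sketch of that per-column scaling on strided row-major storage follows; the helper is hypothetical and not part of Kaldi.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not Kaldi API): M(r, c) *= scale[c], so column i is
// multiplied by scale[i]; equivalent to M := M * diag(scale).
void MulColsVecSketch(std::vector<float>& m, int num_rows, int num_cols,
                      int stride, const std::vector<float>& scale) {
  assert(static_cast<int>(scale.size()) == num_cols);  // scale.Dim() == NumCols()
  for (int r = 0; r < num_rows; ++r)
    for (int c = 0; c < num_cols; ++c)
      m[r * stride + c] *= scale[c];
}
```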
void MulElements ( const CuMatrixBase< Real > &  A)

Multiply two matrices elementwise: C = C .* A.

Definition at line 652 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Stride().

Referenced by ElementwiseProductComponent::Backprop(), BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), MaxpoolingComponent::Backprop(), HiddenSoftmax::BackpropagateFnc(), Dropout::BackpropagateFnc(), kaldi::nnet1::ComputeStdDev(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), ElementwiseProductComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), LengthNormComponent::PropagateFnc(), Dropout::PropagateFnc(), ClipGradientComponent::RepairGradients(), NnetEnsembleTrainer::TrainOneMinibatch(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixMulElements(), kaldi::nnet1::UnitTestLengthNorm(), ParametricRelu::Update(), AffineTransform::Update(), FramePoolingComponent::Update(), ConvolutionalComponent::Update(), Rescale::Update(), and NaturalGradientPerElementScaleComponent::Update().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_cols_ == A.NumCols());
    KALDI_ASSERT(num_rows_ == A.NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_elements(dimGrid, dimBlock, data_, A.data_, Dim(), A.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
  #endif
  {
    Mat().MulElements(A.Mat());
  }
}
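One common use of an elementwise product is applying a 0/1 mask to activations, the kind of operation a dropout layer performs. The sketch below shows C = C .* A on contiguous matrices of identical shape (stride equal to the number of columns, an assumption for brevity); the helper is hypothetical, not Kaldi's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not Kaldi API) of C = C .* A for two contiguous
// row-major matrices with the same number of rows and columns.
void MulElementsSketch(std::vector<float>& c, const std::vector<float>& a) {
  assert(c.size() == a.size());  // same NumRows() and NumCols()
  for (std::size_t i = 0; i < c.size(); ++i) c[i] *= a[i];
}
```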
void MulRowsGroupMat ( const CuMatrixBase< Real > &  src)

Divides each row into src.NumCols() groups of NumCols() / src.NumCols() consecutive elements each, then scales the j'th group of elements in row i by src(i, j).

Definition at line 776 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::Stride().

Referenced by MaxoutComponent::Backprop(), and kaldi::UnitTestCuMatrixMulRowsGroupMat().

{
  KALDI_ASSERT(src.NumCols() > 0);
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;
    int group_size = this->NumCols() / src.NumCols();

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_group_mat(dimGrid, dimBlock, this->data_, src.data_,
                            this->Dim(), src.Stride(), group_size);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().MulRowsGroupMat(src.Mat());
  }
}
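The group indexing is easiest to see on a small example. Below is a minimal CPU-only sketch on contiguous row-major storage; the helper name is hypothetical, and the divisibility check is an added assumption of this sketch (the GPU branch above only asserts src.NumCols() > 0).

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not Kaldi API) of MulRowsGroupMat. Each row of m
// (num_cols entries) is split into src_cols consecutive groups of size
// num_cols / src_cols; every element of group j in row i is multiplied
// by src(i, j).
void MulRowsGroupMatSketch(std::vector<float>& m, int num_rows, int num_cols,
                           const std::vector<float>& src, int src_cols) {
  assert(src_cols > 0 && num_cols % src_cols == 0);  // assumption of this sketch
  const int group_size = num_cols / src_cols;
  for (int i = 0; i < num_rows; ++i)
    for (int c = 0; c < num_cols; ++c)
      m[i * num_cols + c] *= src[i * src_cols + c / group_size];
}
```

Column c belongs to group c / group_size, so a 2x4 matrix scaled by a 2x2 src uses groups of two columns.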
void MulRowsVec ( const CuVectorBase< Real > &  scale)

Scales the i'th row by scale[i]; scale.Dim() must equal NumRows().

Definition at line 752 of file cu-matrix.cc.

References data_, CuVectorBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, and CuVectorBase< Real >::Vec().

Referenced by NormalizeComponent::Backprop(), BackpropTruncationComponent::Backprop(), ClipGradientComponent::Backprop(), BlockSoftmax::BackpropagateFnc(), LengthNormComponent::BackpropagateFnc(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), Xent::Eval(), Mse::Eval(), kaldi::nnet2::PreconditionDirections(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), LengthNormComponent::PropagateFnc(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), kaldi::UnitTestCuMatrixMulRowsVec(), AffineTransform::Update(), and ConvolutionalComponent::Update().

{
  #if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(scale.Dim() == NumRows());

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_mul_rows_vec(dimGrid, dimBlock, data_, scale.data_, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
  #endif
  {
    Mat().MulRowsVec(scale.Vec());
  }
}
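Row scaling is the same as left-multiplying by a diagonal matrix, diag(scale) * M, but costs O(rows * cols) instead of a full matrix product. The sketch below, on contiguous storage with hypothetical helper names (not Kaldi's API), implements the direct loop and checks it against an explicit diagonal product.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not Kaldi API): M(r, c) *= scale[r], so row i is
// multiplied by scale[i]; equivalent to M := diag(scale) * M.
void MulRowsVecSketch(std::vector<float>& m, int num_rows, int num_cols,
                      const std::vector<float>& scale) {
  assert(static_cast<int>(scale.size()) == num_rows);  // scale.Dim() == NumRows()
  for (int r = 0; r < num_rows; ++r)
    for (int c = 0; c < num_cols; ++c)
      m[r * num_cols + c] *= scale[r];
}

// Reference for comparison: explicit (num_rows x num_rows) diagonal
// matrix times M, computed as a full matrix product.
std::vector<float> DiagTimes(const std::vector<float>& scale,
                             const std::vector<float>& m, int num_rows,
                             int num_cols) {
  std::vector<float> out(m.size(), 0.0f);
  for (int r = 0; r < num_rows; ++r)
    for (int k = 0; k < num_rows; ++k) {
      const float d = (r == k) ? scale[r] : 0.0f;
      for (int c = 0; c < num_cols; ++c)
        out[r * num_cols + c] += d * m[k * num_cols + c];
    }
  return out;
}
```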
MatrixIndexT NumCols ( ) const
inline

Returns the number of columns of the matrix.

Definition at line 201 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetComputer::AcceptOutputDeriv(), NnetLdaStatsAccumulator::AccStatsFromOutput(), CuVectorBase< Real >::AddColSumMat(), MatrixRandomizer::AddData(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuRand< Real >::AddGaussNoise(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMat(), CuVectorBase< Real >::AddMatVec(), CuMatrixBase< Real >::AddRowRanges(), CuMatrixBase< Real >::AddRows(), CuVectorBase< Real >::AddRowSumMat(), NnetUpdater::Backprop(), StatisticsExtractionComponent::Backprop(), SumReduceComponent::Backprop(), MaxoutComponent::Backprop(), MaxpoolingComponent::Backprop(), PnormComponent::Backprop(), RepeatedAffineComponent::Backprop(), NormalizeComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), ConvolutionComponent::Backprop(), DropoutComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Convolutional1dComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), Splice::BackpropagateFnc(), SentenceAveragingComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), CuRand< Real >::BinarizeProbs(), ChunkInfo::CheckSize(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), DiscriminativeComputation::Compute(), 
kaldi::nnet3::ComputeAccuracy(), NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), OnlineNaturalGradient::ComputeWt1(), OnlinePreconditioner::ComputeWt1(), ConvolutionComponent::ConvolutionComponent(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuVectorBase< Real >::CopyDiagFromMat(), CuMatrixBase< Real >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyRows(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrix< Real >::CuMatrix(), CuSubVector< Real >::CuSubVector(), CuTpMatrix< Real >::CuTpMatrix(), CuMatrixBase< Real >::DiffGroupPnorm(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), NnetComputer::ExecuteCommand(), AffineTransform::GetGradient(), RecurrentComponent::GetGradient(), LstmProjected::GetGradient(), ConvolutionalComponent::GetGradient(), Convolutional2DComponent::GetGradient(), BlstmProjected::GetGradient(), Convolutional1dComponent::GetParameterDim(), AffineTransform::GetParams(), RecurrentComponent::GetParams(), LstmProjected::GetParams(), ConvolutionalComponent::GetParams(), Convolutional2DComponent::GetParams(), BlstmProjected::GetParams(), NnetComputer::GetPointers(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupMaxDeriv(), CuMatrixBase< Real >::GroupPnorm(), ConvolutionComponent::InderivPatchesToInderiv(), MaxpoolingComponent::InderivPatchesToInderiv(), AffineComponent::Info(), AffineComponentPreconditioned::Info(), AffineComponentPreconditionedOnline::Info(), DctComponent::Info(), FixedLinearComponent::Info(), FixedAffineComponent::Info(), LstmNonlinearityComponent::Info(), Convolutional1dComponent::Info(), 
OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), LinearTransform::InitData(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), RepeatedAffineComponent::InputDim(), AffineComponent::InputDim(), BlockAffineComponent::InputDim(), FixedLinearComponent::InputDim(), FixedAffineComponent::InputDim(), LstmNonlinearityComponent::InputDim(), Convolutional1dComponent::InputDim(), ConvolutionComponent::InputToInputPatches(), MaxpoolingComponent::InputToInputPatches(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< Real >::Max(), kaldi::MeanVariance(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< Real >::MulElements(), CuMatrixBase< Real >::MulRowsGroupMat(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), BlockAffineComponent::NumParameters(), RepeatedAffineComponent::NumParameters(), ConvolutionComponent::NumParameters(), LstmNonlinearityComponent::NumParameters(), LinearTransform::NumParams(), AffineTransform::NumParams(), RecurrentComponent::NumParams(), LstmProjected::NumParams(), ConvolutionalComponent::NumParams(), Convolutional2DComponent::NumParams(), BlstmProjected::NumParams(), CuMatrix< BaseFloat >::operator=(), DctComponent::OutputDim(), LstmNonlinearityComponent::OutputDim(), CuMatrixBase< Real >::ParametricRelu(), LstmNonlinearityComponent::PerturbParams(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlineNaturalGradient::PreconditionDirectionsInternal(), 
OnlinePreconditioner::PreconditionDirectionsInternal(), kaldi::nnet3::PrintParameterStats(), NnetComputeProb::ProcessOutputs(), DistributeComponent::Propagate(), Component::Propagate(), ElementwiseProductComponent::Propagate(), NormalizeComponent::Propagate(), StatisticsExtractionComponent::Propagate(), SumReduceComponent::Propagate(), StatisticsPoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), SpliceComponent::Propagate(), BlockAffineComponent::Propagate(), DctComponent::Propagate(), ConvolutionComponent::Propagate(), DropoutComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Convolutional1dComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), FramePoolingComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), Dropout::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), LstmNonlinearityComponent::Read(), AffineTransform::ReadData(), LinearTransform::ReadData(), ConvolutionalComponent::ReadData(), Rbm::ReadData(), Rbm::Reconstruct(), kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), Mse::Report(), NnetRescaler::RescaleComponent(), CuMatrixBase< BaseFloat >::Row(), OnlineNaturalGradient::SelfTest(), OnlinePreconditioner::SelfTest(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), NaturalGradientRepeatedAffineComponent::SetNaturalGradientConfigs(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), kaldi::cu::Splice(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), 
NonlinearComponent::StoreStatsInternal(), PdfPrior::SubtractOnLogpost(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), RepeatedAffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), ConvolutionComponent::Update(), Convolutional1dComponent::Update(), BlockAffineComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), MatrixRandomizer::Value(), RepeatedAffineComponent::Vectorize(), BlockAffineComponent::Vectorize(), ConvolutionComponent::Vectorize(), DctComponent::Write(), and LstmNonlinearityComponent::Write().

{ return num_cols_; }
MatrixIndexT NumRows ( ) const
inline

Dimensions: returns the number of rows of the matrix.

Definition at line 200 of file cu-matrix.h.

Referenced by NnetComputer::AcceptInput(), NnetComputer::AcceptOutputDeriv(), NnetLdaStatsAccumulator::AccStatsFromOutput(), CuMatrixBase< Real >::AddCols(), CuVectorBase< Real >::AddColSumMat(), MatrixRandomizer::AddData(), CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddDiagVecMat(), CuRand< Real >::AddGaussNoise(), CuMatrixBase< Real >::AddMat(), CuSpMatrix< Real >::AddMat2(), CuMatrixBase< Real >::AddMatBlock(), CuMatrixBase< Real >::AddMatBlocks(), CuMatrixBase< Real >::AddMatDiagVec(), CuBlockMatrix< Real >::AddMatMat(), CuMatrixBase< Real >::AddMatMat(), CuVectorBase< Real >::AddMatVec(), CuVectorBase< Real >::AddRowSumMat(), AffineComponent::AffineComponent(), DistributeComponent::Backprop(), NnetUpdater::Backprop(), ElementwiseProductComponent::Backprop(), StatisticsExtractionComponent::Backprop(), SumReduceComponent::Backprop(), StatisticsPoolingComponent::Backprop(), MaxoutComponent::Backprop(), BackpropTruncationComponent::Backprop(), MaxpoolingComponent::Backprop(), PnormComponent::Backprop(), RepeatedAffineComponent::Backprop(), NormalizeComponent::Backprop(), SigmoidComponent::Backprop(), TanhComponent::Backprop(), PowerComponent::Backprop(), RectifiedLinearComponent::Backprop(), SoftHingeComponent::Backprop(), ScaleComponent::Backprop(), SoftmaxComponent::Backprop(), LogSoftmaxComponent::Backprop(), AffineComponent::Backprop(), ClipGradientComponent::Backprop(), SpliceComponent::Backprop(), SpliceMaxComponent::Backprop(), BlockAffineComponent::Backprop(), SumGroupComponent::Backprop(), PermuteComponent::Backprop(), DctComponent::Backprop(), FixedLinearComponent::Backprop(), FixedAffineComponent::Backprop(), ConvolutionComponent::Backprop(), DropoutComponent::Backprop(), LstmNonlinearityComponent::Backprop(), Convolutional1dComponent::Backprop(), CompositeComponent::Backprop(), Component::Backpropagate(), HiddenSoftmax::BackpropagateFnc(), Splice::BackpropagateFnc(), SimpleSentenceAveragingComponent::BackpropagateFnc(), 
BlockSoftmax::BackpropagateFnc(), RecurrentComponent::BackpropagateFnc(), Convolutional2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), ConvolutionalComponent::BackpropagateFnc(), LstmProjected::BackpropagateFnc(), BlstmProjected::BackpropagateFnc(), kaldi::cu::BackpropLstmNonlinearity(), CuRand< Real >::BinarizeProbs(), BlockAffineComponent::BlockAffineComponent(), ChunkInfo::CheckSize(), NnetComputerFromEg::Compute(), NnetOnlineComputer::Compute(), DiscriminativeComputation::Compute(), DecodableAmNnetParallel::Compute(), kaldi::nnet3::ComputeAccuracy(), NnetComputer::ComputeLastLayerDeriv(), kaldi::cu::ComputeLstmNonlinearity(), kaldi::nnet3::ComputeObjectiveFunction(), kaldi::nnet1::ComputeStdDev(), NnetUpdater::ComputeTotAccuracy(), Convolutional1dComponent::Convolutional1dComponent(), ConvolutionComponent::ConvolutionComponent(), kaldi::cu::Copy(), CuVectorBase< Real >::CopyColFromMat(), CuMatrixBase< Real >::CopyCols(), CuVectorBase< Real >::CopyDiagFromMat(), CuMatrixBase< Real >::CopyFromBlock(), CuTpMatrix< Real >::CopyFromMat(), CuSpMatrix< Real >::CopyFromMat(), CuBlockMatrix< Real >::CopyFromMat(), CuMatrixBase< Real >::CopyFromMat(), CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSparseMatrix< Real >::CopyToMat(), CuMatrix< Real >::CuMatrix(), CuTpMatrix< Real >::CuTpMatrix(), DecodableAmNnet::DecodableAmNnet(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), CuMatrixBase< Real >::DiffSoftmaxPerRow(), CuMatrixBase< Real >::DivElements(), CuMatrixBase< Real >::EqualElementMask(), Xent::Eval(), Mse::Eval(), MultiTaskLoss::Eval(), AffineTransform::GetGradient(), RecurrentComponent::GetGradient(), LstmProjected::GetGradient(), ConvolutionalComponent::GetGradient(), Convolutional2DComponent::GetGradient(), BlstmProjected::GetGradient(), Convolutional1dComponent::GetParameterDim(), AffineTransform::GetParams(), RecurrentComponent::GetParams(), LstmProjected::GetParams(), 
ConvolutionalComponent::GetParams(), Convolutional2DComponent::GetParams(), BlstmProjected::GetParams(), NnetComputer::GetPointers(), AffineComponentPreconditioned::GetScalingFactor(), CuMatrixBase< Real >::GroupMax(), CuMatrixBase< Real >::GroupPnorm(), AffineComponent::Info(), AffineComponentPreconditioned::Info(), AffineComponentPreconditionedOnline::Info(), DctComponent::Info(), FixedLinearComponent::Info(), FixedAffineComponent::Info(), ConvolutionComponent::Info(), Convolutional1dComponent::Info(), OnlineNaturalGradient::Init(), OnlinePreconditioner::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), FixedAffineComponent::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), LinearTransform::InitData(), FixedAffineComponent::InitFromConfig(), FixedLinearComponent::InitFromString(), FixedAffineComponent::InitFromString(), OnlineNaturalGradient::InitOrthonormalSpecial(), OnlinePreconditioner::InitOrthonormalSpecial(), NnetDiscriminativeUpdater::LatticeComputations(), main(), NnetComputer::MatrixStddev(), CuMatrixBase< Real >::Max(), kaldi::MeanVariance(), kaldi::nnet1::MomentStatistics(), CuMatrixBase< Real >::MulElements(), NnetComputer::NnetComputer(), kaldi::cu::NormalizePerRow(), DecodableAmNnetParallel::NumFramesReady(), BlockAffineComponent::NumParameters(), RepeatedAffineComponent::NumParameters(), ConvolutionComponent::NumParameters(), LstmNonlinearityComponent::NumParameters(), LinearTransform::NumParams(), AffineTransform::NumParams(), RecurrentComponent::NumParams(), LstmProjected::NumParams(), ConvolutionalComponent::NumParams(), Convolutional2DComponent::NumParams(), BlstmProjected::NumParams(), NnetLogprobTask::operator()(), CuMatrix< BaseFloat >::operator=(), RepeatedAffineComponent::OutputDim(), AffineComponent::OutputDim(), BlockAffineComponent::OutputDim(), DctComponent::OutputDim(), FixedLinearComponent::OutputDim(), 
FixedAffineComponent::OutputDim(), ConvolutionComponent::OutputDim(), Convolutional1dComponent::OutputDim(), CuMatrixBase< Real >::ParametricRelu(), LstmNonlinearityComponent::PerturbParams(), kaldi::nnet2::PreconditionDirections(), OnlineNaturalGradient::PreconditionDirections(), OnlinePreconditioner::PreconditionDirections(), kaldi::nnet2::PreconditionDirectionsAlpha(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), OnlineNaturalGradient::PreconditionDirectionsInternal(), OnlinePreconditioner::PreconditionDirectionsInternal(), kaldi::nnet3::PrintParameterStats(), DistributeComponent::Propagate(), NnetOnlineComputer::Propagate(), DropoutComponent::Propagate(), Component::Propagate(), ElementwiseProductComponent::Propagate(), StatisticsExtractionComponent::Propagate(), SumReduceComponent::Propagate(), StatisticsPoolingComponent::Propagate(), RepeatedAffineComponent::Propagate(), SpliceComponent::Propagate(), BlockAffineComponent::Propagate(), DctComponent::Propagate(), ConvolutionComponent::Propagate(), AdditiveNoiseComponent::Propagate(), Convolutional1dComponent::Propagate(), MaxpoolingComponent::Propagate(), CompositeComponent::Propagate(), KlHmm::PropagateFnc(), SimpleSentenceAveragingComponent::PropagateFnc(), RecurrentComponent::PropagateFnc(), SentenceAveragingComponent::PropagateFnc(), LengthNormComponent::PropagateFnc(), ConvolutionalComponent::PropagateFnc(), Convolutional2DComponent::PropagateFnc(), Dropout::PropagateFnc(), LstmProjected::PropagateFnc(), BlstmProjected::PropagateFnc(), kaldi::nnet1::RandGauss(), CuRand< Real >::RandGaussian(), kaldi::cu::Randomize(), CuRand< Real >::RandUniform(), kaldi::nnet1::RandUniform(), Rbm::RbmUpdate(), AffineTransform::ReadData(), LinearTransform::ReadData(), ConvolutionalComponent::ReadData(), Rbm::ReadData(), Rbm::Reconstruct(), kaldi::cu::RegularizeL1(), DctComponent::Reorder(), OnlineNaturalGradient::ReorthogonalizeXt1(), OnlinePreconditioner::ReorthogonalizeXt1(), 
ClipGradientComponent::RepairGradients(), NnetRescaler::RescaleComponent(), LstmProjected::ResetStreams(), OnlineNaturalGradient::SelfTest(), OnlinePreconditioner::SelfTest(), CuBlockMatrix< Real >::SetCudaData(), LinearTransform::SetLinearity(), AffineTransform::SetLinearity(), AffineTransform::SetParams(), RecurrentComponent::SetParams(), ConvolutionalComponent::SetParams(), Convolutional2DComponent::SetParams(), LstmProjected::SetParams(), BlstmProjected::SetParams(), AffineComponent::SetParams(), ConvolutionComponent::SetParams(), Convolutional1dComponent::SetParams(), kaldi::cu::Splice(), SigmoidComponent::StoreStats(), RectifiedLinearComponent::StoreStats(), NonlinearComponent::StoreStatsInternal(), CuMatrixBase< Real >::SumColumnRanges(), kaldi::TraceMatMat(), kaldi::TraceMatSmat(), kaldi::UnitTestCheck(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuTanh(), UnitTestMatrixRandomizer(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::nnet2::UnitTestPreconditionDirectionsOnline(), kaldi::nnet3::UnitTestPreconditionDirectionsOnline(), kaldi::nnet1::UnitTestSimpleSentenceAveragingComponent(), kaldi::UnitTestSwapCu2Cu(), kaldi::UnitTestSwapCu2M(), RepeatedAffineComponent::UnVectorize(), BlockAffineComponent::UnVectorize(), ConvolutionComponent::UnVectorize(), LinearTransform::Update(), AffineTransform::Update(), SentenceAveragingComponent::Update(), RecurrentComponent::Update(), ConvolutionalComponent::Update(), Convolutional2DComponent::Update(), RepeatedAffineComponent::Update(), NaturalGradientRepeatedAffineComponent::Update(), NaturalGradientAffineComponent::Update(), AffineComponentPreconditioned::Update(), AffineComponentPreconditionedOnline::Update(), BlockAffineComponentPreconditioned::Update(), ConvolutionComponent::Update(), Convolutional1dComponent::Update(), BlockAffineComponent::UpdateSimple(), NonlinearComponent::UpdateStats(), RepeatedAffineComponent::Vectorize(), 
BlockAffineComponent::Vectorize(), ConvolutionComponent::Vectorize(), DctComponent::Write(), and NnetLogprobTask::~NnetLogprobTask().

{ return num_rows_; }
CuValue<Real> operator() ( MatrixIndexT  r,
MatrixIndexT  c 
)
inline

Definition at line 539 of file cu-matrix.h.

{
  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
                        static_cast<UnsignedMatrixIndexT>(num_rows_) &&
                        static_cast<UnsignedMatrixIndexT>(c) <
                        static_cast<UnsignedMatrixIndexT>(num_cols_));
  return CuValue<Real>(data_ + r * stride_ + c);
}
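The paranoid assertion casts both sides to an unsigned type so that a negative index becomes a very large value, letting one "<" comparison reject both negative and too-large indices; the element address is then data_ + r * stride_ + c. The sketch below illustrates both pieces with hypothetical helpers, assuming 32-bit signed indices (an assumption about MatrixIndexT, not taken from this page).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not Kaldi API): a single unsigned comparison per
// dimension rejects both negative and out-of-range indices, because a
// negative int32 value converts to a large uint32 value.
bool InBounds(std::int32_t r, std::int32_t c,
              std::int32_t num_rows, std::int32_t num_cols) {
  return static_cast<std::uint32_t>(r) < static_cast<std::uint32_t>(num_rows) &&
         static_cast<std::uint32_t>(c) < static_cast<std::uint32_t>(num_cols);
}

// Offset of element (r, c) in row-major storage whose rows are `stride`
// elements apart, matching data_ + r * stride_ + c.
std::int64_t ElementOffset(std::int32_t r, std::int32_t c, std::int32_t stride) {
  return static_cast<std::int64_t>(r) * stride + c;
}
```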
Real operator() ( MatrixIndexT  r,
MatrixIndexT  c 
) const
inline

Definition at line 547 of file cu-matrix.h.

{
  KALDI_PARANOID_ASSERT(static_cast<UnsignedMatrixIndexT>(r) <
                        static_cast<UnsignedMatrixIndexT>(num_rows_) &&
                        static_cast<UnsignedMatrixIndexT>(c) <
                        static_cast<UnsignedMatrixIndexT>(num_cols_));
  return CuValue<Real>(data_ + r * stride_ + c);  // implicitly converted to Real.
}
void ParametricRelu ( const CuMatrixBase< Real > & src, const CuVectorBase< Real > & alpha, const CuVectorBase< Real > & beta )

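Computes the parametric rectified linear unit (PReLU) function element by element: *this = src * (src > 0 ? alpha : beta), where alpha and beta are vectors of length NumCols() indexed by column, so element (r, c) is scaled by alpha(c) or beta(c).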

Definition at line 1252 of file cu-matrix.cc.

References CU2DBLOCK, data_, CuVectorBase< Real >::data_, CuMatrixBase< Real >::data_, CuVectorBase< Real >::Dim(), Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::Stride(), and CuVectorBase< Real >::Vec().

Referenced by ParametricRelu::PropagateFnc().

{
  KALDI_ASSERT(src.NumRows() == this->NumRows());
  KALDI_ASSERT(src.NumCols() == this->NumCols());
  KALDI_ASSERT(alpha.Dim() == this->NumCols());
  KALDI_ASSERT(beta.Dim() == this->NumCols());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    dim3 dimBlock(CU2DBLOCK, CU2DBLOCK);
    dim3 dimGrid(n_blocks(src.NumCols(), CU2DBLOCK), n_blocks(src.NumRows(), CU2DBLOCK));

    cuda_parametric_relu(dimGrid, dimBlock, this->data_, src.data_, this->Dim(),
                         src.Stride(), alpha.data_, beta.data_);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    // Do it on CPU,
    for (MatrixIndexT r = 0; r < NumRows(); r++) {
      for (MatrixIndexT c = 0; c < NumCols(); c++) {
        Real src_elem = src.Mat()(r,c);
        this->Mat()(r,c) = src_elem * (src_elem >= 0.0 ? alpha.Vec()(c) : beta.Vec()(c));
      }
    }
  }
}
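The CPU branch gives the reference semantics: alpha scales non-negative inputs and beta scales negative ones, both indexed by column. (The code tests `>= 0` while the description says `> 0`; at src = 0 both yield 0, so they agree.) The sketch below restates that rule on plain row-major buffers with independent strides and also shows the launch-geometry arithmetic, where the grid's x dimension covers columns and y covers rows. It assumes that n_blocks() is ceiling division and that CU2DBLOCK is 16; neither value is stated on this page. `NumBlocks` and `ParametricReluCpu` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Ceiling division, assumed to match what n_blocks() computes:
// the number of block_size-wide tiles needed to cover `size` elements.
inline int NumBlocks(int size, int block_size) {
  return (size + block_size - 1) / block_size;
}

// Element-wise PReLU on row-major buffers with independent strides:
// dst(r,c) = src(r,c) * (src(r,c) >= 0 ? alpha[c] : beta[c]).
// Padding columns [num_cols, dst_stride) of dst are left untouched.
void ParametricReluCpu(const std::vector<float> &src, int src_stride,
                       const std::vector<float> &alpha,
                       const std::vector<float> &beta,
                       int num_rows, int num_cols,
                       std::vector<float> *dst, int dst_stride) {
  assert(static_cast<int>(alpha.size()) == num_cols);
  assert(static_cast<int>(beta.size()) == num_cols);
  assert(static_cast<int>(dst->size()) >= num_rows * dst_stride);
  for (int r = 0; r < num_rows; r++) {
    for (int c = 0; c < num_cols; c++) {
      float x = src[r * src_stride + c];
      (*dst)[r * dst_stride + c] = x * (x >= 0.0f ? alpha[c] : beta[c]);
    }
  }
}
```

For a 2 x 3 matrix and an assumed block size of 16, the grid would be NumBlocks(3, 16) by NumBlocks(2, 16), i.e. a single 16 x 16 block whose out-of-range threads do no work.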
CuSubVector<Real> Row ( MatrixIndexT  i)
inline

Definition at line 533 of file cu-matrix.h.

{
  KALDI_ASSERT(static_cast<UnsignedMatrixIndexT>(i) <
               static_cast<UnsignedMatrixIndexT>(num_rows_));
  return CuSubVector<Real>(data_ + (i * stride_), NumCols());
}
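Row(i) builds a CuSubVector that starts at data_ + i * stride_ and has length NumCols(), so it is a view into the matrix storage rather than a copy: writing through it changes the matrix. A host-only sketch of that idea, with `RowView` and `MakeRowView` as hypothetical stand-ins for CuSubVector and Row():

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-owning view of one matrix row (stand-in for CuSubVector).
struct RowView {
  float *data;
  int dim;
  float &operator[](int c) {
    assert(c >= 0 && c < dim);
    return data[c];
  }
};

// Same arithmetic as Row(i): start at data + i * stride, length num_cols.
inline RowView MakeRowView(std::vector<float> &storage, int num_rows,
                           int num_cols, int stride, int i) {
  assert(i >= 0 && i < num_rows);
  return RowView{storage.data() + i * stride, num_cols};
}
```

Because the view holds a raw pointer, it must not outlive the matrix storage it points into.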
const Real* RowData ( MatrixIndexT  r) const
inline

Get raw row pointer (const).

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 597 of file cu-matrix.h.

Referenced by CuVectorBase< Real >::CopyRowsFromMat(), VectorBase< Real >::CopyRowsFromMat(), CuSubVector< Real >::CuSubVector(), RectifiedLinearComponent::RepairGradients(), kaldi::TestCuMatrixAddRows2(), kaldi::TestCuMatrixAddToRows(), kaldi::TestCuMatrixCopyRows2(), kaldi::TestCuMatrixCopyToRows(), kaldi::UnitTestCuMatrixAddRows(), and kaldi::UnitTestCuMatrixCopyRows().

{ return data_ + r * stride_; }
Real* RowData ( MatrixIndexT  r)
inline

Get raw row pointer.

Warning: may return a pointer to GPU memory. Use at your own risk.

Definition at line 600 of file cu-matrix.h.

{ return data_ + r * stride_; }
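Both RowData() overloads return the start address of row r, data_ + r * stride_. The CopyRows(const CuArray<const Real*>&) overload is documented as reading from pointers of exactly this kind, e.g. obtained via RowData() of another matrix. The host-only sketch below shows that gather pattern; `RowPtr` and `GatherRows` are hypothetical helpers, and every pointer is assumed to be valid. On a GPU build such pointers would address device memory and must not be dereferenced on the host, which is what the warning above is about.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start address of row r, mirroring RowData(): data + r * stride.
inline const float *RowPtr(const std::vector<float> &m, int stride, int r) {
  return m.data() + r * stride;
}

// For each destination row r, copy num_cols values from src_rows[r].
// Assumes every pointer in src_rows is valid and addresses host memory.
void GatherRows(const std::vector<const float *> &src_rows, int num_cols,
                std::vector<float> *dst, int dst_stride) {
  assert(dst->size() >= src_rows.size() * static_cast<std::size_t>(dst_stride));
  for (std::size_t r = 0; r < src_rows.size(); r++)
    for (int c = 0; c < num_cols; c++)
      (*dst)[r * dst_stride + c] = src_rows[r][c];
}
```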
void Scale ( Real  value)

Definition at line 608 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by TanhComponent::Backprop(), PowerComponent::Backprop(), ScaleComponent::Backprop(), AveragePoolingComponent::BackpropagateFnc(), AveragePooling2DComponent::BackpropagateFnc(), MultiBasisComponent::BackpropagateFnc(), Dropout::BackpropagateFnc(), DecodableAmNnetParallel::Compute(), DecodableNnet2Online::ComputeForFrame(), CuMatrixBase< Real >::DiffLogSoftmaxPerRow(), DecodableNnet3SimpleOnline::DoNnetComputation(), DecodableNnetSimple::DoNnetComputation(), MultiTaskLoss::Eval(), RepeatedAffineComponent::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), BlockAffineComponent::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), LstmNonlinearityComponent::Init(), OnlineNaturalGradient::InitDefault(), OnlinePreconditioner::InitDefault(), main(), kaldi::nnet2::PreconditionDirectionsAlphaRescaled(), NnetChainTrainer::ProcessOutputs(), NnetDiscriminativeTrainer::ProcessOutputs(), ScaleComponent::Propagate(), DropoutComponent::Propagate(), KlHmm::PropagateFnc(), Dropout::PropagateFnc(), Rbm::RbmUpdate(), LstmNonlinearityComponent::Read(), ClipGradientComponent::RepairGradients(), RepeatedAffineComponent::Scale(), NaturalGradientAffineComponent::Scale(), AffineComponent::Scale(), BlockAffineComponent::Scale(), ConvolutionComponent::Scale(), LstmNonlinearityComponent::Scale(), Convolutional1dComponent::Scale(), TanhComponent::StoreStats(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestCuMatrixAddMatDiagVec(), kaldi::UnitTestCuMatrixAddMatMatElements(), kaldi::UnitTestCuMatrixScale(), kaldi::UnitTestLstmNonlinearity(), and Convolutional2DComponent::Update().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_scale(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Scale(value);
  }
}
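Scale multiplies every element by value. Rows may be padded out to stride_ (which is at least NumCols()), which is why the GPU branch passes Dim() (a MatrixDim holding the rows, columns and stride) rather than just a flat length: only the NumCols() valid entries of each row should be touched. A CPU sketch on a plain strided buffer, with `ScaleStrided` as a hypothetical name; like the GPU branch, it returns early for an empty matrix.

```cpp
#include <cassert>
#include <vector>

// Multiply the num_rows x num_cols logical matrix by value in place,
// leaving the padding columns [num_cols, stride) of each row untouched.
void ScaleStrided(std::vector<float> *data, int num_rows, int num_cols,
                  int stride, float value) {
  assert(stride >= num_cols);
  if (num_rows == 0) return;  // same early exit as the GPU branch
  for (int r = 0; r < num_rows; r++)
    for (int c = 0; c < num_cols; c++)
      (*data)[r * stride + c] *= value;
}
```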
void Set ( Real  value)

Definition at line 495 of file cu-matrix.cc.

References data_, and Timer::Elapsed().

Referenced by BackpropTruncationComponent::Backprop(), SigmoidComponent::Backprop(), MaxpoolingComponent::Propagate(), MaxPoolingComponent::PropagateFnc(), MaxPooling2DComponent::PropagateFnc(), SigmoidComponent::StoreStats(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixSet(), kaldi::nnet1::UnitTestDropoutComponent(), and kaldi::nnet1::UnitTestMaxPoolingComponent().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    if (num_rows_ == 0) return;
    Timer tim;

    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);

    cuda_set_const(dimGrid, dimBlock, data_, value, Dim());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().Set(value);
  }
}
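Set writes the same constant into every logical element. Since each row is a contiguous run of NumCols() values starting at data_ + r * stride_, a CPU version can fill one row segment at a time. A sketch using std::fill, where `SetStrided` is a hypothetical name and the buffer is assumed to use the same padded row-major layout:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Fill the logical num_rows x num_cols region with value, one contiguous
// row segment at a time; padding columns are not written.
void SetStrided(std::vector<float> *data, int num_rows, int num_cols,
                int stride, float value) {
  assert(stride >= num_cols);
  assert(static_cast<int>(data->size()) >= num_rows * stride);
  for (int r = 0; r < num_rows; r++) {
    auto row_begin = data->begin() + r * stride;
    std::fill(row_begin, row_begin + num_cols, value);
  }
}
```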
void SetMatMatDivMat ( const CuMatrixBase< Real > & A, const CuMatrixBase< Real > & B, const CuMatrixBase< Real > & C )

Sets *this = A * B / C element by element; wherever an element of C is 0, the corresponding element of *this is set to the element of A instead. *this may safely alias A, B or C and still produces the expected result.

Definition at line 992 of file cu-matrix.cc.

References data_, CuMatrixBase< Real >::data_, Timer::Elapsed(), KALDI_ASSERT, CuMatrixBase< Real >::Mat(), CuMatrixBase< Real >::num_cols_, CuMatrixBase< Real >::num_rows_, and CuMatrixBase< Real >::Stride().

Referenced by DropoutComponent::Backprop().

{
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    Timer tim;

    KALDI_ASSERT(num_rows_ == A.num_rows_ && num_cols_ == A.num_cols_);
    KALDI_ASSERT(num_rows_ == B.num_rows_ && num_cols_ == B.num_cols_);
    KALDI_ASSERT(num_rows_ == C.num_rows_ && num_cols_ == C.num_cols_);
    if (num_rows_ == 0) return;
    dim3 dimGrid, dimBlock;
    GetBlockSizesForSimpleMatrixOperation(NumRows(), NumCols(),
                                          &dimGrid, &dimBlock);
    cuda_set_mat_mat_div_mat(dimGrid, dimBlock, A.data_, B.data_, C.data_,
                             data_, Dim(), A.Stride(), B.Stride(), C.Stride());
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim.Elapsed());
  } else
#endif
  {
    Mat().SetMatMatDivMat(A.Mat(), B.Mat(), C.Mat());
  }
}
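The element-wise rule, including the fallback to A where C is zero, can be written down directly. The sketch below uses a hypothetical `SetMatMatDivMatFlat` on flat, same-shape buffers (strides omitted for brevity). It also shows why aliasing is safe: each output element depends only on the inputs at the same index, and all three inputs are read before that element is written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dst[i] = (c[i] == 0) ? a[i] : a[i] * b[i] / c[i], element by element.
// dst may be the same object as a, b or c: each index's inputs are read
// into locals before dst[i] is written, and no other index is touched.
void SetMatMatDivMatFlat(const std::vector<float> &a,
                         const std::vector<float> &b,
                         const std::vector<float> &c,
                         std::vector<float> *dst) {
  assert(a.size() == b.size() && a.size() == c.size() &&
         a.size() == dst->size());
  for (std::size_t i = 0; i < a.size(); i++) {
    float ai = a[i], bi = b[i], ci = c[i];
    (*dst)[i] = (ci == 0.0f) ? ai : ai * bi / ci;
  }
}
```

For example, with A = {6, 3, 5}, B = {2, 4, 7} and C = {3, 0, 5} the result is {4, 3, 7}: the middle element falls back to A because C is 0 there.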
void SetRandn ( )

Definition at line 2741 of file cu-matrix.cc.

References CuRand< Real >::RandGaussian().

Referenced by kaldi::CuVectorUnitTestAddColSumMat(), kaldi::CuVectorUnitTestAddDiagMat2(), kaldi::CuVectorUnitTestAddDiagMatMat(), kaldi::CuVectorUnitTestAddMatVec(), kaldi::CuVectorUnitTestAddRowSumMat(), kaldi::CuVectorUnitTestCopyFromMat(), RepeatedAffineComponent::Init(), NaturalGradientAffineComponent::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), BlockAffineComponent::Init(), ConvolutionComponent::Init(), Convolutional1dComponent::Init(), LstmNonlinearityComponent::Init(), FixedAffineComponent::InitFromConfig(), RepeatedAffineComponent::PerturbParams(), AffineComponent::PerturbParams(), BlockAffineComponent::PerturbParams(), ConvolutionComponent::PerturbParams(), LstmNonlinearityComponent::PerturbParams(), Convolutional1dComponent::PerturbParams(), kaldi::TestCuMatrixAddMat(), kaldi::TestCuMatrixAddMatBlocks(), kaldi::TestCuMatrixAddRowRanges(), kaldi::TestCuMatrixAddRows1(), kaldi::TestCuMatrixAddRows2(), kaldi::TestCuMatrixAddToRows(), kaldi::TestCuMatrixCompObjfAndDeriv(), kaldi::TestCuMatrixCopyLowerToUpper(), kaldi::TestCuMatrixCopyRows1(), kaldi::TestCuMatrixCopyRows2(), kaldi::TestCuMatrixCopyToRows(), kaldi::TestCuMatrixCopyUpperToLower(), kaldi::TestCuMatrixDiffGroupPnorm(), kaldi::TestCuMatrixDiffLogSoftmax(), kaldi::TestCuMatrixDiffSoftmax(), kaldi::TestCuMatrixDivRowsVec(), kaldi::TestCuMatrixGroupMaxAllGroupSizes(), kaldi::TestCuMatrixHeaviside(), kaldi::TestCuMatrixLogSoftmax(), kaldi::TestCuMatrixLookup(), kaldi::TestCuMatrixMatMatBatched(), kaldi::TestCuMatrixMax(), kaldi::TestCuMatrixMin(), kaldi::TestCuMatrixMulRowsGroupMat(), kaldi::TestCuMatrixSetZeroAboveDiag(), kaldi::TestCuMatrixSigmoid(), kaldi::TestCuMatrixSoftmax(), kaldi::TestCuMatrixSum(), kaldi::TestCuMatrixTraceMatMat(), kaldi::TestCuMatrixTransposeNS(), kaldi::TestCuMatrixTransposeS(), kaldi::TestCuVectorAddColSumMat(), kaldi::TestCuVectorAddDiagMat2(), kaldi::TestCuVectorAddDiagMatMat(), 
kaldi::TestCuVectorAddRowSumMat(), kaldi::nnet3::TestSimpleComponentDataDerivative(), kaldi::nnet3::TestSimpleComponentModelDerivative(), kaldi::nnet3::TestSimpleComponentPropagateProperties(), kaldi::UnitTestBackpropLstmNonlinearity(), kaldi::UnitTestCuApproxEqual(), kaldi::UnitTestCuBlockMatrixAddMatBlock(), kaldi::UnitTestCuBlockMatrixAddMatMat(), kaldi::UnitTestCuCholesky(), kaldi::UnitTestCuMathComputeLstmNonlinearity(), kaldi::UnitTestCuMathCopy(), kaldi::UnitTestCuMathNormalizePerRow(), kaldi::UnitTestCuMathRandomize(), kaldi::UnitTestCuMathSplice(), kaldi::UnitTestCuMatrixAddCols(), kaldi::UnitTestCuMatrixAddDiagVecMat(), kaldi::UnitTestCuMatrixAddElements(), kaldi::UnitTestCuMatrixAddRows(), kaldi::UnitTestCuMatrixAddToRows(), kaldi::UnitTestCuMatrixCopyCols(), kaldi::UnitTestCuMatrixCopyCross(), kaldi::UnitTestCuMatrixCopyCross2(), kaldi::UnitTestCuMatrixCopyLowerToUpper(), kaldi::UnitTestCuMatrixCopyRows(), kaldi::UnitTestCuMatrixCopyToRows(), kaldi::UnitTestCuMatrixCopyUpperToLower(), kaldi::UnitTestCuMatrixIO(), kaldi::UnitTestCuMatrixLookup(), kaldi::UnitTestCuMatrixObjfDeriv(), kaldi::UnitTestCuMatrixReduceMax(), kaldi::UnitTestCuMatrixReduceMin(), kaldi::UnitTestCuMatrixReduceSum(), kaldi::UnitTestCuMatrixSetMatMatDivMat(), kaldi::UnitTestCuMatrixSetRandn(), kaldi::UnitTestCuMatrixSetZeroAboveDiag(), kaldi::UnitTestCuMatrixSymAddMat2(), kaldi::UnitTestCuMatrixSymInvertPosDef(), kaldi::UnitTestCuMatrixTraceMatMat(), kaldi::UnitTestCuMatrixTranspose(), kaldi::UnitTestCuSparseMatrixFrobeniusNorm(), kaldi::UnitTestCuSparseMatrixSum(), kaldi::UnitTestCuSparseMatrixTraceMatSmat(), kaldi::UnitTestCuSpMatrixCopyFromMat(), kaldi::UnitTestCuSubMatrix(), kaldi::UnitTestCuTpMatrixCopyFromMat(), kaldi::nnet2::UnitTestFixedAffineComponent(), kaldi::nnet2::UnitTestFixedLinearComponent(), kaldi::nnet2::UnitTestGenericComponentInternal(), kaldi::UnitTestMatrix(), kaldi::nnet2::UnitTestNnetCompute(), kaldi::nnet3::UnitTestNnetCompute(), 
kaldi::nnet2::UnitTestNnetComputeChunked(), kaldi::nnet3::UnitTestNnetInputDerivatives(), kaldi::nnet3::UnitTestNnetModelDerivatives(), kaldi::nnet3::UnitTestNnetOptimizeWithOptions(), and kaldi::nnet2::UnitTestPreconditionDirections().

{
  if (num_rows_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuRand<Real> tmp;
    tmp.RandGaussian(this);
  } else
#endif
  {
    Mat().SetRandn();
  }
}
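SetRandn fills the matrix with standard-normal samples, via CuRand<Real>::RandGaussian() on the GPU and MatrixBase::SetRandn() on the CPU. The sketch below substitutes the C++ standard library's std::mt19937 and std::normal_distribution for Kaldi's generators, so it reproduces the distribution but not Kaldi's actual random stream; `SetRandnStrided` is a hypothetical name.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fill the logical num_rows x num_cols region with N(0, 1) samples.
// std::mt19937 stands in for Kaldi's own random-number generators;
// padding columns [num_cols, stride) are left untouched.
void SetRandnStrided(std::vector<float> *data, int num_rows, int num_cols,
                     int stride, std::mt19937 *rng) {
  assert(stride >= num_cols);
  if (num_rows == 0) return;  // mirrors the early exit in the listing
  std::normal_distribution<float> gauss(0.0f, 1.0f);
  for (int r = 0; r < num_rows; r++)
    for (int c = 0; c < num_cols; c++)
      (*data)[r * stride + c] = gauss(*rng);
}
```

Passing the generator in explicitly keeps the sketch reproducible: the same seed gives the same matrix.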
void SetRandUniform ( )

Definition at line 2755 of file cu-matrix.cc.

References CuRand< Real >::RandUniform().

Referenced by kaldi::UnitTestCuMatrixEqualElementMask(), and kaldi::UnitTestCuMatrixSetRandUniform().

{
  if (num_rows_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuRand<Real> tmp;
    tmp.RandUniform(this);
  } else
#endif
  {
    Mat().SetRandUniform();
  }
}
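SetRandUniform is the uniform counterpart, using CuRand<Real>::RandUniform() on the GPU and MatrixBase::SetRandUniform() on the CPU. This page does not state whether the interval is open or half-open at either end, so the sketch below simply draws from the unit interval with std::uniform_real_distribution, again using std::mt19937 as an assumed substitute generator and a hypothetical `SetRandUniformStrided` helper.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fill the logical num_rows x num_cols region with uniform samples from
// (nominally) [0, 1), using std::mt19937 as a stand-in generator.
void SetRandUniformStrided(std::vector<float> *data, int num_rows,
                           int num_cols, int stride, std::mt19937 *rng) {
  assert(stride >= num_cols);
  if (num_rows == 0) return;
  std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
  for (int r = 0; r < num_rows; r++)
    for (int c = 0; c < num_cols; c++)
      (*data)[r * stride + c] = uniform(*rng);
}
```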