CuSparseMatrix< Real > Class Template Reference

#include <matrix-common.h>

Public Member Functions

MatrixIndexT NumRows () const
 
MatrixIndexT NumCols () const
 
MatrixIndexT NumElements () const
 
template<typename OtherReal >
void CopyToMat (CuMatrixBase< OtherReal > *dest, MatrixTransposeType trans=kNoTrans) const
 
Real Sum () const
 
Real FrobeniusNorm () const
 
CuSparseMatrix< Real > & operator= (const SparseMatrix< Real > &smat)
 Copy from CPU-based matrix. More...
 
CuSparseMatrix< Real > & operator= (const CuSparseMatrix< Real > &smat)
 Copy from possibly-GPU-based matrix. More...
 
template<typename OtherReal >
void CopyFromSmat (const SparseMatrix< OtherReal > &smat)
 Copy from CPU-based matrix. More...
 
void CopyFromSmat (const CuSparseMatrix< Real > &smat, MatrixTransposeType trans=kNoTrans)
 Copy from GPU-based matrix, supporting transposition. More...
 
void SelectRows (const CuArray< int32 > &row_indexes, const CuSparseMatrix< Real > &smat_other)
 Select a subset of the rows of a CuSparseMatrix. More...
 
template<typename OtherReal >
void CopyToSmat (SparseMatrix< OtherReal > *smat) const
 Copy to CPU-based matrix. More...
 
void CopyElementsToVec (CuVectorBase< Real > *vec) const
 Copy elements to CuVector. More...
 
void Swap (SparseMatrix< Real > *smat)
 Swap with CPU-based matrix. More...
 
void Swap (CuSparseMatrix< Real > *smat)
 Swap with possibly-CPU-based matrix. More...
 
void SetRandn (BaseFloat zero_prob)
 Sets *this to a pseudo-randomly initialized matrix in which each element is zero with probability zero_prob and otherwise normally distributed; intended mostly for testing. More...
 
void Write (std::ostream &os, bool binary) const
 
void Read (std::istream &is, bool binary)
 
 CuSparseMatrix ()
 Default constructor. More...
 
 CuSparseMatrix (const SparseMatrix< Real > &smat)
 Constructor from CPU-based sparse matrix. More...
 
 CuSparseMatrix (const CuSparseMatrix< Real > &smat, MatrixTransposeType trans=kNoTrans)
 Constructor from GPU-based sparse matrix (supports transposition). More...
 
 CuSparseMatrix (const CuArray< int32 > &indexes, int32 dim, MatrixTransposeType trans=kNoTrans)
 Constructor from an array of indexes. More...
 
 CuSparseMatrix (const CuArray< int32 > &indexes, const CuVectorBase< Real > &weights, int32 dim, MatrixTransposeType trans=kNoTrans)
 Constructor from an array of indexes and an array of weights; requires indexes.Dim() == weights.Dim(). More...
 
 ~CuSparseMatrix ()
 

Protected Member Functions

const SparseMatrix< Real > & Smat () const
 
SparseMatrix< Real > & Smat ()
 
void Resize (const MatrixIndexT num_rows, const MatrixIndexT num_cols, const MatrixIndexT nnz, MatrixResizeType resize_type=kSetZero)
 Users of this class won't normally have to use Resize. More...
 
const Real * CsrVal () const
 Returns pointer to the data array of length nnz_ that holds all nonzero values in zero-based CSR format. More...
 
Real * CsrVal ()
 
const int * CsrRowPtr () const
 Returns pointer to the integer array of length NumRows()+1 that holds indices of the first nonzero element in the i-th row, while the last entry contains nnz_, as zero-based CSR format is used. More...
 
int * CsrRowPtr ()
 
const int * CsrColIdx () const
 Returns pointer to the integer array of length nnz_ that contains the column indices of the corresponding elements in the array returned by CsrVal(). More...
 
int * CsrColIdx ()
 

Private Member Functions

void Destroy ()
 

Private Attributes

std::vector< SparseVector< Real > > cpu_rows_
 
MatrixIndexT num_rows_
 
MatrixIndexT num_cols_
 
MatrixIndexT nnz_
 
int * csr_row_ptr_col_idx_
 
Real * csr_val_
 

Friends

class CuMatrixBase< float >
 
class CuMatrixBase< double >
 
class CuMatrixBase< Real >
 
class CuVectorBase< float >
 
class CuVectorBase< double >
 
class CuVectorBase< Real >
 
Real TraceMatSmat (const CuMatrixBase< Real > &A, const CuSparseMatrix< Real > &B, MatrixTransposeType trans)
 

Detailed Description

template<class Real>
class kaldi::CuSparseMatrix< Real >

Definition at line 78 of file matrix-common.h.

Constructor & Destructor Documentation

◆ CuSparseMatrix() [1/5]

CuSparseMatrix ( )
inline

Default constructor.

Definition at line 123 of file cu-sparse-matrix.h.

Referenced by CuSparseMatrix< Real >::CuSparseMatrix().

123  :
125  NULL) {
126  }

◆ CuSparseMatrix() [2/5]

CuSparseMatrix ( const SparseMatrix< Real > &  smat)
inline explicit

Constructor from CPU-based sparse matrix.

Definition at line 129 of file cu-sparse-matrix.h.

References CuSparseMatrix< Real >::CopyFromSmat().

129  :
131  NULL) {
132  this->CopyFromSmat(smat);
133  }

◆ CuSparseMatrix() [3/5]

CuSparseMatrix ( const CuSparseMatrix< Real > &  smat,
MatrixTransposeType  trans = kNoTrans 
)
inline

Constructor from GPU-based sparse matrix (supports transposition).

Definition at line 136 of file cu-sparse-matrix.h.

References CuSparseMatrix< Real >::CopyFromSmat(), CuSparseMatrix< Real >::CuSparseMatrix(), and kaldi::kNoTrans.

137  :
139  NULL) {
140  this->CopyFromSmat(smat, trans);
141  }

◆ CuSparseMatrix() [4/5]

CuSparseMatrix ( const CuArray< int32 > &  indexes,
int32  dim,
MatrixTransposeType  trans = kNoTrans 
)

Constructor from an array of indexes.

If trans == kNoTrans, constructs a sparse matrix with num-rows == indexes.Dim() and num-cols == 'dim'. 'indexes' is expected to contain elements in the range [0, dim - 1]. After the constructor returns, each row 'i' of *this contains a single element at column-index indexes[i] with value 1.0.

If trans == kTrans, the result will be the transpose of the sparse matrix described above.
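
The kNoTrans layout described above can be sketched on the host as follows. This is a minimal illustration under stated assumptions, not Kaldi code: `OneHotCsrSketch` and `OneHotCsr` are hypothetical names, and plain `std::vector` buffers stand in for the device arrays behind CsrRowPtr(), CsrColIdx() and CsrVal().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for the three CSR arrays; not a Kaldi type.
struct OneHotCsrSketch {
  int num_rows = 0;
  int num_cols = 0;
  std::vector<int> row_ptr;  // length num_rows + 1
  std::vector<int> col_idx;  // length nnz
  std::vector<float> val;    // length nnz
};

// Builds the kNoTrans layout: row i holds a single 1.0 at column indexes[i].
inline OneHotCsrSketch OneHotCsr(const std::vector<int32_t> &indexes,
                                 int32_t dim) {
  OneHotCsrSketch m;
  m.num_rows = static_cast<int>(indexes.size());
  m.num_cols = dim;
  // One nonzero per row, so the row pointer is the sequence 0, 1, ..., n.
  m.row_ptr.resize(indexes.size() + 1);
  for (std::size_t i = 0; i <= indexes.size(); ++i)
    m.row_ptr[i] = static_cast<int>(i);
  // The column index of the single element in row i is indexes[i].
  for (int32_t idx : indexes) {
    assert(idx >= 0 && idx < dim);
    m.col_idx.push_back(idx);
  }
  // Every stored value is 1.0.
  m.val.assign(indexes.size(), 1.0f);
  return m;
}
```

For example, `OneHotCsr({2, 0, 1}, 3)` describes a 3 x 3 permutation-like matrix whose row pointer is 0, 1, 2, 3.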

Definition at line 162 of file cu-sparse-matrix.cc.

References CuArrayBase< T >::CopyFromArray(), CuArrayBase< T >::CopyToVec(), CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CuArrayBase< T >::Dim(), kaldi::kTrans, kaldi::kUndefined, CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuSparseMatrix< Real >::Resize(), CuArrayBase< T >::Sequence(), CuVectorBase< Real >::Set(), CuSparseMatrix< Real >::Smat(), and CuSparseMatrix< Real >::Swap().

163  :
165  NULL) {
166 #if HAVE_CUDA == 1
167  if (CuDevice::Instantiate().Enabled()) {
168  Resize(indexes.Dim(), dim, indexes.Dim(), kUndefined);
169  if (NumElements() == 0) {
170  return;
171  }
172  CuSubArray<int> row_ptr(CsrRowPtr(), NumRows() + 1);
173  row_ptr.Sequence(0);
174  CuSubArray<int> col_idx(CsrColIdx(), NumElements());
175  col_idx.CopyFromArray(indexes);
176  CuSubVector<Real> val(CsrVal(), NumElements());
177  val.Set(1);
178 
179  if (trans == kTrans) {
180  CuSparseMatrix<Real> tmp(*this, kTrans);
181  this->Swap(&tmp);
182  }
183  } else
184 #endif
185  {
186  std::vector<int32> idx(indexes.Dim());
187  indexes.CopyToVec(&idx);
188  SparseMatrix<Real> tmp(idx, dim, trans);
189  Smat().Swap(&tmp);
190  }
191 }

◆ CuSparseMatrix() [5/5]

CuSparseMatrix ( const CuArray< int32 > &  indexes,
const CuVectorBase< Real > &  weights,
int32  dim,
MatrixTransposeType  trans = kNoTrans 
)

Constructor from an array of indexes and an array of weights; requires indexes.Dim() == weights.Dim().

If trans == kNoTrans, constructs a sparse matrix with num-rows == indexes.Dim() and num-cols == 'dim'. 'indexes' is expected to contain elements in the range [0, dim - 1]. After the constructor returns, each row 'i' of *this contains a single element at column-index indexes[i] with value weights[i]. If trans == kTrans, the result will be the transpose of the sparse matrix described above.

Definition at line 194 of file cu-sparse-matrix.cc.

References CuArrayBase< T >::CopyFromArray(), CuVectorBase< Real >::CopyFromVec(), CuArrayBase< T >::CopyToVec(), CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CuArrayBase< T >::Dim(), kaldi::kTrans, kaldi::kUndefined, CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuSparseMatrix< Real >::Resize(), CuArrayBase< T >::Sequence(), CuSparseMatrix< Real >::Smat(), CuSparseMatrix< Real >::Swap(), and CuVectorBase< Real >::Vec().

196  :
198  NULL) {
199 #if HAVE_CUDA == 1
200  if (CuDevice::Instantiate().Enabled()) {
201  Resize(indexes.Dim(), dim, indexes.Dim(), kUndefined);
202  if (NumElements() == 0) {
203  return;
204  }
205  CuSubArray<int> row_ptr(CsrRowPtr(), NumRows() + 1);
206  row_ptr.Sequence(0);
207  CuSubArray<int> col_idx(CsrColIdx(), NumElements());
208  col_idx.CopyFromArray(indexes);
209  CuSubVector<Real> val(CsrVal(), NumElements());
210  val.CopyFromVec(weights);
211 
212  if (trans == kTrans) {
213  CuSparseMatrix<Real> tmp(*this, kTrans);
214  this->Swap(&tmp);
215  }
216  } else
217 #endif
218  {
219  std::vector<int32> idx(indexes.Dim());
220  indexes.CopyToVec(&idx);
221  SparseMatrix<Real> tmp(idx, weights.Vec(), dim, trans);
222  Smat().Swap(&tmp);
223  }
224 }

◆ ~CuSparseMatrix()

~CuSparseMatrix ( )
inline

Definition at line 170 of file cu-sparse-matrix.h.

References CuSparseMatrix< Real >::Destroy().

170  {
171  Destroy();
172  }

Member Function Documentation

◆ CopyElementsToVec()

void CopyElementsToVec ( CuVectorBase< Real > *  vec) const

Copy elements to CuVector.

It is the caller's responsibility to resize *vec beforehand so that vec->Dim() == NumElements().

Definition at line 452 of file cu-sparse-matrix.cc.

References CuVectorBase< Real >::CopyFromVec(), CuSparseMatrix< Real >::CsrVal(), CuVectorBase< Real >::Dim(), KALDI_ASSERT, CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::Smat(), and CuVectorBase< Real >::Vec().

452  {
453  KALDI_ASSERT(vec != NULL);
454  KALDI_ASSERT(this->NumElements() == vec->Dim());
455 #if HAVE_CUDA == 1
456  if (CuDevice::Instantiate().Enabled()) {
457  CuSubVector<Real> val(CsrVal(), NumElements());
458  vec->CopyFromVec(val);
459  } else
460 #endif
461  {
462  Smat().CopyElementsToVec(&(vec->Vec()));
463  }
464 }

◆ CopyFromSmat() [1/2]

void CopyFromSmat ( const SparseMatrix< OtherReal > &  smat)

Copy from CPU-based matrix.

Transposition is not supported yet; the option will be added when it is needed. Resizes *this as needed.
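
The conversion performed on the GPU path (per-row lists of (column, value) pairs flattened into zero-based CSR arrays) can be sketched as below. This is a host-only illustration: `FlatCsr`, `SparseRow` and `RowsToCsr` are hypothetical names, and the final device upload is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container for the flattened arrays; not a Kaldi type.
struct FlatCsr {
  std::vector<int> row_ptr;  // length num_rows + 1
  std::vector<int> col_idx;  // length nnz
  std::vector<double> val;   // length nnz
};

// One sparse row: (column index, value) pairs.
using SparseRow = std::vector<std::pair<int, double>>;

inline FlatCsr RowsToCsr(const std::vector<SparseRow> &rows) {
  FlatCsr out;
  out.row_ptr.resize(rows.size() + 1);
  int n = 0;  // running count of nonzeros written so far
  for (std::size_t i = 0; i < rows.size(); ++i) {
    out.row_ptr[i] = n;  // index of the first nonzero of row i
    for (const auto &entry : rows[i]) {
      out.col_idx.push_back(entry.first);
      out.val.push_back(entry.second);
      ++n;
    }
  }
  out.row_ptr[rows.size()] = n;  // the last entry holds nnz
  assert(static_cast<std::size_t>(n) == out.val.size());
  return out;
}
```

An empty row simply repeats the previous row-pointer value, which is why no special case is needed for it.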

Definition at line 326 of file cu-sparse-matrix.cc.

References CuArrayBase< T >::CopyFromVec(), CuVectorBase< Real >::CopyFromVec(), CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), SparseMatrix< Real >::Data(), rnnlm::i, rnnlm::j, KALDI_ASSERT, kaldi::kUndefined, rnnlm::n, SparseMatrix< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), SparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), SparseMatrix< Real >::NumRows(), CuSparseMatrix< Real >::Resize(), and CuSparseMatrix< Real >::Smat().

Referenced by CuSparseMatrix< Real >::CuSparseMatrix(), and CuSparseMatrix< Real >::operator=().

326  {
327 #if HAVE_CUDA == 1
328  if (CuDevice::Instantiate().Enabled()) {
329  Resize(smat.NumRows(), smat.NumCols(), smat.NumElements(), kUndefined);
330  if (NumElements() == 0) {
331  return;
332  }
333  std::vector<int> row_ptr(NumRows() + 1);
334  std::vector<int> col_idx(NumElements());
335  Vector<Real> val(NumElements(), kUndefined);
336 
337  int n = 0;
338  for (int32 i = 0; i < smat.NumRows(); ++i) {
339  row_ptr[i] = n;
340  for (int32 j = 0; j < (smat.Data() + i)->NumElements(); ++j, ++n) {
341  col_idx[n] = ((smat.Data() + i)->Data() + j)->first;
342  val(n) = static_cast<Real>(((smat.Data() + i)->Data() + j)->second);
343  }
344  }
345  row_ptr[NumRows()] = n;
346  KALDI_ASSERT(n == NumElements());
347 
348  CuSubArray<int> cu_row_ptr(CsrRowPtr(), NumRows() + 1);
349  cu_row_ptr.CopyFromVec(row_ptr);
350  CuSubArray<int> cu_col_idx(CsrColIdx(), NumElements());
351  cu_col_idx.CopyFromVec(col_idx);
352  CuSubVector<Real> cu_val(CsrVal(), NumElements());
353  cu_val.CopyFromVec(val);
354  } else
355 #endif
356  {
357  this->Smat().CopyFromSmat(smat);
358  }
359 }

◆ CopyFromSmat() [2/2]

void CopyFromSmat ( const CuSparseMatrix< Real > &  smat,
MatrixTransposeType  trans = kNoTrans 
)

Copy from GPU-based matrix, supporting transposition.

Resizes *this as needed.
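
With trans == kTrans, the GPU path delegates to a cuSPARSE CSR-to-CSC conversion. The effect of such a conversion can be sketched on the host with a counting pass followed by a scatter. This is an illustrative stand-in, not the cuSPARSE routine; `HostCsr` and `TransposeCsr` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side CSR container; not a Kaldi type.
struct HostCsr {
  int num_rows;
  int num_cols;
  std::vector<int> row_ptr;  // length num_rows + 1
  std::vector<int> col_idx;  // length nnz
  std::vector<float> val;    // length nnz
};

// Returns the transpose of 'a' in CSR form.
inline HostCsr TransposeCsr(const HostCsr &a) {
  assert(a.row_ptr.size() == static_cast<std::size_t>(a.num_rows) + 1);
  HostCsr t;
  t.num_rows = a.num_cols;
  t.num_cols = a.num_rows;
  t.row_ptr.assign(a.num_cols + 1, 0);
  t.col_idx.resize(a.val.size());
  t.val.resize(a.val.size());
  // Count the nonzeros in each column of 'a' (each row of the transpose).
  for (int c : a.col_idx) ++t.row_ptr[c + 1];
  // A prefix sum turns the counts into row start offsets.
  for (int c = 0; c < a.num_cols; ++c) t.row_ptr[c + 1] += t.row_ptr[c];
  // Scatter each element to the next free slot of its destination row.
  std::vector<int> next(t.row_ptr.begin(), t.row_ptr.end() - 1);
  for (int r = 0; r < a.num_rows; ++r) {
    for (int n = a.row_ptr[r]; n < a.row_ptr[r + 1]; ++n) {
      const int dst = next[a.col_idx[n]]++;
      t.col_idx[dst] = r;
      t.val[dst] = a.val[n];
    }
  }
  return t;
}
```

Because source rows are visited in increasing order, the column indices within each row of the result come out sorted.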

Definition at line 370 of file cu-sparse-matrix.cc.

References CuArrayBase< T >::CopyFromArray(), CuVectorBase< Real >::CopyFromVec(), CuSparseMatrix< Real >::csr_row_ptr_col_idx_, CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), kaldi::kNoTrans, kaldi::kUndefined, CuSparseMatrix< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuSparseMatrix< Real >::Resize(), and CuSparseMatrix< Real >::Smat().

371  {
372 #if HAVE_CUDA == 1
373  if (CuDevice::Instantiate().Enabled()) {
374  if (trans == kNoTrans) {
375  Resize(smat.NumRows(), smat.NumCols(), smat.NumElements(), kUndefined);
376 
377  CuSubVector<Real> val_to(CsrVal(), NumElements());
378  CuSubVector<Real> val_from(smat.CsrVal(), smat.NumElements());
379  val_to.CopyFromVec(val_from);
380 
381  CuSubArray<int> idx_to(csr_row_ptr_col_idx_,
382  NumRows() + 1 + NumElements());
383  CuSubArray<int> idx_from(smat.csr_row_ptr_col_idx_,
384  smat.NumRows() + 1 + smat.NumElements());
385  idx_to.CopyFromArray(idx_from);
386 
387  } else {
388  Resize(smat.NumCols(), smat.NumRows(), smat.NumElements(), kUndefined);
389  CuTimer tim;
390 
391  CUSPARSE_SAFE_CALL(
392  cusparse_csr2csc(GetCusparseHandle(), smat.NumRows(), smat.NumCols(),
393  smat.NumElements(), smat.CsrVal(), smat.CsrRowPtr(),
394  smat.CsrColIdx(), CsrVal(), CsrColIdx(), CsrRowPtr(),
395  CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO));
396 
397  CuDevice::Instantiate().AccuProfile(__func__, tim);
398  }
399  } else
400 #endif
401  {
402  Smat().CopyFromSmat(smat.Smat(), trans);
403  }
404 }

◆ CopyToMat()

template<typename OtherReal > void CopyToMat ( CuMatrixBase< OtherReal > *  dest,
MatrixTransposeType  trans = kNoTrans 
) const

Definition at line 622 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CU1DBLOCK, CuMatrixBase< Real >::Data(), CuMatrixBase< Real >::Dim(), KALDI_ASSERT, kaldi::kNoTrans, CuMatrixBase< Real >::Mat(), CuSparseMatrix< Real >::NumCols(), CuMatrixBase< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuMatrixBase< Real >::NumRows(), CuMatrixBase< Real >::SetZero(), and CuSparseMatrix< Real >::Smat().

Referenced by kaldi::nnet3::ComputeObjectiveFunction(), CuMatrixBase< float >::CopyFromGeneralMat(), GeneralMatrix::CopyToMat(), kaldi::UnitTestCuSparseMatrixConstructFromIndexes(), kaldi::UnitTestCuSparseMatrixCopyToSmat(), kaldi::UnitTestCuSparseMatrixFrobeniusNorm(), kaldi::UnitTestCuSparseMatrixSelectRowsAndTranspose(), kaldi::UnitTestCuSparseMatrixSum(), kaldi::UnitTestCuSparseMatrixSwap(), and kaldi::UnitTestCuSparseMatrixTraceMatSmat().

623  {
624  if (trans == kNoTrans) {
625  KALDI_ASSERT(M->NumRows() == NumRows() && M->NumCols() == NumCols());
626  } else {
627  KALDI_ASSERT(M->NumRows() == NumCols() && M->NumCols() == NumRows());
628  }
629  M->SetZero();
630  if (NumElements() == 0) {
631  return;
632  }
633 
634 #if HAVE_CUDA == 1
635  if (CuDevice::Instantiate().Enabled()) {
636  CuTimer tim;
637 
638  // We use warpSize threads per row to access only the nnz elements.
639  // Every CU1DBLOCK/warpSize rows share one thread block.
640  // 1D grid to cover all rows.
641  const int warpSize = 32;
642  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
643  dim3 dimGrid(n_blocks(NumRows(), dimBlock.y));
644 
645  if (trans == kNoTrans) {
646  cuda_copy_from_smat(dimGrid, dimBlock, M->Data(), M->Dim(), CsrRowPtr(),
647  CsrColIdx(), CsrVal());
648  } else {
649  cuda_copy_from_smat_trans(dimGrid, dimBlock, M->Data(), M->Dim(),
650  CsrRowPtr(), CsrColIdx(), CsrVal());
651  }
652  CU_SAFE_CALL(cudaGetLastError());
653  CuDevice::Instantiate().AccuProfile(__func__, tim);
654  } else
655 #endif
656  {
657  Smat().CopyToMat(&(M->Mat()), trans);
658  }
659 }

◆ CopyToSmat()

template<typename OtherReal > void CopyToSmat ( SparseMatrix< OtherReal > *  smat) const

Copy to CPU-based matrix.

Transposition is not supported yet; the option will be added when it is needed.
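
The GPU path reads a single index buffer in which the row pointers (NumRows() + 1 entries) are followed by the column indices (nnz entries), then rebuilds per-row (column, value) pairs. A host-only sketch of that unpacking, with the hypothetical names `RowPairs` and `UnpackCsr` and plain vectors standing in for the copied-back device data:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One sparse row: (column index, value) pairs.
using RowPairs = std::vector<std::pair<int, float>>;

// 'idx' holds num_rows + 1 row pointers followed by nnz column indices.
inline std::vector<RowPairs> UnpackCsr(int num_rows,
                                       const std::vector<int> &idx,
                                       const std::vector<float> &val) {
  const std::size_t nnz = val.size();
  assert(idx.size() == static_cast<std::size_t>(num_rows) + 1 + nnz);
  std::vector<RowPairs> rows(num_rows);
  int n = 0;  // position in the value and column-index arrays
  for (int i = 0; i < num_rows; ++i) {
    // Row i owns elements [idx[i], idx[i + 1]).
    for (; n < idx[i + 1]; ++n) {
      const int col = idx[num_rows + 1 + n];
      rows[i].push_back({col, val[n]});
    }
  }
  assert(static_cast<std::size_t>(n) == nnz);
  return rows;
}
```

Keeping both index arrays in one buffer means a single device-to-host copy retrieves all structural information.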

Definition at line 408 of file cu-sparse-matrix.cc.

References SparseMatrix< Real >::CopyFromSmat(), CuArrayBase< T >::CopyToVec(), CuVectorBase< Real >::CopyToVec(), CuSparseMatrix< Real >::csr_row_ptr_col_idx_, CuSparseMatrix< Real >::CsrVal(), rnnlm::i, rnnlm::j, KALDI_ASSERT, kaldi::kUndefined, rnnlm::n, CuSparseMatrix< Real >::num_cols_, CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), SparseMatrix< Real >::Resize(), CuSparseMatrix< Real >::Smat(), and SparseMatrix< Real >::Swap().

Referenced by CuSparseMatrix< Real >::Swap(), kaldi::UnitTestCuSparseMatrixCopyToSmat(), and CuSparseMatrix< Real >::Write().

408  {
409  KALDI_ASSERT(smat != NULL);
410 #if HAVE_CUDA == 1
411  if (CuDevice::Instantiate().Enabled()) {
412  if (NumRows() == 0) {
413  smat->Resize(0, 0);
414  return;
415  }
416  CuSubArray<int> idx(csr_row_ptr_col_idx_, NumRows() + 1 + NumElements());
417  std::vector<int> idx_cpu;
418  idx.CopyToVec(&idx_cpu);
419 
420  CuSubVector<Real> val(CsrVal(), NumElements());
421  Vector<OtherReal> val_cpu(NumElements(), kUndefined);
422  val.CopyToVec(&val_cpu);
423 
424  std::vector<std::vector<std::pair<MatrixIndexT, OtherReal> > > pairs(
425  NumRows());
426  int n = 0;
427  for (int i = 0; i < NumRows(); ++i) {
428  for (; n < idx_cpu[i + 1]; ++n) {
429  const MatrixIndexT j = idx_cpu[NumRows() + 1 + n];
430  pairs[i].push_back( { j, val_cpu(n) });
431  }
432  }
433  KALDI_ASSERT(n == NumElements());
434  SparseMatrix<OtherReal> tmp(num_cols_, pairs);
435  smat->Swap(&tmp);
436  } else
437 #endif
438  {
439  smat->CopyFromSmat(this->Smat());
440  }
441 }

◆ CsrColIdx() [1/2]

const int* CsrColIdx ( ) const
inline protected

◆ CsrColIdx() [2/2]

int* CsrColIdx ( )
inline protected

◆ CsrRowPtr() [1/2]

const int* CsrRowPtr ( ) const
inline protected

Returns pointer to the integer array of length NumRows()+1 that holds indices of the first nonzero element in the i-th row, while the last entry contains nnz_, as zero-based CSR format is used.
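
To make the role of the row-pointer array concrete, the sketch below expands zero-based CSR arrays into a dense row-major buffer: row i owns the half-open element range [row_ptr[i], row_ptr[i + 1]). The function name `CsrToDense` is hypothetical, and the code runs on the host rather than through the protected device accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands CSR arrays into a dense num_rows x num_cols row-major buffer.
inline std::vector<float> CsrToDense(int num_rows, int num_cols,
                                     const std::vector<int> &row_ptr,
                                     const std::vector<int> &col_idx,
                                     const std::vector<float> &val) {
  assert(row_ptr.size() == static_cast<std::size_t>(num_rows) + 1);
  // The last row-pointer entry equals the number of nonzeros.
  assert(static_cast<std::size_t>(row_ptr[num_rows]) == val.size());
  std::vector<float> dense(static_cast<std::size_t>(num_rows) * num_cols,
                           0.0f);
  for (int i = 0; i < num_rows; ++i) {
    for (int n = row_ptr[i]; n < row_ptr[i + 1]; ++n) {
      assert(col_idx[n] >= 0 && col_idx[n] < num_cols);
      dense[static_cast<std::size_t>(i) * num_cols + col_idx[n]] = val[n];
    }
  }
  return dense;
}
```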

Definition at line 202 of file cu-sparse-matrix.h.

References CuSparseMatrix< Real >::csr_row_ptr_col_idx_.

Referenced by CuMatrixBase< float >::AddMatSmat(), CuMatrixBase< float >::AddSmat(), CuMatrixBase< float >::AddSmatMat(), CuSparseMatrix< Real >::CopyFromSmat(), CuSparseMatrix< Real >::CopyToMat(), CuSparseMatrix< Real >::CuSparseMatrix(), CuSparseMatrix< Real >::Resize(), CuSparseMatrix< Real >::SelectRows(), and kaldi::TraceMatSmat().

202  {
203  return csr_row_ptr_col_idx_;
204  }

◆ CsrRowPtr() [2/2]

int* CsrRowPtr ( )
inline protected

Definition at line 205 of file cu-sparse-matrix.h.

References CuSparseMatrix< Real >::csr_row_ptr_col_idx_.

205  {
206  return csr_row_ptr_col_idx_;
207  }

◆ CsrVal() [1/2]

◆ CsrVal() [2/2]

Real* CsrVal ( )
inline protected

Definition at line 195 of file cu-sparse-matrix.h.

References CuSparseMatrix< Real >::csr_val_.

195  {
196  return csr_val_;
197  }

◆ Destroy()

void Destroy ( )
private

Definition at line 301 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::csr_row_ptr_col_idx_, CuSparseMatrix< Real >::csr_val_, CuSparseMatrix< Real >::nnz_, CuSparseMatrix< Real >::num_cols_, CuSparseMatrix< Real >::num_rows_, and CuSparseMatrix< Real >::Smat().

Referenced by CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::Resize(), and CuSparseMatrix< Real >::~CuSparseMatrix().

301  {
302 #if HAVE_CUDA == 1
303  if (CuDevice::Instantiate().Enabled()) {
304  CuTimer tim;
305  if (csr_row_ptr_col_idx_) {
306  CuDevice::Instantiate().Free(csr_row_ptr_col_idx_);
307  }
308  if (csr_val_) {
309  CuDevice::Instantiate().Free(csr_val_);
310  }
311  num_rows_ = 0;
312  num_cols_ = 0;
313  nnz_ = 0;
314  csr_row_ptr_col_idx_ = NULL;
315  csr_val_ = NULL;
316  CuDevice::Instantiate().AccuProfile(__func__, tim);
317  } else
318 #endif
319  {
320  Smat().Resize(0, 0);
321  }
322 }

◆ FrobeniusNorm()

Real FrobeniusNorm ( ) const

Definition at line 97 of file cu-sparse-matrix.cc.

References CuVectorBase< Real >::Norm().

Referenced by kaldi::UnitTestCuSparseMatrixFrobeniusNorm().

97  {
98 #if HAVE_CUDA == 1
99  if (CuDevice::Instantiate().Enabled()) {
100  CuSubVector<Real> element_vec(CsrVal(), NumElements());
101  return element_vec.Norm(2);
102  } else
103 #endif
104  {
105  return Smat().FrobeniusNorm();
106  }
107 }

◆ NumCols()

◆ NumElements()

MatrixIndexT NumElements ( ) const

Definition at line 70 of file cu-sparse-matrix.cc.

Referenced by CuMatrixBase< float >::AddMatSmat(), CuMatrixBase< float >::AddSmatMat(), CuSparseMatrix< Real >::CopyElementsToVec(), CuSparseMatrix< Real >::CopyFromSmat(), CuSparseMatrix< Real >::CopyToMat(), CuSparseMatrix< Real >::CopyToSmat(), CuSparseMatrix< Real >::CuSparseMatrix(), CuSparseMatrix< Real >::Resize(), and kaldi::TraceMatSmat().

70  {
71 #if HAVE_CUDA == 1
72  if (CuDevice::Instantiate().Enabled()) {
73  return nnz_;
74  } else
75 #endif
76  {
77  return Smat().NumElements();
78  }
79 }

◆ NumRows()

◆ operator=() [1/2]

CuSparseMatrix< Real > & operator= ( const SparseMatrix< Real > &  smat)

Copy from CPU-based matrix.

Definition at line 227 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::CopyFromSmat().

228  {
229  this->CopyFromSmat(smat);
230  return *this;
231 }

◆ operator=() [2/2]

CuSparseMatrix< Real > & operator= ( const CuSparseMatrix< Real > &  smat)

Copy from possibly-GPU-based matrix.

Definition at line 234 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::CopyFromSmat(), and kaldi::kNoTrans.

235  {
236  this->CopyFromSmat(smat, kNoTrans);
237  return *this;
238 }

◆ Read()

void Read ( std::istream &  is,
bool  binary 
)

Definition at line 514 of file cu-sparse-matrix.cc.

References SparseMatrix< Real >::Read(), and CuSparseMatrix< Real >::Swap().

514  {
515  SparseMatrix<Real> tmp;
516  tmp.Read(is, binary);
517  this->Swap(&tmp);
518 }

◆ Resize()

void Resize ( const MatrixIndexT  num_rows,
const MatrixIndexT  num_cols,
const MatrixIndexT  nnz,
MatrixResizeType  resize_type = kSetZero 
)
protected

Users of this class won't normally have to use Resize.

The caller must determine 'nnz', the number of nonzero elements, before calling this function.

Definition at line 241 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::csr_row_ptr_col_idx_, CuSparseMatrix< Real >::csr_val_, CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CuSparseMatrix< Real >::Destroy(), KALDI_ASSERT, kaldi::kSetZero, kaldi::kUndefined, CuSparseMatrix< Real >::nnz_, CuSparseMatrix< Real >::num_cols_, CuSparseMatrix< Real >::num_rows_, CuSparseMatrix< Real >::NumCols(), CuSparseMatrix< Real >::NumElements(), CuSparseMatrix< Real >::NumRows(), CuArrayBase< T >::Set(), CuVectorBase< Real >::Set(), and CuSparseMatrix< Real >::Smat().

Referenced by CuSparseMatrix< Real >::CopyFromSmat(), CuSparseMatrix< Real >::CuSparseMatrix(), and CuSparseMatrix< Real >::Smat().

244  {
245 #if HAVE_CUDA == 1
246  if (CuDevice::Instantiate().Enabled()) {
247  KALDI_ASSERT(resize_type == kSetZero || resize_type == kUndefined);
248 
249  if (num_rows == NumRows() && num_cols == NumCols()
250  && nnz == NumElements()) {
251  if (resize_type == kSetZero) {
252  CuSubVector<Real> val(CsrVal(), NumElements());
253  val.Set(0);
254  }
255  return;
256  }
257 
258  Destroy();
259 
260  CuTimer tim;
261 
262  if (num_rows * num_cols == 0) {
263  KALDI_ASSERT(num_rows == 0);
264  KALDI_ASSERT(num_cols == 0);
265  KALDI_ASSERT(nnz == 0);
266  num_rows_ = 0;
267  num_cols_ = 0;
268  nnz_ = 0;
269  csr_row_ptr_col_idx_ = static_cast<int*>(CuDevice::Instantiate().Malloc(
270  1 * sizeof(int)));
271  csr_val_ = NULL;
272  } else {
273  KALDI_ASSERT(num_rows > 0);
274  KALDI_ASSERT(num_cols > 0);
275  KALDI_ASSERT(nnz >= 0 && nnz <= num_rows * static_cast<int64>(num_cols));
276 
277  num_rows_ = num_rows;
278  num_cols_ = num_cols;
279  nnz_ = nnz;
280  csr_row_ptr_col_idx_ = static_cast<int*>(CuDevice::Instantiate().Malloc(
281  (num_rows + 1 + nnz) * sizeof(int)));
282  csr_val_ = static_cast<Real*>(CuDevice::Instantiate().Malloc(
283  nnz * sizeof(Real)));
284  CuSubArray<int> row_ptr(CsrRowPtr(), NumRows() + 1);
285  row_ptr.Set(nnz);
286  if (resize_type == kSetZero) {
287  CuSubVector<Real> val(CsrVal(), NumElements());
288  val.Set(0);
289  }
290  }
291 
292  CuDevice::Instantiate().AccuProfile(__func__, tim);
293  } else
294 #endif
295  {
296  Smat().Resize(num_rows, num_cols, resize_type);
297  }
298 }

◆ SelectRows()

void SelectRows ( const CuArray< int32 > &  row_indexes,
const CuSparseMatrix< Real > &  smat_other 
)

Select a subset of the rows of a CuSparseMatrix.

Sets *this to only the rows of 'smat_other' that are listed in 'row_indexes'. 'row_indexes' must satisfy 0 <= row_indexes[i] < smat_other.NumRows().
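
Row selection on CSR data proceeds in two passes: first compute the new row pointers (and hence nnz) from the source row lengths, then copy each selected row's column indices and values. The sketch below does both passes on the host; `SelCsr` and `SelectRowsCsr` are hypothetical names, and in the listing that follows the second pass is performed by a CUDA kernel instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side CSR container; not a Kaldi type.
struct SelCsr {
  int num_cols;
  std::vector<int> row_ptr;  // length num_rows + 1
  std::vector<int> col_idx;  // length nnz
  std::vector<float> val;    // length nnz
};

inline SelCsr SelectRowsCsr(const SelCsr &other,
                            const std::vector<int> &row_indexes) {
  const int other_rows = static_cast<int>(other.row_ptr.size()) - 1;
  (void)other_rows;  // only used by the assertion below
  SelCsr out;
  out.num_cols = other.num_cols;
  out.row_ptr.resize(row_indexes.size() + 1);
  // Pass 1: accumulate the selected row lengths into new row pointers.
  int nnz = 0;
  for (std::size_t i = 0; i < row_indexes.size(); ++i) {
    const int r = row_indexes[i];
    assert(r >= 0 && r < other_rows);
    out.row_ptr[i] = nnz;
    nnz += other.row_ptr[r + 1] - other.row_ptr[r];
  }
  out.row_ptr[row_indexes.size()] = nnz;
  // Pass 2: copy the column indices and values of each selected row.
  out.col_idx.reserve(nnz);
  out.val.reserve(nnz);
  for (int r : row_indexes) {
    for (int n = other.row_ptr[r]; n < other.row_ptr[r + 1]; ++n) {
      out.col_idx.push_back(other.col_idx[n]);
      out.val.push_back(other.val[n]);
    }
  }
  return out;
}
```

A row may be listed more than once in 'row_indexes'; each occurrence produces its own copy in the result.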

Definition at line 110 of file cu-sparse-matrix.cc.

References CuArrayBase< T >::CopyToVec(), CuSparseMatrix< Real >::CsrColIdx(), CuSparseMatrix< Real >::CsrRowPtr(), CuSparseMatrix< Real >::CsrVal(), CU1DBLOCK, CuArrayBase< T >::Data(), CuArrayBase< T >::Dim(), rnnlm::i, kaldi::kUndefined, CuSparseMatrix< Real >::NumCols(), CuSparseMatrix< Real >::NumRows(), and CuSparseMatrix< Real >::Smat().

Referenced by kaldi::UnitTestCuSparseMatrixSelectRowsAndTranspose().

111  {
112 #if HAVE_CUDA == 1
113  if (CuDevice::Instantiate().Enabled()) {
114  CuTimer tim;
115 
116  // Calculate nnz and row_ptr before copying selected col_idx and val.
117  // We do this on CPU for now. We will move this part to GPU if mem copy
118  // becomes a bottleneck here.
119  std::vector<int32> row_indexes_cpu(row_indexes.Dim());
120  row_indexes.CopyToVec(&row_indexes_cpu);
121  CuSubArray<int> other_row_ptr(smat_other.CsrRowPtr(),
122  smat_other.NumRows() + 1);
123  std::vector<int> other_row_ptr_cpu(smat_other.NumRows() + 1);
124  other_row_ptr.CopyToVec(&other_row_ptr_cpu);
125  int nnz = 0;
126  std::vector<int> row_ptr_cpu(row_indexes_cpu.size() + 1);
127  for (int i = 0; i < row_indexes_cpu.size(); ++i) {
128  row_ptr_cpu[i] = nnz;
129  nnz += other_row_ptr_cpu[row_indexes_cpu[i] + 1]
130  - other_row_ptr_cpu[row_indexes_cpu[i]];
131  }
132  row_ptr_cpu[row_indexes_cpu.size()] = nnz;
133 
134  Resize(row_indexes.Dim(), smat_other.NumCols(), nnz, kUndefined);
135  CuSubArray<int> row_ptr(CsrRowPtr(), NumRows() + 1);
136  row_ptr.CopyFromVec(row_ptr_cpu);
137 
138  // We use warpSize threads per row to access only the nnz elements.
139  // Every CU1DBLOCK/warpSize rows share one thread block.
140  // 1D grid to cover all selected rows.
141  const int warpSize = 32;
142  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
143  dim3 dimGrid(n_blocks(row_indexes.Dim(), dimBlock.y));
144 
145  cuda_select_rows(dimGrid, dimBlock, CsrRowPtr(), CsrColIdx(), CsrVal(),
146  row_indexes.Data(), row_indexes.Dim(),
147  smat_other.CsrRowPtr(), smat_other.CsrColIdx(),
148  smat_other.CsrVal());
149 
150  CU_SAFE_CALL(cudaGetLastError());
151  CuDevice::Instantiate().AccuProfile(__func__, tim);
152  } else
153 #endif
154  {
155  std::vector<int32> row_indexes_cpu(row_indexes.Dim());
156  row_indexes.CopyToVec(&row_indexes_cpu);
157  Smat().SelectRows(row_indexes_cpu, smat_other.Smat());
158  }
159 }
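
The host-side bookkeeping in the listing above (a prefix sum over the lengths of the selected rows, followed by a copy of their column indices and values) can be sketched on plain `std::vector` data. This is a minimal illustration under stated assumptions, not the Kaldi implementation: the `Csr` struct and `SelectRowsCsr` function are hypothetical names, and the second loop only stands in for the work that the listing hands to the `cuda_select_rows` kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side CSR container, for illustration only.
struct Csr {
  int num_rows = 0;
  int num_cols = 0;
  std::vector<int> row_ptr;  // length num_rows + 1
  std::vector<int> col_idx;  // length nnz
  std::vector<float> val;    // length nnz
};

// Returns a CSR matrix whose i-th row is row row_indexes[i] of 'src'.
Csr SelectRowsCsr(const std::vector<int> &row_indexes, const Csr &src) {
  Csr out;
  out.num_rows = static_cast<int>(row_indexes.size());
  out.num_cols = src.num_cols;
  out.row_ptr.resize(row_indexes.size() + 1);

  // Pass 1: prefix-sum the lengths of the selected rows to get row_ptr and nnz.
  int nnz = 0;
  for (std::size_t i = 0; i < row_indexes.size(); ++i) {
    const int r = row_indexes[i];
    assert(r >= 0 && r < src.num_rows);
    out.row_ptr[i] = nnz;
    nnz += src.row_ptr[r + 1] - src.row_ptr[r];
  }
  out.row_ptr[row_indexes.size()] = nnz;

  // Pass 2: copy the column indices and values of each selected row.
  out.col_idx.resize(nnz);
  out.val.resize(nnz);
  for (std::size_t i = 0; i < row_indexes.size(); ++i) {
    const int src_begin = src.row_ptr[row_indexes[i]];
    const int dst_begin = out.row_ptr[i];
    const int len = out.row_ptr[i + 1] - dst_begin;
    for (int k = 0; k < len; ++k) {
      out.col_idx[dst_begin + k] = src.col_idx[src_begin + k];
      out.val[dst_begin + k] = src.val[src_begin + k];
    }
  }
  return out;
}
```

A row index may appear more than once in `row_indexes`; the corresponding source row is then simply copied once per occurrence.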

◆ SetRandn()

void SetRandn ( BaseFloat  zero_prob)

Fills the matrix pseudo-randomly: each element is zero with probability zero_prob and otherwise normally distributed. The matrix keeps its current dimensions. Intended mostly for testing.

Definition at line 497 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::num_cols_, CuSparseMatrix< Real >::num_rows_, SparseMatrix< Real >::SetRandn(), and CuSparseMatrix< Real >::Swap().

497  {
498  if (num_rows_ == 0)
499  return;
500  // Use the CPU function for the moment, not efficient...
501  SparseMatrix<Real> tmp(num_rows_, num_cols_);
502  tmp.SetRandn(zero_prob);
503  Swap(&tmp);
504 }
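
The distribution described for SetRandn() can be sketched with the standard `<random>` facilities. This is only an illustration of "zero with probability zero_prob, otherwise normally distributed"; it does not reproduce `SparseMatrix<Real>::SetRandn()`, whose body is not shown here. The `SparseRow` alias, the `RandnSparse` function, and the explicit `seed` parameter are all hypothetical.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical representation: one sparse row as (column, value) pairs.
using SparseRow = std::vector<std::pair<int, float>>;

// Builds num_rows sparse rows of width num_cols. Each element is dropped
// (left as an implicit zero) with probability zero_prob; otherwise it is
// drawn from a standard normal distribution.
std::vector<SparseRow> RandnSparse(int num_rows, int num_cols,
                                   float zero_prob, unsigned seed) {
  assert(zero_prob >= 0.0f && zero_prob <= 1.0f);
  std::mt19937 gen(seed);
  std::bernoulli_distribution is_zero(zero_prob);
  std::normal_distribution<float> normal(0.0f, 1.0f);

  std::vector<SparseRow> rows(num_rows);
  for (int r = 0; r < num_rows; ++r) {
    for (int c = 0; c < num_cols; ++c) {
      if (!is_zero(gen)) rows[r].emplace_back(c, normal(gen));
    }
  }
  return rows;
}
```

On average a row keeps about `(1 - zero_prob) * num_cols` stored elements, which is what makes the result useful for exercising sparse code paths in tests.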

◆ Smat() [1/2]

◆ Smat() [2/2]

SparseMatrix<Real>& Smat ( )
inlineprotected

Definition at line 181 of file cu-sparse-matrix.h.

References kaldi::kSetZero, and CuSparseMatrix< Real >::Resize().

181  {
182  return *(reinterpret_cast<SparseMatrix<Real>*>(this));
183  }

◆ Sum()

Real Sum ( ) const

Definition at line 82 of file cu-sparse-matrix.cc.

References CuVectorBase< Real >::Sum().

Referenced by kaldi::nnet3::ComputeObjectiveFunction(), and kaldi::UnitTestCuSparseMatrixSum().

82  {
83  if (NumElements() == 0)
84  return 0;
85 #if HAVE_CUDA == 1
86  if (CuDevice::Instantiate().Enabled()) {
87  CuSubVector<Real> sum_vec(CsrVal(), NumElements());
88  return sum_vec.Sum();
89  } else
90 #endif
91  {
92  return Smat().Sum();
93  }
94 }
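
The GPU branch above sums only the `CsrVal()` array. That is sufficient because elements not stored in CSR form are zero and contribute nothing. A minimal host-side sketch of the same idea follows; the `CsrSum` name is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sum of all elements of a CSR matrix, given only its values array.
// Row pointers and column indices are not needed for this reduction.
double CsrSum(const std::vector<double> &csr_val) {
  return std::accumulate(csr_val.begin(), csr_val.end(), 0.0);
}
```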

◆ Swap() [1/2]

void Swap ( SparseMatrix< Real > *  smat)

Swap with CPU-based matrix.

Definition at line 467 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::CopyToSmat(), and CuSparseMatrix< Real >::Smat().

Referenced by CuSparseMatrix< Real >::CuSparseMatrix(), CuSparseMatrix< Real >::Read(), CuSparseMatrix< Real >::SetRandn(), and kaldi::UnitTestCuSparseMatrixSwap().

467  {
468 #if HAVE_CUDA == 1
469  if (CuDevice::Instantiate().Enabled()) {
470  CuSparseMatrix<Real> tmp(*smat);
471  Swap(&tmp);
472  tmp.CopyToSmat(smat);
473  } else
474 #endif
475  {
476  Smat().Swap(smat);
477  }
478 }

◆ Swap() [2/2]

void Swap ( CuSparseMatrix< Real > *  smat)

Swap with possibly-CPU-based matrix.

Definition at line 481 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::csr_row_ptr_col_idx_, CuSparseMatrix< Real >::csr_val_, CuSparseMatrix< Real >::nnz_, CuSparseMatrix< Real >::num_cols_, CuSparseMatrix< Real >::num_rows_, CuSparseMatrix< Real >::Smat(), and kaldi::swap().

481  {
482 #if HAVE_CUDA == 1
483  if (CuDevice::Instantiate().Enabled()) {
484  std::swap(num_rows_, smat->num_rows_);
485  std::swap(num_cols_, smat->num_cols_);
486  std::swap(nnz_, smat->nnz_);
487  std::swap(csr_row_ptr_col_idx_, smat->csr_row_ptr_col_idx_);
488  std::swap(csr_val_, smat->csr_val_);
489  } else
490 #endif
491  {
492  Smat().Swap(&(smat->Smat()));
493  }
494 }
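
The GPU branch above exchanges dimensions and buffer handles rather than copying elements, so the swap costs the same regardless of matrix size. The pattern can be sketched with a toy type; `ToyCsr` is a hypothetical stand-in whose `std::vector` members play the role of the device buffers, and the combined row-pointer/column-index layout is an assumption suggested by the member name `csr_row_ptr_col_idx_`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CSR matrix that owns its buffers.
struct ToyCsr {
  int num_rows = 0;
  int num_cols = 0;
  int nnz = 0;
  // Assumed layout: row pointers followed by column indices in one buffer.
  std::vector<int> row_ptr_col_idx;
  std::vector<float> val;

  // Member-wise swap: exchanges sizes and buffer ownership, copies no elements.
  void Swap(ToyCsr *other) {
    std::swap(num_rows, other->num_rows);
    std::swap(num_cols, other->num_cols);
    std::swap(nnz, other->nnz);
    std::swap(row_ptr_col_idx, other->row_ptr_col_idx);
    std::swap(val, other->val);
  }
};
```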

◆ Write()

void Write ( std::ostream &  os,
bool  binary 
) const

Definition at line 507 of file cu-sparse-matrix.cc.

References CuSparseMatrix< Real >::CopyToSmat(), and SparseMatrix< Real >::Write().

507  {
508  SparseMatrix<Real> tmp;
509  CopyToSmat(&tmp);
510  tmp.Write(os, binary);
511 }

Friends And Related Function Documentation

◆ CuMatrixBase< double >

friend class CuMatrixBase< double >
friend

Definition at line 51 of file cu-sparse-matrix.h.

◆ CuMatrixBase< float >

friend class CuMatrixBase< float >
friend

Definition at line 50 of file cu-sparse-matrix.h.

◆ CuMatrixBase< Real >

friend class CuMatrixBase< Real >
friend

Definition at line 52 of file cu-sparse-matrix.h.

◆ CuVectorBase< double >

friend class CuVectorBase< double >
friend

Definition at line 54 of file cu-sparse-matrix.h.

◆ CuVectorBase< float >

friend class CuVectorBase< float >
friend

Definition at line 53 of file cu-sparse-matrix.h.

◆ CuVectorBase< Real >

friend class CuVectorBase< Real >
friend

Definition at line 55 of file cu-sparse-matrix.h.

◆ TraceMatSmat

Real TraceMatSmat ( const CuMatrixBase< Real > &  A,
const CuSparseMatrix< Real > &  B,
MatrixTransposeType  trans 
)
friend

Definition at line 524 of file cu-sparse-matrix.cc.

Referenced by kaldi::TraceMatSmat().

526  {
527  if (A.NumCols() == 0) {
528  KALDI_ASSERT(B.NumCols() == 0);
529  return 0.0;
530  }
531  if (B.NumElements() == 0) {
532  return 0.0;
533  }
534  Real result = 0;
535 #if HAVE_CUDA == 1
536  if (CuDevice::Instantiate().Enabled()) {
537  if (trans == kTrans) {
538  KALDI_ASSERT(A.NumRows() == B.NumRows() && A.NumCols() == B.NumCols());
539  } else {
540  KALDI_ASSERT(A.NumCols() == B.NumRows() && A.NumRows() == B.NumCols());
541  }
542 
543  // The Sum() method in CuVector handles a bunch of logic, we use that to
544  // compute the trace.
545  CuVector<Real> sum_vec(B.NumElements());
546  CuTimer tim;
547 
548  // We use warpSize threads per row to access only the nnz elements.
549  // Every CU1DBLOCK/warpSize rows share one thread block.
550  // 1D grid to cover all rows of B.
551  const int warpSize = 32;
552  dim3 dimBlock(warpSize, CU1DBLOCK / warpSize);
553  dim3 dimGrid(n_blocks(B.NumRows(), dimBlock.y));
554 
555  if (trans == kNoTrans) {
556  cuda_trace_mat_smat(dimGrid, dimBlock, A.Data(), A.Dim(), B.CsrRowPtr(),
557  B.CsrColIdx(), B.CsrVal(), sum_vec.Data());
558  } else {
559  cuda_trace_mat_smat_trans(dimGrid, dimBlock, A.Data(), A.Dim(),
560  B.CsrRowPtr(), B.CsrColIdx(), B.CsrVal(),
561  sum_vec.Data());
562  }
563  result = sum_vec.Sum();
564  CuDevice::Instantiate().AccuProfile(__func__, tim);
565  } else
566 #endif
567  {
568  result = TraceMatSmat(A.Mat(), B.Smat(), trans);
569  }
570  return result;
571 }
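
The dimension checks in the listing are consistent with computing trace(A * B) when trans is kNoTrans and trace(A * B^T) when trans is kTrans. In both cases only the stored elements of B contribute: each nonzero B(r, c) is multiplied by A(c, r) in the first case and by A(r, c) in the second. A minimal single-threaded sketch of that reduction is given below; the `Dense` and `Csr` structs, the `TraceDenseCsr` function, and the `bool trans` flag are hypothetical simplifications, and the sketch does not model the per-row warp layout used by the CUDA kernels.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense row-major matrix, for illustration only.
struct Dense {
  int rows = 0, cols = 0;
  std::vector<double> data;  // rows * cols entries, row-major
  double At(int r, int c) const { return data[r * cols + c]; }
};

// Hypothetical host-side CSR matrix, for illustration only.
struct Csr {
  int rows = 0, cols = 0;
  std::vector<int> row_ptr;  // length rows + 1
  std::vector<int> col_idx;  // length nnz
  std::vector<double> val;   // length nnz
};

// trans == false: returns trace(A * B),   with A of size B.cols x B.rows.
// trans == true:  returns trace(A * B^T), with A of size B.rows x B.cols.
double TraceDenseCsr(const Dense &A, const Csr &B, bool trans) {
  if (trans) {
    assert(A.rows == B.rows && A.cols == B.cols);
  } else {
    assert(A.cols == B.rows && A.rows == B.cols);
  }
  double result = 0.0;
  for (int r = 0; r < B.rows; ++r) {
    // Visit only the stored elements of row r of B.
    for (int k = B.row_ptr[r]; k < B.row_ptr[r + 1]; ++k) {
      const int c = B.col_idx[k];
      result += B.val[k] * (trans ? A.At(r, c) : A.At(c, r));
    }
  }
  return result;
}
```

The cost is proportional to the number of stored elements of B rather than to the size of the dense product, which is the reason for iterating over the CSR structure instead of forming A * B.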

Member Data Documentation

◆ cpu_rows_

std::vector<SparseVector<Real> > cpu_rows_
private

Definition at line 224 of file cu-sparse-matrix.h.

◆ csr_row_ptr_col_idx_

◆ csr_val_

◆ nnz_

◆ num_cols_

◆ num_rows_


The documentation for this class was generated from the following files: