CuArrayBase< T > Class Template Reference

Classes CuArrayBase, CuSubArray and CuArray are analogues of classes CuVectorBase, CuSubVector and CuVector, except that they are intended to store things other than float/double: integers or small structs.

#include <cu-array.h>


Public Member Functions

MatrixIndexT Dim () const
 Return the vector dimension.
 
const T * Data () const
 Get raw pointer.
 
T * Data ()
 
void SetZero ()
 Sets the memory for the object to zero, via memset.
 
void CopyFromArray (const CuArrayBase< T > &src)
 The caller is responsible for ensuring that *this and src have the same dimension.
 
void CopyFromVec (const std::vector< T > &src)
 The caller is responsible for ensuring that *this and src have the same dimension.
 
void CopyToVec (std::vector< T > *dst) const
 This function resizes *dst if needed.
 
void CopyToHost (T *dst) const
 Version of CopyToVec() that copies contents to a host array (i.e. to regular memory, not GPU memory, assuming we're using a GPU).
 
void Set (const T &value)
 Set to a constant value.
 
void Sequence (const T base)
 Fill with the sequence [base ... base + Dim()).
 
void Add (const T &value)
 Add a constant value.
 
T Min () const
 Get minimum value (for now implemented on CPU, reimplement if slow).
 
T Max () const
 Get maximum value (for now implemented on CPU, reimplement if slow).
 
template<>
void Set (const int32 &value)
 
template<>
void Sequence (const int32 base)
 
template<>
void Add (const int32 &value)
 

Protected Member Functions

 CuArrayBase ()
 Default constructor: make it protected so the user cannot instantiate this class.
 

Protected Attributes

T * data_
 GPU data pointer (if GPU not available, will point to CPU memory).
 
MatrixIndexT dim_
 Dimension of the vector.
 

Friends

class CuArray< T >
 
class CuSubArray< T >
 

Detailed Description

template<typename T>
class kaldi::CuArrayBase< T >

Classes CuArrayBase, CuSubArray and CuArray are analogues of classes CuVectorBase, CuSubVector and CuVector, except that they are intended to store things other than float/double: integers or small structs.

Their CPU-based equivalents are std::vector, and we provide ways to copy to/from a std::vector of the same type.

Definition at line 44 of file cu-array.h.

Constructor & Destructor Documentation

◆ CuArrayBase()

CuArrayBase ( )
inlineprotected

Default constructor: make it protected so the user cannot instantiate this class.

Definition at line 108 of file cu-array.h.

  : data_(NULL), dim_(0) { }

Member Function Documentation

◆ Add() [1/3]

void Add ( const int32 &  value)

Definition at line 88 of file cu-array.cc.

{
  if (dim_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    dim3 dimBlock(CU2DBLOCK);
    dim3 dimGrid(n_blocks(Dim(), CU2DBLOCK));
    ::MatrixDim d = { 1, Dim(), Dim() };

    cuda_int32_add(dimGrid, dimBlock, data_, value, d);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    for (int32 i = 0; i < dim_; i++) {
      data_[i] += value;
    }
  }
}
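
The GPU branch of this function launches enough blocks to cover Dim() elements. The following host-only sketch illustrates that ceiling-division sizing and the bounds guard needed in the last block. NumBlocks, AddOnHost and the block size passed in are hypothetical stand-ins for illustration; they are not Kaldi's n_blocks(), its CUDA kernel, or CU2DBLOCK, whose definitions are not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a grid-size helper: the smallest number of
// blocks of `block_size` threads that covers `size` elements.
inline int32_t NumBlocks(int32_t size, int32_t block_size) {
  assert(size >= 0 && block_size > 0);
  return size / block_size + (size % block_size == 0 ? 0 : 1);
}

// Host-side emulation of an element-wise add: every (block, thread) pair
// maps to one index, and indices past `dim` are skipped.
inline void AddOnHost(int32_t *data, int32_t dim, int32_t value,
                      int32_t block_size) {
  const int32_t n_blocks = NumBlocks(dim, block_size);
  for (int32_t block = 0; block < n_blocks; ++block) {
    for (int32_t thread = 0; thread < block_size; ++thread) {
      const int32_t i = block * block_size + thread;
      if (i < dim) data[i] += value;  // bounds guard for the last block
    }
  }
}
```

With dim = 5 and block_size = 4, two blocks are needed and the second block only touches index 4; the guard keeps the remaining three threads from writing out of range.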

◆ Add() [2/3]

void Add ( const T &  value)

Add a constant value.

Not implemented yet except for T == int32: for any other type, the generic version only raises an error via KALDI_ERR.

Definition at line 254 of file cu-array-inl.h.

Referenced by CuArrayBase< Int32Pair >::Add(), CuArrayBase< Int32Pair >::Data(), and CopyComponent::WriteData().

{
  // This is not implemented yet, we'll do so if it's needed.
  KALDI_ERR << "CuArray<T>::Add not implemented yet for this type.";
}

◆ Add() [3/3]

void Add ( const int32 &  value)

◆ CopyFromArray()

void CopyFromArray ( const CuArrayBase< T > &  src)

The caller is responsible for ensuring that *this and src have the same dimension.

Note: copying to the GPU is done via memcpy; no constructors or assignment operators are called.

Definition at line 157 of file cu-array-inl.h.

Referenced by CuSparseMatrix< Real >::CopyFromSmat(), CuArray< Int32Pair >::CuArray(), CuSparseMatrix< Real >::CuSparseMatrix(), CuArrayBase< Int32Pair >::Data(), CuArray< Int32Pair >::operator=(), and CuArray< Int32Pair >::~CuArray().

{
  KALDI_ASSERT(src.Dim() == Dim());
  if (dim_ == 0)
    return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CU_SAFE_CALL(
      cudaMemcpyAsync(this->data_, src.data_, dim_ * sizeof(T),
                      cudaMemcpyDeviceToDevice, cudaStreamPerThread));
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    memcpy(this->data_, src.data_, dim_ * sizeof(T));
  }
}

◆ CopyFromVec()

void CopyFromVec ( const std::vector< T > &  src)

The caller is responsible for ensuring that *this and src have the same dimension.

Note: copying to the GPU is done via memcpy; no constructors or assignment operators are called.

Definition at line 100 of file cu-array-inl.h.

Referenced by CuSparseMatrix< Real >::CopyFromSmat(), CuArray< Int32Pair >::CuArray(), CuArrayBase< Int32Pair >::Data(), CuArray< Int32Pair >::operator=(), and CuArray< Int32Pair >::~CuArray().

{
  KALDI_ASSERT(dim_ == src.size());
  if (src.empty())
    return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CU_SAFE_CALL(
      cudaMemcpyAsync(data_, &src.front(), src.size() * sizeof(T),
                      cudaMemcpyHostToDevice, cudaStreamPerThread));
    CU_SAFE_CALL(cudaStreamSynchronize(cudaStreamPerThread));
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    memcpy(data_, &src.front(), src.size() * sizeof(T));
  }
}

◆ CopyToHost()

void CopyToHost ( T *  dst) const

Version of CopyToVec() that copies contents to a host array (i.e. to regular memory, not GPU memory, assuming we're using a GPU).

dst must point to memory allocated by the caller before the call; the allocated size should be at least dim_ * sizeof(T) bytes.

Definition at line 198 of file cu-array-inl.h.

Referenced by CuArrayBase< Int32Pair >::Data(), CuMatrixBase< float >::Lookup(), and kaldi::UnitTestCuArray().

{
  if (this->dim_ == 0) return;
  KALDI_ASSERT(dst != NULL);
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CU_SAFE_CALL(cudaMemcpyAsync(dst, Data(), this->dim_ * sizeof(T),
                                 cudaMemcpyDeviceToHost, cudaStreamPerThread));
    CU_SAFE_CALL(cudaStreamSynchronize(cudaStreamPerThread));
    CuDevice::Instantiate().AccuProfile("CuArray::CopyToVecD2H", tim);
  } else
#endif
  {
    memcpy(dst, this->data_, this->dim_ * sizeof(T));
  }
}

◆ CopyToVec()

void CopyToVec ( std::vector< T > *  dst) const

This function resizes *dst if needed.

On resize of "dst", the STL vector may call copy-constructors, initializers, and assignment operators for existing objects (which will be overwritten), but the copy from GPU to CPU is done via memcpy. So be very careful calling this function if your objects are more than plain structs.

Definition at line 177 of file cu-array-inl.h.

Referenced by Splice::BackpropagateFnc(), ConvolutionComputation::Check(), kaldi::nnet3::ComputeAccuracy(), kaldi::nnet3::CopyPairVector(), CuSparseMatrix< Real >::CopyToSmat(), kaldi::nnet1::CountCorrectFramesWeighted(), CuSparseMatrix< Real >::CuSparseMatrix(), CuArrayBase< Int32Pair >::Data(), CuSparseMatrix< Real >::SelectRows(), kaldi::TestCuFindRowMaxId(), kaldi::UnitTestCuArray(), and kaldi::UnitTestCuFindRowMaxId().

{
  if (static_cast<MatrixIndexT>(dst->size()) != this->dim_) {
    dst->resize(this->dim_);
  }
  if (this->dim_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CU_SAFE_CALL(cudaMemcpyAsync(&dst->front(), Data(), this->dim_ * sizeof(T),
                                 cudaMemcpyDeviceToHost, cudaStreamPerThread));
    CU_SAFE_CALL(cudaStreamSynchronize(cudaStreamPerThread));
    CuDevice::Instantiate().AccuProfile("CuArray::CopyToVecD2H", tim);
  } else
#endif
  {
    memcpy(&dst->front(), this->data_, this->dim_ * sizeof(T));
  }
}
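
Because the copy is a raw byte copy, it is only safe when T is a plain, trivially copyable type. The following host-only sketch mirrors the CPU branch (resize if needed, then memcpy) and adds a compile-time check that makes the restriction explicit. IndexPair and CopyRawToVec are hypothetical names for illustration, and the static_assert is an addition of this sketch, not something the documented function is shown to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical plain struct of the kind these arrays are meant to hold.
struct IndexPair {
  int32_t first;
  int32_t second;
};

// Host-only sketch of the CPU branch: resize if needed, then copy raw bytes.
template <typename T>
void CopyRawToVec(const T *src, std::size_t dim, std::vector<T> *dst) {
  // Assumption of this sketch: reject types for which memcpy is unsafe.
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy-based copies are only safe for trivially copyable T");
  assert(dst != nullptr);
  if (dst->size() != dim) dst->resize(dim);
  if (dim == 0) return;
  std::memcpy(dst->data(), src, dim * sizeof(T));
}
```

A type such as std::string would fail the static_assert here, which is the kind of object the warning above is about.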

◆ Data() [1/2]

const T* Data ( ) const

Get raw pointer.

Definition at line 52 of file cu-array.h.

◆ Data() [2/2]

T* Data ( )
inline

Definition at line 54 of file cu-array.h.

{ return data_; }

◆ Dim()

MatrixIndexT Dim ( ) const

Return the vector dimension.

Definition at line 49 of file cu-array.h.

◆ Max()

T Max ( ) const
inline

Get maximum value (for now implemented on CPU, reimplement if slow).

Asserts the vector is non-empty, otherwise crashes.

Definition at line 282 of file cu-array-inl.h.

Referenced by CuArrayBase< Int32Pair >::Data().

{
  KALDI_ASSERT(this->Dim() > 0);
#if HAVE_CUDA == 1
  CuTimer tim;
#endif
  std::vector<T> tmp(Dim());
  CopyToVec(&tmp);
  T ans = *std::max_element(tmp.begin(), tmp.end());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  }
#endif
  return ans;
}

◆ Min()

T Min ( ) const
inline

Get minimum value (for now implemented on CPU, reimplement if slow).

Asserts the vector is non-empty, otherwise crashes.

Definition at line 264 of file cu-array-inl.h.

Referenced by CuArrayBase< Int32Pair >::Data().

{
  KALDI_ASSERT(this->Dim() > 0);
#if HAVE_CUDA == 1
  CuTimer tim;
#endif
  std::vector<T> tmp(Dim());
  CopyToVec(&tmp);
  T ans = *std::min_element(tmp.begin(), tmp.end());
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuDevice::Instantiate().AccuProfile(__func__, tim);
  }
#endif
  return ans;
}
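
Min() and Max() both work on a temporary host copy of the data and then scan it with a standard algorithm. The sketch below shows just that scanning step for int32 data; MinOnHost and MaxOnHost are hypothetical helper names, and host_copy plays the role of the temporary vector that CopyToVec() would fill.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Host-only sketch: smallest element of a non-empty host copy.
inline int32_t MinOnHost(const std::vector<int32_t> &host_copy) {
  assert(!host_copy.empty());  // mirrors the non-empty precondition
  return *std::min_element(host_copy.begin(), host_copy.end());
}

// Host-only sketch: largest element of a non-empty host copy.
inline int32_t MaxOnHost(const std::vector<int32_t> &host_copy) {
  assert(!host_copy.empty());  // mirrors the non-empty precondition
  return *std::max_element(host_copy.begin(), host_copy.end());
}
```

Since each call copies the whole array to the host first, code that needs both values may prefer to copy once and scan the host vector itself.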

◆ Sequence() [1/3]

void Sequence ( const int32  base)

Definition at line 39 of file cu-array.cc.

{
  if (dim_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    dim3 dimBlock(CU1DBLOCK);
    dim3 dimGrid(n_blocks(Dim(), CU1DBLOCK));

    cuda_sequence(dimGrid, dimBlock, Data(), Dim(), base);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    for (int32 i = 0; i < dim_; i++) {
      data_[i] = base + i;
    }
  }
}

◆ Sequence() [2/3]

void Sequence ( const T  base)

Fill with the sequence [base ... base + Dim()).

This is not implemented except for T == int32.

Definition at line 244 of file cu-array-inl.h.

Referenced by CuSparseMatrix< Real >::CuSparseMatrix(), CuArrayBase< Int32Pair >::Data(), and CuArrayBase< Int32Pair >::Sequence().

{
  // This is not implemented yet, we'll do so if it's needed.
  KALDI_ERR << "CuArray<T>::Sequence not implemented yet for this type.";
}
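
To make the half-open range [base ... base + Dim()) concrete, the following host-only sketch fills a std::vector the same way the int32 CPU loop does (element i receives base + i). FillSequence is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Host-only sketch: fills *v with base, base + 1, ..., base + v->size() - 1.
inline void FillSequence(std::vector<int32_t> *v, int32_t base) {
  assert(v != nullptr);
  std::iota(v->begin(), v->end(), base);
}
```

For a vector of size 4 and base 10, the contents become 10, 11, 12, 13; the value base + Dim() itself is not included.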

◆ Sequence() [3/3]

void Sequence ( const int32  base)

◆ Set() [1/3]

void Set ( const int32 &  value)

Definition at line 63 of file cu-array.cc.

{
  if (dim_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;

    dim3 dimBlock(CU2DBLOCK);
    dim3 dimGrid(n_blocks(Dim(), CU2DBLOCK));
    ::MatrixDim d = { 1, Dim(), Dim() };

    cuda_int32_set_const(dimGrid, dimBlock, data_, value, d);
    CU_SAFE_CALL(cudaGetLastError());

    CuDevice::Instantiate().AccuProfile(__func__, tim);
  } else
#endif
  {
    for (int32 i = 0; i < dim_; i++) {
      data_[i] = value;
    }
  }
}

◆ Set() [2/3]

void Set ( const T &  value)

Set to a constant value.

Note: any copying is done as if using memcpy; assignment operators and destructors are not called. Not implemented yet except for T == int32: for any other type, the generic version only raises an error via KALDI_ERR.

Definition at line 234 of file cu-array-inl.h.

Referenced by CuArrayBase< Int32Pair >::Data(), CuSparseMatrix< Real >::Resize(), CuArrayBase< Int32Pair >::Set(), and kaldi::UnitTestCuArray().

{
  // This is not implemented yet, we'll do so if it's needed.
  KALDI_ERR << "CuArray<T>::Set not implemented yet for this type.";
}
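
Set(), Add() and Sequence() share one pattern: a generic member template that only reports an error, plus an explicit specialization for int32 that does the work. The sketch below reproduces that pattern in a small host-only class. HostArray is a hypothetical stand-in, not Kaldi's CuArrayBase, and a std::runtime_error is used here in place of KALDI_ERR as an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-only stand-in for an array class.
template <typename T>
class HostArray {
 public:
  explicit HostArray(std::size_t dim) : data_(dim) {}

  // Generic version: only reports an error, like the "not implemented" case.
  void Set(const T &) {
    throw std::runtime_error("Set not implemented yet for this type.");
  }

  const std::vector<T> &Data() const { return data_; }

 private:
  std::vector<T> data_;
};

// Explicit specialization for int32_t: the one type with a real implementation.
template <>
inline void HostArray<int32_t>::Set(const int32_t &value) {
  for (int32_t &x : data_) x = value;
}
```

Calling Set() on HostArray<int32_t> fills the array, while the same call on, say, HostArray<double> compiles but fails at run time, which is why callers need to know which element types are supported.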

◆ Set() [3/3]

void Set ( const int32 &  value)

◆ SetZero()

void SetZero ( )

Sets the memory for the object to zero, via memset.

You should verify that this makes sense for type T.

Definition at line 217 of file cu-array-inl.h.

Referenced by CuArrayBase< Int32Pair >::Data().

{
  if (this->dim_ == 0) return;
#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {
    CuTimer tim;
    CU_SAFE_CALL(cudaMemsetAsync(this->data_, 0, this->dim_ * sizeof(T),
                                 cudaStreamPerThread));
    CuDevice::Instantiate().AccuProfile("CuArray::SetZero", tim);
  } else
#endif
  {
    memset(static_cast<void*>(this->data_), 0, this->dim_ * sizeof(T));
  }
}
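
Whether zeroing bytes "makes sense for type T" depends on T: all-zero bytes represent the value 0 for integer types, so a struct made only of integers ends up with every field equal to zero. The host-only sketch below illustrates this for such a struct. IntRange and ZeroFill are hypothetical names, and the static_assert is an extra safeguard of this sketch rather than part of the documented function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical struct made only of integer fields.
struct IntRange {
  int32_t begin;
  int32_t end;
};

// Host-only sketch of the CPU branch: overwrite dim elements with zero bytes.
template <typename T>
void ZeroFill(T *data, std::size_t dim) {
  // Assumption of this sketch: only allow types that tolerate raw byte writes.
  static_assert(std::is_trivially_copyable<T>::value,
                "memset is only safe for trivially copyable T");
  if (dim == 0) return;
  std::memset(static_cast<void *>(data), 0, dim * sizeof(T));
}
```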

Friends And Related Function Documentation

◆ CuArray< T >

friend class CuArray< T >
friend

Definition at line 45 of file cu-array.h.

◆ CuSubArray< T >

friend class CuSubArray< T >
friend

Definition at line 46 of file cu-array.h.

Member Data Documentation

◆ data_

T* data_
protected

GPU data pointer (if GPU not available, will point to CPU memory).

Definition at line 111 of file cu-array.h.

Referenced by CuArrayBase< Int32Pair >::CopyFromArray(), CuArray< Int32Pair >::CopyFromArray(), CuSubArray< T >::CuSubArray(), CuArrayBase< Int32Pair >::Data(), and CuArray< Int32Pair >::Swap().

◆ dim_

MatrixIndexT dim_
protected

dimension of the vector

Definition at line 113 of file cu-array.h.

Referenced by CuSubArray< T >::CuSubArray(), CuArrayBase< Int32Pair >::Dim(), and CuArray< Int32Pair >::Swap().


The documentation for this class was generated from the following files: