NnetBatchComputer Class Reference

This class does neural net inference in a way that is optimized for GPU use: it combines chunks of multiple utterances into minibatches for more efficient computation. More...

#include <nnet-batch-compute.h>


Classes

struct  ComputationGroupInfo
 
struct  ComputationGroupKey
 
struct  ComputationGroupKeyHasher
 
struct  MinibatchSizeInfo
 

Public Member Functions

 NnetBatchComputer (const NnetBatchComputerOptions &opts, const Nnet &nnet, const VectorBase< BaseFloat > &priors)
 Constructor. More...
 
void AcceptTask (NnetInferenceTask *task, int32 max_minibatches_full=-1)
 Accepts a task, meaning the task will be queued. More...
 
int32 NumFullPendingMinibatches () const
 Returns the number of full minibatches waiting to be computed. More...
 
bool Compute (bool allow_partial_minibatch)
 Does some kind of computation, choosing the highest-priority thing to compute. More...
 
void SplitUtteranceIntoTasks (bool output_to_cpu, const Matrix< BaseFloat > &input, const Vector< BaseFloat > *ivector, const Matrix< BaseFloat > *online_ivectors, int32 online_ivector_period, std::vector< NnetInferenceTask > *tasks)
 Split a single utterance into a list of separate tasks which can then be given to this class by AcceptTask(). More...
 
void SplitUtteranceIntoTasks (bool output_to_cpu, const CuMatrix< BaseFloat > &input, const CuVector< BaseFloat > *ivector, const CuMatrix< BaseFloat > *online_ivectors, int32 online_ivector_period, std::vector< NnetInferenceTask > *tasks)
 
const NnetBatchComputerOptions & GetOptions ()
 
 ~NnetBatchComputer ()
 

Private Types

typedef unordered_map< ComputationGroupKey, ComputationGroupInfo, ComputationGroupKeyHasher > MapType
 

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (NnetBatchComputer)
 
double GetPriority (bool allow_partial_minibatch, const ComputationGroupInfo &info) const
 
int32 GetMinibatchSize (const ComputationGroupInfo &info) const
 
std::shared_ptr< const NnetComputation > GetComputation (const ComputationGroupInfo &info, int32 minibatch_size)
 
int32 GetActualMinibatchSize (const ComputationGroupInfo &info) const
 
void GetHighestPriorityTasks (int32 num_tasks, ComputationGroupInfo *info, std::vector< NnetInferenceTask *> *tasks)
 
MinibatchSizeInfo * GetHighestPriorityComputation (bool allow_partial_minibatch, int32 *minibatch_size, std::vector< NnetInferenceTask *> *tasks)
 This function finds and returns the computation corresponding to the highest-priority group of tasks. More...
 
void FormatInputs (int32 minibatch_size, const std::vector< NnetInferenceTask *> &tasks, CuMatrix< BaseFloat > *input, CuMatrix< BaseFloat > *ivector)
Formats the inputs to the computation and transfers them to the GPU. More...
 
void FormatOutputs (const CuMatrix< BaseFloat > &output, const std::vector< NnetInferenceTask *> &tasks)
 
void CheckAndFixConfigs ()
 
void PrintMinibatchStats ()
 

Static Private Member Functions

static void GetComputationRequest (const NnetInferenceTask &task, int32 minibatch_size, ComputationRequest *request)
 

Private Attributes

NnetBatchComputerOptions opts_
 
const Nnet & nnet_
 
CachingOptimizingCompiler compiler_
 
CuVector< BaseFloat > log_priors_
 
std::mutex mutex_
 
MapType tasks_
 
int32 num_full_minibatches_
 
std::unordered_map< int32, std::condition_variable * > no_more_than_n_minibatches_full_
 
int32 nnet_left_context_
 
int32 nnet_right_context_
 
int32 input_dim_
 
int32 ivector_dim_
 
int32 output_dim_
 

Detailed Description

This class does neural net inference in a way that is optimized for GPU use: it combines chunks of multiple utterances into minibatches for more efficient computation.

It does the computation in one background thread that accesses the GPU. It is thread safe, i.e. you can call it from multiple threads without having to worry about data races and the like.

Definition at line 207 of file nnet-batch-compute.h.
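
The following is a minimal usage sketch, not taken from the Kaldi sources; 'nnet' (an already-loaded Nnet) and 'feats' (a Matrix<BaseFloat> of input features) are assumed names, and error handling is omitted. It shows the intended flow: split an utterance into chunk-sized tasks, queue them with AcceptTask(), drive Compute(), then wait on each task's semaphore.

// Sketch only: assumes 'nnet' and 'feats' have already been loaded by the caller.
NnetBatchComputerOptions opts;
Vector<BaseFloat> priors;  // empty: no prior subtraction (e.g. a 'chain' model)
NnetBatchComputer computer(opts, nnet, priors);

std::vector<NnetInferenceTask> tasks;
computer.SplitUtteranceIntoTasks(/*output_to_cpu=*/true, feats,
                                 /*ivector=*/NULL, /*online_ivectors=*/NULL,
                                 /*online_ivector_period=*/0, &tasks);
for (size_t i = 0; i < tasks.size(); i++)
  computer.AcceptTask(&tasks[i]);

// Normally Compute() is driven from a separate background thread (as in
// NnetBatchInference); calling it inline keeps this sketch single-threaded.
while (computer.Compute(/*allow_partial_minibatch=*/true)) { }
for (size_t i = 0; i < tasks.size(); i++)
  tasks[i].semaphore.Wait();  // each task's output is ready once signalled
// tasks[i].output_cpu now holds the scaled (log-)output for that chunk.

In real use the per-chunk outputs still need to be stitched back into a single utterance-level matrix; the wrapper classes NnetBatchInference and NnetBatchDecoder take care of that pattern.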

Member Typedef Documentation

◆ MapType

typedef unordered_map< ComputationGroupKey, ComputationGroupInfo, ComputationGroupKeyHasher > MapType
private

Definition at line 344 of file nnet-batch-compute.h.

Constructor & Destructor Documentation

◆ NnetBatchComputer()

NnetBatchComputer ( const NnetBatchComputerOptions &  opts,
const Nnet &  nnet,
const VectorBase< BaseFloat > &  priors 
)

Constructor.

It stores references to all the arguments, so don't delete them until this object goes out of scope.

Parameters
[in]  opts    Options struct
[in]  nnet    The neural net which we'll be doing the computation with
[in]  priors  Either the empty vector, or a vector of prior probabilities which we'll take the log of and subtract from the neural net outputs (e.g. used in non-chain systems).

Definition at line 31 of file nnet-batch-compute.cc.
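
As an illustrative sketch (not from the Kaldi sources; 'nnet' is assumed to be an already-loaded Nnet, and 'am_nnet' an AmNnetSimple wrapper), construction might look like:

NnetBatchComputerOptions opts;
opts.minibatch_size = 128;      // tune to the GPU's memory and throughput
Vector<BaseFloat> priors;       // leave empty for 'chain' models, ...
// priors = am_nnet.Priors();   // ... or pass the stored priors for xent models
NnetBatchComputer computer(opts, nnet, priors);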

References NnetSimpleComputationOptions::CheckAndFixConfigs(), kaldi::nnet3::ComputeSimpleNnetContext(), NnetBatchComputerOptions::edge_minibatch_size, NnetBatchComputer::input_dim_, Nnet::InputDim(), NnetBatchComputer::ivector_dim_, KALDI_ASSERT, NnetBatchComputer::log_priors_, NnetBatchComputerOptions::minibatch_size, Nnet::Modulus(), NnetBatchComputer::nnet_, NnetBatchComputer::nnet_left_context_, NnetBatchComputer::nnet_right_context_, NnetBatchComputer::opts_, NnetBatchComputer::output_dim_, Nnet::OutputDim(), and NnetBatchComputerOptions::partial_minibatch_factor.

NnetBatchComputer::NnetBatchComputer(const NnetBatchComputerOptions &opts,
                                     const Nnet &nnet,
                                     const VectorBase<BaseFloat> &priors):
    opts_(opts),
    nnet_(nnet),
    compiler_(nnet_, opts.optimize_config),
    log_priors_(priors),
    num_full_minibatches_(0) {
  log_priors_.ApplyLog();
  opts_.CheckAndFixConfigs(nnet_.Modulus());
  KALDI_ASSERT(opts_.minibatch_size >= 1 && opts_.edge_minibatch_size >= 1 &&
               opts_.partial_minibatch_factor < 1.0 &&
               opts_.partial_minibatch_factor >= 0.0);

  ComputeSimpleNnetContext(nnet, &nnet_left_context_, &nnet_right_context_);
  input_dim_ = nnet.InputDim("input");
  ivector_dim_ = std::max<int32>(0, nnet.InputDim("ivector"));
  output_dim_ = nnet.OutputDim("output");
  KALDI_ASSERT(input_dim_ > 0 && output_dim_ > 0);
}

◆ ~NnetBatchComputer()

Definition at line 112 of file nnet-batch-compute.cc.

References KALDI_ASSERT, KALDI_ERR, NnetBatchComputer::mutex_, NnetBatchComputer::no_more_than_n_minibatches_full_, NnetBatchComputer::num_full_minibatches_, NnetBatchComputer::PrintMinibatchStats(), and NnetBatchComputer::tasks_.

NnetBatchComputer::~NnetBatchComputer() {
  PrintMinibatchStats();
  // the destructor shouldn't be called while the mutex is locked; if it is, it
  // likely means the program has already crashed, or it's a programming error.
  if (!mutex_.try_lock())
    KALDI_ERR << "Destructor called while object locked.";
  int32 num_pending_tasks = 0;
  for (auto iter = tasks_.begin(); iter != tasks_.end(); ++iter)
    num_pending_tasks += iter->second.tasks.size();
  if (num_pending_tasks > 0)
    KALDI_ERR << "Tasks are pending but object is being destroyed";
  for (auto iter = no_more_than_n_minibatches_full_.begin();
       iter != no_more_than_n_minibatches_full_.end(); ++iter) {
    std::condition_variable *cond = iter->second;
    // the next call will notify any threads that were waiting on this condition
    // variable -- there shouldn't be any, though, as it would be a programming
    // error, but better to wake them up so we can see any messages they print.
    cond->notify_all();
    delete cond;
  }
  KALDI_ASSERT(num_full_minibatches_ == 0);  // failure would be a coding error.
}

Member Function Documentation

◆ AcceptTask()

void AcceptTask ( NnetInferenceTask *  task,
int32  max_minibatches_full = -1 
)

Accepts a task, meaning the task will be queued.

(Note: the pointer is still owned by the caller.) If max_minibatches_full >= 0, then the calling thread will block until no more than that many full minibatches are waiting to be computed. This is a mechanism to prevent too many requests from piling up in memory.

Definition at line 568 of file nnet-batch-compute.cc.
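
A hedged illustration of the blocking behaviour, reusing the 'computer' and 'tasks' names from the sketch in the detailed description above:

// Submit tasks, but never let more than 2 full minibatches pile up in memory;
// AcceptTask() blocks this producer thread until the computing thread catches up.
for (size_t i = 0; i < tasks.size(); i++)
  computer.AcceptTask(&tasks[i], /*max_minibatches_full=*/2);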

References NnetBatchComputer::GetMinibatchSize(), NnetBatchComputer::mutex_, NnetBatchComputer::no_more_than_n_minibatches_full_, NnetBatchComputer::num_full_minibatches_, NnetBatchComputer::ComputationGroupInfo::tasks, and NnetBatchComputer::tasks_.

Referenced by NnetBatchInference::AcceptInput(), and NnetBatchDecoder::Decode().

void NnetBatchComputer::AcceptTask(NnetInferenceTask *task,
                                   int32 max_minibatches_full) {
  std::unique_lock<std::mutex> lock(mutex_);

  if (max_minibatches_full > 0 && num_full_minibatches_ > max_minibatches_full) {
    std::unordered_map<int32, std::condition_variable*>::iterator
        iter = no_more_than_n_minibatches_full_.find(max_minibatches_full);
    std::condition_variable *cond;
    if (iter != no_more_than_n_minibatches_full_.end()) {
      cond = iter->second;
    } else {
      cond = new std::condition_variable();
      no_more_than_n_minibatches_full_[max_minibatches_full] = cond;
    }
    while (num_full_minibatches_ > max_minibatches_full)
      cond->wait(lock);
  }
  ComputationGroupKey key(*task);
  ComputationGroupInfo &info = tasks_[key];
  info.tasks.push_back(task);
  int32 minibatch_size = GetMinibatchSize(info);
  if (static_cast<int32>(info.tasks.size()) % minibatch_size == 0)
    num_full_minibatches_++;
}

◆ CheckAndFixConfigs()

void CheckAndFixConfigs ( )
private

◆ Compute()

bool Compute ( bool  allow_partial_minibatch)

Does some kind of computation, choosing the highest-priority thing to compute.

It returns true if it did some kind of computation, and false otherwise. This function locks the class, but not for the entire time it's being called: only at the beginning and at the end.

Parameters
[in]  allow_partial_minibatch    If false, then this will only do the computation if a full minibatch is ready; if true, it is allowed to do computation on partial (not-full) minibatches.

Definition at line 593 of file nnet-batch-compute.cc.
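
A sketch of how a background compute thread might drive this function (illustrative only; 'input_finished()' is an assumed caller-side predicate, and the real wrappers NnetBatchInference and NnetBatchDecoder handle termination more carefully):

while (true) {
  // Prefer full minibatches while input is still arriving; once the input is
  // exhausted, allow partial minibatches so the remaining tail gets flushed.
  bool allow_partial = input_finished();
  if (!computer.Compute(allow_partial)) {
    if (allow_partial)
      break;        // input is done and nothing is left to compute
    Sleep(0.001);   // nothing ready yet; avoid busy-waiting
  }
}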

References NnetComputer::AcceptInput(), NnetSimpleComputationOptions::acoustic_scale, CuMatrixBase< Real >::AddVecToRows(), NnetBatchComputer::MinibatchSizeInfo::computation, NnetSimpleComputationOptions::compute_config, Timer::Elapsed(), NnetBatchComputer::FormatInputs(), NnetBatchComputer::FormatOutputs(), NnetBatchComputer::GetHighestPriorityComputation(), NnetComputer::GetOutputDestructive(), rnnlm::i, NnetBatchComputer::log_priors_, NnetBatchComputer::nnet_, NnetBatchComputer::MinibatchSizeInfo::num_done, CuMatrixBase< Real >::NumRows(), NnetBatchComputer::opts_, NnetComputer::Run(), CuMatrixBase< Real >::Scale(), NnetBatchComputer::MinibatchSizeInfo::seconds_taken, kaldi::SynchronizeGpu(), and NnetBatchComputer::MinibatchSizeInfo::tot_num_tasks.

Referenced by NnetBatchInference::Compute(), and NnetBatchDecoder::Compute().

bool NnetBatchComputer::Compute(bool allow_partial_minibatch) {
  int32 minibatch_size;
  std::vector<NnetInferenceTask*> tasks;
  MinibatchSizeInfo *minfo =
      GetHighestPriorityComputation(allow_partial_minibatch,
                                    &minibatch_size,
                                    &tasks);
  if (minfo == NULL)
    return false;

  Timer tim;
  Nnet *nnet_to_update = NULL;  // we're not doing any update
  NnetComputer computer(opts_.compute_config, *(minfo->computation),
                        nnet_, nnet_to_update);

  CuMatrix<BaseFloat> input;
  CuMatrix<BaseFloat> ivector;
  FormatInputs(minibatch_size, tasks, &input, &ivector);
  computer.AcceptInput("input", &input);
  if (ivector.NumRows() != 0)
    computer.AcceptInput("ivector", &ivector);
  computer.Run();
  CuMatrix<BaseFloat> output;
  computer.GetOutputDestructive("output", &output);
  if (log_priors_.Dim() != 0) {
    output.AddVecToRows(-1.0, log_priors_);
  }
  output.Scale(opts_.acoustic_scale);
  FormatOutputs(output, tasks);

  // Update the stats, for diagnostics.
  minfo->num_done++;
  minfo->tot_num_tasks += static_cast<int64>(tasks.size());
  minfo->seconds_taken += tim.Elapsed();

  SynchronizeGpu();

  for (size_t i = 0; i < tasks.size(); i++)
    tasks[i]->semaphore.Signal();

  return true;
}

◆ FormatInputs()

void FormatInputs ( int32  minibatch_size,
const std::vector< NnetInferenceTask *> &  tasks,
CuMatrix< BaseFloat > *  input,
CuMatrix< BaseFloat > *  ivector 
)
private

Formats the inputs to the computation and transfers them to the GPU.

Parameters
[in]   minibatch_size   The number of parallel sequences we're doing this computation for. This will be more than tasks.size() in some cases.
[in]   tasks            The tasks we're doing the computation for. The input comes from here.
[out]  input            The main feature input to the computation is put into here.
[out]  ivector          If we're using i-vectors, the i-vectors are put here.

Definition at line 346 of file nnet-batch-compute.cc.

References CuMatrixBase< Real >::CopyFromMat(), CuVectorBase< Real >::Data(), CuMatrixBase< Real >::Data(), kaldi::GetVerboseLevel(), KALDI_ASSERT, kaldi::kUndefined, rnnlm::n, CuMatrix< Real >::Resize(), CuMatrixBase< Real >::Row(), CuMatrixBase< Real >::RowRange(), and CuMatrixBase< Real >::Stride().

Referenced by NnetBatchComputer::Compute().

void NnetBatchComputer::FormatInputs(int32 minibatch_size,
                                     const std::vector<NnetInferenceTask*> &tasks,
                                     CuMatrix<BaseFloat> *input,
                                     CuMatrix<BaseFloat> *ivector) {
  int32 num_input_frames = tasks[0]->input.NumRows(),
      input_dim = tasks[0]->input.NumCols(),
      ivector_dim = tasks[0]->ivector.Dim(),
      num_tasks = tasks.size();
  KALDI_ASSERT(num_tasks > 0 && num_tasks <= minibatch_size);

  // destination matrix
  input->Resize(minibatch_size * num_input_frames, input_dim,
                kUndefined);

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {

    std::vector<const BaseFloat*> inputs(num_tasks);
    std::vector<BaseFloat*> outputs(num_tasks);
    std::vector<int32_t> ldi(num_tasks), ldo(num_tasks);
    std::vector<int32_t> num_rows(num_tasks), num_cols(num_tasks);

    // compute matrix descriptions for each copy
    for (int32 n = 0; n < num_tasks; n++) {
      const CuMatrix<BaseFloat> &input_mat = tasks[n]->input;
      CuSubMatrix<BaseFloat> output_mat = input->RowRange(
          n * num_input_frames, num_input_frames);

      // create matrix batch description arrays
      num_rows[n] = num_input_frames;
      num_cols[n] = input_dim;
      outputs[n] = output_mat.Data();
      inputs[n] = input_mat.Data();
      ldo[n] = output_mat.Stride();
      ldi[n] = input_mat.Stride();
    }

    // execute batched copy
    cuda_batched_copy_mats(num_tasks, &num_rows[0], &num_cols[0], &inputs[0],
                           &ldi[0], &outputs[0], &ldo[0]);

  } else
#endif
  {
    for (int32 n = 0; n < num_tasks; n++) {
      CuSubMatrix<BaseFloat> input_part(*input,
                                        n * num_input_frames, num_input_frames,
                                        0, input_dim);
      input_part.CopyFromMat(tasks[n]->input);
    }
  }

  if (GetVerboseLevel() >= 2) {
    if (num_tasks < minibatch_size) {
      // The following will make things easier to debug if something fails, but
      // shouldn't be strictly necessary.
      input->RowRange(num_tasks * num_input_frames,
                      (minibatch_size - num_tasks) * num_input_frames).SetZero();
    }
  }

  if (ivector_dim != 0) {
    ivector->Resize(minibatch_size, ivector_dim, kUndefined);

#if HAVE_CUDA == 1
    if (CuDevice::Instantiate().Enabled()) {

      // using the batched matrix copy routine for this.  This isn't
      // extremely efficient but the kernel takes a minimal amount of
      // time so making a batched vector copy is not worth the effort.
      std::vector<const BaseFloat*> inputs(num_tasks);
      std::vector<BaseFloat*> outputs(num_tasks);
      std::vector<int32_t> ldi(num_tasks), ldo(num_tasks);
      std::vector<int32_t> num_rows(num_tasks), num_cols(num_tasks);

      // compute source pointers for each input
      for (int32 n = 0; n < num_tasks; n++) {
        const CuVector<BaseFloat> &input_vec = tasks[n]->ivector;
        CuSubVector<BaseFloat> output_vec = ivector->Row(n);
        // create matrix batch description arrays
        num_rows[n] = 1;
        num_cols[n] = ivector_dim;
        outputs[n] = output_vec.Data();
        inputs[n] = input_vec.Data();
        ldo[n] = 1;
        ldi[n] = 1;
      }

      // execute batched copy
      cuda_batched_copy_mats(num_tasks, &num_rows[0], &num_cols[0], &inputs[0],
                             &ldi[0], &outputs[0], &ldo[0]);

    } else
#endif
    {
      for (int32 n = 0; n < num_tasks; n++) {
        ivector->Row(n).CopyFromVec(tasks[n]->ivector);
      }
    }

    if (GetVerboseLevel() >= 2) {
      if (num_tasks < minibatch_size) {
        // The following will make things easier to debug if something fails, but
        // shouldn't be strictly necessary.
        ivector->RowRange(num_tasks, minibatch_size - num_tasks).SetZero();
      }
    }
  }
}

◆ FormatOutputs()

void FormatOutputs ( const CuMatrix< BaseFloat > &  output,
const std::vector< NnetInferenceTask *> &  tasks 
)
private

Definition at line 459 of file nnet-batch-compute.cc.

References CuMatrixBase< Real >::Data(), KALDI_ASSERT, kaldi::kUndefined, rnnlm::n, NnetInferenceTask::num_initial_unused_output_frames, NnetInferenceTask::num_used_output_frames, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), NnetInferenceTask::output, NnetInferenceTask::output_cpu, NnetInferenceTask::output_to_cpu, Matrix< Real >::Resize(), MatrixBase< Real >::RowRange(), CuMatrixBase< Real >::RowRange(), CuMatrixBase< Real >::Stride(), and kaldi::SynchronizeGpu().

Referenced by NnetBatchComputer::Compute().

void NnetBatchComputer::FormatOutputs(const CuMatrix<BaseFloat> &output,
                                      const std::vector<NnetInferenceTask*> &tasks) {
  KALDI_ASSERT(!tasks.empty());
  int32 num_output_frames = tasks[0]->num_output_frames,
      output_dim = output.NumCols(),
      num_tasks = tasks.size();
  bool did_output_to_gpu = false;

  // We don't bother zeroing frames of the output that are unused, but you could
  // un-comment the commented lines of code below to do so and add equivalent
  // calls to the cuda version.

#if HAVE_CUDA == 1
  if (CuDevice::Instantiate().Enabled()) {

    std::vector<const BaseFloat*> inputs(num_tasks);
    std::vector<BaseFloat*> outputs(num_tasks);
    std::vector<int32_t> ldi(num_tasks), ldo(num_tasks);
    std::vector<int32_t> num_rows(num_tasks), num_cols(num_tasks);

    int b = 0;  // batch counter
    for (int32 n = 0; n < num_tasks; n++) {
      NnetInferenceTask *task = tasks[n];

      int32 left_unused = task->num_initial_unused_output_frames,
          used = task->num_used_output_frames;
      // int32 right_unused = num_output_frames - used - left_unused;

      // TODO: do we really expect different tasks to output to CPU or GPU?
      // This adds a bit of code complexity.  Perhaps output_to_cpu should
      // be a property of the batch computer and not of the tasks.
      if (task->output_to_cpu) {
        task->output_cpu.Resize(num_output_frames, output_dim,
                                kUndefined);
        // if (left_unused > 0)
        //   task->output_cpu.RowRange(0, left_unused).SetZero();
        task->output_cpu.RowRange(left_unused, used).CopyFromMat(
            output.RowRange(n * num_output_frames + left_unused, used));
        // if (right_unused > 0)
        //   task->output_cpu.RowRange(
        //       0, left_unused + used, right_unused).SetZero();

      } else {
        did_output_to_gpu = true;
        task->output.Resize(num_output_frames, output_dim,
                            kUndefined);

        CuSubMatrix<BaseFloat> output_mat = task->output.RowRange(
            left_unused, used);
        const CuSubMatrix<BaseFloat> input_mat = output.RowRange(
            n * num_output_frames + left_unused, used);

        // create matrix batch description arrays
        num_rows[b] = output_mat.NumRows();
        num_cols[b] = output_mat.NumCols();
        outputs[b] = output_mat.Data();
        inputs[b] = input_mat.Data();
        ldo[b] = output_mat.Stride();
        ldi[b] = input_mat.Stride();
        b++;  // increase batch count
      }
    }

    // execute batched copy
    cuda_batched_copy_mats(b, &num_rows[0], &num_cols[0], &inputs[0], &ldi[0],
                           &outputs[0], &ldo[0]);

  } else
#endif
  {
    // TODO: I don't think all of these paths are actually possible; we should
    // simplify this.  Is it possible to output to GPU with HAVE_CUDA == 0 or
    // when the device is disabled?
    for (int32 n = 0; n < num_tasks; n++) {
      NnetInferenceTask *task = tasks[n];

      int32 left_unused = task->num_initial_unused_output_frames,
          used = task->num_used_output_frames;
      // int32 right_unused = num_output_frames - used - left_unused;

      if (task->output_to_cpu) {
        task->output_cpu.Resize(num_output_frames, output_dim,
                                kUndefined);
        // if (left_unused > 0)
        //   task->output_cpu.RowRange(0, left_unused).SetZero();
        task->output_cpu.RowRange(left_unused, used).CopyFromMat(
            output.RowRange(n * num_output_frames + left_unused, used));
        // if (right_unused > 0)
        //   task->output_cpu.RowRange(0, left_unused + used, right_unused).SetZero();
      } else {
        did_output_to_gpu = true;
        task->output.Resize(num_output_frames, output_dim,
                            kUndefined);
        // if (left_unused > 0)
        //   task->output.RowRange(0, left_unused).SetZero();
        task->output.RowRange(left_unused, used).CopyFromMat(
            output.RowRange(n * num_output_frames + left_unused, used));
        // if (right_unused > 0)
        //   task->output.RowRange(0, left_unused + used, right_unused).SetZero();
      }
    }
  }
  // The output of this function will likely be consumed by another thread.
  // The following call will make sure the relevant kernels complete before
  // any kernels from the other thread use the output.
  if (did_output_to_gpu)
    SynchronizeGpu();
}

◆ GetActualMinibatchSize()

int32 GetActualMinibatchSize ( const ComputationGroupInfo &  info) const
private

Definition at line 242 of file nnet-batch-compute.cc.

References NnetBatchComputer::GetMinibatchSize(), KALDI_ASSERT, NnetBatchComputer::opts_, NnetBatchComputerOptions::partial_minibatch_factor, and NnetBatchComputer::ComputationGroupInfo::tasks.

Referenced by NnetBatchComputer::GetHighestPriorityComputation().

int32 NnetBatchComputer::GetActualMinibatchSize(
    const ComputationGroupInfo &info) const {
  KALDI_ASSERT(!info.tasks.empty());
  int32 num_tasks = info.tasks.size(),
      this_minibatch_size = GetMinibatchSize(info);
  KALDI_ASSERT(num_tasks > 0);
  while (num_tasks <
         int32(opts_.partial_minibatch_factor * this_minibatch_size))
    this_minibatch_size *= opts_.partial_minibatch_factor;
  return int32(this_minibatch_size);
}

◆ GetComputation()

std::shared_ptr< const NnetComputation > GetComputation ( const ComputationGroupInfo &  info,
int32  minibatch_size 
)
private

Definition at line 255 of file nnet-batch-compute.cc.

References CachingOptimizingCompiler::Compile(), NnetBatchComputer::compiler_, NnetBatchComputer::GetComputationRequest(), KALDI_ASSERT, and NnetBatchComputer::ComputationGroupInfo::tasks.

Referenced by NnetBatchComputer::GetHighestPriorityComputation().

std::shared_ptr<const NnetComputation> NnetBatchComputer::GetComputation(
    const ComputationGroupInfo &info, int32 minibatch_size) {
  KALDI_ASSERT(!info.tasks.empty());
  // note: all the tasks will have the same structure, in the respects that
  // would affect the computation.
  NnetInferenceTask *example_task = info.tasks[0];
  ComputationRequest request;
  GetComputationRequest(*example_task, minibatch_size, &request);
  return compiler_.Compile(request);
}

◆ GetComputationRequest()

void GetComputationRequest ( const NnetInferenceTask &  task,
int32  minibatch_size,
ComputationRequest *  request 
)
static private

Definition at line 312 of file nnet-batch-compute.cc.

References NnetInferenceTask::first_input_t, NnetInferenceTask::input, ComputationRequest::inputs, NnetInferenceTask::ivector, rnnlm::n, ComputationRequest::need_model_derivative, NnetInferenceTask::num_output_frames, NnetInferenceTask::output_t_stride, ComputationRequest::outputs, and ComputationRequest::store_component_stats.

Referenced by NnetBatchComputer::GetComputation().

void NnetBatchComputer::GetComputationRequest(
    const NnetInferenceTask &task, int32 minibatch_size,
    ComputationRequest *request) {
  request->need_model_derivative = false;
  request->store_component_stats = false;
  request->inputs.reserve(2);

  int32 num_input_frames = task.input.NumRows(),
      first_input_t = task.first_input_t,
      num_output_frames = task.num_output_frames,
      output_t_stride = task.output_t_stride;
  bool has_ivector = (task.ivector.Dim() != 0);

  std::vector<Index> input_indexes, ivector_indexes, output_indexes;
  input_indexes.reserve(minibatch_size * num_input_frames);
  output_indexes.reserve(minibatch_size * num_output_frames);
  if (has_ivector)
    ivector_indexes.reserve(minibatch_size);

  for (int32 n = 0; n < minibatch_size; n++) {
    for (int32 t = first_input_t; t < first_input_t + num_input_frames; t++)
      input_indexes.push_back(Index(n, t, 0));
    if (has_ivector)
      ivector_indexes.push_back(Index(n, 0, 0));
    for (int32 t = 0; t < num_output_frames; t++)
      output_indexes.push_back(Index(n, t * output_t_stride, 0));
  }
  request->inputs.push_back(IoSpecification("input", input_indexes));
  if (has_ivector)
    request->inputs.push_back(IoSpecification("ivector", ivector_indexes));
  request->outputs.push_back(IoSpecification("output", output_indexes));
}

◆ GetHighestPriorityComputation()

NnetBatchComputer::MinibatchSizeInfo * GetHighestPriorityComputation ( bool  allow_partial_minibatch,
int32 *  minibatch_size,
std::vector< NnetInferenceTask *> *  tasks 
)
private

This function finds and returns the computation corresponding to the highest-priority group of tasks.

Parameters
[in]   allow_partial_minibatch   If this is true, then this function may return a computation corresponding to a partial minibatch, i.e. the minibatch size in the computation may be less than the minibatch size in the options class, and/or the number of tasks may not be as many as the minibatch size in the computation.
[out]  minibatch_size            If this function returns non-NULL, then this will be set to the minibatch size that the returned computation expects. tasks->size() may be less than this, in cases where the minibatch was not 'full'.
[out]  tasks                     The tasks which we'll be doing the computation for in this minibatch are put here (and removed from tasks_), in cases where this function returns non-NULL.
Returns
This function returns a pointer to the appropriate 'MinibatchSizeInfo' object corresponding to the computation that we'll be doing for this minibatch, or NULL if there is nothing to compute.

Definition at line 136 of file nnet-batch-compute.cc.

References NnetBatchComputer::MinibatchSizeInfo::computation, NnetBatchComputer::GetActualMinibatchSize(), NnetBatchComputer::GetComputation(), NnetBatchComputer::GetHighestPriorityTasks(), NnetBatchComputer::GetPriority(), NnetBatchComputer::ComputationGroupInfo::minibatch_info, NnetBatchComputer::mutex_, and NnetBatchComputer::tasks_.

Referenced by NnetBatchComputer::Compute().

NnetBatchComputer::MinibatchSizeInfo*
NnetBatchComputer::GetHighestPriorityComputation(
    bool allow_partial_minibatch,
    int32 *minibatch_size_out,
    std::vector<NnetInferenceTask*> *tasks) {
  tasks->clear();
  std::unique_lock<std::mutex> lock(mutex_);
  MapType::iterator iter = tasks_.begin(), end = tasks_.end(),
      best_iter = tasks_.end();
  double highest_priority = -std::numeric_limits<double>::infinity();

  for (; iter != end; ++iter) {
    ComputationGroupInfo &info = iter->second;
    double this_priority = GetPriority(allow_partial_minibatch, info);
    if (this_priority > highest_priority) {
      highest_priority = this_priority;
      best_iter = iter;
    }
  }
  if (best_iter == tasks_.end()) {
    // either allow_partial_minibatch == false and there were no full
    // minibatches, or there were no pending tasks at all.
    return NULL;
  }
  ComputationGroupInfo &info = best_iter->second;
  int32 actual_minibatch_size = GetActualMinibatchSize(info);
  *minibatch_size_out = actual_minibatch_size;
  MinibatchSizeInfo *minfo = &(info.minibatch_info[actual_minibatch_size]);
  if (minfo->computation == NULL)
    minfo->computation = GetComputation(info, actual_minibatch_size);
  GetHighestPriorityTasks(actual_minibatch_size, &info, tasks);
  return minfo;
}

◆ GetHighestPriorityTasks()

void GetHighestPriorityTasks ( int32  num_tasks,
ComputationGroupInfo *  info,
std::vector< NnetInferenceTask *> *  tasks 
)
private

Definition at line 170 of file nnet-batch-compute.cc.

References NnetBatchComputer::GetMinibatchSize(), rnnlm::i, KALDI_ASSERT, NnetBatchComputer::no_more_than_n_minibatches_full_, NnetBatchComputer::num_full_minibatches_, and NnetBatchComputer::ComputationGroupInfo::tasks.

Referenced by NnetBatchComputer::GetHighestPriorityComputation().

void NnetBatchComputer::GetHighestPriorityTasks(
    int32 num_tasks_needed,
    ComputationGroupInfo *info,
    std::vector<NnetInferenceTask*> *tasks) {
  int32 num_tasks_present = info->tasks.size(),
      minibatch_size = GetMinibatchSize(*info);
  KALDI_ASSERT(tasks->empty());
  if (num_tasks_needed >= num_tasks_present) {
    tasks->swap(info->tasks);
  } else {
    int32 num_tasks_not_needed = num_tasks_present - num_tasks_needed;
    // We don't sort the tasks with a comparator that dereferences the pointers,
    // because the priorities can change asynchronously, and we're concerned that
    // something weird might happen in the sorting if the things it's comparing
    // are changing.
    std::vector<std::pair<double, NnetInferenceTask*> > pairs(num_tasks_present);
    for (int32 i = 0; i < num_tasks_present; i++) {
      pairs[i].first = info->tasks[i]->priority;
      pairs[i].second = info->tasks[i];
    }
    std::nth_element(pairs.begin(), pairs.begin() + num_tasks_not_needed,
                     pairs.end());

    // The lowest-priority 'num_tasks_not_needed' stay in the 'info' struct.
    info->tasks.clear();
    for (int32 i = 0; i < num_tasks_not_needed; i++)
      info->tasks.push_back(pairs[i].second);
    // The highest-priority 'num_tasks_needed' tasks go to the output 'tasks'
    // array.
    for (int32 i = num_tasks_not_needed; i < num_tasks_present; i++)
      tasks->push_back(pairs[i].second);
    // The following assertion checks that the is_edge and is_irregular values
    // are the same for the entire minibatch, which they should always be.
    KALDI_ASSERT(GetMinibatchSize(*info) == minibatch_size);
  }

  {
    // This block updates num_full_minibatches_ and notifies threads waiting on
    // any related condition variable.
    int32 new_num_tasks_present = info->tasks.size(),
        full_minibatch_reduction =
        (num_tasks_present / minibatch_size) -
        (new_num_tasks_present / minibatch_size);
    for (int32 i = 0; i < full_minibatch_reduction; i++) {
      num_full_minibatches_--;
      KALDI_ASSERT(num_full_minibatches_ >= 0);
      std::unordered_map<int32, std::condition_variable*>::const_iterator
          iter = no_more_than_n_minibatches_full_.find(num_full_minibatches_);
      if (iter != no_more_than_n_minibatches_full_.end()) {
        std::condition_variable *cond = iter->second;
        cond->notify_all();
      }
    }
  }
}

◆ GetMinibatchSize()

int32 GetMinibatchSize ( const ComputationGroupInfo &  info) const
inline private

Definition at line 227 of file nnet-batch-compute.cc.

References NnetBatchComputerOptions::edge_minibatch_size, NnetInferenceTask::is_edge, NnetInferenceTask::is_irregular, NnetBatchComputerOptions::minibatch_size, NnetBatchComputer::opts_, and NnetBatchComputer::ComputationGroupInfo::tasks.

Referenced by NnetBatchComputer::AcceptTask(), NnetBatchComputer::GetActualMinibatchSize(), NnetBatchComputer::GetHighestPriorityTasks(), and NnetBatchComputer::GetPriority().

int32 NnetBatchComputer::GetMinibatchSize(
    const ComputationGroupInfo &info) const {
  if (info.tasks.empty()) {
    return opts_.minibatch_size;  // actually it shouldn't matter what we return
                                  // in this case.
  }
  const NnetInferenceTask &task = *(info.tasks[0]);
  if (task.is_irregular)
    return 1;
  else if (task.is_edge)
    return opts_.edge_minibatch_size;
  else
    return opts_.minibatch_size;
}

◆ GetOptions()

const NnetBatchComputerOptions& GetOptions ( )
inline

◆ GetPriority()

double GetPriority ( bool  allow_partial_minibatch,
const ComputationGroupInfo &  info 
) const
inline private

Definition at line 268 of file nnet-batch-compute.cc.

References NnetBatchComputer::GetMinibatchSize(), rnnlm::i, and NnetBatchComputer::ComputationGroupInfo::tasks.

Referenced by NnetBatchComputer::GetHighestPriorityComputation().

double NnetBatchComputer::GetPriority(bool allow_partial_minibatch,
                                      const ComputationGroupInfo &info) const {
  if (info.tasks.empty())
    return -std::numeric_limits<double>::infinity();
  int32 this_minibatch_size = GetMinibatchSize(info);
  int32 num_tasks = info.tasks.size();

  if (!allow_partial_minibatch && num_tasks < this_minibatch_size)
    return -std::numeric_limits<double>::infinity();

  // penalty_for_not_full will be negative if the minibatch is not full, up to a
  // maximum of 10.  The 10 is a heuristic; it could be changed.
  // Note: the penalty is effectively infinity if allow_partial_minibatch == false;
  // see the 'return' above.
  double proportion_full = std::min<int32>(num_tasks, this_minibatch_size) /
      double(this_minibatch_size),
      penalty_for_not_full = 10.0 * (proportion_full - 1.0),
      task_priority_sum = 0.0;

  if (num_tasks > this_minibatch_size) {
    // Get the average of the priorities of the highest-priority tasks (no more
    // than 'minibatch_size' of them).
    std::vector<double> priorities;
    priorities.resize(num_tasks);
    for (int32 i = 0; i < num_tasks; i++)
      priorities[i] = info.tasks[i]->priority;
    // sort from greatest to least.
    std::nth_element(priorities.begin(),
                     priorities.begin() + this_minibatch_size,
                     priorities.end(),
                     std::greater<double>());
    for (int32 i = 0; i < this_minibatch_size; i++)
      task_priority_sum += priorities[i];
    return penalty_for_not_full + task_priority_sum / this_minibatch_size;
  } else {
    for (int32 i = 0; i < num_tasks; i++)
      task_priority_sum += info.tasks[i]->priority;
    return penalty_for_not_full + task_priority_sum / num_tasks;
  }
}

◆ KALDI_DISALLOW_COPY_AND_ASSIGN()

KALDI_DISALLOW_COPY_AND_ASSIGN ( NnetBatchComputer  )
private

◆ NumFullPendingMinibatches()

int32 NumFullPendingMinibatches ( ) const
inline

Returns the number of full minibatches waiting to be computed.

Definition at line 233 of file nnet-batch-compute.h.


◆ PrintMinibatchStats()

void PrintMinibatchStats ( )
private

Definition at line 53 of file nnet-batch-compute.cc.

References rnnlm::i, KALDI_LOG, NnetBatchComputer::MinibatchSizeInfo::num_done, NnetBatchComputer::ComputationGroupKey::num_input_frames, NnetBatchComputer::ComputationGroupKey::num_output_frames, operator<(), NnetBatchComputer::MinibatchSizeInfo::seconds_taken, NnetBatchComputer::tasks_, and NnetBatchComputer::MinibatchSizeInfo::tot_num_tasks.

Referenced by NnetBatchComputer::~NnetBatchComputer().

void NnetBatchComputer::PrintMinibatchStats() {
  int32 max_stats_to_print = 10;
  int64 tot_tasks = 0, tot_minibatches = 0;
  double tot_time = 0.0;
  std::ostringstream os;
  struct MinibatchStats {
    int32 num_frames_out;
    int32 num_frames_in;
    int32 minibatch_size;
    int32 num_done;
    int32 percent_full;
    BaseFloat seconds_taken;

    bool operator < (const MinibatchStats &other) const {
      return seconds_taken > other.seconds_taken;  // sort from most to least time.
    }
  };
  std::vector<MinibatchStats> all_stats;
  os << "Minibatch stats: seconds-taken,frames-in:frames-out*minibatch-size=num-done(percent-full%) ";

  for (MapType::const_iterator iter = tasks_.begin();
       iter != tasks_.end(); ++iter) {
    for (std::map<int32, MinibatchSizeInfo>::const_iterator
             miter = iter->second.minibatch_info.begin();
         miter != iter->second.minibatch_info.end(); ++miter) {
      const ComputationGroupKey &key = iter->first;
      const MinibatchSizeInfo &minfo = miter->second;
      MinibatchStats stats;
      stats.num_frames_in = key.num_input_frames;
      stats.num_frames_out = key.num_output_frames;
      stats.minibatch_size = miter->first;
      stats.num_done = minfo.num_done;
      stats.seconds_taken = minfo.seconds_taken;

      tot_tasks += minfo.tot_num_tasks;
      tot_minibatches += minfo.num_done;
      tot_time += minfo.seconds_taken;
      stats.percent_full = int32(minfo.tot_num_tasks * 100.0 /
                                 (stats.minibatch_size * stats.num_done));
      all_stats.push_back(stats);
    }
  }

  std::sort(all_stats.begin(), all_stats.end());
  os << std::fixed << std::setprecision(2);
  int32 num_stats = all_stats.size();
  for (int32 i = 0; i < std::min<int32>(num_stats, max_stats_to_print); i++) {
    MinibatchStats &stats = all_stats[i];
    os << stats.seconds_taken << ',' << stats.num_frames_in << ':'
       << stats.num_frames_out << '*' << stats.minibatch_size
       << '=' << stats.num_done << '(' << stats.percent_full << "%) ";
  }
  if (num_stats > max_stats_to_print)
    os << "...";
  KALDI_LOG << os.str();
  KALDI_LOG << "Did " << tot_tasks << " tasks in " << tot_minibatches
            << " minibatches, taking " << tot_time << " seconds.";
}

◆ SplitUtteranceIntoTasks() [1/2]

void SplitUtteranceIntoTasks ( bool  output_to_cpu,
const Matrix< BaseFloat > &  input,
const Vector< BaseFloat > *  ivector,
const Matrix< BaseFloat > *  online_ivectors,
int32  online_ivector_period,
std::vector< NnetInferenceTask > *  tasks 
)

Split a single utterance into a list of separate tasks which can then be given to this class by AcceptTask().

Parameters
[in]   output_to_cpu           Will become the 'output_to_cpu' member of the output tasks; this controls whether the computation code should transfer the outputs to CPU (which is to save GPU memory).
[in]   ivector                 If non-NULL, an i-vector for the whole utterance is expected to be supplied here (and online_ivectors should be NULL). This is relevant if you estimate i-vectors per speaker instead of online.
[in]   online_ivectors         Matrix of i-vectors, one every 'online_ivector_period' frames.
[in]   online_ivector_period   Affects the interpretation of 'online_ivectors'.
[out]  tasks                   The tasks created will be output to here. The priorities will be set to zero; setting them to a meaningful value is up to the caller.

Definition at line 843 of file nnet-batch-compute.cc.
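
An illustrative sketch (assumed names: 'computer', a feature matrix 'feats', and a per-speaker i-vector 'spk_ivector' obtained elsewhere):

std::vector<NnetInferenceTask> tasks;
computer.SplitUtteranceIntoTasks(/*output_to_cpu=*/true, feats, &spk_ivector,
                                 /*online_ivectors=*/NULL,
                                 /*online_ivector_period=*/0, &tasks);
for (size_t i = 0; i < tasks.size(); i++) {
  tasks[i].priority = -static_cast<double>(i);  // e.g. favor earlier chunks
  computer.AcceptTask(&tasks[i]);
}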

References CuMatrixBase< Real >::CopyFromMat(), CuVectorBase< Real >::CopyFromVec(), VectorBase< Real >::Dim(), kaldi::kUndefined, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), CuVector< Real >::Resize(), and CuMatrix< Real >::Resize().

Referenced by NnetBatchInference::AcceptInput(), and NnetBatchDecoder::Decode().

void NnetBatchComputer::SplitUtteranceIntoTasks(
    bool output_to_cpu,
    const Matrix<BaseFloat> &input,
    const Vector<BaseFloat> *h_ivector,
    const Matrix<BaseFloat> *h_online_ivectors,
    int32 online_ivector_period,
    std::vector<NnetInferenceTask> *tasks) {

  // The other overload expects its inputs to be in device memory, so create
  // temporary device arrays and copy the host inputs into them.
  CuMatrix<BaseFloat> cu_input(input);
  CuVector<BaseFloat> cu_ivector, *ivector = NULL;
  CuMatrix<BaseFloat> cu_online_ivectors, *online_ivectors = NULL;

  if (h_ivector != NULL) {
    cu_ivector.Resize(h_ivector->Dim(), kUndefined);
    cu_ivector.CopyFromVec(*h_ivector);
    ivector = &cu_ivector;
  }
  if (h_online_ivectors != NULL) {
    cu_online_ivectors.Resize(h_online_ivectors->NumRows(),
                              h_online_ivectors->NumCols(), kUndefined);
    cu_online_ivectors.CopyFromMat(*h_online_ivectors);
    online_ivectors = &cu_online_ivectors;
  }

  SplitUtteranceIntoTasks(output_to_cpu, cu_input, ivector,
                          online_ivectors, online_ivector_period, tasks);
}

◆ SplitUtteranceIntoTasks() [2/2]

void SplitUtteranceIntoTasks ( bool  output_to_cpu,
const CuMatrix< BaseFloat > &  input,
const CuVector< BaseFloat > *  ivector,
const CuMatrix< BaseFloat > *  online_ivectors,
int32  online_ivector_period,
std::vector< NnetInferenceTask > *  tasks 
)

Definition at line 873 of file nnet-batch-compute.cc.

References kaldi::nnet3::utterance_splitting::AddOnlineIvectorsToTasks(), CuVectorBase< Real >::Data(), CuVectorBase< Real >::Dim(), NnetSimpleComputationOptions::frame_subsampling_factor, NnetSimpleComputationOptions::frames_per_chunk, kaldi::nnet3::utterance_splitting::GetOutputFrameInfoForTasks(), rnnlm::i, NnetBatchComputer::input_dim_, NnetBatchComputer::ivector_dim_, KALDI_ASSERT, KALDI_ERR, kaldi::kUndefined, NnetBatchComputer::nnet_left_context_, NnetBatchComputer::nnet_right_context_, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), NnetBatchComputer::opts_, CuVector< Real >::Resize(), and kaldi::nnet3::utterance_splitting::SplitInputToTasks().

void NnetBatchComputer::SplitUtteranceIntoTasks(
    bool output_to_cpu,
    const CuMatrix<BaseFloat> &input,
    const CuVector<BaseFloat> *ivector,
    const CuMatrix<BaseFloat> *online_ivectors,
    int32 online_ivector_period,
    std::vector<NnetInferenceTask> *tasks) {
  using namespace utterance_splitting;

  { // This block does some checking.
    if (input.NumCols() != input_dim_) {
      KALDI_ERR << "Input features did not have expected dimension: expected "
                << input_dim_ << ", got " << input.NumCols();
    }
    int32 ivector_dim = (ivector != NULL ? ivector->Dim() :
                         (online_ivectors != NULL ?
                          online_ivectors->NumCols() : 0));
    if (ivector_dim_ != 0 && ivector_dim == 0)
      KALDI_ERR << "Model expects i-vectors but none were supplied";
    else if (ivector_dim_ == 0 && ivector_dim != 0)
      KALDI_ERR << "You supplied i-vectors but model does not expect them.";
    else if (ivector_dim != ivector_dim_)
      KALDI_ERR << "I-vector dimensions mismatch: model expects "
                << ivector_dim_ << ", you supplied " << ivector_dim;
  }

  int32 num_input_frames = input.NumRows(),
      f = opts_.frame_subsampling_factor,
      num_subsampled_frames = (num_input_frames + f - 1) / f,
      num_subsampled_frames_per_chunk = opts_.frames_per_chunk / f;

  GetOutputFrameInfoForTasks(opts_, num_subsampled_frames,
                             num_subsampled_frames_per_chunk,
                             tasks);

  SplitInputToTasks(opts_, nnet_left_context_, nnet_right_context_,
                    input, tasks);

  if (ivector != NULL) {
    KALDI_ASSERT(online_ivectors == NULL);

#if HAVE_CUDA == 1
    if (CuDevice::Instantiate().Enabled()) {
      int32_t num_tasks = tasks->size();

      std::vector<const BaseFloat*> inputs(num_tasks);
      std::vector<BaseFloat*> outputs(num_tasks);
      std::vector<int32_t> ldi(num_tasks), ldo(num_tasks);
      std::vector<int32_t> num_rows(num_tasks), num_cols(num_tasks);

      int b = 0;  // batch counter

      for (size_t i = 0; i < tasks->size(); i++) {
        CuVector<BaseFloat> &output_vec = (*tasks)[i].ivector;
        const CuVector<BaseFloat> &input_vec = *ivector;

        output_vec.Resize(input_vec.Dim(), kUndefined);

        // create matrix batch description arrays
        num_rows[b] = 1;
        num_cols[b] = output_vec.Dim();
        outputs[b] = output_vec.Data();
        inputs[b] = input_vec.Data();
        ldo[b] = 0;
        ldi[b] = 0;
        b++;  // increase batch count
      }

      // execute batched copy
      cuda_batched_copy_mats(b, &num_rows[0], &num_cols[0], &inputs[0], &ldi[0],
                             &outputs[0], &ldo[0]);
    } else
#endif
    {
      for (size_t i = 0; i < tasks->size(); i++)
        (*tasks)[i].ivector = *ivector;
    }

  } else if (online_ivectors != NULL) {
    AddOnlineIvectorsToTasks(opts_, *online_ivectors,
                             online_ivector_period, tasks);
  }

  for (size_t i = 0; i < tasks->size(); i++) {
    (*tasks)[i].output_to_cpu = output_to_cpu;
    // The priority will be set by the user; this just avoids undefined
    // behavior.
    (*tasks)[i].priority = 0.0;
  }
}

Member Data Documentation

◆ compiler_

CachingOptimizingCompiler compiler_
private

Definition at line 462 of file nnet-batch-compute.h.

Referenced by NnetBatchComputer::GetComputation().

◆ input_dim_

int32 input_dim_
private

◆ ivector_dim_

int32 ivector_dim_
private

◆ log_priors_

CuVector<BaseFloat> log_priors_
private

◆ mutex_

std::mutex mutex_
private

◆ nnet_

const Nnet& nnet_
private

◆ nnet_left_context_

int32 nnet_left_context_
private

◆ nnet_right_context_

int32 nnet_right_context_
private

◆ no_more_than_n_minibatches_full_

std::unordered_map<int32, std::condition_variable*> no_more_than_n_minibatches_full_
private

◆ num_full_minibatches_

int32 num_full_minibatches_
private

◆ opts_

NnetBatchComputerOptions opts_
private

◆ output_dim_

int32 output_dim_
private

Definition at line 492 of file nnet-batch-compute.h.

Referenced by NnetBatchComputer::NnetBatchComputer().

◆ tasks_

MapType tasks_
private


The documentation for this class was generated from the following files:
nnet-batch-compute.h
nnet-batch-compute.cc