kaldi::nnet3::utterance_splitting Namespace Reference

This namespace contains things needed for the implementation of the function NnetBatchComputer::SplitUtteranceIntoTasks().

Functions

void GetOutputFrameInfoForTasks (const NnetBatchComputerOptions &opts, int32 num_subsampled_frames, int32 num_subsampled_frames_per_chunk, std::vector< NnetInferenceTask > *tasks)
 This function figures out how many chunks are needed for this utterance, sets 'tasks' to a vector with that many elements, and sets up the following members of each task: output_t_stride, num_output_frames, num_initial_unused_output_frames, num_used_output_frames, first_used_output_frame_index and is_irregular.
 
void AddOnlineIvectorsToTasks (const NnetBatchComputerOptions &opts, const CuMatrix< BaseFloat > &online_ivectors, int32 online_ivector_period, std::vector< NnetInferenceTask > *tasks)
 
static void SplitInputToTasks (const NnetBatchComputerOptions &opts, int32 nnet_left_context, int32 nnet_right_context, const CuMatrix< BaseFloat > &input, std::vector< NnetInferenceTask > *tasks)
 This function sets up the 'input', 'first_input_t' and 'is_edge' members of the 'tasks' array; it is responsible for working out, for each task, which input frames it needs (including left-context and right-context).
 

Detailed Description

This namespace contains things needed for the implementation of the function NnetBatchComputer::SplitUtteranceIntoTasks().

Function Documentation

◆ AddOnlineIvectorsToTasks()

void kaldi::nnet3::utterance_splitting::AddOnlineIvectorsToTasks ( const NnetBatchComputerOptions &  opts,
const CuMatrix< BaseFloat > &  online_ivectors,
int32  online_ivector_period,
std::vector< NnetInferenceTask > *  tasks 
)

Definition at line 729 of file nnet-batch-compute.cc.

References NnetInferenceTask::first_used_output_frame_index, NnetSimpleComputationOptions::frame_subsampling_factor, NnetInferenceTask::ivector, KALDI_ERR, NnetInferenceTask::num_initial_unused_output_frames, NnetInferenceTask::num_output_frames, CuMatrixBase< Real >::NumRows(), and CuMatrixBase< Real >::Row().

Referenced by NnetBatchComputer::SplitUtteranceIntoTasks().

{
  int32 f = opts.frame_subsampling_factor,
      num_tasks = tasks->size();
  for (int32 i = 0; i < num_tasks; i++) {
    NnetInferenceTask &task = (*tasks)[i];
    // begin_output_t and mid_output_t are the subsampled frame indexes at
    // the output; you'd have to multiply them by f to get real frame indexes.
    int32 begin_output_t = task.first_used_output_frame_index -
        task.num_initial_unused_output_frames,
        mid_output_t = begin_output_t + (task.num_output_frames / 2),
        mid_input_t = mid_output_t * f,
        ivector_frame = mid_input_t / online_ivector_period,
        num_ivector_frames = online_ivectors.NumRows(),
        margin_in_frames = 20,
        margin_in_ivector_frames =
        (margin_in_frames + online_ivector_period - 1) / online_ivector_period;
    // the 'margin' is our tolerance for when the number of rows of
    // 'online_ivectors' is less than what we expected; we allow 20 frames of
    // tolerance in the numbering of the original (input) features.
    if (ivector_frame >= num_ivector_frames) {
      if (num_ivector_frames > 0 && ivector_frame > num_ivector_frames -
          margin_in_ivector_frames) {
        ivector_frame = num_ivector_frames - 1;  // Just take the last available one.
      } else {
        KALDI_ERR << "Could not get iVector for frame " << ivector_frame
                  << ", online-ivectors matrix has "
                  << online_ivectors.NumRows()
                  << " rows. Mismatched --online-ivector-period?";
      }
    }
    task.ivector = online_ivectors.Row(ivector_frame);
  }
}
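The iVector selection above picks, for each chunk, the row of online_ivectors whose period contains the midpoint of the chunk's output frames, converted to input-frame numbering. A minimal sketch of that arithmetic follows; ChooseIvectorRow is a hypothetical name, plain int stands in for int32, and it throws std::runtime_error where the listing calls KALDI_ERR. One caveat: in the listing, once ivector_frame >= num_ivector_frames, the test ivector_frame > num_ivector_frames - margin_in_ivector_frames is always true (margin_in_ivector_frames is at least 1 for a positive period), so the error branch is reached only when the matrix is empty. The sketch instead applies the 20-frame tolerance the comment describes, clamping only when ivector_frame < num_ivector_frames + margin_in_ivector_frames; treat that as an assumed reading of the intent, not as Kaldi's behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Kaldi): picks the row of an online-iVector
// matrix for one chunk, following the arithmetic of AddOnlineIvectorsToTasks().
int ChooseIvectorRow(int first_used_output_frame_index,
                     int num_initial_unused_output_frames,
                     int num_output_frames,
                     int frame_subsampling_factor,
                     int online_ivector_period,
                     int num_ivector_frames) {
  assert(frame_subsampling_factor > 0 && online_ivector_period > 0);
  // First output frame the chunk computes (subsampled numbering).
  int begin_output_t = first_used_output_frame_index -
                       num_initial_unused_output_frames;
  // Midpoint of the chunk, converted to input-frame numbering.
  int mid_output_t = begin_output_t + num_output_frames / 2;
  int mid_input_t = mid_output_t * frame_subsampling_factor;
  // Each iVector row covers online_ivector_period input frames.
  int ivector_frame = mid_input_t / online_ivector_period;
  const int margin_in_frames = 20;
  int margin_in_ivector_frames =
      (margin_in_frames + online_ivector_period - 1) / online_ivector_period;
  if (ivector_frame >= num_ivector_frames) {
    // ASSUMPTION: tolerate up to margin_in_ivector_frames missing rows at the
    // end; this is our reading of the comment, not the listed condition.
    if (num_ivector_frames > 0 &&
        ivector_frame < num_ivector_frames + margin_in_ivector_frames) {
      ivector_frame = num_ivector_frames - 1;  // take the last available row
    } else {
      throw std::runtime_error(
          "Could not get iVector for frame " + std::to_string(ivector_frame) +
          ", online-ivectors matrix has " + std::to_string(num_ivector_frames) +
          " rows. Mismatched --online-ivector-period?");
    }
  }
  return ivector_frame;
}
```

For example, a chunk whose outputs start at subsampled frame 0 with 50 output frames, frame_subsampling_factor = 3 and online_ivector_period = 10 has its midpoint at input frame 75, so row 7 is used.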

◆ GetOutputFrameInfoForTasks()

void kaldi::nnet3::utterance_splitting::GetOutputFrameInfoForTasks ( const NnetBatchComputerOptions &  opts,
int32  num_subsampled_frames,
int32  num_subsampled_frames_per_chunk,
std::vector< NnetInferenceTask > *  tasks 
)

This function figures out how many chunks are needed for this utterance, sets 'tasks' to a vector with that many elements, and sets up the following members of each task: output_t_stride, num_output_frames, num_initial_unused_output_frames, num_used_output_frames, first_used_output_frame_index and is_irregular.

Parameters
[in] opts  Options class.
[in] num_subsampled_frames  The number of output frames in this utterance. Must be > 0.
[in] num_subsampled_frames_per_chunk  The number of output frames per chunk.
[out] tasks  The 'tasks' array is output here; it will have one task per chunk, with only the members 'output_t_stride', 'num_output_frames', 'num_initial_unused_output_frames', 'num_used_output_frames', 'first_used_output_frame_index' and 'is_irregular' set up.
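The multi-chunk case may be easier to follow written out as equations read off this function's code. Write N for num_subsampled_frames, c for num_subsampled_frames_per_chunk and T for the number of tasks. When N > c, every task but the last covers output frames [ic, (i+1)c) and uses all of them; the last task is shifted left so that it ends exactly on frame N, with b its first computed output frame, u its num_initial_unused_output_frames and n its num_used_output_frames:

```latex
\begin{aligned}
T &= \left\lceil N / c \right\rceil \\
b &= N - c \\
u &= (T-1)\,c - (N - c) \\
n &= N - (T-1)\,c \\
u + n &= c = \text{num\_output\_frames}
\end{aligned}
```

For example, N = 250 and c = 100 give T = 3; the last task computes output frames 150 to 249 (u = 50) but uses only frames 200 to 249 (n = 50), starting at first_used_output_frame_index = 200.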

Definition at line 661 of file nnet-batch-compute.cc.

References NnetBatchComputerOptions::ensure_exact_final_context, NnetInferenceTask::first_used_output_frame_index, NnetSimpleComputationOptions::frame_subsampling_factor, NnetInferenceTask::is_irregular, KALDI_ASSERT, NnetInferenceTask::num_initial_unused_output_frames, NnetInferenceTask::num_output_frames, and NnetInferenceTask::num_used_output_frames.

Referenced by NnetBatchComputer::SplitUtteranceIntoTasks().

{
  KALDI_ASSERT(num_subsampled_frames > 0);
  int32 fpc = num_subsampled_frames_per_chunk;
  int32 num_tasks = (num_subsampled_frames + fpc - 1) / fpc;
  tasks->resize(num_tasks);
  for (int32 i = 0; i < num_tasks; i++) {
    (*tasks)[i].output_t_stride = opts.frame_subsampling_factor;
  }
  if (num_subsampled_frames <= fpc) {  // there is one chunk.
    KALDI_ASSERT(num_tasks == 1);  // TODO: remove this.
    NnetInferenceTask &task = (*tasks)[0];
    task.first_used_output_frame_index = 0;
    if (opts.ensure_exact_final_context) {
      task.num_output_frames = num_subsampled_frames;
      task.num_initial_unused_output_frames = 0;
      task.num_used_output_frames = num_subsampled_frames;
      task.is_irregular = true;
    } else {
      task.num_output_frames = fpc;
      task.num_initial_unused_output_frames = 0;
      task.num_used_output_frames = num_subsampled_frames;
      task.is_irregular = false;
    }
  } else {
    for (int32 i = 0; i + 1 < num_tasks; i++) {
      NnetInferenceTask &task = (*tasks)[i];
      task.num_output_frames = fpc;
      task.num_initial_unused_output_frames = 0;
      task.num_used_output_frames = fpc;
      task.first_used_output_frame_index = i * fpc;
      task.is_irregular = false;
    }
    // The last chunk will end on the last frame of the file, but we won't use
    // the part of its output that overlaps with the preceding chunk.
    NnetInferenceTask &task = (*tasks)[num_tasks - 1];
    task.num_output_frames = fpc;
    task.num_initial_unused_output_frames = ((num_tasks - 1) * fpc) -
        (num_subsampled_frames - fpc);
    task.num_used_output_frames =
        num_subsampled_frames - ((num_tasks - 1) * fpc);
    task.first_used_output_frame_index = (num_tasks - 1) * fpc;
    task.is_irregular = false;
  }

  if (true) {
    // Do some checking. TODO: remove this.
    KALDI_ASSERT((*tasks)[0].first_used_output_frame_index == 0);
    for (int32 i = 1; i < num_tasks; i++) {
      KALDI_ASSERT((*tasks)[i].first_used_output_frame_index ==
                   (*tasks)[i-1].first_used_output_frame_index +
                   (*tasks)[i-1].num_used_output_frames);
    }
    KALDI_ASSERT((*tasks)[num_tasks-1].first_used_output_frame_index +
                 (*tasks)[num_tasks-1].num_used_output_frames ==
                 num_subsampled_frames);
    for (int32 i = 0; i < num_tasks; i++) {
      const NnetInferenceTask &task = (*tasks)[i];
      KALDI_ASSERT(task.num_used_output_frames +
                   task.num_initial_unused_output_frames <=
                   task.num_output_frames);
    }
  }
}
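As a self-contained sketch of the layout this function computes, the code below uses a simplified struct in place of NnetInferenceTask. ChunkLayout and LayoutChunks are hypothetical names, not Kaldi API; output_t_stride is omitted because the listing simply sets it to frame_subsampling_factor for every task, and assert stands in for KALDI_ASSERT.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the NnetInferenceTask members set here.
struct ChunkLayout {
  int num_output_frames;
  int num_initial_unused_output_frames;
  int num_used_output_frames;
  int first_used_output_frame_index;
  bool is_irregular;
};

// Sketch of the chunking logic of GetOutputFrameInfoForTasks().
std::vector<ChunkLayout> LayoutChunks(int num_subsampled_frames,
                                      int frames_per_chunk,
                                      bool ensure_exact_final_context) {
  assert(num_subsampled_frames > 0 && frames_per_chunk > 0);
  const int fpc = frames_per_chunk;
  const int num_tasks = (num_subsampled_frames + fpc - 1) / fpc;
  std::vector<ChunkLayout> tasks(num_tasks);
  if (num_tasks == 1) {
    // One chunk: either size it to the utterance (irregular) or keep the
    // standard size and ignore the output past the end of the utterance.
    int out = ensure_exact_final_context ? num_subsampled_frames : fpc;
    tasks[0] = {out, 0, num_subsampled_frames, 0, ensure_exact_final_context};
    return tasks;
  }
  // All chunks but the last are laid end to end and fully used.
  for (int i = 0; i + 1 < num_tasks; i++)
    tasks[i] = {fpc, 0, fpc, i * fpc, false};
  // The last chunk is shifted left so it ends on the last frame; the part
  // that overlaps the previous chunk is computed but not used.
  const int last_first_used = (num_tasks - 1) * fpc;
  tasks[num_tasks - 1] = {fpc,
                          last_first_used - (num_subsampled_frames - fpc),
                          num_subsampled_frames - last_first_used,
                          last_first_used, false};
  return tasks;
}
```

On the design choice: when ensure_exact_final_context is false, a short utterance still gets a full-size chunk, and in SplitInputToTasks() the input frames past the end are filled by clamping to the last input frame; setting it to true instead sizes the chunk to the utterance and marks it is_irregular.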

◆ SplitInputToTasks()

static void kaldi::nnet3::utterance_splitting::SplitInputToTasks ( const NnetBatchComputerOptions &  opts,
int32  nnet_left_context,
int32  nnet_right_context,
const CuMatrix< BaseFloat > &  input,
std::vector< NnetInferenceTask > *  tasks 
)

This function sets up the 'input', 'first_input_t' and 'is_edge' members of the 'tasks' array; it is responsible for working out, for each task, which input frames it needs (including left-context and right-context).

The 'nnet_left_context' and 'nnet_right_context' are the inherent left and right context of the network (the number of frames required on the left and right to compute an output frame), and may be computed by doing: ComputeSimpleNnetContext(nnet, &nnet_left_context_, &nnet_right_context_)
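The per-task context arithmetic of SplitInputToTasks() can be isolated into a small function, which makes it easier to check how the edge options change the padded input range. This is a sketch under stated assumptions: ContextOpts, InputRange and ComputeInputRange are hypothetical names; ContextOpts only mirrors the NnetSimpleComputationOptions fields the function reads; and begin_output_t is the task's first_used_output_frame_index minus its num_initial_unused_output_frames.

```cpp
#include <cassert>

// Hypothetical stand-in for the options read here (not Kaldi's struct).
struct ContextOpts {
  int frame_subsampling_factor = 1;
  int extra_left_context = 0;
  int extra_right_context = 0;
  int extra_left_context_initial = -1;  // < 0 means "use extra_left_context"
  int extra_right_context_final = -1;   // < 0 means "use extra_right_context"
};

// Hypothetical result type: the padded input range one task needs.
struct InputRange {
  int begin_padded;   // first input frame needed; may be negative
  int end_padded;     // one past the last input frame needed; may exceed input
  int first_input_t;  // begin_padded relative to begin_output_t * f
  bool is_edge;       // edge options actually changed the context used
};

InputRange ComputeInputRange(const ContextOpts &opts, int nnet_left_context,
                             int nnet_right_context, int num_input_frames,
                             int begin_output_t, int num_output_frames) {
  const int f = opts.frame_subsampling_factor;
  assert(f > 0);
  const int num_subsampled_frames = (num_input_frames + f - 1) / f;
  const int left_initial = opts.extra_left_context_initial < 0
                               ? opts.extra_left_context
                               : opts.extra_left_context_initial;
  const int right_final = opts.extra_right_context_final < 0
                              ? opts.extra_right_context
                              : opts.extra_right_context_final;
  const int end_output_t = begin_output_t + num_output_frames;
  // Does the chunk touch the start or end of the utterance?
  const bool left_edge = begin_output_t <= 0;
  const bool right_edge = end_output_t >= num_subsampled_frames;
  const int tot_left = nnet_left_context +
                       (left_edge ? left_initial : opts.extra_left_context);
  const int tot_right = nnet_right_context +
                        (right_edge ? right_final : opts.extra_right_context);
  InputRange r;
  r.begin_padded = begin_output_t * f - tot_left;
  r.end_padded = end_output_t * f + tot_right;
  r.first_input_t = r.begin_padded - begin_output_t * f;  // always -tot_left
  r.is_edge = tot_left != nnet_left_context + opts.extra_left_context ||
              tot_right != nnet_right_context + opts.extra_right_context;
  return r;
}
```

One consequence of this arithmetic: since begin_input_t is exactly begin_output_t * f, first_input_t always comes out as minus the total left context used for that task.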

Definition at line 779 of file nnet-batch-compute.cc.

References NnetSimpleComputationOptions::extra_left_context, NnetSimpleComputationOptions::extra_left_context_initial, NnetSimpleComputationOptions::extra_right_context, NnetSimpleComputationOptions::extra_right_context_final, NnetInferenceTask::first_input_t, NnetInferenceTask::first_used_output_frame_index, NnetSimpleComputationOptions::frame_subsampling_factor, NnetInferenceTask::input, NnetInferenceTask::is_edge, kaldi::kUndefined, NnetInferenceTask::num_initial_unused_output_frames, NnetInferenceTask::num_output_frames, CuMatrixBase< Real >::NumCols(), and CuMatrixBase< Real >::NumRows().

Referenced by NnetBatchComputer::SplitUtteranceIntoTasks().

{
  int32 num_input_frames = input.NumRows(),
      f = opts.frame_subsampling_factor,
      num_subsampled_frames = (num_input_frames + f - 1) / f,
      extra_left_context_initial = (opts.extra_left_context_initial < 0 ?
                                    opts.extra_left_context :
                                    opts.extra_left_context_initial),
      extra_right_context_final = (opts.extra_right_context_final < 0 ?
                                   opts.extra_right_context :
                                   opts.extra_right_context_final),
      num_tasks = tasks->size();

  for (int32 i = 0; i < num_tasks; i++) {
    NnetInferenceTask &task = (*tasks)[i];
    // begin_output_t and end_output_t are the subsampled frame indexes at
    // the output; you'd have to multiply them by f to get real frame indexes.
    int32 begin_output_t = task.first_used_output_frame_index -
        task.num_initial_unused_output_frames,
        end_output_t = begin_output_t + task.num_output_frames;
    // begin_input_t and end_input_t are the real 't' values corresponding to
    // begin_output_t and end_output_t; they are the beginning and end
    // (i.e. first and last-plus-one) frame indexes without any left or right
    // context.
    int32 begin_input_t = begin_output_t * f,
        end_input_t = end_output_t * f;
    // Detect whether the left and right edges touch (or pass over) the left
    // and right boundaries. Note: we don't expect begin_output_t to ever be
    // negative.
    bool left_edge = (begin_output_t <= 0),
        right_edge = (end_output_t >= num_subsampled_frames);
    int32 tot_left_context = nnet_left_context +
        (left_edge ? extra_left_context_initial : opts.extra_left_context),
        tot_right_context = nnet_right_context +
        (right_edge ? extra_right_context_final : opts.extra_right_context);

    // 'is_edge' is only true if it's an edge minibatch *and* its being an
    // edge actually made a difference to the structure of the example.
    task.is_edge =
        (tot_left_context != nnet_left_context + opts.extra_left_context ||
         tot_right_context != nnet_right_context + opts.extra_right_context);

    int32 begin_input_t_padded = begin_input_t - tot_left_context,
        end_input_t_padded = end_input_t + tot_right_context;

    // 'task.first_input_t' is a representation of 'begin_input_t_padded' in a
    // shifted/normalized numbering where the output time indexes start from
    // zero.
    task.first_input_t = begin_input_t_padded - (begin_output_t * f);

    task.input.Resize(end_input_t_padded - begin_input_t_padded,
                      input.NumCols(), kUndefined);

    // Copy from input into task input with clamping.
    task.input.CopyRangeFromMatClamped(input, begin_input_t_padded,
                                       end_input_t_padded, 0, num_input_frames-1);
  }
}
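The last step of SplitInputToTasks() fills the padded range with clamping. We read the call CopyRangeFromMatClamped(input, begin_input_t_padded, end_input_t_padded, 0, num_input_frames-1) as: row r of task.input is copied from input row begin_input_t_padded + r, clamped to [0, num_input_frames - 1]. That reading is inferred from the call site, not from the CuMatrix documentation. The sketch below illustrates it on std::vector rows; CopyRowsClamped is a hypothetical name, not Kaldi API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rows of a feature matrix, one inner vector per frame.
using Matrix = std::vector<std::vector<float>>;

// Hypothetical helper: returns rows [start, end) of a virtual matrix in which
// index t maps to src row clamp(t, clamp_low, clamp_high), so frames before
// the utterance repeat the first row and frames after it repeat the last row.
Matrix CopyRowsClamped(const Matrix &src, int start, int end,
                       int clamp_low, int clamp_high) {
  assert(0 <= clamp_low && clamp_low <= clamp_high &&
         clamp_high < static_cast<int>(src.size()));
  Matrix out;
  if (end <= start) return out;
  out.reserve(static_cast<std::size_t>(end - start));
  for (int t = start; t < end; t++) {
    int row = std::min(std::max(t, clamp_low), clamp_high);
    out.push_back(src[row]);
  }
  return out;
}
```

Under this reading, a task whose padded range starts 10 frames before the utterance gets copies of frame 0 in its first 10 rows, which is how the extra context of edge chunks is filled in.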