Classes
struct	XvectorTask

Public Member Functions
	BatchedXvectorComputer (const BatchedXvectorComputerOptions &opts, const Nnet &nnet, int32 total_context)

void	AcceptUtterance (const std::string &utt, const Matrix< BaseFloat > &input)
	Accepts an utterance to process into an xvector, and, if one or more batches become full, processes the batch.

bool	XvectorReady () const
	Returns true if at least one xvector is pending output (i.e. that the user may call OutputXvector()).

void	OutputXvector (std::string *utt, Vector< BaseFloat > *xvector)
	This function, which must only be called if XvectorReady() has just returned true, outputs an xvector for an utterance.

void	Flush ()
	Calling this will force any partial minibatch to be computed, so that any utterances that have previously been passed to AcceptUtterance() will, when this function returns, have their xvectors ready to be retrieved by OutputXvector().

Private Member Functions
void	SplitUtteranceIntoChunks (int32 num_frames, std::vector< int32 > *start_frames)
	This decides how to split the utterance into chunks.

XvectorTask *	CreateTask (const std::string &utt, int32 num_chunks)
	This adds a newly created XvectorTask at the tail of the singly linked list whose (head,tail) are results_head_, results_tail_.

void	ComputeOneBatch ()
	Does the nnet computation for one batch and distributes the computed x-vectors (of chunks) appropriately to their XvectorTask objects.

void	AddChunkToBatch (XvectorTask *task, const Matrix< BaseFloat > &input, int32 chunk_start)
	Adds a new chunk to a batch we are preparing.

Private Attributes
const BatchedXvectorComputerOptions &	opts_

int32	total_context_

const Nnet &	nnet_

int32	feature_dim_

int32	xvector_dim_

Matrix< BaseFloat >	input_feats_
	Staging area for the input features prior to copying them to GPU.

std::shared_ptr< const NnetComputation >	computation_
	The compiled computation (will be the same for every batch).

int32	position_in_batch_
	position_in_batch_ is the number of chunks that we have filled in in the input_feats_ matrix and tasks_this_batch_.

std::vector< XvectorTask * >	tasks_this_batch_
	tasks_this_batch_ is of dimension opts_.batch_size.

XvectorTask *	results_head_

XvectorTask *	results_tail_

Detailed Description

Definition at line 99 of file nnet3-xvector-compute-batched.cc.

Constructor & Destructor Documentation

◆ BatchedXvectorComputer()

BatchedXvectorComputer	(	const BatchedXvectorComputerOptions &	opts,
		const Nnet &	nnet,
		int32	total_context
	)

Parameters

[in]	opts	Options class. Warning: this object keeps a reference to it, so opts must outlive the BatchedXvectorComputer.
[in]	nnet	The neural net we'll be computing with; assumed to have already been prepared for test.
[in]	total_context	The sum of the left and right context of the network, computed after calling SetRequireDirectInput(true, &nnet); so the l/r context isn't zero.

Definition at line 293 of file nnet3-xvector-compute-batched.cc.

                         :
     opts_(opts),
     total_context_(total_context),
     nnet_(nnet),
     position_in_batch_(0),
     results_head_(NULL),
     results_tail_(NULL) {
 
   tasks_this_batch_.resize(opts_.batch_size);
 
   feature_dim_ = nnet.InputDim("input");
   xvector_dim_ = nnet.OutputDim("output");
   // Zero input_feats_ in case there is only one batch, to avoid
   // NaN's being generated due to undefined data.
   input_feats_.Resize(opts_.chunk_size * opts_.batch_size,
                       feature_dim_);
 
   CachingOptimizingCompiler compiler(nnet, opts.optimize_config,
                                      opts.compiler_config);
 
   {  // This block creates computation_.
     ComputationRequest request;
     request.need_model_derivative = false;
     request.store_component_stats = false;
     request.inputs.resize(1);
     IoSpecification &input(request.inputs[0]);
     input.name = "input";
     input.has_deriv = false;
     input.indexes.resize(opts_.batch_size * opts_.chunk_size);
     // Note: the sequences are interleaved in the input; this will save an extra
     // copy since it corresponds to how nnet3 stores things by default.  (Makes
     // TDNNs easier to implement.)
     for (int32 n = 0; n < opts_.batch_size; n++) {
       for (int32 t = 0; t < opts_.chunk_size; t++) {
         Index index;
         index.n = n;
         index.t = t;
         // index.x is 0 by default.
         input.indexes[n + opts_.batch_size * t] = index;
       }
     }
     IoSpecification output;
     output.name = "output";
     output.has_deriv = false;
     output.indexes.resize(opts_.batch_size);
     for (int32 n = 0; n < opts_.batch_size; n++){
         Index index;
         index.n = n;
         index.t = 0;
         output.indexes[n] = index;
     }
     request.outputs.push_back(output);
     computation_ = compiler.Compile(request);
   }
 }

Member Function Documentation

◆ AcceptUtterance()

void AcceptUtterance	(	const std::string &	utt,
		const Matrix< BaseFloat > &	input
	)

Accepts an utterance to process into an xvector, and, if one or more batches become full, processes the batch.

Definition at line 428 of file nnet3-xvector-compute-batched.cc.

References BatchedXvectorComputer::AddChunkToBatch(), BatchedXvectorComputerOptions::batch_size, BatchedXvectorComputer::ComputeOneBatch(), BatchedXvectorComputer::CreateTask(), rnnlm::i, MatrixBase< Real >::NumRows(), BatchedXvectorComputer::opts_, BatchedXvectorComputer::position_in_batch_, and BatchedXvectorComputer::SplitUtteranceIntoChunks().

Referenced by main().

                                     {
   std::vector<int32> chunk_starts;
   int32 num_frames = input.NumRows();
   SplitUtteranceIntoChunks(num_frames, &chunk_starts);
   int32 num_chunks = chunk_starts.size();
   XvectorTask *task = CreateTask(utt, num_chunks);
 
   for (int32 i = 0; i < num_chunks; i++) {
     AddChunkToBatch(task, input, chunk_starts[i]);
     if (position_in_batch_ == opts_.batch_size) {
       ComputeOneBatch();
     }
   }
 }

◆ AddChunkToBatch()

void AddChunkToBatch	(	XvectorTask *	task,
		const Matrix< BaseFloat > &	input,
		int32	chunk_start
	)

private

Adds a new chunk to a batch we are preparing.

This will go at position `position_in_batch_` which will be incremented.

Parameters

[in]	task	The task this is part of (records the utterance); tasks_this_batch_[position_in_batch_] will be set to this.
[in]	input	The input matrix of features of which this chunk is a part
[in]	chunk_start	The frame at which this chunk starts. Must be >= 0; and if opts_.pad_input is false, chunk_start + opts_.chunk_size must be <= input.NumRows().

Definition at line 352 of file nnet3-xvector-compute-batched.cc.

References BatchedXvectorComputerOptions::batch_size, BatchedXvectorComputerOptions::chunk_size, VectorBase< Real >::CopyFromVec(), BatchedXvectorComputer::feature_dim_, BatchedXvectorComputer::input_feats_, KALDI_ASSERT, KALDI_ERR, rnnlm::n, MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), BatchedXvectorComputer::opts_, BatchedXvectorComputerOptions::pad_input, BatchedXvectorComputer::position_in_batch_, and BatchedXvectorComputer::tasks_this_batch_.

Referenced by BatchedXvectorComputer::AcceptUtterance().

                        {
   int32 n = position_in_batch_++;
   KALDI_ASSERT(n >= 0 && n < opts_.batch_size);
   tasks_this_batch_[n] = task;
   int32 T = opts_.chunk_size,
       num_input_frames = input.NumRows();
   KALDI_ASSERT(input_feats_.NumRows() == T * opts_.batch_size);
   if (input.NumCols() != feature_dim_) {
     KALDI_ERR << "Feature dimension mismatch: neural net expected "
               << feature_dim_ << ", got " << input.NumCols();
   }
   for (int32 t = 0; t < T; t++) {
     SubVector<BaseFloat> dest(input_feats_, t * opts_.batch_size + n);
     int32 src_t = t + chunk_start;
     if (src_t >= num_input_frames) {
       KALDI_ASSERT(opts_.pad_input);
       src_t = num_input_frames - 1;  // Pad with repeats of the last frame.
     }
     SubVector<BaseFloat> src(input, src_t);
     dest.CopyFromVec(src);
   }
 }
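
The copy loop above can be illustrated without Kaldi types. The following is a minimal sketch, assuming a hypothetical `Frames` alias (one inner vector per frame) in place of `Matrix<BaseFloat>` and a hypothetical helper name `CopyChunkInterleaved`; it shows the interleaved destination row `t * batch_size + n` and the repeat-last-frame padding used when a chunk runs past the end of the input.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a feature matrix: one inner vector per frame.
using Frames = std::vector<std::vector<float>>;

// Copies one chunk of `chunk_size` frames, starting at `chunk_start`, into
// slot `n` of an interleaved staging buffer that holds `batch_size` chunks.
// Frames past the end of `input` repeat the last frame (the pad_input case).
void CopyChunkInterleaved(const Frames &input, int chunk_start,
                          int chunk_size, int batch_size, int n,
                          Frames *staging) {
  assert(!input.empty() && chunk_start >= 0);
  assert(n >= 0 && n < batch_size);
  assert(static_cast<int>(staging->size()) == chunk_size * batch_size);
  const int num_frames = static_cast<int>(input.size());
  for (int t = 0; t < chunk_size; t++) {
    // Clamp the source frame so short inputs are padded with the last frame.
    const int src_t = std::min(t + chunk_start, num_frames - 1);
    (*staging)[t * batch_size + n] = input[src_t];
  }
}
```

Unlike the listing above, this sketch always pads; the real function asserts opts_.pad_input before doing so.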

◆ ComputeOneBatch()

void ComputeOneBatch ( )

private

Does the nnet computation for one batch and distributes the computed x-vectors (of chunks) appropriately to their XvectorTask objects.

Definition at line 404 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::AcceptUtterance(), and BatchedXvectorComputer::Flush().

                                              {
 
   CuMatrix<BaseFloat> cu_input_feats(input_feats_);
   Nnet *nnet_to_update = NULL;  // we're not doing any update.
   NnetComputer computer(opts_.compute_config, *computation_,
                         nnet_, nnet_to_update);
   computer.AcceptInput("input", &cu_input_feats);
   computer.Run();
   CuMatrix<BaseFloat> cu_output;
   computer.GetOutputDestructive("output", &cu_output);
   KALDI_ASSERT(cu_output.NumRows() == opts_.batch_size);
   Matrix<BaseFloat> output(cu_output);
   for (int32 n = 0; n < opts_.batch_size; n++) {
     XvectorTask *task = tasks_this_batch_[n];
     if (task == NULL)
       continue;  // Would only happen for the last batch.
     task->num_chunks_finished++;
     task->xvector.AddVec(1.0 / task->num_chunks, output.Row(n));
   }
   position_in_batch_ = 0;
   std::fill(tasks_this_batch_.begin(), tasks_this_batch_.end(),
             (XvectorTask*)NULL);
 }
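
The per-task accumulation in the loop above amounts to a running mean over chunks. Here is a minimal sketch of that step alone, assuming a hypothetical `ToyTask` struct that keeps only the fields involved and plain `std::vector<float>` in place of Kaldi vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified task record: only the fields needed for averaging.
struct ToyTask {
  int num_chunks = 0;
  int num_chunks_finished = 0;
  std::vector<float> xvector;  // running average over chunks
};

// Adds one chunk-level output into the task's running average, weighting it
// by 1/num_chunks so the result is the mean once every chunk has arrived.
void AccumulateChunk(const std::vector<float> &chunk_output, ToyTask *task) {
  assert(task->num_chunks > 0);
  assert(task->num_chunks_finished < task->num_chunks);
  assert(chunk_output.size() == task->xvector.size());
  const float weight = 1.0f / task->num_chunks;
  for (std::size_t d = 0; d < chunk_output.size(); d++)
    task->xvector[d] += weight * chunk_output[d];
  task->num_chunks_finished++;
}
```

Because each chunk is weighted as it arrives, the chunks of one utterance may be spread over several batches without any extra bookkeeping.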

◆ CreateTask()

BatchedXvectorComputer::XvectorTask * CreateTask	(	const std::string &	utt,
		int32	num_chunks
	)

private

This adds a newly created XvectorTask at the tail of the singly linked list whose (head,tail) are results_head_, results_tail_.

Definition at line 275 of file nnet3-xvector-compute-batched.cc.

References BatchedXvectorComputer::XvectorTask::num_chunks, BatchedXvectorComputer::XvectorTask::num_chunks_finished, BatchedXvectorComputer::XvectorTask::tail, BatchedXvectorComputer::XvectorTask::utt_id, and BatchedXvectorComputer::XvectorTask::xvector.

Referenced by BatchedXvectorComputer::AcceptUtterance().

                                             {
   XvectorTask *task = new XvectorTask;
   task->utt_id = utt;
   task->num_chunks = num_chunks;
   task->num_chunks_finished = 0;
   task->xvector.Resize(xvector_dim_);
   task->tail = NULL;
   if (results_tail_) {
     results_tail_->tail = task;
     results_tail_ = task;
   } else {  // List was previously empty.
     results_head_ = task;
     results_tail_ = task;
   }
   return task;
 }

◆ Flush()

void Flush ( )

Calling this will force any partial minibatch to be computed, so that any utterances that have previously been passed to AcceptUtterance() will, when this function returns, have their xvectors ready to be retrieved by OutputXvector().

Definition at line 397 of file nnet3-xvector-compute-batched.cc.

References BatchedXvectorComputer::ComputeOneBatch(), and BatchedXvectorComputer::position_in_batch_.

Referenced by main().

                                    {
   if (position_in_batch_ == 0)
     return;
   ComputeOneBatch();
 }
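
A caller would typically combine the four public methods as follows. This is a sketch of one plausible driving loop, not a copy of the main() referenced above: `ExtractAll` and `MockComputer` are hypothetical names, and the computer, feature, and xvector types are template parameters so that the example compiles without Kaldi.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Drives any object exposing AcceptUtterance(), XvectorReady(),
// OutputXvector() and Flush() with the signatures documented on this page.
template <class XVec, class Computer, class Feats>
void ExtractAll(Computer *computer,
                const std::vector<std::pair<std::string, Feats>> &utts,
                std::vector<std::pair<std::string, XVec>> *results) {
  auto drain = [&]() {
    while (computer->XvectorReady()) {
      std::string utt;
      XVec xvector;
      computer->OutputXvector(&utt, &xvector);
      results->emplace_back(utt, std::move(xvector));
    }
  };
  for (const auto &u : utts) {
    computer->AcceptUtterance(u.first, u.second);
    drain();  // a full batch may have completed some utterances
  }
  computer->Flush();  // compute the final, partially filled batch
  drain();
}

// Minimal mock used only to exercise the loop; not Kaldi's implementation.
// It treats every utterance as one chunk and uses a batch size of 2.
struct MockComputer {
  std::vector<std::string> waiting, ready;
  void AcceptUtterance(const std::string &utt, const int &) {
    waiting.push_back(utt);
    if (waiting.size() == 2) Flush();
  }
  bool XvectorReady() const { return !ready.empty(); }
  void OutputXvector(std::string *utt, int *xvector) {
    *utt = ready.front();
    *xvector = static_cast<int>(utt->size());  // dummy "xvector"
    ready.erase(ready.begin());
  }
  void Flush() {
    ready.insert(ready.end(), waiting.begin(), waiting.end());
    waiting.clear();
  }
};
```

Draining after every AcceptUtterance() call keeps the number of finished but unretrieved xvectors small; the final Flush() is what guarantees that the last few utterances are not left in a partial batch.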

◆ OutputXvector()

void OutputXvector	(	std::string *	utt,
		Vector< BaseFloat > *	xvector
	)

This function, which must only be called if XvectorReady() has just returned true, outputs an xvector for an utterance.

Parameters

[out]	utt	The utterance-id is written to here. Note: these will be output in the same order as the user called AcceptUtterance(), except that if opts_.pad_input is false and an utterance is shorter than the chunk size, some utterances may be skipped.
[out]	xvector	The xvector will be written to here.

Definition at line 385 of file nnet3-xvector-compute-batched.cc.

References KALDI_ASSERT, BatchedXvectorComputer::results_head_, BatchedXvectorComputer::results_tail_, Vector< Real >::Swap(), BatchedXvectorComputer::XvectorTask::tail, BatchedXvectorComputer::XvectorTask::utt_id, BatchedXvectorComputer::XvectorTask::xvector, and BatchedXvectorComputer::XvectorReady().

Referenced by main().

                                                                        {
   KALDI_ASSERT(XvectorReady());
   *utt = results_head_->utt_id;
   xvector->Swap(&(results_head_->xvector));
   XvectorTask *new_tail = results_head_->tail;
   delete results_head_;
   results_head_ = new_tail;
   if (new_tail == NULL)
     results_tail_ = NULL;
 }

◆ SplitUtteranceIntoChunks()

void SplitUtteranceIntoChunks	(	int32	num_frames,
		std::vector< int32 > *	start_frames
	)

private

This decides how to split the utterance into chunks.

It does so in a way that minimizes the variance of the x-vector estimate under some simplifying assumptions. We treat the x-vector as computed as a sum over frames (although some frames may be repeated or omitted due to overlaps or gaps between chunks). The variance of that estimate is minimized when all the frames have the same weight, which is only possible if the utterance can be divided exactly into chunks; otherwise, this function computes the best division into chunks.

It's a question of whether to allow overlaps or gaps. Suppose we are averaging independent quantities with variance 1. The variance of a simple average of M of those quantities is 1/M. Suppose we have M of those quantities, plus N which are counted twice in the average. The variance of the estimate formed that way is:

(M + 4N) / (M + 2N)^2

If we can't divide it exactly into chunks we'll compare the variances from the cases where there is a gap vs. an overlap, and choose the one with the smaller variance. (Note: due to context effects we actually lose total_context_ frames from the input signal, and the chunks would have to overlap by total_context_ even if the part at the statistics-computation layer were ideally cut up.)
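
For reference, the formula above follows from the variance of a weighted average. The derivation below is a sketch under the stated assumptions (independent quantities with unit variance); the symbols x_i (the M quantities counted once) and y_j (the N quantities counted twice) are introduced here for illustration.

```latex
\hat{x} = \frac{1}{M + 2N}\left(\sum_{i=1}^{M} x_i + 2\sum_{j=1}^{N} y_j\right),
\qquad
\operatorname{Var}(\hat{x})
  = \frac{M \cdot 1^2 + N \cdot 2^2}{(M + 2N)^2}
  = \frac{M + 4N}{(M + 2N)^2}.
```

Setting N = 0 recovers the 1/M of the simple average.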

Parameters

[in]	num_frames	The number of frames in the utterance
[out]	start_frames	This function will output to here a vector containing all the start-frames of chunks in this utterance. All chunks will have duration opts_.chunk_size; if a chunk goes past the end of the input we'll repeat the last frame. (This will only happen if opts_.pad_input is true and num_frames is less than opts_.chunk_size.)

Definition at line 445 of file nnet3-xvector-compute-batched.cc.

References BatchedXvectorComputerOptions::chunk_size, kaldi::nnet3::DivideIntoPieces(), rnnlm::i, KALDI_ASSERT, BatchedXvectorComputer::opts_, BatchedXvectorComputerOptions::pad_input, and BatchedXvectorComputer::total_context_.

Referenced by BatchedXvectorComputer::AcceptUtterance().

                                                       {
   start_frames->clear();
   if (num_frames <= opts_.chunk_size) {
     if (num_frames == opts_.chunk_size || opts_.pad_input)
       start_frames->push_back(0);
     // if we leave start_frames empty, then we just won't compute anything for
     // this file.
   } else {
     // these modified quantities are to account for the context effects...  when
     // the chunks overlap by exactly total_context_, the frames that get
     // averaged by the respective chunks in their averaging layers would touch
     // but not overlap.  So the optimal separation between chunks would equal
     // opts_.chunk_size - total_context_.
     int32 modified_num_frames = num_frames - total_context_,
         modified_chunk_size = opts_.chunk_size - total_context_;
     KALDI_ASSERT(modified_num_frames > modified_chunk_size);
     int32 num_chunks1 = modified_num_frames / modified_chunk_size,
         num_chunks2 = num_chunks1 + 1;
     int32 num_frames1 = num_chunks1 * modified_chunk_size,
         num_frames2 = num_chunks2 * modified_chunk_size;
     KALDI_ASSERT(num_frames2 > modified_chunk_size);
     // The M and N below correspond to the M and N in the comment:
     // M is the number of frames repeated once in the averaging, N
     // the number of frames repeated twice.  (Basically a solution
     // of the equations: (M + 2N == num_frames2, M+N == modified_num_frames).
     // Note: by a "frame" above, I mean a specific "t" value in
     // the utterance.
     int32 N = num_frames2 - modified_num_frames,
         M = modified_num_frames - N;
     KALDI_ASSERT(M + 2*N == num_frames2 && M + N == modified_num_frames);
 
     // The variances below are proportional to the variance of our
     // estimate of the xvector under certain simplifying assumptions..
     // they help us choose whether to have gaps between the chunks
     // or overlaps between them.
     BaseFloat variance1 = 1.0 / num_frames1,  // the 1/M mentioned above.
         variance2 = (M + 4.0*N) / ((M + 2.0*N)*(M + 2.0*N));
     if (variance1 <= variance2) {
       // We'll choose the smaller number of chunks.  There may be gaps.
       // Counting the positions at the ends, there are num_chunks+1 positions
       // where there might be gaps.
       // Note: "total_gap" is >= 0, it's the positive of the sum of the
       // sizes of those gaps.
       int32 num_chunks = num_chunks1,
           num_gaps = num_chunks + 1,
           total_gap = modified_num_frames - num_chunks * modified_chunk_size;
       KALDI_ASSERT(0 <= total_gap && total_gap < modified_chunk_size);
       std::vector<int32> gap_sizes;  // elements will be >= 0.
       DivideIntoPieces(total_gap, num_gaps, &gap_sizes);
       int32 pos = gap_sizes[0];
       for (int32 i = 0; i < num_chunks; i++) {
         start_frames->push_back(pos);
         pos += modified_chunk_size + gap_sizes[i + 1];
       }
       KALDI_ASSERT(pos == modified_num_frames);
     } else {
       int32 num_chunks = num_chunks2,
           num_overlaps = num_chunks - 1,
           total_overlap = modified_num_frames - num_chunks * modified_chunk_size;
       KALDI_ASSERT( -modified_chunk_size < total_overlap && total_overlap <= 0 );
       std::vector<int32> overlap_sizes;  // elements will be <= 0.
       DivideIntoPieces(total_overlap, num_overlaps, &overlap_sizes);
       int32 pos = 0;
       for (int32 i = 0; i < num_chunks; i++) {
         start_frames->push_back(pos);
         pos += modified_chunk_size;
         if (i < num_overlaps)
           pos += overlap_sizes[i];
       }
       KALDI_ASSERT(pos == modified_num_frames);
     }
   }
 }
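
The gap-versus-overlap decision can be reproduced in isolation. The following is a minimal sketch of the long-utterance branch only, with plain ints and doubles in place of Kaldi types; `DividePieces` is a hypothetical stand-in for kaldi::nnet3::DivideIntoPieces (its ordering of the pieces is an assumption), and `ChunkStarts` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for kaldi::nnet3::DivideIntoPieces: splits `a` into
// `b` integer pieces that sum to `a` and differ by at most 1 in magnitude.
// Smaller-magnitude pieces come first (an assumption of this sketch).
std::vector<int> DividePieces(int a, int b) {
  assert(b > 0);
  const int sign = (a < 0) ? -1 : 1;
  const int mag = std::abs(a);
  const int base = mag / b, num_large = mag % b;
  std::vector<int> pieces;
  for (int i = 0; i < b; i++)
    pieces.push_back(sign * (i < b - num_large ? base : base + 1));
  return pieces;
}

// Chunk start frames for an utterance longer than chunk_size.
std::vector<int> ChunkStarts(int num_frames, int chunk_size,
                             int total_context) {
  assert(num_frames > chunk_size && chunk_size > total_context);
  const int frames = num_frames - total_context;  // modified_num_frames
  const int step = chunk_size - total_context;    // modified_chunk_size
  const int chunks_gap = frames / step, chunks_overlap = chunks_gap + 1;
  const int N = chunks_overlap * step - frames;  // frames counted twice
  const int M = frames - N;                      // frames counted once
  const double var_gap = 1.0 / (chunks_gap * step);
  const double var_overlap = (M + 4.0 * N) / ((M + 2.0 * N) * (M + 2.0 * N));
  std::vector<int> starts;
  if (var_gap <= var_overlap) {
    // Fewer chunks: spread the leftover frames over chunks_gap + 1 gaps.
    const std::vector<int> gaps =
        DividePieces(frames - chunks_gap * step, chunks_gap + 1);
    int pos = gaps[0];
    for (int i = 0; i < chunks_gap; i++) {
      starts.push_back(pos);
      pos += step + gaps[i + 1];
    }
  } else {
    // More chunks: spread the (negative) excess over the interior joins.
    const std::vector<int> overlaps =
        DividePieces(frames - chunks_overlap * step, chunks_overlap - 1);
    int pos = 0;
    for (int i = 0; i < chunks_overlap; i++) {
      starts.push_back(pos);
      pos += step;
      if (i < chunks_overlap - 1) pos += overlaps[i];
    }
  }
  return starts;
}
```

As a worked case, with chunk_size 100 and total_context 20, a 250-frame utterance gives variances of 1/160 (two chunks with gaps) versus 260/57600 (three chunks with overlaps), so three overlapping chunks starting at frames 0, 75 and 150 are chosen. A 190-frame utterance gives 1/160 versus 380/57600, so two chunks with gaps are chosen instead.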

◆ XvectorReady()

bool XvectorReady ( ) const

Returns true if at least one xvector is pending output (i.e. that the user may call OutputXvector()).

Definition at line 378 of file nnet3-xvector-compute-batched.cc.

References KALDI_ASSERT, BatchedXvectorComputer::XvectorTask::num_chunks, BatchedXvectorComputer::XvectorTask::num_chunks_finished, and BatchedXvectorComputer::results_head_.

Referenced by main(), and BatchedXvectorComputer::OutputXvector().

                                                 {
   if (results_head_ == NULL)
     return false;
   KALDI_ASSERT(results_head_->num_chunks_finished <= results_head_->num_chunks);
   return results_head_->num_chunks_finished == results_head_->num_chunks;
 }
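
Because only the head of the list is tested, results come out in the order the utterances were accepted, even if a later utterance finishes first. The toy model below sketches that behaviour; `PendingUtt` and `ToyResultsQueue` are hypothetical names, and a `std::deque` is swapped in for the hand-written singly linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical, simplified record of a pending utterance.
struct PendingUtt {
  std::string utt_id;
  int num_chunks;
  int num_chunks_finished;
};

// Toy model of the results list: utterances are queued in arrival order and
// only the head is ever eligible for output.
class ToyResultsQueue {
 public:
  void Add(const std::string &utt_id, int num_chunks) {
    queue_.push_back({utt_id, num_chunks, 0});
  }
  // Marks one more chunk as finished for the utterance at position `index`.
  void FinishChunk(std::size_t index) {
    assert(index < queue_.size());
    PendingUtt &u = queue_[index];
    assert(u.num_chunks_finished < u.num_chunks);
    u.num_chunks_finished++;
  }
  // True only when the oldest utterance has all of its chunks.
  bool Ready() const {
    return !queue_.empty() &&
           queue_.front().num_chunks_finished == queue_.front().num_chunks;
  }
  // Removes and returns the id of the oldest utterance.
  std::string Pop() {
    assert(Ready());
    std::string id = queue_.front().utt_id;
    queue_.pop_front();
    return id;
  }

 private:
  std::deque<PendingUtt> queue_;
};
```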

Member Data Documentation

◆ computation_

std::shared_ptr<const NnetComputation> computation_

private

The compiled computation (will be the same for every batch).

Definition at line 248 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::BatchedXvectorComputer(), and BatchedXvectorComputer::ComputeOneBatch().

◆ feature_dim_

int32 feature_dim_

private

Definition at line 234 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::AddChunkToBatch(), and BatchedXvectorComputer::BatchedXvectorComputer().

◆ input_feats_

Matrix<BaseFloat> input_feats_

private

Staging area for the input features prior to copying them to GPU.

Dimension is opts_.chunk_size * opts_.batch_size by feature_dim_. The sequences are interleaved (will be faster since this corresponds to how nnet3 keeps things in memory), i.e. row 0 of input_feats_ is time t=0 for chunk n=0; and row 1 of input_feats_ is time t=0 for chunk n=1.
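
The row layout just described can be written as a one-line mapping. This is a small sketch with a hypothetical helper name, `InterleavedRow`.

```cpp
#include <cassert>

// Row of the staging matrix holding frame t of chunk n, when batch_size
// chunks are interleaved (all chunks at t=0 first, then all at t=1, ...).
int InterleavedRow(int n, int t, int batch_size) {
  assert(n >= 0 && n < batch_size && t >= 0);
  return t * batch_size + n;
}
```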

Definition at line 244 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::AddChunkToBatch(), BatchedXvectorComputer::BatchedXvectorComputer(), and BatchedXvectorComputer::ComputeOneBatch().

◆ nnet_

const Nnet& nnet_

private

Definition at line 232 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::ComputeOneBatch().

◆ opts_

const BatchedXvectorComputerOptions& opts_

private

Definition at line 230 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::AcceptUtterance(), BatchedXvectorComputer::AddChunkToBatch(), BatchedXvectorComputer::BatchedXvectorComputer(), BatchedXvectorComputer::ComputeOneBatch(), and BatchedXvectorComputer::SplitUtteranceIntoChunks().

◆ position_in_batch_

int32 position_in_batch_

private

position_in_batch_ is the number of chunks that we have filled in in the input_feats_ matrix and tasks_this_batch_.

When it reaches opts_.batch_size we will do the actual computation.

Definition at line 255 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::AcceptUtterance(), BatchedXvectorComputer::AddChunkToBatch(), BatchedXvectorComputer::ComputeOneBatch(), and BatchedXvectorComputer::Flush().

◆ results_head_

XvectorTask* results_head_

private

Definition at line 268 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::OutputXvector(), and BatchedXvectorComputer::XvectorReady().

◆ results_tail_

XvectorTask* results_tail_

private

Definition at line 271 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::OutputXvector().

◆ tasks_this_batch_

std::vector<XvectorTask*> tasks_this_batch_

private

tasks_this_batch_ is of dimension opts_.batch_size.

It is a vector of pointers to elements of the singly linked list whose head is at results_head_, or NULL for elements with indexes >= position_in_batch_.

Definition at line 262 of file nnet3-xvector-compute-batched.cc.

Referenced by BatchedXvectorComputer::AddChunkToBatch(), BatchedXvectorComputer::BatchedXvectorComputer(), and BatchedXvectorComputer::ComputeOneBatch().