RestrictedAttentionComponent implements an attention model with restricted temporal context. More...

#include <nnet-attention-component.h>

Classes
struct	Memo

class	PrecomputedIndexes

Public Member Functions
	RestrictedAttentionComponent ()

	RestrictedAttentionComponent (const RestrictedAttentionComponent &other)

virtual int32	InputDim () const
	Returns input-dimension of this component. More...

virtual int32	OutputDim () const
	Returns output-dimension of this component. More...

virtual std::string	Info () const
	Returns some text-form information about this component, for diagnostics. More...

virtual void	InitFromConfig (ConfigLine *cfl)
	Initialize, from a ConfigLine object. More...

virtual std::string	Type () const
	Returns a string such as "SigmoidComponent", describing the type of the object. More...

virtual int32	Properties () const
	Return bitmask of the component's properties. More...

virtual void *	Propagate (const ComponentPrecomputedIndexes *indexes, const CuMatrixBase< BaseFloat > &in, CuMatrixBase< BaseFloat > *out) const
	Propagate function. More...

virtual void	StoreStats (const CuMatrixBase< BaseFloat > &in_value, const CuMatrixBase< BaseFloat > &out_value, void *memo)
	This function may store stats on average activation values, and for some component types, the average value of the derivative of the nonlinearity. More...

virtual void	Scale (BaseFloat scale)
	When called on an UpdatableComponent, this virtual function scales the parameters by "scale". More...

virtual void	Add (BaseFloat alpha, const Component &other)
	When called on an UpdatableComponent, this virtual function adds the parameters of another updatable component, times some constant, to the current parameters. More...

virtual void	ZeroStats ()
	Components that provide an implementation of StoreStats should also provide an implementation of ZeroStats(), to set those stats to zero. More...

virtual void	Backprop (const std::string &debug_info, const ComponentPrecomputedIndexes *indexes, const CuMatrixBase< BaseFloat > &in_value, const CuMatrixBase< BaseFloat > &out_value, const CuMatrixBase< BaseFloat > &out_deriv, void *memo, Component *to_update, CuMatrixBase< BaseFloat > *in_deriv) const
	Backprop function; depending on which of the arguments 'to_update' and 'in_deriv' are non-NULL, this can compute input-data derivatives and/or perform model update. More...

virtual void	Read (std::istream &is, bool binary)
	Read function (used after we know the type of the Component); accepts input that is missing the token that describes the component type, in case it has already been consumed. More...

virtual void	Write (std::ostream &os, bool binary) const
	Write component to stream. More...

virtual Component *	Copy () const
	Copies component (deep copy). More...

virtual void	DeleteMemo (void *memo) const
	This virtual function only needs to be overwritten by Components that return a non-NULL memo from their Propagate() function. More...

virtual void	ReorderIndexes (std::vector< Index > *input_indexes, std::vector< Index > *output_indexes) const
	This function only does something interesting for non-simple Components. More...

virtual void	GetInputIndexes (const MiscComputationInfo &misc_info, const Index &output_index, std::vector< Index > *desired_indexes) const
	This function only does something interesting for non-simple Components. More...

virtual bool	IsComputable (const MiscComputationInfo &misc_info, const Index &output_index, const IndexSet &input_index_set, std::vector< Index > *used_inputs) const
	This function only does something interesting for non-simple Components, and it exists to make it possible to manage optionally-required inputs. More...

virtual ComponentPrecomputedIndexes *	PrecomputeIndexes (const MiscComputationInfo &misc_info, const std::vector< Index > &input_indexes, const std::vector< Index > &output_indexes, bool need_backprop) const
	This function must return NULL for simple Components. More...

Public Member Functions inherited from Component
virtual void	ConsolidateMemory ()
	This virtual function relates to memory management, and avoiding fragmentation. More...

	Component ()

virtual	~Component ()

Private Member Functions
void	PropagateOneHead (const time_height_convolution::ConvolutionComputationIo &io, const CuMatrixBase< BaseFloat > &in, CuMatrixBase< BaseFloat > *c, CuMatrixBase< BaseFloat > *out) const

void	BackpropOneHead (const time_height_convolution::ConvolutionComputationIo &io, const CuMatrixBase< BaseFloat > &in_value, const CuMatrixBase< BaseFloat > &c, const CuMatrixBase< BaseFloat > &out_deriv, CuMatrixBase< BaseFloat > *in_deriv) const

void	GetComputationStructure (const std::vector< Index > &input_indexes, const std::vector< Index > &output_indexes, time_height_convolution::ConvolutionComputationIo *io) const

void	GetIndexes (const std::vector< Index > &input_indexes, const std::vector< Index > &output_indexes, time_height_convolution::ConvolutionComputationIo &io, std::vector< Index > *new_input_indexes, std::vector< Index > *new_output_indexes) const

void	Check () const

Static Private Member Functions
static void	CreateIndexesVector (const std::vector< std::pair< int32, int32 > > &n_x_pairs, int32 t_start, int32 t_step, int32 num_t_values, const std::unordered_set< Index, IndexHasher > &index_set, std::vector< Index > *output_indexes)
	Utility function used in GetIndexes(). More...

Private Attributes
int32	num_heads_

int32	key_dim_

int32	value_dim_

int32	num_left_inputs_

int32	num_right_inputs_

int32	time_stride_

int32	context_dim_

int32	num_left_inputs_required_

int32	num_right_inputs_required_

bool	output_context_

BaseFloat	key_scale_

double	stats_count_

Vector< double >	entropy_stats_

CuMatrix< double >	posterior_stats_

Additional Inherited Members
Static Public Member Functions inherited from Component
static Component *	ReadNew (std::istream &is, bool binary)
	Read component from stream (works out its type). Dies on error. More...

static Component *	NewComponentOfType (const std::string &type)
	Returns a new Component of the given type e.g. More...

Detailed Description

RestrictedAttentionComponent implements an attention model with restricted temporal context.

What is implemented here is a case of self-attention, meaning that the set of indexes on the input is the same set as the indexes on the output (like an N->N mapping, ignoring edge effects, as opposed to an N->M mapping that you might see in a translation model). "Restricted" means that the source indexes are constrained to be close to the destination indexes, i.e. when outputting something for time 't' we attend to a narrow range of source time indexes close to 't'.

This component is just a fixed nonlinearity (albeit of a type that "knows about" time, i.e. the output at time 't' depends on inputs at a range of time values). This component is not updatable; all the parameters are expected to live in the previous component which is most likely going to be of type NaturalGradientAffineComponent. For a more in-depth explanation, please see comments in the source of the file attention.h. Also, look at the comments for InputDim() and OutputDim() which help to clarify the input and output formats.

The following are the parameters accepted on the config line, with examples of their values.

num-heads	E.g. num-heads=10. Defaults to 1. Having multiple heads just means the same nonlinearity is repeated many times. InputDim() and OutputDim() are multiples of num-heads.
key-dim	E.g. key-dim=60. Must be specified. Dimension of input keys.
value-dim	E.g. value-dim=60. Must be specified. Dimension of input values (these are the things over which the component forms a weighted sum, although if output-context=true we append to the output the weights of the weighted sum, as they might also carry useful information).
time-stride	Stride for 't' value, e.g. 1 or 3. For example, if time-stride=3, to compute the output for t=10 we'd use the input for time values like ... t=7, t=10, t=13, ... (the ends of this range depend on num-left-inputs and num-right-inputs).
num-left-inputs	Number of frames to the left of the current frame that we use as input, e.g. 5. (The 't' values used will be separated by 'time-stride'.) Must be >= 0 and must be specified.
num-right-inputs	Number of frames to the right of the current frame that we use as input, e.g. 2. Must be >= 0 and must be specified. You are not allowed to set both num-left-inputs and num-right-inputs to zero.
num-left-inputs-required	The number of frames to the left that are required in order to produce an output. Defaults to the same as num-left-inputs, but you can set it to a smaller value if you want. We'll use zero-padding for non-required inputs that are not present in the input. Be careful with this because it interacts with decoding settings; for non-online decoding and for dumping of egs it would be advisable to increase the extra-left-context parameter by the sum of the difference between num-left-inputs-required and num-left-inputs, although you could leave extra-left-context-initial at zero.
num-right-inputs-required	See num-left-inputs-required for explanation; it's the mirror image. Defaults to num-right-inputs. However, be even more careful with the right-hand version; if you set this, online (looped) decoding will not work correctly. It might be wiser just to reduce num-right-inputs if you care about real-time decoding.
key-scale	Scale on the keys (but not the added context). Defaults to 1.0 / sqrt(key-dim), like the 1/sqrt(d_k) value in the "Attention is all you need" paper. This helps prevent saturation of the softmax.
output-context	(Default: true). If true, output the softmax that encodes which positions we chose, in addition to the input values.
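
As a rough illustration of how these options determine the component's dimensions, the sketch below recomputes them from the per-head quantities that appear in the Backprop() listing on this page (query_dim, input_dim_per_head, output_dim_per_head) and from the context_dim_ relation asserted in Check(). The struct and function names are hypothetical and are not part of Kaldi; treat this as a sketch under those assumptions, not as the component's actual InputDim()/OutputDim() code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the config-line options; not a Kaldi type.
struct AttentionConfigSketch {
  int num_heads = 1;
  int key_dim = 0;
  int value_dim = 0;
  int num_left_inputs = 0;
  int num_right_inputs = 0;
  bool output_context = true;
};

// One attention weight per attended position: the left inputs, the current
// frame, and the right inputs (the relation asserted in Check()).
inline int ContextDimSketch(const AttentionConfigSketch &c) {
  return c.num_left_inputs + 1 + c.num_right_inputs;
}

// Per head, the input row holds a key, a value and a query, where the query
// has key_dim + context_dim columns (as in the Backprop() listing).
inline int InputDimSketch(const AttentionConfigSketch &c) {
  assert(c.num_heads > 0 && c.key_dim > 0 && c.value_dim > 0);
  int query_dim = c.key_dim + ContextDimSketch(c);
  return c.num_heads * (c.key_dim + c.value_dim + query_dim);
}

// Per head, the output row holds the weighted-sum value and, if
// output-context=true, the context_dim attention weights.
inline int OutputDimSketch(const AttentionConfigSketch &c) {
  assert(c.num_heads > 0 && c.value_dim > 0);
  int per_head = c.value_dim + (c.output_context ? ContextDimSketch(c) : 0);
  return c.num_heads * per_head;
}

// Default key-scale when none is given on the config line.
inline double DefaultKeyScaleSketch(const AttentionConfigSketch &c) {
  assert(c.key_dim > 0);
  return 1.0 / std::sqrt(static_cast<double>(c.key_dim));
}
```

With the example values above (num-heads=10, key-dim=60, value-dim=60, num-left-inputs=5, num-right-inputs=2), the context dimension is 8, so under these assumptions the input dimension is 10 * (60 + 60 + 68) = 1880 and the output dimension is 10 * (60 + 8) = 680.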

Definition at line 106 of file nnet-attention-component.h.

Constructor & Destructor Documentation

◆ RestrictedAttentionComponent() [1/2]

RestrictedAttentionComponent ( )

inline

Definition at line 110 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Copy().

{ }

◆ RestrictedAttentionComponent() [2/2]

RestrictedAttentionComponent ( const RestrictedAttentionComponent & other )

Definition at line 61 of file nnet-attention-component.cc.

                                               :
     num_heads_(other.num_heads_),
     key_dim_(other.key_dim_),
     value_dim_(other.value_dim_),
     num_left_inputs_(other.num_left_inputs_),
     num_right_inputs_(other.num_right_inputs_),
     time_stride_(other.time_stride_),
     context_dim_(other.context_dim_),
     num_left_inputs_required_(other.num_left_inputs_required_),
     num_right_inputs_required_(other.num_right_inputs_required_),
     output_context_(other.output_context_),
     key_scale_(other.key_scale_),
     stats_count_(other.stats_count_),
     entropy_stats_(other.entropy_stats_),
     posterior_stats_(other.posterior_stats_) { }

Member Function Documentation

◆ Add()

void Add	(	BaseFloat	alpha,
		const Component &	other
	)

virtual

When called on an UpdatableComponent, this virtual function adds the parameters of another updatable component, times some constant, to the current parameters. When called on a NonlinearComponent (or another component that stores stats, like BatchNormComponent), it relates to adding stats. Otherwise it will normally do nothing.

Reimplemented from Component.

Definition at line 261 of file nnet-attention-component.cc.

References CuMatrixBase< Real >::AddMat(), VectorBase< Real >::AddVec(), VectorBase< Real >::Dim(), RestrictedAttentionComponent::entropy_stats_, KALDI_ASSERT, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), RestrictedAttentionComponent::posterior_stats_, Vector< Real >::Resize(), CuMatrix< Real >::Resize(), and RestrictedAttentionComponent::stats_count_.

Referenced by RestrictedAttentionComponent::Properties().

                                                                                  {
   const RestrictedAttentionComponent *other =
       dynamic_cast<const RestrictedAttentionComponent*>(&other_in);
   KALDI_ASSERT(other != NULL);
   if (entropy_stats_.Dim() == 0 && other->entropy_stats_.Dim() != 0)
     entropy_stats_.Resize(other->entropy_stats_.Dim());
   if (posterior_stats_.NumRows() == 0 && other->posterior_stats_.NumRows() != 0)
     posterior_stats_.Resize(other->posterior_stats_.NumRows(), other->posterior_stats_.NumCols());
   if (other->entropy_stats_.Dim() != 0)
     entropy_stats_.AddVec(alpha, other->entropy_stats_);
   if (other->posterior_stats_.NumRows() != 0)
     posterior_stats_.AddMat(alpha, other->posterior_stats_);
   stats_count_ += alpha * other->stats_count_;
 }
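
The accumulation pattern in the listing above (lazily size the destination, add the other object's stats scaled by alpha, and scale the count the same way) can be sketched in isolation as follows. StatsSketch is a hypothetical stand-in that uses std::vector in place of Kaldi's Vector and CuMatrix types and tracks only the entropy stats; it is meant to show the pattern, not to reproduce the component.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the component's stats members.
struct StatsSketch {
  std::vector<double> entropy;  // plays the role of entropy_stats_
  double count = 0.0;           // plays the role of stats_count_

  // Adds alpha times the other object's stats to this one.
  void Add(double alpha, const StatsSketch &other) {
    // If we have no stats yet but the other object does, size ours to match.
    if (entropy.empty() && !other.entropy.empty())
      entropy.assign(other.entropy.size(), 0.0);
    if (!other.entropy.empty()) {
      assert(entropy.size() == other.entropy.size());
      for (std::size_t i = 0; i < entropy.size(); i++)
        entropy[i] += alpha * other.entropy[i];
    }
    // The count is scaled by the same alpha, so entropy[i] / count remains a
    // weighted average after the addition.
    count += alpha * other.count;
  }
};
```

Because both the sums and the count are scaled by alpha, dividing a stat by the count (as Info() does when printing) still yields an average after any number of Add() calls.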

◆ Backprop()

void Backprop	(	const std::string &	debug_info,
		const ComponentPrecomputedIndexes *	indexes,
		const CuMatrixBase< BaseFloat > &	in_value,
		const CuMatrixBase< BaseFloat > &	out_value,
		const CuMatrixBase< BaseFloat > &	out_deriv,
		void *	memo,
		Component *	to_update,
		CuMatrixBase< BaseFloat > *	in_deriv
	)		const

virtual

Backprop function; depending on which of the arguments 'to_update' and 'in_deriv' are non-NULL, this can compute input-data derivatives and/or perform model update.

Parameters

[in]	debug_info	The component name, to be printed out in any warning messages.
[in]	indexes	A pointer to some information output by this class's PrecomputeIndexes function (will be NULL for simple components, i.e. those that don't do things like splicing).
[in]	in_value	The matrix that was given as input to the Propagate function. Will be ignored (and may be empty) if Properties()&kBackpropNeedsInput == 0.
[in]	out_value	The matrix that was output from the Propagate function. Will be ignored (and may be empty) if Properties()&kBackpropNeedsOutput == 0.
[in]	out_deriv	The derivative at the output of this component.
[in]	memo	This will normally be NULL, but for component types that set the flag kUsesMemo, this will be the return value of the Propagate() function that corresponds to this Backprop() function. Ownership of any pointers is not transferred to the Backprop function; DeleteMemo() will be called to delete it.
[out]	to_update	If model update is desired, the Component to be updated, else NULL. Does not have to be identical to this. If supplied, you can assume that to_update->Properties() & kUpdatableComponent is nonzero.
[out]	in_deriv	The derivative at the input of this component, if needed (else NULL). If Properties()&kBackpropInPlace, may be the same matrix as out_deriv. If Properties()&kBackpropAdds, this is added to by the Backprop routine, else it is set. The component code chooses which mode to work in, based on convenience.

Implements Component.

Definition at line 292 of file nnet-attention-component.cc.

Referenced by RestrictedAttentionComponent::Properties().

                                              {
   NVTX_RANGE("RestrictedAttentionComponent::Backprop");
   const PrecomputedIndexes *indexes =
       dynamic_cast<const PrecomputedIndexes*>(indexes_in);
   KALDI_ASSERT(indexes != NULL);
   Memo *memo = static_cast<Memo*>(memo_in);
   KALDI_ASSERT(memo != NULL);
   const time_height_convolution::ConvolutionComputationIo &io = indexes->io;
   KALDI_ASSERT(indexes != NULL &&
                in_value.NumRows() == io.num_t_in * io.num_images &&
                out_deriv.NumRows() == io.num_t_out * io.num_images &&
                in_deriv != NULL && SameDim(in_value, *in_deriv));
 
   const CuMatrix<BaseFloat> &c = memo->c;
 
   int32 query_dim = key_dim_ + context_dim_,
       input_dim_per_head = key_dim_ + value_dim_ + query_dim,
       output_dim_per_head = value_dim_ + (output_context_ ? context_dim_ : 0);
 
   for (int32 h = 0; h < num_heads_; h++) {
     CuSubMatrix<BaseFloat>
         in_value_part(in_value, 0, in_value.NumRows(),
                       h * input_dim_per_head, input_dim_per_head),
         c_part(c, 0, out_deriv.NumRows(),
                h * context_dim_, context_dim_),
         out_deriv_part(out_deriv, 0, out_deriv.NumRows(),
                        h * output_dim_per_head, output_dim_per_head),
         in_deriv_part(*in_deriv, 0, in_value.NumRows(),
                       h * input_dim_per_head, input_dim_per_head);
     BackpropOneHead(io, in_value_part, c_part, out_deriv_part,
                     &in_deriv_part);
   }
 }
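
To make the column slicing in the loop above easier to follow, here is a small sketch that computes, for head h, the column ranges taken from the input, from the memo matrix c, and from the output derivative. The split of each head's input block into keys, values and queries follows the sub-matrix offsets used in BackpropOneHead(). ColumnRange, HeadLayoutSketch and LayoutForHead are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: a half-open column range [start, start + width).
struct ColumnRange {
  int start;
  int width;
};

// Hypothetical summary of where head h's data lives in each matrix.
struct HeadLayoutSketch {
  ColumnRange keys;     // columns of in_value / in_deriv
  ColumnRange values;   // columns of in_value / in_deriv
  ColumnRange queries;  // columns of in_value / in_deriv
  ColumnRange context;  // columns of the memo matrix c
  ColumnRange output;   // columns of out_deriv
};

inline HeadLayoutSketch LayoutForHead(int h, int key_dim, int value_dim,
                                      int context_dim, bool output_context) {
  assert(h >= 0 && key_dim > 0 && value_dim > 0 && context_dim > 0);
  int query_dim = key_dim + context_dim;
  int input_dim_per_head = key_dim + value_dim + query_dim;
  int output_dim_per_head = value_dim + (output_context ? context_dim : 0);
  int in_start = h * input_dim_per_head;  // first input column of head h

  HeadLayoutSketch layout;
  layout.keys = ColumnRange{in_start, key_dim};
  layout.values = ColumnRange{in_start + key_dim, value_dim};
  layout.queries = ColumnRange{in_start + key_dim + value_dim, query_dim};
  layout.context = ColumnRange{h * context_dim, context_dim};
  layout.output = ColumnRange{h * output_dim_per_head, output_dim_per_head};
  return layout;
}
```

For example, with key_dim=60, value_dim=60 and context_dim=8, each head occupies 188 input columns, so head 1 has its keys in columns 188 to 247, its values in columns 248 to 307 and its queries in columns 308 to 375.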

◆ BackpropOneHead()

void BackpropOneHead	(	const time_height_convolution::ConvolutionComputationIo &	io,
		const CuMatrixBase< BaseFloat > &	in_value,
		const CuMatrixBase< BaseFloat > &	c,
		const CuMatrixBase< BaseFloat > &	out_deriv,
		CuMatrixBase< BaseFloat > *	in_deriv
	)		const

private

Definition at line 335 of file nnet-attention-component.cc.

References kaldi::nnet3::attention::AttentionBackward(), RestrictedAttentionComponent::context_dim_, KALDI_ASSERT, RestrictedAttentionComponent::key_dim_, RestrictedAttentionComponent::key_scale_, ConvolutionComputationIo::num_images, ConvolutionComputationIo::num_t_in, ConvolutionComputationIo::num_t_out, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), RestrictedAttentionComponent::output_context_, kaldi::SameDim(), ConvolutionComputationIo::start_t_in, ConvolutionComputationIo::start_t_out, ConvolutionComputationIo::t_step_in, ConvolutionComputationIo::t_step_out, and RestrictedAttentionComponent::value_dim_.

Referenced by RestrictedAttentionComponent::Backprop().

                                              {
   // the easiest way to understand this is to compare it with PropagateOneHead().
   int32 query_dim = key_dim_ + context_dim_,
       full_value_dim = value_dim_ + (output_context_ ? context_dim_ : 0);
   KALDI_ASSERT(in_value.NumRows() == io.num_images * io.num_t_in &&
                out_deriv.NumRows() == io.num_images * io.num_t_out &&
                out_deriv.NumCols() == full_value_dim &&
                in_value.NumCols() == (key_dim_ + value_dim_ + query_dim) &&
                io.t_step_in == io.t_step_out &&
                (io.start_t_out - io.start_t_in) % io.t_step_in == 0 &&
                SameDim(in_value, *in_deriv) &&
                c.NumRows() == out_deriv.NumRows() &&
                c.NumCols() == context_dim_);
 
   // 'steps_left_context' is the number of time-steps the input has on the left
   // that don't appear in the output.
   int32 steps_left_context = (io.start_t_out - io.start_t_in) / io.t_step_in,
       rows_left_context = steps_left_context * io.num_images;
   KALDI_ASSERT(rows_left_context >= 0);
 
 
   CuSubMatrix<BaseFloat> queries(in_value, rows_left_context, out_deriv.NumRows(),
                                  key_dim_ + value_dim_, query_dim),
       queries_deriv(*in_deriv, rows_left_context, out_deriv.NumRows(),
                     key_dim_ + value_dim_, query_dim),
       keys(in_value, 0, in_value.NumRows(), 0, key_dim_),
       keys_deriv(*in_deriv,  0, in_value.NumRows(), 0, key_dim_),
       values(in_value, 0, in_value.NumRows(), key_dim_, value_dim_),
       values_deriv(*in_deriv, 0, in_value.NumRows(), key_dim_, value_dim_);
 
   attention::AttentionBackward(key_scale_, keys, queries, values, c, out_deriv,
                                &keys_deriv, &queries_deriv, &values_deriv);
 }
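
The row offset used for the queries above can be illustrated on its own. The sketch assumes, as the description of CreateIndexesVector() on this page indicates, that rows are ordered with 't' having the larger stride, so that each time step contributes num_images consecutive rows. IoSketch and both functions are hypothetical; IoSketch carries only the fields of ConvolutionComputationIo that this calculation uses.

```cpp
#include <cassert>

// Hypothetical subset of ConvolutionComputationIo, with a single t_step
// because the listing asserts t_step_in == t_step_out.
struct IoSketch {
  int start_t_in;   // first 't' value present at the input
  int start_t_out;  // first 't' value present at the output
  int t_step;       // common grid step of input and output
  int num_images;   // number of (n, x) pairs per time step
};

// Number of input rows that precede the row aligned with the first output.
inline int RowsLeftContextSketch(const IoSketch &io) {
  assert(io.t_step > 0 && io.num_images > 0);
  assert((io.start_t_out - io.start_t_in) % io.t_step == 0);
  int steps_left_context = (io.start_t_out - io.start_t_in) / io.t_step;
  assert(steps_left_context >= 0);
  return steps_left_context * io.num_images;
}

// Row of the input matrix holding time-step index 't_index' (counted from
// start_t_in) for image 'image', under the 't has the larger stride' layout.
inline int InputRowSketch(const IoSketch &io, int t_index, int image) {
  assert(t_index >= 0 && image >= 0 && image < io.num_images);
  return t_index * io.num_images + image;
}
```

With start_t_in=-15, start_t_out=0, t_step=3 and num_images=4, the input has 5 time steps of left context, so the queries sub-matrix would start at row 20.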

◆ Check()

void Check ( ) const

private

Definition at line 277 of file nnet-attention-component.cc.

Referenced by RestrictedAttentionComponent::InitFromConfig().

                                                {
   KALDI_ASSERT(num_heads_ > 0 && key_dim_ > 0 && value_dim_ > 0 &&
                num_left_inputs_ >= 0 && num_right_inputs_ >= 0 &&
                (num_left_inputs_ + num_right_inputs_) > 0 &&
                time_stride_ > 0 &&
                context_dim_ == (num_left_inputs_ + 1 + num_right_inputs_) &&
                num_left_inputs_required_ >= 0 &&
                num_left_inputs_required_ <= num_left_inputs_ &&
                num_right_inputs_required_ >= 0 &&
                num_right_inputs_required_ <= num_right_inputs_ &&
                key_scale_ > 0.0 && key_scale_ <= 1.0 &&
                stats_count_ >= 0.0);
 }

◆ Copy()

virtual Component* Copy ( ) const

inline virtual

Copies component (deep copy).

Implements Component.

Definition at line 155 of file nnet-attention-component.h.

References RestrictedAttentionComponent::RestrictedAttentionComponent().

                                   {
     return new RestrictedAttentionComponent(*this);
   }

◆ CreateIndexesVector()

void CreateIndexesVector	(	const std::vector< std::pair< int32, int32 > > &	n_x_pairs,
		int32	t_start,
		int32	t_step,
		int32	num_t_values,
		const std::unordered_set< Index, IndexHasher > &	index_set,
		std::vector< Index > *	output_indexes
	)

static private

Utility function used in GetIndexes().

Creates a grid of Indexes, where 't' has the larger stride, and within each block of Indexes for a given 't', we have the given list of (n, x) pairs. For Indexes that we create where the 't' value was not present in 'index_set', we set the 't' value to kNoTime (indicating that it's only for padding, not a real input or an output that's ever used).

Definition at line 575 of file nnet-attention-component.cc.

References KALDI_ASSERT, and kaldi::nnet3::kNoTime.

Referenced by RestrictedAttentionComponent::GetIndexes().

                                       {
   output_indexes->resize(static_cast<size_t>(num_t_values) * n_x_pairs.size());
   std::vector<Index>::iterator out_iter = output_indexes->begin();
   for (int32 t = t_start; t < t_start + (t_step * num_t_values); t += t_step) {
     std::vector<std::pair<int32, int32> >::const_iterator
         iter = n_x_pairs.begin(), end = n_x_pairs.end();
     for (; iter != end; ++iter) {
       out_iter->n = iter->first;
       out_iter->t = t;
       out_iter->x = iter->second;
       if (index_set.count(*out_iter) == 0)
         out_iter->t = kNoTime;
       ++out_iter;
     }
   }
   KALDI_ASSERT(out_iter == output_indexes->end());
 }
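
A self-contained sketch of the same grid construction may help when reading the listing. It makes several simplifying assumptions: IndexSketch is a hypothetical stand-in for Index, an ordered std::set replaces the std::unordered_set with IndexHasher, and kNoTimeSketch is an arbitrary sentinel chosen for this example rather than Kaldi's kNoTime constant.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Sentinel chosen for this sketch only; stands in for kNoTime.
const int kNoTimeSketch = std::numeric_limits<int>::min();

// Hypothetical stand-in for Index, ordered so it can be stored in a std::set.
struct IndexSketch {
  int n, t, x;
  bool operator<(const IndexSketch &o) const {
    return std::tie(n, t, x) < std::tie(o.n, o.t, o.x);
  }
};

// Builds a grid in which 't' has the larger stride: for each time value, one
// entry per (n, x) pair. Entries absent from 'present' are marked as padding.
inline std::vector<IndexSketch> MakeGridSketch(
    const std::vector<std::pair<int, int> > &n_x_pairs,
    int t_start, int t_step, int num_t_values,
    const std::set<IndexSketch> &present) {
  std::vector<IndexSketch> grid;
  grid.reserve(static_cast<std::size_t>(num_t_values) * n_x_pairs.size());
  for (int k = 0; k < num_t_values; k++) {
    int t = t_start + k * t_step;
    for (std::size_t p = 0; p < n_x_pairs.size(); p++) {
      IndexSketch index = {n_x_pairs[p].first, t, n_x_pairs[p].second};
      if (present.count(index) == 0)
        index.t = kNoTimeSketch;  // padding only, never a real input or output
      grid.push_back(index);
    }
  }
  return grid;
}
```

The resulting vector always has num_t_values * n_x_pairs.size() entries, which is what lets the rest of the component treat the data as a regular (time, image) grid even when some of the requested indexes were not supplied.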

◆ DeleteMemo()

virtual void DeleteMemo ( void * memo ) const

inline virtual

This virtual function only needs to be overwritten by Components that return a non-NULL memo from their Propagate() function.

It's called by NnetComputer in cases where Propagate returns a memo but there will be no backprop to consume it.

Reimplemented from Component.

Definition at line 158 of file nnet-attention-component.h.

References RestrictedAttentionComponent::GetInputIndexes(), RestrictedAttentionComponent::IsComputable(), RestrictedAttentionComponent::PrecomputeIndexes(), and RestrictedAttentionComponent::ReorderIndexes().

{ delete static_cast<Memo*>(memo); }
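
The memo life cycle described here (Propagate() hands back an opaque pointer, Backprop() reads it without taking ownership, and DeleteMemo() frees it) can be sketched with plain C++ as below. MemoSketch and the three free functions are hypothetical stand-ins for the component's Memo struct and member functions; in particular, the contents of the real Memo are only assumed here to include the matrix 'c' that Backprop() reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the component's Memo struct.
struct MemoSketch {
  std::vector<float> c;  // plays the role of the attention-weight matrix
};

// Allocates a memo and returns it as an opaque pointer, as Propagate() does.
inline void *PropagateSketch(std::size_t num_weights) {
  MemoSketch *memo = new MemoSketch;
  memo->c.assign(num_weights, 0.0f);
  return memo;
}

// Reads the memo without taking ownership, as Backprop() does.
inline std::size_t BackpropSketch(void *memo_in) {
  MemoSketch *memo = static_cast<MemoSketch*>(memo_in);
  assert(memo != NULL);
  return memo->c.size();
}

// Frees the memo; the cast restores the concrete type so that the right
// destructor runs (deleting through a void* would not do that).
inline void DeleteMemoSketch(void *memo) {
  delete static_cast<MemoSketch*>(memo);
}
```

The cast back to the concrete type before delete is the important detail: only the component that created the memo knows what it really points to, which is why the deletion is delegated to a virtual function on the component.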

◆ GetComputationStructure()

void GetComputationStructure	(	const std::vector< Index > &	input_indexes,
		const std::vector< Index > &	output_indexes,
		time_height_convolution::ConvolutionComputationIo *	io
	)		const

private

Definition at line 389 of file nnet-attention-component.cc.

References kaldi::Gcd(), kaldi::nnet3::time_height_convolution::GetComputationIo(), KALDI_ASSERT, RestrictedAttentionComponent::num_left_inputs_, RestrictedAttentionComponent::num_left_inputs_required_, RestrictedAttentionComponent::num_right_inputs_, RestrictedAttentionComponent::num_right_inputs_required_, ConvolutionComputationIo::num_t_in, ConvolutionComputationIo::num_t_out, ConvolutionComputationIo::start_t_in, ConvolutionComputationIo::start_t_out, ConvolutionComputationIo::t_step_in, ConvolutionComputationIo::t_step_out, and RestrictedAttentionComponent::time_stride_.

Referenced by RestrictedAttentionComponent::PrecomputeIndexes(), and RestrictedAttentionComponent::ReorderIndexes().

                                                                  {
   GetComputationIo(input_indexes, output_indexes, io);
   // if there was only one output and/or input index (unlikely),
   // just let the grid periodicity be t_stride_.
   if (io->t_step_out == 0) io->t_step_out = time_stride_;
   if (io->t_step_in == 0) io->t_step_in = time_stride_;
 
   // We need the grid size on the input and output to be the same, and to divide
   // t_stride_.  If someone is requesting the output more frequently than
   // t_stride_, then after this change we may end up computing more outputs than
   // we need, but this is not a configuration that I think is very likely.  We
   // let the grid step be the gcd of the input and output steps, and of
   // t_stride_.
   // The next few statements may have the effect of making the grid finer at the
   // input and output, while having the same start and end point.
   int32 t_step = Gcd(Gcd(io->t_step_out, io->t_step_in), time_stride_);
   int32 multiple_out = io->t_step_out / t_step,
       multiple_in = io->t_step_in / t_step;
   io->t_step_in = t_step;
   io->t_step_out = t_step;
   io->num_t_out = 1 + multiple_out * (io->num_t_out - 1);
   io->num_t_in = 1 + multiple_in * (io->num_t_in - 1);
 
   // Now ensure that the extent of the input has at least
   // the requested left-context and right context; if
   // this increases the amount of input, we'll do zero-padding.
   int32 first_requested_input =
           io->start_t_out - (time_stride_ * num_left_inputs_),
       first_required_input =
          io->start_t_out - (time_stride_ * num_left_inputs_required_),
       last_t_out = io->start_t_out + (io->num_t_out - 1) * t_step,
       last_t_in = io->start_t_in + (io->num_t_in - 1) * t_step,
       last_requested_input = last_t_out + (time_stride_ * num_right_inputs_),
       last_required_input =
            last_t_out + (time_stride_ * num_right_inputs_required_);
 
   // check that we don't have *more* than the requested context,
   // but that we have at least the required context.
   KALDI_ASSERT(io->start_t_in >= first_requested_input &&
                last_t_in <= last_requested_input &&
                io->start_t_in <= first_required_input &&
                last_t_in >= last_required_input);
 
   // For the inputs that were requested, but not required,
   // we pad with zeros.  We pad the 'io' object, adding these
   // extra inputs structurally; in runtime they'll be set to zero.
   io->start_t_in = first_requested_input;
   io->num_t_in = 1 + (last_requested_input - first_requested_input) / t_step;
 }
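
The grid-refinement step in the middle of the listing is easy to check with small numbers. The sketch below reproduces only that step (choosing a common step as a gcd and recomputing the number of time values so that the start and end points stay the same); it omits the context checks and the zero-padding. GridSketch, GcdSketch and RefineGridSketch are hypothetical names, and GcdSketch is a plain Euclidean gcd standing in for kaldi::Gcd().

```cpp
#include <cassert>

// Hypothetical subset of ConvolutionComputationIo used by the refinement.
struct GridSketch {
  int t_step_in;
  int t_step_out;
  int num_t_in;
  int num_t_out;
};

// Euclidean gcd for positive integers; stand-in for kaldi::Gcd().
inline int GcdSketch(int a, int b) {
  while (b != 0) {
    int r = a % b;
    a = b;
    b = r;
  }
  return a;
}

// Makes the input and output grids share one step that divides time_stride,
// keeping each grid's first and last time value unchanged.
inline void RefineGridSketch(int time_stride, GridSketch *g) {
  assert(time_stride > 0 && g->num_t_in > 0 && g->num_t_out > 0);
  // A step of zero means there was only one time value; use the stride.
  if (g->t_step_out == 0) g->t_step_out = time_stride;
  if (g->t_step_in == 0) g->t_step_in = time_stride;
  int t_step = GcdSketch(GcdSketch(g->t_step_out, g->t_step_in), time_stride);
  int multiple_out = g->t_step_out / t_step,
      multiple_in = g->t_step_in / t_step;
  // A grid of N points with step m * t_step spans (N - 1) * m fine steps,
  // hence 1 + m * (N - 1) points on the finer grid.
  g->num_t_out = 1 + multiple_out * (g->num_t_out - 1);
  g->num_t_in = 1 + multiple_in * (g->num_t_in - 1);
  g->t_step_in = t_step;
  g->t_step_out = t_step;
}
```

For instance, an output grid of 5 values with step 3 and an input grid of 20 values with step 1, with time-stride=3, gives a common step of 1; the output grid then has 1 + 3 * 4 = 13 points covering the same span, while the input grid is unchanged.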

◆ GetIndexes()

void GetIndexes	(	const std::vector< Index > &	input_indexes,
		const std::vector< Index > &	output_indexes,
		time_height_convolution::ConvolutionComputationIo &	io,
		std::vector< Index > *	new_input_indexes,
		std::vector< Index > *	new_output_indexes
	)		const

private

Definition at line 597 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::CreateIndexesVector(), kaldi::nnet3::GetNxList(), KALDI_ASSERT, ConvolutionComputationIo::num_images, ConvolutionComputationIo::num_t_in, ConvolutionComputationIo::num_t_out, ConvolutionComputationIo::start_t_in, ConvolutionComputationIo::start_t_out, ConvolutionComputationIo::t_step_in, and ConvolutionComputationIo::t_step_out.

Referenced by RestrictedAttentionComponent::PrecomputeIndexes(), and RestrictedAttentionComponent::ReorderIndexes().

                                                   {
 
   std::unordered_set<Index, IndexHasher> input_set, output_set;
   for (std::vector<Index>::const_iterator iter = input_indexes.begin();
        iter != input_indexes.end(); ++iter)
     input_set.insert(*iter);
   for (std::vector<Index>::const_iterator iter = output_indexes.begin();
        iter != output_indexes.end(); ++iter)
     output_set.insert(*iter);
 
   std::vector<std::pair<int32, int32> > n_x_pairs;
   GetNxList(input_indexes, &n_x_pairs);  // the n,x pairs at the output will be
                                          // identical.
   KALDI_ASSERT(n_x_pairs.size() == io.num_images);
   CreateIndexesVector(n_x_pairs, io.start_t_in, io.t_step_in, io.num_t_in,
                       input_set, new_input_indexes);
   CreateIndexesVector(n_x_pairs, io.start_t_out, io.t_step_out, io.num_t_out,
                       output_set, new_output_indexes);
 }

◆ GetInputIndexes()

void GetInputIndexes	(	const MiscComputationInfo &	misc_info,
		const Index &	output_index,
		std::vector< Index > *	desired_indexes
	)		const

virtual

This function only does something interesting for non-simple Components.

For a given index at the output of the component, tells us what indexes are required at its input (note: "required" encompasses also optionally-required things; it will enumerate all things that we'd like to have). See also IsComputable().

Parameters

[in]	misc_info	This argument is supplied to handle things that the framework can't very easily supply: information like which time indexes are needed for AggregateComponent, which time-indexes are available at the input of a recurrent network, and so on. We will add members to misc_info as needed.
[in]	output_index	The Index at the output of the component, for which we are requesting the list of indexes at the component's input.
[out]	desired_indexes	The list of indexes that are desired at the input is written to here. By "desired" we mean required or optionally-required.

The default implementation of this function is suitable for any SimpleComponent; it just copies the output_index to a single identical element in desired_indexes.

Reimplemented from Component.

Definition at line 507 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::context_dim_, rnnlm::i, KALDI_ASSERT, kaldi::nnet3::kNoTime, Index::n, rnnlm::n, RestrictedAttentionComponent::num_left_inputs_, RestrictedAttentionComponent::num_right_inputs_, Index::t, RestrictedAttentionComponent::time_stride_, and Index::x.

Referenced by RestrictedAttentionComponent::DeleteMemo().

                                              {
   KALDI_ASSERT(output_index.t != kNoTime);
   int32 first_time = output_index.t - (time_stride_ * num_left_inputs_),
       last_time = output_index.t + (time_stride_ * num_right_inputs_);
   desired_indexes->clear();
   desired_indexes->resize(context_dim_);
   int32 n = output_index.n, x = output_index.x,
       i = 0;
   for (int32 t = first_time; t <= last_time; t += time_stride_, i++) {
     (*desired_indexes)[i].n = n;
     (*desired_indexes)[i].t = t;
     (*desired_indexes)[i].x = x;
   }
   KALDI_ASSERT(i == context_dim_);
 }
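
As a quick check of which time values this produces, the following sketch lists them for a given output time. DesiredTimesSketch is a hypothetical helper that works on plain integers rather than Index objects; the 'n' and 'x' fields, which the listing simply copies from the output index, are left out.

```cpp
#include <cassert>
#include <vector>

// Returns the input 't' values attended to when producing output time 't':
// num_left steps to the left, the current time, and num_right steps to the
// right, all separated by time_stride.
inline std::vector<int> DesiredTimesSketch(int t, int time_stride,
                                           int num_left, int num_right) {
  assert(time_stride > 0 && num_left >= 0 && num_right >= 0);
  std::vector<int> times;
  times.reserve(num_left + 1 + num_right);
  for (int k = -num_left; k <= num_right; k++)
    times.push_back(t + k * time_stride);
  return times;
}
```

With t=10, time-stride=3 and one input on each side, this gives the values 7, 10 and 13 mentioned in the description of time-stride; in general the list has num-left-inputs + 1 + num-right-inputs entries, which is the context dimension.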

◆ Info()

std::string Info ( ) const

virtual

Returns some text-form information about this component, for diagnostics.

Starts with the type of the component. E.g. "SigmoidComponent dim=900", although most components will have much more info.

Reimplemented from Component.

Definition at line 32 of file nnet-attention-component.cc.

Referenced by RestrictedAttentionComponent::OutputDim().

                                                    {
   std::stringstream stream;
   stream << Type() << ", input-dim=" << InputDim()
          << ", output-dim=" << OutputDim()
          << ", num-heads=" << num_heads_
          << ", time-stride=" << time_stride_
          << ", key-dim=" << key_dim_
          << ", value-dim=" << value_dim_
          << ", num-left-inputs=" << num_left_inputs_
          << ", num-right-inputs=" << num_right_inputs_
          << ", context-dim=" << context_dim_
          << ", num-left-inputs-required=" << num_left_inputs_required_
          << ", num-right-inputs-required=" << num_right_inputs_required_
          << ", output-context=" << (output_context_ ? "true" : "false")
          << ", key-scale=" << key_scale_;
   if (stats_count_ != 0.0) {
     stream << ", entropy=";
     for (int32 i = 0; i < entropy_stats_.Dim(); i++)
       stream << (entropy_stats_(i) / stats_count_) << ',';
     for (int32 i = 0; i < num_heads_ && i < 5; i++) {
       stream << " posterior-stats[" << i <<"]=";
       for (int32 j = 0; j < posterior_stats_.NumCols(); j++)
         stream << (posterior_stats_(i,j) / stats_count_) << ',';
     }
     stream << " stats-count=" << stats_count_;
   }
   return stream.str();
 }

◆ InitFromConfig()

void InitFromConfig ( ConfigLine * cfl )

virtual

Initialize, from a ConfigLine object.

Parameters

[in] cfl A ConfigLine containing any parameters that are needed for initialization. For example: "dim=100 param-stddev=0.1"

Implements Component.

Definition at line 80 of file nnet-attention-component.cc.

Referenced by RestrictedAttentionComponent::OutputDim().

                                                                  {
   num_heads_ = 1;
   key_dim_ = -1;
   value_dim_ = -1;
   num_left_inputs_ = -1;
   num_right_inputs_ = -1;
   time_stride_ = 1;
   num_left_inputs_required_ = -1;
   num_right_inputs_required_ = -1;
   output_context_ = true;
   key_scale_ = -1.0;
 
 
   // mandatory arguments.
   bool ok = cfl->GetValue("key-dim", &key_dim_) &&
       cfl->GetValue("value-dim", &value_dim_) &&
       cfl->GetValue("num-left-inputs", &num_left_inputs_) &&
       cfl->GetValue("num-right-inputs", &num_right_inputs_);
 
   if (!ok)
     KALDI_ERR << "All of the values key-dim, value-dim, "
         "num-left-inputs and num-right-inputs must be defined.";
   // optional arguments.
   cfl->GetValue("num-heads", &num_heads_);
   cfl->GetValue("time-stride", &time_stride_);
   cfl->GetValue("num-left-inputs-required", &num_left_inputs_required_);
   cfl->GetValue("num-right-inputs-required", &num_right_inputs_required_);
   cfl->GetValue("output-context", &output_context_);
   cfl->GetValue("key-scale", &key_scale_);
 
   if (key_scale_ < 0.0) key_scale_ = 1.0 / sqrt(key_dim_);
   if (num_left_inputs_required_ < 0)
     num_left_inputs_required_ = num_left_inputs_;
   if (num_right_inputs_required_ < 0)
     num_right_inputs_required_ = num_right_inputs_;
 
   if (num_heads_ <= 0 || key_dim_ <= 0 || value_dim_ <= 0 ||
       num_left_inputs_ < 0 || num_right_inputs_ < 0 ||
       (num_left_inputs_ + num_right_inputs_) <= 0 ||
       num_left_inputs_required_ > num_left_inputs_ ||
       num_right_inputs_required_ > num_right_inputs_ ||
       time_stride_ <= 0)
     KALDI_ERR << "Config line contains invalid values: "
               << cfl->WholeLine();
   stats_count_ = 0.0;
   context_dim_ = num_left_inputs_ + 1 + num_right_inputs_;
   Check();
 }
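
The defaulting and validation steps above can be sketched without any Kaldi types. This is an illustration only: `AttentionConfig` and `FinalizeConfig` are hypothetical names, and a boolean return stands in for KALDI_ERR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain-struct stand-in for the component's configuration.
// Negative values mean "not set", as in InitFromConfig().
struct AttentionConfig {
  int num_heads = 1;
  int key_dim = -1;
  int value_dim = -1;
  int num_left_inputs = -1;
  int num_right_inputs = -1;
  int time_stride = 1;
  int num_left_inputs_required = -1;
  int num_right_inputs_required = -1;
  double key_scale = -1.0;
  int context_dim = 0;
};

// Fills in derived defaults; returns false where the original would call
// KALDI_ERR for invalid values.
bool FinalizeConfig(AttentionConfig *c) {
  assert(c != nullptr);
  if (c->key_scale < 0.0)
    c->key_scale = 1.0 / std::sqrt(static_cast<double>(c->key_dim));
  if (c->num_left_inputs_required < 0)
    c->num_left_inputs_required = c->num_left_inputs;
  if (c->num_right_inputs_required < 0)
    c->num_right_inputs_required = c->num_right_inputs;
  if (c->num_heads <= 0 || c->key_dim <= 0 || c->value_dim <= 0 ||
      c->num_left_inputs < 0 || c->num_right_inputs < 0 ||
      (c->num_left_inputs + c->num_right_inputs) <= 0 ||
      c->num_left_inputs_required > c->num_left_inputs ||
      c->num_right_inputs_required > c->num_right_inputs ||
      c->time_stride <= 0)
    return false;
  c->context_dim = c->num_left_inputs + 1 + c->num_right_inputs;
  return true;
}
```

With key_dim=64, num_left_inputs=5 and num_right_inputs=2, this yields key_scale=0.125 and context_dim=8, and the required-input counts default to 5 and 2.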

◆ InputDim()

virtual int32 InputDim ( ) const

inlinevirtual

Returns input-dimension of this component.

Implements Component.

Definition at line 115 of file nnet-attention-component.h.

References RestrictedAttentionComponent::context_dim_, RestrictedAttentionComponent::key_dim_, RestrictedAttentionComponent::num_heads_, and RestrictedAttentionComponent::value_dim_.

Referenced by RestrictedAttentionComponent::Info().

                                  {
     // the input is interpreted as being appended blocks one for each head; each
     // such block is interpreted as (key, value, query).
     int32 query_dim = key_dim_ + context_dim_;
     return num_heads_ * (key_dim_ + value_dim_ + query_dim);
   }
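
To make the (key, value, query) block layout concrete, the following sketch computes the column offsets of each part for a given head. `HeadLayout` and `InputLayoutForHead` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical description of where one head's block sits in an input row.
struct HeadLayout {
  int key_offset;    // first column of the key
  int value_offset;  // first column of the value
  int query_offset;  // first column of the query
  int dim_per_head;  // total columns used by one head
};

HeadLayout InputLayoutForHead(int head, int key_dim, int value_dim,
                              int context_dim) {
  assert(head >= 0 && key_dim > 0 && value_dim > 0 && context_dim > 0);
  int query_dim = key_dim + context_dim;  // as in InputDim()
  int dim_per_head = key_dim + value_dim + query_dim;
  int base = head * dim_per_head;
  return {base, base + key_dim, base + key_dim + value_dim, dim_per_head};
}
```

With key_dim=4, value_dim=3 and context_dim=5, each head occupies 16 columns, so head 1 has its key at column 16, its value at column 20 and its query at column 23.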

◆ IsComputable()

bool IsComputable	(	const MiscComputationInfo &	misc_info,
		const Index &	output_index,
		const IndexSet &	input_index_set,
		std::vector< Index > *	used_inputs
	)		const

virtual

This function only does something interesting for non-simple Components, and it exists to make it possible to manage optionally-required inputs.

It tells the user whether a given output index is computable from a given set of input indexes, and if so, says which input indexes will be used in the computation.

Implementations of this function are required to have the property that adding an element to "input_index_set" can only ever change IsComputable from false to true, never vice versa.

Parameters

[in]	misc_info	Some information specific to the computation, such as minimum and maximum times for certain components to do adaptation on; it's a place to put things that don't easily fit in the framework.
[in]	output_index	The index that is to be computed at the output of this Component.
[in]	input_index_set	The set of indexes that is available at the input of this Component.
[out]	used_inputs	If this is non-NULL and the output is computable this will be set to the list of input indexes that will actually be used in the computation.

Returns: Returns true iff this output is computable from the provided inputs.

The default implementation of this function is suitable for any SimpleComponent: it just returns true if output_index is in input_index_set, and if so sets used_inputs to a vector containing that one Index.

Reimplemented from Component.

Definition at line 527 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::context_dim_, KALDI_ASSERT, kaldi::nnet3::kNoTime, RestrictedAttentionComponent::num_left_inputs_, RestrictedAttentionComponent::num_left_inputs_required_, RestrictedAttentionComponent::num_right_inputs_, RestrictedAttentionComponent::num_right_inputs_required_, Index::t, and RestrictedAttentionComponent::time_stride_.

Referenced by RestrictedAttentionComponent::DeleteMemo().

                                          {
   KALDI_ASSERT(output_index.t != kNoTime);
   Index index(output_index);
 
   if (used_inputs != NULL) {
     int32 first_time = output_index.t - (time_stride_ * num_left_inputs_),
         last_time = output_index.t + (time_stride_ * num_right_inputs_);
     used_inputs->clear();
     used_inputs->reserve(context_dim_);
 
     for (int32 t = first_time; t <= last_time; t += time_stride_) {
       index.t = t;
       if (input_index_set(index)) {
         // This input index is available.
         used_inputs->push_back(index);
       } else {
         // This input index is not available.
         int32 offset = (t - output_index.t) / time_stride_;
         if (offset >= -num_left_inputs_required_ &&
             offset <= num_right_inputs_required_) {
           used_inputs->clear();
           return false;
         }
       }
     }
     // All required time-offsets of the output were computable. -> return true.
     return true;
   } else {
     int32 t = output_index.t,
         first_time_required = t - (time_stride_ * num_left_inputs_required_),
         last_time_required = t + (time_stride_ * num_right_inputs_required_);
     for (int32 t = first_time_required;
          t <= last_time_required;
          t += time_stride_) {
       index.t = t;
       if (!input_index_set(index))
         return false;
     }
     return true;
   }
 }
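
The distinction between optional and required inputs can be illustrated with a simplified sketch in which inputs are identified by their time index alone. The function name `IsComputableSketch` and the use of `std::set<int>` in place of IndexSet are assumptions made for this example.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical simplification of IsComputable(): a missing input only
// makes the output non-computable if its offset lies inside the
// "required" window.
bool IsComputableSketch(int output_t, const std::set<int> &available,
                        int time_stride, int num_left, int num_right,
                        int num_left_required, int num_right_required,
                        std::vector<int> *used) {
  assert(used != nullptr && time_stride > 0);
  used->clear();
  for (int offset = -num_left; offset <= num_right; ++offset) {
    int t = output_t + offset * time_stride;
    if (available.count(t) != 0) {
      used->push_back(t);  // available, so it will be used
    } else if (offset >= -num_left_required &&
               offset <= num_right_required) {
      used->clear();       // a required input is missing
      return false;
    }
  }
  return true;
}
```

With inputs available at times 0 to 3, a context of two frames on each side and one required frame on each side, the output at t=1 is computable (t=-1 is merely optional), whereas the output at t=0 is not, because its required left neighbour t=-1 is missing.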

◆ OutputDim()

virtual int32 OutputDim ( ) const

inlinevirtual

Returns output-dimension of this component.

Implements Component.

Definition at line 121 of file nnet-attention-component.h.

References RestrictedAttentionComponent::context_dim_, RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::num_heads_, RestrictedAttentionComponent::output_context_, and RestrictedAttentionComponent::value_dim_.

Referenced by RestrictedAttentionComponent::Info().

                                   {
     // the output consists of appended blocks, one for each head; each such
     // block is the attention weighted average of the input values, to which
     // we append softmax encoding of the positions we chose, if output_context_
     // == true.
     return num_heads_ * (value_dim_ + (output_context_ ? context_dim_ : 0));
   }
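
As a quick check of the formula above, here is the same computation as a free function (the name `OutputDimSketch` is hypothetical):

```cpp
#include <cassert>

// Hypothetical free-function version of OutputDim().
int OutputDimSketch(int num_heads, int value_dim, int context_dim,
                    bool output_context) {
  assert(num_heads > 0 && value_dim > 0 && context_dim > 0);
  // Each head contributes value_dim columns, plus context_dim columns of
  // attention weights when output_context is true.
  return num_heads * (value_dim + (output_context ? context_dim : 0));
}
```

For two heads with value_dim=3 and context_dim=5, the output has 16 columns when output_context is true and 6 columns when it is false.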

◆ PrecomputeIndexes()

ComponentPrecomputedIndexes * PrecomputeIndexes	(	const MiscComputationInfo &	misc_info,
		const std::vector< Index > &	input_indexes,
		const std::vector< Index > &	output_indexes,
		bool	need_backprop
	)		const

virtual

This function must return NULL for simple Components.

Returns a pointer to a class that may contain some precomputed component-specific and computation-specific indexes to be used in the Propagate and Backprop functions.

Parameters

[in]	misc_info	This argument is supplied to handle things that the framework can't very easily supply: information like which time indexes are needed for AggregateComponent, which time-indexes are available at the input of a recurrent network, and so on. misc_info may not even ever be used here. We will add members to misc_info as needed.
[in]	input_indexes	A vector of indexes that explains what time-indexes (and other indexes) each row of the in/in_value/in_deriv matrices given to Propagate and Backprop will mean.
[in]	output_indexes	A vector of indexes that explains what time-indexes (and other indexes) each row of the out/out_value/out_deriv matrices given to Propagate and Backprop will mean.
[in]	need_backprop	True if we might need to do backprop with this component, so that if any different indexes are needed for backprop then those should be computed too.

Returns: Returns a child-class of class ComponentPrecomputedIndexes, or NULL if this component does not need to precompute any indexes (e.g. if it is a simple component and does not care about indexes).

Reimplemented from Component.

Definition at line 622 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::GetComputationStructure(), RestrictedAttentionComponent::GetIndexes(), kaldi::GetVerboseLevel(), RestrictedAttentionComponent::PrecomputedIndexes::io, and KALDI_ASSERT.

Referenced by RestrictedAttentionComponent::DeleteMemo().

           {
   PrecomputedIndexes *ans = new PrecomputedIndexes();
   GetComputationStructure(input_indexes, output_indexes, &(ans->io));
   if (GetVerboseLevel() >= 2) {
     // what goes next is just a check.
     std::vector<Index> new_input_indexes, new_output_indexes;
     GetIndexes(input_indexes, output_indexes, ans->io,
                &new_input_indexes, &new_output_indexes);
     // input_indexes and output_indexes should be the ones that were
     // output by ReorderIndexes(), so they should already
     // have gone through the GetComputationStructure()->GetIndexes()
     // procedure.  Applying the same procedure twice is supposed to
     // give unchanged results.
     KALDI_ASSERT(input_indexes == new_input_indexes &&
                  output_indexes == new_output_indexes);
   }
   return ans;
 }

◆ Propagate()

void * Propagate	(	const ComponentPrecomputedIndexes *	indexes,
		const CuMatrixBase< BaseFloat > &	in,
		CuMatrixBase< BaseFloat > *	out
	)		const

virtual

Propagate function.

Parameters

[in]	indexes	A pointer to some information output by this class's PrecomputeIndexes function (will be NULL for simple components, i.e. those that don't do things like splicing).
[in]	in	The input to this component. Num-columns == InputDim().
[out]	out	The output of this component. Num-columns == OutputDim(). Note: output of this component will be added to the initial value of "out" if Properties()&kPropagateAdds != 0; otherwise the output will be set and the initial value ignored. Each Component chooses whether it is more convenient implementation-wise to add or set, and the calling code has to deal with it.

Returns: Normally returns NULL, but may return a non-NULL value for components which have the flag kUsesMemo set. This value will be passed into the corresponding Backprop routine.

Implements Component.

Definition at line 132 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::context_dim_, RestrictedAttentionComponent::PrecomputedIndexes::io, KALDI_ASSERT, RestrictedAttentionComponent::key_dim_, RestrictedAttentionComponent::num_heads_, ConvolutionComputationIo::num_images, ConvolutionComputationIo::num_t_in, ConvolutionComputationIo::num_t_out, CuMatrixBase< Real >::NumRows(), RestrictedAttentionComponent::output_context_, RestrictedAttentionComponent::PropagateOneHead(), and RestrictedAttentionComponent::value_dim_.

Referenced by RestrictedAttentionComponent::Properties().

                                                                             {
   const PrecomputedIndexes *indexes = dynamic_cast<const PrecomputedIndexes*>(
       indexes_in);
   KALDI_ASSERT(indexes != NULL &&
                in.NumRows() == indexes->io.num_t_in * indexes->io.num_images &&
                out->NumRows() == indexes->io.num_t_out * indexes->io.num_images);
 
 
   Memo *memo = new Memo();
   memo->c.Resize(out->NumRows(), context_dim_ * num_heads_);
 
   int32 query_dim = key_dim_ + context_dim_;
   int32 input_dim_per_head = key_dim_ + value_dim_ + query_dim,
       output_dim_per_head = value_dim_ + (output_context_ ? context_dim_ : 0);
   for (int32 h = 0; h < num_heads_; h++) {
     CuSubMatrix<BaseFloat> in_part(in, 0, in.NumRows(),
                                    h * input_dim_per_head, input_dim_per_head),
         c_part(memo->c, 0, out->NumRows(),
                h * context_dim_, context_dim_),
         out_part(*out, 0, out->NumRows(),
                  h * output_dim_per_head, output_dim_per_head);
     PropagateOneHead(indexes->io, in_part, &c_part, &out_part);
   }
   return static_cast<void*>(memo);
 }

◆ PropagateOneHead()

void PropagateOneHead	(	const time_height_convolution::ConvolutionComputationIo &	io,
		const CuMatrixBase< BaseFloat > &	in,
		CuMatrixBase< BaseFloat > *	c,
		CuMatrixBase< BaseFloat > *	out
	)		const

private

Definition at line 160 of file nnet-attention-component.cc.

References kaldi::nnet3::attention::AttentionForward(), RestrictedAttentionComponent::context_dim_, KALDI_ASSERT, RestrictedAttentionComponent::key_dim_, RestrictedAttentionComponent::key_scale_, ConvolutionComputationIo::num_images, ConvolutionComputationIo::num_t_in, ConvolutionComputationIo::num_t_out, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), RestrictedAttentionComponent::output_context_, ConvolutionComputationIo::start_t_in, ConvolutionComputationIo::start_t_out, ConvolutionComputationIo::t_step_in, ConvolutionComputationIo::t_step_out, and RestrictedAttentionComponent::value_dim_.

Referenced by RestrictedAttentionComponent::Propagate().

                                         {
   int32 query_dim = key_dim_ + context_dim_,
       full_value_dim = value_dim_ + (output_context_ ? context_dim_ : 0);
   KALDI_ASSERT(in.NumRows() == io.num_images * io.num_t_in &&
                out->NumRows() == io.num_images * io.num_t_out &&
                out->NumCols() == full_value_dim &&
                in.NumCols() == (key_dim_ + value_dim_ + query_dim) &&
                io.t_step_in == io.t_step_out &&
                (io.start_t_out - io.start_t_in) % io.t_step_in == 0);
 
   // 'steps_left_context' is the number of time-steps the input has on the left
   // that don't appear in the output.
   int32 steps_left_context = (io.start_t_out - io.start_t_in) / io.t_step_in,
       rows_left_context = steps_left_context * io.num_images;
   KALDI_ASSERT(rows_left_context >= 0);
 
   // 'queries' contains the queries.  We don't use all rows of the input
   // queries; only the rows that correspond to the time-indexes at the
   // output, i.e. we exclude the left-context and right-context.
   // 'out'; the remaining rows of 'in' that we didn't select correspond to left
   // and right temporal context.
   CuSubMatrix<BaseFloat> queries(in, rows_left_context, out->NumRows(),
                                  key_dim_ + value_dim_, query_dim);
   // 'keys' contains the keys; note, these are not extended with
   // context information; that happens further in.
   CuSubMatrix<BaseFloat> keys(in, 0, in.NumRows(), 0, key_dim_);
 
   // 'values' contains the values which we will interpolate.
   // these don't contain the context information; that will be added
   // later if output_context_ == true.
   CuSubMatrix<BaseFloat> values(in, 0, in.NumRows(), key_dim_, value_dim_);
 
   attention::AttentionForward(key_scale_, keys, queries, values, c, out);
 }
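
The row offset used to select the queries can be sketched in isolation. The struct `IoSketch` below is a hypothetical stand-in that copies only the fields of ConvolutionComputationIo used above, and the sketch assumes (as the multiplication by num_images suggests) that rows are ordered with time as the slower-varying index.

```cpp
#include <cassert>

// Hypothetical stand-in for the fields of ConvolutionComputationIo that
// PropagateOneHead() reads.
struct IoSketch {
  int num_images;
  int start_t_in, t_step_in, num_t_in;
  int start_t_out, t_step_out, num_t_out;
};

// Returns the first row of 'in' whose time index also appears at the
// output, i.e. the number of rows of pure left context.
int FirstQueryRow(const IoSketch &io) {
  assert(io.t_step_in == io.t_step_out && io.t_step_in > 0 &&
         (io.start_t_out - io.start_t_in) % io.t_step_in == 0);
  int steps_left_context = (io.start_t_out - io.start_t_in) / io.t_step_in;
  assert(steps_left_context >= 0);
  return steps_left_context * io.num_images;
}
```

For instance, if the input starts at t=-6, the output starts at t=0, the step is 3 and there are 4 sequences, the queries begin at row 8 of the input.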

◆ Properties()

virtual int32 Properties ( ) const

inlinevirtual

Return bitmask of the component's properties.

These properties depend only on the component's type. See enum ComponentProperties.

Implements Component.

Definition at line 131 of file nnet-attention-component.h.

                                    {
     return kReordersIndexes|kBackpropNeedsInput|kPropagateAdds|kBackpropAdds|
         kStoresStats|kUsesMemo;
   }
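
Callers test individual properties by masking the returned value. The sketch below shows the pattern; the numeric flag values are placeholders chosen for illustration and should not be taken as the values in Kaldi's enum ComponentProperties.

```cpp
#include <cassert>

namespace sketch {

// Placeholder flag values (assumed for illustration only).
enum ComponentProperties {
  kSimpleComponent = 0x001,
  kPropagateAdds = 0x008,
  kReordersIndexes = 0x010,
  kBackpropAdds = 0x020,
  kBackpropNeedsInput = 0x040,
  kStoresStats = 0x200,
  kUsesMemo = 0x1000
};

// Same combination of flags as Properties() above.
inline int Properties() {
  return kReordersIndexes | kBackpropNeedsInput | kPropagateAdds |
         kBackpropAdds | kStoresStats | kUsesMemo;
}

// True if 'flag' is set in 'properties'.
inline bool HasProperty(int properties, int flag) {
  assert(flag != 0);
  return (properties & flag) != 0;
}

}  // namespace sketch
```

Note the parentheses around the mask: in C++, `!=` binds more tightly than `&`, so `p & flag != 0` would be parsed as `p & (flag != 0)`.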

◆ Read()

void Read	(	std::istream &	is,
		bool	binary
	)

virtual

Read function (used after we know the type of the Component); accepts input that is missing the token that describes the component type, in case it has already been consumed.

Implements Component.

Definition at line 473 of file nnet-attention-component.cc.

Referenced by RestrictedAttentionComponent::Properties().

                                                                    {
   ExpectOneOrTwoTokens(is, binary, "<RestrictedAttentionComponent>",
                        "<NumHeads>");
   ReadBasicType(is, binary, &num_heads_);
   ExpectToken(is, binary, "<KeyDim>");
   ReadBasicType(is, binary, &key_dim_);
   ExpectToken(is, binary, "<ValueDim>");
   ReadBasicType(is, binary, &value_dim_);
   ExpectToken(is, binary, "<NumLeftInputs>");
   ReadBasicType(is, binary, &num_left_inputs_);
   ExpectToken(is, binary, "<NumRightInputs>");
   ReadBasicType(is, binary, &num_right_inputs_);
   ExpectToken(is, binary, "<TimeStride>");
   ReadBasicType(is, binary, &time_stride_);
   ExpectToken(is, binary, "<NumLeftInputsRequired>");
   ReadBasicType(is, binary, &num_left_inputs_required_);
   ExpectToken(is, binary, "<NumRightInputsRequired>");
   ReadBasicType(is, binary, &num_right_inputs_required_);
   ExpectToken(is, binary, "<OutputContext>");
   ReadBasicType(is, binary, &output_context_);
   ExpectToken(is, binary, "<KeyScale>");
   ReadBasicType(is, binary, &key_scale_);
   ExpectToken(is, binary, "<StatsCount>");
   ReadBasicType(is, binary, &stats_count_);
   ExpectToken(is, binary, "<EntropyStats>");
   entropy_stats_.Read(is, binary);
   ExpectToken(is, binary, "<PosteriorStats>");
   posterior_stats_.Read(is, binary);
   ExpectToken(is, binary, "</RestrictedAttentionComponent>");
 
   context_dim_ = num_left_inputs_ + 1 + num_right_inputs_;
 }

◆ ReorderIndexes()

void ReorderIndexes	(	std::vector< Index > *	input_indexes,
		std::vector< Index > *	output_indexes
	)		const

virtual

This function only does something interesting for non-simple Components.

It provides an opportunity for a Component to reorder or pad the indexes at its input and output. This might be useful, for instance, if a component requires a particular ordering of the indexes that doesn't correspond to their natural ordering. Components that might modify the indexes are required to return the kReordersIndexes flag in their Properties(). The ReorderIndexes() function is now allowed to insert blanks into the indexes. The 'blanks' must be of the form (n,kNoTime,x), where the marker kNoTime (a very negative number) appears in the position where the 't' index normally lives. The reason we don't just have, say, (-1,-1,-1), relates to the need to preserve a regular pattern over the 'n' indexes so that 'shortcut compilation' (cf. ExpandComputation()) can work correctly.

Parameters

[in,out]	input_indexes	Indexes at the input of the Component.
[in,out]	output_indexes	Indexes at the output of the Component.

Reimplemented from Component.

Definition at line 376 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::GetComputationStructure(), and RestrictedAttentionComponent::GetIndexes().

Referenced by RestrictedAttentionComponent::DeleteMemo().

                                             {
   using namespace time_height_convolution;
   ConvolutionComputationIo io;
   GetComputationStructure(*input_indexes, *output_indexes, &io);
   std::vector<Index> new_input_indexes, new_output_indexes;
   GetIndexes(*input_indexes, *output_indexes, io,
              &new_input_indexes, &new_output_indexes);
   input_indexes->swap(new_input_indexes);
   output_indexes->swap(new_output_indexes);
 }

◆ Scale()

void Scale ( BaseFloat scale )

virtual

This virtual function, when called on an UpdatableComponent, scales the parameters by "scale".

When called on a nonlinear component (or another component that stores stats, like BatchNormComponent), it relates to scaling activation stats, not parameters. Otherwise it will normally do nothing.

Reimplemented from Component.

Definition at line 255 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::entropy_stats_, RestrictedAttentionComponent::posterior_stats_, VectorBase< Real >::Scale(), CuMatrixBase< Real >::Scale(), and RestrictedAttentionComponent::stats_count_.

Referenced by RestrictedAttentionComponent::Properties().

                                                         {
   entropy_stats_.Scale(scale);
   posterior_stats_.Scale(scale);
   stats_count_ *= scale;
 }
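
Because Scale() multiplies the accumulated sums and the count by the same factor, the averages reported by Info() (each sum divided by stats_count_) are unaffected by a nonzero scale. A minimal sketch of that property, using a hypothetical `StatsSketch` struct rather than the component itself:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the entropy statistics and their count.
struct StatsSketch {
  std::vector<double> entropy_stats;  // one accumulated sum per head
  double stats_count;                 // number of rows accumulated
};

// Mirrors Scale(): sums and count are scaled together.
void ScaleStats(double scale, StatsSketch *stats) {
  assert(stats != nullptr);
  for (double &e : stats->entropy_stats) e *= scale;
  stats->stats_count *= scale;
}

// Mirrors the per-head average printed by Info().
double AverageEntropy(const StatsSketch &stats, std::size_t head) {
  assert(head < stats.entropy_stats.size() && stats.stats_count != 0.0);
  return stats.entropy_stats[head] / stats.stats_count;
}
```

Scaling by 0.5, for example, halves the weight these statistics carry when combined with others, while the average entropy per head stays the same.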

◆ StoreStats()

void StoreStats	(	const CuMatrixBase< BaseFloat > &	in_value,
		const CuMatrixBase< BaseFloat > &	out_value,
		void *	memo
	)

virtual

This function may store stats on average activation values, and for some component types, the average value of the derivative of the nonlinearity.

It only does something for those components that have nonzero Properties()&kStoresStats.

Parameters

[in]	in_value	The input to the Propagate() function. Note: if the component sets the flag kPropagateInPlace, this should not be used; the empty matrix will be provided here if in-place propagation was used.
[in]	out_value	The output of the Propagate() function.
[in]	memo	The 'memo' returned by the Propagate() function; this will usually be NULL.

Reimplemented from Component.

Definition at line 200 of file nnet-attention-component.cc.

References CuVectorBase< Real >::AddDiagMatMat(), CuMatrixBase< Real >::AddMat(), CuVectorBase< Real >::AddRowSumMat(), VectorBase< Real >::AddVec(), CuMatrixBase< Real >::ApplyFloor(), CuMatrixBase< Real >::ApplyLog(), RestrictedAttentionComponent::Memo::c, RestrictedAttentionComponent::context_dim_, CuVectorBase< Real >::Data(), VectorBase< Real >::Dim(), RestrictedAttentionComponent::entropy_stats_, KALDI_ASSERT, kaldi::kNoTrans, kaldi::kTrans, RestrictedAttentionComponent::num_heads_, CuMatrixBase< Real >::NumCols(), CuMatrixBase< Real >::NumRows(), RestrictedAttentionComponent::posterior_stats_, kaldi::RandInt(), Vector< Real >::Resize(), CuMatrix< Real >::Resize(), and RestrictedAttentionComponent::stats_count_.

Referenced by RestrictedAttentionComponent::Properties().

                    {
   const Memo *memo = static_cast<const Memo*>(memo_in);
   KALDI_ASSERT(memo != NULL);
   if (entropy_stats_.Dim() != num_heads_) {
     entropy_stats_.Resize(num_heads_);
     posterior_stats_.Resize(num_heads_, context_dim_);
     stats_count_ = 0.0;
   }
   const CuMatrix<BaseFloat> &c = memo->c;
   if (RandInt(0, 2) == 0)
     return;  // only actually store the stats for one in three minibatches, to
              // save time.
 
   { // first get the posterior stats.
     CuVector<BaseFloat> c_sum(num_heads_ * context_dim_);
     c_sum.AddRowSumMat(1.0, c, 0.0);
     // view the vector as a matrix.
     CuSubMatrix<BaseFloat> c_sum_as_mat(c_sum.Data(), num_heads_,
                                         context_dim_, context_dim_);
     CuMatrix<double> c_sum_as_mat_dbl(c_sum_as_mat);
     posterior_stats_.AddMat(1.0, c_sum_as_mat_dbl);
     KALDI_ASSERT(c.NumCols() == num_heads_ * context_dim_);
   }
   { // now get the entropy stats.
     CuMatrix<BaseFloat> log_c(c);
     log_c.ApplyFloor(1.0e-20);
     log_c.ApplyLog();
     CuVector<BaseFloat> dot_prod(num_heads_ * context_dim_);
     dot_prod.AddDiagMatMat(-1.0, c, kTrans, log_c, kNoTrans, 0.0);
     // dot_prod is the sum over the matrix's rows (which correspond
     // to heads, and context positions), of - c * log(c), which is
     // part of the entropy.  To get the actual contribution to the
     // entropy, we have to sum 'dot_prod' over blocks of
     // size 'context_dim_'; that gives us the entropy contribution
     // per head.  We'd have to divide by c.NumRows() to get the
     // actual entropy, but that's reflected in stats_count_.
     CuSubMatrix<BaseFloat> entropy_mat(dot_prod.Data(), num_heads_,
                                        context_dim_, context_dim_);
     CuVector<BaseFloat> entropy_vec(num_heads_);
     entropy_vec.AddColSumMat(1.0, entropy_mat);
     Vector<double> entropy_vec_dbl(entropy_vec);
     entropy_stats_.AddVec(1.0, entropy_vec_dbl);
   }
   stats_count_ += c.NumRows();
 }
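
The entropy accumulation above can be sketched for a single row of attention weights using standard containers. The function name `EntropyPerHead` is hypothetical; the floor of 1.0e-20 inside the logarithm follows the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: 'c_row' holds num_heads blocks of context_dim
// attention weights. Returns -sum_j c_j * log(max(c_j, 1e-20)) per head.
std::vector<double> EntropyPerHead(const std::vector<double> &c_row,
                                   int num_heads, int context_dim) {
  assert(num_heads > 0 && context_dim > 0 &&
         c_row.size() == static_cast<std::size_t>(num_heads) * context_dim);
  std::vector<double> entropy(num_heads, 0.0);
  for (int h = 0; h < num_heads; ++h) {
    for (int j = 0; j < context_dim; ++j) {
      double c = c_row[h * context_dim + j];
      // Floor before taking the log so that log(0) never occurs.
      entropy[h] -= c * std::log(std::max(c, 1.0e-20));
    }
  }
  return entropy;
}
```

A head that attends uniformly over four positions has entropy log(4), while a head that puts all its weight on one position has entropy zero.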

◆ Type()

virtual std::string Type ( ) const

inlinevirtual

Returns a string such as "SigmoidComponent", describing the type of the object.

Implements Component.

Definition at line 130 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Info().

{ return "RestrictedAttentionComponent"; }

◆ Write()

void Write	(	std::ostream &	os,
		bool	binary
	)		const

virtual

Write component to stream.

Implements Component.

Definition at line 442 of file nnet-attention-component.cc.

Referenced by RestrictedAttentionComponent::Properties().

                                                                           {
   WriteToken(os, binary, "<RestrictedAttentionComponent>");
   WriteToken(os, binary, "<NumHeads>");
   WriteBasicType(os, binary, num_heads_);
   WriteToken(os, binary, "<KeyDim>");
   WriteBasicType(os, binary, key_dim_);
   WriteToken(os, binary, "<ValueDim>");
   WriteBasicType(os, binary, value_dim_);
   WriteToken(os, binary, "<NumLeftInputs>");
   WriteBasicType(os, binary, num_left_inputs_);
   WriteToken(os, binary, "<NumRightInputs>");
   WriteBasicType(os, binary, num_right_inputs_);
   WriteToken(os, binary, "<TimeStride>");
   WriteBasicType(os, binary, time_stride_);
   WriteToken(os, binary, "<NumLeftInputsRequired>");
   WriteBasicType(os, binary, num_left_inputs_required_);
   WriteToken(os, binary, "<NumRightInputsRequired>");
   WriteBasicType(os, binary, num_right_inputs_required_);
   WriteToken(os, binary, "<OutputContext>");
   WriteBasicType(os, binary, output_context_);
   WriteToken(os, binary, "<KeyScale>");
   WriteBasicType(os, binary, key_scale_);
   WriteToken(os, binary, "<StatsCount>");
   WriteBasicType(os, binary, stats_count_);
   WriteToken(os, binary, "<EntropyStats>");
   entropy_stats_.Write(os, binary);
   WriteToken(os, binary, "<PosteriorStats>");
   posterior_stats_.Write(os, binary);
   WriteToken(os, binary, "</RestrictedAttentionComponent>");
 }
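
Write() and Read() must emit and consume the same tokens in the same order. The following sketch shows that token/value pattern on a text stream; `WriteField` and `ReadField` are hypothetical helpers for illustration and are not Kaldi's WriteToken/ReadBasicType functions.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical text-mode helper: writes "<Token> value ".
void WriteField(std::ostream &os, const std::string &token, int value) {
  assert(!token.empty());
  os << token << ' ' << value << ' ';
}

// Hypothetical text-mode helper: checks the next token, then reads its value.
int ReadField(std::istream &is, const std::string &expected_token) {
  std::string token;
  int value = 0;
  if (!(is >> token) || token != expected_token || !(is >> value))
    throw std::runtime_error("expected token " + expected_token);
  return value;
}
```

Reading fields in a different order from the one in which they were written would fail at the token check, which is the role ExpectToken() plays in Read().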

◆ ZeroStats()

void ZeroStats ( )

virtual

Components that provide an implementation of StoreStats should also provide an implementation of ZeroStats(), to set those stats to zero.

Other components that store other types of statistics (e.g. regarding gradient clipping) should implement ZeroStats() also.

Reimplemented from Component.

Definition at line 249 of file nnet-attention-component.cc.

References RestrictedAttentionComponent::entropy_stats_, RestrictedAttentionComponent::posterior_stats_, VectorBase< Real >::SetZero(), CuMatrixBase< Real >::SetZero(), and RestrictedAttentionComponent::stats_count_.

Referenced by RestrictedAttentionComponent::Properties().

                                              {
   entropy_stats_.SetZero();
   posterior_stats_.SetZero();
   stats_count_ = 0.0;
 }

Member Data Documentation

◆ context_dim_

int32 context_dim_

private

Definition at line 287 of file nnet-attention-component.h.

◆ entropy_stats_

Vector<double> entropy_stats_

private

Definition at line 295 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Add(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::Read(), RestrictedAttentionComponent::Scale(), RestrictedAttentionComponent::StoreStats(), RestrictedAttentionComponent::Write(), and RestrictedAttentionComponent::ZeroStats().

◆ key_dim_

int32 key_dim_

private

Definition at line 282 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Backprop(), RestrictedAttentionComponent::BackpropOneHead(), RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::InputDim(), RestrictedAttentionComponent::Propagate(), RestrictedAttentionComponent::PropagateOneHead(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().

◆ key_scale_

BaseFloat key_scale_

private

Definition at line 292 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::BackpropOneHead(), RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::PropagateOneHead(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().

◆ num_heads_

int32 num_heads_

private

Definition at line 281 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Backprop(), RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::InputDim(), RestrictedAttentionComponent::OutputDim(), RestrictedAttentionComponent::Propagate(), RestrictedAttentionComponent::Read(), RestrictedAttentionComponent::StoreStats(), and RestrictedAttentionComponent::Write().

◆ num_left_inputs_

int32 num_left_inputs_

private

Definition at line 284 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::GetComputationStructure(), RestrictedAttentionComponent::GetInputIndexes(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::IsComputable(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().

◆ num_left_inputs_required_

int32 num_left_inputs_required_

private

Definition at line 289 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::GetComputationStructure(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::IsComputable(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().

◆ num_right_inputs_

int32 num_right_inputs_

private

Definition at line 285 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::GetComputationStructure(), RestrictedAttentionComponent::GetInputIndexes(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::IsComputable(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().

◆ num_right_inputs_required_

int32 num_right_inputs_required_

private

Definition at line 290 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::GetComputationStructure(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::IsComputable(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().

◆ output_context_

bool output_context_

private

Definition at line 291 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Backprop(), RestrictedAttentionComponent::BackpropOneHead(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::OutputDim(), RestrictedAttentionComponent::Propagate(), RestrictedAttentionComponent::PropagateOneHead(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().

◆ posterior_stats_

CuMatrix<double> posterior_stats_

private

Definition at line 298 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Add(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::Read(), RestrictedAttentionComponent::Scale(), RestrictedAttentionComponent::StoreStats(), RestrictedAttentionComponent::Write(), and RestrictedAttentionComponent::ZeroStats().

◆ stats_count_

double stats_count_

private

Definition at line 294 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Add(), RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::Read(), RestrictedAttentionComponent::Scale(), RestrictedAttentionComponent::StoreStats(), RestrictedAttentionComponent::Write(), and RestrictedAttentionComponent::ZeroStats().

◆ time_stride_

int32 time_stride_

private

Definition at line 286 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::GetComputationStructure(), RestrictedAttentionComponent::GetInputIndexes(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::IsComputable(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().
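
The members num_left_inputs_, num_right_inputs_ and time_stride_ are all referenced by GetInputIndexes(), which suggests they jointly define the window of input frames needed for one output frame. The following is a minimal sketch of that relationship. It assumes the window runs from t - num_left_inputs_ * time_stride_ to t + num_right_inputs_ * time_stride_ in steps of time_stride_. InputTimes is a hypothetical helper written for illustration, not a Kaldi function, and the window rule is inferred rather than stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Kaldi): returns the input time indexes
// assumed to be needed to compute the output at time t.
// Assumption: the window spans num_left_inputs steps to the left and
// num_right_inputs steps to the right, each step being time_stride frames.
std::vector<int32_t> InputTimes(int32_t t,
                                int32_t num_left_inputs,
                                int32_t num_right_inputs,
                                int32_t time_stride) {
  // Preconditions of this sketch only.
  assert(num_left_inputs >= 0 && num_right_inputs >= 0 && time_stride > 0);
  std::vector<int32_t> times;
  for (int32_t k = -num_left_inputs; k <= num_right_inputs; ++k)
    times.push_back(t + k * time_stride);
  return times;
}
```

Under this assumption, the number of entries returned is always num_left_inputs + 1 + num_right_inputs, independent of the stride.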

◆ value_dim_

int32 value_dim_

private

Definition at line 283 of file nnet-attention-component.h.

Referenced by RestrictedAttentionComponent::Backprop(), RestrictedAttentionComponent::BackpropOneHead(), RestrictedAttentionComponent::Check(), RestrictedAttentionComponent::Info(), RestrictedAttentionComponent::InitFromConfig(), RestrictedAttentionComponent::InputDim(), RestrictedAttentionComponent::OutputDim(), RestrictedAttentionComponent::Propagate(), RestrictedAttentionComponent::PropagateOneHead(), RestrictedAttentionComponent::Read(), and RestrictedAttentionComponent::Write().
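
Because value_dim_ is referenced by both InputDim() and OutputDim(), and output_context_ by OutputDim(), the dimensions can be illustrated with a short sketch. It assumes that each head's input block is laid out as (key, value, query), that the query carries key_dim_ plus one extra element per context position, and that the context size is num_left_inputs_ + 1 + num_right_inputs_. The struct and function names below are hypothetical, and the formulas reflect an understanding of the Kaldi source rather than anything stated on this page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-data mirror of the relevant private members.
struct AttentionDims {
  int32_t num_heads;
  int32_t key_dim;
  int32_t value_dim;
  int32_t num_left_inputs;
  int32_t num_right_inputs;
  bool output_context;
};

// Assumed: one context position per input frame in the window.
int32_t ContextDim(const AttentionDims &d) {
  return d.num_left_inputs + 1 + d.num_right_inputs;
}

// Assumed: per head, the input is (key, value, query), where the query
// has key_dim + context_dim elements.
int32_t InputDim(const AttentionDims &d) {
  int32_t query_dim = d.key_dim + ContextDim(d);
  return d.num_heads * (d.key_dim + d.value_dim + query_dim);
}

// Assumed: per head, the output is the weighted value vector, optionally
// followed by the attention weights over the context positions.
int32_t OutputDim(const AttentionDims &d) {
  return d.num_heads *
         (d.value_dim + (d.output_context ? ContextDim(d) : 0));
}
```

With these assumptions, turning output_context_ off shrinks the output by num_heads_ times the context size and leaves the input dimension unchanged.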

The documentation for this class was generated from the following files:

nnet3/nnet-attention-component.h
nnet3/nnet-attention-component.cc
