This comment explains the basic framework used for everything related to time-height convolution.
#include <convolution.h>
Classes | |
struct | Offset |
Public Member Functions | |
void | ComputeDerived () |
int32 | InputDim () const |
int32 | OutputDim () const |
int32 | ParamRows () const |
int32 | ParamCols () const |
ConvolutionModel () | |
bool | operator== (const ConvolutionModel &other) const |
bool | Check (bool check_heights_used=true, bool allow_height_padding=true) const |
std::string | Info () const |
void | Write (std::ostream &os, bool binary) const |
void | Read (std::istream &is, bool binary) |
Public Attributes | |
int32 | num_filters_in |
int32 | num_filters_out |
int32 | height_in |
int32 | height_out |
int32 | height_subsample_out |
std::vector< Offset > | offsets |
std::set< int32 > | required_time_offsets |
std::set< int32 > | all_time_offsets |
int32 | time_offsets_modulus |
We are doing convolution in 2 dimensions; these would normally be width and height, but in the nnet3 framework we identify the width with the 'time' dimension (the 't' element of an Index). This enables us to use this framework in the normal way for speech tasks, and it turns out to have other advantages too, giving us a very efficient and easy implementation of CNNs (basically, the nnet3 framework takes care of certain reorderings for us). As mentioned, the 't' index will correspond to the width, and the vectors we operate on will be of dimension height * num-filters, where the filter-index has a stride of 1.
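The row layout described above (dimension height * num-filters, with the filter-index at stride 1) can be sketched as follows. This is only an illustration: `RowDim` and `ColumnIndex` are hypothetical helper names that do not exist in convolution.h, and `int32_t` stands in for Kaldi's `int32`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of convolution.h): dimension of one row
// vector, i.e. of the data for a single 't' value of a single image.
inline int32_t RowDim(int32_t height, int32_t num_filters) {
  return height * num_filters;
}

// Hypothetical helper: column index of the element at height-index h and
// filter-index f. The filter-index has stride 1, so the height-index has
// stride num_filters.
inline int32_t ColumnIndex(int32_t h, int32_t f,
                           int32_t height, int32_t num_filters) {
  assert(h >= 0 && h < height);
  assert(f >= 0 && f < num_filters);
  return h * num_filters + f;
}
```

With height 40 and 3 filters, for instance, consecutive filters at the same height occupy adjacent columns, and moving up one height-index skips 3 columns.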
We will use the GeneralComponent interface, and its function ReorderIndexes(), to ensure that the input and output Indexes of the component have a specified regular structure; we'll pad with 'blank' Indexes (t=kNoTime) on the input and output of the component, as needed to ensure that it's an evenly spaced grid over n and t, with x always zero and the t values evenly spaced. (However, a note on even spacing: for computations with downsampling this ordering of the 't' values is a bit more complicated; search for 'blocks' in the rest of this header for more information.)
First consider the simplest case, call it "same-t-stride" (where there is no downsampling on the time index, i.e. the input and output 't' values have the same stride, like 1, 2 or 4). The input and output matrices have dimension num-t-values * num-images, with the num-t-values having the higher stride. The computation involves copying a row-range of the input matrix to a temporary matrix with a column mapping (the temporary matrix will typically have more columns than the input matrix), and then doing a matrix-multiply between the reshaped temporary matrix and a block of the parameters; the block corresponds to a particular time-offset. Then we may need to repeat the whole process with a different, shifted row-range of the input matrix and a different column map. You may have to read the rest of this header to understand this in more detail.
This struct represents a convolutional model from a structural point of view (it doesn't contain the actual parameters). Note: the parameters are to be stored in a matrix of dimension (num_filters_out) by (offsets.size() * num_filters_in) [where the offset-index has the larger stride than the filter-index].
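The parameter-matrix shape stated in the note above can be made concrete with a small sketch. The names `ParamShape`, `ParamDims` and `ParamColumn` are hypothetical and exist only for this illustration; the real struct exposes ParamRows() and ParamCols().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical container for the shape of the parameter matrix.
struct ParamShape {
  int32_t rows;
  int32_t cols;
};

// Shape as stated in the text: (num_filters_out) by
// (offsets.size() * num_filters_in).
inline ParamShape ParamDims(int32_t num_filters_in, int32_t num_filters_out,
                            std::size_t num_offsets) {
  ParamShape shape;
  shape.rows = num_filters_out;
  shape.cols = static_cast<int32_t>(num_offsets) * num_filters_in;
  return shape;
}

// Column holding the weight for a given (offset, input filter) pair. The
// offset-index has the larger stride, so the filter-index varies fastest.
inline int32_t ParamColumn(int32_t offset_index, int32_t filter_in,
                           int32_t num_filters_in) {
  assert(offset_index >= 0);
  assert(filter_in >= 0 && filter_in < num_filters_in);
  return offset_index * num_filters_in + filter_in;
}
```

For a 3x3 patch (9 offsets) with 16 input filters and 32 output filters this gives a 32 by 144 matrix, and the columns for offset-index 2 start at column 32.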
Partly out of a desire for generality, but also for convenience in implementation and integration with nnet3, at this level we don't represent the patch size in the normal way like '1x1' or '3x3', but as a list of pairs (time-offset, height-offset). E.g. a 1x1 patch would normally be the single pair (0,0), and a 3x3 patch might be represented as
offsets={ (0,0),(0,1),(0,2), (1,0),(1,1),(1,2), (2,0),(2,1),(2,2) }
However (and you have to be a bit careful here), the time indexes are on an absolute numbering scheme, so that if you had downsampled the time axis on a previous layer, the time-offsets might all be multiples of (e.g.) 2 or 4, but the height-offsets would normally always be separated by 1. [Note: we always normalize the list of (time-offset, height-offset) pairs with the lexicographical ordering that you see above.] This asymmetry between time and height may not be very aesthetic, but the absolute numbering of time is at the core of how the framework works. Note: the offsets don't have to start from zero; they can be less than zero, just like the offsets in TDNNs, which are often lists like (-3,0,3). Don't be surprised to see things like:
offsets={ (-3,-1),(-3,0),(-3,1), (0,-1),(0,0),(0,1), (3,-1),(3,0),(3,1) }
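The lexicographical normalization of the (time-offset, height-offset) pairs can be sketched like this. The `Offset` struct below is a stand-in written for this example (the real one is ConvolutionModel::Offset in convolution.h), and `NormalizeOffsets` is a hypothetical helper; removing duplicates is an assumption made here, not something the text above states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for ConvolutionModel::Offset, for illustration only.
struct Offset {
  int32_t time_offset;
  int32_t height_offset;
};

// Lexicographical order: compare time offsets first, then height offsets.
inline bool operator<(const Offset &a, const Offset &b) {
  if (a.time_offset != b.time_offset) return a.time_offset < b.time_offset;
  return a.height_offset < b.height_offset;
}

inline bool operator==(const Offset &a, const Offset &b) {
  return a.time_offset == b.time_offset &&
         a.height_offset == b.height_offset;
}

// Hypothetical helper: sort the list and drop repeated pairs (the
// de-duplication is an assumption of this sketch).
inline void NormalizeOffsets(std::vector<Offset> *offsets) {
  assert(offsets != nullptr);
  std::sort(offsets->begin(), offsets->end());
  offsets->erase(std::unique(offsets->begin(), offsets->end()),
                 offsets->end());
}
```

After normalization all pairs with the same time-offset are contiguous, which matches the grouping visible in the example lists.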
If there are negative offsets in the height dimension (as above) it means that there is zero-padding in the height dimension (because the first height-index at both the input and the output is 0, so having a height-offset of -1 means that to compute the output at height-index 0 we need the input at height-index -1, which doesn't exist; this implies zero padding on the bottom of the image).
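The padding argument above can be turned into a small calculation. This is a sketch under a stated assumption: that the input height-index needed for output height-index h is h * height_subsample_out + height_offset (for height_subsample_out = 1 this reduces to the case described in the text). `HeightPadding` and `RequiredHeightPadding` are hypothetical names, not part of convolution.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: rows of zero padding implied below height-index
// 0 ("bottom") and above height-index height_in - 1 ("top").
struct HeightPadding {
  int32_t bottom;
  int32_t top;
};

// Assumption of this sketch: output height h reads input height
// h * height_subsample_out + height_offset.
inline HeightPadding RequiredHeightPadding(
    int32_t height_in, int32_t height_out, int32_t height_subsample_out,
    const std::vector<int32_t> &height_offsets) {
  assert(height_in > 0 && height_out > 0 && height_subsample_out > 0);
  assert(!height_offsets.empty());
  int32_t min_offset =
      *std::min_element(height_offsets.begin(), height_offsets.end());
  int32_t max_offset =
      *std::max_element(height_offsets.begin(), height_offsets.end());
  // Lowest input index is needed by output 0, highest by the last output.
  int32_t lowest_in = min_offset;
  int32_t highest_in = (height_out - 1) * height_subsample_out + max_offset;
  HeightPadding padding;
  padding.bottom = std::max<int32_t>(0, -lowest_in);
  padding.top = std::max<int32_t>(0, highest_in - (height_in - 1));
  return padding;
}
```

With height_in = height_out = 40, no subsampling, and height offsets {-1, 0, 1}, the sketch reports one row of padding at the bottom and one at the top; with height_out = 38 and offsets {0, 1, 2} no padding is implied.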
Definition at line 125 of file convolution.h.
ConvolutionModel | ( | ) |
inline |
Definition at line 210 of file convolution.h.
References ConvolutionModel::Check(), ConvolutionModel::Info(), ConvolutionModel::Offset::operator==(), ConvolutionModel::Read(), and ConvolutionModel::Write().
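bool Check | ( | bool | check_heights_used = true, |
bool | allow_height_padding = true | |
) | const |
Definition at line 130 of file convolution.cc.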
References ConvolutionModel::all_time_offsets, ConvolutionModel::ComputeDerived(), ConvolutionModel::height_in, ConvolutionModel::Offset::height_offset, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, kaldi::IsSortedAndUniq(), KALDI_ASSERT, KALDI_WARN, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, ConvolutionModel::Offset::time_offset, and ConvolutionModel::Write().
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), TimeHeightConvolutionComponent::Check(), ConvolutionModel::ConvolutionModel(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionIndexes(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::InitFromConfig(), kaldi::nnet3::time_height_convolution::PadModelHeight(), ConvolutionModel::Read(), and ConvolutionComputation::Read().
void ComputeDerived | ( | ) |
Definition at line 109 of file convolution.cc.
References ConvolutionModel::all_time_offsets, kaldi::Gcd(), ConvolutionModel::offsets, and ConvolutionModel::time_offsets_modulus.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::InitFromConfig(), ConvolutionModel::Read(), and ConvolutionComputation::Read().
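The reference list above shows that ComputeDerived() touches offsets, all_time_offsets, time_offsets_modulus and kaldi::Gcd(). A minimal sketch of what such a derivation could look like follows. It assumes, without confirmation from this page, that all_time_offsets is the set of distinct time offsets and that time_offsets_modulus is the greatest common divisor of the differences between consecutive distinct time offsets. All names below (`TimeHeightOffset`, `DerivedTimeInfo`, `GcdSketch`, `ComputeDerivedSketch`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Stand-in for ConvolutionModel::Offset, for illustration only.
struct TimeHeightOffset {
  int32_t time_offset;
  int32_t height_offset;
};

// Hypothetical container for the two derived members.
struct DerivedTimeInfo {
  std::set<int32_t> all_time_offsets;
  int32_t time_offsets_modulus;
};

// Euclid's algorithm for non-negative inputs; GcdSketch(0, x) == x.
inline int32_t GcdSketch(int32_t a, int32_t b) {
  assert(a >= 0 && b >= 0);
  while (b != 0) {
    int32_t remainder = a % b;
    a = b;
    b = remainder;
  }
  return a;
}

// Assumed behaviour: collect the distinct time offsets, then take the GCD
// of the gaps between neighbouring ones (0 if there is only one offset).
inline DerivedTimeInfo ComputeDerivedSketch(
    const std::vector<TimeHeightOffset> &offsets) {
  assert(!offsets.empty());
  DerivedTimeInfo derived;
  derived.time_offsets_modulus = 0;
  for (const TimeHeightOffset &o : offsets)
    derived.all_time_offsets.insert(o.time_offset);
  std::set<int32_t>::const_iterator iter = derived.all_time_offsets.begin();
  int32_t previous = *iter;
  for (++iter; iter != derived.all_time_offsets.end(); ++iter) {
    derived.time_offsets_modulus =
        GcdSketch(derived.time_offsets_modulus, *iter - previous);
    previous = *iter;
  }
  return derived;
}
```

Under these assumptions, time offsets (-3, 0, 3) give a modulus of 3, which fits the earlier remark that after downsampling the time-offsets may all be multiples of some stride.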
std::string Info | ( | ) | const |
Definition at line 87 of file convolution.cc.
References ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, ConvolutionModel::InputDim(), ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::OutputDim(), and ConvolutionModel::required_time_offsets.
Referenced by ConvolutionModel::ConvolutionModel(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::Info(), and kaldi::nnet3::time_height_convolution::TestRunningComputation().
int32 InputDim | ( | ) | const |
inline |
Definition at line 203 of file convolution.h.
References ConvolutionModel::height_in.
Referenced by ConvolutionModel::Info(), TimeHeightConvolutionComponent::InputDim(), kaldi::nnet3::time_height_convolution::TestDataBackprop(), kaldi::nnet3::time_height_convolution::TestParamsBackprop(), and kaldi::nnet3::time_height_convolution::TestRunningComputation().
bool operator== | ( | const ConvolutionModel & | other | ) | const |
Definition at line 212 of file convolution.cc.
References ConvolutionModel::all_time_offsets, ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, and ConvolutionModel::time_offsets_modulus.
int32 OutputDim | ( | ) | const |
inline |
Definition at line 204 of file convolution.h.
References ConvolutionModel::height_out.
Referenced by ConvolutionModel::Info(), TimeHeightConvolutionComponent::OutputDim(), kaldi::nnet3::time_height_convolution::TestDataBackprop(), kaldi::nnet3::time_height_convolution::TestParamsBackprop(), and kaldi::nnet3::time_height_convolution::TestRunningComputation().
int32 ParamCols | ( | ) | const |
inline |
Definition at line 208 of file convolution.h.
Referenced by TimeHeightConvolutionComponent::Check(), TimeHeightConvolutionComponent::InitFromConfig(), kaldi::nnet3::time_height_convolution::TestDataBackprop(), kaldi::nnet3::time_height_convolution::TestParamsBackprop(), and kaldi::nnet3::time_height_convolution::TestRunningComputation().
int32 ParamRows | ( | ) | const |
inline |
Definition at line 206 of file convolution.h.
References ConvolutionModel::num_filters_out.
Referenced by TimeHeightConvolutionComponent::Check(), TimeHeightConvolutionComponent::InitFromConfig(), kaldi::nnet3::time_height_convolution::TestDataBackprop(), kaldi::nnet3::time_height_convolution::TestParamsBackprop(), and kaldi::nnet3::time_height_convolution::TestRunningComputation().
void Read | ( | std::istream & | is, |
bool | binary | ||
) |
Definition at line 252 of file convolution.cc.
References ConvolutionModel::Check(), ConvolutionModel::ComputeDerived(), kaldi::ExpectOneOrTwoTokens(), kaldi::nnet3::ExpectToken(), ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, KALDI_ASSERT, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, kaldi::ReadBasicType(), kaldi::ReadIntegerPairVector(), kaldi::ReadIntegerVector(), and ConvolutionModel::required_time_offsets.
Referenced by ConvolutionModel::ConvolutionModel(), TimeHeightConvolutionComponent::Read(), and kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolutionIo().
void Write | ( | std::ostream & | os, |
bool | binary | ||
) | const |
Definition at line 225 of file convolution.cc.
References ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, kaldi::WriteBasicType(), kaldi::WriteIntegerPairVector(), kaldi::WriteIntegerVector(), and kaldi::WriteToken().
Referenced by ConvolutionModel::Check(), ConvolutionModel::ConvolutionModel(), kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolutionIo(), and TimeHeightConvolutionComponent::Write().
std::set<int32> all_time_offsets |
Definition at line 173 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::CheckModelAndIo(), ConvolutionModel::ComputeDerived(), TimeHeightConvolutionComponent::ComputeDerived(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionIndexes(), ConvolutionModel::operator==(), kaldi::nnet3::time_height_convolution::PadComputationInputTime(), and kaldi::nnet3::time_height_convolution::ShiftAllTimeOffsets().
int32 height_in |
Definition at line 128 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), ConvolutionComputation::Check(), ConvolutionComputation::ComputeDerived(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), ConvolutionModel::Info(), TimeHeightConvolutionComponent::InitFromConfig(), ConvolutionModel::InputDim(), kaldi::nnet3::time_height_convolution::MakeComputation(), ConvolutionModel::operator==(), kaldi::nnet3::time_height_convolution::PadModelHeight(), ConvolutionModel::Read(), ConvolutionComputation::Read(), kaldi::nnet3::time_height_convolution::UnPadModelHeight(), ConvolutionModel::Write(), and ConvolutionComputation::Write().
int32 height_out |
Definition at line 129 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), ConvolutionComputation::Check(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), ConvolutionModel::Info(), TimeHeightConvolutionComponent::InitFromConfig(), kaldi::nnet3::time_height_convolution::MakeComputation(), ConvolutionModel::operator==(), ConvolutionModel::OutputDim(), kaldi::nnet3::time_height_convolution::PadModelHeight(), TimeHeightConvolutionComponent::Propagate(), ConvolutionModel::Read(), ConvolutionComputation::Read(), kaldi::nnet3::time_height_convolution::UnPadModelHeight(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), TimeHeightConvolutionComponent::UpdateSimple(), ConvolutionModel::Write(), and ConvolutionComputation::Write().
int32 height_subsample_out |
Definition at line 132 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), ConvolutionModel::Info(), TimeHeightConvolutionComponent::InitFromConfig(), kaldi::nnet3::time_height_convolution::MakeComputation(), ConvolutionModel::operator==(), kaldi::nnet3::time_height_convolution::PadModelHeight(), ConvolutionModel::Read(), and ConvolutionModel::Write().
int32 num_filters_in |
Definition at line 126 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), ConvolutionComputation::Check(), ConvolutionComputation::ComputeDerived(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), ConvolutionModel::Info(), TimeHeightConvolutionComponent::InitFromConfig(), TimeHeightConvolutionComponent::InitUnit(), kaldi::nnet3::time_height_convolution::MakeComputation(), ConvolutionModel::operator==(), ConvolutionModel::Read(), ConvolutionComputation::Read(), ConvolutionModel::Write(), and ConvolutionComputation::Write().
int32 num_filters_out |
Definition at line 127 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), TimeHeightConvolutionComponent::Check(), ConvolutionComputation::Check(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), ConvolutionModel::Info(), TimeHeightConvolutionComponent::InitFromConfig(), TimeHeightConvolutionComponent::InitUnit(), kaldi::nnet3::time_height_convolution::MakeComputation(), ConvolutionModel::operator==(), ConvolutionModel::ParamRows(), TimeHeightConvolutionComponent::Propagate(), ConvolutionModel::Read(), ConvolutionComputation::Read(), TimeHeightConvolutionComponent::UpdateNaturalGradient(), TimeHeightConvolutionComponent::UpdateSimple(), ConvolutionModel::Write(), and ConvolutionComputation::Write().
std::vector<Offset> offsets |
Definition at line 157 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), ConvolutionModel::ComputeDerived(), kaldi::nnet3::time_height_convolution::ConvolveForwardSimple(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), ConvolutionModel::Info(), TimeHeightConvolutionComponent::InitFromConfig(), TimeHeightConvolutionComponent::InitUnit(), kaldi::nnet3::time_height_convolution::MakeComputation(), ConvolutionModel::operator==(), kaldi::nnet3::time_height_convolution::PadModelHeight(), ConvolutionModel::Read(), kaldi::nnet3::time_height_convolution::ShiftAllTimeOffsets(), kaldi::nnet3::time_height_convolution::UnPadModelHeight(), and ConvolutionModel::Write().
std::set<int32> required_time_offsets |
Definition at line 169 of file convolution.h.
Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::CheckModelAndIo(), TimeHeightConvolutionComponent::ComputeDerived(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionIndexes(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), ConvolutionModel::Info(), TimeHeightConvolutionComponent::InitFromConfig(), ConvolutionModel::operator==(), ConvolutionModel::Read(), kaldi::nnet3::time_height_convolution::ShiftAllTimeOffsets(), and ConvolutionModel::Write().
int32 time_offsets_modulus |
Definition at line 180 of file convolution.h.
Referenced by ConvolutionModel::ComputeDerived(), ConvolutionModel::operator==(), and kaldi::nnet3::time_height_convolution::PadComputationInputTime().