ConvolutionModel Struct Reference

This comment explains the basic framework used for everything related to time-height convolution.

#include <convolution.h>


Classes

struct  Offset
 

Public Member Functions

void ComputeDerived ()
 
int32 InputDim () const
 
int32 OutputDim () const
 
int32 ParamRows () const
 
int32 ParamCols () const
 
 ConvolutionModel ()
 
bool operator== (const ConvolutionModel &other) const
 
bool Check (bool check_heights_used=true, bool allow_height_padding=true) const
 
std::string Info () const
 
void Write (std::ostream &os, bool binary) const
 
void Read (std::istream &is, bool binary)
 

Public Attributes

int32 num_filters_in
 
int32 num_filters_out
 
int32 height_in
 
int32 height_out
 
int32 height_subsample_out
 
std::vector< Offset > offsets
 
std::set< int32 > required_time_offsets
 
std::set< int32 > all_time_offsets
 
int32 time_offsets_modulus
 

Detailed Description

This comment explains the basic framework used for everything related to time-height convolution.

We are doing convolution in 2 dimensions; these would normally be width and height, but in the nnet3 framework we identify the width with the 'time' dimension (the 't' element of an Index). This enables us to use this framework in the normal way for speech tasks, and it turns out to have other advantages too, giving us a very efficient and easy implementation of CNNs (basically, the nnet3 framework takes care of certain reorderings for us). As mentioned, the 't' index will correspond to the width, and the vectors we operate on will be of dimension height * num-filters, where the filter-index has a stride of 1.
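To make this layout concrete, here is a minimal sketch of how a (height, filter) pair maps to a position within one such vector, assuming only what is stated above (the filter-index has stride 1, so the height-index has stride num-filters). The helper name `FeatureIndex` is hypothetical and is not part of Kaldi.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Kaldi): position of (height, filter)
// within one vector of dimension height * num_filters. The filter index
// has stride 1, so the height index has stride num_filters.
inline int32_t FeatureIndex(int32_t height_index, int32_t filter_index,
                            int32_t num_filters) {
  assert(num_filters > 0);
  assert(height_index >= 0);
  assert(filter_index >= 0 && filter_index < num_filters);
  return height_index * num_filters + filter_index;
}
```

For example, with 4 filters, the element for height 2 and filter 3 sits at position 2 * 4 + 3 = 11.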

We will use the GeneralComponent interface, and its function ReorderIndexes(), to ensure that the input and output Indexes of the component have a specified regular structure; we'll pad with 'blank' Indexes (t=kNoTime) on the input and output of the component, as needed to ensure that it's an evenly spaced grid over n and t, with x always zero and the t values evenly spaced. (However, a note on even spacing: for computations with downsampling, the ordering of the 't' values is a bit more complicated; search for 'blocks' in the rest of this header for more information.)

First consider the simplest case, call it "same-t-stride" (where there is no downsampling on the time index, i.e. the input and output 't' values have the same stride, like 1, 2 or 4). The input and output matrices have dimension num-t-values * num-images, with the num-t-values having the higher stride. The computation involves copying a row-range of the input matrix to a temporary matrix with a column mapping (the temporary matrix will typically have more columns than the input matrix), and then doing a matrix-multiply between the reshaped temporary matrix and a block of the parameters; the block corresponds to a particular time-offset. Then we may need to repeat the whole process with a different, shifted row-range of the input matrix and a different column map. You may have to read the rest of this header to understand this in more detail.

This struct represents a convolutional model from a structural point of view (it doesn't contain the actual parameters). Note: the parameters are to be stored in a matrix of dimension (num_filters_out) by (offsets.size() * num_filters_in) [where the offset-index has a larger stride than the filter-index].
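The dimensions described so far can be summarized in a short sketch. It assumes only what the text states: vectors of dimension height * num-filters, and a parameter matrix of num_filters_out rows by offsets.size() * num_filters_in columns, with the offset-index having the larger stride. The struct `ModelDims` and its `ParamColumn` function are hypothetical illustrations and are not the actual Kaldi members.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the structural fields of a convolution model
// (illustration only; not the Kaldi struct).
struct ModelDims {
  int32_t num_filters_in;
  int32_t num_filters_out;
  int32_t height_in;
  int32_t height_out;
  int32_t num_offsets;  // plays the role of offsets.size()

  // Dimension of one input / output vector: height * num-filters.
  int32_t InputDim() const { return num_filters_in * height_in; }
  int32_t OutputDim() const { return num_filters_out * height_out; }

  // Parameter matrix: num_filters_out rows by
  // (num_offsets * num_filters_in) columns.
  int32_t ParamRows() const { return num_filters_out; }
  int32_t ParamCols() const { return num_offsets * num_filters_in; }

  // Column holding the weight for a given (offset, input filter) pair;
  // the offset index has the larger stride.
  int32_t ParamColumn(int32_t offset_index, int32_t filter_in) const {
    assert(offset_index >= 0 && offset_index < num_offsets);
    assert(filter_in >= 0 && filter_in < num_filters_in);
    return offset_index * num_filters_in + filter_in;
  }
};
```

With 3 input filters, 8 output filters, 40 heights in and out, and a 3x3 patch (9 offsets), the input dimension is 120, the output dimension is 320, and the parameter matrix is 8 by 27.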

Partly out of a desire for generality, but also for convenience in implementation and integration with nnet3, at this level we don't represent the patch size in the normal way like '1x1' or '3x3', but as a list of pairs (time-offset, height-offset). E.g. a 1x1 patch would normally be the single pair (0,0), and a 3x3 patch might be represented as

offsets={ (0,0),(0,1),(0,2), (1,0),(1,1),(1,2), (2,0),(2,1),(2,2) }

However, and you have to be a bit careful here, the time indexes are on an *absolute* numbering scheme, so that if you had downsampled the time axis on a previous layer, the time-offsets might all be multiples of (e.g.) 2 or 4, but the height-offsets would normally always be separated by 1. [Note: we always normalize the list of (time-offset, height-offset) pairs with the lexicographical ordering that you see above.] This asymmetry between time and height may not be very aesthetic, but the absolute numbering of time is at the core of how the framework works. Note: the offsets don't have to start from zero; they can be less than zero, just like the offsets in TDNNs, which are often lists like (-3,0,3). Don't be surprised to see things like:

offsets={ (-3,-1),(-3,0),(-3,1), (0,-1),(0,0),(0,1), (3,-1),(3,0),(3,1) }
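Offset lists of this form can be generated mechanically. The following is a minimal sketch, assuming a rectangular patch whose time-offsets are spaced by a fixed stride and whose height-offsets are consecutive; the function `MakePatchOffsets` and its parameters are hypothetical and not part of Kaldi. Iterating over time in the outer loop yields the lexicographic (time, then height) ordering described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Kaldi): builds the list of
// (time-offset, height-offset) pairs for a rectangular patch,
// in lexicographic order.
std::vector<std::pair<int32_t, int32_t> > MakePatchOffsets(
    int32_t first_time, int32_t time_stride, int32_t num_times,
    int32_t first_height, int32_t num_heights) {
  assert(time_stride > 0 && num_times > 0 && num_heights > 0);
  std::vector<std::pair<int32_t, int32_t> > offsets;
  offsets.reserve(static_cast<std::size_t>(num_times) * num_heights);
  for (int32_t i = 0; i < num_times; i++) {      // time: larger stride
    for (int32_t j = 0; j < num_heights; j++) {  // height: consecutive
      offsets.push_back(std::make_pair(first_time + i * time_stride,
                                       first_height + j));
    }
  }
  return offsets;
}
```

Calling `MakePatchOffsets(0, 1, 3, 0, 3)` gives the plain 3x3 patch starting at (0,0), while `MakePatchOffsets(-3, 3, 3, -1, 3)` gives a centered patch on a time axis with stride 3.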

If there are negative offsets in the height dimension (as above), it means that there is zero-padding in the height dimension: the first height-index at both the input and the output is 0, so having a height-offset of -1 means that to compute the output at height-index 0 we need the input at height-index -1, which doesn't exist. This implies zero padding on the bottom of the image.
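As a rough illustration of this reasoning, the amount of implied zero-padding can be computed from the smallest and largest height-offsets. The sketch below assumes that the output heights correspond to input heights 0, s, 2s, ... where s is height_subsample_out, and that the input height needed is the output position plus the height-offset. The struct `HeightPadding` and the function `RequiredHeightPadding` are hypothetical and not part of Kaldi.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: number of zero rows implied below height 0
// ("bottom") and above height height_in - 1 ("top").
struct HeightPadding {
  int32_t bottom;
  int32_t top;
};

// Hypothetical helper (not part of Kaldi). For output index k, the input
// height needed is k * height_subsample_out + height_offset.
HeightPadding RequiredHeightPadding(int32_t height_in, int32_t height_out,
                                    int32_t height_subsample_out,
                                    int32_t min_height_offset,
                                    int32_t max_height_offset) {
  assert(height_in > 0 && height_out > 0 && height_subsample_out > 0);
  assert(min_height_offset <= max_height_offset);
  // Lowest input height ever requested (for output index 0).
  int32_t lowest = min_height_offset;
  // Highest input height ever requested (for the last output index).
  int32_t highest =
      (height_out - 1) * height_subsample_out + max_height_offset;
  HeightPadding pad;
  pad.bottom = std::max<int32_t>(0, -lowest);
  pad.top = std::max<int32_t>(0, highest - (height_in - 1));
  return pad;
}
```

For a centered patch with height-offsets -1..1 and height_in = height_out = 40, this gives one row of padding at the bottom and one at the top; with height-offsets 0..2 and height_out = 38, no padding is implied.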

Definition at line 125 of file convolution.h.

Constructor & Destructor Documentation

ConvolutionModel ( )
inline

Definition at line 210 of file convolution.h.

{ }

Member Function Documentation

bool Check ( bool  check_heights_used = true,
bool  allow_height_padding = true 
) const

Definition at line 130 of file convolution.cc.

References ConvolutionModel::all_time_offsets, ConvolutionModel::ComputeDerived(), ConvolutionModel::height_in, ConvolutionModel::Offset::height_offset, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, kaldi::IsSortedAndUniq(), KALDI_ASSERT, KALDI_WARN, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, ConvolutionModel::Offset::time_offset, and ConvolutionModel::Write().

Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), TimeHeightConvolutionComponent::Check(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionIndexes(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::InitFromConfig(), kaldi::nnet3::time_height_convolution::PadModelHeight(), and ConvolutionModel::Read().

bool ConvolutionModel::Check(bool check_heights_used,
                             bool allow_height_padding) const {
  if (num_filters_in <= 0 || num_filters_out <= 0 ||
      height_in <= 0 || height_out <= 0 ||
      height_subsample_out <= 0 || offsets.empty() ||
      required_time_offsets.empty()) {
    KALDI_WARN << "Convolution model fails basic check.";
    return false;
  }
  ConvolutionModel temp(*this);
  temp.ComputeDerived();
  if (!(temp == *this)) {
    KALDI_WARN << "Derived variables are incorrect.";
    return false;
  }
  // check that required_time_offsets is included in all_time_offsets.
  for (std::set<int32>::iterator iter = required_time_offsets.begin();
       iter != required_time_offsets.end(); ++iter) {
    if (all_time_offsets.count(*iter) == 0) {
      KALDI_WARN << "Required time offsets not a subset of all_time_offsets.";
      return false;
    }
  }
  KALDI_ASSERT(IsSortedAndUniq(offsets));
  std::vector<bool> h_in_used(height_in, false);
  std::vector<bool> offsets_used(offsets.size(), false);

  // check that in cases where we only have the minimum
  // required input (from required_time_offsets), each
  // height in the output is potentially nonzero.
  for (int32 h_out = 0; h_out < height_out * height_subsample_out;
       h_out += height_subsample_out) {
    bool some_input_available = false;
    for (size_t i = 0; i < offsets.size(); i++) {
      const Offset &offset = offsets[i];
      int32 h_in = h_out + offset.height_offset;
      if (h_in >= 0 && h_in < height_in) {
        offsets_used[i] = true;
        h_in_used[h_in] = true;
        if (required_time_offsets.count(offset.time_offset) != 0)
          some_input_available = true;
      } else {
        if (!allow_height_padding) {
          KALDI_WARN << "height padding not allowed but is required.";
          return false;
        }
      }
    }
    if (!some_input_available) {
      // none of the
      // input pixels for this output pixel were available (at least in the case
      // where we only have the 'required' inputs on the time dimension).
      std::ostringstream os;
      Write(os, false);
      KALDI_WARN << "for the " << (h_out / height_out) << "'th output height, "
          "no input is available, if only required time-indexes "
          "are available.";
      // We could later change this part of the validation code to accept
      // such models, if there is a legitimate use-case.
      return false;
    }
  }
  if (check_heights_used) {
    for (int32 h = 0; h < height_in; h++) {
      if (!h_in_used[h]) {
        KALDI_WARN << "The input at the " << h << "'th height is never used.";
        return false;
      }
    }
  }
  for (size_t i = 0; i < offsets_used.size(); i++) {
    if (!offsets_used[i]) {
      KALDI_WARN << "(time,height) offset (" << offsets[i].time_offset
                 << "," << offsets[i].height_offset
                 << ") of this computation is never used.";
      return false;
    }
  }
  return true;
}
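To illustrate the height-coverage part of this check in isolation, here is a simplified stand-alone sketch. It ignores time-offsets and only reports which input heights are never read by any (output height, height-offset) combination, which is the condition tested when check_heights_used is true. The function `UnusedInputHeights` is hypothetical and not part of Kaldi.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Kaldi): returns the input heights that
// no output height ever reads, given the list of height-offsets in use.
std::vector<int32_t> UnusedInputHeights(
    int32_t height_in, int32_t height_out, int32_t height_subsample_out,
    const std::vector<int32_t> &height_offsets) {
  assert(height_in > 0 && height_out > 0 && height_subsample_out > 0);
  std::vector<bool> h_in_used(height_in, false);
  // Output heights correspond to input positions 0, s, 2s, ...
  for (int32_t h_out = 0; h_out < height_out * height_subsample_out;
       h_out += height_subsample_out) {
    for (std::size_t i = 0; i < height_offsets.size(); i++) {
      int32_t h_in = h_out + height_offsets[i];
      // Out-of-range heights correspond to zero-padding; skip them.
      if (h_in >= 0 && h_in < height_in) h_in_used[h_in] = true;
    }
  }
  std::vector<int32_t> unused;
  for (int32_t h = 0; h < height_in; h++)
    if (!h_in_used[h]) unused.push_back(h);
  return unused;
}
```

For instance, with height_in = 5, height_out = 2, height_subsample_out = 2 and height-offsets {0, 1}, the outputs read input heights 0 to 3, so height 4 is never used; a model with that structure would be expected to fail the check when check_heights_used is true.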
void ComputeDerived ( )

Definition at line 109 of file convolution.cc.

References ConvolutionModel::all_time_offsets, kaldi::Gcd(), ConvolutionModel::offsets, and ConvolutionModel::time_offsets_modulus.

Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::InitFromConfig(), and ConvolutionModel::Read().

void ConvolutionModel::ComputeDerived() {
  { // compute all_time_offsets
    all_time_offsets.clear();
    for (std::vector<Offset>::const_iterator iter = offsets.begin();
         iter != offsets.end(); ++iter)
      all_time_offsets.insert(iter->time_offset);
  }
  { // compute time_offsets_modulus
    time_offsets_modulus = 0;
    std::set<int32>::iterator iter = all_time_offsets.begin();
    int32 cur_offset = *iter;
    for (++iter; iter != all_time_offsets.end(); ++iter) {
      int32 this_offset = *iter;
      time_offsets_modulus = Gcd(time_offsets_modulus,
                                 this_offset - cur_offset);
      cur_offset = this_offset;
    }
  }
}
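The modulus computation amounts to taking the greatest common divisor of the gaps between successive distinct time-offsets. The stand-alone sketch below mirrors that idea under stated assumptions: it uses a small local Euclidean helper in place of kaldi::Gcd, starts from a modulus of 0, and returns 0 when there are fewer than two distinct offsets. The names `GcdSketch` and `TimeOffsetsModulus` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Local Euclidean GCD used as a stand-in for kaldi::Gcd (assumption:
// inputs are non-negative, and the GCD of 0 and n is taken to be n).
inline int32_t GcdSketch(int32_t m, int32_t n) {
  assert(m >= 0 && n >= 0);
  while (n != 0) {
    int32_t r = m % n;
    m = n;
    n = r;
  }
  return m;
}

// Hypothetical helper (not part of Kaldi): GCD of the differences between
// successive elements of a sorted set of time-offsets.
int32_t TimeOffsetsModulus(const std::set<int32_t> &all_time_offsets) {
  int32_t modulus = 0;
  if (all_time_offsets.empty()) return modulus;
  std::set<int32_t>::const_iterator iter = all_time_offsets.begin();
  int32_t cur_offset = *iter;
  for (++iter; iter != all_time_offsets.end(); ++iter) {
    // std::set iterates in increasing order, so each gap is positive.
    modulus = GcdSketch(modulus, *iter - cur_offset);
    cur_offset = *iter;
  }
  return modulus;
}
```

Under these assumptions, time-offsets {-3, 0, 3} give a modulus of 3, {0, 1, 2} give 1, and {0, 2, 6} give 2 (the GCD of the gaps 2 and 4).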
std::string Info ( ) const

Definition at line 87 of file convolution.cc.

References ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, ConvolutionModel::InputDim(), ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::OutputDim(), and ConvolutionModel::required_time_offsets.

Referenced by kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::Info(), and kaldi::nnet3::time_height_convolution::TestRunningComputation().

std::string ConvolutionModel::Info() const {
  std::ostringstream os;
  os << "num-filters-in=" << num_filters_in
     << ", num-filters-out=" << num_filters_out
     << ", height-in=" << height_in
     << ", height-out=" << height_out
     << ", height-subsample-out=" << height_subsample_out
     << ", {time,height}-offsets=[";
  for (size_t i = 0; i < offsets.size(); i++) {
    if (i > 0) os << ' ';
    os << offsets[i].time_offset << ',' << offsets[i].height_offset;
  }
  os << "], required-time-offsets=[";
  for (std::set<int32>::const_iterator iter = required_time_offsets.begin();
       iter != required_time_offsets.end(); ++iter) {
    if (iter != required_time_offsets.begin()) os << ',';
    os << *iter;
  }
  os << "], input-dim=" << InputDim() << ", output-dim=" << OutputDim();
  return os.str();
}
bool operator== ( const ConvolutionModel &  other) const

Definition at line 212 of file convolution.cc.

References ConvolutionModel::all_time_offsets, ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, and ConvolutionModel::time_offsets_modulus.

bool ConvolutionModel::operator == (const ConvolutionModel &other) const {
  return num_filters_in == other.num_filters_in &&
      num_filters_out == other.num_filters_out &&
      height_in == other.height_in &&
      height_out == other.height_out &&
      height_subsample_out == other.height_subsample_out &&
      offsets == other.offsets &&
      required_time_offsets == other.required_time_offsets &&
      all_time_offsets == other.all_time_offsets &&
      time_offsets_modulus == other.time_offsets_modulus;
}
void Read ( std::istream &  is,
bool  binary 
)

Definition at line 252 of file convolution.cc.

References ConvolutionModel::Check(), ConvolutionModel::ComputeDerived(), kaldi::nnet3::ExpectOneOrTwoTokens(), kaldi::nnet3::ExpectToken(), ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, KALDI_ASSERT, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, kaldi::ReadBasicType(), kaldi::ReadIntegerPairVector(), kaldi::ReadIntegerVector(), and ConvolutionModel::required_time_offsets.

Referenced by TimeHeightConvolutionComponent::Read(), and kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolutionIo().

void ConvolutionModel::Read(std::istream &is, bool binary) {
  ExpectOneOrTwoTokens(is, binary, "<ConvolutionModel>", "<NumFiltersIn>");
  ReadBasicType(is, binary, &num_filters_in);
  ExpectToken(is, binary, "<NumFiltersOut>");
  ReadBasicType(is, binary, &num_filters_out);
  ExpectToken(is, binary, "<HeightIn>");
  ReadBasicType(is, binary, &height_in);
  ExpectToken(is, binary, "<HeightOut>");
  ReadBasicType(is, binary, &height_out);
  ExpectToken(is, binary, "<HeightSubsampleOut>");
  ReadBasicType(is, binary, &height_subsample_out);
  ExpectToken(is, binary, "<Offsets>");
  std::vector<std::pair<int32, int32> > pairs;
  ReadIntegerPairVector(is, binary, &pairs);
  offsets.resize(pairs.size());
  for (size_t i = 0; i < offsets.size(); i++) {
    offsets[i].time_offset = pairs[i].first;
    offsets[i].height_offset = pairs[i].second;
  }
  std::vector<int32> required_time_offsets_list;
  ExpectToken(is, binary, "<RequiredTimeOffsets>");
  ReadIntegerVector(is, binary, &required_time_offsets_list);
  required_time_offsets.clear();
  required_time_offsets.insert(required_time_offsets_list.begin(),
                               required_time_offsets_list.end());
  ExpectToken(is, binary, "</ConvolutionModel>");
  ComputeDerived();
  KALDI_ASSERT(Check(false, true));
}
void Write ( std::ostream &  os,
bool  binary 
) const

Definition at line 225 of file convolution.cc.

References ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, kaldi::WriteBasicType(), kaldi::WriteIntegerPairVector(), kaldi::WriteIntegerVector(), and kaldi::WriteToken().

Referenced by ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolutionIo(), and TimeHeightConvolutionComponent::Write().

void ConvolutionModel::Write(std::ostream &os, bool binary) const {
  WriteToken(os, binary, "<ConvolutionModel>");
  WriteToken(os, binary, "<NumFiltersIn>");
  WriteBasicType(os, binary, num_filters_in);
  WriteToken(os, binary, "<NumFiltersOut>");
  WriteBasicType(os, binary, num_filters_out);
  WriteToken(os, binary, "<HeightIn>");
  WriteBasicType(os, binary, height_in);
  WriteToken(os, binary, "<HeightOut>");
  WriteBasicType(os, binary, height_out);
  WriteToken(os, binary, "<HeightSubsampleOut>");
  WriteBasicType(os, binary, height_subsample_out);
  WriteToken(os, binary, "<Offsets>");
  std::vector<std::pair<int32, int32> > pairs(offsets.size());
  for (size_t i = 0; i < offsets.size(); i++) {
    pairs[i].first = offsets[i].time_offset;
    pairs[i].second = offsets[i].height_offset;
  }
  WriteIntegerPairVector(os, binary, pairs);
  std::vector<int32> required_time_offsets_list(required_time_offsets.begin(),
                                                required_time_offsets.end());
  WriteToken(os, binary, "<RequiredTimeOffsets>");
  WriteIntegerVector(os, binary, required_time_offsets_list);
  WriteToken(os, binary, "</ConvolutionModel>");
}

Member Data Documentation


The documentation for this struct was generated from the following files: