ConvolutionModel Struct Reference

This comment explains the basic framework used for everything related to time-height convolution.

#include <convolution.h>

Classes

struct  Offset
 

Public Member Functions

void ComputeDerived ()
 
int32 InputDim () const
 
int32 OutputDim () const
 
int32 ParamRows () const
 
int32 ParamCols () const
 
 ConvolutionModel ()
 
bool operator== (const ConvolutionModel &other) const
 
bool Check (bool check_heights_used=true, bool allow_height_padding=true) const
 
std::string Info () const
 
void Write (std::ostream &os, bool binary) const
 
void Read (std::istream &is, bool binary)
 

Public Attributes

int32 num_filters_in
 
int32 num_filters_out
 
int32 height_in
 
int32 height_out
 
int32 height_subsample_out
 
std::vector< Offset > offsets
 
std::set< int32 > required_time_offsets
 
std::set< int32 > all_time_offsets
 
int32 time_offsets_modulus
 

Detailed Description

This comment explains the basic framework used for everything related to time-height convolution.

We are doing convolution in 2 dimensions; these would normally be width and height, but in the nnet3 framework we identify the width with the 'time' dimension (the 't' element of an Index). This enables us to use this framework in the normal way for speech tasks, and it turns out to have other advantages too, giving us a very efficient and easy implementation of CNNs (basically, the nnet3 framework takes care of certain reorderings for us). As mentioned, the 't' index will correspond to the width, and the vectors we operate on will be of dimension height * num-filters, where the filter-index has a stride of 1.
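As a minimal sketch of the layout just described (height * num-filters, with the filter index varying fastest), the element for a given height and filter can be located as shown below. The helper names `VectorDim` and `ColumnIndex` are hypothetical and are not part of Kaldi.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Kaldi function): dimension of the vector that
// holds one 't' value, i.e. height * num-filters.
inline int32_t VectorDim(int32_t height, int32_t num_filters) {
  assert(height > 0 && num_filters > 0);
  return height * num_filters;
}

// Hypothetical helper (not a Kaldi function): position of the element for
// height-index h and filter-index f. The filter index has stride 1, so the
// height index has stride num_filters.
inline int32_t ColumnIndex(int32_t h, int32_t f, int32_t num_filters) {
  assert(h >= 0 && f >= 0 && f < num_filters);
  return h * num_filters + f;
}
```

For example, with 4 filters the element at height 2, filter 1 sits at position 2 * 4 + 1 = 9.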

We will use the GeneralComponent interface, and its function ReorderIndexes(), to ensure that the input and output Indexes of the component have a specified regular structure; we'll pad with 'blank' Indexes (t=kNoTime) on the input and output of the component, as needed to ensure that it's an evenly spaced grid over n and t, with x always zero and the t values evenly spaced. (However, a note on even spacing: for computations with downsampling this ordering of the 't' values is a bit more complicated; search for 'blocks' in the rest of this header for more information.)

First consider the simplest case, call it "same-t-stride" (where there is no downsampling on the time index, i.e. the input and output 't' values have the same stride, like 1, 2 or 4). The input and output matrices have dimension num-t-values * num-images, with the num-t-values having the higher stride. The computation involves copying a row-range of the input matrix to a temporary matrix with a column mapping (the temporary matrix will typically have more columns than the input matrix), and then doing a matrix-multiply between the reshaped temporary matrix and a block of the parameters; the block corresponds to a particular time-offset. Then we may need to repeat the whole process with a different, shifted row-range of the input matrix and a different column map. You may have to read the rest of this header to understand this in more detail.

This struct represents a convolutional model from a structural point of view (it doesn't contain the actual parameters). Note: the parameters are to be stored in a matrix of dimension (num_filters_out) by (offsets.size() * num_filters_in) [where the offset-index has a larger stride than the filter-index].
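The parameter layout stated in the note can be sketched as follows. The struct `ParamShape` and its members are hypothetical names used only for illustration; the only thing taken from the text is the shape (num_filters_out) by (offsets.size() * num_filters_in), with the offset-index having the larger stride.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration (not a Kaldi type) of the parameter-matrix shape.
struct ParamShape {
  int32_t num_filters_in;
  int32_t num_filters_out;
  int32_t num_offsets;  // stands in for offsets.size()

  int32_t Rows() const { return num_filters_out; }
  int32_t Cols() const { return num_offsets * num_filters_in; }

  // Column that holds the weight for a given offset and input filter.
  // The offset-index has the larger stride (num_filters_in); the
  // filter-index has stride 1.
  int32_t Col(int32_t offset_index, int32_t filter_in) const {
    assert(offset_index >= 0 && offset_index < num_offsets);
    assert(filter_in >= 0 && filter_in < num_filters_in);
    return offset_index * num_filters_in + filter_in;
  }
};
```

With 3 input filters, 8 output filters and a 9-element offsets list, this gives an 8 by 27 matrix, and the weights for offset 1 start at column 3.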

Partly out of a desire for generality, but also for convenience in implementation and integration with nnet3, at this level we don't represent the patch size in the normal way like '1x1' or '3x3', but as a list of pairs (time-offset, height-offset). E.g. a 1x1 patch would normally be the single pair (0,0), and a 3x3 patch might be represented as

offsets={ (0,0),(0,1),(0,2), (1,0),(1,1),(1,2), (2,0),(2,1),(2,2) }
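A list like this can be generated mechanically. The sketch below is an illustration only: `MakePatchOffsets` and `OffsetPair` are hypothetical names, not Kaldi functions or types. The start and stride parameters are included because, as this description goes on to explain, offsets need not start at zero and time-offsets need not be spaced by 1.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// (time-offset, height-offset); std::pair compares lexicographically,
// which matches the ordering of the list above.
using OffsetPair = std::pair<int32_t, int32_t>;

// Hypothetical helper (not a Kaldi function): builds the offsets for a patch
// of num_t time-offsets by num_h height-offsets.
std::vector<OffsetPair> MakePatchOffsets(int32_t num_t, int32_t num_h,
                                         int32_t t_start = 0,
                                         int32_t h_start = 0,
                                         int32_t t_stride = 1) {
  std::vector<OffsetPair> offsets;
  for (int32_t i = 0; i < num_t; ++i)
    for (int32_t j = 0; j < num_h; ++j)
      offsets.push_back(OffsetPair(t_start + i * t_stride, h_start + j));
  // Already sorted for positive strides; sorting keeps the result in
  // lexicographical order regardless of how it was generated.
  std::sort(offsets.begin(), offsets.end());
  return offsets;
}
```

Calling `MakePatchOffsets(3, 3)` yields the nine pairs from (0,0) to (2,2) in the order shown above.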

However, and you have to be a bit careful here, the time indexes are on an *absolute* numbering scheme, so that if you had downsampled the time axis on a previous layer, the time-offsets might all be multiples of (e.g.) 2 or 4, but the height-offsets would normally always be separated by 1. [note: we always normalize the list of (time-offset, height-offset) pairs with the lexicographical ordering that you see above.] This asymmetry between time and height may not be very aesthetic, but the absolute numbering of time is at the core of how the framework works. Note: the offsets don't have to start from zero; they can be less than zero, just like the offsets in TDNNs, which are often lists like (-3,0,3). Don't be surprised to see things like:

offsets={ (-3,-1),(-3,0),(-3,1), (0,-1),(0,0),(0,1), (3,-1),(3,0),(3,1) }

If there are negative offsets in the height dimension (as above), it means that there is zero-padding in the height dimension: because the first height-index at both the input and the output is 0, having a height-offset of -1 means that to compute the output at height-index 0 we need the input at height-index -1, which doesn't exist; this implies zero padding on the bottom of the image.
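To make the padding argument concrete, the sketch below computes how many rows of zero padding a set of height-offsets implies. It uses the same relation that appears in the Check() listing on this page, h_in = h_out + height_offset, with h_out stepping by height_subsample_out. The names `HeightPadding` and `ComputeHeightPadding` are hypothetical; this is not how Kaldi itself computes or applies padding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: rows of zero padding implied below height-index 0
// ("bottom") and above height-index height_in - 1 ("top").
struct HeightPadding {
  int32_t bottom;
  int32_t top;
};

// Hypothetical helper (not a Kaldi function).
HeightPadding ComputeHeightPadding(const std::vector<int32_t> &height_offsets,
                                   int32_t height_in, int32_t height_out,
                                   int32_t height_subsample_out) {
  assert(height_in > 0 && height_out > 0 && height_subsample_out > 0);
  HeightPadding pad = {0, 0};
  for (int32_t i = 0; i < height_out; ++i) {
    int32_t h_out = i * height_subsample_out;
    for (int32_t offset : height_offsets) {
      int32_t h_in = h_out + offset;  // input height needed for this output
      if (h_in < 0)
        pad.bottom = std::max(pad.bottom, -h_in);
      if (h_in >= height_in)
        pad.top = std::max(pad.top, h_in - (height_in - 1));
    }
  }
  return pad;
}
```

With height-offsets (-1,0,1), height_in = height_out = 40 and no subsampling, this reports one row of padding at the bottom and one at the top.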

Definition at line 125 of file convolution.h.

Constructor & Destructor Documentation

ConvolutionModel ( )
inline

Definition at line 210 of file convolution.h.

ConvolutionModel() { }

Member Function Documentation

bool Check ( bool  check_heights_used = true,
bool  allow_height_padding = true 
) const

Definition at line 129 of file convolution.cc.

References ConvolutionModel::all_time_offsets, ConvolutionModel::ComputeDerived(), ConvolutionModel::height_in, ConvolutionModel::Offset::height_offset, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, kaldi::IsSortedAndUniq(), KALDI_ASSERT, KALDI_WARN, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, ConvolutionModel::Offset::time_offset, and ConvolutionModel::Write().

Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), TimeHeightConvolutionComponent::Check(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionIndexes(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::InitFromConfig(), kaldi::nnet3::time_height_convolution::PadModelHeight(), and ConvolutionModel::Read().

bool ConvolutionModel::Check(bool check_heights_used,
                             bool allow_height_padding) const {
  if (num_filters_in <= 0 || num_filters_out <= 0 ||
      height_in <= 0 || height_out <= 0 ||
      height_subsample_out <= 0 || offsets.empty() ||
      required_time_offsets.empty()) {
    KALDI_WARN << "Convolution model fails basic check.";
    return false;
  }
  ConvolutionModel temp(*this);
  temp.ComputeDerived();
  if (!(temp == *this)) {
    KALDI_WARN << "Derived variables are incorrect.";
    return false;
  }
  // check that required_time_offsets is included in all_time_offsets.
  for (std::set<int32>::iterator iter = required_time_offsets.begin();
       iter != required_time_offsets.end(); ++iter) {
    if (all_time_offsets.count(*iter) == 0) {
      KALDI_WARN << "Required time offsets not a subset of all_time_offsets.";
      return false;
    }
  }
  KALDI_ASSERT(IsSortedAndUniq(offsets));
  std::vector<bool> h_in_used(height_in, false);
  std::vector<bool> offsets_used(offsets.size(), false);

  // check that in cases where we only have the minimum
  // required input (from required_time_offsets), each
  // height in the output is potentially nonzero.
  for (int32 h_out = 0; h_out < height_out * height_subsample_out;
       h_out += height_subsample_out) {
    bool some_input_available = false;
    for (size_t i = 0; i < offsets.size(); i++) {
      const Offset &offset = offsets[i];
      int32 h_in = h_out + offset.height_offset;
      if (h_in >= 0 && h_in < height_in) {
        offsets_used[i] = true;
        h_in_used[h_in] = true;
        if (required_time_offsets.count(offset.time_offset) != 0)
          some_input_available = true;
      } else {
        if (!allow_height_padding) {
          KALDI_WARN << "height padding not allowed but is required.";
          return false;
        }
      }
    }
    if (!some_input_available) {
      // none of the
      // input pixels for this output pixel were available (at least in the case
      // where we only have the 'required' inputs on the time dimension).
      std::ostringstream os;
      Write(os, false);
      KALDI_WARN << "for the " << (h_out / height_out) << "'th output height, "
          "no input is available, if only required time-indexes "
          "are available.";
      // We could later change this part of the validation code to accept
      // such models, if there is a legitimate use-case.
      return false;
    }
  }
  if (check_heights_used) {
    for (int32 h = 0; h < height_in; h++) {
      if (!h_in_used[h]) {
        KALDI_WARN << "The input at the " << h << "'th height is never used.";
        return false;
      }
    }
  }
  for (size_t i = 0; i < offsets_used.size(); i++) {
    if (!offsets_used[i]) {
      KALDI_WARN << "(time,height) offset (" << offsets[i].time_offset
                 << "," << offsets[i].height_offset
                 << ") of this computation is never used.";
      return false;
    }
  }
  return true;
}
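One of the conditions in the listing above is that required_time_offsets must be a subset of all_time_offsets. As a standalone illustration, the same condition can be expressed with std::includes, which works here because std::set iterates in sorted order. The function name `RequiredIsSubsetOfAll` is hypothetical; Kaldi's own code uses the explicit loop shown in the listing.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>

// Hypothetical helper (not a Kaldi function): true if every element of
// `required` also appears in `all`. std::includes needs both ranges to be
// sorted, which std::set guarantees.
bool RequiredIsSubsetOfAll(const std::set<int32_t> &required,
                           const std::set<int32_t> &all) {
  return std::includes(all.begin(), all.end(),
                       required.begin(), required.end());
}
```

Note that an empty `required` set passes this test, whereas Check() rejects an empty required_time_offsets earlier, in its basic check.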
void ComputeDerived ( )

Definition at line 108 of file convolution.cc.

References ConvolutionModel::all_time_offsets, kaldi::Gcd(), ConvolutionModel::offsets, and ConvolutionModel::time_offsets_modulus.

Referenced by kaldi::nnet3::time_height_convolution::AppendInputFrames(), ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::InitFromConfig(), and ConvolutionModel::Read().

void ConvolutionModel::ComputeDerived() {
  { // compute all_time_offsets
    all_time_offsets.clear();
    for (std::vector<Offset>::const_iterator iter = offsets.begin();
         iter != offsets.end(); ++iter)
      all_time_offsets.insert(iter->time_offset);
  }
  { // compute time_offsets_modulus
    time_offsets_modulus = 0;
    std::set<int32>::iterator iter = all_time_offsets.begin();
    int32 cur_offset = *iter;
    for (++iter; iter != all_time_offsets.end(); ++iter) {
      int32 this_offset = *iter;
      time_offsets_modulus = Gcd(time_offsets_modulus,
                                 this_offset - cur_offset);
      cur_offset = this_offset;
    }
  }
}
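The listing above walks the sorted time offsets and, judging by its reference to kaldi::Gcd(), accumulates the greatest common divisor of successive differences. The standalone sketch below illustrates that idea under two assumptions: the accumulator starts at 0, and `GcdSketch` is a plain Euclidean GCD standing in for kaldi::Gcd(). Both function names are hypothetical.

```cpp
#include <cstdint>
#include <set>

// Hypothetical stand-in for kaldi::Gcd(): Euclid's algorithm, with
// GcdSketch(0, n) == n for n >= 0.
inline int32_t GcdSketch(int32_t a, int32_t b) {
  while (b != 0) {
    int32_t t = a % b;
    a = b;
    b = t;
  }
  return a < 0 ? -a : a;
}

// Hypothetical helper: GCD of the gaps between consecutive time offsets.
// Returns 0 when there are fewer than two distinct offsets.
int32_t TimeOffsetsModulus(const std::set<int32_t> &all_time_offsets) {
  int32_t modulus = 0;
  if (all_time_offsets.empty()) return modulus;
  std::set<int32_t>::const_iterator iter = all_time_offsets.begin();
  int32_t cur_offset = *iter;
  for (++iter; iter != all_time_offsets.end(); ++iter) {
    modulus = GcdSketch(modulus, *iter - cur_offset);
    cur_offset = *iter;
  }
  return modulus;
}
```

For the TDNN-style offsets (-3,0,3) the result is 3; for (0,1,2) it is 1; and for a single time offset it stays at 0.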
std::string Info ( ) const

Definition at line 86 of file convolution.cc.

References ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, ConvolutionModel::InputDim(), ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::OutputDim(), and ConvolutionModel::required_time_offsets.

Referenced by kaldi::nnet3::time_height_convolution::GetRandomConvolutionModel(), TimeHeightConvolutionComponent::Info(), and kaldi::nnet3::time_height_convolution::TestRunningComputation().

std::string ConvolutionModel::Info() const {
  std::ostringstream os;
  os << "num-filters-in=" << num_filters_in
     << ", num-filters-out=" << num_filters_out
     << ", height-in=" << height_in
     << ", height-out=" << height_out
     << ", height-subsample-out=" << height_subsample_out
     << ", {time,height}-offsets=[";
  for (size_t i = 0; i < offsets.size(); i++) {
    if (i > 0) os << ' ';
    os << offsets[i].time_offset << ',' << offsets[i].height_offset;
  }
  os << "], required-time-offsets=[";
  for (std::set<int32>::const_iterator iter = required_time_offsets.begin();
       iter != required_time_offsets.end(); ++iter) {
    if (iter != required_time_offsets.begin()) os << ',';
    os << *iter;
  }
  os << "], input-dim=" << InputDim() << ", output-dim=" << OutputDim();
  return os.str();
}
bool operator== ( const ConvolutionModel &  other ) const

Definition at line 211 of file convolution.cc.

References ConvolutionModel::all_time_offsets, ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, and ConvolutionModel::time_offsets_modulus.

bool ConvolutionModel::operator == (const ConvolutionModel &other) const {
  return num_filters_in == other.num_filters_in &&
      num_filters_out == other.num_filters_out &&
      height_in == other.height_in &&
      height_out == other.height_out &&
      height_subsample_out == other.height_subsample_out &&
      offsets == other.offsets &&
      required_time_offsets == other.required_time_offsets &&
      all_time_offsets == other.all_time_offsets &&
      time_offsets_modulus == other.time_offsets_modulus;
}
void Read ( std::istream &  is,
bool  binary 
)

Definition at line 251 of file convolution.cc.

References ConvolutionModel::Check(), ConvolutionModel::ComputeDerived(), kaldi::nnet3::ExpectOneOrTwoTokens(), kaldi::nnet3::ExpectToken(), ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, KALDI_ASSERT, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, kaldi::ReadBasicType(), kaldi::ReadIntegerPairVector(), kaldi::ReadIntegerVector(), and ConvolutionModel::required_time_offsets.

Referenced by TimeHeightConvolutionComponent::Read(), and kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolutionIo().

void ConvolutionModel::Read(std::istream &is, bool binary) {
  ExpectOneOrTwoTokens(is, binary, "<ConvolutionModel>", "<NumFiltersIn>");
  ReadBasicType(is, binary, &num_filters_in);
  ExpectToken(is, binary, "<NumFiltersOut>");
  ReadBasicType(is, binary, &num_filters_out);
  ExpectToken(is, binary, "<HeightIn>");
  ReadBasicType(is, binary, &height_in);
  ExpectToken(is, binary, "<HeightOut>");
  ReadBasicType(is, binary, &height_out);
  ExpectToken(is, binary, "<HeightSubsampleOut>");
  ReadBasicType(is, binary, &height_subsample_out);
  ExpectToken(is, binary, "<Offsets>");
  std::vector<std::pair<int32, int32> > pairs;
  ReadIntegerPairVector(is, binary, &pairs);
  offsets.resize(pairs.size());
  for (size_t i = 0; i < offsets.size(); i++) {
    offsets[i].time_offset = pairs[i].first;
    offsets[i].height_offset = pairs[i].second;
  }
  std::vector<int32> required_time_offsets_list;
  ExpectToken(is, binary, "<RequiredTimeOffsets>");
  ReadIntegerVector(is, binary, &required_time_offsets_list);
  required_time_offsets.clear();
  required_time_offsets.insert(required_time_offsets_list.begin(),
                               required_time_offsets_list.end());
  ExpectToken(is, binary, "</ConvolutionModel>");
  ComputeDerived();
  KALDI_ASSERT(Check(false, true));
}
void Write ( std::ostream &  os,
bool  binary 
) const

Definition at line 224 of file convolution.cc.

References ConvolutionModel::height_in, ConvolutionModel::height_out, ConvolutionModel::height_subsample_out, rnnlm::i, ConvolutionModel::num_filters_in, ConvolutionModel::num_filters_out, ConvolutionModel::offsets, ConvolutionModel::required_time_offsets, kaldi::WriteBasicType(), kaldi::WriteIntegerPairVector(), kaldi::WriteIntegerVector(), and kaldi::WriteToken().

Referenced by ConvolutionModel::Check(), kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolutionIo(), and TimeHeightConvolutionComponent::Write().

void ConvolutionModel::Write(std::ostream &os, bool binary) const {
  WriteToken(os, binary, "<ConvolutionModel>");
  WriteToken(os, binary, "<NumFiltersIn>");
  WriteBasicType(os, binary, num_filters_in);
  WriteToken(os, binary, "<NumFiltersOut>");
  WriteBasicType(os, binary, num_filters_out);
  WriteToken(os, binary, "<HeightIn>");
  WriteBasicType(os, binary, height_in);
  WriteToken(os, binary, "<HeightOut>");
  WriteBasicType(os, binary, height_out);
  WriteToken(os, binary, "<HeightSubsampleOut>");
  WriteBasicType(os, binary, height_subsample_out);
  WriteToken(os, binary, "<Offsets>");
  std::vector<std::pair<int32, int32> > pairs(offsets.size());
  for (size_t i = 0; i < offsets.size(); i++) {
    pairs[i].first = offsets[i].time_offset;
    pairs[i].second = offsets[i].height_offset;
  }
  WriteIntegerPairVector(os, binary, pairs);
  std::vector<int32> required_time_offsets_list(required_time_offsets.begin(),
                                                required_time_offsets.end());
  WriteToken(os, binary, "<RequiredTimeOffsets>");
  WriteIntegerVector(os, binary, required_time_offsets_list);
  WriteToken(os, binary, "</ConvolutionModel>");
}

Member Data Documentation


The documentation for this struct was generated from the following files:

convolution.h
convolution.cc