NnetExample Struct Reference

NnetExample is the input data and corresponding label (or labels) for one or more frames of input, used for standard cross-entropy training of neural nets (and possibly for other objective functions).

#include <nnet-example.h>

Public Member Functions

void Write (std::ostream &os, bool binary) const
 
void Read (std::istream &is, bool binary)
 
 NnetExample ()
 
 NnetExample (const NnetExample &input, int32 start_frame, int32 num_frames, int32 left_context, int32 right_context)
 This constructor can be used to extract one or more frames from an example that has multiple frames, and possibly truncate the context.
 
void SetLabelSingle (int32 frame, int32 pdf_id, BaseFloat weight=1.0)
 Set the label of this frame of this example to the specified pdf_id with the specified weight.
 
int32 GetLabelSingle (int32 frame, BaseFloat *weight=NULL)
 Get the maximum weight label (pdf_id and weight) of this frame of this example.
 

Public Attributes

std::vector< std::vector< std::pair< int32, BaseFloat > > > labels
 The label(s) for each frame in a sequence of frames; in the normal case, this will be just [ [ (pdf-id, 1.0) ] ], i.e. one frame with one label.
 
CompressedMatrix input_frames
 The input data, with NumRows() >= labels.size() + left_context; it includes features to the left and right as needed for the temporal context of the network.
 
int32 left_context
 The number of frames of left context (we can work out the #frames of right context from input_frames.NumRows(), labels.size(), and this).
 
Vector< BaseFloat > spk_info
 The speaker-specific input, if any, or an empty vector if we're not using this feature.
 

Detailed Description

NnetExample is the input data and corresponding label (or labels) for one or more frames of input, used for standard cross-entropy training of neural nets (and possibly for other objective functions).

In the normal case there will be just one frame, and one label, with a weight of 1.0.

Definition at line 36 of file nnet-example.h.

Constructor & Destructor Documentation

◆ NnetExample() [1/2]

NnetExample ( )
inline

Definition at line 63 of file nnet-example.h.

References NnetExample::GetLabelSingle(), and NnetExample::SetLabelSingle().

NnetExample() { }

◆ NnetExample() [2/2]

NnetExample ( const NnetExample &  input,
int32  start_frame,
int32  num_frames,
int32  left_context,
int32  right_context 
)

This constructor can be used to extract one or more frames from an example that has multiple frames, and possibly truncate the context.

Most of its behavior is obvious from the variable names, but note the following. If left_context is -1, we use the left context of the input; the same goes for right_context. If start_frame < 0, we start the labels from frame 0 of the labeled frames of input; if num_frames == -1, we go to the end of the labeled input from start_frame. If start_frame + num_frames is greater than the number of frames of labels of input, we output as much as we can instead of crashing. The same applies to left_context and right_context: if we can't provide the requested context we won't crash but will provide as much as we can, although in this case we'll print a warning (once).
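
The resolution rules above can be tried out in isolation. The following is a minimal sketch, not Kaldi code: FrameWindow and ResolveWindow are hypothetical names, int32 is a stand-in typedef, and the warning is omitted. It only reproduces the index arithmetic described for this constructor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using int32 = std::int32_t;  // stand-in for kaldi::int32 (assumption)

// Hypothetical result type: the window of the input example that is kept.
struct FrameWindow {
  int32 first_row;      // first row of the input's feature matrix that is kept
  int32 num_rows;       // left context + labeled frames + right context
  int32 num_frames;     // number of labeled frames kept
  int32 left_context;   // left context after resolving -1 and clamping
  int32 right_context;  // right context after resolving -1 and clamping
};

// Hypothetical helper mirroring the rules described above.
FrameWindow ResolveWindow(int32 input_rows, int32 input_left_context,
                          int32 num_label_frames, int32 start_frame,
                          int32 num_frames, int32 left_context,
                          int32 right_context) {
  if (start_frame < 0) start_frame = 0;  // offset within the labeled frames
  assert(start_frame < num_label_frames);
  if (num_frames == -1 || start_frame + num_frames > num_label_frames)
    num_frames = num_label_frames - start_frame;  // go to the end of the labels
  const int32 input_right_context =
      input_rows - input_left_context - num_label_frames;
  if (left_context == -1) left_context = input_left_context;
  if (right_context == -1) right_context = input_right_context;
  // Provide as much context as the input has (the real code warns once here).
  left_context = std::min(left_context, input_left_context);
  right_context = std::min(right_context, input_right_context);

  FrameWindow w;
  w.first_row = (input_left_context - left_context) + start_frame;
  w.num_rows = left_context + num_frames + right_context;
  w.num_frames = num_frames;
  w.left_context = left_context;
  w.right_context = right_context;
  return w;
}
```

For an input with 16 rows, 4 frames of left context and 8 labeled frames, ResolveWindow(16, 4, 8, 2, 3, 1, 10) keeps 8 rows starting at row 5: the requested right context of 10 is reduced to the 4 frames the input actually has.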

Definition at line 159 of file nnet-example.cc.

References NnetExample::input_frames, KALDI_ASSERT, KALDI_WARN, NnetExample::labels, NnetExample::left_context, kaldi::nnet2::nnet_example_warned_right, CompressedMatrix::NumCols(), CompressedMatrix::NumRows(), and CompressedMatrix::Swap().

NnetExample::NnetExample(const NnetExample &input,
                         int32 start_frame,
                         int32 new_num_frames,
                         int32 new_left_context,
                         int32 new_right_context)
    : spk_info(input.spk_info) {
  int32 num_label_frames = input.labels.size();
  if (start_frame < 0) start_frame = 0;  // start_frame is offset in the labeled
                                         // frames.
  KALDI_ASSERT(start_frame < num_label_frames);
  if (start_frame + new_num_frames > num_label_frames || new_num_frames == -1)
    new_num_frames = num_label_frames - start_frame;
  // compute right-context of input.
  int32 input_right_context =
      input.input_frames.NumRows() - input.left_context - num_label_frames;
  if (new_left_context == -1) new_left_context = input.left_context;
  if (new_right_context == -1) new_right_context = input_right_context;
  if (new_left_context > input.left_context) {
    if (!nnet_example_warned_left) {
      nnet_example_warned_left = true;
      KALDI_WARN << "Requested left-context " << new_left_context
                 << " exceeds input left-context " << input.left_context
                 << ", will not warn again.";
    }
    new_left_context = input.left_context;
  }
  if (new_right_context > input_right_context) {
    if (!nnet_example_warned_right) {
      nnet_example_warned_right = true;
      KALDI_WARN << "Requested right-context " << new_right_context
                 << " exceeds input right-context " << input_right_context
                 << ", will not warn again.";
    }
    new_right_context = input_right_context;
  }

  int32 new_tot_frames = new_left_context + new_num_frames + new_right_context,
      left_frames_lost = (input.left_context - new_left_context) + start_frame;

  CompressedMatrix new_input_frames(input.input_frames,
                                    left_frames_lost,
                                    new_tot_frames,
                                    0, input.input_frames.NumCols());
  new_input_frames.Swap(&input_frames);  // swap with class-member.
  left_context = new_left_context;  // set class-member.
  labels.clear();
  labels.insert(labels.end(),
                input.labels.begin() + start_frame,
                input.labels.begin() + start_frame + new_num_frames);
}

Member Function Documentation

◆ GetLabelSingle()

int32 GetLabelSingle ( int32  frame,
BaseFloat *  weight = NULL 
)

Get the maximum weight label (pdf_id and weight) of this frame of this example.

Definition at line 140 of file nnet-example.cc.

References rnnlm::i, KALDI_ASSERT, and NnetExample::labels.

Referenced by NnetExample::NnetExample().

int32 NnetExample::GetLabelSingle(int32 frame, BaseFloat *weight) {
  BaseFloat max = -1.0;
  int32 pdf_id = -1;
  KALDI_ASSERT(static_cast<size_t>(frame) < labels.size());
  for (int32 i = 0; i < labels[frame].size(); i++) {
    if (labels[frame][i].second > max) {
      pdf_id = labels[frame][i].first;
      max = labels[frame][i].second;
    }
  }
  if (weight != NULL) *weight = max;
  return pdf_id;
}

◆ Read()

void Read ( std::istream &  is,
bool  binary 
)

Definition at line 80 of file nnet-example.cc.

References kaldi::ExpectToken(), rnnlm::i, NnetExample::input_frames, KALDI_ASSERT, KALDI_ERR, NnetExample::labels, NnetExample::left_context, CompressedMatrix::Read(), kaldi::ReadBasicType(), kaldi::ReadIntegerVector(), kaldi::ReadToken(), and NnetExample::spk_info.

void NnetExample::Read(std::istream &is, bool binary) {
  // Note: weight, label, input_frames, left_context and spk_info are members.
  // This is a struct.
  ExpectToken(is, binary, "<NnetExample>");

  std::string token;
  ReadToken(is, binary, &token);
  if (!strcmp(token.c_str(), "<Lab1>")) {  // simple label format
    std::vector<int32> simple_labels;
    ReadIntegerVector(is, binary, &simple_labels);
    labels.resize(simple_labels.size());
    for (size_t i = 0; i < simple_labels.size(); i++) {
      labels[i].resize(1);
      labels[i][0].first = simple_labels[i];
      labels[i][0].second = 1.0;
    }
  } else if (!strcmp(token.c_str(), "<Lab2>")) {  // generic label format
    int32 num_frames;
    ReadBasicType(is, binary, &num_frames);
    KALDI_ASSERT(num_frames > 0);
    labels.resize(num_frames);
    for (int32 t = 0; t < num_frames; t++) {
      int32 size;
      ReadBasicType(is, binary, &size);
      KALDI_ASSERT(size >= 0);
      labels[t].resize(size);
      for (int32 i = 0; i < size; i++) {
        ReadBasicType(is, binary, &(labels[t][i].first));
        ReadBasicType(is, binary, &(labels[t][i].second));
      }
    }
  } else if (!strcmp(token.c_str(), "<Labels>")) {  // back-compatibility
    labels.resize(1);  // old format had 1 frame of labels.
    int32 size;
    ReadBasicType(is, binary, &size);
    labels[0].resize(size);
    for (int32 i = 0; i < size; i++) {
      ReadBasicType(is, binary, &(labels[0][i].first));
      ReadBasicType(is, binary, &(labels[0][i].second));
    }
  } else {
    KALDI_ERR << "Expected token <Lab1>, <Lab2> or <Labels>, got " << token;
  }
  ExpectToken(is, binary, "<InputFrames>");
  input_frames.Read(is, binary);
  ExpectToken(is, binary, "<LeftContext>");  // Note: this member is
  // recently added, but I don't think we'll get too much back-compatibility
  // problems from not handling the old format.
  ReadBasicType(is, binary, &left_context);
  ExpectToken(is, binary, "<SpkInfo>");
  spk_info.Read(is, binary);
  ExpectToken(is, binary, "</NnetExample>");
}

◆ SetLabelSingle()

void SetLabelSingle ( int32  frame,
int32  pdf_id,
BaseFloat  weight = 1.0 
)

Set the label of this frame of this example to the specified pdf_id with the specified weight.

Definition at line 134 of file nnet-example.cc.

References KALDI_ASSERT, and NnetExample::labels.

Referenced by NnetExample::NnetExample().

void NnetExample::SetLabelSingle(int32 frame, int32 pdf_id, BaseFloat weight) {
  KALDI_ASSERT(static_cast<size_t>(frame) < labels.size());
  labels[frame].clear();
  labels[frame].push_back(std::make_pair(pdf_id, weight));
}
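
To see how SetLabelSingle() and GetLabelSingle() interact without building Kaldi, the sketch below copies their logic onto a stand-in struct. ExampleLabels is a hypothetical name, the typedefs are assumptions, and plain assert replaces KALDI_ASSERT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using int32 = std::int32_t;  // stand-in for kaldi::int32 (assumption)
using BaseFloat = float;     // stand-in for kaldi::BaseFloat (assumption)

// Hypothetical stand-in that holds only the `labels` member.
struct ExampleLabels {
  std::vector<std::vector<std::pair<int32, BaseFloat> > > labels;

  // Replaces whatever labels the frame had with a single (pdf_id, weight).
  void SetLabelSingle(int32 frame, int32 pdf_id, BaseFloat weight = 1.0) {
    assert(static_cast<std::size_t>(frame) < labels.size());
    labels[frame].clear();
    labels[frame].push_back(std::make_pair(pdf_id, weight));
  }

  // Returns the pdf_id with the largest weight on this frame, or -1 if the
  // frame has no labels; optionally reports that weight.
  int32 GetLabelSingle(int32 frame, BaseFloat *weight = nullptr) {
    BaseFloat max = -1.0;
    int32 pdf_id = -1;
    assert(static_cast<std::size_t>(frame) < labels.size());
    for (std::size_t i = 0; i < labels[frame].size(); i++) {
      if (labels[frame][i].second > max) {
        pdf_id = labels[frame][i].first;
        max = labels[frame][i].second;
      }
    }
    if (weight != nullptr) *weight = max;
    return pdf_id;
  }
};
```

Because SetLabelSingle() clears the frame first, a frame that held several weighted labels holds exactly one afterwards, and GetLabelSingle() then returns that label and its weight.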

◆ Write()

void Write ( std::ostream &  os,
bool  binary 
) const

Definition at line 46 of file nnet-example.cc.

References kaldi::nnet2::HasSimpleLabels(), rnnlm::i, NnetExample::input_frames, NnetExample::labels, NnetExample::left_context, NnetExample::spk_info, CompressedMatrix::Write(), kaldi::WriteBasicType(), kaldi::WriteIntegerVector(), and kaldi::WriteToken().

void NnetExample::Write(std::ostream &os, bool binary) const {
  // Note: weight, label, input_frames and spk_info are members.  This is a
  // struct.
  WriteToken(os, binary, "<NnetExample>");

  // At this point, we write <Lab1> if we have "simple" labels, or
  // <Lab2> in general.  Previous code (when we had only one frame of
  // labels) just wrote <Labels>.
  std::vector<int32> simple_labels;
  if (HasSimpleLabels(*this, &simple_labels)) {
    WriteToken(os, binary, "<Lab1>");
    WriteIntegerVector(os, binary, simple_labels);
  } else {
    WriteToken(os, binary, "<Lab2>");
    int32 num_frames = labels.size();
    WriteBasicType(os, binary, num_frames);
    for (int32 t = 0; t < num_frames; t++) {
      int32 size = labels[t].size();
      WriteBasicType(os, binary, size);
      for (int32 i = 0; i < size; i++) {
        WriteBasicType(os, binary, labels[t][i].first);
        WriteBasicType(os, binary, labels[t][i].second);
      }
    }
  }
  WriteToken(os, binary, "<InputFrames>");
  input_frames.Write(os, binary);
  WriteToken(os, binary, "<LeftContext>");
  WriteBasicType(os, binary, left_context);
  WriteToken(os, binary, "<SpkInfo>");
  spk_info.Write(os, binary);
  WriteToken(os, binary, "</NnetExample>");
}
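
The body of HasSimpleLabels() is not shown on this page. Judging from the <Lab1> branch of Read(), which rebuilds one label of weight 1.0 per frame, a plausible reading is that labels are "simple" when every frame has exactly one label with weight 1.0. The sketch below encodes that assumption; HasSimpleLabelsSketch and the typedefs are hypothetical, and the real function takes a NnetExample rather than a bare label container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using int32 = std::int32_t;  // stand-in for kaldi::int32 (assumption)
using BaseFloat = float;     // stand-in for kaldi::BaseFloat (assumption)
using Labels = std::vector<std::vector<std::pair<int32, BaseFloat> > >;

// Assumed criterion: exactly one label per frame, each with weight 1.0.
// On success, *simple_labels receives one pdf-id per frame.
bool HasSimpleLabelsSketch(const Labels &labels,
                           std::vector<int32> *simple_labels) {
  for (std::size_t t = 0; t < labels.size(); t++)
    if (labels[t].size() != 1 || labels[t][0].second != 1.0f)
      return false;
  simple_labels->resize(labels.size());
  for (std::size_t t = 0; t < labels.size(); t++)
    (*simple_labels)[t] = labels[t][0].first;
  return true;
}
```

Under this reading, the <Lab1> format stores only the pdf-ids as an integer vector, while <Lab2> stores a per-frame count followed by (pdf-id, weight) pairs.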

Member Data Documentation

◆ input_frames

CompressedMatrix input_frames

The input data, with NumRows() >= labels.size() + left_context; it includes features to the left and right as needed for the temporal context of the network.

The features corresponding to labels[0] would be in the row with index equal to left_context.
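
As an illustration of this indexing, the sketch below looks up the feature row that belongs to a given labeled frame. It assumes that the offset extends to later frames, so labels[t] sits at row left_context + t; that is consistent with how the extracting constructor slices the matrix, but it is an inference. Matrix is a plain nested vector standing in for CompressedMatrix, and CenterRow is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using int32 = std::int32_t;  // stand-in for kaldi::int32 (assumption)
using BaseFloat = float;     // stand-in for kaldi::BaseFloat (assumption)
using Matrix = std::vector<std::vector<BaseFloat> >;  // one inner vector per row

// Returns the feature row aligned with labels[t].
const std::vector<BaseFloat> &CenterRow(const Matrix &input_frames,
                                        int32 left_context,
                                        std::size_t num_labels, int32 t) {
  assert(left_context >= 0 && t >= 0);
  assert(static_cast<std::size_t>(t) < num_labels);
  // The invariant stated above: NumRows() >= labels.size() + left_context.
  assert(input_frames.size() >= num_labels + left_context);
  return input_frames[left_context + t];
}
```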

Definition at line 49 of file nnet-example.h.

Referenced by DiscriminativeNnetExample::Check(), main(), NnetExample::NnetExample(), kaldi::nnet2::ProcessFile(), NnetExample::Read(), DiscriminativeNnetExample::Read(), NnetExample::Write(), and DiscriminativeNnetExample::Write().

◆ labels

std::vector<std::vector<std::pair<int32, BaseFloat> > > labels

The label(s) for each frame in a sequence of frames; in the normal case, this will be just [ [ (pdf-id, 1.0) ] ], i.e. one frame with one label.

Top-level index is the frame index; then for each frame, a list of pdf-ids each with its weight. In some contexts, we will require that labels.size() == 1.
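
A short sketch of this nested layout, using only the standard library: MakeSimpleLabels and TotalWeight are hypothetical helpers, and the typedefs are stand-ins for the Kaldi types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using int32 = std::int32_t;  // stand-in for kaldi::int32 (assumption)
using BaseFloat = float;     // stand-in for kaldi::BaseFloat (assumption)
using Labels = std::vector<std::vector<std::pair<int32, BaseFloat> > >;

// Builds the "normal case" layout: one (pdf-id, 1.0) pair per frame.
Labels MakeSimpleLabels(const std::vector<int32> &pdf_ids) {
  Labels labels(pdf_ids.size());
  for (std::size_t t = 0; t < pdf_ids.size(); t++)
    labels[t].push_back(std::make_pair(pdf_ids[t], 1.0f));
  return labels;
}

// Sums the weights attached to one frame.
BaseFloat TotalWeight(const Labels &labels, std::size_t frame) {
  assert(frame < labels.size());
  BaseFloat sum = 0.0f;
  for (std::size_t i = 0; i < labels[frame].size(); i++)
    sum += labels[frame][i].second;
  return sum;
}
```

With a single pdf-id 7, MakeSimpleLabels yields [ [ (7, 1.0) ] ]; pushing a second pair onto labels[0] gives a frame with two weighted alternatives.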

Definition at line 43 of file nnet-example.h.

Referenced by NnetExample::GetLabelSingle(), kaldi::nnet2::HasSimpleLabels(), main(), NnetExample::NnetExample(), kaldi::nnet2::ProcessFile(), NnetExample::Read(), NnetExample::SetLabelSingle(), and NnetExample::Write().

◆ left_context

int32 left_context

The number of frames of left context (we can work out the #frames of right context from input_frames.NumRows(), labels.size(), and this).
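
The right context is not stored, so it has to be derived. A minimal sketch of that calculation follows; RightContext is a hypothetical helper and int32 a stand-in typedef. It uses the same formula as the extracting constructor: rows minus left context minus labeled frames.

```cpp
#include <cassert>
#include <cstdint>

using int32 = std::int32_t;  // stand-in for kaldi::int32 (assumption)

// num_rows corresponds to input_frames.NumRows(), num_labels to labels.size().
int32 RightContext(int32 num_rows, int32 num_labels, int32 left_context) {
  const int32 right_context = num_rows - left_context - num_labels;
  assert(right_context >= 0);  // otherwise the example is inconsistent
  return right_context;
}
```

For example, 9 rows with one labeled frame and a left context of 4 leave a right context of 4.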

Definition at line 53 of file nnet-example.h.

Referenced by DiscriminativeNnetExample::Check(), main(), NnetExample::NnetExample(), kaldi::nnet2::ProcessFile(), NnetExample::Read(), DiscriminativeNnetExample::Read(), NnetExample::Write(), and DiscriminativeNnetExample::Write().

◆ spk_info

Vector<BaseFloat> spk_info

The speaker-specific input, if any, or an empty vector if we're not using this feature.

We'll append this to the features for each of the frames.
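
Conceptually, appending means that every frame's feature vector grows by the dimension of spk_info. The sketch below shows that idea on plain vectors. It is not how Kaldi performs the operation internally, which this page does not show; AppendSpkInfo and Matrix are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using BaseFloat = float;  // stand-in for kaldi::BaseFloat (assumption)
using Matrix = std::vector<std::vector<BaseFloat> >;  // one inner vector per frame

// Returns a copy of `frames` with `spk_info` appended to every row.
// An empty spk_info leaves the rows unchanged.
Matrix AppendSpkInfo(const Matrix &frames,
                     const std::vector<BaseFloat> &spk_info) {
  Matrix out;
  out.reserve(frames.size());
  for (std::size_t r = 0; r < frames.size(); r++) {
    std::vector<BaseFloat> row(frames[r]);
    row.insert(row.end(), spk_info.begin(), spk_info.end());
    out.push_back(row);
  }
  return out;
}
```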

Definition at line 58 of file nnet-example.h.

Referenced by main(), kaldi::nnet2::ProcessFile(), NnetExample::Read(), DiscriminativeNnetExample::Read(), NnetExample::Write(), and DiscriminativeNnetExample::Write().


The documentation for this struct was generated from the following files:

nnet-example.h
nnet-example.cc