extract-feature-segments.cc File Reference

Functions

int main (int argc, char *argv[])
 This is a program for extracting segments from feature files/archives.
 

Function Documentation

int main (int argc, char *argv[])

This is a program for extracting segments from feature files/archives.

  • usage :
    • extract-feature-segments [options ..] <scriptfile/archive> <segments-file> <features-written-specifier>
    • "segments-file" describes the segments that need to be extracted from the feature files
    • format of each line of the segments file: segment_id utterance_id start_time(in secs) end_time(in secs) [channel]
    • "features-written-specifier" is the wspecifier to which the extracted segments are written
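
The segments-file format above can be illustrated with a minimal, Kaldi-free sketch. The names SegmentLine and ParseSegmentLine are hypothetical and do not exist in Kaldi; std::stod and std::stoi stand in for kaldi::ConvertStringToReal() and kaldi::ConvertStringToInteger(), whose accepted inputs may differ slightly. The validity checks follow the ones in the listing below (4 or 5 fields, start >= 0, end > 0, start < end, channel >= 0).

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line of the segments file (not a Kaldi type).
struct SegmentLine {
  std::string segment;    // key under which the extracted segment is written
  std::string utterance;  // key of the input feature matrix
  double start = 0.0;     // start time in seconds
  double end = 0.0;       // end time in seconds
  int channel = -1;       // -1 means the optional 5th field was absent
};

// Returns true and fills *out if the line is well formed; returns false for
// lines that the tool would skip with a warning.
bool ParseSegmentLine(const std::string &line, SegmentLine *out) {
  std::istringstream iss(line);
  std::vector<std::string> fields;
  for (std::string f; iss >> f;) fields.push_back(f);  // split on whitespace
  if (fields.size() != 4 && fields.size() != 5) return false;

  out->segment = fields[0];
  out->utterance = fields[1];
  out->channel = -1;
  try {
    std::size_t pos = 0;
    out->start = std::stod(fields[2], &pos);
    if (pos != fields[2].size()) return false;  // trailing junk
    out->end = std::stod(fields[3], &pos);
    if (pos != fields[3].size()) return false;
    if (fields.size() == 5) {
      out->channel = std::stoi(fields[4], &pos);
      if (pos != fields[4].size() || out->channel < 0) return false;
    }
  } catch (const std::exception &) {
    return false;  // not a number, or out of range
  }
  // Same range checks as the listing: non-negative start, positive end,
  // and a non-empty interval.
  return out->start >= 0 && out->end > 0 && out->start < out->end;
}
```

For example, the line "seg1 utt1 1.10 2.36" is accepted with channel left at -1, while "seg1 utt1 2.0 1.0" is rejected because the start time is not smaller than the end time.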

Definition at line 34 of file extract-feature-segments.cc.

References kaldi::ConvertStringToInteger(), kaldi::ConvertStringToReal(), ParseOptions::GetArg(), RandomAccessTableReader< Holder >::HasKey(), KALDI_LOG, KALDI_WARN, ParseOptions::NumArgs(), MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), ParseOptions::PrintUsage(), ParseOptions::Read(), ParseOptions::Register(), kaldi::SplitStringToVector(), Input::Stream(), RandomAccessTableReader< Holder >::Value(), and TableWriter< Holder >::Write().

34  {
35  try {
36  using namespace kaldi;
37 
38  const char *usage =
39  "Create feature files by segmenting input files.\n"
40  "Usage: "
41  "extract-feature-segments [options...] <feats-rspecifier> "
42  " <segments-file> <feats-wspecifier>\n"
43  " (segments-file has lines like: "
44  "output-utterance-id input-utterance-or-spk-id 1.10 2.36)\n";
45 
46  // construct all the global objects
47  ParseOptions po(usage);
48 
49  BaseFloat min_segment_length = 0.1, // Minimum segment length in seconds.
50  max_overshoot = 0.0; // max time by which last segment can overshoot
51  int32 frame_shift = 10;
52  int32 frame_length = 25;
53  bool snip_edges = true;
54 
55  // Register the options
56  po.Register("min-segment-length", &min_segment_length,
57  "Minimum segment length in seconds (reject shorter segments)");
58  po.Register("frame-length", &frame_length, "Frame length in milliseconds");
59  po.Register("frame-shift", &frame_shift, "Frame shift in milliseconds");
60  po.Register("max-overshoot", &max_overshoot,
61  "End segments overshooting by less (in seconds) are truncated,"
62  " else rejected.");
63  po.Register("snip-edges", &snip_edges,
64  "If true, n_frames frames will be snipped from the end of each "
65  "extracted feature matrix, "
66  "where n_frames = ceil((frame_length - frame_shift) / frame_shift), "
67  "This ensures that only the feature vectors that "
68  "completely fit in the segment are extracted. "
69  "This makes the extracted segment lengths match the lengths of the "
70  "features that have been extracted from already segmented audio.");
71 
72  // OPTION PARSING ...
73  // parse options (+filling the registered variables)
74  po.Read(argc, argv);
75  // number of arguments should be 3
76  // (feature rspecifier, segments file and feature wspecifier)
77  if (po.NumArgs() != 3) {
78  po.PrintUsage();
79  exit(1);
80  }
81 
82  std::string rspecifier = po.GetArg(1); // get script file/feature archive
83  std::string segments_rxfilename = po.GetArg(2); // get segment file
84  std::string wspecifier = po.GetArg(3); // get written archive name
85 
86  BaseFloatMatrixWriter feat_writer(wspecifier);
87 
88  RandomAccessBaseFloatMatrixReader feat_reader(rspecifier);
89 
90  Input ki(segments_rxfilename); // no binary argument: never binary.
91 
92  int32 num_lines = 0, num_success = 0;
93 
94  int32 snip_length = 0;
95  if (snip_edges) {
96  snip_length = static_cast<int32>(ceil(
97  1.0 * (frame_length - frame_shift) / frame_shift));
98  }
99 
100  std::string line;
101  /* read each line from segments file */
102  while (std::getline(ki.Stream(), line)) {
103  num_lines++;
104  std::vector<std::string> split_line;
105  // Split the line by space or tab and check the number of fields in each
106  // line. There must be 4 fields: segment name, utterance (input feature key),
107  // start time, end time; 5th field (channel info) is optional.
108  SplitStringToVector(line, " \t\r", true, &split_line);
109  if (split_line.size() != 4 && split_line.size() != 5) {
110  KALDI_WARN << "Invalid line in segments file: " << line;
111  continue;
112  }
113  std::string segment = split_line[0],
114  utterance = split_line[1],
115  start_str = split_line[2],
116  end_str = split_line[3];
117 
118  // Convert the start time and endtime to real from string. Segment is
119  // ignored if start or end time cannot be converted to real.
120  double start, end;
121  if (!ConvertStringToReal(start_str, &start)) {
122  KALDI_WARN << "Invalid line in segments file [bad start]: " << line;
123  continue;
124  }
125  if (!ConvertStringToReal(end_str, &end)) {
126  KALDI_WARN << "Invalid line in segments file [bad end]: " << line;
127  continue;
128  }
129 
130  // start time must not be negative; end time must be positive and
131  // strictly greater than start time
132  if (start < 0 || end <= 0 || start >= end) {
133  KALDI_WARN << "Invalid line in segments file "
134  "[empty or invalid segment]: "
135  << line;
136  continue;
137  }
138  int32 channel = -1; // means channel info is unspecified.
139  // if each line has 5 elements then 5th element must be channel identifier
140  if (split_line.size() == 5) {
141  if (!ConvertStringToInteger(split_line[4], &channel) || channel < 0) {
142  KALDI_WARN<< "Invalid line in segments file [bad channel]: " << line;
143  continue;
144  }
145  }
146 
147  /* check whether features exist for the utterance this segment refers to;
148  * if not, skip the segment.
149  */
150  if (!feat_reader.HasKey(utterance)) {
151  KALDI_WARN << "Did not find features for utterance " << utterance
152  << ", skipping segment " << segment;
153  continue;
154  }
155  const Matrix<BaseFloat> &feats = feat_reader.Value(utterance);
156  // total number of frames (rows) in the feature matrix
157  int32 num_samp = feats.NumRows();
158  // feature dimension (number of columns)
159  int32 num_chan = feats.NumCols();
160  // Convert start & end times of the segment to the corresponding frame index
161  int32 start_samp = static_cast<int32>(round(
162  (start * 1000.0 / frame_shift)));
163  int32 end_samp = static_cast<int32>(round(end * 1000.0 / frame_shift));
164 
165  if (snip_edges) {
166  // snip the edge at the end of the segment (2 frames with the defaults)
167  end_samp -= snip_length;
168  }
169 
170  /* start frame must lie within the feature matrix,
171  * otherwise skip the segment
172  */
173  if (start_samp < 0 || start_samp >= num_samp) {
174  KALDI_WARN << "Start sample out of range " << start_samp
175  << " [length:] " << num_samp << "x" << num_chan
176  << ", skipping segment " << segment;
177  continue;
178  }
179 
180  /* end frame may exceed the number of frames by less than max-overshoot
181  * (it is then truncated); otherwise skip the segment
182  */
183  if (end_samp > num_samp) {
184  if (end_samp >= num_samp
185  + static_cast<int32>(
186  round(max_overshoot * 1000.0 / frame_shift))) {
187  KALDI_WARN<< "End sample too far out of range " << end_samp
188  << " [length:] " << num_samp << "x" << num_chan
189  << ", skipping segment "
190  << segment;
191  continue;
192  }
193  end_samp = num_samp; // for small differences, just truncate.
194  }
195 
196  /* check whether the segment size is less than minimum segment length(default 0.1 sec)
197  * if yes, skip the segment
198  */
199  if (end_samp
200  <= start_samp
201  + static_cast<int32>(round(
202  (min_segment_length * 1000.0 / frame_shift)))) {
203  KALDI_WARN<< "Segment " << segment << " too short, skipping it.";
204  continue;
205  }
206 
207  SubMatrix<BaseFloat> segment_matrix(feats, start_samp,
208  end_samp-start_samp, 0, num_chan);
209  Matrix<BaseFloat> outmatrix(segment_matrix);
210  // write segment in feature archive.
211  feat_writer.Write(segment, outmatrix);
212  num_success++;
213  }
214  KALDI_LOG << "Successfully processed " << num_success << " lines out of "
215  << num_lines << " in the segments file. ";
216  /* prints number of segments processed */
217  if (num_success == 0) return -1;
218  return 0;
219  } catch(const std::exception &e) {
220  std::cerr << e.what();
221  return -1;
222  }
223 }
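
The time-to-frame arithmetic in the listing above can be isolated in a small standalone sketch. SegmentOptions, FrameRange, SecondsToFrames and ComputeFrameRange are hypothetical names introduced only for illustration; the defaults and the comparisons are copied from the listing (frame shift 10 ms, frame length 25 ms, snip-edges true, min-segment-length 0.1 s, max-overshoot 0.0 s). With those defaults, snip_length = ceil((25 - 10) / 10) = 2 frames.

```cpp
#include <cmath>

// Hypothetical option bundle; defaults mirror those registered in main().
struct SegmentOptions {
  int frame_shift_ms = 10;
  int frame_length_ms = 25;
  bool snip_edges = true;
  double min_segment_length = 0.1;  // seconds
  double max_overshoot = 0.0;       // seconds
};

// Half-open row range [start, end) into the feature matrix.
struct FrameRange {
  int start = 0;
  int end = 0;
};

// Converts a time in seconds to a frame count, rounding to nearest.
static int SecondsToFrames(double seconds, int frame_shift_ms) {
  return static_cast<int>(std::round(seconds * 1000.0 / frame_shift_ms));
}

// Returns false when the listing would skip the segment with a warning.
bool ComputeFrameRange(double start, double end, int num_frames,
                       const SegmentOptions &opts, FrameRange *range) {
  const int shift = opts.frame_shift_ms;
  int snip_length = 0;
  if (opts.snip_edges) {
    snip_length = static_cast<int>(
        std::ceil(1.0 * (opts.frame_length_ms - shift) / shift));
  }
  int start_frame = SecondsToFrames(start, shift);
  int end_frame = SecondsToFrames(end, shift) - snip_length;

  // The start frame must lie inside the matrix.
  if (start_frame < 0 || start_frame >= num_frames) return false;

  // A small overshoot is truncated; a large one rejects the segment.
  if (end_frame > num_frames) {
    if (end_frame >= num_frames + SecondsToFrames(opts.max_overshoot, shift))
      return false;
    end_frame = num_frames;
  }

  // Reject segments that are not longer than the minimum length.
  if (end_frame <=
      start_frame + SecondsToFrames(opts.min_segment_length, shift))
    return false;

  range->start = start_frame;
  range->end = end_frame;
  return true;
}
```

With the default options, a segment from 1.10 s to 2.36 s in a 500-frame matrix maps to rows 110 up to (but not including) 234: round(236) minus the 2 snipped frames.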