extract-segments.cc File Reference

Functions

int main (int argc, char *argv[])
 This is the main program for extracting segments from a wav file.
 

Function Documentation

int main ( int  argc,
char *  argv[] 
)

This is the main program for extracting segments from a wav file.

  • usage :
    • extract-segments [options ..] <scriptfile> <segments-file> <wav-written-specifier>
    • "scriptfile" must contain the full path of each wav file.
    • "segments-file" describes the segments that need to be extracted from the wav files.
    • Format of each line of the segments file: segment-id recording-id start-time(in secs) end-time(in secs) [channel-id (0 or 1)]
    • An end-time of -1 means the segment runs to the end of the wav file.
    • The channel-id is 0 for the left channel and 1 for the right channel. It may be omitted for mono recordings.
    • "wav-written-specifier" specifies where and in what form the extracted segments are written.
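
The segments-file line format described above can be illustrated with a small standalone parser. This is only a sketch using the C++ standard library, not the Kaldi implementation: `SegmentLine` and `ParseSegmentLine` are hypothetical names introduced here, and the validity check at the end is modeled on the one in the source listing for this file.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one line of a segments file (not a Kaldi type).
struct SegmentLine {
  std::string segment_id;
  std::string recording_id;
  double start = 0.0;  // seconds
  double end = 0.0;    // seconds; -1 means "until the end of the recording"
  int channel = -1;    // -1 means "not specified"
};

// Parses one whitespace-separated line into *out.
// Returns false if the line is malformed (assumed helper, not Kaldi API).
bool ParseSegmentLine(const std::string &line, SegmentLine *out) {
  std::istringstream iss(line);
  std::vector<std::string> fields;
  for (std::string f; iss >> f;) fields.push_back(f);
  // Four mandatory fields; the fifth (channel) is optional.
  if (fields.size() != 4 && fields.size() != 5) return false;

  SegmentLine seg;
  seg.segment_id = fields[0];
  seg.recording_id = fields[1];
  try {
    std::size_t pos = 0;
    seg.start = std::stod(fields[2], &pos);
    if (pos != fields[2].size()) return false;  // trailing junk
    seg.end = std::stod(fields[3], &pos);
    if (pos != fields[3].size()) return false;
    if (fields.size() == 5) {
      seg.channel = std::stoi(fields[4], &pos);
      if (pos != fields[4].size() || seg.channel < 0) return false;
    }
  } catch (const std::exception &) {
    return false;  // field was not a number, or out of range
  }
  // Start must be non-negative and before the end, unless the end is -1.
  if (seg.start < 0 || (seg.end != -1.0 && seg.end <= 0) ||
      (seg.start >= seg.end && seg.end > 0)) {
    return false;
  }
  *out = seg;
  return true;
}
```

Under these assumptions, a line such as `call-861225-A-0050-0065 call-861225-A 5.0 6.5` parses into a segment with no channel specified, while a line whose start time is not smaller than its end time is rejected.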

Definition at line 34 of file extract-segments.cc.

References kaldi::ConvertStringToInteger(), kaldi::ConvertStringToReal(), WaveData::Data(), ParseOptions::GetArg(), RandomAccessTableReader< Holder >::HasKey(), KALDI_ASSERT, KALDI_ERR, KALDI_LOG, KALDI_WARN, ParseOptions::NumArgs(), MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), ParseOptions::PrintUsage(), ParseOptions::Read(), ParseOptions::Register(), WaveData::SampFreq(), kaldi::SplitStringToVector(), Input::Stream(), RandomAccessTableReader< Holder >::Value(), and TableWriter< Holder >::Write().

34  {
35  try {
36  using namespace kaldi;
37 
38  const char *usage =
39  "Extract segments from a large audio file in WAV format.\n"
40  "Usage: extract-segments [options] <wav-rspecifier> <segments-file> <wav-wspecifier>\n"
41  "e.g. extract-segments scp:wav.scp segments ark:- | <some-other-program>\n"
42  " segments-file format: each line is either\n"
43  "<segment-id> <recording-id> <start-time> <end-time>\n"
44  "e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5\n"
45  "or (less frequently, and not supported in scripts):\n"
46  "<segment-id> <wav-file-name> <start-time> <end-time> <channel>\n"
47  "where <channel> will normally be 0 (left) or 1 (right)\n"
48  "e.g. call-861225-A-0050-0065 call-861225 5.0 6.5 1\n"
49  "And <end-time> of -1 means the segment runs till the end of the WAV file\n"
50  "See also: extract-rows, which does the same thing but to feature files,\n"
51  " wav-copy, wav-to-duration\n";
52 
53  ParseOptions po(usage);
54  BaseFloat min_segment_length = 0.1, // Minimum segment length in seconds.
55  max_overshoot = 0.5; // max time by which last segment can overshoot
56  po.Register("min-segment-length", &min_segment_length,
57  "Minimum segment length in seconds (reject shorter segments)");
58  po.Register("max-overshoot", &max_overshoot,
59  "End segments overshooting audio by less than this (in seconds) "
60  "are truncated, else rejected.");
61 
62  po.Read(argc, argv);
63  if (po.NumArgs() != 3) {
64  po.PrintUsage();
65  exit(1);
66  }
67 
68  std::string wav_rspecifier = po.GetArg(1);
69  std::string segments_rxfilename = po.GetArg(2);
70  std::string wav_wspecifier = po.GetArg(3);
71 
72  RandomAccessTableReader<WaveHolder> reader(wav_rspecifier);
73  TableWriter<WaveHolder> writer(wav_wspecifier);
74  Input ki(segments_rxfilename); // no binary argument: never binary.
75 
76  int32 num_lines = 0, num_success = 0;
77 
78  std::string line;
79  /* read each line from segments file */
80  while (std::getline(ki.Stream(), line)) {
81  num_lines++;
82  std::vector<std::string> split_line;
83  // Split the line by space or tab and check the number of fields in each
84  // line. There must be 4 fields--segment name, recording wav file name,
85  // start time, end time; 5th field (channel info) is optional.
86  SplitStringToVector(line, " \t\r", true, &split_line);
87  if (split_line.size() != 4 && split_line.size() != 5) {
88  KALDI_WARN << "Invalid line in segments file: " << line;
89  continue;
90  }
91  std::string segment = split_line[0],
92  recording = split_line[1],
93  start_str = split_line[2],
94  end_str = split_line[3];
95 
96  // Convert the start time and end time to real from string. Segment is
97  // ignored if start or end time cannot be converted to real.
98  double start, end;
99  if (!ConvertStringToReal(start_str, &start)) {
100  KALDI_WARN << "Invalid line in segments file [bad start]: " << line;
101  continue;
102  }
103  if (!ConvertStringToReal(end_str, &end)) {
104  KALDI_WARN << "Invalid line in segments file [bad end]: " << line;
105  continue;
106  }
107  // start time must not be negative; start time must not be greater than
108  // end time, except if end time is -1
109  if (start < 0 || (end != -1.0 && end <= 0) || ((start >= end) && (end > 0))) {
110  KALDI_WARN << "Invalid line in segments file [empty or invalid segment]: "
111  << line;
112  continue;
113  }
114  int32 channel = -1; // means channel info is unspecified.
115  // if each line has 5 elements then 5th element must be channel identifier
116  if (split_line.size() == 5) {
117  if (!ConvertStringToInteger(split_line[4], &channel) || channel < 0) {
118  KALDI_WARN << "Invalid line in segments file [bad channel]: " << line;
119  continue;
120  }
121  }
122  /* check whether the recording named in this line exists in the wav table;
123  * if not, skip the segment.
124  */
125  if (!reader.HasKey(recording)) {
126  KALDI_WARN << "Could not find recording " << recording
127  << ", skipping segment " << segment;
128  continue;
129  }
130 
131  const WaveData &wave = reader.Value(recording);
132  const Matrix<BaseFloat> &wave_data = wave.Data();
133  BaseFloat samp_freq = wave.SampFreq(); // read sampling frequency
134  int32 num_samp = wave_data.NumCols(), // number of samples in recording
135  num_chan = wave_data.NumRows(); // number of channels in recording
136 
137  // Convert starting time of the segment to corresponding sample number.
138  // If end time is -1 then use the whole file starting from start time.
139  int32 start_samp = start * samp_freq,
140  end_samp = (end != -1)? (end * samp_freq) : num_samp;
141  KALDI_ASSERT(start_samp >= 0 && end_samp > 0 && "Invalid start or end.");
142 
143  // start sample must be less than total number of samples,
144  // otherwise skip the segment
145  if (start_samp < 0 || start_samp >= num_samp) {
146  KALDI_WARN << "Start sample out of range " << start_samp << " [length:] "
147  << num_samp << ", skipping segment " << segment;
148  continue;
149  }
150  /* end sample may exceed the total number of samples by less than
151  * max-overshoot (it is then truncated); otherwise skip the segment
152  */
153  if (end_samp > num_samp) {
154  if ((end_samp >=
155  num_samp + static_cast<int32>(max_overshoot * samp_freq))) {
156  KALDI_WARN << "End sample too far out of range " << end_samp
157  << " [length:] " << num_samp << ", skipping segment "
158  << segment;
159  continue;
160  }
161  end_samp = num_samp; // for small differences, just truncate.
162  }
163  // Skip if segment size is less than minimum segment length (default 0.1s)
164  if (end_samp <=
165  start_samp + static_cast<int32>(min_segment_length * samp_freq)) {
166  KALDI_WARN << "Segment " << segment << " too short, skipping it.";
167  continue;
168  }
169  /* if no channel was given, use channel 0 for single-channel recordings;
170  * for multi-channel recordings the channel must be specified in the
171  * segments file, otherwise this is a fatal error
172  */
173  if (channel == -1) {
174  if (num_chan == 1) channel = 0;
175  else {
176  KALDI_ERR << "If your data has multiple channels, you must specify the"
177  " channel in the segments file. Processing segment " << segment;
178  }
179  } else {
180  if (channel >= num_chan) {
181  KALDI_WARN << "Invalid channel " << channel << " >= " << num_chan
182  << ", processing segment " << segment;
183  continue;
184  }
185  }
186  /*
187  * Take the portion of the wav data for this segment from the original wav data matrix
188  */
189  SubMatrix<BaseFloat> segment_matrix(wave_data, channel, 1, start_samp, end_samp-start_samp);
190  WaveData segment_wave(samp_freq, segment_matrix);
191  writer.Write(segment, segment_wave); // write segment in wave format.
192  num_success++;
193  }
194  KALDI_LOG << "Successfully processed " << num_success << " lines out of "
195  << num_lines << " in the segments file. ";
196  /* prints number of segments processed */
197  return 0;
198  } catch(const std::exception &e) {
199  std::cerr << e.what();
200  return -1;
201  }
202 }
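
The time-to-sample conversion and range checks in the listing above can be isolated into a standalone sketch. `SampleRange` and `ResolveSampleRange` are hypothetical names introduced for illustration and are not part of Kaldi; the parameters `min_segment_length` and `max_overshoot` mirror the two options registered in `main`, and the comparisons are intended to follow the listing.

```cpp
#include <cstdint>

// Hypothetical result type: first sample and number of samples to extract.
struct SampleRange {
  int32_t start;
  int32_t length;
};

// Maps a [start, end] interval in seconds onto sample indices of a recording
// with num_samp samples. Returns false when the listing would skip the
// segment. An end time of -1 selects everything from start to the end.
bool ResolveSampleRange(double start, double end, float samp_freq,
                        int32_t num_samp, float min_segment_length,
                        float max_overshoot, SampleRange *out) {
  int32_t start_samp = static_cast<int32_t>(start * samp_freq);
  int32_t end_samp =
      (end != -1) ? static_cast<int32_t>(end * samp_freq) : num_samp;

  // The start must fall inside the recording.
  if (start_samp < 0 || start_samp >= num_samp) return false;

  // An end slightly past the recording is truncated; a large overshoot
  // (max_overshoot seconds or more) rejects the segment.
  if (end_samp > num_samp) {
    if (end_samp >=
        num_samp + static_cast<int32_t>(max_overshoot * samp_freq)) {
      return false;
    }
    end_samp = num_samp;
  }

  // Reject segments that are not longer than the minimum length.
  if (end_samp <=
      start_samp + static_cast<int32_t>(min_segment_length * samp_freq)) {
    return false;
  }

  out->start = start_samp;
  out->length = end_samp - start_samp;
  return true;
}
```

For example, with an 8000 Hz recording of 80000 samples and the default option values (0.1 and 0.5), the interval 5.0 to 6.5 seconds maps to 12000 samples starting at sample 40000, and the pair `(start, length)` corresponds to the column offset and column count passed to the `SubMatrix` constructor in the listing.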