extract-segments.cc File Reference

Functions

int main (int argc, char *argv[])
 This is the main program for extracting segments from a wav file.
 

Function Documentation

◆ main()

int main(int argc, char *argv[])

This is the main program for extracting segments from a wav file.

  • usage:
    • extract-segments [options ..] <scriptfile> <segments-file> <wav-written-specifier>
    • "scriptfile" (the <wav-rspecifier> of the usage message) must contain the full path of the wav file.
    • "segments-file" should describe the segments that need to be extracted from the wav file.
    • The format of the segments file: segment-id wavfilename start_time(in secs) end_time(in secs) channel-id(0 or 1). The first field is used as the key of the written segment.
    • The channel-id is 0 for the left channel and 1 for the right channel. It is optional and not required for mono recordings.
    • "wav-written-specifier" (the <wav-wspecifier> of the usage message) specifies where and in what format the extracted segments are written.
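
The segments-file format described above can be sketched as a small standalone parser. This is only an illustration in standard C++, not the Kaldi implementation: the `SegmentLine` struct and the `ParseSegmentLine` function are hypothetical names, and `std::stod`/`std::stoi` stand in for `ConvertStringToReal()` and `ConvertStringToInteger()`, which may accept or reject slightly different inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for one line of a segments file (not part of Kaldi).
struct SegmentLine {
  std::string segment_id;
  std::string recording_id;
  double start = 0.0;  // seconds
  double end = 0.0;    // seconds; -1 means "until the end of the recording"
  int channel = -1;    // -1 means "channel not specified"
};

// Parses "<segment-id> <recording-id> <start> <end> [<channel>]".
// Returns false for lines that the listing would reject with a warning.
bool ParseSegmentLine(const std::string &line, SegmentLine *out) {
  assert(out != nullptr);
  // Split on whitespace, dropping empty fields.
  std::istringstream iss(line);
  std::vector<std::string> fields;
  for (std::string f; iss >> f;) fields.push_back(f);
  if (fields.size() != 4 && fields.size() != 5) return false;

  SegmentLine seg;
  seg.segment_id = fields[0];
  seg.recording_id = fields[1];
  try {
    std::size_t pos = 0;
    seg.start = std::stod(fields[2], &pos);
    if (pos != fields[2].size()) return false;  // trailing junk
    seg.end = std::stod(fields[3], &pos);
    if (pos != fields[3].size()) return false;
    if (fields.size() == 5) {
      seg.channel = std::stoi(fields[4], &pos);
      if (pos != fields[4].size() || seg.channel < 0) return false;
    }
  } catch (const std::exception &) {
    return false;  // not a number, or out of range
  }
  // Same validity rule as the listing: start >= 0, and end is either -1
  // or strictly greater than start.
  if (seg.start < 0 || (seg.end != -1.0 && seg.end <= 0) ||
      (seg.start >= seg.end && seg.end > 0)) {
    return false;
  }
  *out = seg;
  return true;
}
```

Returning `false` rather than throwing mirrors the listing's behaviour of warning about a bad line and continuing with the next one.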

Definition at line 34 of file extract-segments.cc.

References kaldi::ConvertStringToInteger(), kaldi::ConvertStringToReal(), WaveData::Data(), ParseOptions::GetArg(), RandomAccessTableReader< Holder >::HasKey(), KALDI_ASSERT, KALDI_ERR, KALDI_LOG, KALDI_WARN, ParseOptions::NumArgs(), MatrixBase< Real >::NumCols(), MatrixBase< Real >::NumRows(), ParseOptions::PrintUsage(), ParseOptions::Read(), ParseOptions::Register(), WaveData::SampFreq(), kaldi::SplitStringToVector(), Input::Stream(), RandomAccessTableReader< Holder >::Value(), and TableWriter< Holder >::Write().

34  {
35  try {
36  using namespace kaldi;
37 
38  const char *usage =
39  "Extract segments from a large audio file in WAV format.\n"
40  "Usage: extract-segments [options] <wav-rspecifier> <segments-file> <wav-wspecifier>\n"
41  "e.g. extract-segments scp:wav.scp segments ark:- | <some-other-program>\n"
42  " segments-file format: each line is either\n"
43  "<segment-id> <recording-id> <start-time> <end-time>\n"
44  "e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5\n"
45  "or (less frequently, and not supported in scripts):\n"
46  "<segment-id> <wav-file-name> <start-time> <end-time> <channel>\n"
47  "where <channel> will normally be 0 (left) or 1 (right)\n"
48  "e.g. call-861225-A-0050-0065 call-861225 5.0 6.5 1\n"
49  "And <end-time> of -1 means the segment runs till the end of the WAV file\n"
50  "See also: extract-feature-segments, wav-copy, wav-to-duration\n";
51 
52  ParseOptions po(usage);
53  BaseFloat min_segment_length = 0.1, // Minimum segment length in seconds.
54  max_overshoot = 0.5; // max time by which last segment can overshoot
55  po.Register("min-segment-length", &min_segment_length,
56  "Minimum segment length in seconds (reject shorter segments)");
57  po.Register("max-overshoot", &max_overshoot,
58  "End segments overshooting audio by less than this (in seconds) "
59  "are truncated, else rejected.");
60 
61  po.Read(argc, argv);
62  if (po.NumArgs() != 3) {
63  po.PrintUsage();
64  exit(1);
65  }
66 
67  std::string wav_rspecifier = po.GetArg(1);
68  std::string segments_rxfilename = po.GetArg(2);
69  std::string wav_wspecifier = po.GetArg(3);
70 
71  RandomAccessTableReader<WaveHolder> reader(wav_rspecifier);
72  TableWriter<WaveHolder> writer(wav_wspecifier);
73  Input ki(segments_rxfilename); // no binary argument: never binary.
74 
75  int32 num_lines = 0, num_success = 0;
76 
77  std::string line;
78  /* read each line from segments file */
79  while (std::getline(ki.Stream(), line)) {
80  num_lines++;
81  std::vector<std::string> split_line;
82  // Split the line by space or tab and check the number of fields in each
83  // line. There must be 4 fields: segment name, recording wav file name,
84  // start time, end time; the 5th field (channel info) is optional.
85  SplitStringToVector(line, " \t\r", true, &split_line);
86  if (split_line.size() != 4 && split_line.size() != 5) {
87  KALDI_WARN << "Invalid line in segments file: " << line;
88  continue;
89  }
90  std::string segment = split_line[0],
91  recording = split_line[1],
92  start_str = split_line[2],
93  end_str = split_line[3];
94 
95  // Convert the start time and end time from string to real. The segment
96  // is ignored if the start or end time cannot be converted to real.
97  double start, end;
98  if (!ConvertStringToReal(start_str, &start)) {
99  KALDI_WARN << "Invalid line in segments file [bad start]: " << line;
100  continue;
101  }
102  if (!ConvertStringToReal(end_str, &end)) {
103  KALDI_WARN << "Invalid line in segments file [bad end]: " << line;
104  continue;
105  }
106  // start time must not be negative; start time must not be greater than
107  // end time, except if end time is -1
108  if (start < 0 || (end != -1.0 && end <= 0) || ((start >= end) && (end > 0))) {
109  KALDI_WARN << "Invalid line in segments file [empty or invalid segment]: "
110  << line;
111  continue;
112  }
113  int32 channel = -1; // means channel info is unspecified.
114  // if each line has 5 elements then 5th element must be channel identifier
115  if (split_line.size() == 5) {
116  if (!ConvertStringToInteger(split_line[4], &channel) || channel < 0) {
117  KALDI_WARN << "Invalid line in segments file [bad channel]: " << line;
118  continue;
119  }
120  }
121  /* check whether the recording named on this line exists in the wav table;
122  * if it does not, skip the segment.
123  */
124  if (!reader.HasKey(recording)) {
125  KALDI_WARN << "Could not find recording " << recording
126  << ", skipping segment " << segment;
127  continue;
128  }
129 
130  const WaveData &wave = reader.Value(recording);
131  const Matrix<BaseFloat> &wave_data = wave.Data();
132  BaseFloat samp_freq = wave.SampFreq(); // read sampling frequency
133  int32 num_samp = wave_data.NumCols(), // number of samples in recording
134  num_chan = wave_data.NumRows(); // number of channels in recording
135 
136  // Convert starting time of the segment to corresponding sample number.
137  // If end time is -1 then use the whole file starting from start time.
138  int32 start_samp = start * samp_freq,
139  end_samp = (end != -1)? (end * samp_freq) : num_samp;
140  KALDI_ASSERT(start_samp >= 0 && end_samp > 0 && "Invalid start or end.");
141 
142  // start sample must be less than total number of samples,
143  // otherwise skip the segment
144  if (start_samp < 0 || start_samp >= num_samp) {
145  KALDI_WARN << "Start sample out of range " << start_samp << " [length:] "
146  << num_samp << ", skipping segment " << segment;
147  continue;
148  }
149  /* end sample must not exceed the total number of samples by max-overshoot
150  * or more; otherwise skip the segment (smaller overshoots are truncated)
151  */
152  if (end_samp > num_samp) {
153  if ((end_samp >=
154  num_samp + static_cast<int32>(max_overshoot * samp_freq))) {
155  KALDI_WARN << "End sample too far out of range " << end_samp
156  << " [length:] " << num_samp << ", skipping segment "
157  << segment;
158  continue;
159  }
160  end_samp = num_samp; // for small differences, just truncate.
161  }
162  // Skip if segment size is less than minimum segment length (default 0.1s)
163  if (end_samp <=
164  start_samp + static_cast<int32>(min_segment_length * samp_freq)) {
165  KALDI_WARN << "Segment " << segment << " too short, skipping it.";
166  continue;
167  }
168  /* check whether the wav file has more than one channel;
169  * if yes, the channel must be specified in the segments file,
170  * otherwise KALDI_ERR aborts processing
171  */
172  if (channel == -1) {
173  if (num_chan == 1) channel = 0;
174  else {
175  KALDI_ERR << "If your data has multiple channels, you must specify the"
176  " channel in the segments file. Processing segment " << segment;
177  }
178  } else {
179  if (channel >= num_chan) {
180  KALDI_WARN << "Invalid channel " << channel << " >= " << num_chan
181  << ", processing segment " << segment;
182  continue;
183  }
184  }
185  /*
186  * Take the requested portion of the wav data from the original wav data matrix
187  */
188  SubMatrix<BaseFloat> segment_matrix(wave_data, channel, 1, start_samp, end_samp-start_samp);
189  WaveData segment_wave(samp_freq, segment_matrix);
190  writer.Write(segment, segment_wave); // write segment in wave format.
191  num_success++;
192  }
193  KALDI_LOG << "Successfully processed " << num_success << " lines out of "
194  << num_lines << " in the segments file. ";
195  /* prints number of segments processed */
196  return 0;
197  } catch(const std::exception &e) {
198  std::cerr << e.what();
199  return -1;
200  }
201 }
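
The sample-range checks in the listing (start in range, bounded overshoot at the end, minimum length) can be isolated into one function for experimentation. The following is a minimal sketch in standard C++ rather than Kaldi code: `SegmentSampleRange` and its parameter names are hypothetical, `float` is assumed for `BaseFloat`, and `int32_t` for `int32`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Kaldi). Converts a segment given in
// seconds into the half-open sample range [*start_samp, *end_samp).
// Returns false where the listing would skip the segment.
bool SegmentSampleRange(double start, double end, float samp_freq,
                        int32_t num_samp, float min_segment_length,
                        float max_overshoot, int32_t *start_samp,
                        int32_t *end_samp) {
  assert(samp_freq > 0 && start_samp != nullptr && end_samp != nullptr);
  // An end time of -1 means "run to the end of the recording".
  int32_t s = static_cast<int32_t>(start * samp_freq);
  int32_t e = (end != -1) ? static_cast<int32_t>(end * samp_freq) : num_samp;

  // The start sample must lie inside the recording.
  if (s < 0 || s >= num_samp) return false;

  // An end that overshoots by less than max_overshoot seconds is truncated;
  // a larger overshoot rejects the segment.
  if (e > num_samp) {
    if (e >= num_samp + static_cast<int32_t>(max_overshoot * samp_freq))
      return false;
    e = num_samp;
  }

  // Reject segments that are not longer than min_segment_length seconds.
  if (e <= s + static_cast<int32_t>(min_segment_length * samp_freq))
    return false;

  *start_samp = s;
  *end_samp = e;
  return true;
}
```

With an 8 kHz recording of 80000 samples and the default options (0.1 s minimum length, 0.5 s overshoot), a segment from 9.0 s to 10.3 s would be truncated to end at sample 80000, while one ending at 11.0 s would be rejected.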