"Classes for opening streams"

This group contains the Input and Output classes, which are provided to open streams for reading and writing in Kaldi code; for an explanation of how this fits into the bigger picture of Kaldi I/O, see How to open files in Kaldi.

Classes

class  Output
 
class  Input
 

Enumerations

enum  OutputType { kNoOutput, kFileOutput, kStandardOutput, kPipeOutput }
 
enum  InputType { kNoInput, kFileInput, kStandardInput, kOffsetFileInput, kPipeInput }
 

Functions

OutputType ClassifyWxfilename (const std::string &wxfilename)
 ClassifyWxfilename classifies a filename for writing as one of the OutputType values.
 
InputType ClassifyRxfilename (const std::string &rxfilename)
 ClassifyRxfilename classifies a filename for reading as one of the InputType values.
 
template<class C >
void ReadKaldiObject (const std::string &filename, C *c)
 
template<>
void ReadKaldiObject (const std::string &filename, Matrix< float > *m)
 
template<>
void ReadKaldiObject (const std::string &filename, Matrix< double > *m)
 
template<class C >
void WriteKaldiObject (const C &c, const std::string &filename, bool binary)
 
std::string PrintableRxfilename (const std::string &rxfilename)
 PrintableRxfilename turns the rxfilename into a more human-readable form for error reporting, i.e. it does quoting and escaping and replaces "" or "-" with "standard input".
 
std::string PrintableWxfilename (const std::string &wxfilename)
 PrintableWxfilename turns the wxfilename into a more human-readable form for error reporting, i.e. it does quoting and escaping and replaces "" or "-" with "standard output".
 

Enumeration Type Documentation

◆ InputType

enum InputType
Enumerator
kNoInput 
kFileInput 
kStandardInput 
kOffsetFileInput 
kPipeInput 

Definition at line 105 of file kaldi-io.h.

◆ OutputType

enum OutputType
Enumerator
kNoOutput 
kFileOutput 
kStandardOutput 
kPipeOutput 

Definition at line 89 of file kaldi-io.h.

Function Documentation

◆ ClassifyRxfilename()

InputType ClassifyRxfilename ( const std::string &  rxfilename)

ClassifyRxfilename interprets filenames for reading as follows:

  • kNoInput: invalid filenames (leading or trailing space; things that look like wspecifiers or rspecifiers, e.g. ark:foo.ark or scp:foo.scp; output pipes with a leading |; or a | anywhere other than at the end).
  • kFileInput: normal filenames
  • kStandardInput: the empty string or "-"
  • kPipeInput: e.g. "gunzip -c /tmp/abc.gz |"
  • kOffsetFileInput: offsets into files, e.g. /some/filename:12970
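The rules above can be tried out without building Kaldi using the standalone sketch below. It is a simplified restatement, not the library function: the name SketchClassifyRxfilename and the local copy of the enum are hypothetical, and the sketch deliberately skips the ark:/scp: specifier check and the KALDI_WARN message, so a name like "ark:foo.ark" comes out as kFileInput here even though the real function returns kNoInput.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Local copy of the enum so this sketch compiles without Kaldi headers.
enum InputType { kNoInput, kFileInput, kStandardInput, kOffsetFileInput, kPipeInput };

// Hypothetical, simplified mirror of the rxfilename rules. It omits the
// wspecifier/rspecifier check and does not log a warning.
InputType SketchClassifyRxfilename(const std::string &name) {
  const std::size_t n = name.size();
  if (n == 0 || name == "-") return kStandardInput;
  const unsigned char first = name[0], last = name[n - 1];
  if (first == '|') return kNoInput;    // output pipe: not readable
  if (last == '|') return kPipeInput;   // e.g. "gunzip -c /tmp/abc.gz |"
  if (std::isspace(first) || std::isspace(last)) return kNoInput;
  if (std::isdigit(last)) {
    // Walk back over the trailing digits; a ':' just before them marks an
    // offset into a file, e.g. "/some/filename:12970".
    std::size_t i = n - 1;
    while (i > 0 && std::isdigit(static_cast<unsigned char>(name[i]))) --i;
    if (name[i] == ':') return kOffsetFileInput;
  }
  // A '|' anywhere else is treated as a misplaced pipe.
  if (name.find('|') != std::string::npos) return kNoInput;
  return kFileInput;
}
```

Note that a purely numeric name such as "12345" is still a plain file, because the backward scan reaches the first character without finding a ':'.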

Definition at line 138 of file kaldi-io.cc.

References kaldi::ClassifyRspecifier(), kaldi::ClassifyWspecifier(), KALDI_WARN, kaldi::kFileInput, kaldi::kNoInput, kaldi::kNoRspecifier, kaldi::kNoWspecifier, kaldi::kOffsetFileInput, kaldi::kPipeInput, and kaldi::kStandardInput.

Referenced by Input::OpenInternal(), and kaldi::UnitTestClassifyRxfilename().

InputType ClassifyRxfilename(const std::string &filename) {
  const char *c = filename.c_str();
  size_t length = filename.length();
  char first_char = c[0],
      last_char = (length == 0 ? '\0' : c[filename.length()-1]);

  // if 'filename' is "" or "-", return kStandardInput.
  if (length == 0 || (length == 1 && first_char == '-')) {
    return kStandardInput;
  } else if (first_char == '|') {
    return kNoInput;  // An output pipe like "|blah": not
                      // valid for input.
  } else if (last_char == '|') {
    return kPipeInput;
  } else if (isspace(first_char) || isspace(last_char)) {
    return kNoInput;  // We don't allow leading or trailing space in a filename.
  } else if ((first_char == 'a' || first_char == 's') &&
             strchr(c, ':') != NULL &&
             (ClassifyWspecifier(filename, NULL, NULL, NULL) != kNoWspecifier ||
              ClassifyRspecifier(filename, NULL, NULL) != kNoRspecifier)) {
    // e.g. ark:something or scp:something... this is almost certainly a
    // scripting error, so call it an error rather than treating it as a file.
    // In practice in modern kaldi scripts all (r,w)filenames begin with "ark"
    // or "scp", even though technically speaking options like "b", "t", "s" or
    // "cs" can appear before the ark or scp, like "b,ark".  For efficiency,
    // and because this code is really just a nicety to catch errors earlier
    // than they would otherwise be caught, we only call those extra functions
    // for filenames beginning with 'a' or 's'.
    return kNoInput;
  } else if (isdigit(last_char)) {
    const char *d = c + length - 1;
    while (isdigit(*d) && d > c) d--;
    if (*d == ':') return kOffsetFileInput;  // Filename is like
                                             // some_file:12345
    // otherwise it could still be a filename; continue to the next check.
  }

  // At this point it matched no other pattern so we assume a filename, but
  // we check for '|' as it's a common source of errors to have pipe
  // commands without the pipe in the right place.  Say that it can't be
  // classified in this case.
  if (strchr(c, '|') != NULL) {
    KALDI_WARN << "Trying to classify rxfilename with pipe symbol in the"
        " wrong place (pipe without | at the end?): " << filename;
    return kNoInput;
  }
  return kFileInput;  // It matched no other pattern: assume it's a filename.
}

◆ ClassifyWxfilename()

OutputType ClassifyWxfilename ( const std::string &  wxfilename)

ClassifyWxfilename interprets filenames as follows:

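  • kNoOutput: invalid filenames (leading or trailing space; things that look like wspecifiers or rspecifiers; input pipes with a trailing |; offsets into files such as foo.ark:4314328, which are allowed only for reading; or a | anywhere other than at the start).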
  • kFileOutput: Normal filenames
  • kStandardOutput: The empty string or "-", interpreted as standard output
  • kPipeOutput: pipes, e.g. "| gzip -c > /tmp/abc.gz"
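For writing, the pipe rule is mirrored (a leading | is valid, a trailing | is not) and offsets become errors. The standalone sketch below restates these rules; SketchClassifyWxfilename and the local enum copy are hypothetical names, and, as a simplification, it skips the ark:/scp: specifier check and the warning, so "ark:foo.ark" is reported as kFileOutput here but as kNoOutput by the real function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Local copy of the enum so this sketch compiles without Kaldi headers.
enum OutputType { kNoOutput, kFileOutput, kStandardOutput, kPipeOutput };

// Hypothetical, simplified mirror of the wxfilename rules. It omits the
// wspecifier/rspecifier check and does not log a warning.
OutputType SketchClassifyWxfilename(const std::string &name) {
  const std::size_t n = name.size();
  if (n == 0 || name == "-") return kStandardOutput;
  const unsigned char first = name[0], last = name[n - 1];
  if (first == '|') return kPipeOutput;   // e.g. "| gzip -c > /tmp/abc.gz"
  if (std::isspace(first) || std::isspace(last) || last == '|')
    return kNoOutput;                     // a trailing '|' is an input pipe
  if (std::isdigit(last)) {
    // Offsets like "foo.ark:4314328" are readable but never writable.
    std::size_t i = n - 1;
    while (i > 0 && std::isdigit(static_cast<unsigned char>(name[i]))) --i;
    if (name[i] == ':') return kNoOutput;
  }
  // A '|' anywhere else is treated as a misplaced pipe.
  if (name.find('|') != std::string::npos) return kNoOutput;
  return kFileOutput;
}
```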

Definition at line 85 of file kaldi-io.cc.

References kaldi::ClassifyRspecifier(), kaldi::ClassifyWspecifier(), KALDI_WARN, kaldi::kFileOutput, kaldi::kNoOutput, kaldi::kNoRspecifier, kaldi::kNoWspecifier, kaldi::kPipeOutput, and kaldi::kStandardOutput.

Referenced by Output::Open(), TableWriterBothImpl< Holder >::Open(), kaldi::UnitTestClassifyWxfilename(), and Output::~Output().

OutputType ClassifyWxfilename(const std::string &filename) {
  const char *c = filename.c_str();
  size_t length = filename.length();
  char first_char = c[0],
      last_char = (length == 0 ? '\0' : c[filename.length()-1]);

  // if 'filename' is "" or "-", return kStandardOutput.
  if (length == 0 || (length == 1 && first_char == '-'))
    return kStandardOutput;
  else if (first_char == '|') return kPipeOutput;  // An output pipe like "|blah".
  else if (isspace(first_char) || isspace(last_char) || last_char == '|') {
    return kNoOutput;  // Leading or trailing space: can't interpret this.
                       // Final '|' would represent an input pipe, not an
                       // output pipe.
  } else if ((first_char == 'a' || first_char == 's') &&
             strchr(c, ':') != NULL &&
             (ClassifyWspecifier(filename, NULL, NULL, NULL) != kNoWspecifier ||
              ClassifyRspecifier(filename, NULL, NULL) != kNoRspecifier)) {
    // e.g. ark:something or scp:something... this is almost certainly a
    // scripting error, so call it an error rather than treating it as a file.
    // In practice in modern kaldi scripts all (r,w)filenames begin with "ark"
    // or "scp", even though technically speaking options like "b", "t", "s" or
    // "cs" can appear before the ark or scp, like "b,ark".  For efficiency,
    // and because this code is really just a nicety to catch errors earlier
    // than they would otherwise be caught, we only call those extra functions
    // for filenames beginning with 'a' or 's'.
    return kNoOutput;
  } else if (isdigit(last_char)) {
    // This could be a file, but we have to see if it's an offset into a file
    // (like foo.ark:4314328), which is not allowed for writing (but is
    // allowed for reading).  This eliminates some things which would be
    // valid UNIX filenames but are not allowed by Kaldi.  (Even if we allowed
    // such filenames for writing, we wouldn't be able to correctly read them).
    const char *d = c + length - 1;
    while (isdigit(*d) && d > c) d--;
    if (*d == ':') return kNoOutput;
    // else it could still be a filename; continue to the next check.
  }

  // At this point it matched no other pattern so we assume a filename, but we
  // check for internal '|' as it's a common source of errors to have pipe
  // commands without the pipe in the right place.  Say that it can't be
  // classified.
  if (strchr(c, '|') != NULL) {
    KALDI_WARN << "Trying to classify wxfilename with pipe symbol in the"
        " wrong place (pipe without | at the beginning?): " <<
        filename;
    return kNoOutput;
  }
  return kFileOutput;  // It matched no other pattern: assume it's a filename.
}

◆ PrintableRxfilename()

std::string PrintableRxfilename ( const std::string &  rxfilename)

PrintableRxfilename turns the rxfilename into a more human-readable form for error reporting, i.e. it does quoting and escaping and replaces "" or "-" with "standard input".
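The pattern is small enough to sketch without Kaldi. In the version below, HypotheticalQuote is an invented stand-in for ParseOptions::Escape(): it assumes a simple shell-style rule (single-quote the name if it contains whitespace or a shell metacharacter) and may well differ from what Escape() actually produces. Only the "" / "-" substitution is taken directly from the source above.

```cpp
#include <string>

// Hypothetical stand-in for ParseOptions::Escape(): single-quote the name if
// it contains whitespace or a shell metacharacter, escaping embedded single
// quotes as '\''. The real Escape() may use different rules.
std::string HypotheticalQuote(const std::string &s) {
  if (s.find_first_of(" \t\n|;&<>$'\"*?") == std::string::npos) return s;
  std::string out = "'";
  for (char ch : s) {
    if (ch == '\'') out += "'\\''";  // close quote, escaped quote, reopen
    else out += ch;
  }
  return out + "'";
}

// Sketch of the PrintableRxfilename pattern: "" and "-" mean standard input.
std::string SketchPrintableRxfilename(const std::string &rxfilename) {
  if (rxfilename.empty() || rxfilename == "-") return "standard input";
  return HypotheticalQuote(rxfilename);
}
```

Replacing "standard input" with "standard output" gives the wxfilename counterpart.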

Definition at line 61 of file kaldi-io.cc.

References ParseOptions::Escape().

Referenced by SequentialTableReaderArchiveImpl< Holder >::Close(), SequentialTableReaderScriptImpl< Holder >::EnsureObjectLoaded(), RandomAccessTableReaderSortedArchiveImpl< Holder >::FindKeyInternal(), kaldi::GetUtterancePairs(), RandomAccessTableReaderMapped< Holder >::HasKey(), RandomAccessTableReaderScriptImpl< Holder >::HasKeyInternal(), Input::Input(), main(), SequentialTableReaderArchiveImpl< Holder >::Next(), SequentialTableReaderScriptImpl< Holder >::Open(), PipeInputImpl::Open(), SequentialTableReaderArchiveImpl< Holder >::Open(), TableWriterScriptImpl< Holder >::Open(), RandomAccessTableReaderScriptImpl< Holder >::Open(), RandomAccessTableReaderArchiveImplBase< Holder >::Open(), Input::OpenInternal(), fst::ReadFstKaldi(), fst::ReadFstKaldiGeneric(), RandomAccessTableReaderArchiveImplBase< Holder >::ReadNextObject(), kaldi::ReadPhoneMap(), kaldi::ReadScriptFile(), kaldi::ReadSharedPhonesList(), kaldi::ReadSymbolList(), SequentialTableReaderScriptImpl< Holder >::Value(), RandomAccessTableReaderMapped< Holder >::Value(), RandomAccessTableReaderDSortedArchiveImpl< Holder >::Value(), RandomAccessTableReaderSortedArchiveImpl< Holder >::Value(), RandomAccessTableReaderUnsortedArchiveImpl< Holder >::Value(), TableWriterScriptImpl< Holder >::Write(), kaldi::WriteKaldiObject(), SequentialTableReaderArchiveImpl< Holder >::~SequentialTableReaderArchiveImpl(), and SequentialTableReaderScriptImpl< Holder >::~SequentialTableReaderScriptImpl().

std::string PrintableRxfilename(const std::string &rxfilename) {
  if (rxfilename == "" || rxfilename == "-") {
    return "standard input";
  } else {
    // If this call to Escape later causes compilation issues,
    // just replace it with "return rxfilename"; it's only a
    // pretty-printing issue.
    return ParseOptions::Escape(rxfilename);
  }
}

◆ PrintableWxfilename()

std::string PrintableWxfilename ( const std::string &  wxfilename)

PrintableWxfilename turns the wxfilename into a more human-readable form for error reporting, i.e. it does quoting and escaping and replaces "" or "-" with "standard output".

Definition at line 73 of file kaldi-io.cc.

References ParseOptions::Escape().

Referenced by main(), Output::Open(), Output::Output(), kaldi::TypeThreeUsage(), kaldi::TypeTwoUsage(), TableWriterArchiveImpl< Holder >::Write(), TableWriterScriptImpl< Holder >::Write(), TableWriterBothImpl< Holder >::Write(), fst::WriteFstKaldi(), kaldi::WriteKaldiObject(), kaldi::WriteScriptFile(), Output::~Output(), and PipeOutputImpl::~PipeOutputImpl().

std::string PrintableWxfilename(const std::string &wxfilename) {
  if (wxfilename == "" || wxfilename == "-") {
    return "standard output";
  } else {
    // If this call to Escape later causes compilation issues,
    // just replace it with "return wxfilename"; it's only a
    // pretty-printing issue.
    return ParseOptions::Escape(wxfilename);
  }
}

◆ ReadKaldiObject() [1/3]

template<class C >
void ReadKaldiObject (const std::string &filename, C *c)

Definition at line 239 of file kaldi-io.h.

References kaldi::ReadKaldiObject(), and Input::Stream().

template<class C>
void ReadKaldiObject(const std::string &filename, C *c) {
  bool binary_in;
  Input ki(filename, &binary_in);
  c->Read(ki.Stream(), binary_in);
}
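The generic ReadKaldiObject only requires that C have a member Read(std::istream &, bool binary); Input opens the stream and reports whether its contents are binary. The sketch below reproduces that contract without Kaldi. Everything in it is hypothetical: a std::istringstream stands in for Input, ToyCount is an invented object type, and the binary check assumes Kaldi's convention that binary streams begin with the two bytes '\0' 'B'.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Assumed binary marker: a leading '\0' followed by 'B'. Consumes the marker
// and returns true if present; otherwise leaves the stream untouched.
bool ConsumeBinaryHeader(std::istream &is) {
  if (is.peek() == '\0') {
    is.get();
    if (is.peek() != 'B') throw std::runtime_error("bad binary header");
    is.get();
    return true;
  }
  return false;
}

// Hypothetical object with the Read(std::istream&, bool) member that
// ReadKaldiObject expects.
struct ToyCount {
  int value = 0;
  void Read(std::istream &is, bool binary) {
    if (binary)
      is.read(reinterpret_cast<char *>(&value), sizeof(value));
    else
      is >> value;
    if (!is) throw std::runtime_error("read failed");
  }
};

// Same shape as ReadKaldiObject, but reading from an in-memory string.
template <class C>
void ReadObjectFromString(const std::string &data, C *c) {
  std::istringstream is(data);
  bool binary_in = ConsumeBinaryHeader(is);
  c->Read(is, binary_in);
}
```

The design choice this illustrates is duck typing: any class with a suitable Read member works with the template, with no common base class required.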

◆ ReadKaldiObject() [2/3]

template<>
void ReadKaldiObject (const std::string &filename, Matrix< float > *m)

Definition at line 832 of file kaldi-io.cc.

References kaldi::ExtractObjectRange(), kaldi::ExtractRangeSpecifier(), KALDI_ERR, Matrix< Real >::Read(), and Input::Stream().

Referenced by ComputeLogPosteriors(), ComputeScores(), ConvolutionComponent::Init(), OnlineIvectorExtractionInfo::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), PerElementScaleComponent::Init(), Convolutional1dComponent::Init(), NaturalGradientAffineComponent::InitFromConfig(), LinearComponent::InitFromConfig(), FixedScaleComponent::InitFromConfig(), FixedBiasComponent::InitFromConfig(), PerElementOffsetComponent::InitFromConfig(), FixedScaleComponent::InitFromString(), FixedBiasComponent::InitFromString(), main(), OnlineFeaturePipeline::OnlineFeaturePipeline(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), kaldi::ReadKaldiObject(), kaldi::ReadModels(), kaldi::RunPerSpeaker(), and kaldi::TypeThreeUsage().

template<>
void ReadKaldiObject(const std::string &filename, Matrix<float> *m) {
  if (!filename.empty() && filename[filename.size() - 1] == ']') {
    // This filename seems to have a 'range'... like foo.ark:4312423[20:30].
    // (the bit in square brackets is the range).
    std::string rxfilename, range;
    if (!ExtractRangeSpecifier(filename, &rxfilename, &range)) {
      KALDI_ERR << "Could not make sense of possible range specifier in filename "
                << "while reading matrix: " << filename;
    }
    Matrix<float> temp;
    bool binary_in;
    Input ki(rxfilename, &binary_in);
    temp.Read(ki.Stream(), binary_in);
    if (!ExtractObjectRange(temp, range, m)) {
      KALDI_ERR << "Error extracting range of object: " << filename;
    }
  } else {
    // The normal case, there is no range.
    bool binary_in;
    Input ki(filename, &binary_in);
    m->Read(ki.Stream(), binary_in);
  }
}
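The range branch above first splits a name such as foo.ark:4312423[20:30] into the underlying rxfilename and the bracketed range. The helper below sketches only that split; SplitRangeSpecifier is a hypothetical name, not Kaldi's ExtractRangeSpecifier(), which may validate the range text more strictly. It does not interpret the range itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: split "foo.ark:4312423[20:30]" into
// "foo.ark:4312423" and "20:30". Returns false if the name does not end in
// a non-empty bracketed range preceded by a non-empty filename.
bool SplitRangeSpecifier(const std::string &with_range,
                         std::string *data_rxfilename, std::string *range) {
  if (with_range.empty() || with_range.back() != ']') return false;
  std::size_t open = with_range.rfind('[');
  if (open == std::string::npos || open == 0) return false;
  *data_rxfilename = with_range.substr(0, open);
  *range = with_range.substr(open + 1, with_range.size() - open - 2);
  return !range->empty();
}
```

A range with both row and column parts, such as foo.ark:12[0:9,2:5], splits the same way; the text between the brackets is passed through unchanged.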

◆ ReadKaldiObject() [3/3]

template<>
void ReadKaldiObject (const std::string &filename, Matrix< double > *m)

Definition at line 857 of file kaldi-io.cc.

References kaldi::ExtractObjectRange(), kaldi::ExtractRangeSpecifier(), KALDI_ERR, Matrix< Real >::Read(), and Input::Stream().

template<>
void ReadKaldiObject(const std::string &filename, Matrix<double> *m) {
  if (!filename.empty() && filename[filename.size() - 1] == ']') {
    // This filename seems to have a 'range'... like foo.ark:4312423[20:30].
    // (the bit in square brackets is the range).
    std::string rxfilename, range;
    if (!ExtractRangeSpecifier(filename, &rxfilename, &range)) {
      KALDI_ERR << "Could not make sense of possible range specifier in filename "
                << "while reading matrix: " << filename;
    }
    Matrix<double> temp;
    bool binary_in;
    Input ki(rxfilename, &binary_in);
    temp.Read(ki.Stream(), binary_in);
    if (!ExtractObjectRange(temp, range, m)) {
      KALDI_ERR << "Error extracting range of object: " << filename;
    }
  } else {
    // The normal case, there is no range.
    bool binary_in;
    Input ki(filename, &binary_in);
    m->Read(ki.Stream(), binary_in);
  }
}

◆ WriteKaldiObject()