"Classes for opening streams"

This group contains the Input and Output classes, which are provided to open streams for reading and writing in Kaldi code; for an explanation of how this fits into the bigger picture of Kaldi I/O, see How to open files in Kaldi.

Classes

class  Output
 
class  Input
 

Enumerations

enum  OutputType { kNoOutput, kFileOutput, kStandardOutput, kPipeOutput }
 
enum  InputType {
  kNoInput, kFileInput, kStandardInput, kOffsetFileInput,
  kPipeInput
}
 

Functions

OutputType ClassifyWxfilename (const std::string &wxfilename)
 ClassifyWxfilename classifies a filename for writing as a file, standard output, a pipe, or invalid.
 
InputType ClassifyRxfilename (const std::string &rxfilename)
 ClassifyRxfilename classifies a filename for reading as a file, standard input, a pipe, an offset into a file, or invalid.
 
template<class C >
void ReadKaldiObject (const std::string &filename, C *c)
 
template<>
void ReadKaldiObject (const std::string &filename, Matrix< float > *m)
 
template<>
void ReadKaldiObject (const std::string &filename, Matrix< double > *m)
 
template<class C >
void WriteKaldiObject (const C &c, const std::string &filename, bool binary)
 
std::string PrintableRxfilename (const std::string &rxfilename)
 PrintableRxfilename turns the rxfilename into a more human-readable form for error reporting: it quotes and escapes the name, and replaces "" or "-" with "standard input".
 
std::string PrintableWxfilename (const std::string &wxfilename)
 PrintableWxfilename turns the wxfilename into a more human-readable form for error reporting: it quotes and escapes the name, and replaces "" or "-" with "standard output".
 

Detailed Description

This group contains the Input and Output classes, which are provided to open streams for reading and writing in Kaldi code; for an explanation of how this fits into the bigger picture of Kaldi I/O, see How to open files in Kaldi.

Enumeration Type Documentation

◆ InputType

enum InputType
Enumerator
kNoInput 
kFileInput 
kStandardInput 
kOffsetFileInput 
kPipeInput 

Definition at line 105 of file kaldi-io.h.

◆ OutputType

enum OutputType
Enumerator
kNoOutput 
kFileOutput 
kStandardOutput 
kPipeOutput 

Definition at line 89 of file kaldi-io.h.

Function Documentation

◆ ClassifyRxfilename()

InputType ClassifyRxfilename ( const std::string &  rxfilename)

ClassifyRxfilename interprets filenames for reading as follows:

  • kNoInput: invalid filenames (leading or trailing space, things that look like wspecifiers or rspecifiers, or output pipes with a leading |).
  • kFileInput: normal filenames
  • kStandardInput: the empty string or "-"
  • kPipeInput: e.g. "gunzip -c /tmp/abc.gz |"
  • kOffsetFileInput: offsets into files, e.g. /some/filename:12970
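
To make the kOffsetFileInput form concrete, here is a minimal sketch of how a name such as /some/filename:12970 could be split into a path and an offset. SplitOffsetRxfilename is a hypothetical helper written for this illustration, not a Kaldi function, and it ignores overflow for very long digit strings.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Kaldi): splits "path:12345" into the path
// and the numeric offset. Returns false unless the name ends in ':' followed
// by one or more decimal digits. Overflow is not checked in this sketch.
bool SplitOffsetRxfilename(const std::string &rxfilename,
                           std::string *path, std::size_t *offset) {
  assert(path != nullptr && offset != nullptr);
  std::size_t colon = rxfilename.rfind(':');
  if (colon == std::string::npos || colon + 1 == rxfilename.size())
    return false;  // no colon, or nothing after the colon
  std::size_t value = 0;
  for (std::size_t i = colon + 1; i < rxfilename.size(); ++i) {
    unsigned char ch = static_cast<unsigned char>(rxfilename[i]);
    if (!std::isdigit(ch)) return false;  // e.g. "file:12a" is not an offset
    value = value * 10 + static_cast<std::size_t>(ch - '0');
  }
  *path = rxfilename.substr(0, colon);
  *offset = value;
  return true;
}
```

A caller would typically open the returned path and seek to the returned offset before reading.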

Definition at line 138 of file kaldi-io.cc.

References kaldi::ClassifyRspecifier(), kaldi::ClassifyWspecifier(), KALDI_WARN, kaldi::kFileInput, kaldi::kNoInput, kaldi::kNoRspecifier, kaldi::kNoWspecifier, kaldi::kOffsetFileInput, kaldi::kPipeInput, and kaldi::kStandardInput.

Referenced by Input::OpenInternal(), and kaldi::UnitTestClassifyRxfilename().

InputType ClassifyRxfilename(const std::string &filename) {
  const char *c = filename.c_str();
  size_t length = filename.length();
  char first_char = c[0],
      last_char = (length == 0 ? '\0' : c[filename.length()-1]);

  // if 'filename' is "" or "-", return kStandardInput.
  if (length == 0 || (length == 1 && first_char == '-')) {
    return kStandardInput;
  } else if (first_char == '|') {
    return kNoInput;  // An output pipe like "|blah": not
                      // valid for input.
  } else if (last_char == '|') {
    return kPipeInput;
  } else if (isspace(first_char) || isspace(last_char)) {
    return kNoInput;  // We don't allow leading or trailing space in a filename.
  } else if ((first_char == 'a' || first_char == 's') &&
             strchr(c, ':') != NULL &&
             (ClassifyWspecifier(filename, NULL, NULL, NULL) != kNoWspecifier ||
              ClassifyRspecifier(filename, NULL, NULL) != kNoRspecifier)) {
    // e.g. ark:something or scp:something... this is almost certainly a
    // scripting error, so call it an error rather than treating it as a file.
    // In practice in modern kaldi scripts all (r,w)filenames begin with "ark"
    // or "scp", even though technically speaking options like "b", "t", "s" or
    // "cs" can appear before the ark or scp, like "b,ark".  For efficiency,
    // and because this code is really just a nicety to catch errors earlier
    // than they would otherwise be caught, we only call those extra functions
    // for filenames beginning with 'a' or 's'.
    return kNoInput;
  } else if (isdigit(last_char)) {
    const char *d = c + length - 1;
    while (isdigit(*d) && d > c) d--;
    if (*d == ':') return kOffsetFileInput;  // Filename is like
                                             // some_file:12345
    // otherwise it could still be a filename; continue to the next check.
  }

  // At this point it matched no other pattern so we assume a filename, but
  // we check for '|' as it's a common source of errors to have pipe
  // commands without the pipe in the right place.  Say that it can't be
  // classified in this case.
  if (strchr(c, '|') != NULL) {
    KALDI_WARN << "Trying to classify rxfilename with pipe symbol in the"
        " wrong place (pipe without | at the end?): " << filename;
    return kNoInput;
  }
  return kFileInput;  // It matched no other pattern: assume it's a filename.
}

◆ ClassifyWxfilename()

OutputType ClassifyWxfilename ( const std::string &  wxfilename)

ClassifyWxfilename interprets filenames as follows:

  • kNoOutput: invalid filenames (leading or trailing space, things that look like wspecifiers or rspecifiers, input pipes with a trailing |, or offsets into files such as foo.ark:4314328).
  • kFileOutput: Normal filenames
  • kStandardOutput: The empty string or "-", interpreted as standard output
  • kPipeOutput: pipes, e.g. "| gzip -c > /tmp/abc.gz"
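
The rules above can be tried out with a simplified, standalone sketch. ClassifyWxfilenameSketch and the kSketch* enumerators are illustrative names, not Kaldi's API, and the sketch deliberately omits the check for names that look like wspecifiers or rspecifiers (such as ark:foo), which in Kaldi relies on the table code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative stand-ins for OutputType; these names are assumptions.
enum SketchOutputType {
  kSketchNoOutput, kSketchFileOutput, kSketchStandardOutput, kSketchPipeOutput
};

// Simplified classifier: same order of checks as described above, but without
// the wspecifier/rspecifier test.
SketchOutputType ClassifyWxfilenameSketch(const std::string &name) {
  if (name.empty() || name == "-") return kSketchStandardOutput;
  unsigned char first = static_cast<unsigned char>(name.front());
  unsigned char last = static_cast<unsigned char>(name.back());
  if (first == '|') return kSketchPipeOutput;  // e.g. "| gzip -c > out.gz"
  if (std::isspace(first) || std::isspace(last) || last == '|')
    return kSketchNoOutput;  // stray space, or an input pipe
  if (std::isdigit(last)) {
    // Walk back over the trailing digits; "name:123" is an offset, which
    // is valid only for reading.
    std::size_t i = name.size() - 1;
    while (i > 0 && std::isdigit(static_cast<unsigned char>(name[i]))) --i;
    if (name[i] == ':') return kSketchNoOutput;
  }
  if (name.find('|') != std::string::npos) return kSketchNoOutput;
  return kSketchFileOutput;
}

// The examples from the list above, written as checks.
void CheckWxfilenameExamples() {
  assert(ClassifyWxfilenameSketch("") == kSketchStandardOutput);
  assert(ClassifyWxfilenameSketch("-") == kSketchStandardOutput);
  assert(ClassifyWxfilenameSketch("| gzip -c > /tmp/abc.gz") == kSketchPipeOutput);
  assert(ClassifyWxfilenameSketch("/tmp/abc.gz") == kSketchFileOutput);
  assert(ClassifyWxfilenameSketch("gunzip -c /tmp/abc.gz |") == kSketchNoOutput);
  assert(ClassifyWxfilenameSketch("foo.ark:4314328") == kSketchNoOutput);
}
```

Note that the pipe test comes before the whitespace test, so a command string that begins with | is accepted even when it ends in a space.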

Definition at line 85 of file kaldi-io.cc.

References kaldi::ClassifyRspecifier(), kaldi::ClassifyWspecifier(), KALDI_WARN, kaldi::kFileOutput, kaldi::kNoOutput, kaldi::kNoRspecifier, kaldi::kNoWspecifier, kaldi::kPipeOutput, and kaldi::kStandardOutput.

Referenced by Output::Open(), TableWriterBothImpl< Holder >::Open(), kaldi::UnitTestClassifyWxfilename(), and Output::~Output().

OutputType ClassifyWxfilename(const std::string &filename) {
  const char *c = filename.c_str();
  size_t length = filename.length();
  char first_char = c[0],
      last_char = (length == 0 ? '\0' : c[filename.length()-1]);

  // if 'filename' is "" or "-", return kStandardOutput.
  if (length == 0 || (length == 1 && first_char == '-'))
    return kStandardOutput;
  else if (first_char == '|') return kPipeOutput;  // An output pipe like "|blah".
  else if (isspace(first_char) || isspace(last_char) || last_char == '|') {
    return kNoOutput;  // Leading or trailing space: can't interpret this.
                       // Final '|' would represent an input pipe, not an
                       // output pipe.
  } else if ((first_char == 'a' || first_char == 's') &&
             strchr(c, ':') != NULL &&
             (ClassifyWspecifier(filename, NULL, NULL, NULL) != kNoWspecifier ||
              ClassifyRspecifier(filename, NULL, NULL) != kNoRspecifier)) {
    // e.g. ark:something or scp:something... this is almost certainly a
    // scripting error, so call it an error rather than treating it as a file.
    // In practice in modern kaldi scripts all (r,w)filenames begin with "ark"
    // or "scp", even though technically speaking options like "b", "t", "s" or
    // "cs" can appear before the ark or scp, like "b,ark".  For efficiency,
    // and because this code is really just a nicety to catch errors earlier
    // than they would otherwise be caught, we only call those extra functions
    // for filenames beginning with 'a' or 's'.
    return kNoOutput;
  } else if (isdigit(last_char)) {
    // This could be a file, but we have to see if it's an offset into a file
    // (like foo.ark:4314328), which is not allowed for writing (but is
    // allowed for reading).  This eliminates some things which would be
    // valid UNIX filenames but are not allowed by Kaldi.  (Even if we allowed
    // such filenames for writing, we wouldn't be able to correctly read them).
    const char *d = c + length - 1;
    while (isdigit(*d) && d > c) d--;
    if (*d == ':') return kNoOutput;
    // else it could still be a filename; continue to the next check.
  }

  // At this point it matched no other pattern so we assume a filename, but we
  // check for internal '|' as it's a common source of errors to have pipe
  // commands without the pipe in the right place.  Say that it can't be
  // classified.
  if (strchr(c, '|') != NULL) {
    KALDI_WARN << "Trying to classify wxfilename with pipe symbol in the"
        " wrong place (pipe without | at the beginning?): " <<
        filename;
    return kNoOutput;
  }
  return kFileOutput;  // It matched no other pattern: assume it's a filename.
}

◆ PrintableRxfilename()

std::string PrintableRxfilename ( const std::string &  rxfilename)

PrintableRxfilename turns the rxfilename into a more human-readable form for error reporting: it quotes and escapes the name, and replaces "" or "-" with "standard input".
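
As a rough illustration of that behaviour, the sketch below substitutes "standard input" for "" or "-" and otherwise quotes the name for display. QuoteForShellSketch is a simplified stand-in written for this example; the exact quoting rules of ParseOptions::Escape may differ, and the set of "safe" characters here is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Simplified stand-in for ParseOptions::Escape (assumption: the real rules
// may differ). Names made only of alphanumerics and a few safe punctuation
// characters are returned unchanged; anything else is single-quoted.
std::string QuoteForShellSketch(const std::string &s) {
  const std::string safe = "_-+=:.,/";
  bool needs_quotes = s.empty();
  for (char ch : s) {
    if (!std::isalnum(static_cast<unsigned char>(ch)) &&
        safe.find(ch) == std::string::npos) {
      needs_quotes = true;
      break;
    }
  }
  if (!needs_quotes) return s;
  std::string out = "'";
  for (char ch : s) {
    if (ch == '\'') out += "'\\''";  // close quote, escaped quote, reopen
    else out += ch;
  }
  out += "'";
  return out;
}

// Hypothetical counterpart of PrintableRxfilename built on the stand-in.
std::string PrintableRxfilenameSketch(const std::string &rxfilename) {
  if (rxfilename.empty() || rxfilename == "-") return "standard input";
  return QuoteForShellSketch(rxfilename);
}

// A few checks showing the intended behaviour of the sketch.
void CheckPrintableExamples() {
  assert(PrintableRxfilenameSketch("") == "standard input");
  assert(PrintableRxfilenameSketch("-") == "standard input");
  assert(PrintableRxfilenameSketch("/tmp/abc.gz") == "/tmp/abc.gz");
  assert(PrintableRxfilenameSketch("gunzip -c /tmp/abc.gz |") ==
         "'gunzip -c /tmp/abc.gz |'");
}
```

Quoting matters mainly for pipe commands, where an unquoted name in an error message would be hard to tell apart from the surrounding text.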

Definition at line 61 of file kaldi-io.cc.

References ParseOptions::Escape().

Referenced by SequentialTableReaderArchiveImpl< Holder >::Close(), SequentialTableReaderScriptImpl< Holder >::EnsureObjectLoaded(), RandomAccessTableReaderSortedArchiveImpl< Holder >::FindKeyInternal(), kaldi::GetUtterancePairs(), RandomAccessTableReaderMapped< Holder >::HasKey(), RandomAccessTableReaderScriptImpl< Holder >::HasKeyInternal(), Input::Input(), main(), SequentialTableReaderArchiveImpl< Holder >::Next(), SequentialTableReaderScriptImpl< Holder >::Open(), PipeInputImpl::Open(), SequentialTableReaderArchiveImpl< Holder >::Open(), TableWriterScriptImpl< Holder >::Open(), RandomAccessTableReaderScriptImpl< Holder >::Open(), RandomAccessTableReaderArchiveImplBase< Holder >::Open(), Input::OpenInternal(), fst::ReadFstKaldi(), fst::ReadFstKaldiGeneric(), RandomAccessTableReaderArchiveImplBase< Holder >::ReadNextObject(), kaldi::ReadPhoneMap(), kaldi::ReadScriptFile(), kaldi::ReadSharedPhonesList(), kaldi::ReadSymbolList(), SequentialTableReaderScriptImpl< Holder >::Value(), RandomAccessTableReaderMapped< Holder >::Value(), RandomAccessTableReaderDSortedArchiveImpl< Holder >::Value(), RandomAccessTableReaderSortedArchiveImpl< Holder >::Value(), RandomAccessTableReaderUnsortedArchiveImpl< Holder >::Value(), TableWriterScriptImpl< Holder >::Write(), kaldi::WriteKaldiObject(), SequentialTableReaderArchiveImpl< Holder >::~SequentialTableReaderArchiveImpl(), and SequentialTableReaderScriptImpl< Holder >::~SequentialTableReaderScriptImpl().

std::string PrintableRxfilename(const std::string &rxfilename) {
  if (rxfilename == "" || rxfilename == "-") {
    return "standard input";
  } else {
    // If this call to Escape later causes compilation issues,
    // just replace it with "return rxfilename"; it's only a
    // pretty-printing issue.
    return ParseOptions::Escape(rxfilename);
  }
}

◆ PrintableWxfilename()

std::string PrintableWxfilename ( const std::string &  wxfilename)

PrintableWxfilename turns the wxfilename into a more human-readable form for error reporting: it quotes and escapes the name, and replaces "" or "-" with "standard output".

Definition at line 73 of file kaldi-io.cc.

References ParseOptions::Escape().

Referenced by main(), Output::Open(), Output::Output(), kaldi::TypeThreeUsage(), kaldi::TypeTwoUsage(), TableWriterArchiveImpl< Holder >::Write(), TableWriterScriptImpl< Holder >::Write(), TableWriterBothImpl< Holder >::Write(), fst::WriteFstKaldi(), kaldi::WriteKaldiObject(), kaldi::WriteScriptFile(), Output::~Output(), and PipeOutputImpl::~PipeOutputImpl().

std::string PrintableWxfilename(const std::string &wxfilename) {
  if (wxfilename == "" || wxfilename == "-") {
    return "standard output";
  } else {
    // If this call to Escape later causes compilation issues,
    // just replace it with "return wxfilename"; it's only a
    // pretty-printing issue.
    return ParseOptions::Escape(wxfilename);
  }
}

◆ ReadKaldiObject() [1/3]

template<class C> void kaldi::ReadKaldiObject (const std::string &filename, C *c)

Definition at line 239 of file kaldi-io.h.

References kaldi::ReadKaldiObject(), and Input::Stream().

template <class C> void ReadKaldiObject(const std::string &filename, C *c) {
  bool binary_in;
  Input ki(filename, &binary_in);
  c->Read(ki.Stream(), binary_in);
}
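
The template only requires that C provide a Read(std::istream &, bool) member. The standalone sketch below illustrates that contract without Kaldi: Counter, InitStreamSketch and ReadObjectSketch are hypothetical names, and the header check assumes the Kaldi convention that binary data begins with the two bytes "\0B".

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical toy type: any class with Read(std::istream &, bool) fits.
struct Counter {
  int value = 0;
  void Read(std::istream &is, bool binary) {
    if (binary) {
      char byte = 0;
      is.get(byte);  // one raw byte in this toy binary format
      value = static_cast<unsigned char>(byte);
    } else {
      is >> value;   // decimal text
    }
  }
};

// Consumes a leading "\0B" if present and reports the mode; returns false if
// the stream starts with '\0' but is not followed by 'B'.
bool InitStreamSketch(std::istream &is, bool *binary) {
  assert(binary != nullptr);
  if (is.peek() == '\0') {
    is.get();
    if (is.peek() != 'B') return false;
    is.get();
    *binary = true;
  } else {
    *binary = false;
  }
  return true;
}

// Stream-based analogue of the template above (a sketch, not Kaldi's code).
template <class C>
bool ReadObjectSketch(std::istream &is, C *c) {
  assert(c != nullptr);
  bool binary_in = false;
  if (!InitStreamSketch(is, &binary_in)) return false;
  c->Read(is, binary_in);
  return true;
}

// Reads the same value from a text stream and from a binary stream.
void CheckReadObjectSketch() {
  std::istringstream text("42\n");
  Counter a;
  assert(ReadObjectSketch(text, &a) && a.value == 42);
  std::istringstream bin(std::string("\0B\x2a", 3));
  Counter b;
  assert(ReadObjectSketch(bin, &b) && b.value == 42);
}
```

The caller never states whether the data is binary or text; the mode is detected from the stream and passed down to Read.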

◆ ReadKaldiObject() [2/3]

template<> void ReadKaldiObject (const std::string &filename, Matrix< float > *m)

Definition at line 832 of file kaldi-io.cc.

References kaldi::ExtractObjectRange(), kaldi::ExtractRangeSpecifier(), KALDI_ERR, Matrix< Real >::Read(), and Input::Stream().

Referenced by ComputeLogPosteriors(), ComputeScores(), ConvolutionComponent::Init(), OnlineIvectorExtractionInfo::Init(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), PerElementScaleComponent::Init(), Convolutional1dComponent::Init(), NaturalGradientAffineComponent::InitFromConfig(), LinearComponent::InitFromConfig(), FixedScaleComponent::InitFromConfig(), FixedBiasComponent::InitFromConfig(), PerElementOffsetComponent::InitFromConfig(), FixedScaleComponent::InitFromString(), FixedBiasComponent::InitFromString(), main(), OnlineFeaturePipeline::OnlineFeaturePipeline(), OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(), kaldi::ReadKaldiObject(), kaldi::ReadModels(), kaldi::RunPerSpeaker(), and kaldi::TypeThreeUsage().

template <> void ReadKaldiObject(const std::string &filename,
                                 Matrix<float> *m) {
  if (!filename.empty() && filename[filename.size() - 1] == ']') {
    // This filename seems to have a 'range'... like foo.ark:4312423[20:30].
    // (the bit in square brackets is the range).
    std::string rxfilename, range;
    if (!ExtractRangeSpecifier(filename, &rxfilename, &range)) {
      KALDI_ERR << "Could not make sense of possible range specifier in filename "
                << "while reading matrix: " << filename;
    }
    Matrix<float> temp;
    bool binary_in;
    Input ki(rxfilename, &binary_in);
    temp.Read(ki.Stream(), binary_in);
    if (!ExtractObjectRange(temp, range, m)) {
      KALDI_ERR << "Error extracting range of object: " << filename;
    }
  } else {
    // The normal case, there is no range.
    bool binary_in;
    Input ki(filename, &binary_in);
    m->Read(ki.Stream(), binary_in);
  }
}
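
The first step of the range case, separating foo.ark:4312423[20:30] into the rxfilename and the range text, might look roughly like the sketch below. SplitRangeSketch is a hypothetical helper for illustration; the precise acceptance rules of ExtractRangeSpecifier are not shown on this page and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: splits "name[range]" into "name" and "range".
// Returns false if there is no trailing ']', no single '[', or if either
// part would be empty.
bool SplitRangeSketch(const std::string &name,
                      std::string *rxfilename, std::string *range) {
  assert(rxfilename != nullptr && range != nullptr);
  if (name.empty() || name.back() != ']') return false;
  std::size_t open = name.find('[');
  if (open == std::string::npos || open == 0) return false;
  if (name.find('[', open + 1) != std::string::npos) return false;
  if (open + 2 >= name.size()) return false;  // "name[]": empty range
  *rxfilename = name.substr(0, open);
  *range = name.substr(open + 1, name.size() - open - 2);
  return true;
}
```

With the name split this way, the whole matrix is read from the rxfilename part and the range is applied afterwards, as in the listing above.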

◆ ReadKaldiObject() [3/3]

template<> void ReadKaldiObject (const std::string &filename, Matrix< double > *m)

Definition at line 857 of file kaldi-io.cc.

References kaldi::ExtractObjectRange(), kaldi::ExtractRangeSpecifier(), KALDI_ERR, Matrix< Real >::Read(), and Input::Stream().

template <> void ReadKaldiObject(const std::string &filename,
                                 Matrix<double> *m) {
  if (!filename.empty() && filename[filename.size() - 1] == ']') {
    // This filename seems to have a 'range'... like foo.ark:4312423[20:30].
    // (the bit in square brackets is the range).
    std::string rxfilename, range;
    if (!ExtractRangeSpecifier(filename, &rxfilename, &range)) {
      KALDI_ERR << "Could not make sense of possible range specifier in filename "
                << "while reading matrix: " << filename;
    }
    Matrix<double> temp;
    bool binary_in;
    Input ki(rxfilename, &binary_in);
    temp.Read(ki.Stream(), binary_in);
    if (!ExtractObjectRange(temp, range, m)) {
      KALDI_ERR << "Error extracting range of object: " << filename;
    }
  } else {
    // The normal case, there is no range.
    bool binary_in;
    Input ki(filename, &binary_in);
    m->Read(ki.Stream(), binary_in);
  }
}

◆ WriteKaldiObject()