"Classes for opening streams"

This group contains the Input and Output classes, which are provided to open streams for reading and writing in Kaldi code; for an explanation of how this fits into the bigger picture of Kaldi I/O, see How to open files in Kaldi.

Classes

class  Output
 
class  Input
 

Enumerations

enum  OutputType { kNoOutput, kFileOutput, kStandardOutput, kPipeOutput }
 
enum  InputType {
  kNoInput, kFileInput, kStandardInput, kOffsetFileInput,
  kPipeInput
}
 

Functions

OutputType ClassifyWxfilename (const std::string &wxfilename)
 ClassifyWxfilename classifies a filename for writing as one of the OutputType values.
 
InputType ClassifyRxfilename (const std::string &rxfilename)
 ClassifyRxfilename classifies a filename for reading as one of the InputType values.
 
template<class C >
void ReadKaldiObject (const std::string &filename, C *c)
 
template<>
void ReadKaldiObject (const std::string &filename, Matrix< float > *m)
 
template<>
void ReadKaldiObject (const std::string &filename, Matrix< double > *m)
 
template<class C >
void WriteKaldiObject (const C &c, const std::string &filename, bool binary)
 
std::string PrintableRxfilename (std::string rxfilename)
 PrintableRxfilename turns the rxfilename into a more human-readable form for error reporting.
 
std::string PrintableWxfilename (std::string wxfilename)
 PrintableWxfilename turns the wxfilename into a more human-readable form for error reporting.
 

Detailed Description

This group contains the Input and Output classes, which are provided to open streams for reading and writing in Kaldi code; for an explanation of how this fits into the bigger picture of Kaldi I/O, see How to open files in Kaldi.

Enumeration Type Documentation

enum InputType
Enumerator
kNoInput 
kFileInput 
kStandardInput 
kOffsetFileInput 
kPipeInput 

Definition at line 105 of file kaldi-io.h.

enum OutputType
Enumerator
kNoOutput 
kFileOutput 
kStandardOutput 
kPipeOutput 

Definition at line 89 of file kaldi-io.h.

Function Documentation

InputType ClassifyRxfilename ( const std::string &  rxfilename)

ClassifyRxfilename interprets filenames for reading as follows:

  • kNoInput: invalid filenames (leading or trailing space, things that look like wspecifiers or rspecifiers, or pipes to write to, with a leading |).
  • kFileInput: normal filenames
  • kStandardInput: the empty string or "-"
  • kPipeInput: a command ending in |, e.g. "gunzip -c some_file.gz |"
  • kOffsetFileInput: offsets into files, e.g. /some/filename:12970
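
To illustrate the kOffsetFileInput form, here is a minimal sketch of how a name such as /some/filename:12970 could be split into a path and a byte offset. SplitOffsetRxfilename is a hypothetical helper written for this example and is not part of Kaldi; Kaldi performs the equivalent step internally when it opens such a name.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not a Kaldi function): splits "path:NNN" into the
// path and the numeric offset. Returns false if the name does not end in
// a colon followed by at least one digit.
bool SplitOffsetRxfilename(const std::string &rxfilename,
                           std::string *path,
                           unsigned long long *offset) {
  assert(path != nullptr && offset != nullptr);
  std::string::size_type pos = rxfilename.rfind(':');
  if (pos == std::string::npos || pos + 1 == rxfilename.size()) return false;
  for (std::string::size_type i = pos + 1; i < rxfilename.size(); ++i) {
    // Cast to unsigned char: passing a negative char to isdigit is undefined.
    if (!std::isdigit(static_cast<unsigned char>(rxfilename[i]))) return false;
  }
  *path = rxfilename.substr(0, pos);
  // Overflow of very long digit strings is not handled in this sketch.
  *offset = std::strtoull(rxfilename.c_str() + pos + 1, nullptr, 10);
  return true;
}
```

Called on "/some/filename:12970", the helper yields the path "/some/filename" and the offset 12970; a plain name such as "plain.txt" is rejected.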

Definition at line 135 of file kaldi-io.cc.

References kaldi::ClassifyRspecifier(), kaldi::ClassifyWspecifier(), rnnlm::d, KALDI_WARN, kaldi::kFileInput, kaldi::kNoInput, kaldi::kNoRspecifier, kaldi::kNoWspecifier, kaldi::kOffsetFileInput, kaldi::kPipeInput, and kaldi::kStandardInput.

Referenced by Input::OpenInternal(), and kaldi::UnitTestClassifyRxfilename().

{
  const char *c = filename.c_str();
  size_t length = filename.length();
  char first_char = c[0],
      last_char = (length == 0 ? '\0' : c[filename.length()-1]);

  // if 'filename' is "" or "-", return kStandardInput.
  if (length == 0 || (length == 1 && first_char == '-')) {
    return kStandardInput;
  } else if (first_char == '|') {
    return kNoInput;  // An output pipe like "|blah": not
                      // valid for input.
  } else if (last_char == '|') {
    return kPipeInput;
  } else if (isspace(first_char) || isspace(last_char)) {
    return kNoInput;  // We don't allow leading or trailing space in a filename.
  } else if ((first_char == 'a' || first_char == 's') &&
             strchr(c, ':') != NULL &&
             (ClassifyWspecifier(filename, NULL, NULL, NULL) != kNoWspecifier ||
              ClassifyRspecifier(filename, NULL, NULL) != kNoRspecifier)) {
    // e.g. ark:something or scp:something... this is almost certainly a
    // scripting error, so call it an error rather than treating it as a file.
    // In practice in modern kaldi scripts all (r,w)filenames begin with "ark"
    // or "scp", even though technically speaking options like "b", "t", "s" or
    // "cs" can appear before the ark or scp, like "b,ark". For efficiency,
    // and because this code is really just a nicety to catch errors earlier
    // than they would otherwise be caught, we only call those extra functions
    // for filenames beginning with 'a' or 's'.
    return kNoInput;
  } else if (isdigit(last_char)) {
    const char *d = c + length - 1;
    while (isdigit(*d) && d > c) d--;
    if (*d == ':') return kOffsetFileInput;  // Filename is like
                                             // some_file:12345
    // otherwise it could still be a filename; continue to the next check.
  }

  // At this point it matched no other pattern so we assume a filename, but
  // we check for '|' as it's a common source of errors to have pipe
  // commands without the pipe in the right place. Say that it can't be
  // classified in this case.
  if (strchr(c, '|') != NULL) {
    KALDI_WARN << "Trying to classify rxfilename with pipe symbol in the"
        " wrong place (pipe without | at the end?): " << filename;
    return kNoInput;
  }
  return kFileInput;  // It matched no other pattern: assume it's a filename.
}
OutputType ClassifyWxfilename ( const std::string &  wxfilename)

ClassifyWxfilename interprets filenames for writing as follows:

  • kNoOutput: invalid filenames (leading or trailing space, things that look like wspecifiers or rspecifiers, or pipes to read from, with a trailing |).
  • kFileOutput: normal filenames
  • kStandardOutput: the empty string or "-", interpreted as standard output
  • kPipeOutput: a command beginning with |, e.g. "| gzip -c > blah.gz"
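
Both classifiers rely on the same convention for the pipe symbol: a leading | marks a command to write to, a trailing | marks a command to read from, and a | anywhere else is treated as a mistake. The sketch below isolates that rule. PipePosition and ClassifyPipePosition are hypothetical names introduced for this example, and the sketch omits the other checks (whitespace, specifiers, offsets) that the real functions perform.

```cpp
#include <string>

// Hypothetical enum for this example only.
enum PipePosition { kNoPipeSymbol, kLeadingPipe, kTrailingPipe, kInternalPipe };

// Reports where the '|' symbol sits in a name. The leading position is
// tested first, matching the order of the checks in both classifiers.
PipePosition ClassifyPipePosition(const std::string &name) {
  if (name.empty()) return kNoPipeSymbol;
  if (name.front() == '|') return kLeadingPipe;    // valid only for writing
  if (name.back() == '|') return kTrailingPipe;    // valid only for reading
  if (name.find('|') != std::string::npos) return kInternalPipe;  // error
  return kNoPipeSymbol;
}
```

Under this convention "| gzip -c > blah.gz" is acceptable only as a wxfilename, and "gunzip -c some_file.gz |" only as an rxfilename.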

Definition at line 82 of file kaldi-io.cc.

References kaldi::ClassifyRspecifier(), kaldi::ClassifyWspecifier(), rnnlm::d, KALDI_WARN, kaldi::kFileOutput, kaldi::kNoOutput, kaldi::kNoRspecifier, kaldi::kNoWspecifier, kaldi::kPipeOutput, and kaldi::kStandardOutput.

Referenced by Output::Open(), TableWriterBothImpl< Holder >::Open(), kaldi::UnitTestClassifyWxfilename(), and Output::~Output().

{
  const char *c = filename.c_str();
  size_t length = filename.length();
  char first_char = c[0],
      last_char = (length == 0 ? '\0' : c[filename.length()-1]);

  // if 'filename' is "" or "-", return kStandardOutput.
  if (length == 0 || (length == 1 && first_char == '-'))
    return kStandardOutput;
  else if (first_char == '|') return kPipeOutput;  // An output pipe like "|blah".
  else if (isspace(first_char) || isspace(last_char) || last_char == '|') {
    return kNoOutput;  // Leading or trailing space: can't interpret this.
                       // Final '|' would represent an input pipe, not an
                       // output pipe.
  } else if ((first_char == 'a' || first_char == 's') &&
             strchr(c, ':') != NULL &&
             (ClassifyWspecifier(filename, NULL, NULL, NULL) != kNoWspecifier ||
              ClassifyRspecifier(filename, NULL, NULL) != kNoRspecifier)) {
    // e.g. ark:something or scp:something... this is almost certainly a
    // scripting error, so call it an error rather than treating it as a file.
    // In practice in modern kaldi scripts all (r,w)filenames begin with "ark"
    // or "scp", even though technically speaking options like "b", "t", "s" or
    // "cs" can appear before the ark or scp, like "b,ark". For efficiency,
    // and because this code is really just a nicety to catch errors earlier
    // than they would otherwise be caught, we only call those extra functions
    // for filenames beginning with 'a' or 's'.
    return kNoOutput;
  } else if (isdigit(last_char)) {
    // This could be a file, but we have to see if it's an offset into a file
    // (like foo.ark:4314328), which is not allowed for writing (but is
    // allowed for reading). This eliminates some things which would be
    // valid UNIX filenames but are not allowed by Kaldi. (Even if we allowed
    // such filenames for writing, we wouldn't be able to correctly read them).
    const char *d = c + length - 1;
    while (isdigit(*d) && d > c) d--;
    if (*d == ':') return kNoOutput;
    // else it could still be a filename; continue to the next check.
  }

  // At this point it matched no other pattern so we assume a filename, but we
  // check for internal '|' as it's a common source of errors to have pipe
  // commands without the pipe in the right place. Say that it can't be
  // classified.
  if (strchr(c, '|') != NULL) {
    KALDI_WARN << "Trying to classify wxfilename with pipe symbol in the"
        " wrong place (pipe without | at the beginning?): " <<
        filename;
    return kNoOutput;
  }
  return kFileOutput;  // It matched no other pattern: assume it's a filename.
}
std::string PrintableRxfilename ( std::string  rxfilename)

PrintableRxfilename turns the rxfilename into a more human-readable form for error reporting, i.e. it does quoting and escaping and replaces "" or "-" with "standard input".

Definition at line 58 of file kaldi-io.cc.

References ParseOptions::Escape().

Referenced by SequentialTableReaderArchiveImpl< Holder >::Close(), SequentialTableReaderScriptImpl< Holder >::EnsureObjectLoaded(), RandomAccessTableReaderSortedArchiveImpl< Holder >::FindKeyInternal(), kaldi::GetUtterancePairs(), RandomAccessTableReaderMapped< Holder >::HasKey(), RandomAccessTableReaderScriptImpl< Holder >::HasKeyInternal(), Input::Input(), main(), SequentialTableReaderArchiveImpl< Holder >::Next(), SequentialTableReaderScriptImpl< Holder >::Open(), PipeInputImpl::Open(), SequentialTableReaderArchiveImpl< Holder >::Open(), TableWriterScriptImpl< Holder >::Open(), RandomAccessTableReaderScriptImpl< Holder >::Open(), RandomAccessTableReaderArchiveImplBase< Holder >::Open(), Input::OpenInternal(), fst::ReadFstKaldi(), fst::ReadFstKaldiGeneric(), RandomAccessTableReaderArchiveImplBase< Holder >::ReadNextObject(), kaldi::ReadPhoneMap(), kaldi::ReadScriptFile(), kaldi::ReadSharedPhonesList(), kaldi::ReadSymbolList(), SequentialTableReaderScriptImpl< Holder >::Value(), RandomAccessTableReaderMapped< Holder >::Value(), RandomAccessTableReaderDSortedArchiveImpl< Holder >::Value(), RandomAccessTableReaderSortedArchiveImpl< Holder >::Value(), RandomAccessTableReaderUnsortedArchiveImpl< Holder >::Value(), TableWriterScriptImpl< Holder >::Write(), SequentialTableReaderArchiveImpl< Holder >::~SequentialTableReaderArchiveImpl(), and SequentialTableReaderScriptImpl< Holder >::~SequentialTableReaderScriptImpl().

{
  if (rxfilename == "" || rxfilename == "-") {
    return "standard input";
  } else {
    // If this call to Escape later causes compilation issues,
    // just replace it with "return rxfilename"; it's only a
    // pretty-printing issue.
    return ParseOptions::Escape(rxfilename);
  }
}
std::string PrintableWxfilename ( std::string  wxfilename)

PrintableWxfilename turns the wxfilename into a more human-readable form for error reporting, i.e. it does quoting and escaping and replaces "" or "-" with "standard output".

Definition at line 70 of file kaldi-io.cc.

References ParseOptions::Escape().

Referenced by main(), Output::Open(), Output::Output(), kaldi::TypeThreeUsage(), kaldi::TypeTwoUsage(), TableWriterArchiveImpl< Holder >::Write(), TableWriterScriptImpl< Holder >::Write(), TableWriterBothImpl< Holder >::Write(), fst::WriteFstKaldi(), kaldi::WriteScriptFile(), Output::~Output(), and PipeOutputImpl::~PipeOutputImpl().

{
  if (wxfilename == "" || wxfilename == "-") {
    return "standard output";
  } else {
    // If this call to Escape later causes compilation issues,
    // just replace it with "return wxfilename"; it's only a
    // pretty-printing issue.
    return ParseOptions::Escape(wxfilename);
  }
}
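
The two Printable functions share one shape: map "" and "-" to a phrase, and quote everything else. The following is a minimal standalone sketch of that shape. QuoteForMessage is a hypothetical stand-in for ParseOptions::Escape and uses simple shell-style single quoting as an assumption; the quoting rules of the real Escape may differ. PrintableNameSketch is likewise a name invented for this example.

```cpp
#include <string>

// Hypothetical stand-in for ParseOptions::Escape (the real rules may differ):
// names made only of "safe" characters are returned unchanged, anything else
// is wrapped in single quotes, shell style.
std::string QuoteForMessage(const std::string &s) {
  const std::string safe =
      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-./:";
  if (!s.empty() && s.find_first_not_of(safe) == std::string::npos) return s;
  std::string out = "'";
  for (char ch : s) {
    if (ch == '\'') out += "'\\''";  // close quote, escaped quote, reopen
    else out += ch;
  }
  out += "'";
  return out;
}

// Mirrors the structure of PrintableRxfilename / PrintableWxfilename.
std::string PrintableNameSketch(const std::string &name, bool for_input) {
  if (name.empty() || name == "-")
    return for_input ? "standard input" : "standard output";
  return QuoteForMessage(name);
}
```

With this sketch, "-" prints as "standard input" or "standard output" depending on the direction, while a pipe command such as "gunzip -c x.gz |" is wrapped in quotes so that it stands out in an error message.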
template<class C> void kaldi::ReadKaldiObject (const std::string &filename, C *c)

Definition at line 239 of file kaldi-io.h.

References Input::Stream().

{
  bool binary_in;
  Input ki(filename, &binary_in);
  c->Read(ki.Stream(), binary_in);
}
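
The generic template only requires that C provide a Read(std::istream &, bool binary) member; the binary flag is filled in when the stream is opened. The sketch below imitates that contract on a plain std::istream so it can run without Kaldi. ReadHeaderSketch, IntHolder and ReadObjectSketch are hypothetical names for this example, and the header check assumes the Kaldi convention that binary data begins with the two bytes "\0B".

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Assumption: binary-mode data starts with '\0' followed by 'B'; anything
// else is text. Returns false if '\0' is not followed by 'B'.
bool ReadHeaderSketch(std::istream &is, bool *binary) {
  if (is.peek() == '\0') {
    is.get();
    if (is.peek() != 'B') return false;
    is.get();
    *binary = true;
  } else {
    *binary = false;
  }
  return true;
}

// Toy object type with the Read(stream, binary) member the template expects.
struct IntHolder {
  std::int32_t value = 0;
  void Read(std::istream &is, bool binary) {
    if (binary) is.read(reinterpret_cast<char *>(&value), sizeof(value));
    else is >> value;
  }
};

// Same structure as the generic ReadKaldiObject, but on an existing stream.
template <class C>
bool ReadObjectSketch(std::istream &is, C *c) {
  bool binary = false;
  if (!ReadHeaderSketch(is, &binary)) return false;
  c->Read(is, binary);
  return !is.fail();
}
```

For example, wrapping the text "42" in a std::istringstream and passing it to ReadObjectSketch with an IntHolder sets value to 42 through the text branch of Read.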
template<> void ReadKaldiObject (const std::string &filename, Matrix< float > *m)

Definition at line 829 of file kaldi-io.cc.

References kaldi::ExtractObjectRange(), kaldi::ExtractRangeSpecifier(), KALDI_ERR, Matrix< Real >::Read(), and Input::Stream().

Referenced by ComputeLogPosteriors(), ComputeScores(), AffineComponent::Init(), AffineComponentPreconditioned::Init(), AffineComponentPreconditionedOnline::Init(), PerElementScaleComponent::Init(), Convolutional1dComponent::Init(), ConvolutionComponent::Init(), NaturalGradientAffineComponent::InitFromConfig(), LinearComponent::InitFromConfig(), FixedScaleComponent::InitFromConfig(), FixedBiasComponent::InitFromConfig(), PerElementOffsetComponent::InitFromConfig(), FixedScaleComponent::InitFromString(), FixedBiasComponent::InitFromString(), main(), and kaldi::RunPerSpeaker().

{
  if (!filename.empty() && filename[filename.size() - 1] == ']') {
    // This filename seems to have a 'range'... like foo.ark:4312423[20:30].
    // (the bit in square brackets is the range).
    std::string rxfilename, range;
    if (!ExtractRangeSpecifier(filename, &rxfilename, &range)) {
      KALDI_ERR << "Could not make sense of possible range specifier in filename "
                << "while reading matrix: " << filename;
    }
    Matrix<float> temp;
    bool binary_in;
    Input ki(rxfilename, &binary_in);
    temp.Read(ki.Stream(), binary_in);
    if (!ExtractObjectRange(temp, range, m)) {
      KALDI_ERR << "Error extracting range of object: " << filename;
    }
  } else {
    // The normal case, there is no range.
    bool binary_in;
    Input ki(filename, &binary_in);
    m->Read(ki.Stream(), binary_in);
  }
}
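
The matrix specializations first separate a trailing bracketed range from the rxfilename, as in foo.ark:4312423[20:30]. The sketch below shows one way such a split could be done. SplitRangeSketch is a hypothetical function for illustration; it is not the implementation of ExtractRangeSpecifier, whose exact acceptance rules may differ.

```cpp
#include <string>

// Hypothetical helper: splits "name[range]" into "name" and "range".
// Returns false if there is no trailing ']', no '[', an empty name, or an
// empty range.
bool SplitRangeSketch(const std::string &name_with_range,
                      std::string *rxfilename, std::string *range) {
  if (name_with_range.empty() || name_with_range.back() != ']') return false;
  std::string::size_type open = name_with_range.rfind('[');
  if (open == std::string::npos || open == 0) return false;
  // Text strictly between '[' and the final ']'.
  std::string inside =
      name_with_range.substr(open + 1, name_with_range.size() - open - 2);
  if (inside.empty()) return false;
  *rxfilename = name_with_range.substr(0, open);
  *range = inside;
  return true;
}
```

Applied to "foo.ark:4312423[20:30]", the sketch yields the rxfilename "foo.ark:4312423" and the range "20:30"; the rxfilename part can then be opened in the usual way and the range applied to the object that was read.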
template<> void ReadKaldiObject (const std::string &filename, Matrix< double > *m)

Definition at line 854 of file kaldi-io.cc.

References kaldi::ExtractObjectRange(), kaldi::ExtractRangeSpecifier(), KALDI_ERR, Matrix< Real >::Read(), and Input::Stream().

{
  if (!filename.empty() && filename[filename.size() - 1] == ']') {
    // This filename seems to have a 'range'... like foo.ark:4312423[20:30].
    // (the bit in square brackets is the range).
    std::string rxfilename, range;
    if (!ExtractRangeSpecifier(filename, &rxfilename, &range)) {
      KALDI_ERR << "Could not make sense of possible range specifier in filename "
                << "while reading matrix: " << filename;
    }
    Matrix<double> temp;
    bool binary_in;
    Input ki(rxfilename, &binary_in);
    temp.Read(ki.Stream(), binary_in);
    if (!ExtractObjectRange(temp, range, m)) {
      KALDI_ERR << "Error extracting range of object: " << filename;
    }
  } else {
    // The normal case, there is no range.
    bool binary_in;
    Input ki(filename, &binary_in);
    m->Read(ki.Stream(), binary_in);
  }
}
template<class C> void kaldi::WriteKaldiObject (const C &c, const std::string &filename, bool binary) [inline]