OnlinePitchFeatureImpl Class Reference

Public Member Functions

 OnlinePitchFeatureImpl (const PitchExtractionOptions &opts)
 
int32 Dim () const
 
BaseFloat FrameShiftInSeconds () const
 
int32 NumFramesReady () const
 
bool IsLastFrame (int32 frame) const
 
void GetFrame (int32 frame, VectorBase< BaseFloat > *feat)
 
void AcceptWaveform (BaseFloat sampling_rate, const VectorBase< BaseFloat > &waveform)
 
void InputFinished ()
 
 ~OnlinePitchFeatureImpl ()
 
 OnlinePitchFeatureImpl (const OnlinePitchFeatureImpl &other)
 

Private Member Functions

int32 NumFramesAvailable (int64 num_downsampled_samples, bool snip_edges) const
 This function works out from the signal how many frames are currently available to process (this is called from inside AcceptWaveform()).
 
void ExtractFrame (const VectorBase< BaseFloat > &downsampled_wave_part, int64 frame_index, VectorBase< BaseFloat > *window)
 This function extracts from the signal the samples numbered from "sample_index" (numbered in the full downsampled signal, not just this part), and of length equal to window->Dim().
 
void RecomputeBacktraces ()
 This function is called after we reach frame "recompute_frame", or when InputFinished() is called, whichever comes sooner.
 
void UpdateRemainder (const VectorBase< BaseFloat > &downsampled_wave_part)
 This function updates downsampled_signal_remainder_, downsampled_samples_processed_, signal_sum_ and signal_sumsq_; it's called from AcceptWaveform().
 

Private Attributes

PitchExtractionOptions opts_
 
int32 nccf_first_lag_
 
int32 nccf_last_lag_
 
Vector< BaseFloat > lags_
 
ArbitraryResample * nccf_resampler_
 
LinearResample * signal_resampler_
 
std::vector< PitchFrameInfo * > frame_info_
 
std::vector< NccfInfo * > nccf_info_
 
int32 frames_latency_
 
Vector< BaseFloat > forward_cost_
 
double forward_cost_remainder_
 
std::vector< std::pair< int32, BaseFloat > > lag_nccf_
 
bool input_finished_
 
double signal_sumsq_
 sum-squared of previously processed parts of signal; used to get NCCF ballast term.
 
double signal_sum_
 sum of previously processed parts of signal; used to do mean-subtraction when getting sum-squared, along with signal_sumsq_.
 
int64 downsampled_samples_processed_
 downsampled_samples_processed_ is the number of samples (after downsampling) that we got in previous calls to AcceptWaveform().
 
Vector< BaseFloat > downsampled_signal_remainder_
 This is a small remainder of the previous downsampled signal; it's used by ExtractFrame for frames near the boundary of two waveforms supplied to AcceptWaveform().
 

Detailed Description

Definition at line 574 of file pitch-functions.cc.

Constructor & Destructor Documentation

◆ OnlinePitchFeatureImpl() [1/2]

OnlinePitchFeatureImpl ( const PitchExtractionOptions & opts)
explicit

Definition at line 715 of file pitch-functions.cc.

References VectorBase< Real >::Add(), OnlinePitchFeatureImpl::forward_cost_, OnlinePitchFeatureImpl::frame_info_, OnlinePitchFeatureImpl::frames_latency_, OnlinePitchFeatureImpl::lags_, PitchExtractionOptions::lowpass_cutoff, PitchExtractionOptions::lowpass_filter_width, PitchExtractionOptions::max_f0, PitchExtractionOptions::min_f0, OnlinePitchFeatureImpl::nccf_first_lag_, OnlinePitchFeatureImpl::nccf_last_lag_, OnlinePitchFeatureImpl::nccf_resampler_, PitchExtractionOptions::resample_freq, PitchExtractionOptions::samp_freq, kaldi::SelectLags(), OnlinePitchFeatureImpl::signal_resampler_, and PitchExtractionOptions::upsample_filter_width.

716  :
717  opts_(opts), forward_cost_remainder_(0.0), input_finished_(false),
718  signal_sumsq_(0.0), signal_sum_(0.0), downsampled_samples_processed_(0) {
719  signal_resampler_ = new LinearResample(opts.samp_freq, opts.resample_freq,
720  opts.lowpass_cutoff,
721  opts.lowpass_filter_width);
722 
723  double outer_min_lag = 1.0 / opts.max_f0 -
724  (opts.upsample_filter_width/(2.0 * opts.resample_freq));
725  double outer_max_lag = 1.0 / opts.min_f0 +
726  (opts.upsample_filter_width/(2.0 * opts.resample_freq));
727  nccf_first_lag_ = ceil(opts.resample_freq * outer_min_lag);
728  nccf_last_lag_ = floor(opts.resample_freq * outer_max_lag);
729 
730  frames_latency_ = 0; // will be set in AcceptWaveform()
731 
732  // Choose the lags at which we resample the NCCF.
733  SelectLags(opts, &lags_);
734 
735  // upsample_cutoff is the filter cutoff for upsampling the NCCF, which is the
736  // Nyquist of the resampling frequency. The NCCF is (almost completely)
737  // bandlimited to around "lowpass_cutoff" (1000 by default), and when the
738  // spectrum of this bandlimited signal is convolved with the spectrum of an
739  // impulse train with frequency "resample_freq", which are separated by 4kHz,
740  // we get energy at -5000,-3000, -1000...1000, 3000..5000, etc. Filtering at
741  // the Nyquist frequency (2000 by default) is sufficient to get only the first
742  // repetition.
743  BaseFloat upsample_cutoff = opts.resample_freq * 0.5;
744 
745 
746  Vector<BaseFloat> lags_offset(lags_);
747  // lags_offset equals lags_ (which are the log-spaced lag values we want to
748  // measure the NCCF at) with nccf_first_lag_ / opts.resample_freq subtracted
749  // from each element, so we can treat the measured NCCF values as starting
750  // from sample zero in a signal that starts at the point start /
751  // opts.resample_freq. This is necessary because the ArbitraryResample code
752  // assumes that the input signal starts from sample zero.
753  lags_offset.Add(-nccf_first_lag_ / opts.resample_freq);
754 
755  int32 num_measured_lags = nccf_last_lag_ + 1 - nccf_first_lag_;
756 
757  nccf_resampler_ = new ArbitraryResample(num_measured_lags, opts.resample_freq,
758  upsample_cutoff, lags_offset,
759  opts.upsample_filter_width);
760 
761  // add a PitchInfo object for frame -1 (not a real frame).
762  frame_info_.push_back(new PitchFrameInfo(lags_.Dim()));
763  // zeroes forward_cost_; this is what we want for the fake frame -1.
764  forward_cost_.Resize(lags_.Dim());
765 }
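As a worked example of the lag-range arithmetic in the constructor, the C++ sketch below reproduces the nccf_first_lag_ / nccf_last_lag_ computation on its own. LagOptions, LagRange and ComputeLagRange are hypothetical names, and the field values (min_f0 = 50, max_f0 = 400, resample_freq = 4000, upsample_filter_width = 5) are assumed defaults chosen for illustration, not values read from the real PitchExtractionOptions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the PitchExtractionOptions fields read by the
// constructor. The values are assumed defaults, for illustration only.
struct LagOptions {
  double min_f0 = 50.0;           // lowest pitch searched, Hz
  double max_f0 = 400.0;          // highest pitch searched, Hz
  double resample_freq = 4000.0;  // rate of the downsampled signal, Hz
  int upsample_filter_width = 5;  // width of the NCCF upsampling filter
};

// Range of integer lags (in downsampled samples) at which the NCCF is measured.
struct LagRange {
  int first_lag;  // plays the role of nccf_first_lag_
  int last_lag;   // plays the role of nccf_last_lag_
  int NumMeasured() const { return last_lag + 1 - first_lag; }
};

// Same arithmetic as the constructor: widen [1/max_f0, 1/min_f0] seconds by
// half the upsampling filter width on each side, convert to samples at
// resample_freq, then round inward (ceil the lower end, floor the upper end).
LagRange ComputeLagRange(const LagOptions &opts) {
  assert(opts.min_f0 > 0.0 && opts.max_f0 > opts.min_f0);
  double half_filter = opts.upsample_filter_width / (2.0 * opts.resample_freq);
  double outer_min_lag = 1.0 / opts.max_f0 - half_filter;
  double outer_max_lag = 1.0 / opts.min_f0 + half_filter;
  LagRange range;
  range.first_lag =
      static_cast<int>(std::ceil(opts.resample_freq * outer_min_lag));
  range.last_lag =
      static_cast<int>(std::floor(opts.resample_freq * outer_max_lag));
  return range;
}
```

With these assumed values, outer_min_lag = 0.0025 - 0.000625 = 0.001875 s and outer_max_lag = 0.02 + 0.000625 = 0.020625 s, so the measured lags run from ceil(7.5) = 8 to floor(82.5) = 82 samples, i.e. 75 NCCF values per frame before resampling.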

◆ ~OnlinePitchFeatureImpl()

Definition at line 1037 of file pitch-functions.cc.

References OnlinePitchFeatureImpl::frame_info_, OnlinePitchFeatureImpl::nccf_info_, OnlinePitchFeatureImpl::nccf_resampler_, and OnlinePitchFeatureImpl::signal_resampler_.

1037  {
1038  delete nccf_resampler_;
1039  delete signal_resampler_;
1040  for (size_t i = 0; i < frame_info_.size(); i++)
1041  delete frame_info_[i];
1042  for (size_t i = 0; i < nccf_info_.size(); i++)
1043  delete nccf_info_[i];
1044 }

◆ OnlinePitchFeatureImpl() [2/2]

Member Function Documentation

◆ AcceptWaveform()

void AcceptWaveform ( BaseFloat  sampling_rate,
const VectorBase< BaseFloat > &  waveform 
)

Definition at line 1046 of file pitch-functions.cc.

References kaldi::ComputeCorrelation(), kaldi::ComputeNccf(), VectorBase< Real >::Dim(), OnlinePitchFeatureImpl::downsampled_samples_processed_, OnlinePitchFeatureImpl::ExtractFrame(), OnlinePitchFeatureImpl::forward_cost_, OnlinePitchFeatureImpl::forward_cost_remainder_, OnlinePitchFeatureImpl::frame_info_, OnlinePitchFeatureImpl::frames_latency_, OnlinePitchFeatureImpl::input_finished_, KALDI_ASSERT, KALDI_VLOG, OnlinePitchFeatureImpl::lag_nccf_, OnlinePitchFeatureImpl::lags_, PitchExtractionOptions::max_frames_latency, PitchExtractionOptions::nccf_ballast, PitchExtractionOptions::nccf_ballast_online, OnlinePitchFeatureImpl::nccf_first_lag_, OnlinePitchFeatureImpl::nccf_info_, OnlinePitchFeatureImpl::nccf_last_lag_, OnlinePitchFeatureImpl::nccf_resampler_, PitchExtractionOptions::NccfWindowShift(), PitchExtractionOptions::NccfWindowSize(), OnlinePitchFeatureImpl::NumFramesAvailable(), OnlinePitchFeatureImpl::opts_, PitchExtractionOptions::recompute_frame, OnlinePitchFeatureImpl::RecomputeBacktraces(), ArbitraryResample::Resample(), LinearResample::Resample(), Matrix< Real >::Resize(), MatrixBase< Real >::Row(), OnlinePitchFeatureImpl::signal_resampler_, OnlinePitchFeatureImpl::signal_sum_, OnlinePitchFeatureImpl::signal_sumsq_, PitchExtractionOptions::snip_edges, VectorBase< Real >::Sum(), OnlinePitchFeatureImpl::UpdateRemainder(), and kaldi::VecVec().

Referenced by OnlinePitchFeature::AcceptWaveform(), and OnlinePitchFeatureImpl::InputFinished().

1048  {
1049  // flush out the last few samples of input waveform only if input_finished_ ==
1050  // true.
1051  const bool flush = input_finished_;
1052 
1053  Vector<BaseFloat> downsampled_wave;
1054  signal_resampler_->Resample(wave, flush, &downsampled_wave);
1055 
1056  // these variables will be used to compute the root-mean-square value of the
1057  // signal for the ballast term.
1058  double cur_sumsq = signal_sumsq_, cur_sum = signal_sum_;
1059  int64 cur_num_samp = downsampled_samples_processed_,
1060  prev_frame_end_sample = 0;
1061  if (!opts_.nccf_ballast_online) {
1062  cur_sumsq += VecVec(downsampled_wave, downsampled_wave);
1063  cur_sum += downsampled_wave.Sum();
1064  cur_num_samp += downsampled_wave.Dim();
1065  }
1066 
1067  // end_frame is the total number of frames we can now process, including
1068  // previously processed ones.
1069  int32 end_frame = NumFramesAvailable(
1070  downsampled_samples_processed_ + downsampled_wave.Dim(), opts_.snip_edges);
1071  // "start_frame" is the first frame-index we process
1072  int32 start_frame = frame_info_.size() - 1,
1073  num_new_frames = end_frame - start_frame;
1074 
1075  if (num_new_frames == 0) {
1076  UpdateRemainder(downsampled_wave);
1077  return;
1078  // continuing to the rest of the code would generate
1079  // an error when sizing matrices with zero rows, and
1080  // anyway is a waste of time.
1081  }
1082 
1083  int32 num_measured_lags = nccf_last_lag_ + 1 - nccf_first_lag_,
1084  num_resampled_lags = lags_.Dim(),
1085  frame_shift = opts_.NccfWindowShift(),
1086  basic_frame_length = opts_.NccfWindowSize(),
1087  full_frame_length = basic_frame_length + nccf_last_lag_;
1088 
1089  Vector<BaseFloat> window(full_frame_length),
1090  inner_prod(num_measured_lags),
1091  norm_prod(num_measured_lags);
1092  Matrix<BaseFloat> nccf_pitch(num_new_frames, num_measured_lags),
1093  nccf_pov(num_new_frames, num_measured_lags);
1094 
1095  Vector<BaseFloat> cur_forward_cost(num_resampled_lags);
1096 
1097 
1098  // Because the resampling of the NCCF is more efficient when grouped together,
1099  // we first compute the NCCF for all frames, then resample as a matrix, then
1100  // do the Viterbi [that happens inside the constructor of PitchFrameInfo].
1101 
1102  for (int32 frame = start_frame; frame < end_frame; frame++) {
1103  // start_sample is index into the whole wave, not just this part.
1104  int64 start_sample;
1105  if (opts_.snip_edges) {
1106  // Usual case: offset starts at 0
1107  start_sample = static_cast<int64>(frame) * frame_shift;
1108  } else {
1109  // When we are not snipping the edges, the first offsets may be
1110  // negative. In this case we will pad with zeros, it should not impact
1111  // the pitch tracker.
1112  start_sample =
1113  static_cast<int64>((frame + 0.5) * frame_shift) - full_frame_length / 2;
1114  }
1115  ExtractFrame(downsampled_wave, start_sample, &window);
1116  if (opts_.nccf_ballast_online) {
1117  // use only up to end of current frame to compute root-mean-square value.
1118  // end_sample will be the sample-index into "downsampled_wave", so
1119  // not really comparable to start_sample.
1120  int64 end_sample = start_sample + full_frame_length -
1121  downsampled_samples_processed_;
1122  KALDI_ASSERT(end_sample > 0); // or should have processed this frame last
1123  // time. Note: end_sample is one past last
1124  // sample.
1125  if (end_sample > downsampled_wave.Dim()) {
1126  KALDI_ASSERT(input_finished_);
1127  end_sample = downsampled_wave.Dim();
1128  }
1129  SubVector<BaseFloat> new_part(downsampled_wave, prev_frame_end_sample,
1130  end_sample - prev_frame_end_sample);
1131  cur_num_samp += new_part.Dim();
1132  cur_sumsq += VecVec(new_part, new_part);
1133  cur_sum += new_part.Sum();
1134  prev_frame_end_sample = end_sample;
1135  }
1136  double mean_square = cur_sumsq / cur_num_samp -
1137  pow(cur_sum / cur_num_samp, 2.0);
1138 
1139  ComputeCorrelation(window, nccf_first_lag_, nccf_last_lag_,
1140  basic_frame_length, &inner_prod, &norm_prod);
1141  double nccf_ballast_pov = 0.0,
1142  nccf_ballast_pitch = pow(mean_square * basic_frame_length, 2) *
1143  opts_.nccf_ballast,
1144  avg_norm_prod = norm_prod.Sum() / norm_prod.Dim();
1145  SubVector<BaseFloat> nccf_pitch_row(nccf_pitch, frame - start_frame);
1146  ComputeNccf(inner_prod, norm_prod, nccf_ballast_pitch,
1147  &nccf_pitch_row);
1148  SubVector<BaseFloat> nccf_pov_row(nccf_pov, frame - start_frame);
1149  ComputeNccf(inner_prod, norm_prod, nccf_ballast_pov,
1150  &nccf_pov_row);
1151  if (frame < opts_.recompute_frame)
1152  nccf_info_.push_back(new NccfInfo(avg_norm_prod, mean_square));
1153  }
1154 
1155  Matrix<BaseFloat> nccf_pitch_resampled(num_new_frames, num_resampled_lags);
1156  nccf_resampler_->Resample(nccf_pitch, &nccf_pitch_resampled);
1157  nccf_pitch.Resize(0, 0); // no longer needed.
1158  Matrix<BaseFloat> nccf_pov_resampled(num_new_frames, num_resampled_lags);
1159  nccf_resampler_->Resample(nccf_pov, &nccf_pov_resampled);
1160  nccf_pov.Resize(0, 0); // no longer needed.
1161 
1162  // We've finished dealing with the waveform so we can call UpdateRemainder
1163  // now; we need to call it before we possibly call RecomputeBacktraces()
1164  // below, which is why we don't do it at the very end.
1165  UpdateRemainder(downsampled_wave);
1166 
1167  std::vector<std::pair<int32, int32 > > index_info;
1168 
1169  for (int32 frame = start_frame; frame < end_frame; frame++) {
1170  int32 frame_idx = frame - start_frame;
1171  PitchFrameInfo *prev_info = frame_info_.back(),
1172  *cur_info = new PitchFrameInfo(prev_info);
1173  cur_info->SetNccfPov(nccf_pov_resampled.Row(frame_idx));
1174  cur_info->ComputeBacktraces(opts_, nccf_pitch_resampled.Row(frame_idx),
1175  lags_, forward_cost_, &index_info,
1176  &cur_forward_cost);
1177  forward_cost_.Swap(&cur_forward_cost);
1178  // Renormalize forward_cost so smallest element is zero.
1179  BaseFloat remainder = forward_cost_.Min();
1180  forward_cost_remainder_ += remainder;
1181  forward_cost_.Add(-remainder);
1182  frame_info_.push_back(cur_info);
1183  if (frame < opts_.recompute_frame)
1184  nccf_info_[frame]->nccf_pitch_resampled =
1185  nccf_pitch_resampled.Row(frame_idx);
1186  if (frame == opts_.recompute_frame - 1 && !opts_.nccf_ballast_online)
1187  RecomputeBacktraces();
1188  }
1189 
1190  // Trace back the best-path.
1191  int32 best_final_state;
1192  forward_cost_.Min(&best_final_state);
1193  lag_nccf_.resize(frame_info_.size() - 1); // will keep any existing data.
1194  frame_info_.back()->SetBestState(best_final_state, lag_nccf_);
1195  frames_latency_ =
1196  frame_info_.back()->ComputeLatency(opts_.max_frames_latency);
1197  KALDI_VLOG(4) << "Latency is " << frames_latency_;
1198 }
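The per-frame ballast and NCCF arithmetic in the loop above can be sketched in isolation. This is a simplified illustration, not Kaldi's ComputeCorrelation() or ComputeNccf(): SignalStats, PitchBallast and NccfAtLag are hypothetical names, the sketch uses plain std::vector<double>, computes one lag at a time, and omits any mean removal the real correlation code may perform. The denominator sqrt(e1 * e2 + ballast) is an assumption, chosen to be consistent with the nccf_scale expression in RecomputeBacktraces().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running statistics over the downsampled signal, in the spirit of
// signal_sum_, signal_sumsq_ and downsampled_samples_processed_.
struct SignalStats {
  double sum = 0.0, sumsq = 0.0;
  long long num_samp = 0;
  void Accumulate(const std::vector<double> &part) {
    for (double x : part) { sum += x; sumsq += x * x; }
    num_samp += static_cast<long long>(part.size());
  }
  // Mean-subtracted energy per sample: E[x^2] - (E[x])^2.
  double MeanSquare() const {
    assert(num_samp > 0);
    double mean = sum / num_samp;
    return sumsq / num_samp - mean * mean;
  }
};

// Ballast term for the pitch NCCF, as in the listing:
// (mean_square * basic_frame_length)^2 * nccf_ballast.
double PitchBallast(double mean_square, int basic_frame_length,
                    double nccf_ballast) {
  return std::pow(mean_square * basic_frame_length, 2) * nccf_ballast;
}

// NCCF at one lag for a frame x holding at least n + lag samples: the
// numerator is the dot product of x[0, n) with x[lag, lag + n); the
// denominator is assumed to be sqrt(e1 * e2 + ballast), where e1 and e2
// are the energies of those two segments.
double NccfAtLag(const std::vector<double> &x, int n, int lag, double ballast) {
  assert(n > 0 && lag >= 0 && static_cast<std::size_t>(n + lag) <= x.size());
  double inner = 0.0, e1 = 0.0, e2 = 0.0;
  for (int i = 0; i < n; i++) {
    inner += x[i] * x[i + lag];
    e1 += x[i] * x[i];
    e2 += x[i + lag] * x[i + lag];
  }
  double denom = std::sqrt(e1 * e2 + ballast);
  return denom == 0.0 ? 0.0 : inner / denom;
}
```

Because the ballast is derived from the average energy of the whole signal, it barely affects high-energy frames but pulls the NCCF of quiet frames toward zero. The POV variant in the listing uses a ballast of 0.0.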

◆ Dim()

int32 Dim ( ) const
inline

Definition at line 578 of file pitch-functions.cc.

578 { return 2; }

◆ ExtractFrame()

void ExtractFrame ( const VectorBase< BaseFloat > &  downsampled_wave_part,
int64  frame_index,
VectorBase< BaseFloat > *  window 
)
private

This function extracts from the signal the samples numbered from "sample_index" (numbered in the full downsampled signal, not just this part), and of length equal to window->Dim().

It uses the data members downsampled_samples_processed_ and downsampled_signal_remainder_, as well as the more recent part of the downsampled wave, "downsampled_wave_part", which is provided by the caller.

Parameters
downsampled_wave_part: One chunk of the downsampled wave, starting from sample-index downsampled_samples_processed_.
sample_index: The desired starting sample index, measured from the start of the whole signal, not just this part (declared as frame_index).
window: The extracted part of the signal is written here; window->Dim() determines how many samples are extracted.

Definition at line 839 of file pitch-functions.cc.

References VectorBase< Real >::CopyFromVec(), VectorBase< Real >::Dim(), OnlinePitchFeatureImpl::downsampled_samples_processed_, OnlinePitchFeatureImpl::downsampled_signal_remainder_, OnlinePitchFeatureImpl::input_finished_, KALDI_ASSERT, OnlinePitchFeatureImpl::opts_, PitchExtractionOptions::preemph_coeff, VectorBase< Real >::Range(), VectorBase< Real >::SetZero(), and PitchExtractionOptions::snip_edges.

Referenced by OnlinePitchFeatureImpl::AcceptWaveform().

842  {
843  int32 full_frame_length = window->Dim();
844  int32 offset = static_cast<int32>(sample_index -
845  downsampled_samples_processed_);
846 
847  // Treat edge cases first
848  if (sample_index < 0) {
849  // Part of the frame is before the beginning of the signal. This
850  // should only happen if opts_.snip_edges == false, when we are
851  // processing the first few frames of signal. In this case
852  // we pad with zeros.
853  KALDI_ASSERT(opts_.snip_edges == false);
854  int32 sub_frame_length = sample_index + full_frame_length;
855  int32 sub_frame_index = full_frame_length - sub_frame_length;
856  KALDI_ASSERT(sub_frame_length > 0 && sub_frame_index > 0);
857  window->SetZero();
858  SubVector<BaseFloat> sub_window(*window, sub_frame_index, sub_frame_length);
859  ExtractFrame(downsampled_wave_part, 0, &sub_window);
860  return;
861  }
862 
863  if (offset + full_frame_length > downsampled_wave_part.Dim()) {
864  // Requested frame is past end of the signal. This should only happen if
865  // input_finished_ == true, when we're flushing out the last couple of
866  // frames of signal. In this case we pad with zeros.
867  KALDI_ASSERT(input_finished_);
868  int32 sub_frame_length = downsampled_wave_part.Dim() - offset;
869  KALDI_ASSERT(sub_frame_length > 0);
870  window->SetZero();
871  SubVector<BaseFloat> sub_window(*window, 0, sub_frame_length);
872  ExtractFrame(downsampled_wave_part, sample_index, &sub_window);
873  return;
874  }
875 
876  // "offset" is the offset of the start of the frame, into this
877  // signal.
878  if (offset >= 0) {
879  // frame is full inside the new part of the signal.
880  window->CopyFromVec(downsampled_wave_part.Range(offset, full_frame_length));
881  } else {
882  // frame is partly in the remainder and partly in the new part.
883  int32 remainder_offset = downsampled_signal_remainder_.Dim() + offset;
884  KALDI_ASSERT(remainder_offset >= 0); // or we didn't keep enough remainder.
885  KALDI_ASSERT(offset + full_frame_length > 0); // or we should have
886  // processed this frame last
887  // time.
888 
889  int32 old_length = -offset, new_length = offset + full_frame_length;
890  window->Range(0, old_length).CopyFromVec(
891  downsampled_signal_remainder_.Range(remainder_offset, old_length));
892  window->Range(old_length, new_length).CopyFromVec(
893  downsampled_wave_part.Range(0, new_length));
894  }
895  if (opts_.preemph_coeff != 0.0) {
896  BaseFloat preemph_coeff = opts_.preemph_coeff;
897  for (int32 i = window->Dim() - 1; i > 0; i--)
898  (*window)(i) -= preemph_coeff * (*window)(i-1);
899  (*window)(0) *= (1.0 - preemph_coeff);
900  }
901 }
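The two non-padding paths of ExtractFrame() (copy the frame straight from the new chunk, or stitch the tail of the remainder onto the head of the new chunk) and the backward pre-emphasis loop are sketched below with std::vector<float>. StitchFrame and Preemphasize are hypothetical names; the zero-padding branches for sample_index < 0 and for frames past the end of the signal are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a frame that starts `offset` samples into the new chunk. A negative
// offset means the first -offset samples come from the tail of `remainder`
// (the end of the previously seen signal), as in ExtractFrame().
std::vector<float> StitchFrame(const std::vector<float> &remainder,
                               const std::vector<float> &new_part,
                               int offset, int frame_length) {
  assert(frame_length > 0);
  assert(offset + frame_length <= static_cast<int>(new_part.size()));
  std::vector<float> window(frame_length);
  if (offset >= 0) {  // frame lies entirely inside the new chunk
    for (int i = 0; i < frame_length; i++) window[i] = new_part[offset + i];
    return window;
  }
  int old_length = -offset, new_length = offset + frame_length;
  int remainder_offset = static_cast<int>(remainder.size()) + offset;
  assert(remainder_offset >= 0);  // otherwise not enough remainder was kept
  assert(new_length > 0);         // otherwise frame belonged to the last call
  for (int i = 0; i < old_length; i++)
    window[i] = remainder[remainder_offset + i];
  for (int i = 0; i < new_length; i++)
    window[old_length + i] = new_part[i];
  return window;
}

// In-place pre-emphasis as in the listing: iterate backwards so that x[i - 1]
// still holds its original value when x[i] is updated.
void Preemphasize(std::vector<float> *window, float coeff) {
  if (coeff == 0.0f || window->empty()) return;
  std::vector<float> &w = *window;
  for (std::size_t i = w.size() - 1; i > 0; i--) w[i] -= coeff * w[i - 1];
  w[0] *= (1.0f - coeff);
}
```

For example, with remainder {1, 2, 3}, new chunk {4, 5, 6}, offset -2 and frame length 4, the stitched frame is {2, 3, 4, 5}.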

◆ FrameShiftInSeconds()

BaseFloat FrameShiftInSeconds ( ) const

Definition at line 909 of file pitch-functions.cc.

References PitchExtractionOptions::frame_shift_ms, and OnlinePitchFeatureImpl::opts_.

Referenced by OnlinePitchFeature::FrameShiftInSeconds().

909  {
910  return opts_.frame_shift_ms / 1000.0f;
911 }

◆ GetFrame()

void GetFrame ( int32  frame,
VectorBase< BaseFloat > *  feat 
)

Definition at line 921 of file pitch-functions.cc.

References VectorBase< Real >::Dim(), KALDI_ASSERT, OnlinePitchFeatureImpl::lag_nccf_, OnlinePitchFeatureImpl::lags_, and OnlinePitchFeatureImpl::NumFramesReady().

Referenced by OnlinePitchFeature::GetFrame().

922  {
923  KALDI_ASSERT(frame < NumFramesReady() && feat->Dim() == 2);
924  (*feat)(0) = lag_nccf_[frame].second;
925  (*feat)(1) = 1.0 / lags_(lag_nccf_[frame].first);
926 }
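GetFrame() writes the NCCF of the chosen lag into element 0 and the reciprocal of that lag into element 1. Since the constructor treats lags_ as times (it subtracts nccf_first_lag_ / opts.resample_freq from them), element 1 appears to be a pitch in Hz. The sketch below mirrors that layout with standard containers; PitchFeature is a hypothetical name and the lag values in the usage note are illustrative.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical mirror of GetFrame(): lag_nccf[t] holds (index into lags,
// NCCF value) for frame t, and lags holds candidate lags in seconds.
// Output element 0 is the NCCF; element 1 is 1 / lag, i.e. a pitch in Hz.
std::vector<float> PitchFeature(
    const std::vector<std::pair<int, float>> &lag_nccf,
    const std::vector<float> &lags, int frame) {
  assert(frame >= 0 && frame < static_cast<int>(lag_nccf.size()));
  const std::pair<int, float> &best = lag_nccf[frame];
  assert(best.first >= 0 && best.first < static_cast<int>(lags.size()));
  return {best.second, 1.0f / lags[best.first]};
}
```

For instance, if frame 0 picked lag index 1 with NCCF 0.8 and lags[1] is 0.005 s, the feature is roughly {0.8, 200}.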

◆ InputFinished()

void InputFinished ( )

Definition at line 928 of file pitch-functions.cc.

References OnlinePitchFeatureImpl::AcceptWaveform(), OnlinePitchFeatureImpl::forward_cost_remainder_, OnlinePitchFeatureImpl::frame_info_, OnlinePitchFeatureImpl::frames_latency_, OnlinePitchFeatureImpl::input_finished_, KALDI_VLOG, PitchExtractionOptions::nccf_ballast_online, OnlinePitchFeatureImpl::opts_, PitchExtractionOptions::recompute_frame, OnlinePitchFeatureImpl::RecomputeBacktraces(), and PitchExtractionOptions::samp_freq.

Referenced by OnlinePitchFeature::InputFinished().

928  {
929  input_finished_ = true;
930  // Process an empty waveform; this has an effect because
931  // after setting input_finished_ to true, NumFramesAvailable()
932  // will return a slightly larger number.
933  AcceptWaveform(opts_.samp_freq, Vector<BaseFloat>());
934  int32 num_frames = static_cast<size_t>(frame_info_.size() - 1);
935  if (num_frames < opts_.recompute_frame && !opts_.nccf_ballast_online)
936  RecomputeBacktraces();
937  frames_latency_ = 0;
938  KALDI_VLOG(3) << "Pitch-tracking Viterbi cost is "
939  << (forward_cost_remainder_ / num_frames)
940  << " per frame, over " << num_frames << " frames.";
941 }

◆ IsLastFrame()

bool IsLastFrame ( int32  frame) const

Definition at line 903 of file pitch-functions.cc.

References OnlinePitchFeatureImpl::input_finished_, KALDI_ASSERT, and OnlinePitchFeatureImpl::NumFramesReady().

Referenced by OnlinePitchFeature::IsLastFrame().

903  {
904  int32 T = NumFramesReady();
905  KALDI_ASSERT(frame < T);
906  return (input_finished_ && frame + 1 == T);
907 }

◆ NumFramesAvailable()

int32 NumFramesAvailable ( int64  num_downsampled_samples,
bool  snip_edges 
) const
private

This function works out from the signal how many frames are currently available to process (this is called from inside AcceptWaveform()).

Notes: the number of frames differs slightly from the number the old pitch code gave. The result also depends on whether input_finished_ == true; if it is, the function "forces out" a final frame or two, because frames no longer need the extra nccf_last_lag_ samples of lookahead.

Definition at line 768 of file pitch-functions.cc.

References OnlinePitchFeatureImpl::input_finished_, OnlinePitchFeatureImpl::nccf_last_lag_, PitchExtractionOptions::NccfWindowShift(), PitchExtractionOptions::NccfWindowSize(), and OnlinePitchFeatureImpl::opts_.

Referenced by OnlinePitchFeatureImpl::AcceptWaveform().

769  {
770  int32 frame_shift = opts_.NccfWindowShift(),
771  frame_length = opts_.NccfWindowSize();
772  // Use the "full frame length" to compute the number
773  // of frames only if the input is not finished.
774  if (!input_finished_)
775  frame_length += nccf_last_lag_;
776  if (num_downsampled_samples < frame_length) {
777  return 0;
778  } else {
779  if (!snip_edges) {
780  if (input_finished_) {
781  return static_cast<int32>(num_downsampled_samples * 1.0f /
782  frame_shift + 0.5f);
783  } else {
784  return static_cast<int32>((num_downsampled_samples - frame_length / 2) *
785  1.0f / frame_shift + 0.5f);
786  }
787  } else {
788  return static_cast<int32>((num_downsampled_samples - frame_length) /
789  frame_shift + 1);
790  }
791  }
792 }
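A standalone sketch of NumFramesAvailable(), together with the start_sample computation from AcceptWaveform(), shows how frame count and frame placement depend on snip_edges and input_finished_. Both function names are reused for readability but these are hypothetical free functions, not the member functions; lengths are in downsampled samples.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors NumFramesAvailable(): frames computable from num_samples
// downsampled samples. frame_length is the basic NCCF window size and
// last_lag plays the role of nccf_last_lag_.
int32_t NumFramesAvailable(int64_t num_samples, int32_t frame_shift,
                           int32_t frame_length, int32_t last_lag,
                           bool input_finished, bool snip_edges) {
  assert(frame_shift > 0 && frame_length > 0 && last_lag >= 0);
  // While input is still arriving, each frame also needs last_lag samples of
  // lookahead; once it is finished, ExtractFrame() zero-pads the tail instead.
  if (!input_finished) frame_length += last_lag;
  if (num_samples < frame_length) return 0;
  if (snip_edges)  // integer division, as in the listing
    return static_cast<int32_t>((num_samples - frame_length) / frame_shift + 1);
  if (input_finished)
    return static_cast<int32_t>(num_samples * 1.0f / frame_shift + 0.5f);
  return static_cast<int32_t>((num_samples - frame_length / 2) * 1.0f /
                                  frame_shift + 0.5f);
}

// Mirrors the start_sample computation in AcceptWaveform(). The result can be
// negative when snip_edges is false; ExtractFrame() then zero-pads.
int64_t FrameStartSample(int32_t frame, int32_t frame_shift,
                         int32_t full_frame_length, bool snip_edges) {
  assert(frame >= 0 && frame_shift > 0 && full_frame_length > 0);
  if (snip_edges) return static_cast<int64_t>(frame) * frame_shift;
  // Frame t is centered near (t + 0.5) * frame_shift.
  return static_cast<int64_t>((frame + 0.5) * frame_shift) -
         full_frame_length / 2;
}
```

For example, with an assumed frame shift of 40 samples, a basic window of 100 samples and nccf_last_lag_ = 82, 1000 downsampled samples give 21 frames while input is still arriving (snip_edges = true) and 23 once input is finished, since the 82-sample lookahead is no longer required.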

◆ NumFramesReady()

int32 NumFramesReady ( ) const

Definition at line 913 of file pitch-functions.cc.

References OnlinePitchFeatureImpl::frames_latency_, KALDI_ASSERT, and OnlinePitchFeatureImpl::lag_nccf_.

Referenced by OnlinePitchFeatureImpl::GetFrame(), and OnlinePitchFeatureImpl::IsLastFrame().

913  {
914  int32 num_frames = lag_nccf_.size(),
915  latency = frames_latency_;
916  KALDI_ASSERT(latency <= num_frames);
917  return num_frames - latency;
918 }
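NumFramesReady() and IsLastFrame() together define what the caller may read: frames whose best lag has been traced back, minus a latency window that may still change. The hypothetical ReadyState struct below restates that logic; num_decided stands in for lag_nccf_.size() and latency for frames_latency_. InputFinished() sets frames_latency_ to 0, which is why every decided frame becomes readable at the end.

```cpp
#include <cassert>

// Hypothetical summary of NumFramesReady() and IsLastFrame().
struct ReadyState {
  int num_decided;      // stands in for lag_nccf_.size()
  int latency;          // stands in for frames_latency_
  bool input_finished;  // stands in for input_finished_

  int NumFramesReady() const {
    assert(latency <= num_decided);
    return num_decided - latency;  // trailing frames may still change
  }
  bool IsLastFrame(int frame) const {
    int t = NumFramesReady();
    assert(frame < t);
    return input_finished && frame + 1 == t;
  }
};
```

While input is still arriving, IsLastFrame() is always false, even for the newest ready frame.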

◆ RecomputeBacktraces()

void RecomputeBacktraces ( )
private

This function is called after we reach frame "recompute_frame", or when InputFinished() is called, whichever comes sooner.

It recomputes the backtraces for frames zero up to recompute_frame if the average energy of the signal has changed enough to affect the NCCF ballast term. It computes the average signal energy from downsampled_samples_processed_, signal_sum_ and signal_sumsq_; as the calling code shows, these statistics may cover more than the first "recompute_frame" frames, up to the end of the current chunk. It is only called when opts_.nccf_ballast_online is false.

Definition at line 945 of file pitch-functions.cc.

References kaldi::ApproxEqual(), OnlinePitchFeatureImpl::downsampled_samples_processed_, OnlinePitchFeatureImpl::forward_cost_, OnlinePitchFeatureImpl::forward_cost_remainder_, OnlinePitchFeatureImpl::frame_info_, OnlinePitchFeatureImpl::frames_latency_, KALDI_ASSERT, KALDI_VLOG, OnlinePitchFeatureImpl::lag_nccf_, OnlinePitchFeatureImpl::lags_, PitchExtractionOptions::max_frames_latency, PitchExtractionOptions::nccf_ballast, PitchExtractionOptions::nccf_ballast_online, OnlinePitchFeatureImpl::nccf_info_, NccfInfo::nccf_pitch_resampled, PitchExtractionOptions::NccfWindowSize(), OnlinePitchFeatureImpl::opts_, PitchExtractionOptions::recompute_frame, OnlinePitchFeatureImpl::signal_sum_, and OnlinePitchFeatureImpl::signal_sumsq_.

Referenced by OnlinePitchFeatureImpl::AcceptWaveform(), and OnlinePitchFeatureImpl::InputFinished().

945  {
946  KALDI_ASSERT(!opts_.nccf_ballast_online);
947  int32 num_frames = static_cast<int32>(frame_info_.size()) - 1;
948 
949  // The assertion reflects how we believe this function will be called.
950  KALDI_ASSERT(num_frames <= opts_.recompute_frame);
951  KALDI_ASSERT(nccf_info_.size() == static_cast<size_t>(num_frames));
952  if (num_frames == 0)
953  return;
954  double num_samp = downsampled_samples_processed_, sum = signal_sum_,
955  sumsq = signal_sumsq_, mean = sum / num_samp;
956  BaseFloat mean_square = sumsq / num_samp - mean * mean;
957 
958  bool must_recompute = false;
959  BaseFloat threshold = 0.01;
960  for (int32 frame = 0; frame < num_frames; frame++)
961  if (!ApproxEqual(nccf_info_[frame]->mean_square_energy,
962  mean_square, threshold))
963  must_recompute = true;
964 
965  if (!must_recompute) {
966  // Nothing to do. We'll reach here, for instance, if everything was in one
967  // chunk and opts_.nccf_ballast_online == false. This is the case for
968  // offline processing.
969  for (size_t i = 0; i < nccf_info_.size(); i++)
970  delete nccf_info_[i];
971  nccf_info_.clear();
972  return;
973  }
974 
975  int32 num_states = forward_cost_.Dim(),
976  basic_frame_length = opts_.NccfWindowSize();
977 
978  BaseFloat new_nccf_ballast = pow(mean_square * basic_frame_length, 2) *
979  opts_.nccf_ballast;
980 
981  double forward_cost_remainder = 0.0;
982  Vector<BaseFloat> forward_cost(num_states), // start off at zero.
983  next_forward_cost(forward_cost);
984  std::vector<std::pair<int32, int32 > > index_info;
985 
986  for (int32 frame = 0; frame < num_frames; frame++) {
987  NccfInfo &nccf_info = *nccf_info_[frame];
988  BaseFloat old_mean_square = nccf_info_[frame]->mean_square_energy,
989  avg_norm_prod = nccf_info_[frame]->avg_norm_prod,
990  old_nccf_ballast = pow(old_mean_square * basic_frame_length, 2) *
991  opts_.nccf_ballast,
992  nccf_scale = pow((old_nccf_ballast + avg_norm_prod) /
993  (new_nccf_ballast + avg_norm_prod),
994  static_cast<BaseFloat>(0.5));
995  // The "nccf_scale" is an estimate of the scaling factor by which the NCCF
996  // would change on this frame, on average, by changing the ballast term from
997  // "old_nccf_ballast" to "new_nccf_ballast". It's not exact because the
998  // "avg_norm_prod" is just an average of the product e1 * e2 of frame
999  // energies of the (frame, shifted-frame), but these won't change that much
1000  // within a frame, and even if they do, the inaccuracy of the scaled NCCF
1001  // will still be very small if the ballast term didn't change much, or if
1002  // it's much larger or smaller than e1*e2. By doing it as a simple scaling,
1003  // we save the overhead of the NCCF resampling, which is a considerable part
1004  // of the whole computation.
1005  nccf_info.nccf_pitch_resampled.Scale(nccf_scale);
1006 
1007  frame_info_[frame + 1]->ComputeBacktraces(
1008  opts_, nccf_info.nccf_pitch_resampled, lags_,
1009  forward_cost, &index_info, &next_forward_cost);
1010 
1011  forward_cost.Swap(&next_forward_cost);
1012  BaseFloat remainder = forward_cost.Min();
1013  forward_cost_remainder += remainder;
1014  forward_cost.Add(-remainder);
1015  }
1016  KALDI_VLOG(3) << "Forward-cost per frame changed from "
1017  << (forward_cost_remainder_ / num_frames) << " to "
1018  << (forward_cost_remainder / num_frames);
1019 
1020  forward_cost_remainder_ = forward_cost_remainder;
1021  forward_cost_.Swap(&forward_cost);
1022 
1023  int32 best_final_state;
1024  forward_cost_.Min(&best_final_state);
1025 
1026  if (lag_nccf_.size() != static_cast<size_t>(num_frames))
1027  lag_nccf_.resize(num_frames);
1028 
1029  frame_info_.back()->SetBestState(best_final_state, lag_nccf_);
1030  frames_latency_ =
1031  frame_info_.back()->ComputeLatency(opts_.max_frames_latency);
1032  for (size_t i = 0; i < nccf_info_.size(); i++)
1033  delete nccf_info_[i];
1034  nccf_info_.clear();
1035 }
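The early-exit test and the per-frame rescaling in the listing above can be tried out in isolation. The following is a minimal standalone C++ sketch, not Kaldi code: `approx_equal` follows the documented `ApproxEqual()` rule |a - b| <= tol * (|a| + |b|), while `must_recompute`, `nccf_scale` and the plain `std::vector<float>` input are hypothetical stand-ins for the loop over `nccf_info_`. The scale formula is consistent with an NCCF whose denominator has the form sqrt(e1 * e2 + ballast), with `avg_norm_prod` standing in for e1 * e2; that reading is an assumption drawn from the source comment.

```cpp
#include <cmath>
#include <vector>

// Relative-tolerance comparison, following the documented ApproxEqual():
// |a - b| <= tol * (|a| + |b|).
bool approx_equal(float a, float b, float tol) {
  return std::fabs(a - b) <= tol * (std::fabs(a) + std::fabs(b));
}

// Hypothetical helper: true if any frame's stored mean-square energy is not
// approximately equal to the current global estimate (the listing uses 0.01).
bool must_recompute(const std::vector<float> &frame_mean_square,
                    float mean_square, float threshold = 0.01f) {
  for (float old_ms : frame_mean_square)
    if (!approx_equal(old_ms, mean_square, threshold)) return true;
  return false;
}

// Hypothetical helper: factor applied to a frame's NCCF values when its
// ballast changes from old_ballast to new_ballast (same formula as the
// listing's pow(..., 0.5)).
float nccf_scale(float old_ballast, float new_ballast, float avg_norm_prod) {
  return std::sqrt((old_ballast + avg_norm_prod) /
                   (new_ballast + avg_norm_prod));
}
```

Because the tolerance is scaled by |a| + |b|, a threshold of 0.01 accepts two close values that differ by up to about 2% of their magnitude: a stored 2.01 against a current 2.0 does not force a recompute, while 3.0 does.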
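Inside the frame loop of RecomputeBacktraces(), the forward costs are kept small by subtracting their minimum after each frame and adding that minimum to a running `forward_cost_remainder`, so a state's true accumulated cost is its stored value plus the remainder. The minimal C++ sketch below shows only that bookkeeping. The struct `ForwardState` and the functions `step` and `advance` are hypothetical; `step` is a toy min-plus update with a transition penalty proportional to the distance between state indices, standing in for `PitchFrameInfo::ComputeBacktraces()`, whose actual cost model is not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Normalized per-state forward costs plus the total amount subtracted so far.
struct ForwardState {
  std::vector<double> cost;  // minimum entry is 0 after each advance()
  double remainder = 0.0;    // sum of the subtracted minima
};

// Toy Viterbi step: best predecessor cost plus a distance-based transition
// penalty, plus this frame's local cost for the state.
std::vector<double> step(const std::vector<double> &cost,
                         const std::vector<double> &local_cost,
                         double penalty) {
  std::vector<double> next(cost.size());
  for (std::size_t s = 0; s < cost.size(); ++s) {
    double best = cost[0] + penalty * static_cast<double>(s);
    for (std::size_t t = 1; t < cost.size(); ++t) {
      double dist = static_cast<double>(t > s ? t - s : s - t);
      best = std::min(best, cost[t] + penalty * dist);
    }
    next[s] = best + local_cost[s];
  }
  return next;
}

// One frame: update, then renormalize like forward_cost.Min() followed by
// forward_cost.Add(-remainder) in the listing.
void advance(ForwardState &st, const std::vector<double> &local_cost,
             double penalty) {
  st.cost = step(st.cost, local_cost, penalty);
  const double m = *std::min_element(st.cost.begin(), st.cost.end());
  st.remainder += m;
  for (double &c : st.cost) c -= m;
}
```

Subtracting the same constant from every state leaves the ranking of states unchanged, so the best final state is the same as it would be without renormalization.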

◆ UpdateRemainder()

void UpdateRemainder ( const VectorBase< BaseFloat > &  downsampled_wave_part)
private

This function updates downsampled_signal_remainder_, downsampled_samples_processed_, signal_sum_ and signal_sumsq_; it's called from AcceptWaveform().

Definition at line 794 of file pitch-functions.cc.

References VectorBase< Real >::Dim(), OnlinePitchFeatureImpl::downsampled_samples_processed_, OnlinePitchFeatureImpl::downsampled_signal_remainder_, OnlinePitchFeatureImpl::frame_info_, KALDI_ASSERT, OnlinePitchFeatureImpl::nccf_last_lag_, PitchExtractionOptions::NccfWindowShift(), PitchExtractionOptions::NccfWindowSize(), OnlinePitchFeatureImpl::opts_, OnlinePitchFeatureImpl::signal_sum_, OnlinePitchFeatureImpl::signal_sumsq_, VectorBase< Real >::Sum(), and kaldi::VecVec().

Referenced by OnlinePitchFeatureImpl::AcceptWaveform().

{
  // frame_info_ has an extra element at frame-1, so subtract
  // one from the length.
  int64 num_frames = static_cast<int64>(frame_info_.size()) - 1,
      next_frame = num_frames,
      frame_shift = opts_.NccfWindowShift(),
      next_frame_sample = frame_shift * next_frame;

  signal_sumsq_ += VecVec(downsampled_wave_part, downsampled_wave_part);
  signal_sum_ += downsampled_wave_part.Sum();

  // next_frame_sample is the first sample index we'll need for the
  // next frame.
  int64 next_downsampled_samples_processed =
      downsampled_samples_processed_ + downsampled_wave_part.Dim();

  if (next_frame_sample > next_downsampled_samples_processed) {
    // this could only happen in the weird situation that the full frame length
    // is less than the frame shift.
    int32 full_frame_length = opts_.NccfWindowSize() + nccf_last_lag_;
    KALDI_ASSERT(full_frame_length < frame_shift && "Code error");
    downsampled_signal_remainder_.Resize(0);
  } else {
    Vector<BaseFloat> new_remainder(next_downsampled_samples_processed -
                                    next_frame_sample);
    // note: next_frame_sample is the index into the entire signal, of
    // new_remainder(0).
    // i is the absolute index of the signal.
    for (int64 i = next_frame_sample;
         i < next_downsampled_samples_processed; i++) {
      if (i >= downsampled_samples_processed_) {  // in current signal.
        new_remainder(i - next_frame_sample) =
            downsampled_wave_part(i - downsampled_samples_processed_);
      } else {  // in old remainder; only reach here if waveform supplied is
        new_remainder(i - next_frame_sample) =  // tiny.
            downsampled_signal_remainder_(i - downsampled_samples_processed_ +
                                          downsampled_signal_remainder_.Dim());
      }
    }
    downsampled_signal_remainder_.Swap(&new_remainder);
  }
  downsampled_samples_processed_ = next_downsampled_samples_processed;
}
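The bookkeeping done by UpdateRemainder() can be reproduced with standard-library types. In this minimal C++ sketch the struct `RemainderState` and the function `update_remainder` are hypothetical: the struct's fields stand in for `downsampled_signal_remainder_`, `downsampled_samples_processed_`, `signal_sum_` and `signal_sumsq_`; `num_frames` and `frame_shift` are passed in rather than read from `frame_info_` and `opts_`; and the "Code error" assertion is omitted. The old-remainder index `i - processed + remainder.size()` is the same mapping the listing uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the members touched by UpdateRemainder().
struct RemainderState {
  std::vector<float> remainder;  // tail of the signal kept for the next frame
  int64_t processed = 0;         // samples received before this call
  double sum = 0.0;              // running sum of samples
  double sumsq = 0.0;            // running sum of squared samples
};

void update_remainder(RemainderState &st, const std::vector<float> &part,
                      int64_t num_frames, int64_t frame_shift) {
  // Running statistics, later used for the NCCF ballast term.
  for (float x : part) {
    st.sum += x;
    st.sumsq += static_cast<double>(x) * x;
  }
  // First absolute sample index needed by the next, not-yet-computed frame.
  const int64_t next_frame_sample = frame_shift * num_frames;
  const int64_t next_processed =
      st.processed + static_cast<int64_t>(part.size());
  if (next_frame_sample > next_processed) {
    st.remainder.clear();  // next frame starts beyond everything received
  } else {
    std::vector<float> new_remainder(
        static_cast<std::size_t>(next_processed - next_frame_sample));
    const int64_t old_size = static_cast<int64_t>(st.remainder.size());
    for (int64_t i = next_frame_sample; i < next_processed; ++i) {
      // i is an absolute index: take it from the new chunk if it is recent
      // enough, otherwise from the tail kept by the previous call.
      new_remainder[i - next_frame_sample] =
          i >= st.processed ? part[i - st.processed]
                            : st.remainder[i - st.processed + old_size];
    }
    st.remainder.swap(new_remainder);
  }
  st.processed = next_processed;
}
```

Assuming frames are extracted whenever enough samples are available, each call keeps only the samples from the start of the next frame onward, so the buffer stays shorter than one full analysis window instead of growing with the input.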

Member Data Documentation

◆ downsampled_samples_processed_

int64 downsampled_samples_processed_
private

downsampled_samples_processed_ is the number of samples (after downsampling) that we got in previous calls to AcceptWaveform().

Definition at line 707 of file pitch-functions.cc.

Referenced by OnlinePitchFeatureImpl::AcceptWaveform(), OnlinePitchFeatureImpl::ExtractFrame(), OnlinePitchFeatureImpl::RecomputeBacktraces(), and OnlinePitchFeatureImpl::UpdateRemainder().

◆ downsampled_signal_remainder_

Vector<BaseFloat> downsampled_signal_remainder_
private

This is a small remainder of the previous downsampled signal; it's used by ExtractFrame for frames near the boundary of two waveforms supplied to AcceptWaveform().

Definition at line 711 of file pitch-functions.cc.

Referenced by OnlinePitchFeatureImpl::ExtractFrame(), and OnlinePitchFeatureImpl::UpdateRemainder().
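To illustrate how the remainder lets a frame straddle two AcceptWaveform() calls, here is a minimal C++ sketch that copies a window starting at an absolute sample index. It is not Kaldi's ExtractFrame(): the function `extract_window` and its parameters are hypothetical, edge handling is reduced to throwing on unavailable samples, and it assumes the remainder holds the samples immediately preceding the current chunk (absolute indices `processed - remainder.size()` up to `processed - 1`), which is what the indexing in UpdateRemainder() implies.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Copy window.size() samples starting at absolute index `start`.
// `processed` is the number of samples received before `part`; `remainder`
// holds the samples just before that point.
void extract_window(const std::vector<float> &remainder,
                    const std::vector<float> &part, int64_t processed,
                    int64_t start, std::vector<float> &window) {
  const int64_t rem_begin = processed - static_cast<int64_t>(remainder.size());
  const int64_t part_end = processed + static_cast<int64_t>(part.size());
  for (std::size_t k = 0; k < window.size(); ++k) {
    const int64_t i = start + static_cast<int64_t>(k);  // absolute index
    if (i < rem_begin || i >= part_end)
      throw std::out_of_range("sample not available");
    window[k] = i >= processed ? part[i - processed]
                               : remainder[i - rem_begin];
  }
}
```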

◆ forward_cost_

◆ forward_cost_remainder_

◆ frame_info_

◆ frames_latency_

◆ input_finished_

◆ lag_nccf_

◆ lags_

◆ nccf_first_lag_

int32 nccf_first_lag_
private

◆ nccf_info_

◆ nccf_last_lag_

◆ nccf_resampler_

◆ opts_

◆ signal_resampler_

◆ signal_sum_

double signal_sum_
private

sum of previously processed parts of signal; used to do mean-subtraction when getting sum-squared, along with signal_sumsq_.

Definition at line 703 of file pitch-functions.cc.

Referenced by OnlinePitchFeatureImpl::AcceptWaveform(), OnlinePitchFeatureImpl::RecomputeBacktraces(), and OnlinePitchFeatureImpl::UpdateRemainder().

◆ signal_sumsq_

double signal_sumsq_
private

sum-squared of previously processed parts of signal; used to get NCCF ballast term.

Denominator is downsampled_samples_processed_.

Definition at line 699 of file pitch-functions.cc.

Referenced by OnlinePitchFeatureImpl::AcceptWaveform(), OnlinePitchFeatureImpl::RecomputeBacktraces(), and OnlinePitchFeatureImpl::UpdateRemainder().
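Written out, the running statistics feed the ballast term as follows. The symbols are introduced here only for readability: $S$ is signal_sum_, $Q$ is signal_sumsq_, $n$ is downsampled_samples_processed_, $N$ is opts_.NccfWindowSize(), and $b$ is the configured ballast constant (the nccf_ballast field of PitchExtractionOptions); $m$ is the quantity RecomputeBacktraces() stores in `mean_square`, i.e. the mean-subtracted average energy per sample.

$$
\mu = \frac{S}{n}, \qquad
m = \frac{Q}{n} - \mu^{2}, \qquad
B = \left(m \, N\right)^{2} b
$$

Applying the same formula to a frame's stored mean_square_energy gives the ballast that frame was originally computed with, which is why the per-frame rescaling needs only the old and new mean-square values plus avg_norm_prod.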


The documentation for this class was generated from the following file: