OnlineProcessPitch Class Reference

This online-feature class implements post processing of pitch features.

#include <pitch-functions.h>


Classes

struct  NormalizationStats
 

Public Member Functions

virtual int32 Dim () const
 
virtual bool IsLastFrame (int32 frame) const
 Returns true if this is the last frame.
 
virtual BaseFloat FrameShiftInSeconds () const
 
virtual int32 NumFramesReady () const
 Returns the total number of frames, since the start of the utterance, that are now available.
 
virtual void GetFrame (int32 frame, VectorBase< BaseFloat > *feat)
 Gets the feature vector for this frame.
 
virtual ~OnlineProcessPitch ()
 
 OnlineProcessPitch (const ProcessPitchOptions &opts, OnlineFeatureInterface *src)
 Constructs the post-processor on top of src, which must provide the raw 2-dimensional (NCCF, pitch) features.
 
- Public Member Functions inherited from OnlineFeatureInterface
virtual void GetFrames (const std::vector< int32 > &frames, MatrixBase< BaseFloat > *feats)
 This is like GetFrame() but for a collection of frames.
 
virtual ~OnlineFeatureInterface ()
 Virtual destructor.
 

Private Types

enum  { kRawFeatureDim = 2 }
 

Private Member Functions

BaseFloat GetPovFeature (int32 frame) const
 Computes and returns the POV feature for this frame.
 
BaseFloat GetDeltaPitchFeature (int32 frame)
 Computes and returns the delta-log-pitch feature for this frame.
 
BaseFloat GetRawLogPitchFeature (int32 frame) const
 Computes and returns the raw log-pitch feature for this frame.
 
BaseFloat GetNormalizedLogPitchFeature (int32 frame)
 Computes and returns the mean-subtracted log-pitch feature for this frame.
 
void GetNormalizationWindow (int32 frame, int32 src_frames_ready, int32 *window_begin, int32 *window_end) const
 Computes the begin and end of the normalization window for this frame.
 
void UpdateNormalizationStats (int32 frame)
 Makes sure the entry in normalization_stats_ for this frame is up to date; called from GetNormalizedLogPitchFeature.
 

Private Attributes

ProcessPitchOptions opts_
 
OnlineFeatureInterface * src_
 
int32 dim_
 
std::vector< BaseFloat > delta_feature_noise_
 
std::vector< NormalizationStats > normalization_stats_
 

Detailed Description

This online-feature class implements post processing of pitch features.

The input has the two raw dimensions (NCCF, pitch) produced by the pitch extractor. The class can produce various kinds of output; with the default options, the output is (pov-feature, normalized-log-pitch, delta-log-pitch).
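The output layout can be sketched as below. This is a minimal standalone illustration, not Kaldi code: PitchOutputFlags and OutputLayout are hypothetical names, and the four flags only mirror the add_* fields of ProcessPitchOptions. The column order follows the order in which GetFrame() fills the output vector, and the output dimension is the number of enabled flags, as in the constructor.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the four output switches of ProcessPitchOptions.
// The defaults reproduce the default output described above
// (pov-feature, normalized-log-pitch, delta-log-pitch).
struct PitchOutputFlags {
  bool add_pov_feature = true;
  bool add_normalized_log_pitch = true;
  bool add_delta_pitch = true;
  bool add_raw_log_pitch = false;
};

// Returns the output column names in the order GetFrame() writes them.
// The output dimension (dim_) is simply the size of this list.
std::vector<std::string> OutputLayout(const PitchOutputFlags &f) {
  std::vector<std::string> cols;
  if (f.add_pov_feature) cols.push_back("pov-feature");
  if (f.add_normalized_log_pitch) cols.push_back("normalized-log-pitch");
  if (f.add_delta_pitch) cols.push_back("delta-log-pitch");
  if (f.add_raw_log_pitch) cols.push_back("raw-log-pitch");
  // Mirrors the constructor's check that at least one feature is enabled.
  assert(!cols.empty() && "At least one pitch feature should be chosen.");
  return cols;
}
```

With the default flags this yields the three columns listed above; enabling add_raw_log_pitch appends a fourth column at the end.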

Definition at line 332 of file pitch-functions.h.

Member Enumeration Documentation

◆ anonymous enum

anonymous enum
private
Enumerator
kRawFeatureDim  Dimension of the raw input features (NCCF, pitch).

Definition at line 359 of file pitch-functions.h.

enum { kRawFeatureDim = 2 };  // anonymous enum to define a constant.

Constructor & Destructor Documentation

◆ ~OnlineProcessPitch()

virtual ~OnlineProcessPitch ( )
inline, virtual

Definition at line 352 of file pitch-functions.h.

virtual ~OnlineProcessPitch() { }

◆ OnlineProcessPitch()

OnlineProcessPitch ( const ProcessPitchOptions &  opts,
OnlineFeatureInterface *  src 
)

Note on the implementation of OnlineProcessPitch: the OnlineFeatureInterface allows random access to features (i.e. not necessarily in sequential order), so we need to support that. However, we don't need to support it very efficiently, and our implementation is most efficient if frames are accessed in sequential order.

Also note: we have to be a bit careful in this implementation because the input features may change. That is: if we call src_->GetFrame(t, &vec) from GetFrame(), we can't guarantee that a later call to src_->GetFrame(t, &vec) from another GetFrame() will return the same value. In fact, while designing this class we used some knowledge of how the OnlinePitchFeature class works to minimize the amount of re-querying we had to do.

Definition at line 1398 of file pitch-functions.cc.

References OnlineFeatureInterface::Dim(), OnlineProcessPitch::dim_, KALDI_ASSERT, and OnlineProcessPitch::kRawFeatureDim.

OnlineProcessPitch::OnlineProcessPitch(const ProcessPitchOptions &opts,
                                       OnlineFeatureInterface *src):
    opts_(opts), src_(src),
    dim_ ((opts.add_pov_feature ? 1 : 0)
          + (opts.add_normalized_log_pitch ? 1 : 0)
          + (opts.add_delta_pitch ? 1 : 0)
          + (opts.add_raw_log_pitch ? 1 : 0)) {
  KALDI_ASSERT(dim_ > 0 &&
               " At least one of the pitch features should be chosen. "
               "Check your post-process-pitch options.");
  KALDI_ASSERT(src->Dim() == kRawFeatureDim &&
               "Input feature must be pitch feature (should have dimension 2)");
}

Member Function Documentation

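◆ Dim()

virtual int32 Dim ( ) const
inline, virtual

Returns the output feature dimension (dim_), i.e. the number of post-processed pitch features enabled in the options.

Implements OnlineFeatureInterface.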

◆ FrameShiftInSeconds()

virtual BaseFloat FrameShiftInSeconds ( ) const
inline, virtual

Returns the frame shift of the underlying pitch source, in seconds.

Implements OnlineFeatureInterface.

Definition at line 344 of file pitch-functions.h.

virtual BaseFloat FrameShiftInSeconds() const {
  return src_->FrameShiftInSeconds();
}

◆ GetDeltaPitchFeature()

BaseFloat GetDeltaPitchFeature ( int32  frame)
inline, private

Computes and returns the delta-log-pitch feature for this frame.

Called from GetFrame().

Definition at line 1439 of file pitch-functions.cc.

References kaldi::ComputeDeltas(), OnlineProcessPitch::delta_feature_noise_, ProcessPitchOptions::delta_pitch_noise_stddev, ProcessPitchOptions::delta_pitch_scale, ProcessPitchOptions::delta_window, OnlineProcessPitch::GetRawLogPitchFeature(), OnlineFeatureInterface::NumFramesReady(), OnlineProcessPitch::opts_, DeltaFeaturesOptions::order, kaldi::RandGauss(), OnlineProcessPitch::src_, and DeltaFeaturesOptions::window.

Referenced by OnlineProcessPitch::GetFrame().

BaseFloat OnlineProcessPitch::GetDeltaPitchFeature(int32 frame) {
  // Rather than computing the delta pitch directly in code here,
  // which might seem easier, we accumulate a small window of features
  // and call ComputeDeltas. This might seem like overkill; the reason
  // we do it this way is to ensure that the end effects (at file
  // beginning and end) are handled in a consistent way.
  int32 context = opts_.delta_window;
  int32 start_frame = std::max(0, frame - context),
      end_frame = std::min(frame + context + 1, src_->NumFramesReady()),
      frames_in_window = end_frame - start_frame;
  Matrix<BaseFloat> feats(frames_in_window, 1),
      delta_feats;

  for (int32 f = start_frame; f < end_frame; f++)
    feats(f - start_frame, 0) = GetRawLogPitchFeature(f);

  DeltaFeaturesOptions delta_opts;
  delta_opts.order = 1;
  delta_opts.window = opts_.delta_window;
  ComputeDeltas(delta_opts, feats, &delta_feats);
  while (delta_feature_noise_.size() <= static_cast<size_t>(frame)) {
    delta_feature_noise_.push_back(RandGauss() *
                                   opts_.delta_pitch_noise_stddev);
  }
  // note: delta_feats will have two columns, second contains deltas.
  return (delta_feats(frame - start_frame, 1) + delta_feature_noise_[frame]) *
      opts_.delta_pitch_scale;
}
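The delta computation can be approximated by the standalone sketch below. It assumes that, for order = 1, ComputeDeltas applies the usual regression formula and replicates the edge frames when the window runs past the available data; RegressionDelta and DeltaPitchFeature are hypothetical names. Because the matrix passed to ComputeDeltas already spans frame ± delta_window, clipped to the frames that are ready, clamping inside that slice should be equivalent to clamping at the boundaries of the ready frames, which is what the sketch does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// First-order regression delta at frame t (an assumed equivalent of
// ComputeDeltas with order = 1), with out-of-range frames clamped:
//   delta_t = sum_{j=-N..N} j * x[clamp(t + j)] / sum_{j=-N..N} j^2
double RegressionDelta(const std::vector<double> &x, int t, int window) {
  assert(window > 0 && !x.empty());
  assert(t >= 0 && t < static_cast<int>(x.size()));
  const int last = static_cast<int>(x.size()) - 1;
  double num = 0.0, denom = 0.0;
  for (int j = -window; j <= window; ++j) {
    int f = std::min(std::max(t + j, 0), last);  // replicate edge frames
    num += j * x[f];
    denom += static_cast<double>(j) * j;
  }
  return num / denom;
}

// Final step of GetDeltaPitchFeature(): add the per-frame noise, then scale.
// `noise` plays the role of delta_feature_noise_[frame] and `scale` the role
// of opts_.delta_pitch_scale.
double DeltaPitchFeature(const std::vector<double> &log_pitch, int t,
                         int window, double noise, double scale) {
  return (RegressionDelta(log_pitch, t, window) + noise) * scale;
}
```

On a linear ramp the delta is exactly the slope away from the edges and smaller near them, because the replicated edge frames flatten the regression.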

◆ GetFrame()

void GetFrame ( int32  frame,
VectorBase< BaseFloat > *  feat 
)
virtual

Gets the feature vector for this frame.

Before calling this for a given frame, it is assumed that you called NumFramesReady() and it returned a number greater than "frame". Otherwise this call will likely crash with an assert failure. This function is not declared const, in case there is some kind of caching going on, but most of the time it shouldn't modify the class.

Implements OnlineFeatureInterface.

Definition at line 1414 of file pitch-functions.cc.

References ProcessPitchOptions::add_delta_pitch, ProcessPitchOptions::add_normalized_log_pitch, ProcessPitchOptions::add_pov_feature, ProcessPitchOptions::add_raw_log_pitch, ProcessPitchOptions::delay, VectorBase< Real >::Dim(), OnlineProcessPitch::dim_, OnlineProcessPitch::GetDeltaPitchFeature(), OnlineProcessPitch::GetNormalizedLogPitchFeature(), OnlineProcessPitch::GetPovFeature(), OnlineProcessPitch::GetRawLogPitchFeature(), KALDI_ASSERT, OnlineProcessPitch::NumFramesReady(), and OnlineProcessPitch::opts_.

Referenced by kaldi::ComputeAndProcessKaldiPitch(), OnlineFeaturePipeline::GetAsMatrix(), kaldi::ProcessPitch(), kaldi::UnitTestDelay(), and kaldi::UnitTestPieces().

void OnlineProcessPitch::GetFrame(int32 frame,
                                  VectorBase<BaseFloat> *feat) {
  int32 frame_delayed = frame < opts_.delay ? 0 : frame - opts_.delay;
  KALDI_ASSERT(feat->Dim() == dim_ &&
               frame_delayed < NumFramesReady());
  int32 index = 0;
  if (opts_.add_pov_feature)
    (*feat)(index++) = GetPovFeature(frame_delayed);
  if (opts_.add_normalized_log_pitch)
    (*feat)(index++) = GetNormalizedLogPitchFeature(frame_delayed);
  if (opts_.add_delta_pitch)
    (*feat)(index++) = GetDeltaPitchFeature(frame_delayed);
  if (opts_.add_raw_log_pitch)
    (*feat)(index++) = GetRawLogPitchFeature(frame_delayed);
  KALDI_ASSERT(index == dim_);
}
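The effect of opts_.delay on which source frame is read can be isolated in a small sketch. DelayedFrame and DelayMap are hypothetical helpers that reproduce only the index arithmetic at the top of GetFrame(); with a delay of d, the first d + 1 output frames all read source frame 0.

```cpp
#include <cassert>
#include <vector>

// Source frame read for output frame `frame`, as in GetFrame().
int DelayedFrame(int frame, int delay) {
  assert(frame >= 0 && delay >= 0);
  return frame < delay ? 0 : frame - delay;
}

// Source frame read for each of the first `num_output_frames` output frames.
std::vector<int> DelayMap(int num_output_frames, int delay) {
  std::vector<int> map;
  for (int t = 0; t < num_output_frames; ++t)
    map.push_back(DelayedFrame(t, delay));
  return map;
}
```

For example, DelayMap(5, 2) gives {0, 0, 0, 1, 2}: the output is shifted right by two frames, and the gap at the start is filled by repeating frame 0.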

◆ GetNormalizationWindow()

void GetNormalizationWindow ( int32  frame,
int32  src_frames_ready,
int32 *  window_begin,
int32 *  window_end 
) const
inline, private

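Computes the normalization window for a frame: the half-open range [window_begin, window_end) of source frames, clipped to [0, src_frames_ready). In the definition below, the frame argument is named t.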

Definition at line 1487 of file pitch-functions.cc.

References ProcessPitchOptions::normalization_left_context, ProcessPitchOptions::normalization_right_context, and OnlineProcessPitch::opts_.

Referenced by OnlineProcessPitch::UpdateNormalizationStats().

void OnlineProcessPitch::GetNormalizationWindow(int32 t,
                                                int32 src_frames_ready,
                                                int32 *window_begin,
                                                int32 *window_end) const {
  int32 left_context = opts_.normalization_left_context;
  int32 right_context = opts_.normalization_right_context;
  *window_begin = std::max(0, t - left_context);
  *window_end = std::min(t + right_context + 1, src_frames_ready);
}
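A standalone version of this computation, with the two context sizes passed in explicitly, is sketched below. NormalizationWindow is a hypothetical name; its last two arguments stand in for opts_.normalization_left_context and opts_.normalization_right_context.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Half-open range [begin, end) of source frames used to normalize frame t,
// clipped to the frames that are currently ready.
std::pair<int, int> NormalizationWindow(int t, int src_frames_ready,
                                        int left_context, int right_context) {
  assert(t >= 0 && left_context >= 0 && right_context >= 0);
  int begin = std::max(0, t - left_context);
  int end = std::min(t + right_context + 1, src_frames_ready);
  return {begin, end};
}
```

With contexts of 75 frames on each side, frame 10 of a 1000-frame input uses [0, 86) and frame 100 uses [25, 176). Moving from frame t - 1 to frame t shifts each end of the window by at most one frame, which is the property UpdateNormalizationStats() asserts when it updates the statistics incrementally.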

◆ GetNormalizedLogPitchFeature()

BaseFloat GetNormalizedLogPitchFeature ( int32  frame)
inline, private

Computes and returns the mean-subtracted log-pitch feature for this frame.

Called from GetFrame().

Definition at line 1476 of file pitch-functions.cc.

References OnlineProcessPitch::GetRawLogPitchFeature(), OnlineProcessPitch::normalization_stats_, OnlineProcessPitch::opts_, ProcessPitchOptions::pitch_scale, and OnlineProcessPitch::UpdateNormalizationStats().

Referenced by OnlineProcessPitch::GetFrame().

BaseFloat OnlineProcessPitch::GetNormalizedLogPitchFeature(int32 frame) {
  UpdateNormalizationStats(frame);
  BaseFloat log_pitch = GetRawLogPitchFeature(frame),
      avg_log_pitch = normalization_stats_[frame].sum_log_pitch_pov /
        normalization_stats_[frame].sum_pov,
      normalized_log_pitch = log_pitch - avg_log_pitch;
  return normalized_log_pitch * opts_.pitch_scale;
}
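Written as a formula, this code computes a POV-weighted mean of the log pitch over the normalization window and subtracts it from the current frame's log pitch. Here p_f denotes NccfToPov() applied to the NCCF of source frame f, F_f is the pitch of frame f, [b_t, e_t) is the window returned by GetNormalizationWindow(), and s_pitch is opts_.pitch_scale; the symbols are our notation, not Kaldi's.

```latex
\bar{\ell}_t
  = \frac{\sum_{f=b_t}^{e_t-1} p_f \,\log F_f}{\sum_{f=b_t}^{e_t-1} p_f}
  = \frac{\texttt{sum\_log\_pitch\_pov}}{\texttt{sum\_pov}},
\qquad
y_t = s_{\mathrm{pitch}} \left( \log F_t - \bar{\ell}_t \right)
```

Weighting by p_f means that unvoiced frames, whose pitch estimates are unreliable, contribute little to the mean.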

◆ GetPovFeature()

BaseFloat GetPovFeature ( int32  frame) const
inline, private

Computes and returns the POV feature for this frame.

Called from GetFrame().

Definition at line 1431 of file pitch-functions.cc.

References OnlineFeatureInterface::GetFrame(), OnlineProcessPitch::kRawFeatureDim, kaldi::NccfToPovFeature(), OnlineProcessPitch::opts_, ProcessPitchOptions::pov_offset, ProcessPitchOptions::pov_scale, and OnlineProcessPitch::src_.

Referenced by OnlineProcessPitch::GetFrame().

BaseFloat OnlineProcessPitch::GetPovFeature(int32 frame) const {
  Vector<BaseFloat> tmp(kRawFeatureDim);
  src_->GetFrame(frame, &tmp);  // (NCCF, pitch) from pitch extractor
  BaseFloat nccf = tmp(0);
  return opts_.pov_scale * NccfToPovFeature(nccf)
      + opts_.pov_offset;
}
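The POV feature is an affine function of a fixed transform of the NCCF. The sketch below assumes that kaldi::NccfToPovFeature() clamps the NCCF n to [-1, 1] and returns f = (1.0001 - n)^0.15 - 1; this is our reading of pitch-functions.cc and should be checked against your Kaldi version. NccfToPovFeatureSketch and PovFeature are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed NCCF-to-feature mapping (see lead-in): clamp, then compress.
double NccfToPovFeatureSketch(double nccf) {
  double n = std::min(1.0, std::max(-1.0, nccf));
  double f = std::pow(1.0001 - n, 0.15) - 1.0;
  assert(std::isfinite(f));  // analogous to Kaldi's NaN/inf check
  return f;
}

// pov_scale and pov_offset play the role of the ProcessPitchOptions fields.
double PovFeature(double nccf, double pov_scale, double pov_offset) {
  return pov_scale * NccfToPovFeatureSketch(nccf) + pov_offset;
}
```

Under this formula f decreases as the NCCF increases, so strongly voiced frames get the smallest values before scaling.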

◆ GetRawLogPitchFeature()

BaseFloat GetRawLogPitchFeature ( int32  frame) const
inline, private

Computes and returns the raw log-pitch feature for this frame.

Called from GetFrame(), GetDeltaPitchFeature(), and GetNormalizedLogPitchFeature().

Definition at line 1468 of file pitch-functions.cc.

References OnlineFeatureInterface::GetFrame(), KALDI_ASSERT, OnlineProcessPitch::kRawFeatureDim, kaldi::Log(), and OnlineProcessPitch::src_.

Referenced by OnlineProcessPitch::GetDeltaPitchFeature(), OnlineProcessPitch::GetFrame(), and OnlineProcessPitch::GetNormalizedLogPitchFeature().

BaseFloat OnlineProcessPitch::GetRawLogPitchFeature(int32 frame) const {
  Vector<BaseFloat> tmp(kRawFeatureDim);
  src_->GetFrame(frame, &tmp);
  BaseFloat pitch = tmp(1);
  KALDI_ASSERT(pitch > 0);
  return Log(pitch);
}

◆ IsLastFrame()

virtual bool IsLastFrame ( int32  frame) const
inline, virtual

Returns true if this is the last frame.

Frame indices are zero-based, so the first frame is zero. IsLastFrame(-1) will return false, unless the file is empty (which is a case that I'm not sure all the code will handle, so be careful). This function may return false for some frame if we haven't yet decided to terminate decoding, but later true if we decide to terminate decoding. This function exists mainly to correctly handle end effects in feature extraction, and is not a mechanism to determine how many frames are in the decodable object (as it used to be, and for backward compatibility, still is, in the Decodable interface).

Implements OnlineFeatureInterface.

Definition at line 336 of file pitch-functions.h.

virtual bool IsLastFrame(int32 frame) const {
  if (frame <= -1)
    return src_->IsLastFrame(-1);
  else if (frame < opts_.delay)
    return src_->IsLastFrame(-1) == true ? false : src_->IsLastFrame(0);
  else
    return src_->IsLastFrame(frame - opts_.delay);
}

◆ NumFramesReady()

int32 NumFramesReady ( ) const
virtual

Returns the total number of frames, since the start of the utterance, that are now available. In an online-decoding context, this will likely increase with time as more data becomes available.

Implements OnlineFeatureInterface.

Definition at line 1569 of file pitch-functions.cc.

References ProcessPitchOptions::delay, OnlineFeatureInterface::IsLastFrame(), ProcessPitchOptions::normalization_right_context, OnlineFeatureInterface::NumFramesReady(), OnlineProcessPitch::opts_, and OnlineProcessPitch::src_.

Referenced by kaldi::ComputeAndProcessKaldiPitch(), OnlineProcessPitch::GetFrame(), kaldi::ProcessPitch(), and kaldi::UnitTestDelay().

int32 OnlineProcessPitch::NumFramesReady() const {
  int32 src_frames_ready = src_->NumFramesReady();
  if (src_frames_ready == 0) {
    return 0;
  } else if (src_->IsLastFrame(src_frames_ready - 1)) {
    return src_frames_ready + opts_.delay;
  } else {
    return std::max(0, src_frames_ready -
                    opts_.normalization_right_context + opts_.delay);
  }
}
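The frame-count logic can be restated as a pure function of the source's state, as in the sketch below (ProcessedFramesReady is a hypothetical name). Until the source reports its last frame, the final normalization_right_context source frames are withheld, presumably because their normalization windows could still grow; once the input is finished, everything is released, plus delay extra frames to account for the shift applied in GetFrame().

```cpp
#include <algorithm>
#include <cassert>

// Output frames available, given the source's frame count and whether the
// source has reported its last frame.
int ProcessedFramesReady(int src_frames_ready, bool src_finished,
                         int normalization_right_context, int delay) {
  assert(src_frames_ready >= 0 && normalization_right_context >= 0);
  if (src_frames_ready == 0) return 0;             // nothing to emit yet
  if (src_finished) return src_frames_ready + delay;  // flush everything
  // Hold back frames whose right context is not yet available.
  return std::max(0, src_frames_ready - normalization_right_context + delay);
}
```

With a right context of 75 and no delay, 100 unfinished source frames give 25 output frames, while the same 100 frames give 100 output frames once the input is finished.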

◆ UpdateNormalizationStats()

void UpdateNormalizationStats ( int32  frame)
inline, private

Makes sure the entry in normalization_stats_ for this frame is up to date; called from GetNormalizedLogPitchFeature.

Definition at line 1502 of file pitch-functions.cc.

References OnlineProcessPitch::NormalizationStats::cur_num_frames, OnlineFeatureInterface::GetFrame(), OnlineProcessPitch::GetNormalizationWindow(), OnlineProcessPitch::NormalizationStats::input_finished, OnlineFeatureInterface::IsLastFrame(), KALDI_ASSERT, OnlineProcessPitch::kRawFeatureDim, kaldi::Log(), kaldi::NccfToPov(), OnlineProcessPitch::normalization_stats_, OnlineFeatureInterface::NumFramesReady(), OnlineProcessPitch::src_, OnlineProcessPitch::NormalizationStats::sum_log_pitch_pov, and OnlineProcessPitch::NormalizationStats::sum_pov.

Referenced by OnlineProcessPitch::GetNormalizedLogPitchFeature().

void OnlineProcessPitch::UpdateNormalizationStats(int32 frame) {
  KALDI_ASSERT(frame >= 0);
  if (normalization_stats_.size() <= frame)
    normalization_stats_.resize(frame + 1);
  int32 cur_num_frames = src_->NumFramesReady();
  bool input_finished = src_->IsLastFrame(cur_num_frames - 1);

  NormalizationStats &this_stats = normalization_stats_[frame];
  if (this_stats.cur_num_frames == cur_num_frames &&
      this_stats.input_finished == input_finished) {
    // Stats are fully up-to-date.
    return;
  }
  int32 this_window_begin, this_window_end;
  GetNormalizationWindow(frame, cur_num_frames,
                         &this_window_begin, &this_window_end);

  if (frame > 0) {
    const NormalizationStats &prev_stats = normalization_stats_[frame - 1];
    if (prev_stats.cur_num_frames == cur_num_frames &&
        prev_stats.input_finished == input_finished) {
      // we'll derive this_stats efficiently from prev_stats.
      // Checking that cur_num_frames and input_finished have not changed
      // ensures that the underlying features will not have changed.
      this_stats = prev_stats;
      int32 prev_window_begin, prev_window_end;
      GetNormalizationWindow(frame - 1, cur_num_frames,
                             &prev_window_begin, &prev_window_end);
      if (this_window_begin != prev_window_begin) {
        KALDI_ASSERT(this_window_begin == prev_window_begin + 1);
        Vector<BaseFloat> tmp(kRawFeatureDim);
        src_->GetFrame(prev_window_begin, &tmp);
        BaseFloat accurate_pov = NccfToPov(tmp(0)),
            log_pitch = Log(tmp(1));
        this_stats.sum_pov -= accurate_pov;
        this_stats.sum_log_pitch_pov -= accurate_pov * log_pitch;
      }
      if (this_window_end != prev_window_end) {
        KALDI_ASSERT(this_window_end == prev_window_end + 1);
        Vector<BaseFloat> tmp(kRawFeatureDim);
        src_->GetFrame(prev_window_end, &tmp);
        BaseFloat accurate_pov = NccfToPov(tmp(0)),
            log_pitch = Log(tmp(1));
        this_stats.sum_pov += accurate_pov;
        this_stats.sum_log_pitch_pov += accurate_pov * log_pitch;
      }
      return;
    }
  }
  // The way we do it here is not the most efficient way to do it;
  // we'll see if it becomes a problem. The issue is we have to redo
  // this computation from scratch each time we process a new chunk, which
  // may be a little inefficient if the chunk-size is very small.
  this_stats.cur_num_frames = cur_num_frames;
  this_stats.input_finished = input_finished;
  this_stats.sum_pov = 0.0;
  this_stats.sum_log_pitch_pov = 0.0;
  Vector<BaseFloat> tmp(kRawFeatureDim);
  for (int32 f = this_window_begin; f < this_window_end; f++) {
    src_->GetFrame(f, &tmp);
    BaseFloat accurate_pov = NccfToPov(tmp(0)),
        log_pitch = Log(tmp(1));
    this_stats.sum_pov += accurate_pov;
    this_stats.sum_log_pitch_pov += accurate_pov * log_pitch;
  }
}
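The incremental path of this function can be illustrated with plain vectors. In this sketch, WindowStats, Accumulate and Slide are hypothetical stand-ins for NormalizationStats and the two code paths of UpdateNormalizationStats(); the per-frame POV values are taken as given rather than computed with NccfToPov(). As in the original, the incremental update is valid only when the previous frame's statistics were computed from the same source state (same cur_num_frames and input_finished), a check the sketch leaves to the caller.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two sums kept in NormalizationStats.
struct WindowStats {
  double sum_pov = 0.0;
  double sum_log_pitch_pov = 0.0;
};

// From-scratch accumulation over [begin, end), like the fallback loop.
WindowStats Accumulate(const std::vector<double> &pov,
                       const std::vector<double> &log_pitch,
                       int begin, int end) {
  WindowStats s;
  for (int f = begin; f < end; ++f) {
    s.sum_pov += pov[f];
    s.sum_log_pitch_pov += pov[f] * log_pitch[f];
  }
  return s;
}

// Incremental update when the window moves from [pb, pe) to [b, e);
// as in the fast path, each boundary may advance by at most one frame.
WindowStats Slide(WindowStats s, const std::vector<double> &pov,
                  const std::vector<double> &log_pitch,
                  int pb, int pe, int b, int e) {
  if (b != pb) {
    assert(b == pb + 1);
    s.sum_pov -= pov[pb];  // frame leaving on the left
    s.sum_log_pitch_pov -= pov[pb] * log_pitch[pb];
  }
  if (e != pe) {
    assert(e == pe + 1);
    s.sum_pov += pov[pe];  // frame entering on the right
    s.sum_log_pitch_pov += pov[pe] * log_pitch[pe];
  }
  return s;
}
```

Sliding [0, 3) to [1, 4) this way gives the same sums, up to rounding, as accumulating [1, 4) from scratch, but touches only two frames. In the class, the from-scratch loop is the fallback whenever the previous entry is stale.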

Member Data Documentation

◆ delta_feature_noise_

std::vector<BaseFloat> delta_feature_noise_
private

Definition at line 379 of file pitch-functions.h.

Referenced by OnlineProcessPitch::GetDeltaPitchFeature().

◆ dim_

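int32 dim_
private

The output feature dimension: the number of post-processed features enabled in the options, computed in the constructor.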

◆ normalization_stats_

std::vector<NormalizationStats> normalization_stats_
private

Per-frame normalization statistics, kept up to date by UpdateNormalizationStats().

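◆ opts_

ProcessPitchOptions opts_
private

The post-processing options passed to the constructor.

◆ src_

OnlineFeatureInterface* src_
private

The source of raw (NCCF, pitch) features; the constructor checks that it has dimension kRawFeatureDim.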


The documentation for this class was generated from the following files:

pitch-functions.h
pitch-functions.cc