OptimizeLbfgs< Real > Class Template Reference

#include <optimization.h>

Public Member Functions

 OptimizeLbfgs (const VectorBase< Real > &x, const LbfgsOptions &opts)
 Initializer takes the starting value of x. More...
 
const VectorBase< Real > & GetValue (Real *objf_value=NULL) const
 This returns the value of the variable x that has the best objective function so far, and the corresponding objective function value if requested. More...
 
const VectorBase< Real > & GetProposedValue () const
 This returns the value at which the function wants us to compute the objective function and gradient. More...
 
Real RecentStepLength () const
 Returns the average magnitude of the last n steps (but not more than the number we have stored). More...
 
void DoStep (Real function_value, const VectorBase< Real > &gradient)
 The user calls this function to provide the class with the function and gradient info at the point GetProposedValue(). More...
 
void DoStep (Real function_value, const VectorBase< Real > &gradient, const VectorBase< Real > &diag_approx_2nd_deriv)
 The user can call this version of DoStep() if it is desired to set some kind of approximate Hessian on this iteration. More...
 

Private Types

enum  ComputationState { kBeforeStep, kWithinStep }
 "compute p_k <-- - H_k \delta f_k" (i.e. Algorithm 7.4). More...
 
enum  { kWolfeI, kWolfeII, kNone }
 

Private Member Functions

 KALDI_DISALLOW_COPY_AND_ASSIGN (OptimizeLbfgs)
 
MatrixIndexT Dim ()
 
MatrixIndexT M ()
 
SubVector< Real > Y (MatrixIndexT i)
 
SubVector< Real > S (MatrixIndexT i)
 
bool AcceptStep (Real function_value, const VectorBase< Real > &gradient)
 
void Restart (const VectorBase< Real > &x, Real function_value, const VectorBase< Real > &gradient)
 
void ComputeNewDirection (Real function_value, const VectorBase< Real > &gradient)
 
void ComputeHifNeeded (const VectorBase< Real > &gradient)
 
void StepSizeIteration (Real function_value, const VectorBase< Real > &gradient)
 
void RecordStepLength (Real s)
 

Private Attributes

LbfgsOptions opts_
 
SignedMatrixIndexT k_
 
ComputationState computation_state_
 
bool H_was_set_
 
Vector< Real > x_
 
Vector< Real > new_x_
 
Vector< Real > best_x_
 
Vector< Real > deriv_
 
Vector< Real > temp_
 
Real f_
 
Real best_f_
 
Real d_
 
int num_wolfe_i_failures_
 
int num_wolfe_ii_failures_
 
enum kaldi::OptimizeLbfgs:: { ... }  last_failure_type_
 
Vector< Real > H_
 
Matrix< Real > data_
 
Vector< Real > rho_
 
std::vector< Real > step_lengths_
 

Detailed Description

template<typename Real>
class kaldi::OptimizeLbfgs< Real >

Definition at line 121 of file optimization.h.
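
This page carries no prose overview, but the public interface above implies a user-driven loop: the class proposes a point via GetProposedValue(), the caller evaluates the objective function and its gradient there and passes them back through DoStep(), and GetValue() returns the best point seen so far. Below is a minimal sketch of that loop, under stated assumptions (a toy quadratic objective; the function name ExampleLbfgsLoop and the constants dim/max_iters/tolerance are illustrative, not part of Kaldi); it is a sketch, not code taken from the Kaldi sources.

#include "optimization.h"   // included as shown at the top of this page

// Minimal driving loop for OptimizeLbfgs, using the toy objective
// f(x) = 0.5 * x^T x, whose gradient is simply x.
void ExampleLbfgsLoop() {
  using namespace kaldi;
  const int dim = 10, max_iters = 50;        // illustrative constants
  LbfgsOptions opts;
  opts.minimize = true;                      // we are minimizing
  Vector<double> x0(dim);
  x0.Set(1.0);                               // arbitrary starting point
  OptimizeLbfgs<double> lbfgs(x0, opts);
  for (int iter = 0; iter < max_iters; iter++) {
    const VectorBase<double> &x = lbfgs.GetProposedValue();
    double objf = 0.5 * VecVec(x, x);        // objective at the proposed point
    Vector<double> gradient(dim);
    gradient.CopyFromVec(x);                 // gradient of 0.5 * x^T x is x
    lbfgs.DoStep(objf, gradient);            // hand back value and gradient
    if (lbfgs.RecentStepLength() < 1.0e-06)
      break;                                 // documented convergence test
  }
  double best_objf;
  Vector<double> best_x(dim);
  best_x.CopyFromVec(lbfgs.GetValue(&best_objf));  // best point and its value
}

Each DoStep() call either begins a new L-BFGS iteration (ComputeNewDirection()) or continues the current line search (StepSizeIteration()), so several objective evaluations per accepted step are normal.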

Member Enumeration Documentation

◆ anonymous enum

anonymous enum
private
Enumerator
kWolfeI 
kWolfeII 
kNone 

Definition at line 223 of file optimization.h.

◆ ComputationState

enum ComputationState
private

"compute p_k <-- - H_k \delta f_k" (i.e. Algorithm 7.4).

Enumerator
kBeforeStep 
kWithinStep 

Definition at line 173 of file optimization.h.

173  {
174  kBeforeStep,
175  kWithinStep, // This means we're within the step-size computation, and
176  // have not yet done the 1st function evaluation.
177  };

Constructor & Destructor Documentation

◆ OptimizeLbfgs()

OptimizeLbfgs ( const VectorBase< Real > &  x,
const LbfgsOptions &  opts 
)

Initializer takes the starting value of x.

Definition at line 35 of file optimization.cc.

References OptimizeLbfgs< Real >::best_f_, OptimizeLbfgs< Real >::best_x_, OptimizeLbfgs< Real >::data_, OptimizeLbfgs< Real >::deriv_, VectorBase< Real >::Dim(), OptimizeLbfgs< Real >::f_, KALDI_ASSERT, LbfgsOptions::m, LbfgsOptions::minimize, OptimizeLbfgs< Real >::new_x_, OptimizeLbfgs< Real >::rho_, OptimizeLbfgs< Real >::temp_, and OptimizeLbfgs< Real >::x_.

36  :
37  opts_(opts), k_(0), computation_state_(kBeforeStep), H_was_set_(false) {
38  KALDI_ASSERT(opts.m > 0); // dimension.
39  MatrixIndexT dim = x.Dim();
40  KALDI_ASSERT(dim > 0);
41  x_ = x; // this is the value of x_k
42  new_x_ = x; // this is where we'll evaluate the function next.
43  deriv_.Resize(dim);
44  temp_.Resize(dim);
45  data_.Resize(2 * opts.m, dim);
46  rho_.Resize(opts.m);
47  // Just set f_ to some invalid value, as we haven't yet set it.
48  f_ = (opts.minimize ? 1 : -1 ) * std::numeric_limits<Real>::infinity();
49  best_f_ = f_;
50  best_x_ = x_;
51 }

Member Function Documentation

◆ AcceptStep()

bool AcceptStep ( Real  function_value,
const VectorBase< Real > &  gradient 
)
private

Definition at line 173 of file optimization.cc.

References VectorBase< Real >::AddVec(), VectorBase< Real >::CopyFromVec(), OptimizeLbfgs< Real >::deriv_, OptimizeLbfgs< Real >::f_, OptimizeLbfgs< Real >::k_, KALDI_VLOG, LbfgsOptions::m, LbfgsOptions::minimize, OptimizeLbfgs< Real >::new_x_, VectorBase< Real >::Norm(), OptimizeLbfgs< Real >::opts_, OptimizeLbfgs< Real >::RecordStepLength(), OptimizeLbfgs< Real >::rho_, OptimizeLbfgs< Real >::S(), kaldi::VecVec(), OptimizeLbfgs< Real >::x_, and OptimizeLbfgs< Real >::Y().

Referenced by OptimizeLbfgs< Real >::StepSizeIteration().

174  {
175  // Save s_k = x_{k+1} - x_{k}, and y_k = \nabla f_{k+1} - \nabla f_k.
176  SubVector<Real> s = S(k_), y = Y(k_);
177  s.CopyFromVec(new_x_);
178  s.AddVec(-1.0, x_); // s = new_x_ - x_.
179  y.CopyFromVec(gradient);
180  y.AddVec(-1.0, deriv_); // y = gradient - deriv_.
181 
182  // Warning: there is a division in the next line. This could
183  // generate inf or nan, but this wouldn't necessarily be an error
184  // at this point because for zero step size or derivative we should
185  // terminate the iterations. But this is up to the calling code.
186  Real prod = VecVec(y, s);
187  rho_(k_ % opts_.m) = 1.0 / prod;
188  Real len = s.Norm(2.0);
189 
190  if ((opts_.minimize && prod <= 1.0e-20) || (!opts_.minimize && prod >= -1.0e-20)
191  || len == 0.0)
192  return false; // This will force restart.
193 
194  KALDI_VLOG(3) << "Accepted step; length was " << len
195  << ", prod was " << prod;
196  RecordStepLength(len);
197 
198  // store x_{k+1} and the function value f_{k+1}.
199  x_.CopyFromVec(new_x_);
200  f_ = function_value;
201  k_++;
202 
203  return true; // We successfully accepted the step.
204 }

◆ ComputeHifNeeded()

void ComputeHifNeeded ( const VectorBase< Real > &  gradient)
private

Definition at line 70 of file optimization.cc.

References LbfgsOptions::first_step_impr, LbfgsOptions::first_step_learning_rate, LbfgsOptions::first_step_length, OptimizeLbfgs< Real >::H_, OptimizeLbfgs< Real >::H_was_set_, OptimizeLbfgs< Real >::k_, KALDI_ASSERT, KALDI_ISINF, KALDI_ISNAN, KALDI_WARN, LbfgsOptions::minimize, VectorBase< Real >::Norm(), OptimizeLbfgs< Real >::opts_, OptimizeLbfgs< Real >::S(), kaldi::VecVec(), OptimizeLbfgs< Real >::x_, and OptimizeLbfgs< Real >::Y().

Referenced by OptimizeLbfgs< Real >::ComputeNewDirection().

70  {
71  if (k_ == 0) {
72  if (H_.Dim() == 0) {
73  // H was never set up. Set it up for the first time.
74  Real learning_rate;
75  if (opts_.first_step_length > 0.0) { // this takes
76  // precedence over first_step_learning_rate, if set.
77  // We are setting up H for the first time.
78  Real gradient_length = gradient.Norm(2.0);
79  learning_rate = (gradient_length > 0.0 ?
80  opts_.first_step_length / gradient_length :
81  1.0);
82  } else if (opts_.first_step_impr > 0.0) {
83  Real gradient_length = gradient.Norm(2.0);
84  learning_rate = (gradient_length > 0.0 ?
85  opts_.first_step_impr / (gradient_length * gradient_length) :
86  1.0);
87  } else {
88  learning_rate = opts_.first_step_learning_rate;
89  }
90  H_.Resize(x_.Dim());
91  KALDI_ASSERT(learning_rate > 0.0);
92  H_.Set(opts_.minimize ? learning_rate : -learning_rate);
93  }
94  } else { // k_ > 0
95  if (!H_was_set_) { // The user never specified an approximate
96  // diagonal inverse Hessian.
97  // Set it using formula 7.20: H_k^{(0)} = \gamma_k I, where
98  // \gamma_k = s_{k-1}^T y_{k-1} / y_{k-1}^T y_{k-1}
99  SubVector<Real> y_km1 = Y(k_-1);
100  double gamma_k = VecVec(S(k_-1), y_km1) / VecVec(y_km1, y_km1);
101  if (KALDI_ISNAN(gamma_k) || KALDI_ISINF(gamma_k)) {
102  KALDI_WARN << "NaN encountered in L-BFGS (already converged?)";
103  gamma_k = (opts_.minimize ? 1.0 : -1.0);
104  }
105  H_.Set(gamma_k);
106  }
107  }
108 }
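
Written out, the default scaling applied above for the initial inverse-Hessian approximation (formula 7.20 of Nocedal & Wright, as cited in the code comments) is:

    H_k^{(0)} = \gamma_k I,   where   \gamma_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}

with the sign flipped when maximizing; on the very first iteration (k_ == 0) the diagonal is instead set from first_step_length, first_step_impr, or first_step_learning_rate, in that order of precedence, as the code above shows.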

◆ ComputeNewDirection()

void ComputeNewDirection ( Real  function_value,
const VectorBase< Real > &  gradient 
)
private

Definition at line 114 of file optimization.cc.

References VectorBase< Real >::AddVec(), VectorBase< Real >::AddVecVec(), OptimizeLbfgs< Real >::computation_state_, OptimizeLbfgs< Real >::ComputeHifNeeded(), LbfgsOptions::d, OptimizeLbfgs< Real >::d_, OptimizeLbfgs< Real >::deriv_, OptimizeLbfgs< Real >::f_, OptimizeLbfgs< Real >::H_, OptimizeLbfgs< Real >::k_, KALDI_ASSERT, KALDI_WARN, OptimizeLbfgs< Real >::kBeforeStep, OptimizeLbfgs< Real >::kNone, OptimizeLbfgs< Real >::kWithinStep, OptimizeLbfgs< Real >::last_failure_type_, OptimizeLbfgs< Real >::M(), LbfgsOptions::minimize, OptimizeLbfgs< Real >::new_x_, OptimizeLbfgs< Real >::num_wolfe_i_failures_, OptimizeLbfgs< Real >::num_wolfe_ii_failures_, OptimizeLbfgs< Real >::opts_, OptimizeLbfgs< Real >::rho_, OptimizeLbfgs< Real >::S(), VectorBase< Real >::SetZero(), kaldi::VecVec(), OptimizeLbfgs< Real >::x_, and OptimizeLbfgs< Real >::Y().

Referenced by OptimizeLbfgs< Real >::DoStep(), OptimizeLbfgs< Real >::Restart(), and OptimizeLbfgs< Real >::StepSizeIteration().

115  {
 116  KALDI_ASSERT(computation_state_ == kBeforeStep);
 117  SignedMatrixIndexT m = M(), k = k_;
118  ComputeHifNeeded(gradient);
119  // The rest of this is computing p_k <-- - H_k \nabla f_k using Algorithm
120  // 7.4 of N&W.
121  Vector<Real> &q(deriv_), &r(new_x_); // Use deriv_ as a temporary place to put
 122  // q, and new_x_ as a temporary place to put r.
123  // The if-statement below is just to get rid of spurious warnings from
124  // valgrind about memcpy source and destination overlap, since sometimes q and
125  // gradient are the same variable.
126  if (&q != &gradient)
127  q.CopyFromVec(gradient); // q <-- \nabla f_k.
128  Vector<Real> alpha(m);
129  // for i = k - 1, k - 2, ... k - m
130  for (SignedMatrixIndexT i = k - 1;
131  i >= std::max(k - m, static_cast<SignedMatrixIndexT>(0));
132  i--) {
133  alpha(i % m) = rho_(i % m) * VecVec(S(i), q); // \alpha_i <-- \rho_i s_i^T q.
134  q.AddVec(-alpha(i % m), Y(i)); // q <-- q - \alpha_i y_i
135  }
136  r.SetZero();
137  r.AddVecVec(1.0, H_, q, 0.0); // r <-- H_k^{(0)} q.
138  // for k = k - m, k - m + 1, ... , k - 1
139  for (SignedMatrixIndexT i = std::max(k - m, static_cast<SignedMatrixIndexT>(0));
140  i < k;
141  i++) {
142  Real beta = rho_(i % m) * VecVec(Y(i), r); // \beta <-- \rho_i y_i^T r
143  r.AddVec(alpha(i % m) - beta, S(i)); // r <-- r + s_i (\alpha_i - \beta)
144  }
145 
146  { // TEST. Note, -r will be the direction.
147  Real dot = VecVec(gradient, r);
148  if ((opts_.minimize && dot < 0) || (!opts_.minimize && dot > 0))
149  KALDI_WARN << "Step direction has the wrong sign! Routine will fail.";
150  }
151 
152  // Now we're out of Alg. 7.4 and back into Alg. 7.5.
153  // Alg. 7.4 returned r (using new_x_ as the location), and with \alpha_k = 1
154  // as the initial guess, we're setting x_{k+1} = x_k + \alpha_k p_k, with
 155  // p_k = -r [hence the statement new_x_.Scale(-1.0)], and \alpha_k = 1.
156  // This is the first place we'll get the user to evaluate the function;
157  // any backtracking (or acceptance of that step) occurs inside StepSizeIteration.
158  // We're still within iteration k; we haven't yet finalized the step size.
159  new_x_.Scale(-1.0);
160  new_x_.AddVec(1.0, x_);
161  if (&deriv_ != &gradient)
162  deriv_.CopyFromVec(gradient);
163  f_ = function_value;
 164  d_ = opts_.d;
 165  num_wolfe_i_failures_ = 0;
 166  num_wolfe_ii_failures_ = 0;
 167  last_failure_type_ = kNone;
 168  computation_state_ = kWithinStep;
 169 }

◆ Dim()

MatrixIndexT Dim ( )
inlineprivate

Definition at line 179 of file optimization.h.

179 { return x_.Dim(); }

◆ DoStep() [1/2]

void DoStep ( Real  function_value,
const VectorBase< Real > &  gradient 
)

The user calls this function to provide the class with the function and gradient info at the point GetProposedValue().

If this point is outside the constraints you can set function_value to {+infinity,-infinity} for {minimization,maximization} problems. In this case the gradient, and also the second derivative (if you call the second overloaded version of this function) will be ignored.
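
Continuing the loop sketch from the Detailed Description above, the constrained (minimization) case might look like the following; InBounds() and ComputeObjfAndGradient() are hypothetical placeholders for the caller's own code, and <limits> is assumed to be included:

  const VectorBase<double> &x = lbfgs.GetProposedValue();
  Vector<double> gradient(x.Dim());            // stays zero if out of bounds
  double objf;
  if (!InBounds(x)) {                          // hypothetical constraint check
    // Minimizing, so report +infinity; the gradient is ignored in this case.
    objf = std::numeric_limits<double>::infinity();
  } else {
    objf = ComputeObjfAndGradient(x, &gradient);  // hypothetical evaluation
  }
  lbfgs.DoStep(objf, gradient);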

Definition at line 383 of file optimization.cc.

References OptimizeLbfgs< Real >::best_f_, OptimizeLbfgs< Real >::best_x_, OptimizeLbfgs< Real >::computation_state_, OptimizeLbfgs< Real >::ComputeNewDirection(), OptimizeLbfgs< Real >::kBeforeStep, LbfgsOptions::minimize, OptimizeLbfgs< Real >::new_x_, OptimizeLbfgs< Real >::opts_, and OptimizeLbfgs< Real >::StepSizeIteration().

Referenced by kaldi::nnet2::CombineNnets(), kaldi::nnet2::CombineNnetsA(), LogisticRegression::DoStep(), OptimizeLbfgs< Real >::DoStep(), FastNnetCombiner::FastNnetCombiner(), kaldi::nnet2::ShrinkNnet(), and kaldi::UnitTestLbfgs().

384  {
385  if (opts_.minimize ? function_value < best_f_ : function_value > best_f_) {
386  best_f_ = function_value;
387  best_x_.CopyFromVec(new_x_);
388  }
 389  if (computation_state_ == kBeforeStep)
 390  ComputeNewDirection(function_value, gradient);
391  else // kWithinStep{1,2,3}
392  StepSizeIteration(function_value, gradient);
393 }

◆ DoStep() [2/2]

void DoStep ( Real  function_value,
const VectorBase< Real > &  gradient,
const VectorBase< Real > &  diag_approx_2nd_deriv 
)

The user can call this version of DoStep() if it is desired to set some kind of approximate Hessian on this iteration.

Note: it is a prerequisite that diag_approx_2nd_deriv must be strictly positive (minimizing), or negative (maximizing).
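
For example (continuing the same sketch, minimization case), with a hypothetical diagonal curvature estimate diag_hess whose entries are all strictly positive, as required:

  Vector<double> diag_hess(gradient.Dim());
  diag_hess.Set(2.0);                          // strictly positive when minimizing
  // As the code below shows, this copies diag_hess into H_, inverts it
  // elementwise, and then proceeds exactly like the two-argument DoStep().
  lbfgs.DoStep(objf, gradient, diag_hess);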

Definition at line 396 of file optimization.cc.

References OptimizeLbfgs< Real >::best_f_, OptimizeLbfgs< Real >::best_x_, OptimizeLbfgs< Real >::DoStep(), OptimizeLbfgs< Real >::H_, OptimizeLbfgs< Real >::H_was_set_, KALDI_ASSERT, VectorBase< Real >::Max(), VectorBase< Real >::Min(), LbfgsOptions::minimize, OptimizeLbfgs< Real >::new_x_, and OptimizeLbfgs< Real >::opts_.

398  {
399  if (opts_.minimize ? function_value < best_f_ : function_value > best_f_) {
400  best_f_ = function_value;
401  best_x_.CopyFromVec(new_x_);
402  }
403  if (opts_.minimize) {
404  KALDI_ASSERT(diag_approx_2nd_deriv.Min() > 0.0);
405  } else {
406  KALDI_ASSERT(diag_approx_2nd_deriv.Max() < 0.0);
407  }
408  H_was_set_ = true;
409  H_.CopyFromVec(diag_approx_2nd_deriv);
410  H_.InvertElements();
411  DoStep(function_value, gradient);
412 }

◆ GetProposedValue()

const VectorBase<Real>& GetProposedValue ( ) const
inline

This returns the value at which the function wants us to compute the objective function and gradient.

Definition at line 134 of file optimization.h.

References KALDI_DISALLOW_COPY_AND_ASSIGN.

Referenced by kaldi::nnet2::CombineNnets(), kaldi::nnet2::CombineNnetsA(), LogisticRegression::DoStep(), FastNnetCombiner::FastNnetCombiner(), kaldi::nnet2::ShrinkNnet(), and kaldi::UnitTestLbfgs().

134 { return new_x_; }

◆ GetValue()

const VectorBase< Real > & GetValue ( Real *  objf_value = NULL) const

This returns the value of the variable x that has the best objective function so far, and the corresponding objective function value if requested.

This would typically be called only at the end.

Definition at line 416 of file optimization.cc.

References OptimizeLbfgs< Real >::best_f_, and OptimizeLbfgs< Real >::best_x_.

Referenced by kaldi::nnet2::CombineNnets(), kaldi::nnet2::CombineNnetsA(), FastNnetCombiner::FastNnetCombiner(), kaldi::nnet2::ShrinkNnet(), LogisticRegression::TrainParameters(), and kaldi::UnitTestLbfgs().

416  {
417  if (objf_value != NULL) *objf_value = best_f_;
418  return best_x_;
419 }

◆ KALDI_DISALLOW_COPY_AND_ASSIGN()

KALDI_DISALLOW_COPY_AND_ASSIGN ( OptimizeLbfgs< Real >  )
private

◆ M()

MatrixIndexT M ( )
inlineprivate

Definition at line 180 of file optimization.h.

Referenced by OptimizeLbfgs< Real >::ComputeNewDirection(), and kaldi::LinearCgd().

180 { return opts_.m; }

◆ RecentStepLength()

Real RecentStepLength ( ) const

Returns the average magnitude of the last n steps (but not more than the number we have stored).

Before we have taken any steps, returns +infinity. Note: if the most recent step length was 0, it returns 0, regardless of the other step lengths. This makes it suitable as a convergence test (else we'd generate NaN's).
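
A typical use, continuing the loop sketch above (the tolerance is illustrative):

  // +infinity before any step has been taken, 0.0 after zero-length steps
  // (repeated restarts), so this comparison never involves NaN.
  if (lbfgs.RecentStepLength() < 1.0e-06)
    break;   // treat a tiny average recent step as convergence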

Definition at line 55 of file optimization.cc.

References OptimizeLbfgs< Real >::step_lengths_.

Referenced by kaldi::UnitTestLbfgs().

55  {
56  size_t n = step_lengths_.size();
57  if (n == 0) return std::numeric_limits<Real>::infinity();
58  else {
59  if (n >= 2 && step_lengths_[n-1] == 0.0 && step_lengths_[n-2] == 0.0)
60  return 0.0; // two zeros in a row means repeated restarts, which is
61  // a loop. Short-circuit this by returning zero.
62  Real avg = 0.0;
63  for (size_t i = 0; i < n; i++)
64  avg += step_lengths_[i] / n;
65  return avg;
66  }
67 }

◆ RecordStepLength()

void RecordStepLength ( Real  s)
private

Definition at line 207 of file optimization.cc.

References LbfgsOptions::avg_step_length, OptimizeLbfgs< Real >::opts_, and OptimizeLbfgs< Real >::step_lengths_.

Referenced by OptimizeLbfgs< Real >::AcceptStep(), and OptimizeLbfgs< Real >::Restart().

207  {
208  step_lengths_.push_back(s);
209  if (step_lengths_.size() > static_cast<size_t>(opts_.avg_step_length))
210  step_lengths_.erase(step_lengths_.begin(), step_lengths_.begin() + 1);
211 }

◆ Restart()

void Restart ( const VectorBase< Real > &  x,
Real  function_value,
const VectorBase< Real > &  gradient 
)
private

Definition at line 215 of file optimization.cc.

References VectorBase< Real >::AddVec(), OptimizeLbfgs< Real >::computation_state_, OptimizeLbfgs< Real >::ComputeNewDirection(), VectorBase< Real >::CopyFromVec(), OptimizeLbfgs< Real >::f_, OptimizeLbfgs< Real >::k_, OptimizeLbfgs< Real >::kBeforeStep, OptimizeLbfgs< Real >::new_x_, VectorBase< Real >::Norm(), OptimizeLbfgs< Real >::RecordStepLength(), OptimizeLbfgs< Real >::temp_, and OptimizeLbfgs< Real >::x_.

Referenced by OptimizeLbfgs< Real >::StepSizeIteration().

217  {
218  // Note: we will consider restarting (the transition of x_ -> x)
219  // as a step, even if it has zero step size. This is necessary in
220  // order for convergence to be detected.
221  {
222  Vector<Real> &diff(temp_);
223  diff.CopyFromVec(x);
224  diff.AddVec(-1.0, x_);
225  RecordStepLength(diff.Norm(2.0));
226  }
227  k_ = 0; // Restart the iterations! [But note that the Hessian,
228  // whatever it was, stays as before.]
229  if (&x_ != &x)
230  x_.CopyFromVec(x);
231  new_x_.CopyFromVec(x);
232  f_ = f;
 233  computation_state_ = kBeforeStep;
 234  ComputeNewDirection(f, gradient);
235 }

◆ S()

SubVector<Real> S ( MatrixIndexT  i)
inlineprivate

Definition at line 184 of file optimization.h.

References data_.

Referenced by OptimizeLbfgs< Real >::AcceptStep(), OptimizeLbfgs< Real >::ComputeHifNeeded(), and OptimizeLbfgs< Real >::ComputeNewDirection().

184  {
185  return SubVector<Real>(data_, (i % M()) * 2 + 1); // vector s_i
186  }

◆ StepSizeIteration()

void StepSizeIteration ( Real  function_value,
const VectorBase< Real > &  gradient 
)
private

Definition at line 238 of file optimization.cc.

References OptimizeLbfgs< Real >::AcceptStep(), LbfgsOptions::c1, LbfgsOptions::c2, OptimizeLbfgs< Real >::computation_state_, OptimizeLbfgs< Real >::ComputeNewDirection(), OptimizeLbfgs< Real >::d_, OptimizeLbfgs< Real >::deriv_, OptimizeLbfgs< Real >::f_, OptimizeLbfgs< Real >::k_, KALDI_VLOG, OptimizeLbfgs< Real >::kBeforeStep, OptimizeLbfgs< Real >::kWolfeI, OptimizeLbfgs< Real >::kWolfeII, OptimizeLbfgs< Real >::last_failure_type_, LbfgsOptions::max_line_search_iters, LbfgsOptions::minimize, OptimizeLbfgs< Real >::new_x_, OptimizeLbfgs< Real >::num_wolfe_i_failures_, OptimizeLbfgs< Real >::num_wolfe_ii_failures_, OptimizeLbfgs< Real >::opts_, OptimizeLbfgs< Real >::Restart(), OptimizeLbfgs< Real >::temp_, kaldi::VecVec(), and OptimizeLbfgs< Real >::x_.

Referenced by OptimizeLbfgs< Real >::DoStep().

239  {
240  KALDI_VLOG(3) << "In step size iteration, function value changed "
241  << f_ << " to " << function_value;
242 
243  // We're in some part of the backtracking, and the user is providing
244  // the objective function value and gradient.
245  // We're checking two conditions: Wolfe i) [the Armijo rule] and
246  // Wolfe ii).
247 
248  // The Armijo rule (when minimizing) is:
249  // f(k_k + \alpha_k p_k) <= f(x_k) + c_1 \alpha_k p_k^T \nabla f(x_k), where
250  // \nabla means the derivative.
251  // Below, "temp" is the RHS of this equation, where (\alpha_k p_k) equals
252  // (new_x_ - x_); we don't store \alpha or p_k separately, they are implicit
253  // as the difference new_x_ - x_.
254 
255  // Below, pf is \alpha_k p_k^T \nabla f(x_k).
256  Real pf = VecVec(new_x_, deriv_) - VecVec(x_, deriv_);
257  Real temp = f_ + opts_.c1 * pf;
258 
259  bool wolfe_i_ok;
260  if (opts_.minimize) wolfe_i_ok = (function_value <= temp);
261  else wolfe_i_ok = (function_value >= temp);
262 
263  // Wolfe condition ii) can be written as:
264  // p_k^T \nabla f(x_k + \alpha_k p_k) >= c_2 p_k^T \nabla f(x_k)
265  // p2f equals \alpha_k p_k^T \nabla f(x_k + \alpha_k p_k), where
266  // (\alpha_k p_k^T) is (new_x_ - x_).
267  // Note that in our version of Wolfe condition (ii) we have an extra
268  // factor alpha, which doesn't affect anything.
269  Real p2f = VecVec(new_x_, gradient) - VecVec(x_, gradient);
270  //eps = (sizeof(Real) == 4 ? 1.0e-05 : 1.0e-10) *
271  //(std::abs(p2f) + std::abs(pf));
272  bool wolfe_ii_ok;
273  if (opts_.minimize) wolfe_ii_ok = (p2f >= opts_.c2 * pf);
274  else wolfe_ii_ok = (p2f <= opts_.c2 * pf);
275 
 276  enum { kDecrease, kNoChange } d_action; // What to do with d_: leave it alone,
277  // or take the square root.
278  enum { kAccept, kDecreaseStep, kIncreaseStep, kRestart } iteration_action;
279  // What we'll do in the overall iteration: accept this value, DecreaseStep
280  // (reduce the step size), IncreaseStep (increase the step size), or kRestart
281  // (set k back to zero). Generally when we can't get both conditions to be
282  // true with a reasonable period of time, it makes sense to restart, because
283  // probably we've almost converged and got into numerical issues; from here
284  // we'll just produced NaN's. Restarting is a safe thing to do and the outer
285  // code will quickly detect convergence.
286 
287  d_action = kNoChange; // the default.
288 
289  if (wolfe_i_ok && wolfe_ii_ok) {
290  iteration_action = kAccept;
291  d_action = kNoChange; // actually doesn't matter, it'll get reset.
292  } else if (!wolfe_i_ok) {
293  // If wolfe i) [the Armijo rule] failed then we went too far (or are
294  // meeting numerical problems).
295  if (last_failure_type_ == kWolfeII) { // Last time we failed it was Wolfe ii).
296  // When we switch between them we decrease d.
297  d_action = kDecrease;
298  }
299  iteration_action = kDecreaseStep;
 300  last_failure_type_ = kWolfeI;
 301  num_wolfe_i_failures_++;
 302  } else if (!wolfe_ii_ok) {
303  // Curvature condition failed -> we did not go far enough.
304  if (last_failure_type_ == kWolfeI) // switching between wolfe i and ii failures->
305  d_action = kDecrease; // decrease value of d.
306  iteration_action = kIncreaseStep;
 307  last_failure_type_ = kWolfeII;
 308  num_wolfe_ii_failures_++;
 309  }
310 
 311  // Test whether we've been switching too many times between wolfe i) and ii)
312  // failures, or overall have an excessive number of failures. We just give up
313  // and restart L-BFGS. Probably we've almost converged.
 314  if (num_wolfe_i_failures_ + num_wolfe_ii_failures_ >
 315      opts_.max_line_search_iters) {
 316  KALDI_VLOG(2) << "Too many steps in line search -> restarting.";
317  iteration_action = kRestart;
318  }
319 
320  if (d_action == kDecrease)
321  d_ = std::sqrt(d_);
322 
323  KALDI_VLOG(3) << "d = " << d_ << ", iter = " << k_ << ", action = "
324  << (iteration_action == kAccept ? "accept" :
325  (iteration_action == kDecreaseStep ? "decrease" :
326  (iteration_action == kIncreaseStep ? "increase" :
327  "reject")));
328 
329  // Note: even if iteration_action != Restart at this point,
330  // some code below may set it to Restart.
331  if (iteration_action == kAccept) {
332  if (AcceptStep(function_value, gradient)) { // If we did
333  // not detect a problem while accepting the step..
 334  computation_state_ = kBeforeStep;
 335  ComputeNewDirection(function_value, gradient);
336  } else {
337  KALDI_VLOG(2) << "Restarting L-BFGS computation; problem found while "
338  << "accepting step.";
339  iteration_action = kRestart; // We'll have to restart now.
340  }
341  }
342  if (iteration_action == kDecreaseStep || iteration_action == kIncreaseStep) {
343  Real scale = (iteration_action == kDecreaseStep ? 1.0 / d_ : d_);
344  temp_.CopyFromVec(new_x_);
345  new_x_.Scale(scale);
346  new_x_.AddVec(1.0 - scale, x_);
347  if (new_x_.ApproxEqual(temp_, 0.0)) {
348  // Value of new_x_ did not change at all --> we must restart.
349  KALDI_VLOG(3) << "Value of x did not change, when taking step; "
350  << "will restart computation.";
351  iteration_action = kRestart;
352  }
353  if (new_x_.ApproxEqual(temp_, 1.0e-08) &&
354  std::abs(f_ - function_value) < 1.0e-08 *
355  std::abs(f_) && iteration_action == kDecreaseStep) {
356  // This is common and due to roundoff.
357  KALDI_VLOG(3) << "We appear to be backtracking while we are extremely "
358  << "close to the old value; restarting.";
359  iteration_action = kRestart;
360  }
361 
362  if (iteration_action == kDecreaseStep) {
 363  num_wolfe_i_failures_++;
 364  last_failure_type_ = kWolfeI;
 365  } else {
 366  num_wolfe_ii_failures_++;
 367  last_failure_type_ = kWolfeII;
 368  }
369  }
370  if (iteration_action == kRestart) {
371  // We want to restart the computation. If the objf at new_x_ is
372  // better than it was at x_, we'll start at new_x_, else at x_.
373  bool use_newx;
374  if (opts_.minimize) use_newx = (function_value < f_);
375  else use_newx = (function_value > f_);
376  KALDI_VLOG(3) << "Restarting computation.";
377  if (use_newx) Restart(new_x_, function_value, gradient);
378  else Restart(x_, f_, deriv_);
379  }
380 }
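
Written out (for minimization; both inequalities are reversed when maximizing), the two conditions checked above are:

    (i)  Armijo / sufficient decrease:  f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k p_k^T \nabla f(x_k)
    (ii) curvature:                     \alpha_k p_k^T \nabla f(x_k + \alpha_k p_k) \ge c_2 \alpha_k p_k^T \nabla f(x_k)

where, as the comments note, condition (ii) carries an extra factor of \alpha_k relative to the textbook form; since \alpha_k > 0 this does not change the test.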

◆ Y()

SubVector<Real> Y ( MatrixIndexT  i)
inlineprivate

Definition at line 181 of file optimization.h.

References data_.

Referenced by OptimizeLbfgs< Real >::AcceptStep(), OptimizeLbfgs< Real >::ComputeHifNeeded(), and OptimizeLbfgs< Real >::ComputeNewDirection().

181  {
182  return SubVector<Real>(data_, (i % M()) * 2); // vector y_i
183  }

Member Data Documentation

◆ best_f_

Real best_f_
private

◆ best_x_

Vector<Real> best_x_
private

Definition at line 212 of file optimization.h.

◆ computation_state_

ComputationState computation_state_
private

Definition at line 205 of file optimization.h.

◆ d_

Real d_
private

◆ data_

Matrix<Real> data_
private

Definition at line 228 of file optimization.h.

Referenced by OptimizeLbfgs< Real >::OptimizeLbfgs().

◆ deriv_

Vector<Real> deriv_
private

Definition at line 214 of file optimization.h.

◆ f_

Real f_
private

◆ H_

Vector<Real> H_
private

Definition at line 226 of file optimization.h.

◆ H_was_set_

bool H_was_set_
private

◆ k_

SignedMatrixIndexT k_
private

Definition at line 202 of file optimization.h.

◆ last_failure_type_

enum kaldi::OptimizeLbfgs:: { ... } last_failure_type_
private

◆ new_x_

Vector<Real> new_x_
private

Definition at line 211 of file optimization.h.

◆ num_wolfe_i_failures_

int num_wolfe_i_failures_
private

◆ num_wolfe_ii_failures_

int num_wolfe_ii_failures_
private

◆ opts_

LbfgsOptions opts_
private

Definition at line 201 of file optimization.h.

◆ rho_

Vector<Real> rho_
private

Definition at line 230 of file optimization.h.

◆ step_lengths_

std::vector<Real> step_lengths_
private

Definition at line 232 of file optimization.h.

◆ temp_

Vector<Real> temp_
private

Definition at line 215 of file optimization.h.

◆ x_

Vector<Real> x_
private

Definition at line 210 of file optimization.h.

The documentation for this class was generated from the following files:

optimization.h
optimization.cc