COMPUTING FOR ENGINEERS AND SCIENTISTS (600.105)

Lecture 23: Optimisation Continued And Simulated Annealing

PDF version for printing


Enhancements to 1D optimisation

There are two common approaches to improving the golden section search.

Both improvements revolve around trying to find a more direct way to the minimum.

The two improvements are:

  1. Use parabolic interpolation through the three bracketing points and solve for the minimum point on the parabola to estimate where the next point on the function should be tried.
  2. Use derivative information to estimate where the slope of the function goes to zero (and hence where the minimum is).

Using parabolic interpolation (Brent's method)

Fit a parabola through the three bracketing points f(a), f(b), and f(c).

The point where the slope of the parabola goes to zero is given by the equation:


1
(b - a)2 (f(b) - f(c)) - (b - c)2 (f(b) - f(a))
m = b
 

2
(b - a) (f(b) - f(c)) - (b - c) (f(b) - f(a))

The value m indicates the location where the fitted parabola's slope is zero.

We can then use this location to construct the fourth point from which we select the next three bracketing points.

Hopefully this will result in much faster convergence on the minimum.

The problem is that m is the location where the slope of the parabola approaches zero - this could be a maximum rather than a minimum. Compare the two cases below:

Careful housekeeping is needed to efficiently switch from parabolic interpolation when it is found that it is not working, to the slow but sure golden section search, and back again when conditions are favourable.

The general strategy / heuristic is to fit a parabola through the best three points, accepting this parabolic step if the position m is between the two outer bracketing points and implies a movement that is less than half the movement from the second last step.

This last condition ensures that the parabolic steps are actually converging on something and are not simply "bouncing around" in some non-convergent cycle.

The idea behind measuring the movement relative to the second last step (rather than the last) is that it is better not to "punish" the algorithm for a single bad step if it can make it up on the next one.

If this heuristic isn't met, a golden section step is used instead.

Making use of derivatives

The sign of the derivative at the central bracketing point indicates in which interval the next test point should be taken.

The algorithm: take the derivative at the middle point and at the second best point and linearly interpolate (or extrapolate) to estimate the point where the derivative is zero.

This interpolated point is then used to form a new bracketing interval.

Similar tests for choosing to accept the new step are applied as before:

  1. The new point must be within the two outer bracketing points.
  2. The new point must be in the correct interval.
  3. The step size must be less than half the size of the second last step.

Multidimensional optimisation

The problem: minimise the value of a function f(x,y,z,...) in N-dimensional space.

In 1D optimisation we were able to bracket a minimum.

There is no equivalent concept in higher dimensions that can be readily constructed.

All we can do is provide our algorithm with a guess for the starting point and hope that it can find its way downhill to a minimum.

A number of multidimensional minimisation algorithms are based on applying 1D minimisation methods in various directions in our N-dimensional search space.

The main way that algorithms differ is how they choose the direction to minimise in and when to change direction.

The simplest approach is to minimise the function in each of the coordinate directions in turn, cycling through them until we find that the function stops decreasing.

This is in fact a fairly "dumb" method, but it often works quite well.

Why is it dumb?

Consider a function defined in 2D with a valley at 45 degrees to the x-axis.

If the valley axis is at 45 degrees, but we can step only horizontally or vertically, we end up taking a long and tortuous path to the minimum.

More powerful algorithms determine the "good directions" to search in, taking a more direct route down the narrow valleys.

Some algorithms also determine non-interfering directions with the property that minimising in one of these directions is not spoilt by subsequent minimisation in another.

Matlab provides some minimisation algorithms:

fmin 1D minimisation.
fminu N-dimensional minimisation based on 1D searches in selected directions.
fmins N-dimensional minimisation using the simplex method.

The N-dimensional search algorithms have many, many options.

Simulated annealing

Simulated annealing is an optimisation technique that allows one to work with functions that have many local minima.

You are not guaranteed to find the global minimum, but you have a good chance of getting past most of the local minima that would trap conventional algorithms.

This algorithm was developed by Kirkpatrick, Gelatt, and Vecchi in the late 70's at IBM.

The method arises from an analogy with the way that molten metal cools and anneals.

At high temperatures, the molecules in a liquid move freely.

As the liquid is cooled, thermal mobility is lost.

Eventually the atoms align themselves and form a crystal that is completely ordered.

If the cooling process is slow, allowing ample time for molecules to lose mobility and find a low energy configuration, the resultant crystal is in a state of minimum energy for the system.

If you cool metal very quickly (by quenching it), it does not end up in this minimum energy state - the crystals do not have time to form and the system is left at a high energy level.

In some sense, conventional minimisation algorithms are a bit like quenching - a quick greedy search that stops at a local, not global, minimum.

A system in thermal equilibrium will have atoms at a wide range of energy levels according to the Boltzmann distribution.

Atoms can move from low energy levels to high ones and vice-versa.

This process is emulated in the following algorithm. Given the objective function, E (energy) we wish to minimise, and a temperature, T:

1. Start at some system configuration, then, make random movements from this configuration always accepting movements (i.e. the changes of E, dE) that take you to lower level, and sometimes accepting movements that take you up using a probability given by the Boltzman factor (p=exp^(-(dE/T))).

2. Repeat process 1 sufficient times to give a good sampling probability.

3. Repeat processes 1 and 2 for each of the decreasing T (cooling steps).

There are many other optimisation algorithms including genetic algorithms that breed the optimal solution!.
Copyright © 2000 - 2005
School of Computer Science & Software Engineering
The University of Western Australia
Last modified: 2003-07-25 16:06:54.000000000 +0800