Simulated Annealing


The Simulated Annealing (SA) algorithm is a meta-heuristic, that is, a technique for approximating the global optimum of a function $J(x)$. SA is inspired by the annealing process in metallurgy, where a metal is molded by carefully altering its temperature over time. In this page, you can find a visualization of SA being applied in a clustering application.

Informally, this is how SA works: the algorithm starts with a high temperature and a initial solution $x$. Each iteration, this solution is randomly modified in some way, and the resulting solution is evaluated. If it is better than what we currently have, then great!, just use that instead. However, if the resulting solution is worse than what we currently have, then we still have some probability of accepting that candidate as our current solution; in particular, this probability is affected by the current temperature of the algorithm. Over time, this temperature gets reduced and we become more and more conservative (only accepting better solutions). It is useful to think about the two extremes: if the temperature is $0$, then only better solutions are accepted; if the temperature is $\infty$, then any solution is accepted, no matter how bad.

More formally, the algorithm is divided into inner and outer loops. In each of the $N$ inner loop iterations (line 3), we apply a random perturbation to the current solution $x$ (line 5) and obtain a candidate solution $\hat{x}$. Then, we compare $x$ and $\hat{x}$ (line 6): if $\hat{x}$ is better than $x$ (i.e. if $J(\hat{x}) \leq J(x)$), then we accept $\hat{x}$ as our new current solution $x$ (line 7). But if $\hat{x}$ is worse than $x$, then we only accept $\hat{x}$ with a probability that is proportional to the relative quality of the new solution (i.e. the worse the solution, the lower the probability) and also proportional to a parameter $T$ called temperature (i.e. the higher the temperature, the higher the probability). This probability is captured by the term $e^{(J(x) - J(\hat{x})) / T}$ in line 6 (notice that if $\hat{x}$ is better than $x$, then $J(\hat{x}) \leq J(x)$ and the probability check always returns true).

In each of the $k_{\max}$ outer loop iterations (line 1), the temperature $T$ is reduced in a process called cooling (line 2). This has the effect of making the algorithm more conservative over time: in the beginning the temperature is high, so worse solutions are occasionally accepted; however as the execution progresses, the temperature is reduced and it becomes hard to accept worse solutions.

Finally, the algorithm always checks if a solution $x$ is the best one yet (line 8). If it is, then it is stored as $x_{\min}$ (line 9). This is what the algorithm returns at the end of its execution.


1. $\# \text{clusters}$: , $T_0$: , $k_\text{max}$: , $N$:
2. Should restart clusters that become empty?
3. Random perturbation type:
Single cluster, single coordinate
Single cluster, all coordinates
All clusters, all coordinates

0. Initialize $x \leftarrow \mathbf{0}$, $x_\text{min} = x$
1. For $k = 1 \ldots k_\text{max}$:
2. $T \leftarrow T_0 / \log_2(k + 1)$
3. For $n = 1 \ldots N$:
4. Restart empty clusters
5. Randomly perturb $x$ to obtain $\hat{x}$
6. If $\text{unif}(0, 1) \leq e^{(J(x) - J(\hat{x})) / T}$:
7. $x \leftarrow \hat{x}$
8. If $J(x) \leq J(x_\text{min})$:
9. $x_\text{min} \leftarrow x$

$T$ 0
0 $J(x)$ 0
0 $J(x_{\text{min)}}$0
0 $J(\hat{x})$0