Lecture 14: Monte Carlo Methods II — MCMC & Optimization

Lecture 14: Monte Carlo Methods II — MCMC & Optimization#

Computational Physics — Spring 2026

From Sampling to Solving: MCMC and Simulated Annealing#

In Lectures 12–13, we built a powerful Monte Carlo toolkit:

Random number generation and the Metropolis algorithm (Lecture 12)
MC integration, importance sampling, rejection sampling, and VMC (Lecture 13)

Today we take three major steps forward:

The fundamental problem of statistical mechanics — why MCMC is essential for computing thermal averages, with a hands-on ideal gas simulation
Formal MCMC theory — Markov chains, detailed balance, Metropolis-Hastings, and rigorous diagnostics
Simulated annealing — turning the Metropolis algorithm into a powerful optimization tool

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, ndimage
from matplotlib.colors import ListedColormap

plt.rcParams['figure.figsize'] = [6, 4]
plt.rcParams['font.size'] = 9

print("Ready for MCMC and optimization!")

Ready for MCMC and optimization!

I. MCMC: From Statistical Mechanics to Markov Chains#

Monte Carlo in Statistical Mechanics#

Monte Carlo simulation is the name given to any computer simulation that uses random numbers to simulate a random physical process. Although it finds uses across all of physics, nowhere is it more central than in statistical mechanics — a field fundamentally built on randomness.

The Fundamental Problem#

The central task of statistical mechanics is to compute the thermal average of a quantity $X$ in a system at temperature $T$. A system in thermal equilibrium visits state $i$ (with energy $E_i$) with the Boltzmann probability:

\[P(E_i) = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i}\]

where $\beta = 1/k_B T$. The thermal average is then:

\[\langle X \rangle = \sum_i X_i \, P(E_i)\]

In a few special cases (e.g., quantum harmonic oscillator) we can evaluate this sum exactly. But for most real systems, the sum is over a astronomically large number of states — a single mole of a two-state system has $2^{10^{23}}$ states!

Why Not Just Sample Randomly?#

We could try the MC integration approach from Lecture 13: pick $N$ states at random and average. But there’s a fatal problem — the Boltzmann factor $e^{-\beta E_i}$ is exponentially small for most states (those with $E_i \gg k_B T$). Almost every randomly chosen state contributes essentially nothing to the sum.

The solution is importance sampling combined with MCMC: instead of choosing states uniformly, we construct a Markov chain that visits states with probability $\propto e^{-\beta E_i}$. Then every sample contributes equally to the average!

The Metropolis Algorithm: A Quick Recap#

The Metropolis algorithm (introduced in Lecture 12 for the Ising model) is the workhorse of MCMC. It constructs a Markov chain — a sequence of states where each state depends only on the previous one — that samples from the Boltzmann distribution.

The recipe:

Start in some state $i$
Propose a move to state $j$ (e.g., flip one spin, change one quantum number)
Compute the acceptance probability: $$P_{\text{accept}} = \begin{cases} 1 & \text{if } E_j < E_i \\ e^{-\beta(E_j - E_i)} & \text{if } E_j \geq E_i \end{cases}$$
Accept the move with probability $P_{\text{accept}}$; otherwise stay at $i$
Measure the quantity of interest and repeat from step 2

Why it works: downhill moves (lower energy) are always accepted; uphill moves are accepted with a probability that decreases with the energy cost and increases with temperature. After a long burn-in, the chain samples states according to $P \propto e^{-\beta E}$.

Let’s see this in action on a concrete physics problem before formalizing the theory.

Application: Monte Carlo Simulation of an Ideal Gas#

A particle of mass $m$ in a cubic box of side $L$ has quantum states labeled by three integers $n_x, n_y, n_z = 1, 2, 3, \ldots$ with energies:

\[E(n_x, n_y, n_z) = \frac{\pi^2 \hbar^2}{2mL^2}(n_x^2 + n_y^2 + n_z^2)\]

An ideal gas of $N$ non-interacting atoms has total energy $E = \sum_{i=1}^N E(n_x^i, n_y^i, n_z^i)$.

We use the Metropolis algorithm to compute the internal energy $\langle E \rangle$ at temperature $k_BT$:

State: the set of all quantum numbers $\{n_x^i, n_y^i, n_z^i\}$ for $N$ atoms
Move: pick a random atom and a random direction, increment or decrement its quantum number by 1
Accept/reject: via the Metropolis rule with $\Delta E$ from the change

We work in natural units: $L = 1$, $\hbar = 1$, $m = 1$.

# Monte Carlo simulation of an ideal gas (units: L=1, hbar=1, m=1)
np.random.seed(42)

kbt = 300         # Temperature in natural units
N = 1000          # Number of atoms
steps = 1000000   # MC steps

# Quantum numbers: each atom has (n_x, n_y, n_z), all start at 1
n = np.ones([N, 3], int)

# Initial energy: E = (pi^2/2) * sum(n_x^2 + n_y^2 + n_z^2)
E = np.pi**2 / 2 * np.sum(n**2)
ehist = [E]

# Metropolis MC loop
for k in range(steps):
    # Choose a random atom and random direction (x, y, or z)
    i = np.random.randint(N)
    j = np.random.randint(3)

    # Propose: increment or decrement quantum number by 1
    if np.random.random() < 0.5:
        dn = 1
        dE = (2 * n[i, j] + 1) * np.pi**2 / 2
    else:
        dn = -1
        dE = (-2 * n[i, j] + 1) * np.pi**2 / 2

    # Accept/reject (also reject if n would go below 1)
    if dn == 1 or n[i, j] > 1:
        if np.random.random() < np.exp(-dE / kbt):
            n[i, j] += dn
            E += dE

    ehist.append(E)

# Plot energy trace
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(ehist, lw=0.3, alpha=0.7, color='blue')
axes[0].set_xlabel('MC step')
axes[0].set_ylabel('Total energy E')
axes[0].set_title(f'Ideal Gas MC: N = {N} atoms, $k_BT$ = {kbt}')
axes[0].grid(alpha=0.3)

# Histogram of energy after burn-in
burn = len(ehist) // 5
axes[1].hist(ehist[burn:], bins=60, density=True, alpha=0.7, color='skyblue', edgecolor='black')
axes[1].set_xlabel('Total energy E')
axes[1].set_ylabel('Probability density')
axes[1].set_title('Energy Distribution (after burn-in)')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

E_avg = np.mean(ehist[burn:])
E_exact = 3 * N * kbt / 2  # Classical equipartition: E = (3/2) N k_B T
print(f"MC average energy:   <E> = {E_avg:.1f}")
print(f"Equipartition value: (3/2)NkT = {E_exact:.1f}")
print(f"Relative error: {abs(E_avg - E_exact)/E_exact:.2%}")
print(f"\nThe Metropolis algorithm recovers the equipartition theorem!")

_images/8eedd03f0c6b05e9fce92ab500fd3c95cec1f2a9d6193ddb8a10b897d48480dc.png

MC average energy:   <E> = 477790.9
Equipartition value: (3/2)NkT = 450000.0
Relative error: 6.18%

The Metropolis algorithm recovers the equipartition theorem!

Formalizing MCMC: Markov Chains and Detailed Balance#

The ideal gas simulation above works — but why? Why does the Metropolis algorithm produce samples from the Boltzmann distribution? The answer lies in the mathematical theory of Markov chains.

A Markov chain is a sequence of random states $x_0, x_1, x_2, \ldots$ where the probability of the next state depends only on the current state:

\[P(x_{n+1} \mid x_n, x_{n-1}, \ldots, x_0) = P(x_{n+1} \mid x_n) \equiv T(x_n \to x_{n+1})\]

The transition probability $T(x \to x')$ defines the rules of the chain.

Stationary Distribution#

Under mild conditions (ergodicity, aperiodicity), a Markov chain converges to a unique stationary distribution $\pi(x)$ satisfying:

\[\pi(x') = \sum_x T(x \to x') \, \pi(x)\]

This is the distribution of states after running the chain for a long time, regardless of where we started.

Detailed Balance#

A sufficient (not necessary) condition for $\pi(x)$ to be stationary is detailed balance:

\[\pi(x) \, T(x \to x') = \pi(x') \, T(x' \to x)\]

Proof that detailed balance implies stationarity:

Sum both sides over $x$:

\[\sum_x \pi(x) \, T(x \to x') = \sum_x \pi(x') \, T(x' \to x) = \pi(x') \sum_x T(x' \to x) = \pi(x')\]

where we used $\sum_x T(x' \to x) = 1$ (probability conservation). This is exactly the stationarity condition. $\square$

The MCMC strategy: Design $T(x \to x')$ so that detailed balance is satisfied with $\pi(x)$ equal to the distribution we want to sample from. The Metropolis algorithm is one such design — and the ideal gas simulation above is a specific instance of it!

The Metropolis-Hastings Algorithm#

In Lecture 12, we used the Metropolis algorithm with symmetric proposals. The Metropolis-Hastings algorithm generalizes this to asymmetric proposals.

Algorithm:

From current state $x$, propose $x' \sim q(x' \mid x)$ (proposal distribution)
Compute the acceptance ratio: $$\alpha = \min\!\left(1, \; \frac{\pi(x') \, q(x \mid x')}{\pi(x) \, q(x' \mid x)}\right)$$
Accept $x'$ with probability $\alpha$; otherwise stay at $x$

Special case — Metropolis: When $q$ is symmetric ($q(x' \mid x) = q(x \mid x')$), the proposal terms cancel:

\[\alpha = \min\!\left(1, \; \frac{\pi(x')}{\pi(x)}\right)\]

This is exactly the Metropolis algorithm from Lecture 12!

Proof: Metropolis-Hastings Satisfies Detailed Balance#

The transition probability is $T(x \to x') = q(x' \mid x) \cdot \alpha(x \to x')$. Check detailed balance:

\[\pi(x) \, q(x' \mid x) \, \alpha(x \to x')\]

Case 1: $\frac{\pi(x') q(x \mid x')}{\pi(x) q(x' \mid x)} \leq 1$, so $\alpha(x \to x') = \frac{\pi(x') q(x \mid x')}{\pi(x) q(x' \mid x)}$ and $\alpha(x' \to x) = 1$.

\[\pi(x) \, q(x' \mid x) \cdot \frac{\pi(x') \, q(x \mid x')}{\pi(x) \, q(x' \mid x)} = \pi(x') \, q(x \mid x') \cdot 1 \;\checkmark\]

Case 2: The reverse. By symmetry of the argument, it also holds. $\square$

Demo: Sampling a Bimodal Distribution#

Let’s sample from a mixture of two Gaussians:

\[\pi(x) = 0.4 \, \mathcal{N}(-3, 1) + 0.6 \, \mathcal{N}(3, 0.5)\]

This tests the chain’s ability to jump between well-separated modes.

def target_bimodal(x):
    """Unnormalized target: mixture of two Gaussians."""
    return 0.4 * stats.norm.pdf(x, -3, 1) + 0.6 * stats.norm.pdf(x, 3, 0.5)

def metropolis_hastings(target, n_samples, x0=0.0, proposal_sigma=1.0):
    """Metropolis-Hastings with symmetric Gaussian proposal."""
    samples = np.zeros(n_samples)
    samples[0] = x0
    n_accept = 0

    for i in range(1, n_samples):
        x_current = samples[i-1]
        # Symmetric Gaussian proposal
        x_proposed = x_current + np.random.normal(0, proposal_sigma)

        # Acceptance ratio (proposal terms cancel for symmetric q)
        alpha = min(1.0, target(x_proposed) / target(x_current) )

        if np.random.uniform() < alpha:
            samples[i] = x_proposed
            n_accept += 1
        else:
            samples[i] = x_current

    return samples, n_accept / n_samples

np.random.seed(42)

# Compare three proposal widths
sigmas = [0.3, 2.0, 10.0]
results = {}

for sigma in sigmas:
    samples, acc = metropolis_hastings(target_bimodal, 50000, x0=0.0, proposal_sigma=sigma)
    results[sigma] = (samples, acc)
    print(f"sigma = {sigma:5.1f}: acceptance rate = {acc:.1%}")

sigma =   0.3: acceptance rate = 88.2%
sigma =   2.0: acceptance rate = 39.2%
sigma =  10.0: acceptance rate = 14.8%

fig, axes = plt.subplots(3, 2, figsize=(12, 9))

x_plot = np.linspace(-7, 6, 300)
p_true = target_bimodal(x_plot)

for row, sigma in enumerate(sigmas):
    samples, acc = results[sigma]

    # Left: trace plot (first 3000 steps)
    axes[row, 0].plot(samples[:3000], lw=0.5, alpha=0.7)
    axes[row, 0].set_ylabel('x')
    axes[row, 0].set_title(f'Trace: $\\sigma_q$ = {sigma}, accept = {acc:.0%}')
    axes[row, 0].grid(alpha=0.3)
    if row == 2:
        axes[row, 0].set_xlabel('Step')

    # Right: histogram vs true distribution
    axes[row, 1].hist(samples[5000:], bins=80, density=True, alpha=0.6,
                      color='skyblue', edgecolor='black')
    axes[row, 1].plot(x_plot, p_true, 'r-', lw=2, label='True $\\pi(x)$')
    axes[row, 1].set_ylabel('Density')
    axes[row, 1].set_title(f'Distribution: $\\sigma_q$ = {sigma}')
    axes[row, 1].legend(fontsize=8)
    axes[row, 1].grid(alpha=0.3)
    if row == 2:
        axes[row, 1].set_xlabel('x')

plt.tight_layout()
plt.show()

print("Observations:")
print("  σ = 0.3: High acceptance, but chain gets STUCK in one mode (poor mixing)")
print("  σ = 2.0: Moderate acceptance, good exploration of both modes (Goldilocks!)")
print("  σ = 10:  Low acceptance, chain jumps erratically (wasteful)")
print("\nRule of thumb: aim for 20-50% acceptance rate")

_images/b2af3ef055879c5975b293adb14110e9ce675139ec9f33f72366ceb49f8f71a8.png

Observations:
  σ = 0.3: High acceptance, but chain gets STUCK in one mode (poor mixing)
  σ = 2.0: Moderate acceptance, good exploration of both modes (Goldilocks!)
  σ = 10:  Low acceptance, chain jumps erratically (wasteful)

Rule of thumb: aim for 20-50% acceptance rate

II. MCMC Diagnostics#

How Do We Know Our Chain Has Converged?#

MCMC produces correlated samples, unlike rejection sampling which gives i.i.d. samples. We need diagnostic tools to verify that:

The chain has reached equilibrium (burn-in is complete)
The samples are sufficiently uncorrelated for reliable estimates
The chain has explored the full distribution (mixing)

Tool 1: Burn-in#

The initial portion of the chain is influenced by the starting point and should be discarded. This is called burn-in.

Tool 2: Autocorrelation Time#

The autocorrelation function measures how correlated samples are at lag $k$:

\[C(k) = \frac{\langle x_n \, x_{n+k} \rangle - \langle x_n \rangle^2}{\langle x_n^2 \rangle - \langle x_n \rangle^2}\]

The integrated autocorrelation time is:

\[\tau_{\text{int}} = 1 + 2 \sum_{k=1}^{\infty} C(k)\]

This tells us how many MCMC steps correspond to one independent sample.

Tool 3: Effective Sample Size#

From $N$ MCMC samples, the number of effectively independent samples is:

\[N_{\text{eff}} = \frac{N}{2\,\tau_{\text{int}}}\]

The true standard error of the mean is:

\[\text{SE} = \frac{\sigma}{\sqrt{N_{\text{eff}}}} = \sigma \sqrt{\frac{2\,\tau_{\text{int}}}{N}}\]

def autocorrelation(x, max_lag=500):
    """Compute normalized autocorrelation function."""
    x = x - np.mean(x)
    n = len(x)
    var = np.var(x)
    if var == 0:
        return np.zeros(max_lag)

    acf = np.zeros(max_lag)
    for k in range(max_lag):
        acf[k] = np.mean(x[:n-k] * x[k:]) / var
    return acf

def integrated_autocorrelation_time(acf):
    """Estimate integrated autocorrelation time with automatic truncation."""
    # Truncate when ACF first goes negative (simple heuristic)
    tau = 0.5  # Start at 0.5 from the k=0 term
    for k in range(1, len(acf)):
        if acf[k] < 0:
            break
        tau += acf[k]
    return tau

def effective_sample_size(samples, max_lag=500):
    """Compute effective sample size from MCMC chain."""
    acf = autocorrelation(samples, max_lag)
    tau = integrated_autocorrelation_time(acf)
    n_eff = len(samples) / (2 * tau)
    return n_eff, tau, acf

# Analyze the three chains from above
print(f"{'σ_q':>5s}  {'N':>7s}  {'τ_int':>7s}  {'N_eff':>8s}  {'Efficiency':>11s}")
print("-" * 45)

for sigma in sigmas:
    samples = results[sigma][0][5000:]  # discard burn-in
    n_eff, tau, acf = effective_sample_size(samples)
    efficiency = n_eff / len(samples)
    print(f"{sigma:5.1f}  {len(samples):7d}  {tau:7.1f}  {n_eff:8.0f}  {efficiency:11.1%}")

  σ_q        N    τ_int     N_eff   Efficiency
---------------------------------------------
  0.3    45000    441.3        51         0.1%
  2.0    45000     42.3       532         1.2%
 10.0    45000      9.0      2492         5.5%

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

colors = ['blue', 'green', 'red']

for idx, sigma in enumerate(sigmas):
    samples = results[sigma][0][5000:]
    acf = autocorrelation(samples, max_lag=200)
    axes[0].plot(acf[:200], color=colors[idx], lw=1.5, label=f'$\sigma_q$ = {sigma}')

axes[0].axhline(0, color='k', linestyle='-', alpha=0.3)
axes[0].set_xlabel('Lag k')
axes[0].set_ylabel('C(k)')
axes[0].set_title('Autocorrelation Function')
axes[0].legend(fontsize=8)
axes[0].grid(alpha=0.3)
axes[0].set_xlim(0, 200)

# Running mean convergence
for idx, sigma in enumerate(sigmas):
    samples = results[sigma][0][5000:]
    running_mean = np.cumsum(samples) / np.arange(1, len(samples)+1)
    axes[1].plot(running_mean, color=colors[idx], lw=0.8, alpha=0.8, label=f'$\sigma_q$ = {sigma}')

# True mean of bimodal distribution
x_fine = np.linspace(-8, 8, 1000)
p_fine = target_bimodal(x_fine)
true_mean = np.trapezoid(x_fine * p_fine, x_fine) / np.trapezoid(p_fine, x_fine)
axes[1].axhline(true_mean, color='k', linestyle='--', lw=2, label=f'True mean = {true_mean:.2f}')
axes[1].set_xlabel('Sample index')
axes[1].set_ylabel('Running mean')
axes[1].set_title('Running Mean Convergence')
axes[1].legend(fontsize=7)
axes[1].grid(alpha=0.3)

# Effective sample size vs proposal width
sigma_scan = np.linspace(0.1, 15, 30)
n_eff_scan = []

np.random.seed(0)
for s in sigma_scan:
    samp, _ = metropolis_hastings(target_bimodal, 10000, x0=0.0, proposal_sigma=s)
    n_eff, _, _ = effective_sample_size(samp[2000:])
    n_eff_scan.append(n_eff)

axes[2].plot(sigma_scan, n_eff_scan, 'ko-', ms=4, lw=1.5)
axes[2].set_xlabel('Proposal width $\sigma_q$')
axes[2].set_ylabel('$N_{\text{eff}}$')
axes[2].set_title('Effective Sample Size vs Proposal Width')
axes[2].grid(alpha=0.3)

idx_opt = np.argmax(n_eff_scan)
axes[2].axvline(sigma_scan[idx_opt], color='r', linestyle='--',
                label=f'Optimal $\sigma_q$ ≈ {sigma_scan[idx_opt]:.1f}')
axes[2].legend(fontsize=8)

plt.tight_layout()
plt.show()

III. Simulated Annealing#

From Sampling to Optimization#

The Metropolis algorithm samples from the Boltzmann distribution $P(x) \propto e^{-E(x)/T}$. At low temperature, this distribution concentrates near the global minimum of $E(x)$.

Idea: If we slowly cool the system during the Metropolis simulation, the chain will explore at high $T$ (avoiding local minima) and then settle into the global minimum at low $T$.

This is simulated annealing — inspired by the physical process of annealing metals.

Algorithm#

Start at high temperature $T_0$
Run Metropolis steps at current $T$
Reduce $T$ according to a cooling schedule: $T_{k+1} = \beta \cdot T_k$ (with $\beta < 1$)
Repeat until $T$ is very small

Cooling Schedule#

Common choices:

Geometric: $T_k = T_0 \cdot \beta^k$ (most common, $\beta \approx 0.95$–$0.99$)
Linear: $T_k = T_0 - k \cdot \Delta T$
Logarithmic: $T_k = T_0 / \ln(1 + k)$ (provably optimal but very slow)

Theorem (Hajek, 1988): Simulated annealing converges to the global minimum with probability 1 if $T_k \geq \frac{c}{\ln(k)}$ for sufficiently large $c$. In practice, geometric cooling works well.

Demo: Finding the Global Minimum of a Multimodal Function#

\[E(x) = x^2 + 5\sin^2(3x) + 3\cos^2(5x)\]

This has many local minima. A simple gradient descent would get stuck, but simulated annealing can find the global minimum.

def energy_multimodal(x):
    """Multimodal energy landscape."""
    return x**2 + 5 * np.sin(3*x)**2 + 3 * np.cos(5*x)**2

def simulated_annealing(energy_func, x0, T0, beta, n_steps, step_size=1.0):
    """Simulated annealing with geometric cooling."""
    x = x0
    E = energy_func(x)
    x_best, E_best = x, E

    T = T0
    trajectory = [(x, E, T)]

    for step in range(n_steps):
        # Propose new state
        x_new = x + np.random.normal(0, step_size)
        E_new = energy_func(x_new)
        dE = E_new - E

        # Metropolis acceptance at temperature T
        if dE < 0 or np.random.uniform() < np.exp(-dE / T):
            x, E = x_new, E_new

            # Track best solution
            if E < E_best:
                x_best, E_best = x, E

        # Cool down
        T *= beta
        trajectory.append((x, E, T))

    return x_best, E_best, np.array(trajectory)

np.random.seed(42)

# Run SA from a bad starting point
x_best, E_best, traj = simulated_annealing(
    energy_multimodal, x0=4.0, T0=20.0, beta=0.999, n_steps=10000, step_size=1.0
)

print(f"Starting point: x = 4.0, E = {energy_multimodal(4.0):.4f}")
print(f"SA found: x = {x_best:.4f}, E = {E_best:.4f}")

# Brute force global minimum
x_grid = np.linspace(-5, 5, 10000)
E_grid = energy_multimodal(x_grid)
idx_min = np.argmin(E_grid)
print(f"Global minimum: x = {x_grid[idx_min]:.4f}, E = {E_grid[idx_min]:.4f}")

Starting point: x = 4.0, E = 17.9391
SA found: x = 0.9729, E = 1.2597
Global minimum: x = -0.9726, E = 1.2598

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Left: energy landscape with SA trajectory
x_plot = np.linspace(-5, 5, 500)
axes[0].plot(x_plot, energy_multimodal(x_plot), 'k-', lw=2, label='$E(x)$')
axes[0].scatter(traj[::50, 0], traj[::50, 1], c=np.arange(0, len(traj), 50),
                cmap='coolwarm', s=5, alpha=0.5, label='SA path')
axes[0].plot(x_best, E_best, 'r*', ms=20, label=f'Best: x={x_best:.2f}')
axes[0].set_xlabel('x')
axes[0].set_ylabel('E(x)')
axes[0].set_title('Energy Landscape + SA Trajectory')
axes[0].legend(fontsize=8)
axes[0].grid(alpha=0.3)

# Middle: energy vs step
axes[1].plot(traj[:, 1], lw=0.5, alpha=0.7, color='blue', label='Current E')
best_so_far = np.minimum.accumulate(traj[:, 1])
axes[1].plot(best_so_far, 'r-', lw=2, label='Best E so far')
axes[1].set_xlabel('Step')
axes[1].set_ylabel('Energy')
axes[1].set_title('Energy vs Annealing Step')
axes[1].legend(fontsize=8)
axes[1].grid(alpha=0.3)

# Right: temperature schedule
axes[2].semilogy(traj[:, 2], 'g-', lw=1.5)
axes[2].set_xlabel('Step')
axes[2].set_ylabel('Temperature T')
axes[2].set_title('Cooling Schedule')
axes[2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

_images/b4276d665e66e36bcb3ef7f89429fed18c498c1d55d5edae69f57a3c7c781681.png

Application: Traveling Salesman Problem (TSP)#

The Traveling Salesman Problem is a classic combinatorial optimization problem:

Given $N$ cities with known positions, find the shortest tour that visits each city exactly once and returns to the start.

This is NP-hard — the number of possible tours is $(N-1)!/2$. For 20 cities, that’s $\sim 10^{16}$ tours. Brute force is impossible, but simulated annealing can find near-optimal solutions.

Moves: Swap two random cities in the tour, or reverse a random segment.

import math
def tour_distance(cities, tour):
    """Total distance of a tour (closed loop)."""
    d = 0
    n = len(tour)
    for i in range(n):
        c1 = cities[tour[i]]
        c2 = cities[tour[(i+1) % n]]
        d += np.sqrt((c1[0]-c2[0])**2 + (c1[1]-c2[1])**2)
    return d

def tsp_move(tour):
    """Generate a neighboring tour by reversing a random segment."""
    new_tour = tour.copy()
    n = len(tour)
    i, j = sorted(np.random.choice(n, 2, replace=False))
    new_tour[i:j+1] = new_tour[i:j+1][::-1]  # reverse segment
    return new_tour

def tsp_annealing(cities, T0=100.0, beta=0.9995, n_steps=100000):
    """Solve TSP via simulated annealing."""
    n = len(cities)
    tour = list(range(n))
    np.random.shuffle(tour)

    E = tour_distance(cities, tour)
    best_tour, best_E = tour.copy(), E

    T = T0
    energy_history = [E]
    temp_history = [T]

    for step in range(n_steps):
        new_tour = tsp_move(tour)
        E_new = tour_distance(cities, new_tour)
        dE = E_new - E

        if dE < 0 or np.random.uniform() < np.exp(-dE / T):
            tour = new_tour
            E = E_new

            if E < best_E:
                best_tour = tour.copy()
                best_E = E

        T *= beta

        if step % 100 == 0:
            energy_history.append(E)
            temp_history.append(T)

    return best_tour, best_E, energy_history, temp_history

# Generate 25 random cities
np.random.seed(42)
n_cities = 25
cities = np.random.uniform(0, 100, size=(n_cities, 2))

# Initial random tour
initial_tour = list(range(n_cities))
initial_distance = tour_distance(cities, initial_tour)

# Solve with SA
best_tour, best_distance, energy_hist, temp_hist = tsp_annealing(
    cities, T0=100, beta=0.9995, n_steps=20000
)

print(f"Number of cities: {n_cities}")
print(f"Possible tours: {math.factorial(n_cities-1)//2:.2e}")
print(f"Initial tour distance: {initial_distance:.2f}")
print(f"SA best tour distance: {best_distance:.2f}")
print(f"Improvement: {(initial_distance - best_distance)/initial_distance:.1%}")

Number of cities: 25
Possible tours: 3.10e+23
Initial tour distance: 1373.41
SA best tour distance: 407.31
Improvement: 70.3%

def plot_tour(ax, cities, tour, title, color='blue'):
    """Plot a TSP tour."""
    tour_closed = tour + [tour[0]]  # return to start
    coords = cities[tour_closed]
    ax.plot(coords[:, 0], coords[:, 1], 'o-', color=color, ms=6, lw=1.5, alpha=0.7)
    ax.plot(cities[tour[0], 0], cities[tour[0], 1], 's', color='red', ms=10, label='Start')
    ax.set_title(title)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_aspect('equal')
    ax.grid(alpha=0.3)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Left: initial tour
plot_tour(axes[0], cities, initial_tour, f'Initial Tour (d = {initial_distance:.1f})', 'gray')

# Middle: optimized tour
plot_tour(axes[1], cities, best_tour, f'SA Optimized (d = {best_distance:.1f})', 'blue')

# Right: energy convergence
axes[2].plot(energy_hist, lw=1, alpha=0.7, color='blue', label='Tour distance')
axes[2].axhline(best_distance, color='r', linestyle='--', lw=1.5, label=f'Best = {best_distance:.1f}')
axes[2].set_xlabel('Step (×100)')
axes[2].set_ylabel('Tour distance')
axes[2].set_title('SA Convergence for TSP')
axes[2].legend(fontsize=8)
axes[2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\nNote: SA finds a near-optimal tour in seconds, while brute force would take years!")
print("The key: high T allows uphill moves (escaping local minima), low T refines the solution.")

_images/56cce6b25a293c2bd905426c0a50827c98968c28833578e1c7c583312a65b5d7.png

Note: SA finds a near-optimal tour in seconds, while brute force would take years!
The key: high T allows uphill moves (escaping local minima), low T refines the solution.