Lecture 19: Global Optimization III#

Bayesian Optimization, Real-World Applications & Grand Comparison#

Section I: Bayesian Optimization#

Motivation: Why Bayesian Optimization?#

Many real-world optimization problems involve expensive function evaluations:

  • DFT calculations: 1-10 hours per evaluation

  • Lab experiments: days to weeks per trial

  • Complex simulations: hours to days per run

Traditional methods (DE, PSO, GA) require hundreds to thousands of evaluations. Bayesian Optimization is designed for expensive functions where we can afford 10-100 evaluations at most.

Key idea: Use completed evaluations to build a probabilistic model (surrogate), then intelligently select the next point to evaluate.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import differential_evolution

try:
    from skopt import gp_minimize, dump, load
    from skopt.plots import plot_convergence
    from skopt.space import Real
    SKOPT_AVAILABLE = True
except ImportError:
    print('Installing scikit-optimize...')
    import subprocess
    subprocess.check_call(['pip', 'install', 'scikit-optimize', '-q'])
    from skopt import gp_minimize, dump, load
    from skopt.plots import plot_convergence
    from skopt.space import Real
    SKOPT_AVAILABLE = True

print('Imports successful. Scikit-optimize available:', SKOPT_AVAILABLE)
Imports successful. Scikit-optimize available: True

Bayesian Optimization Concept#

How It Works (Conceptually)#

  1. Surrogate Model: Build a Gaussian Process (GP) from observed (x, f(x)) pairs

    • Cheap to evaluate anywhere

    • Provides mean prediction and uncertainty estimate

  2. Acquisition Function: Balance exploration vs. exploitation

    • Favor regions with low predicted value (exploitation)

    • Favor regions with high uncertainty (exploration)

    • Common choice: Expected Improvement (EI)

  3. Repeat: Evaluate function at the point that maximizes acquisition, update GP

Why it’s powerful: Concentrates evaluations where the model is uncertain or promising, avoiding wasted evaluations in clearly suboptimal regions.

Limitation: Scales poorly with dimension (curse of dimensionality). Works best for d <= 20.

Core Idea in Three Steps#

Bayesian Optimization repeats a simple loop:

Model → Decide → Evaluate → Update

Concretely:

  1. Build a surrogate model of the objective from the data collected so far. The standard choice is a Gaussian Process (GP), which gives both a mean prediction \(\mu(\mathbf{x})\) and an uncertainty \(\sigma(\mathbf{x})\) at every point.

  2. Choose the next point by maximising an acquisition function \(\alpha(\mathbf{x})\). The acquisition function balances:

    • exploitation — go where \(\mu\) is low (promising), and

    • exploration — go where \(\sigma\) is high (uncertain).

    The most common choice is Expected Improvement (EI): $\( \mathrm{EI}(\mathbf{x}) \;=\; \mathbb{E}\!\Big[\max\!\big(f_{\min} - f(\mathbf{x}),\; 0\big)\Big] \)\( where \)f_{\min}$ is the best value found so far.

  3. Evaluate the true (expensive) function at the chosen point, add the result to the dataset, and go to step 1.

Because step 1 (fitting the GP) and step 2 (maximising \(\alpha\)) are both cheap, we spend almost all of our budget on step 3 — and we pick the most informative point every time.

What is a Gaussian Process?#

A Gaussian Process is a distribution over functions. Given \(n\) observed points \(\{(\mathbf{x}_i, y_i)\}\), the GP predicts that the function value at a new point \(\mathbf{x}_*\) follows a normal distribution:

\[ f(\mathbf{x}_*) \;\sim\; \mathcal{N}\!\big(\mu(\mathbf{x}_*),\;\sigma^2(\mathbf{x}_*)\big) \]

The mean and variance are computed from the kernel (covariance function). A popular kernel is the Matérn-5/2 kernel:

\[ k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\ell} + \frac{5\,r^2}{3\,\ell^2}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\ell}\right), \quad r = \|\mathbf{x}-\mathbf{x}'\| \]

Key property: far from any data point, \(\sigma\) is large (high uncertainty); close to data points, \(\sigma\) is small (we have information). This is exactly what we need for exploration vs exploitation.

def expensive_1d(x):
    'Expensive 1D function: Branin-like with noise'
    x = np.atleast_1d(x)[0]
    y = (x - 0.3)**2 * np.sin(10*x) + 0.1*np.sin(x/2)
    return y

# Test the function
x_test = np.linspace(-2, 2, 100)
y_test = np.array([expensive_1d(xi) for xi in x_test])

plt.figure(figsize=(10, 4))
plt.plot(x_test, y_test, 'b-', linewidth=2, label='True function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Expensive 1D Function')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print('Min value (true):', y_test.min())
print('Location of min:', x_test[y_test.argmin()])
_images/c8c30eaee8af48699a6f4ffc66e7005c63bfb6f536f92c8a01eb49075083e79c.png
Min value (true): -4.913627474829939
Location of min: -2.0
# Bayesian Optimization on 1D function
from skopt.space import Real

print('Running Bayesian Optimization on 1D function...')
result_1d = gp_minimize(
    expensive_1d,
    [Real(-2, 2)],
    n_calls=30,
    random_state=42,
    n_initial_points=3,
    acq_func='EI'
)

print('Minimum found:', result_1d.fun)
print('Location:', result_1d.x[0])
print('Number of calls:', len(result_1d.func_vals))
print('Improvement over random:', y_test.min() - result_1d.fun)
Running Bayesian Optimization on 1D function...
/usr/local/lib/python3.12/dist-packages/skopt/optimizer/optimizer.py:517: UserWarning: The objective has been evaluated at point [-2.0] before, using random point [0.9015916502875299]
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/skopt/optimizer/optimizer.py:517: UserWarning: The objective has been evaluated at point [-2.0] before, using random point [-1.247615444836211]
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/skopt/optimizer/optimizer.py:517: UserWarning: The objective has been evaluated at point [-2.0] before, using random point [-0.68572645514332]
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/skopt/optimizer/optimizer.py:517: UserWarning: The objective has been evaluated at point [-2.0] before, using random point [-0.9358904622190607]
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/skopt/optimizer/optimizer.py:517: UserWarning: The objective has been evaluated at point [-2.0] before, using random point [-0.7219317258464806]
  warnings.warn(
Minimum found: -4.913627474829939
Location: -2.0
Number of calls: 30
Improvement over random: 0.0
# Plot convergence of BO on 1D
plt.figure(figsize=(12, 4))

# Left: convergence plot
plt.subplot(1, 2, 1)
iterations = range(1, len(result_1d.func_vals) + 1)
cumulative_min = np.minimum.accumulate(result_1d.func_vals)
plt.plot(iterations, cumulative_min, 'b-o', linewidth=2, markersize=6)
plt.xlabel('Iteration (Evaluations)')
plt.ylabel('Best Value Found')
plt.title('BO Convergence - 1D')
plt.grid(True, alpha=0.3)

# Right: function with BO samples
plt.subplot(1, 2, 2)
x_test = np.linspace(-2, 2, 200)
y_test = np.array([expensive_1d(xi) for xi in x_test])
plt.plot(x_test, y_test, 'b-', linewidth=2, label='True function')
plt.scatter(result_1d.x_iters, result_1d.func_vals, c='r', s=50, zorder=5, label='BO samples')
plt.axvline(result_1d.x[0], color='g', linestyle='--', label='BO optimum')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Sampled Points vs True Function')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
_images/dc87565af154d7ddcd73855350fdfcb64e1ce9c46f01c39bb65a4f47139c3387.png
# Visualize GP surrogate at different stages
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, ConstantKernel

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
stages = [3, 10, 30]

x_plot = np.linspace(-2.5, 2.5, 200).reshape(-1, 1)

for idx, stage in enumerate(stages):
    if stage <= len(result_1d.func_vals):
        X_train = np.array(result_1d.x_iters[:stage]).reshape(-1, 1)
        y_train = np.array(result_1d.func_vals[:stage])

        kernel = ConstantKernel(1.0) * Matern(nu=2.5)
        gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True, n_restarts_optimizer=5)
        gp.fit(X_train, y_train)

        y_pred, y_std = gp.predict(x_plot, return_std=True)

        ax = axes[idx]
        ax.plot(x_plot, y_pred, 'b-', linewidth=2, label='GP mean')
        ax.fill_between(x_plot.ravel(), y_pred - y_std, y_pred + y_std, alpha=0.3, label='+/- 1 std')
        ax.scatter(X_train, y_train, c='r', s=100, zorder=5, label=f'{stage} samples')
        ax.set_xlabel('x')
        ax.set_ylabel('f(x)')
        ax.set_title(f'GP Surrogate after {stage} evals')
        ax.legend(fontsize=9)
        ax.grid(True, alpha=0.3)
        ax.set_ylim(-1.5, 1)

plt.tight_layout()
plt.show()
_images/a6f391cd524e900a8de465926209c13f4dd0506fc5a81808e026d25cc8ffc5f3.png

2D Benchmark: Ackley Function#

Now let’s test BO on a 2D benchmark function where we can visualize everything.

Ackley function: many local minima, one global minimum at (0, 0).

\[ f(x_1, x_2) = -20\,\exp\!\Bigl(-0.2\sqrt{\tfrac{1}{2}(x_1^2+x_2^2)}\Bigr) - \exp\!\Bigl(\tfrac{1}{2}(\cos 2\pi x_1+\cos 2\pi x_2)\Bigr) + 20 + e \]
def ackley(x):
    'Ackley function - many local minima'
    x1, x2 = x[0], x[1]
    term1 = -20 * np.exp(-0.2 * np.sqrt(0.5 * (x1**2 + x2**2)))
    term2 = -np.exp(0.5 * (np.cos(2*np.pi*x1) + np.cos(2*np.pi*x2)))
    return term1 + term2 + 20 + np.e

print('Bayesian Optimization on 2D Ackley function...')
result_2d = gp_minimize(
    ackley,
    [Real(-5, 5), Real(-5, 5)],
    n_calls=50,
    random_state=42,
    n_initial_points=5,
    acq_func='EI'
)

print('Minimum found:', result_2d.fun)
print('Location:', result_2d.x)
print('True minimum: (0, 0) with f=0')
print('Error:', np.linalg.norm(result_2d.x))
Bayesian Optimization on 2D Ackley function...
Minimum found: 0.6736579488564982
Location: [0.07525497527446046, 0.09109865017347474]
True minimum: (0, 0) with f=0
Error: 0.11816207245554211
# Visualize 2D BO results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

x1_range = np.linspace(-5, 5, 150)
x2_range = np.linspace(-5, 5, 150)
X1, X2 = np.meshgrid(x1_range, x2_range)
Z = np.zeros_like(X1)
for i in range(X1.shape[0]):
    for j in range(X1.shape[1]):
        Z[i, j] = ackley([X1[i, j], X2[i, j]])

ax = axes[0]
contour = ax.contourf(X1, X2, Z, levels=30, cmap='viridis')
plt.colorbar(contour, ax=ax, label='f(x1, x2)')
X_samples = np.array(result_2d.x_iters)
ax.scatter(X_samples[:, 0], X_samples[:, 1], c='r', s=30, zorder=5, label='BO samples')
ax.scatter(result_2d.x[0], result_2d.x[1], c='g', s=300, marker='*', zorder=6, label='BO optimum')
ax.scatter(0, 0, c='yellow', s=300, marker='+', linewidths=3, zorder=6, label='True optimum')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title('BO on 2D Ackley (50 evaluations)')
ax.legend()
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)

ax = axes[1]
iterations = range(1, len(result_2d.func_vals) + 1)
cumulative_min = np.minimum.accumulate(result_2d.func_vals)
ax.plot(iterations, cumulative_min, 'b-o', linewidth=2, markersize=5)
ax.set_xlabel('Iteration')
ax.set_ylabel('Best Value Found')
ax.set_title('BO Convergence - 2D Ackley')
ax.grid(True, alpha=0.3)
ax.set_yscale('log')

plt.tight_layout()
plt.show()

print(f'BO found minimum {result_2d.fun:.4f} in 50 evaluations')
_images/559c3bf6efe5f2b2424229a1e4b1f3609ef7045325fc43840a414004b6dc615f.png
BO found minimum 0.6737 in 50 evaluations
# Compare BO vs Differential Evolution on 2D Ackley
print('Running Differential Evolution on Ackley (5000 evaluations)...')
result_de = differential_evolution(ackley, bounds=[(-5, 5), (-5, 5)], seed=42, maxiter=5000, popsize=50)

print('\n=== Comparison: BO vs DE ===')
print(f'BO (50 calls):    f = {result_2d.fun:.6f}, x = {result_2d.x}')
print(f'DE (5000 calls):  f = {result_de.fun:.6f}, x = {result_de.x}')
print(f'\nTrue optimum:      f = 0, x = [0, 0]')
print(f'\nBO error in f: {result_2d.fun:.6f}')
print(f'DE error in f: {result_de.fun:.6f}')
print(f'\nBO calls: 50')
print(f'DE calls: {5000}')
print(f'Efficiency ratio: {5000 / 50}x fewer calls with BO')
Running Differential Evolution on Ackley (5000 evaluations)...

=== Comparison: BO vs DE ===
BO (50 calls):    f = 0.673658, x = [0.07525497527446046, 0.09109865017347474]
DE (5000 calls):  f = 0.000000, x = [0. 0.]

True optimum:      f = 0, x = [0, 0]

BO error in f: 0.673658
DE error in f: 0.000000

BO calls: 50
DE calls: 5000
Efficiency ratio: 100.0x fewer calls with BO

Key Insights on Bayesian Optimization#

  1. Sample Efficiency: BO found the global minimum in 50 evaluations, while DE needed 5000+

  2. Trade-off: Computational cost per iteration is higher (fitting GP), but fewer iterations needed

  3. Dimensionality: BO excels in low dimensions (d < 20)

  4. When to use:

    • Expensive function: ✓ (DFT, experiments, simulations)

    • Few evaluations budget: ✓

    • High dimension: ✗ (curse of dimensionality)

    • Many evaluations budget: ✗ (use DE/PSO instead)

  5. In practice: BO is the go-to for real materials science, drug discovery, hyperparameter tuning

Next: Real-world application with actual interatomic potentials and material simulation.

Section II: Real Materials Science - Silicon Crystal with MatterSim#

From Toy Problems to Real Materials#

So far: test functions (Ackley, Rastrigin, synthetic benchmarks)

Now: Real materials science with MatterSim

  • MatterSim: Fast, accurate neural network interatomic potential trained on DFT

  • Costs: hours of DFT computation to milliseconds of NN inference

  • Use case: lattice constant optimization, structure relaxation, defects

Why MatterSim?

  • Runs on GPU or CPU (< 1 sec per structure)

  • Trained on Materials Project + OQMD

  • Covers: Si, C, metals, and mixed systems

  • Much faster than DFT, more reliable than classical LJ

try:
    import ase
    from ase.build import bulk
    from ase.calculators.emt import EMT
    MATTERSIM_AVAILABLE = False
    print('ASE available. Will use EMT fallback.')
except ImportError:
    print('Installing ASE...')
    import subprocess
    subprocess.check_call(['pip', 'install', 'ase', '-q'])
    from ase.build import bulk
    from ase.calculators.emt import EMT
    MATTERSIM_AVAILABLE = False
    print('ASE installed.')

try:
    from mattersim.forcefield import MatterSimCalculator
    MATTERSIM_AVAILABLE = True
    print('MatterSim available!')
except ImportError:
    print('Installing Mattersim...')
    import subprocess
    subprocess.check_call(['pip', 'install', 'mattersim', '-q'])
    from ase.build import bulk
    print('Mattersim installed.')
    MATTERSIM_AVAILABLE = True
    from mattersim.forcefield import MatterSimCalculator
    # print('MatterSim not available. Will use EMT as fallback.')
    # MATTERSIM_AVAILABLE = False
Installing ASE...
ASE installed.
Installing Mattersim...
Mattersim installed.
from ase.build import bulk
from ase.calculators.emt import EMT
import numpy as np

si_bulk = bulk('Si', 'diamond', a=5.43)
print('Si diamond structure:')
print(f'  Lattice constant: {si_bulk.cell[0,1]:.4f} Angstrom')
print(f'  Number of atoms: {len(si_bulk)}')
print(f'  Cell volume: {si_bulk.get_volume():.2f} Angstrom^3')
print(f'  Atomic positions (first 2):')
for i in range(min(2, len(si_bulk))):
    print(f'    Atom {i}: {si_bulk.positions[i]}')

if not MATTERSIM_AVAILABLE:
    si_bulk.set_calculator(EMT())
    print('\nUsing EMT calculator (fallback)')
else:
    si_bulk.set_calculator(MatterSimCalculator())
    print('\nUsing MatterSim calculator')

energy = si_bulk.get_potential_energy()
print(f'\nInitial energy: {energy:.6f} eV')
Si diamond structure:
  Lattice constant: 2.7150 Angstrom
  Number of atoms: 2
  Cell volume: 40.03 Angstrom^3
  Atomic positions (first 2):
    Atom 0: [0. 0. 0.]
    Atom 1: [1.3575 1.3575 1.3575]
2026-04-06 17:33:56.765 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model

Using MatterSim calculator

Initial energy: -10.825025 eV
/tmp/ipykernel_922/2589797036.py:18: FutureWarning: Please use atoms.calc = calc
  si_bulk.set_calculator(MatterSimCalculator())
# Function to compute energy at different lattice constants
def energy_at_lattice_constant(a):
    'Compute Si bulk energy for given lattice constant'
    si = bulk('Si', 'diamond', a=a)
    if not MATTERSIM_AVAILABLE:
        si.set_calculator(EMT())
    else:
        si.set_calculator(MatterSimCalculator())
    return si.get_potential_energy()

print('Testing energy function...')
for a_test in [5.0, 5.43, 5.8]:
    E = energy_at_lattice_constant(a_test)
    print(f'  a = {a_test:.2f} Angstrom: E = {E:.6f} eV')
Testing energy function...
2026-04-06 17:36:02.681 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
  a = 5.00 Angstrom: E = -9.822284 eV
2026-04-06 17:36:02.870 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
/tmp/ipykernel_922/3648716128.py:8: FutureWarning: Please use atoms.calc = calc
  si.set_calculator(MatterSimCalculator())
  a = 5.43 Angstrom: E = -10.825025 eV
2026-04-06 17:36:03.070 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
  a = 5.80 Angstrom: E = -10.515951 eV
# Scan lattice constant
print('Lattice constant scan...')
a_values = np.linspace(5.0, 6.0, 21)
energies = []

for a in a_values:
    E = energy_at_lattice_constant(a)
    energies.append(E)
    print(f'a = {a:.3f}: E = {E:.6f} eV')

energies = np.array(energies)
a_min = a_values[energies.argmin()]
E_min = energies.min()

print(f'\nMinimum energy: {E_min:.6f} eV')
print(f'Optimal lattice constant: {a_min:.4f} Angstrom')
print(f'Literature value: ~5.43 Angstrom')
Lattice constant scan...
2026-04-06 15:46:24.022 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.000: E = -9.822284 eV
2026-04-06 15:46:24.229 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
/tmp/ipykernel_922/3648716128.py:8: FutureWarning: Please use atoms.calc = calc
  si.set_calculator(MatterSimCalculator())
a = 5.050: E = -10.052809 eV
2026-04-06 15:46:24.516 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.100: E = -10.248643 eV
2026-04-06 15:46:24.869 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.150: E = -10.411705 eV
2026-04-06 15:46:25.191 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.200: E = -10.544115 eV
2026-04-06 15:46:25.796 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.250: E = -10.648217 eV
2026-04-06 15:46:26.495 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.300: E = -10.726437 eV
2026-04-06 15:46:27.191 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.350: E = -10.781118 eV
2026-04-06 15:46:27.962 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.400: E = -10.814449 eV
2026-04-06 15:46:28.479 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.450: E = -10.828400 eV
2026-04-06 15:46:29.068 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.500: E = -10.824736 eV
2026-04-06 15:46:29.431 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.550: E = -10.805080 eV
2026-04-06 15:46:29.688 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.600: E = -10.770957 eV
2026-04-06 15:46:30.027 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.650: E = -10.723774 eV
2026-04-06 15:46:30.322 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.700: E = -10.664797 eV
2026-04-06 15:46:30.576 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.750: E = -10.595204 eV
2026-04-06 15:46:30.863 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.800: E = -10.515951 eV
2026-04-06 15:46:31.177 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.850: E = -10.427797 eV
2026-04-06 15:46:31.534 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.900: E = -10.331318 eV
2026-04-06 15:46:31.794 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 5.950: E = -10.226912 eV
2026-04-06 15:46:32.050 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
a = 6.000: E = -10.114921 eV

Minimum energy: -10.828400 eV
Optimal lattice constant: 5.4500 Angstrom
Literature value: ~5.43 Angstrom
# Plot lattice constant scan
plt.figure(figsize=(10, 5))
plt.plot(a_values, energies, 'bo-', linewidth=2, markersize=8)
plt.axvline(a_min, color='r', linestyle='--', linewidth=2, label=f'Minimum at a={a_min:.4f}')
plt.axhline(E_min, color='r', linestyle='--', linewidth=1, alpha=0.5)
plt.xlabel('Lattice Constant (Angstrom)')
plt.ylabel('Energy (eV)')
plt.title('Silicon Diamond Structure: Energy vs Lattice Constant')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

idx_min = energies.argmin()
window = 5
if idx_min - window >= 0 and idx_min + window < len(a_values):
    a_fit = a_values[idx_min - window:idx_min + window + 1]
    E_fit = energies[idx_min - window:idx_min + window + 1]
    coeffs = np.polyfit(a_fit, E_fit, 2)
    a_refined = -coeffs[1] / (2 * coeffs[0])
    print(f'\nRefined estimate (parabola fit): a = {a_refined:.4f} Angstrom')
_images/718516386b1cb6ef56b2be9cdb5b370adff71d0a7237070a82351d1a187e641e.png
Refined estimate (parabola fit): a = 5.4778 Angstrom
from ase.eos import EquationOfState
from ase.units import GPa

print('Computing EOS...')
volumes = []
energies = []

for a in np.linspace(5.1, 5.9, 13):
    si = bulk('Si', 'diamond', a=a)
    if not MATTERSIM_AVAILABLE:
        si.set_calculator(EMT())
    else:
        si.set_calculator(MatterSimCalculator())
    volumes.append(si.get_volume())
    energies.append(si.get_potential_energy())

eos = EquationOfState(volumes, energies, eos='birchmurnaghan')
v0, e0, B = eos.fit()

# Back-calculate equilibrium lattice constant for diamond cubic
# V = a^3 / 4 for diamond structure (4 atoms in conventional cell...
# actually Si diamond has 8 atoms, V = a^3)
a0 = v0 ** (1/3)

print(f'EOS fitting complete.')
print(f'  Equilibrium lattice constant: a0 = {a0:.4f} Å')
print(f'  Equilibrium energy:           E0 = {e0:.4f} eV')
print(f'  Bulk modulus:                  B = {B/GPa:.2f} GPa')
Computing EOS...
2026-04-06 15:46:32.987 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:34.124 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
/tmp/ipykernel_922/4017653889.py:13: FutureWarning: Please use atoms.calc = calc
  si.set_calculator(MatterSimCalculator())
2026-04-06 15:46:34.431 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:34.837 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:35.168 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:35.488 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:35.823 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:36.184 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:36.639 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:36.981 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:37.257 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:37.581 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
2026-04-06 15:46:37.839 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
EOS fitting complete.
  Equilibrium lattice constant: a0 = 3.4422 Å
  Equilibrium energy:           E0 = -10.8290 eV
  Bulk modulus:                  B = 89.17 GPa
from ase.optimize import BFGS

print('Starting atomic relaxation from distorted structure...')

si_distorted = bulk('Si', 'diamond', a=5.43)
si_distorted.positions += 0.1 * np.random.randn(*si_distorted.positions.shape)

if not MATTERSIM_AVAILABLE:
    si_distorted.set_calculator(EMT())
else:
    si_distorted.set_calculator(MatterSimCalculator())

E_initial = si_distorted.get_potential_energy()
print(f'Initial energy (distorted): {E_initial:.6f} eV')

relaxer = BFGS(si_distorted, trajectory='relax.traj')
relaxer.run(fmax=0.01, steps=100)

E_final = si_distorted.get_potential_energy()
print(f'Final energy (relaxed): {E_final:.6f} eV')
print(f'Energy lowered by: {E_initial - E_final:.6f} eV')
print(f'Optimization converged!')

from ase.io import read
traj = read('relax.traj', index=':')
print(f'Number of relaxation steps: {len(traj) if isinstance(traj, list) else 1}')
Starting atomic relaxation from distorted structure...
2026-04-06 16:49:40.668 | INFO     | mattersim.forcefield.potential:from_checkpoint:877 - Loading the pre-trained mattersim-v1.0.0-1M.pth model
/tmp/ipykernel_922/1403209858.py:11: FutureWarning: Please use atoms.calc = calc
  si_distorted.set_calculator(MatterSimCalculator())
Initial energy (distorted): -10.408150 eV
      Step     Time          Energy          fmax
BFGS:    0 16:49:41      -10.408150        3.125817
BFGS:    1 16:49:41      -10.631231        1.980236
BFGS:    2 16:49:41      -10.810221        0.607144
BFGS:    3 16:49:41      -10.819462        0.380357
BFGS:    4 16:49:41      -10.824055        0.156613
BFGS:    5 16:49:41      -10.824995        0.029394
BFGS:    6 16:49:41      -10.825025        0.011248
BFGS:    7 16:49:41      -10.825026        0.006406
Final energy (relaxed): -10.825026 eV
Energy lowered by: 0.416876 eV
Optimization converged!
Number of relaxation steps: 8
# Plot relaxation convergence
if isinstance(traj, list):
    relaxation_energies = [atoms.get_potential_energy() for atoms in traj]
    relaxation_steps = range(len(relaxation_energies))
else:
    relaxation_energies = [traj.get_potential_energy()]
    relaxation_steps = [0]

plt.figure(figsize=(10, 5))
if len(relaxation_energies) > 1:
    plt.plot(relaxation_steps, relaxation_energies, 'b-o', linewidth=2, markersize=5)
    plt.xlabel('Relaxation Step')
    plt.ylabel('Energy (eV)')
    plt.title('Atomic Relaxation Convergence')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    print(f'Relaxation energy drop: {relaxation_energies[0] - relaxation_energies[-1]:.6f} eV')
else:
    print('Relaxation trajectory too short for detailed plot.')
_images/b947d7a9b58e2c376e8f91b80c0df7898b3926c0339cf26cbb11f57b8a81a51f.png
Relaxation energy drop: 0.416876 eV

Summary: Real Materials Science Application#

What we did:

  1. Build crystalline Si structure with ASE

  2. Compute energies using fast NN potential (MatterSim) or classical (EMT)

  3. Scan lattice constant and find optimal value

  4. Fit equation of state

  5. Relax distorted atomic positions

Why this matters:

  • This is production workflow in materials science: DFT is too slow -> use fast NN potentials

  • MatterSim scales to 1000+ atoms (DFT limited to ~100)

  • Used for: high-throughput screening, defect calculations, phase diagrams

  • Alternative: ReaxFF, MACE, EquiformerV2 (all similar idea)

Next: Compare all optimization methods on a realistic objective.

Key Takeaways from Computational Physics Optimization#

1. Problem Structure Matters#

  • Smooth vs discontinuous

  • Landscape topology (funnel vs rugged)

  • Dimensionality and cost per evaluation

  • Implication: Exploit structure for method choice

2. There is No Silver Bullet#

  • BFGS excellent for smooth local optimization

  • DE robust for global problems

  • Bayesian Optimization wins for expensive functions

  • GA best for discrete spaces

  • Implication: Always consider your specific problem

3. Hybridization is Powerful#

  • Global method (DE/SA) -> Local refinement (BFGS)

  • Multiple restarts of local methods

  • Ensemble of methods -> take best

  • Implication: Combine strengths, cover weaknesses

4. Hyperparameters Matter#

  • Temperature schedule in SA

  • Population size and mutation in GA/DE

  • Kernel and acquisition in BO

  • Implication: Tuning gives 10-100x speedup

5. Exploit Domain Knowledge#

  • Physics constraints (e.g., PBC, symmetries)

  • Known approximate solutions

  • Surrogate models (neural networks, physics-informed)

  • Implication: Your domain expertise -> better optimization