Building SimAMOC: From Broken Metrics to Stommel Bifurcation in One Session

April 21, 2026  ·  Derek Lomas & Claude

We spent a day building a real-time ocean circulation model that runs in the browser via WebGPU. By the end, it had working thermohaline circulation, salinity-driven AMOC that collapses under freshwater forcing, and a 3.8°C RMSE against NOAA satellite observations — tuned by AI agents for about a dime in API costs.

Here's what we built, how we measure quality, and where we're going.

SimAMOC temperature field showing sea surface temperature and seasonal land temperature

Sea surface temperature initialized from NOAA observations, evolved by the physics. Land shows seasonal temperature with altitude lapse rate from ETOPO1 elevation data.

What SimAMOC Is

SimAMOC solves the barotropic vorticity equation on a 360×180 grid (1° resolution) with two vertical layers — a 100m surface layer and a 4000m deep layer. It computes:

Everything runs on the GPU via WebGPU compute shaders. The simulation initializes from real NOAA satellite SST observations and uses ETOPO1 seafloor bathymetry — so it looks like Earth from frame one.

Streamfunction showing gyre structure

Streamfunction: red/blue = anticyclonic/cyclonic gyres

Current speed field

Speed: bright = fast currents (ACC, western boundaries)

Deep ocean temperature at 1000m

Deep temperature (1000m): cold polar water fills deep basins

ETOPO1 ocean bathymetry

Real ETOPO1 bathymetry: shelves, ridges, trenches

Try it live View source

The Ralph Wiggum Loop

To tune the model, we built a "Ralph Wiggum loop" — an iterative AI agent system that runs the simulation headlessly, compares output against observations, and proposes physics-informed parameter changes.

The loop uses Gemini 3.1 Flash Lite ($0.25 per million tokens) and consists of seven agents:

AgentRoleRuns
PhysicistGenerates competing hypotheses from screenshots + dataEvery iteration
TunerTranslates winning hypothesis to parameter changesEvery iteration
ValidatorChecks physical consistency, catches curve-fittingEvery iteration
Numerical AnalystChecks for computational artifactsWhen conservation fails
SkepticAudits trajectory for compensating errorsEvery 4th iteration
Observational ScientistAssesses reference data quality at problem latitudesWhen polar errors dominate
Literature AgentChecks parameter values against published modelsWhen params hit bounds

The Physicist agent receives four screenshots (temperature, streamfunction, speed, deep temperature) along with quantitative error metrics. It can see spatial patterns that zonal means miss — checkerboard artifacts, misplaced currents, boundary noise.

When the loop stalls for three iterations, it escalates to Claude for structural code review — looking for missing physics terms rather than parameter tuning.

How We Measure Quality

We evaluate the simulation across four tiers, weighted into a composite score:

Tier 1: Conservation (25%)

Does the simulation obey fundamental physical constraints? Temperature range realistic, equator-to-pole gradient exists, deep ocean colder than surface, AMOC positive. These are non-negotiable — if the simulation violates conservation, no amount of parameter tuning matters.

Tier 2: Structure (30%)

Do the right features emerge for the right reasons? We check seven diagnostics:

Tier 3: Sensitivity (20%, final evaluation only)

Does the simulation respond correctly to perturbations? Freshwater forcing should weaken the AMOC. Cooling should cool the ocean. These test causal structure, not just static fit.

Tier 4: Quantitative (30%)

How close is the sea surface temperature to NOAA OI SST v2 observations (1991–2020 annual mean)? Measured as RMSE across 15 latitude bands from 70°S to 70°N.

Current Quality

MetricValueTarget
RMSE vs NOAA SST3.8°C<3°C
Composite score74.1%>80%
AMOCPositive (+0.006)Stable positive
Freshwater → AMOC collapseYesYes
Western intensificationIntermittentConsistent
Structural checks passing5/77/7

The midlatitudes (30–50°) are within 1–2°C of observations. The poles remain challenging — the NH high latitudes tend to run warm and the SH can oscillate. This is typical of coarse ocean models; even CMIP6 GCMs struggle with polar biases.

The Key Breakthrough: Salinity

The single most important change was adding a salinity field. Before salinity, the simulation had temperature-driven density only — and the "Melt Greenland" scenario incorrectly cooled the North Atlantic instead of freshening it.

With salinity and a linear equation of state ρ(T,S), the physics changes fundamentally:

The result: cranking up freshwater forcing causes the AMOC to collapse from +0.004 to −0.0001 — a sign reversal in the overturning circulation. This is the Stommel bifurcation in action, the same physics that van Westen et al. (2024) showed could tip the real AMOC.

What We Know We're Missing

An honest audit reveals structural limitations that no parameter tuning can fix:

MissingImpactFixable?
Nordic Sea overflowsNo sill hydraulics for Denmark Strait / Faroe Bank ChannelParameterizable
Mesoscale eddiesUnresolved at 1° (need GM parameterization)Yes, with GM90
Mixed layer depthFixed 100m surface layerModerate effort
Atmospheric couplingFixed wind pattern, no weatherAdd energy balance atmosphere
Sea ice dynamicsOnly thermodynamic albedo, no ice transportModerate effort
Isopycnal upwellingSouthern Ocean AMOC closure too simpleNeeds isopycnal coordinates

For context: even the simplest model that correctly captures AMOC tipping (the Stommel two-box model) only needs temperature, salinity, and a density-driven flow. We now have all three with spatial structure — putting us above box models in capability, though below full GCMs like MOM6 or NEMO in resolution and physics detail.

SimAMOC full interface with controls and diagnostics

The full SimAMOC interface: simulation view with temperature field, parameter sliders, view mode buttons, and real-time diagnostics (AMOC strength, velocity, kinetic energy, season).

Compute Architecture

The simulation runs entirely in the browser via WebGPU compute shaders (WGSL). Key optimizations:

The AI Cost: $0.10

The entire Wiggum loop — all iterations across all runs, including multimodal screenshots sent to the Physicist agent — cost approximately $0.10 in Gemini 3.1 Flash Lite API calls. At $0.25 per million input tokens and $1.50 per million output tokens, we could run thousands more iterations for under $10.

The real value was the agent dialectic: the Physicist hypothesizes, the Tuner proposes, the Validator catches bad physics. The Skeptic audits for curve-fitting. When all four stall, Claude reviews the actual code and finds structural bugs (like the missing deep buoyancy term) that no amount of parameter tuning could fix.

Update: AI-Discovered Physics (April 23)

In a follow-up session, we ran a physics tournament: Claude agents proposed competing modifications to the ocean physics code, each tested in an isolated git worktree and evaluated against NOAA SST observations. The winner was merged; losers pruned.

SimAMOC after AI-discovered physics improvements

After 6 AI-discovered physics improvements: polar OLR boost, brine rejection, Ekman transport, realistic wind stress, variable mixed layer depth, and meridional overturning. RMSE dropped from 7.6°C to 5.6°C. Play this version

The key breakthrough: the Physicist agent diagnosed that the outgoing longwave radiation formula was latitude-independent — a structural code bug that no amount of parameter tuning could fix. It also identified that salinity stratification was permanently blocking Antarctic Bottom Water formation, and proposed brine rejection (sea ice concentrates salt) as the fix. These are real scientific insights, not curve fitting.

Six code-level physics improvements were merged:

  1. Polar OLR boost — high latitudes radiate 20 W/m² more efficiently. Fixed +14°C Southern Ocean warm bias.
  2. Brine rejection — ice formation concentrates salt at |lat|>55°, enabling Antarctic Bottom Water.
  3. Ekman heat transport — wind stress drives poleward heat via Coriolis (v_ek = τ/ρf).
  4. Realistic wind stress — multi-term Fourier expansion: τ_x = −0.4cos(2φ) + 0.25sin(φ) + 0.1sin(3φ).
  5. Variable mixed layer — 20m at equator, 200m at poles.
  6. Meridional overturning — density gradients drive deep equatorward flow.

We also built a competitive leaderboard where Derek and Luke submit versions and compare RMSE scores. The vision: a self-improving climate model where AI agents continuously discover physics improvements, evaluated against observations.

What's Next

AMOC Tipping Experiments

The AMOC timeseries panel now tracks overturning strength in pseudo-Sverdrups with FovS early warning. "Melt Greenland" ramps freshwater gradually over 10 seconds so you can watch the collapse in real time. Next: systematic hosing protocol to map the full bifurcation diagram.

RMSE Below 3°C

Current best: 5.6°C. Remaining biases: tropics 5°C too cold, NH mid-latitudes too warm. Candidates: atmospheric energy balance model, Ekman pumping, latitude-dependent OLR tuning. The tournament infrastructure makes testing fast.

Cloud Modeling

Next frontier: simple cloud parameterization to modulate solar insolation and OLR. Clouds are the largest uncertainty in climate models — even a 1-layer scheme (low clouds cool, high clouds warm) would add a new degree of realism.

Self-Improving Model

The physics tournament proved the concept: AI agents can discover real physics improvements, not just tune parameters. Next step: run tournaments continuously, with each day's winner merged into main. The git history becomes a record of AI-discovered science.

Try SimAMOC Leaderboard GitHub

SimAMOC is built by Derek Lomas and Luke Barrington, with AI assistance from Claude. The simulation code, knowledge bank, and Wiggum loop are open source.