SimAMOC Competition

Build the most realistic ocean circulation model. Match your simulation to 30 years of NOAA satellite observations. Beat the leaderboard.

The Challenge

SimAMOC is a WebGPU ocean circulation model running on a 360x180 (1-degree) global grid with real bathymetry. Your goal: tune the physics so the simulated sea surface temperatures match 30 years of NOAA satellite observations.

Target: RMSE < 3.0°C across 15 latitude bands

Reference Observations (NOAA OI SST v2, 1991-2020), zonal mean SST by latitude:

  Lat (°):   -70   -60   -50   -40   -30   -20   -10     0    10    20    30    40    50    60    70
  SST (°C): -1.5   1.3   6.8  15.1  20.9  24.7  26.9  27.6  27.8  26.1  22.2  15.6   9.0   5.2   1.0

What You Can Change

  • Physics parameters in model.js — solar forcing, OLR, diffusion, friction, deep water formation rates
  • GPU shader code — the WGSL compute shaders for vorticity, temperature, Poisson solver
  • Wind forcing — the wind stress curl pattern drives gyres and western boundary currents
  • Cloud model — cloud fraction affects both solar albedo and OLR greenhouse
  • Atmosphere coupling in main.js — air-sea heat exchange, atmospheric diffusion
  • Initialization — starting conditions for temperature, salinity, circulation
  • New data files — add observational data to constrain the model

How to Submit

  1. Clone the repo: git clone https://github.com/JDerekLomas/amoc.git
  2. Open simamoc/index.html in Chrome to see the sim running
  3. Edit simamoc/model.js (or other files) with your physics changes
  4. Test interactively: open browser console, try lab.diagnostics()
  5. Submit:
    node submit-version.mjs \
      --author "Luke" \
      --name "Better wind forcing" \
      --description "Multi-term Fourier wind stress matching ERA5"

The script takes ~40 seconds: it snapshots your code, launches headless Chrome with WebGPU, spins the model up for 30 seconds, extracts the zonal SST profile, computes RMSE, saves screenshots, and updates the leaderboard.

Quick-Start: Parameter Tuning

The fastest way to improve RMSE is to tweak the radiative-balance parameters in the browser console:

// Open simamoc/index.html in Chrome, then in console:
lab.setParams({ S_solar: 6.5, A_olr: 1.8, B_olr: 0.14 })
// Wait 30s, then check:
lab.diagnostics()
// See .globalSST, .tropicalSST, .polarSST

When you find good values, edit model.js lines 22-24 and submit.
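
For reference, a rough sketch of what that edit might look like. The declaration style is an assumption (model.js may group these into a params object); only the parameter names come from the lab.setParams() call above:

// Illustrative only: model.js may declare these differently.
const S_solar = 6.5;   // solar forcing amplitude
const A_olr   = 1.8;   // outgoing longwave radiation (OLR) intercept
const B_olr   = 0.14;  // OLR sensitivity to temperature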

Automated Tuning

For rapid iteration without a browser:

node tune.mjs --label test1 --spinup 25 \
  --params '{"S_solar":6.5,"B_olr":0.14}'

Outputs RMSE, zonal errors, and screenshots to screenshots/tune/.
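
To scan several parameter combinations in one go, you can wrap tune.mjs in a small Node script. This is a sketch, not part of the repo: the sweep values and label scheme are arbitrary, and only the tune.mjs flags come from the command above.

// sweep.mjs (assumed helper, not part of the repo): runs tune.mjs over a small grid
// of radiative parameters; results land in screenshots/tune/ as with a manual run.
import { execSync } from 'node:child_process';

const solarValues = [6.0, 6.5, 7.0];
const bOlrValues  = [0.12, 0.14, 0.16];

for (const S_solar of solarValues) {
  for (const B_olr of bOlrValues) {
    const label = `sweep_S${S_solar}_B${B_olr}`;
    const params = JSON.stringify({ S_solar, B_olr });
    // Each run prints its RMSE and zonal errors; inspect screenshots/tune/ afterwards.
    execSync(`node tune.mjs --label ${label} --spinup 25 --params '${params}'`, {
      stdio: 'inherit',
    });
  }
}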

Rules

  • Any change to the physics is fair game — parameters, equations, initialization, data
  • The model must remain a 360x180 grid with WebGPU compute (or CPU fallback)
  • No hardcoding reference temperatures into the model output
  • Submissions are evaluated on a 30s wall-clock spinup from initial conditions
  • Legacy entries (pre-April 24 param rescaling) are marked but kept for history

Why These Metrics?

We're building an educational ocean simulator, not a GCM. Our metrics reflect what matters for a model that helps people understand ocean circulation and climate — not one that produces publication-grade forecasts. The scoring is deliberately simple so that improving it requires learning real ocean physics, not just gaming a loss function.

Primary: Zonal Mean SST RMSE

What it measures: How well the model reproduces the large-scale temperature structure of the ocean — warm tropics, cold poles, and the gradient between them. We compare zonal means (averaged around each latitude band) at 15 latitudes from 70°S to 70°N against 30 years of NOAA satellite observations.

Why we chose it: Zonal mean SST is the single most important diagnostic in ocean/climate modeling. It integrates everything — solar forcing, atmospheric coupling, ocean heat transport, ice-albedo feedback, cloud radiative effects. If your zonal SST is right, most of the large-scale physics is working. It's also easy to compute, easy to understand, and directly tied to the climate question this model exists to explore (AMOC collapse and its effect on regional temperatures).
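
Concretely, the score is a root-mean-square difference over the 15 zonal means. A minimal sketch using the reference table above (unweighted here; whether the official scorer weights bands by area is not specified):

// Zonal-mean SST RMSE against the NOAA reference profile (70°S to 70°N, every 10°).
const refLats = [-70, -60, -50, -40, -30, -20, -10, 0, 10, 20, 30, 40, 50, 60, 70];
const refSST  = [-1.5, 1.3, 6.8, 15.1, 20.9, 24.7, 26.9, 27.6, 27.8, 26.1, 22.2, 15.6, 9.0, 5.2, 1.0];

function zonalRMSE(simSST) {
  // simSST: simulated zonal-mean SST (°C) at the same 15 latitudes.
  const sumSq = refSST.reduce((acc, obs, i) => acc + (simSST[i] - obs) ** 2, 0);
  return Math.sqrt(sumSq / refSST.length);
}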

What it misses:

  • Spatial structure within latitude bands. A model could have the right zonal mean but put all the warm water in the wrong ocean basin. The Gulf Stream could be missing entirely and the zonal mean might not notice, because the Atlantic is small compared to the Pacific at most latitudes.
  • Seasonal cycle. We compare against annual means. A model that's 10°C too hot in summer and 10°C too cold in winter would score perfectly.
  • Temporal stability. We only evaluate at one snapshot (after 30s spinup). A model that oscillates wildly but happens to pass through the right state would score well.
  • Deep ocean. SST only tells you about the top ~100m. The deep ocean (4000m, 90% of ocean volume) could be completely wrong.

Secondary: Tropical SST

What it measures: Average SST between 20°S and 20°N. Observed value: ~27°C.

Why it matters: The tropical warm pool drives global atmospheric circulation (Hadley cells, Walker circulation, monsoons). Getting it right is a prerequisite for realistic heat transport. Most simple models underestimate tropical SST because they can't maintain the sharp equator-to-subtropics gradient — heat leaks poleward too fast through numerical diffusion.

What it misses: The tropical band is huge. This metric can't distinguish between a model with a realistic warm pool in the western Pacific and one with uniformly warm tropics. It also can't detect El Niño/La Niña variability, which is arguably the most important mode of tropical ocean dynamics.

Secondary: Polar SST

What it measures: Average SST poleward of 60° in both hemispheres. Observed value: ~0-2°C (depending on ice extent).

Why it matters: Polar temperatures control sea ice extent, deep water formation, and the strength of the thermohaline circulation (AMOC). Getting polar SST right means the model's radiative balance, ice-albedo feedback, and deep convection are all in reasonable shape.

What it misses: Polar SST has large seasonal and interannual variability. The reference data in ice-covered regions is uncertain — satellites can't see SST under ice, so the "observed" values are partly interpolated. Our model also doesn't have a real sea ice model (just a temperature-dependent albedo), so the comparison is approximate.
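
Both secondary metrics are simple band averages of a zonal SST profile. A sketch, reusing refLats/refSST from the RMSE snippet above and applying cosine-of-latitude weighting (whether the official diagnostic weights by area is an assumption):

// Band-averaged SST from a zonal profile at the 15 reference latitudes.
function bandMeanSST(lats, sst, isInBand) {
  let wSum = 0, tSum = 0;
  lats.forEach((lat, i) => {
    if (!isInBand(lat)) return;
    const w = Math.cos(lat * Math.PI / 180); // approximate area weighting
    wSum += w;
    tSum += w * sst[i];
  });
  return tSum / wSum;
}

const tropicalSST = bandMeanSST(refLats, refSST, lat => Math.abs(lat) <= 20); // ≈ observed ~27°C
const polarSST    = bandMeanSST(refLats, refSST, lat => Math.abs(lat) >= 60); // ≈ observed ~0-2°C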

Secondary: AMOC Strength

What it measures: The strength of the Atlantic Meridional Overturning Circulation, computed from the meridional streamfunction gradient in the North Atlantic at ~26°N. Reported in non-dimensional model units (not Sverdrups).

Why it matters: The AMOC is the whole reason this project exists. It's the ocean's main heat conveyor — warm surface water flows north, cools, sinks, and returns south at depth. Its collapse would cool Europe by 5-10°C. A model that gets SST right but has no overturning is missing the most important piece of ocean dynamics.

What it misses: Our AMOC diagnostic is crude — it measures the streamfunction gradient at a single latitude, not the full overturning streamfunction. The observed AMOC (~17 Sv from the RAPID array) is in physical units that our non-dimensional model can't directly compare to. We report it for qualitative assessment (is overturning happening?) rather than quantitative scoring.
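
To make "streamfunction gradient at a single latitude" concrete, here is an illustrative sketch. The actual diagnostic lives in the sim code; the grid layout, field name, and Atlantic longitude window below are assumptions, not the repo's implementation:

// Crude AMOC index: meridional gradient of the barotropic streamfunction near 26°N,
// averaged over an assumed North Atlantic longitude window. Non-dimensional units.
function amocIndex(psi, nx = 360, ny = 180) {
  // psi: Float32Array of length nx*ny, row-major with j = latitude index (j = 0 at 90°S).
  const j26 = 90 + 26;              // row closest to 26°N on a 1-degree grid
  const iStart = 280, iEnd = 345;   // assumed Atlantic sector (~80°W to ~15°W)
  let sum = 0, count = 0;
  for (let i = iStart; i < iEnd; i++) {
    const dPsi = psi[(j26 + 1) * nx + i] - psi[(j26 - 1) * nx + i]; // centered difference
    sum += dPsi / 2;
    count++;
  }
  return sum / count;
}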

What We Don't Measure (Yet)

These are diagnostics that would make the competition more complete. They're future work.

  • Western boundary current position and strength. Is the Gulf Stream separating at Cape Hatteras? Is the Kuroshio present? These are the most dynamically important features in the ocean and we don't score them.
  • ACC transport. The Antarctic Circumpolar Current carries ~130 Sv through Drake Passage. We compute zonal velocity at 58°S but don't include it in the score.
  • Deep temperature profile. We have WOA23 deep temp observations and the model computes deep temps, but they're not in the score. Deep ocean equilibration takes much longer than our 30s spinup allows.
  • Salinity structure. Salinity drives half of the density-driven circulation. The model has salinity but we don't evaluate it against observations.
  • Heat transport. Poleward ocean heat transport (vT at various latitudes) is a fundamental climate diagnostic. Getting it right for the right reasons matters more than getting SST right by accident.
  • Response to perturbations. Does freshwater forcing weaken the AMOC? Does closing Drake Passage warm Antarctica? The model's sensitivity to perturbations is arguably more important than its equilibrium state, but we don't test it.
  • Spatial pattern correlation. Comparing the full 2D SST field against observations (not just zonal means) would catch basin-scale errors that zonal averages hide.

Evaluation Protocol Limitations

  • 30-second spinup is too short. The surface ocean equilibrates in ~1-2 sim-years, but the deep ocean takes ~100-1000 years. Our score heavily favors initial conditions over dynamics. A model initialized from perfect observations would score well even with broken physics.
  • No ensemble evaluation. We run once. Chaotic sensitivity means a different random seed could give a different score. Proper evaluation would average over multiple runs.
  • Reference data uncertainty. The NOAA SST climatology has known biases at high latitudes (sparse sampling, ice contamination). We treat it as ground truth.
  • Overfitting risk. With only 15 latitude bands to match, it's possible to curve-fit the radiative parameters to minimize RMSE without the physics being right. The Wiggum loop's multi-tier evaluation (conservation, structural emergence, sensitivity) guards against this, but the leaderboard score doesn't.

Performance Metrics

Not everything worth optimizing is about accuracy. The leaderboard also tracks simulation performance — how fast the model runs. You can sort the leaderboard by performance to see who has the fastest physics engine.

  • Steps/sec — GPU timesteps per second. Higher is better. Typical range: 400-1600. Affected by shader complexity, buffer count, dispatch overhead.
  • FPS — rendering frames per second. Above 30 feels smooth. Affected by canvas resolution, colormap computation, particle count.
  • Stability — whether vorticity stays bounded and no NaN values appear. An unstable model is useless regardless of RMSE.
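
You can spot-check the first and third of these yourself. A sketch, assuming you have hooks to advance the model one step and read the temperature field back as a Float32Array (stepOnce and readTemperatureField are stand-ins, not the repo's API; the leaderboard script has its own instrumentation):

// Measure steps/sec over a wall-clock window and check the field for NaN/Inf.
async function benchmark(stepOnce, readTemperatureField, seconds = 5) {
  const t0 = performance.now();
  let steps = 0;
  while (performance.now() - t0 < seconds * 1000) {
    await stepOnce();   // one GPU timestep (or a small batch)
    steps++;
  }
  const field = await readTemperatureField();            // Float32Array readback
  const stable = field.every(v => Number.isFinite(v));   // NaN/Inf check
  return { stepsPerSec: steps / seconds, stable };
}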

The tradeoff: More complex physics (atmosphere coupling, regime-based clouds, variable mixed layer depth) generally costs performance. The current leader runs atmosphere updates on CPU between GPU readback cycles, which adds ~30% overhead. A submission that achieves the same RMSE with less computational cost is genuinely better — it means the physics is more efficiently formulated.

What we don't measure: Memory usage, power consumption, mobile performance, or startup time. These matter for a real educational tool but aren't currently benchmarked.

Convergence: Alignment Over Time

A single RMSE snapshot can be misleading. A model initialized from perfect observations scores great at t=0 even with broken physics — it just hasn't had time to drift yet. What matters is the trajectory.

New submissions record RMSE at intervals during spinup, producing a convergence curve. This reveals three distinct model behaviors:

  • Converging (good): RMSE may rise initially as the model adjusts from initial conditions, then falls and flattens. The dynamics are pulling the model toward reality.
  • Drifting (bad init, good physics): RMSE starts high and falls slowly. The model's equilibrium is close to observations, but it starts far away and needs time to get there.
  • Diverging (bad physics): RMSE rises and keeps rising. The model's dynamics are actively making things worse. No amount of init-tuning will save it.

The shape of the convergence curve tells you more about model quality than any single RMSE number. Click any submission on the Leaderboard tab to see its curve.
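
You can eyeball your own convergence before submitting by sampling the console diagnostics during spinup. A minimal sketch using only the fields mentioned in the quick-start (the official convergence curve tracks zonal RMSE, which the submit script extracts internally):

// In the browser console, after loading simamoc/index.html:
const samples = [];
const timer = setInterval(() => {
  const d = lab.diagnostics();
  samples.push({ t: performance.now() / 1000, global: d.globalSST, tropical: d.tropicalSST, polar: d.polarSST });
  console.table(samples);
}, 5000); // sample every 5 seconds
// Later: clearInterval(timer) and check whether the values flatten out.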

The Real Test: Predictive Validation

Everything above is hindcasting — can the model reproduce data it was fitted to? The real test is forecasting: initialize from 2020 observations, run forward, and compare predictions against 2021-2025 satellite data the model has never seen.

This is how real climate models are validated. It requires:

  • Getting the year-to-year warming trend right (~0.02°C/yr global mean)
  • Capturing regional warming patterns (Arctic amplification, Southern Ocean delay)
  • Responding correctly to forcing changes (CO2 trajectory, volcanic eruptions)

We don't score this yet, but the data exists — NOAA publishes yearly SST fields. A future version of the competition could split the reference data into a training period (1991-2020) and a validation period (2021-2025). This would make overfitting nearly impossible and force genuine physical understanding.

This is where the competition is headed. Matching a climatology is Phase 1. Predicting unseen years is the endgame.

The Target: 3.0°C RMSE

Why 3°C? It's roughly the level where the model "looks right" — warm tropics, cold poles, recognizable ocean basins, reasonable gradients. Below 3°C, the remaining errors are mostly in regions where our 1-degree barotropic model has known structural limitations (western boundary currents, mesoscale eddies, polar processes).

For context: a constant 14.2°C everywhere (the global mean) gives RMSE ~11°C. A simple latitude-only model (T = 28 - 0.4 * |lat|) gives ~5°C. Getting below 5°C requires real ocean physics. Getting below 3°C requires getting the physics right.