Build the most realistic ocean circulation model. Match your simulation to 30 years of NOAA satellite observations. Beat the leaderboard.
| # | Author | Version | RMSE | Trop SST | Polar SST | Perf | Date |
|---|--------|---------|------|----------|-----------|------|------|
SimAMOC is a WebGPU ocean circulation model running on a 360x180 (1-degree) global grid with real bathymetry. Your goal: tune the physics so the simulated sea surface temperatures match 30 years of NOAA satellite observations.
Target: RMSE < 3.0°C across 15 latitude bands
Key files:

- `model.js` — solar forcing, OLR, diffusion, friction, deep water formation rates
- `main.js` — air-sea heat exchange, atmospheric diffusion

Getting started:

1. `git clone https://github.com/JDerekLomas/amoc.git`
2. Open `simamoc/index.html` in Chrome to see the sim running
3. Edit `simamoc/model.js` (or other files) with your physics changes
4. Check your results with `lab.diagnostics()`
5. Submit:

```sh
node submit-version.mjs \
  --author "Luke" \
  --name "Better wind forcing" \
  --description "Multi-term Fourier wind stress matching ERA5"
```
The script takes ~40 seconds: it snapshots your code, launches headless Chrome with WebGPU, spins the model up for 30 s, extracts zonal SST, computes RMSE, saves screenshots, and updates the leaderboard.
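For a sense of what that pipeline involves, here is a minimal sketch of the headless-evaluation step. This is not the actual `submit-version.mjs` source; it assumes Puppeteer is installed and relies only on the `lab` API described below.

```js
// Hypothetical sketch of a headless evaluation run (not the real script).
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({
  headless: true,
  args: ['--enable-unsafe-webgpu'], // WebGPU in headless Chrome; exact flags may vary
});
const page = await browser.newPage();
await page.goto(`file://${process.cwd()}/simamoc/index.html`);

// Let the model spin up for ~30 s of wall-clock time.
await new Promise((resolve) => setTimeout(resolve, 30_000));

// Pull diagnostics out of the page context.
const diag = await page.evaluate(() => lab.diagnostics());
console.log('global:', diag.globalSST, 'tropical:', diag.tropicalSST, 'polar:', diag.polarSST);

await page.screenshot({ path: 'run.png' });
await browser.close();
```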
Fastest way to improve RMSE — tweak radiative balance params in the browser console:
```js
// Open simamoc/index.html in Chrome, then in the console:
lab.setParams({ S_solar: 6.5, A_olr: 1.8, B_olr: 0.14 })

// Wait ~30 s, then check:
lab.diagnostics()
// Inspect .globalSST, .tropicalSST, .polarSST
```
When you find good values, edit `model.js` lines 22-24 and submit.
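If you'd rather not iterate by hand, a small sweep can be pasted into the same console. A sketch, assuming `lab.setParams` takes effect immediately and the model needs roughly 30 s to re-equilibrate after each change:

```js
// Crude grid sweep over two radiation parameters, run in the browser console.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

for (const S_solar of [6.0, 6.5, 7.0]) {
  for (const B_olr of [0.12, 0.14, 0.16]) {
    lab.setParams({ S_solar, B_olr });
    await wait(30_000); // let the model re-equilibrate
    const d = lab.diagnostics();
    console.log({ S_solar, B_olr, global: d.globalSST, trop: d.tropicalSST, polar: d.polarSST });
  }
}
```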
For rapid iteration without a browser:
```sh
node tune.mjs --label test1 --spinup 25 \
  --params '{"S_solar":6.5,"B_olr":0.14}'
```
Outputs RMSE, zonal errors, and screenshots to screenshots/tune/.
We're building an educational ocean simulator, not a GCM. Our metrics reflect what matters for a model that helps people understand ocean circulation and climate — not one that produces publication-grade forecasts. The scoring is deliberately simple so that improving it requires learning real ocean physics, not just gaming a loss function.
**Zonal mean SST RMSE**

What it measures: How well the model reproduces the large-scale temperature structure of the ocean — warm tropics, cold poles, and the gradient between them. We compare zonal means (averaged around each latitude band) at 15 latitudes from 70°S to 70°N against 30 years of NOAA satellite observations.
Why we chose it: Zonal mean SST is the single most important diagnostic in ocean/climate modeling. It integrates everything — solar forcing, atmospheric coupling, ocean heat transport, ice-albedo feedback, cloud radiative effects. If your zonal SST is right, most of the large-scale physics is working. It's also easy to compute, easy to understand, and directly tied to the climate question this model exists to explore (AMOC collapse and its effect on regional temperatures).
What it misses: All east-west structure. Zonal averaging erases western boundary currents, basin-to-basin contrasts, and features like the western Pacific warm pool, so a model can score well here while getting regional patterns wrong.
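To make the metric concrete, here is a sketch of the computation. The array layout and names (`sst` as a row-major 360×180 field with NaN over land, `obs` as observed zonal means) are illustrative, not the model's actual internals:

```js
// Zonal-mean SST at a latitude band, then RMSE over 15 bands (illustrative).
// sst: Float32Array of length 360*180, NaN over land; row j is latitude -89.5 + j.
function zonalMean(sst, latDeg) {
  const j = Math.round(latDeg + 89.5); // row nearest the requested latitude
  let sum = 0, n = 0;
  for (let i = 0; i < 360; i++) {
    const t = sst[j * 360 + i];
    if (!Number.isNaN(t)) { sum += t; n++; } // skip land cells
  }
  return n ? sum / n : NaN;
}

function zonalRMSE(sst, bands, obs) {
  let se = 0, n = 0;
  bands.forEach((lat, k) => {
    const err = zonalMean(sst, lat) - obs[k];
    if (!Number.isNaN(err)) { se += err * err; n++; }
  });
  return Math.sqrt(se / n);
}

// 15 bands from 70°S to 70°N in 10° steps.
const bands = Array.from({ length: 15 }, (_, k) => -70 + 10 * k);
```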
**Tropical SST**

What it measures: Average SST between 20°S and 20°N. Observed value: ~27°C.
Why it matters: The tropical warm pool drives global atmospheric circulation (Hadley cells, Walker circulation, monsoons). Getting it right is a prerequisite for realistic heat transport. Most simple models underestimate tropical SST because they can't maintain the sharp equator-to-subtropics gradient — heat leaks poleward too fast through numerical diffusion.
What it misses: The tropical band is huge. This metric can't distinguish between a model with a realistic warm pool in the western Pacific and one with uniformly warm tropics. It also can't detect El Niño/La Niña variability, which is arguably the most important mode of tropical ocean dynamics.
**Polar SST**

What it measures: Average SST poleward of 60° in both hemispheres. Observed value: ~0-2°C (depending on ice extent).
Why it matters: Polar temperatures control sea ice extent, deep water formation, and the strength of the thermohaline circulation (AMOC). Getting polar SST right means the model's radiative balance, ice-albedo feedback, and deep convection are all in reasonable shape.
What it misses: Polar SST has large seasonal and interannual variability. The reference data in ice-covered regions is uncertain — satellites can't see SST under ice, so the "observed" values are partly interpolated. Our model also doesn't have a real sea ice model (just a temperature-dependent albedo), so the comparison is approximate.
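For reference, a temperature-dependent albedo of the kind the model uses might look like the sketch below. The constants and ramp width are illustrative, not the values in `model.js`:

```js
// Albedo ramps from open-water to ice values as SST drops through freezing.
const ALBEDO_OCEAN = 0.06; // open water absorbs most sunlight
const ALBEDO_ICE = 0.60;   // ice/snow reflects most of it

function albedo(sstC) {
  // Smooth transition over a 4°C window centered on 0°C:
  // fraction is 1 below -2°C, 0 above +2°C.
  const f = Math.min(1, Math.max(0, (2 - sstC) / 4));
  return ALBEDO_OCEAN + f * (ALBEDO_ICE - ALBEDO_OCEAN);
}
```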
**AMOC strength**

What it measures: The strength of the Atlantic Meridional Overturning Circulation, computed from the meridional streamfunction gradient in the North Atlantic at ~26°N. Reported in non-dimensional model units (not Sverdrups).
Why it matters: The AMOC is the whole reason this project exists. It's the ocean's main heat conveyor — warm surface water flows north, cools, sinks, and returns south at depth. Its collapse would cool Europe by 5-10°C. A model that gets SST right but has no overturning is missing the most important piece of ocean dynamics.
What it misses: Our AMOC diagnostic is crude — it measures the streamfunction gradient at a single latitude, not the full overturning streamfunction. The observed AMOC (~17 Sv from the RAPID array) is in physical units that our non-dimensional model can't directly compare to. We report it for qualitative assessment (is overturning happening?) rather than quantitative scoring.
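A sketch of what such a single-latitude diagnostic could look like, assuming a streamfunction field `psi` on the same 360×180 grid; the Atlantic longitude window and indexing are illustrative:

```js
// Crude overturning proxy: meridional streamfunction gradient near 26°N,
// restricted to Atlantic longitudes (~80°W to ~15°W as 0-360°E columns).
function amocProxy(psi) {
  const j = Math.round(26 + 89.5); // grid row nearest 26°N
  let maxGrad = 0;
  for (let i = 280; i < 345; i++) {
    const grad = (psi[(j + 1) * 360 + i] - psi[(j - 1) * 360 + i]) / 2; // centered difference
    maxGrad = Math.max(maxGrad, Math.abs(grad));
  }
  return maxGrad; // non-dimensional model units, not Sverdrups
}
```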
Diagnostics beyond these would make the competition more complete. They're future work.
Not everything worth optimizing is about accuracy. The leaderboard also tracks simulation performance — how fast the model runs. You can sort the leaderboard by performance to see who has the fastest physics engine.
The tradeoff: More complex physics (atmosphere coupling, regime-based clouds, variable mixed layer depth) generally costs performance. The current leader runs atmosphere updates on CPU between GPU readback cycles, which adds ~30% overhead. A submission that achieves the same RMSE with less computational cost is genuinely better — it means the physics is more efficiently formulated.
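The overlap pattern itself is simple to express. A sketch, assuming a WebGPU readback buffer whose GPU-side copy has already been submitted; the function names are hypothetical:

```js
// Overlap CPU physics with an in-flight GPU readback: start the async map,
// run the atmosphere step on the CPU, then await the map only afterwards.
async function stepWithOverlap(readbackBuffer, atmosphereStepCPU) {
  const mapped = readbackBuffer.mapAsync(GPUMapMode.READ); // readback in flight
  atmosphereStepCPU();                                     // CPU work overlaps the copy
  await mapped;
  const sst = new Float32Array(readbackBuffer.getMappedRange().slice(0)); // copy out
  readbackBuffer.unmap();
  return sst;
}
```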
What we don't measure: Memory usage, power consumption, mobile performance, or startup time. These matter for a real educational tool but aren't currently benchmarked.
A single RMSE snapshot can be misleading. A model initialized from perfect observations scores great at t=0 even with broken physics — it just hasn't had time to drift yet. What matters is the trajectory.
New submissions record RMSE at intervals during spinup, producing a convergence curve. This reveals three distinct model behaviors: curves that drop and hold (the physics genuinely equilibrates near the observations), curves that start low and creep upward (good initialization masking broken dynamics), and curves that never settle (numerical instability).
The shape of the convergence curve tells you more about model quality than any single RMSE number. Click any submission on the Leaderboard tab to see its curve.
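You can also record a curve yourself from the console. A sketch using only the `lab.diagnostics()` fields described above:

```js
// Sample diagnostics at fixed intervals during spinup to build a curve.
async function convergenceCurve(samples = 10, intervalMs = 3_000) {
  const curve = [];
  for (let k = 0; k < samples; k++) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const d = lab.diagnostics();
    curve.push({
      t: ((k + 1) * intervalMs) / 1000, // seconds since start
      global: d.globalSST, trop: d.tropicalSST, polar: d.polarSST,
    });
  }
  return curve; // plot each series against t to see convergence or drift
}

// Usage in the console: const curve = await convergenceCurve();
```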
Everything above is hindcasting — can the model reproduce data it was fitted to? The real test is forecasting: initialize from 2020 observations, run forward, and compare predictions against 2021-2025 satellite data the model has never seen.
This is how real climate models are validated. It requires holding data back: tune only against a training period, then score against years the model has never seen.
We don't score this yet, but the data exists — NOAA publishes yearly SST fields. A future version of the competition could split the reference data into a training period (1991-2020) and a validation period (2021-2025). This would make overfitting nearly impossible and force genuine physical understanding.
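The split itself is trivial to express. A sketch, with a hypothetical list of reference years:

```js
// Train/validation split over yearly SST fields (years list hypothetical).
const years = Array.from({ length: 35 }, (_, k) => 1991 + k); // 1991..2025
const train = years.filter((y) => y <= 2020);   // tuning allowed
const holdout = years.filter((y) => y >= 2021); // scored, never tuned against
```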
This is where the competition is headed. Matching a climatology is Phase 1. Predicting unseen years is the endgame.
Why 3°C? It's roughly the level where the model "looks right" — warm tropics, cold poles, recognizable ocean basins, reasonable gradients. Below 3°C, the remaining errors are mostly in regions where our 1-degree barotropic model has known structural limitations (western boundary currents, mesoscale eddies, polar processes).
For context: a constant 14.2°C everywhere (the global mean) gives RMSE ~11°C. A simple latitude-only model (T = 28 - 0.4 * |lat|) gives ~5°C. Getting below 5°C requires real ocean physics. Getting below 3°C requires getting the physics right.
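Those two baselines, written out (the RMSE figures above come from the text, not from recomputing them here):

```js
// Reference baselines from the paragraph above.
const constantBaseline = () => 14.2;                             // global mean: RMSE ~11°C
const latOnlyBaseline = (latDeg) => 28 - 0.4 * Math.abs(latDeg); // RMSE ~5°C

console.log(latOnlyBaseline(0));  // 28 (warm equator)
console.log(latOnlyBaseline(70)); // 0 (near-freezing at 70°)
```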