Blog

How we measure RL-environment quality — the method, the findings, and the standard behind the Index.

Don't Trust Me — Re-run the Seed

A public RL environment whose reward is gamed by a seeded search in a single simulation, and why the fix is to construct reward-hacks deterministically and ship the apparatus with the claim, not to detect them with a bigger model.

Read →

Research note

Run the Checker: Replayable Audits, Difficulty Certificates, and the Capability Floor

The platform plus a season of pointing the instruments at my own hypotheses: three refutations with named mechanisms, and two results — a capability floor and a solver-effort certificate — that survived the same guillotine.

Read →

Introduction

The Reward Integrity Index, in brief

The accessible on-ramp: what reward-hacking is, the six deterministic instruments, and what the live board of 20 audited environments found — every number re-runnable from its seed.

Read the introduction →

Technical paper

Reward Integrity: deterministic, replayable audits of RL-environment rewards

The rigorous treatment — the epiplexity frame, the difficulty-certification results (−0.924 / −0.870 vs pass-rate, +0.872 vs oracle), the 1372× decomposition collapse, the Reward Integrity Score, and an open research agenda. Heavily cited, annotated in the margins.

Read the paper →

Live demo

Action-space decomposition — a live explorer

The flagship result, interactive: a flat search explodes 7 → 23 → 469 → 1372 simulations on a rule-assembly puzzle while an LM-proposed macro holds at 1. Play the live Rust→WASM engine in your browser.

Open the explorer →

Reproductions

The paper, reproduced as runnable code

Fourteen hermetic Rust prototypes, each asserting its claim: thirteen reproduce the paper end-to-end (every instrument, the Index, the ExiT story, the epiplexity theory), and the fourteenth extends it.

See the prototypes →