Blog
How we measure RL-environment quality — the method, the findings, and the standard behind the Index.
Don't Trust Me — Re-run the Seed
A public RL environment whose reward a seeded search games in one simulation — and why the fix is to construct reward-hacks deterministically and ship the apparatus with the claim, not to detect them with a bigger model.
Read →
Run the Checker: Replayable Audits, Difficulty Certificates, and the Capability Floor
The platform plus a season of pointing the instruments at my own hypotheses: three refutations with named mechanisms, and two results — a capability floor and a solver-effort certificate — that survived the same guillotine.
Read →
The Reward Integrity Index, in brief
The accessible on-ramp: what reward-hacking is, the six deterministic instruments, and what the live board of 20 audited environments found — every number re-runnable from its seed.
Read the introduction →
Reward Integrity: deterministic, replayable audits of RL-environment rewards
The rigorous treatment — the epiplexity frame, the difficulty-certification results (−0.924 / −0.870 vs pass-rate, +0.872 vs oracle), the 1372× decomposition collapse, the Reward Integrity Score, and an open research agenda. Heavily cited, annotated in the margins.
Read the paper →
Action-space decomposition — a live explorer
The flagship result, interactive: a flat search explodes 7 → 23 → 469 → 1372 simulations on a rule-assembly puzzle while an LM-proposed macro holds 1 — play the live Rust→WASM engine in your browser.
Open the explorer →
The paper, reproduced as runnable code
Fourteen hermetic Rust prototypes, each asserting its claim: thirteen reproduce the paper end-to-end (every instrument, the Index, the ExiT story, the epiplexity theory), and the fourteenth extends it.
See the prototypes →
More posts soon.