2026-05-04 · Realignment report & decisions

ATLAS — what the realignment found, and what to do next.

A 22-hour overnight build closed the pipeline gates, installed a contract layer, and re-trialed the entire active strategy population (1,525 strategies) under engine v2. The numbers are unambiguous. The decisions below set direction.

Date: 2026-05-04
Engine v2 hash: e922302920b1
Population re-trialed: 1,525
Strategies that passed: 0

TL;DR

The realignment infrastructure shipped end-to-end in 5 commits across Phases A through G — gate fix, contract layer, simplicity protection, re-codification pipeline, dashboard visibility, hourly self-heal, daily objective monitor, versioning policy.

The verification verdict on the existing strategy population is clear-cut: 0 of 1,525 strategies pass under engine v2. 1,003 fail outright on the Sharpe / PF thresholds and 524 error out, most because of missing data or the crypto cost model gap.

The single best surviving candidate is Cross-JPY Risk Barometer (OOS Sharpe 0.80, PF 2.76 over 23 trades): close to the pass thresholds, but short on both Sharpe (1.0 required) and trade count (30 required).

Implication: the existing pipeline (research_engine + seed + the original 17 backtest_harness strategies) was producing noise, not edge. This isn't a build failure; it's the realignment doing its job.

Critical findings

The "17 verified" baseline was already broken on May 2

Re-verification on 2026-05-02 10:09 UTC failed all 17 backtest_harness strategies (OOS Sharpe near 0 or negative, PF below 1.2). All 3 filtered seeds errored on the crypto cost model. They were running in paper_trading only because hypothesis_activator.py ran with VERIFICATION_GATE_MODE=shadow (informational, not enforcing). 148 strategies were reverted to observing in Phase A — not the 103 the original blueprint anticipated.

Two open promotion gates, not one

research_engine.py:337-341 was the obvious one. hypothesis_activator.py:177-194 was the silent second one — its VERIFICATION_GATE_MODE default was shadow. Both have been patched. verification/bridge.py:143 was a third entry point that inserted backtest_harness rows directly at paper_trading; also patched.
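
The difference between a shadow gate and an enforcing gate can be sketched as follows. This is a minimal illustration, not the code in hypothesis_activator.py: the function name `may_promote`, the `enforce` mode value, and the fail-closed default are assumptions. Only the `VERIFICATION_GATE_MODE` variable and its `shadow` value come from this report.

```python
import os


def may_promote(verdict_passed: bool, env=None) -> bool:
    """Return True if a strategy may move to paper_trading (hypothetical sketch).

    In "shadow" mode the verdict is informational only, so a failing
    strategy is still promoted. Any other mode enforces the verdict.
    """
    env = os.environ if env is None else env
    # Assumption: default to enforcing, so a missing variable fails closed.
    mode = env.get("VERIFICATION_GATE_MODE", "enforce")
    if mode == "shadow":
        return True  # the verdict is recorded elsewhere but does not block
    return verdict_passed
```

With a `shadow` default, a failed verdict never blocks promotion, which matches the behaviour described above. Defaulting to the enforcing mode means a misconfigured environment rejects rather than promotes.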

Container code was stale

Mid-session, 7 new research_engine rows appeared at paper_trading (IDs 3133–3139) because the running atlas-app container still had the unpatched code in memory. The rows were reverted and the container restarted, so the patched code is now live. The hourly sweep would have caught this within 60 minutes; we caught it inside 30 because we were watching.
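
A sweep of this kind could look roughly like the sketch below. It assumes the sweep reverts any paper_trading row that lacks a passing v2 verdict back to observing. The `Row` type, its field names, and the `sweep` function are hypothetical and stand in for whatever the real job queries.

```python
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    status: str
    v2_passed: bool  # hypothetical flag: does a passing v2 verdict exist?


def sweep(rows):
    """Revert paper_trading rows without a passing verdict; return their IDs."""
    reverted = []
    for row in rows:
        if row.status == "paper_trading" and not row.v2_passed:
            row.status = "observing"
            reverted.append(row.id)
    return reverted


# The seven rows from this incident, plus one row that is already observing.
rows = [Row(i, "paper_trading", False) for i in range(3133, 3140)]
rows.append(Row(1, "observing", False))
reverted = sweep(rows)
```

Run hourly, a job like this bounds how long an unverified row can sit at paper_trading, whichever code path put it there.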

What was built

Commits, no force-pushes  5
Tests passing             22
New DB tables             5
Crons installed           2
Promotion gates patched   3
Build wall-clock          ~20m

Commits

ba20fa1  Phase A — gate fix; 148 reverted; verification_runs table
aeec211  Phase B — contract layer + transition() chokepoint + 10 stress tests
97cb747  Phase C — codification envelope schema + dataclasses
1f696ae  Phase D — re-codification + verification + verdict pipeline
e88b219  Phases E/F/G — visibility, hourly sweep, objective monitor, versioning docs
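
The idea behind a transition() chokepoint in Phase B is that every status change goes through one function, which can refuse promotions that lack a passing verdict. The sketch below is an assumption about the shape of such a function, not the shipped code. The allowed-moves table, the `TransitionError` type, and the `verdict_passed` parameter are all hypothetical.

```python
# Hypothetical set of legal moves; the real contract layer may allow others.
ALLOWED = {
    ("observing", "paper_trading"),
    ("paper_trading", "observing"),
}


class TransitionError(Exception):
    """Raised when a status change violates the contract."""


def transition(strategy: dict, new_status: str, verdict_passed: bool = False) -> dict:
    """Single entry point for status changes (sketch)."""
    move = (strategy["status"], new_status)
    if move not in ALLOWED:
        raise TransitionError(f"illegal move {move}")
    if new_status == "paper_trading" and not verdict_passed:
        raise TransitionError("promotion requires a passing verdict")
    # Return a copy so callers cannot mutate state outside the chokepoint.
    return {**strategy, "status": new_status}
```

If every writer calls a function like this instead of setting the status directly, a direct insert at paper_trading is no longer possible without a verdict.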

New routes

Path             What it shows
/review          HTML review queue: pending verdicts that need your decision
/pipeline-state  JSON: status counts, v2 verdict counts, locks, open trades, critical events

The verdict — re-trial of 1,525 strategies under engine v2

Passed           0
Failed           1,003
Errored          524
Review-required  1
Tie (both run)   3
In review queue  1,001

OOS Sharpe distribution

≥ 1.0       0
0.5 – 1.0   2
0 – 0.5     499
−0.5 – 0    500
< −0.5      3
No trades   524

Pass threshold is OOS Sharpe ≥ 1.0 with PF ≥ 1.2 over ≥ 30 trades. The distribution centres tightly on zero. Two strategies have positive Sharpe above 0.5; the rest are noise around the mean.
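
The pass rule can be written as a small check. The sketch below applies the three thresholds stated above to the three best candidates, using the figures from the candidates table in this report. The `OosResult` type and `failing_criteria` function are illustrative names, not part of the engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OosResult:
    title: str
    sharpe: float
    profit_factor: float
    trades: int


def failing_criteria(r, min_sharpe=1.0, min_pf=1.2, min_trades=30):
    """Return the names of the thresholds this result misses (empty list = pass)."""
    missed = []
    if r.sharpe < min_sharpe:
        missed.append("sharpe")
    if r.profit_factor < min_pf:
        missed.append("profit_factor")
    if r.trades < min_trades:
        missed.append("trades")
    return missed


# OOS figures copied from the report's candidates table.
candidates = [
    OosResult("Cross-JPY Risk Barometer", 0.797, 2.762, 23),
    OosResult("Holy Grail [NAS100_USD]", 0.574, 1.800, 27),
    OosResult("Nikkei-JPY Inverse", 0.382, 1.613, 20),
]
missed = {c.title: failing_criteria(c) for c in candidates}
```

All three candidates clear the PF threshold and miss on both Sharpe and trade count.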

Breakdown by source

Source type             Total  Pass  Fail  Error  Avg OOS Sharpe
research_engine         1,488     0   980    508          +0.020
seed (CLAUDE.md)           20     0    11      8          −0.269
markdown research docs     20     0    12      8          −0.016
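
As a quick arithmetic check, the per-source rows can be summed and compared with the headline verdict counts. The seed row is read here as 20 total, 0 pass, 11 fail, 8 error; that split is an assumption about the flattened row, chosen because it is the one that makes the fail and error columns match the headline figures.

```python
# (total, pass, fail, error) per source, as read from the breakdown table.
rows = {
    "research_engine": (1488, 0, 980, 508),
    "seed": (20, 0, 11, 8),        # assumed split of the flattened row
    "markdown": (20, 0, 12, 8),
}

# Transpose the rows into columns and sum each column.
total, passed, failed, errored = (sum(col) for col in zip(*rows.values()))
```

The fail and error columns sum to 1,003 and 524, matching the headline. The row totals sum to 1,528, three more than the 1,525 headline population; the report does not explain the difference.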

Best three candidates (none promoted, but worth reviewing)

v2 ID  Title                     Markets             OOS Sharpe  OOS PF  Trades
1626   Cross-JPY Risk Barometer  GBP_JPY, EUR_GBP         0.797   2.762      23
2162   Holy Grail [NAS100_USD]   NAS100_USD               0.574   1.800      27
1621   Nikkei-JPY Inverse        JP225_USD, USD_JPY       0.382   1.613      20

The Cross-JPY Risk Barometer is interesting

OOS Sharpe 0.80, PF 2.76, 23 trades on a cross-asset (GBP_JPY × EUR_GBP) divergence. Both Sharpe and trade count sit below the 1.0 / 30-trade thresholds, but the PF is high enough that the small sample, rather than a lack of edge, is plausibly the binding constraint. Worth a closer look.

What this means

Three readings, in increasing order of severity.

Reading 1 — the harness is doing its job

Engine v2 is more honest than engine v1. Path-dependent costs accrue funding per interval, session-aware FX spreads cost more during low-liquidity sessions, and OOS thresholds are real. The April harness gave 17 strategies a pass; the May harness rejects all of them. That's the system getting more rigorous, not the strategies getting worse. Old verdicts under a weaker harness should not have been trusted.
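
To make the two cost ideas concrete, here is a minimal sketch of per-interval funding accrual and a session-dependent spread. It is not engine v2's cost model: the function names, the three-intervals-per-day convention, and every rate and spread value are hypothetical placeholders.

```python
def funding_cost(notional, annual_rate, intervals_held, intervals_per_year=365 * 3):
    """Funding accrued once per interval while the position stays open.

    The cost depends on how long the trade is held, which is what makes
    it path-dependent rather than a flat per-trade fee.
    """
    return notional * annual_rate / intervals_per_year * intervals_held


# Hypothetical spreads in pips; wider when liquidity is thin.
SPREAD_PIPS = {"london": 0.8, "new_york": 0.9, "asia": 1.5, "rollover": 4.0}


def spread_cost(units, pip_value, session):
    """Spread paid on entry, looked up by the session the fill occurs in."""
    return units * pip_value * SPREAD_PIPS[session]
```

Under a model of this shape, a strategy that holds positions for many intervals or trades mostly in thin sessions pays more than a flat-cost backtest would charge, which is one way a v1 pass can turn into a v2 fail.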

Reading 2 — the research_engine produces statistical patterns, not strategies

1,488 research_engine entries with average OOS Sharpe of +0.020 (essentially zero). These are statistical anomalies in historical data that don't reproduce out-of-sample under realistic costs. Auto-discovery without a verification gate floods the registry with noise. The new contract layer prevents this from happening again.

Reading 3 — the seed and source-derived strategies have not held up either

Average OOS Sharpe of seed strategies: −0.27. Average for markdown-derived strategies: −0.02. These are the curated strategies — the ones we trusted most. Under engine v2 they don't perform. This is the most consequential finding because it questions the entire prior strategy-generation process: research → backtest → walk-forward → paper.

Recommendations (read before signing the blueprint below)

Forward plan — your decisions

Each block below is a decision. YES approves the default. MODIFY changes scope or details. NO skips. Decisions auto-save. Submit at the bottom.

Yes      0
No       0
Modify   0
Pending  14
0% reviewed

Submit final decisions?

Once submitted, Claude reads your responses and starts executing what you approved. You can still come back and edit afterwards, but you will need to re-run manually after any edits.