2026-05-04 · Realignment report & decisions

ATLAS — what the realignment found, and what to do next.

A 22-hour overnight build closed the pipeline gates, installed a contract layer, and re-trialed the entire active strategy population (1,525 strategies) under engine v2. The numbers are unambiguous. The decisions below set direction.

Date: 2026-05-04 · Engine v2 hash: e922302920b1 · Population re-trialed: 1,525 · Strategies that passed: 0

TL;DR

The realignment infrastructure shipped end-to-end in 5 commits across Phases A through G — gate fix, contract layer, simplicity protection, re-codification pipeline, dashboard visibility, hourly self-heal, daily objective monitor, versioning policy.

The verification verdict on the existing strategy population is clear-cut: 0 of 1,525 strategies pass under engine v2. 1,003 fail outright on Sharpe / PF thresholds; 524 error out, mostly because of missing data or the crypto cost model gap.

The single best surviving candidate is Cross-JPY Risk Barometer (Sharpe 0.80, PF 2.76 OOS over 23 trades). It clears the PF threshold but falls short on both OOS Sharpe (≥ 1.0 required) and trade count (≥ 30 required).

Implication: the existing pipeline (research_engine + seed + the original 17 backtest_harness strategies) was producing noise, not edge. This isn't a build failure; it's the realignment doing its job.

Critical findings

The "17 verified" baseline was already broken on May 2

Re-verification on 2026-05-02 10:09 UTC failed all 17 backtest_harness strategies (OOS Sharpe near 0 or negative, PF below 1.2). All 3 filtered seeds errored on the crypto cost model. They were running in paper_trading only because hypothesis_activator.py ran with VERIFICATION_GATE_MODE=shadow (informational, not enforcing). 148 strategies were reverted to observing in Phase A — not the 103 the original blueprint anticipated.

Two open promotion gates, not one

research_engine.py:337-341 was the obvious one. hypothesis_activator.py:177-194 was the silent second one — its VERIFICATION_GATE_MODE default was shadow. Both have been patched. verification/bridge.py:143 was a third entry point that inserted backtest_harness rows directly at paper_trading; also patched.
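One way to picture why a shadow default is dangerous is a fail-closed version of the same switch. This is a minimal sketch, not the patched code: the variable name VERIFICATION_GATE_MODE and the shadow value come from the report, while the `enforce` value, the function names, and the verdict strings are assumptions made for illustration.

```python
import os

# Assumed mode names: the report confirms only "shadow"; "enforce" is hypothetical.
VALID_MODES = {"shadow", "enforce"}


def gate_mode(env=None):
    """Return the gate mode, falling back to enforcing when unset or unrecognised."""
    env = os.environ if env is None else env
    mode = env.get("VERIFICATION_GATE_MODE", "enforce").strip().lower()
    return mode if mode in VALID_MODES else "enforce"


def may_promote(verdict, env=None):
    """Allow promotion to paper_trading only on a pass, unless the gate is in shadow mode."""
    if verdict == "pass":
        return True
    # In shadow mode a failed verdict is informational only and does not block,
    # which is how unverified strategies reached paper_trading.
    return gate_mode(env) == "shadow"
```

With this shape, an unset or misspelled variable blocks promotion instead of silently allowing it; shadow mode has to be requested explicitly.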

Container code was stale

Mid-session, 7 new research_engine rows appeared at paper_trading (IDs 3133–3139) because the running atlas-app container still had the unpatched code in memory. The rows were reverted and the container restarted, so the patched code is now live. The hourly sweep would have caught this within 60 minutes; we caught it inside 30 because we were watching.
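The kind of check an hourly sweep performs can be sketched as follows. This is an illustration under assumed schemas, not the installed cron: the table name verification_runs and the statuses paper_trading and observing come from the report, but the `strategies` table, all column names, and the `pass` verdict value are hypothetical.

```python
import sqlite3


def sweep(conn):
    """Revert paper_trading rows that lack a passing verification run.

    Assumed schema (hypothetical):
      strategies(id INTEGER PRIMARY KEY, status TEXT)
      verification_runs(strategy_id INTEGER, verdict TEXT)
    Returns the ids that were reverted to 'observing'.
    """
    rows = conn.execute(
        """
        SELECT s.id FROM strategies s
        WHERE s.status = 'paper_trading'
          AND NOT EXISTS (
              SELECT 1 FROM verification_runs v
              WHERE v.strategy_id = s.id AND v.verdict = 'pass'
          )
        ORDER BY s.id
        """
    ).fetchall()
    ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE strategies SET status = 'observing' WHERE id = ?",
        [(i,) for i in ids],
    )
    conn.commit()
    return ids
```

Because the sweep is idempotent, a second run immediately afterwards reverts nothing, which makes it safe to schedule on a fixed interval.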

What was built

[Summary cards: commits (no force-pushes) · tests passing · new DB tables · crons installed · promotion gates patched · build wall-clock (~20m)]

Commits

ba20fa1 Phase A — gate fix; 148 reverted; verification_runs table

aeec211 Phase B — contract layer + transition() chokepoint + 10 stress tests

97cb747 Phase C — codification envelope schema + dataclasses

1f696ae Phase D — re-codification + verification + verdict pipeline

e88b219 Phases E/F/G — visibility, hourly sweep, objective monitor, versioning docs

New routes

Path	What it shows
`/review`	HTML review queue: pending verdicts that need your decision
`/pipeline-state`	JSON: status counts, v2 verdict counts, locks, open trades, critical events

The verdict — re-trial of 1,525 strategies under engine v2

Verdict	Count
Passed	0
Failed	1,003
Errored	524
Review-required	(not shown)
Tie (both run)	(not shown)
In review queue	1,001

OOS Sharpe distribution

OOS Sharpe	Strategies
≥ 1.0	0
0.5 – 1.0	2
0 – 0.5	499
−0.5 – 0	500
< −0.5	(not shown)
No trades	524

Pass threshold is OOS Sharpe ≥ 1.0 with PF ≥ 1.2 over ≥ 30 trades. The distribution centres tightly on zero. Two strategies have an OOS Sharpe above 0.5; the rest are noise around zero.
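The pass rule can be written down as a small check that also reports which criteria a strategy misses. This is a sketch of the stated thresholds only; the class and function names are assumptions, and the engine's actual verdict logic may apply further conditions.

```python
from dataclasses import dataclass

# Thresholds as stated in the report.
MIN_SHARPE = 1.0
MIN_PF = 1.2
MIN_TRADES = 30


@dataclass(frozen=True)
class OosResult:
    """Out-of-sample metrics for one strategy (hypothetical container)."""
    sharpe: float
    profit_factor: float
    trades: int


def failed_checks(result):
    """Return the names of the thresholds the result misses; empty means pass."""
    failures = []
    if result.sharpe < MIN_SHARPE:
        failures.append("sharpe")
    if result.profit_factor < MIN_PF:
        failures.append("profit_factor")
    if result.trades < MIN_TRADES:
        failures.append("trades")
    return failures


# Cross-JPY Risk Barometer (v2 ID 1626), using the figures from the report.
print(failed_checks(OosResult(sharpe=0.797, profit_factor=2.762, trades=23)))
# prints ['sharpe', 'trades']
```

Returning the list of missed criteria, instead of a bare boolean, makes it easy to see that the best candidate clears PF but misses both Sharpe and trade count.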

Breakdown by source

Source type	Total	Fail	Error	Avg OOS Sharpe
research_engine	1,488	980	508	+0.020
seed (CLAUDE.md)	20	11	8	−0.269
markdown research docs	20	12	8	−0.016

Best three candidates (none promoted, but worth reviewing)

v2 ID	Title	Markets	OOS Sharpe	OOS PF	Trades
1626	Cross-JPY Risk Barometer	GBP_JPY, EUR_GBP	0.797	2.762	23
2162	Holy Grail [NAS100_USD]	NAS100_USD	0.574	1.800	27
1621	Nikkei-JPY Inverse	JP225_USD, USD_JPY	0.382	1.613	20

The Cross-JPY Risk Barometer is interesting

OOS Sharpe 0.80, PF 2.76, 23 trades on a cross-asset (GBP_JPY × EUR_GBP) divergence. It misses both the Sharpe ≥ 1.0 and the ≥ 30-trade thresholds, but the PF is high enough that trade count may plausibly be the binding constraint. Worth a closer look.

What this means

Three readings, in increasing order of severity.

Reading 1 — the harness is doing its job

Engine v2 is more honest than engine v1. Path-dependent costs accrue funding per interval, session-aware FX spreads cost more during low-liquidity sessions, and OOS thresholds are real. The April harness gave 17 strategies a pass; the May harness rejects all of them. That's the system getting more rigorous, not the strategies getting worse. Old verdicts under a weaker harness should not have been trusted.

Reading 2 — the research_engine produces statistical patterns, not strategies

1,488 research_engine entries with average OOS Sharpe of +0.020 (essentially zero). These are statistical anomalies in historical data that don't reproduce out-of-sample under realistic costs. Auto-discovery without a verification gate floods the registry with noise. The new contract layer prevents this from happening again.
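The idea of a single transition() chokepoint can be sketched as one function that every status change must pass through. This illustrates the pattern and is not the Phase B implementation: the statuses observing and paper_trading come from the report, while the allowed-transition table, the dict-based strategy record, the verdict strings, and the exception name are assumptions.

```python
# Hypothetical transition table; the real contract layer may define more states.
ALLOWED = {
    ("observing", "paper_trading"),
    ("paper_trading", "observing"),
}


class TransitionError(Exception):
    """Raised when a status change violates the contract."""


def transition(strategy, new_status, verdict=None):
    """Move a strategy to new_status, or raise if the contract forbids it."""
    old_status = strategy["status"]
    if (old_status, new_status) not in ALLOWED:
        raise TransitionError(f"{old_status} -> {new_status} is not allowed")
    # Promotion requires a passing verification verdict; demotion never does.
    if new_status == "paper_trading" and verdict != "pass":
        raise TransitionError("promotion to paper_trading requires a passing verdict")
    strategy["status"] = new_status
    return strategy
```

If every writer calls this function and nothing inserts rows directly at paper_trading, a discovery job can add as many candidates as it likes without any of them reaching paper trading unverified.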

Reading 3 — the seed and source-derived strategies have not held up either

Average OOS Sharpe of seed strategies: −0.27. Average for markdown-derived strategies: −0.02. These are the curated strategies, the ones we trusted most, and under engine v2 they don't perform. This is the most consequential finding because it calls into question the entire prior strategy-generation process: research → backtest → walk-forward → paper.

Recommendations (read before signing the blueprint below)

Stop generating new strategies until you've decided what to do with these results. The research_engine 6h cron is currently still running. It produces ~10–20 new statistical patterns per cycle. They will continue to fail verification. Pause it; revisit when you have a clearer view on what kinds of patterns deserve testing.
Investigate Cross-JPY Risk Barometer (sid 1626). Best surviving candidate. PF of 2.76 is meaningful. The trade count (23) is below the 30 threshold but the magnitude suggests genuine signal. Worth a closer manual look — review the trades, the regime profile, and whether it's been tested on more than the single OOS window.
Land crypto cost model Fix 2 before doing any more crypto verification. The 524 errors include crypto strategies that couldn't even be evaluated, so the "no edge" verdict for crypto is currently unverifiable.
Wire Telegram delivery from critical_log — the table is being written to but nothing pushes. Without it, anomalies surface only on the dashboard. ~30 minutes of work; high leverage.
Don't run optimization sweeps yet. Sweeps amplify the asymmetric upside of strategies that do work; with 0 passing foundations there's nothing for sweeps to amplify.
Don't rush to LLM-driven re-codification. Tempting to think "maybe the source-fidelity codification would save these." For 1,001 of 1,003 failures, the OOS Sharpe is so far below threshold that no codification fidelity recovers it. Reserve LLM re-codification for the <5 borderline candidates.
Consider a different approach to strategy generation. The current pattern produces noise. Worth sitting with the question: where do real edges actually come from? That's a separate conversation from this realignment.
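For the Telegram recommendation above, the push itself is a small amount of code. The sketch below uses the Telegram Bot API sendMessage method with its chat_id and text parameters; how rows are read from critical_log is left out, and the (id, message) row shape, the function names, and the injectable `send` callable are assumptions for illustration.

```python
import json
import urllib.request


def build_send_message(token, chat_id, text):
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_unsent(rows, token, chat_id, send=urllib.request.urlopen):
    """Send each (id, message) row and return the ids that were delivered.

    `rows` stands in for unsent critical_log entries (hypothetical shape).
    `send` is injectable so the function can be exercised without network access.
    """
    delivered = []
    for row_id, message in rows:
        send(build_send_message(token, chat_id, message))
        delivered.append(row_id)
    return delivered
```

Returning the delivered ids lets the caller mark exactly those rows as pushed, so a failure partway through does not cause already-sent alerts to be marked unsent or unsent alerts to be skipped.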

Forward plan — your decisions

Each block below is a decision. YES approves the default. MODIFY changes scope or details. NO skips. Decisions auto-save. Submit at the bottom.
