Robustness (answers 'is the edge just outliers?'):
- _bucket_stats gains median_net_r, profit_factor, and net_avg_r_ex_top5
(expectancy with the top 5% of winners removed); shown as stat tiles.
- Portfolio sim gains per-calendar-year returns, shown in the sim table.
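
A minimal sketch of the three robustness stats above, not the actual _bucket_stats code. The function name `robustness_stats` and the trimming rule (drop the best 5% of all trades, at least one) are assumptions made for illustration.

```python
import math
from statistics import median


def robustness_stats(net_r, trim_frac=0.05):
    """Summarize per-trade net R so a few outliers cannot hide a weak edge."""
    if not net_r:
        raise ValueError("need at least one trade")
    gains = sum(r for r in net_r if r > 0)
    losses = -sum(r for r in net_r if r < 0)
    # Profit factor: gross gains over gross losses (infinite with no losers).
    profit_factor = gains / losses if losses > 0 else math.inf
    # Assumption: trim the best 5% of all trades, and always at least one.
    n_trim = max(1, round(len(net_r) * trim_frac))
    kept = sorted(net_r)[: len(net_r) - n_trim]
    ex_top = sum(kept) / len(kept) if kept else 0.0
    return {
        "median_net_r": median(net_r),
        "profit_factor": profit_factor,
        "net_avg_r_ex_top5": ex_top,
    }


# Hypothetical book: one +20R trade carries an otherwise losing set.
trades = [-1.0] * 15 + [0.5] * 4 + [20.0]
stats = robustness_stats(trades)
```

In this made-up sample the plain average is positive while the trimmed expectancy is negative, which is the situation the outlier-dependence warning is meant to flag.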
Dynamic recommendation ('What this backtest recommends' panel):
- _build_recommendation derives advice from the report's own numbers on
every run — exit policy (target vs best hold, with sim CAGRs), which
gate floors earn their keep (ablation Hold column), best momentum
cutoff, book-vs-SPY verdict, and an outlier-dependence warning when
the trimmed expectancy goes non-positive.
Retired (conclusions reached, tables removed from report + UI):
- Take-profit sweep (no interior optimum — fixed TP is the wrong tool
for momentum), trailing sweep (converged to the hold-to-horizon exit),
probability calibration (model is display-only by decision).
- _tp_primitives slimmed to _risk_and_stop_day; trailing machinery gone.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Per-trade additions to the report:
- Gap-through-stop fills: stops now fill at the worse of the stop or the
bar's open across every exit model (target, TP, trailing, time), so a
loss can exceed -1R; targets never fill better than their level.
- best_r / worst_r, avg holding days, and net R per day of capital
deployed on the summary buckets and the time-exit sweep.
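
The gap-through-stop rule above can be sketched for a long position as follows. This is an illustration under assumptions, not the replay code: `stop_fill_price` and `realized_r` are hypothetical helpers, and only long setups are modeled.

```python
from typing import Optional


def stop_fill_price(stop: float, bar_open: float, bar_low: float) -> Optional[float]:
    """Return the fill for a long stop on this bar, or None if it was not hit."""
    if bar_low > stop:
        return None
    # A gap below the stop fills at the open, which is worse than the stop.
    return min(stop, bar_open)


def realized_r(entry: float, initial_stop: float, fill: float) -> float:
    """Express the exit in R, measured against the initial risk."""
    return (fill - entry) / (entry - initial_stop)


gap_fill = stop_fill_price(stop=95.0, bar_open=93.0, bar_low=92.0)
intrabar_fill = stop_fill_price(stop=95.0, bar_open=97.0, bar_low=94.0)
gap_loss_r = realized_r(100.0, 95.0, gap_fill)
```

With these numbers the gapped bar fills at 93 rather than 95, so the loss is larger than 1R.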
Portfolio simulation (the stats a per-setup replay cannot give):
- One capital-constrained book over the qualified setups: 10k start, max
10 concurrent positions (one per ticker, best momentum first), 1%
fixed-fractional risk with a 20% no-leverage notional cap, entries at
the detection close, 0.1%/side costs, daily mark-to-market.
- Two exit policies compared: S/R target race vs hold-to-horizon.
- Equity-curve stats: final equity, total return, CAGR, max drawdown,
annualized daily Sharpe, win rate, avg P&L, best/worst trade, avg
hold, entries skipped on a full book, and SPY price return over the
same window (benchmark history refreshed to cover the replay span).
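
A rough sketch of the sizing rule described above (1% fixed-fractional risk, 20% notional cap). The function name `position_size`, fractional shares, and reading the cap as per-position are assumptions, not confirmed details of the simulator.

```python
def position_size(equity: float, entry: float, stop: float,
                  risk_frac: float = 0.01, max_notional_frac: float = 0.20) -> float:
    """Shares to buy for a long setup under risk-based sizing with a notional cap."""
    risk_per_share = entry - stop
    if risk_per_share <= 0:
        raise ValueError("stop must be below entry for a long setup")
    # Size so that a stop-out loses risk_frac of current equity.
    shares_by_risk = equity * risk_frac / risk_per_share
    # Assumption: the cap limits each position to max_notional_frac of equity.
    shares_by_cap = equity * max_notional_frac / entry
    return min(shares_by_risk, shares_by_cap)


tight_stop = position_size(10_000, entry=100.0, stop=98.0)  # cap binds
wide_stop = position_size(10_000, entry=100.0, stop=90.0)   # risk binds
```

A tight stop would otherwise imply a very large position, so the cap is what keeps the book unlevered in that case.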
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The ablation judged floors under the target/stop model, but the exit
sweeps point at replacing that exit with a fixed hold — under which the
R:R floor's rationale (bigger payoff at the target) may not apply. Each
ablation row now also carries hold_avg_r / hold_net_avg_r / hold_total_r
(30d hold, initial stop only), so the Phase 3 gate decision can be read
under the exit policy that would actually be used.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Phase 1 of the strategy-measurement plan — report-only, no production
trading behavior changes:
- Cost haircut: every bucket/sweep now reports net_avg_r/net_total_r
alongside gross (COST_PER_SIDE=0.1% of notional, converted to R via
each setup's stop distance); params carry cost_per_side_pct.
- Gate ablation table: re-qualifies candidates at the current momentum
cutoff with one floor removed per row (confidence / R:R / NEUTRAL /
momentum-only) to show which floors earn their keep.
- Time-based exit sweep: hold 5/10/21/30 days with the initial ATR stop,
exit at the day-N close — the classic momentum implementation, to
disambiguate the wide-trailing result.
- TP sweep extended to +40/+50%, trailing to 25/30% so the optima are
interior instead of starred at the sweep edge.
- BacktestPanel: Net Avg R columns everywhere, gate-ablation and
time-exit tables, stars now mark best net avg R; stale cached reports
still render (all new fields optional/guarded).
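
The cost haircut's conversion from percent of notional to R can be sketched like this. It assumes both sides are charged on the entry notional; `cost_in_r` and `net_r` are illustrative names, not the report's functions.

```python
COST_PER_SIDE = 0.001  # 0.1% of notional per side, as stated above


def cost_in_r(entry: float, stop: float, cost_per_side: float = COST_PER_SIDE) -> float:
    """Round-trip cost expressed in R via the setup's stop distance."""
    risk_frac = abs(entry - stop) / entry
    if risk_frac == 0:
        raise ValueError("stop distance must be non-zero")
    # Assumption: entry and exit are both charged on the entry notional.
    return 2 * cost_per_side / risk_frac


def net_r(gross_r: float, entry: float, stop: float) -> float:
    """Gross R minus the round-trip cost in R."""
    return gross_r - cost_in_r(entry, stop)


example_net = net_r(1.0, entry=100.0, stop=95.0)
```

With a 5% stop distance the round trip costs about 0.04R; a tighter stop makes the same percentage cost a larger share of R.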
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds a trailing-stop exit as the third exit model alongside target-vs-stop and the fixed take-profit. The TP sweep
showed the edge lives in the fat tail (avg R keeps rising as you let winners run),
but a fixed wide target is win-rate-brutal and gives everything back on a reversal.
A trailing stop harvests the tail while protecting gains.
Per setup the replay computes the realized R for several trail widths (3/5/7/10/
15/20%) in a single conservative pass — stop ratchets up via max(initial_stop,
peak*(1-trail)), exit on the pullback or at the horizon close, R vs the initial
risk. Aggregated into a trailing sweep (win rate = share closed in profit, avg R,
total R) over the qualified set and shown as a new table in the Backtest panel.
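
A minimal sketch of one trailing pass for a long setup, following the ratchet rule above. The bar layout `(high, low, close)`, the function name `trailing_r`, and filling exactly at the stop level are assumptions. "Conservative" is read here as checking the stop against a bar's low before that bar's high is allowed to raise it.

```python
def trailing_r(entry, initial_stop, trail, bars):
    """Realized R for one trail width; bars is a list of (high, low, close)."""
    if not bars:
        raise ValueError("need at least one bar")
    risk = entry - initial_stop
    peak = entry
    stop = initial_stop
    for high, low, _close in bars:
        # Conservative ordering: test the pullback before ratcheting on this bar.
        if low <= stop:
            return (stop - entry) / risk
        peak = max(peak, high)
        stop = max(initial_stop, peak * (1 - trail))
    # Never stopped out: exit at the horizon close.
    return (bars[-1][2] - entry) / risk


bars = [(110.0, 101.0, 108.0), (120.0, 109.0, 118.0), (119.0, 105.0, 106.0)]
stopped_r = trailing_r(100.0, 95.0, 0.10, bars)
held_r = trailing_r(100.0, 95.0, 0.10, [(102.0, 99.0, 101.0)])
```

In the first example the peak of 120 lifts the stop to 108, and the third bar's pullback exits there for about +1.6R.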
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Avg R was still rising at the previous top level (+15%), so the optimum was off
the table. Extend TP_LEVELS to 20/25/30% to reveal where letting winners run
stops paying (it plateaus toward "just hold to the horizon close").
Also clarify in the panel that the take-profit model deliberately does NOT use
the setup's S/R target — it's a standalone fixed-% exit; exiting at the target is
the target-vs-stop model above. The two are complementary ends, not in conflict.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The target-vs-stop model counts a near-miss of a far S/R target as a full loss
and ignores the partial gains you actually bank — so it measures a different
strategy than "scalp the early pop, take +8%". Add a realistic take-profit exit
model next to it (original untouched).
Per setup the replay now also records risk%, whether the stop was hit, the
favourable excursion reachable before the stop (MFE), and the horizon-close move.
From those a fixed-take-profit sweep (4/6/8/10/12/15%) is scored in R: bank +X%
if reached before the stop, else -1R if the stop was hit, else the horizon-close
move. Hit rate = how often +X% was banked (the share of setups whose MFE reached +X%), so you can pick the EV-optimal TP without
top-ticking fantasy. Shown as a new table in the Backtest panel; the IC,
calibration and momentum sweep are unchanged.
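
The scoring rule above can be sketched per setup as below. The function and parameter names are hypothetical; all inputs are percentages of the entry price, and the MFE is the favourable excursion reachable before the stop.

```python
def take_profit_r(tp_pct, risk_pct, mfe_before_stop_pct, stop_hit, horizon_close_pct):
    """Score one setup under a fixed +tp_pct exit, in units of initial risk."""
    if risk_pct <= 0:
        raise ValueError("risk_pct must be positive")
    if mfe_before_stop_pct >= tp_pct:
        return tp_pct / risk_pct          # target banked before the stop
    if stop_hit:
        return -1.0                       # stopped out first
    return horizon_close_pct / risk_pct   # neither hit: close at the horizon


banked = take_profit_r(8.0, 4.0, mfe_before_stop_pct=9.0, stop_hit=True, horizon_close_pct=-4.0)
stopped = take_profit_r(8.0, 4.0, mfe_before_stop_pct=5.0, stop_hit=True, horizon_close_pct=-4.0)
drifted = take_profit_r(8.0, 4.0, mfe_before_stop_pct=5.0, stop_hit=False, horizon_close_pct=2.0)
```

The three calls cover the three branches: +2R banked, -1R stopped, and +0.5R at the horizon close.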
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The momentum-sweep table read row.min_momentum_percentile.toFixed(), but a report
cached before the EV->momentum change only has min_expected_value rows. Calling
.toFixed() on undefined threw during render and, with no error boundary, blanked the whole
Track Record tab. Guard the sweep block on the new field so a stale report just
hides the sweep; re-running the backtest repopulates it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 5-year backtest confirmed the EV gate adds negative value (high threshold =
worst expectancy) and that 12-1 month momentum is the one price signal with a
plausible, right-signed cross-sectional IC (~0.05). So "qualified" now means:
clears the R:R + confidence floors AND the ticker ranks in the top
`min_momentum_percentile` of the universe by 12-1 momentum that week.
- qualification.py: drop expected_value_r / the EV gate; add a momentum-percentile
gate (duck-typed `momentum_percentile`, only enforced when attached + threshold
set, else defers to floors). Mirrored in frontend qualification.ts.
- activation config/schema: min_expected_value -> min_momentum_percentile
(default 80 = top quintile). ActivationSettings, DashboardPage (ranks/shows
momentum instead of EV), and the BacktestPanel sweep follow.
- backtest: rank each ISO week's universe by 12-1 momentum, assign a percentile,
and qualify the top slice; the sweep now sweeps the percentile cutoff.
Also offload the backtest's per-ticker compute to a worker thread so the heavy
~5y run no longer blocks the API event loop (the "backend offline" flicker).
Production setups don't carry momentum_percentile yet — wiring the scanner to
attach it (a universe momentum-rank step) is the next step; until then the live
gate defers to floors while the backtest measures the momentum selection. 330
backend tests pass; frontend build clean.
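
A hedged sketch of the weekly momentum-percentile selection described above. The percentile convention (0 for the weakest ticker, 100 for the strongest, ties not handled) and the names `momentum_percentiles` and `top_slice` are assumptions, not the backtest's actual definitions.

```python
def momentum_percentiles(momentum: dict) -> dict:
    """Map each ticker to a 0-100 percentile of its 12-1 momentum within one week."""
    ordered = sorted(momentum, key=momentum.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 100.0}
    # Assumption: linear rank scaling, weakest = 0, strongest = 100.
    return {ticker: 100.0 * i / (n - 1) for i, ticker in enumerate(ordered)}


def top_slice(momentum: dict, min_momentum_percentile: float = 80.0) -> list:
    """Tickers that clear the momentum gate for this week's universe."""
    pct = momentum_percentiles(momentum)
    return sorted(t for t, p in pct.items() if p >= min_momentum_percentile)


# Hypothetical one-week universe of 12-1 momentum values.
week = {"A": -0.10, "B": 0.02, "C": 0.15, "D": 0.30, "E": 0.55}
selected = top_slice(week)
```

With five tickers and the default cutoff of 80, only the strongest name clears the gate, matching the "top quintile" reading.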
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two changes so the cross-sectional signal results can actually be trusted.
(a) History depth — the binding constraint. Ingestion defaulted to 365 days, so
long-lookback factors (12-month momentum, 52-week high) were only computable on a
handful of weeks at the tail, and every IC reflected a single market regime.
- New `settings.ohlcv_history_days` (default 1825 ≈ 5y); new tickers backfill this
far instead of 1 year.
- New manual "data_backfill" job (Admin → Jobs) re-fetches the full window for
every ticker, ignoring incremental resume — run once to deepen existing
1-year histories. Idempotent (upsert); resumes after rate limits.
(b) Factor-IC honesty. The IC was averaged over weekly rebalances whose 30-day
forward windows overlap, inflating the t-stat ~sqrt(6)x.
- IC now measured on NON-OVERLAPPING windows (weeks thinned to ~HORIZON apart).
- Each signal carries a `reliable` flag (>= 12 independent windows); BacktestPanel
greys out and de-stars thin signals so a lucky 9-week IC of 0.3 can't masquerade
as an edge.
332 backend tests pass; frontend build clean. No migration (config + job + an
added JSON field on the cached backtest report).
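
The thinning to non-overlapping windows and the `reliable` flag can be sketched as below. It assumes the 30-day horizon is counted in calendar days; `thin_to_non_overlapping` is an illustrative name.

```python
from datetime import date, timedelta

HORIZON_DAYS = 30             # forward-return window (assumed calendar days)
MIN_INDEPENDENT_WINDOWS = 12  # threshold for the reliable flag, from the text above


def thin_to_non_overlapping(rebalance_dates, horizon_days=HORIZON_DAYS):
    """Keep only rebalance dates at least one horizon after the last kept date."""
    kept = []
    for d in sorted(rebalance_dates):
        if not kept or (d - kept[-1]).days >= horizon_days:
            kept.append(d)
    return kept


# Twelve consecutive weekly rebalances collapse to three independent windows.
weeks = [date(2024, 1, 1) + timedelta(weeks=i) for i in range(12)]
independent = thin_to_non_overlapping(weeks)
reliable = len(independent) >= MIN_INDEPENDENT_WINDOWS
```

Twelve weekly observations therefore count as only three independent ones here, which is why short histories fail the reliability threshold.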
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-setup hit-rate report can't tell whether a signal predicts returns —
only how a target/stop structure built on one performs. This adds a
cross-sectional factor-IC pass: each week the universe is ranked by a price-only
signal and graded by its rank correlation (Spearman IC) and top-minus-bottom-
quintile spread against the forward 30-day return.
Candidate signals (point-in-time from price; sentiment/fundamentals have no
history in the replay): 12-1/6-1/3-1 month momentum, 1-month reversal,
price-vs-200d SMA, proximity to the 52-week high (George/Hwang), and 126-day
realized volatility (low-vol anomaly).
Reuses the existing per-ticker replay loop (no new data, no second DB pass);
results land in the cached backtest_report as `signal_eval` and render as a
"Signal edge" table in BacktestPanel beside the calibration curve.
330 backend tests pass (10 new in test_signal_eval); frontend build clean.
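
For reference, a dependency-free sketch of the weekly rank correlation. It uses the classic no-ties Spearman formula, so tie handling is deliberately left out; `ranks` and `spearman_ic` are illustrative names and not necessarily how signal_eval computes it.

```python
def ranks(values):
    """1-based ranks, smallest value first (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman_ic(signal, forward_return):
    """Spearman rank correlation between a signal and forward returns."""
    n = len(signal)
    if n != len(forward_return) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(signal), ranks(forward_return)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


aligned = spearman_ic([1, 2, 3, 4, 5], [0.01, 0.02, 0.03, 0.04, 0.05])
inverted = spearman_ic([1, 2, 3, 4, 5], [0.05, 0.04, 0.03, 0.02, 0.01])
```

A perfectly ordered signal scores +1 and a perfectly inverted one scores -1.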
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Diagnosing "no qualified signals for 5 days": setups were generated but none
qualified. The gate required BOTH a high min_rr (2.0) AND a high
min_target_probability (60), which became contradictory after the Jun-15
probability recalibration — probability already embeds R:R via the 1/(rr+1) ruin
term, so high-R:R targets are inherently low-probability and nothing cleared both.
Gate is now expected value (R): p*rr - (1-p) from the primary target's
probability. R:R and confidence stay as floors; high-conviction / exclude-conflicts
/ min-target-probability become optional tighteners (default off). Defaults:
min_expected_value=0.15, min_rr=1.2, min_confidence=55. EV is only enforced when
computable. Migration 009 clears stored activation_* rows so the new defaults
apply. Backtest sweeps min_expected_value instead of target probability.
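
A minimal sketch of the EV gate with its floors, using the defaults listed above. It assumes the probability is passed as a fraction (0.5 rather than 50); `expected_value_r` and `qualifies` are illustrative and not the qualification.py signatures.

```python
def expected_value_r(p: float, rr: float) -> float:
    """Expected value in R: win p of the time for +rr, lose the rest for -1."""
    return p * rr - (1.0 - p)


def qualifies(rr, confidence, p=None,
              min_rr=1.2, min_confidence=55.0, min_expected_value=0.15):
    """Apply the R:R and confidence floors, then the EV gate when computable."""
    if rr < min_rr or confidence < min_confidence:
        return False
    if p is None:
        # EV is only enforced when a probability is available.
        return True
    return expected_value_r(p, rr) >= min_expected_value


ev = expected_value_r(0.5, 2.0)
```

A 3:1 target at p = 0.25 has zero expected value and is rejected, while a 2:1 target at p = 0.5 passes with EV = 0.5R.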
Scheduling: pipelines are now cron-configurable in Admin -> Jobs. daily_pipeline
(full, default 0 7 * * *) plus a new light intraday_pipeline (OHLCV + outcome eval,
default hourly US session) that keeps prices/live-R:R current without setup churn.
Fundamentals run on their own early weekly cron. Timezone is configurable (default
Europe/Berlin). Moving interval->CronTrigger also fixes the restart-deferral bug
where an interval job's countdown resets on every process restart.
319 backend unit tests pass; frontend tsc clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-applies the activation gate at several min_target_probability thresholds
(60→30, other conditions fixed) over the already-replayed candidates, so the
trade-off between how many setups qualify and their expectancy is visible in one
table — the cheap "optimize" half of Phase 2. Candidates now carry meets_core +
best_prob so the sweep needs no re-replay. New sweep table in BacktestPanel with
the current threshold starred.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New BacktestPanel: shows qualified hit-rate/expectancy vs the all-setups baseline,
a by-direction breakdown, and the probability calibration table (predicted vs
realized, over-confident buckets flagged amber). Includes a "Run backtest" button
that triggers the job and a plain explanation of the method and its limits.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>