signal-platform

Author	SHA1	Message	Date
dennisthiessen	29a61cb2ca	fix: judge robustness under the recommended exit, not the abandoned one. The robustness warning was computed on the target-model distribution while the same panel recommends the hold exit — internally inconsistent. _robustness_stats (median, profit factor, ex-top-5% expectancy) is now shared by _bucket_stats and _time_exit_bucket, the time-exit table shows Median Net R and Ex-Top-5% per hold length, and _build_recommendation reads the trimmed expectancy from the recommended exit's bucket (falling back to the target model when no hold is recommended). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-02 12:50:13 +02:00
dennisthiessen	243e369e9a	feat: robustness stats + dynamic recommendation; retire settled report sections. Robustness (answers 'is the edge just outliers?'): - _bucket_stats gains median_net_r, profit_factor, and net_avg_r_ex_top5 (expectancy with the top 5% of winners removed); shown as stat tiles. - Portfolio sim gains per-calendar-year returns, shown in the sim table. Dynamic recommendation ('What this backtest recommends' panel): - _build_recommendation derives advice from the report's own numbers on every run — exit policy (target vs best hold, with sim CAGRs), which gate floors earn their keep (ablation Hold column), best momentum cutoff, book-vs-SPY verdict, and an outlier-dependence warning when the trimmed expectancy goes non-positive. Retired (conclusions reached, tables removed from report + UI): - Take-profit sweep (no interior optimum — fixed TP is the wrong tool for momentum), trailing sweep (converged to the hold-to-horizon exit), probability calibration (model is display-only by decision). - _tp_primitives slimmed to _risk_and_stop_day; trailing machinery gone. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-02 12:33:22 +02:00
dennisthiessen	0f43e755f4	feat: portfolio simulation + per-trade stats (gaps, hold time, best/worst). Per-trade additions to the report: - Gap-through-stop fills: stops now fill at the worse of the stop or the bar's open across every exit model (target, TP, trailing, time), so a loss can exceed -1R; targets never fill better than their level. - best_r / worst_r, avg holding days, and net R per day of capital deployed on the summary buckets and the time-exit sweep. Portfolio simulation (the stats a per-setup replay cannot give): - One capital-constrained book over the qualified setups: 10k start, max 10 concurrent positions (one per ticker, best momentum first), 1% fixed-fractional risk with a 20% no-leverage notional cap, entries at the detection close, 0.1%/side costs, daily mark-to-market. - Two exit policies compared: S/R target race vs hold-to-horizon. - Equity-curve stats: final equity, total return, CAGR, max drawdown, annualized daily Sharpe, win rate, avg P&L, best/worst trade, avg hold, entries skipped on a full book, and SPY price return over the same window (benchmark history refreshed to cover the replay span). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-02 11:56:29 +02:00
dennisthiessen	942a22ce65	feat: grade gate-ablation variants under the hold-to-horizon exit too. The ablation judged floors under the target/stop model, but the exit sweeps point at replacing that exit with a fixed hold — under which the R:R floor's rationale (bigger payoff at the target) may not apply. Each ablation row now also carries hold_avg_r / hold_net_avg_r / hold_total_r (30d hold, initial stop only), so the Phase 3 gate decision can be read under the exit policy that would actually be used. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-02 11:34:41 +02:00
dennisthiessen	8750aac6d9	fix: carry action/risk_level onto backtest candidates for the gate ablation. _window_setups computed them but _replay_ticker dropped them, so the ablation's NEUTRAL/tightener checks saw None for every candidate and the 'without confidence floor' / 'without R:R floor' rows collapsed to 0 setups (impossible — removing a floor can only add setups). Regression test now goes through the real _replay_ticker path. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-02 08:07:27 +02:00
dennisthiessen	29b1a9a28c	feat: net-of-cost backtest, gate ablation + time-exit sweeps, longer tails. Phase 1 of the strategy-measurement plan — report-only, no production trading behavior changes: - Cost haircut: every bucket/sweep now reports net_avg_r/net_total_r alongside gross (COST_PER_SIDE=0.1% of notional, converted to R via each setup's stop distance); params carry cost_per_side_pct. - Gate ablation table: re-qualifies candidates at the current momentum cutoff with one floor removed per row (confidence / R:R / NEUTRAL / momentum-only) to show which floors earn their keep. - Time-based exit sweep: hold 5/10/21/30 days with the initial ATR stop, exit at the day-N close — the classic momentum implementation, to disambiguate the wide-trailing result. - TP sweep extended to +40/+50%, trailing to 25/30% so the optima are interior instead of starred at the sweep edge. - BacktestPanel: Net Avg R columns everywhere, gate-ablation and time-exit tables, stars now mark best net avg R; stale cached reports still render (all new fields optional/guarded). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-07-02 07:50:37 +02:00
dennisthiessen	ab9ce18809	feat: trailing-stop exit sweep in the backtest. Third exit model alongside target-vs-stop and the fixed take-profit. The TP sweep showed the edge lives in the fat tail (avg R keeps rising as you let winners run), but a fixed wide target is win-rate-brutal and gives everything back on a reversal. A trailing stop harvests the tail while protecting gains. Per setup the replay computes the realized R for several trail widths (3/5/7/10/15/20%) in a single conservative pass — stop ratchets up via max(initial_stop, peak*(1-trail)), exit on the pullback or at the horizon close, R vs the initial risk. Aggregated into a trailing sweep (win rate = share closed in profit, avg R, total R) over the qualified set and shown as a new table in the Backtest panel. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-30 17:33:17 +02:00
dennisthiessen	c5f6b07a3e	feat: extend take-profit sweep into the tail + clarify it ignores the target. Avg R was still rising at the previous top level (+15%), so the optimum was off the table. Extend TP_LEVELS to 20/25/30% to reveal where letting winners run stops paying (it plateaus toward "just hold to the horizon close"). Also clarify in the panel that the take-profit model deliberately does NOT use the setup's S/R target — it's a standalone fixed-% exit; exiting at the target is the target-vs-stop model above. The two are complementary ends, not in conflict. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-30 17:14:54 +02:00
dennisthiessen	c63951ca02	feat: take-profit exit sweep in the backtest (alongside target-vs-stop). The target-vs-stop model counts a near-miss of a far S/R target as a full loss and ignores the partial gains you actually bank — so it measures a different strategy than "scalp the early pop, take +8%". Add a realistic take-profit exit model next to it (original untouched). Per setup the replay now also records risk%, whether the stop was hit, the favourable excursion reachable before the stop (MFE), and the horizon-close move. From those a fixed-take-profit sweep (4/6/8/10/12/15%) is scored in R: bank +X% if reached before the stop, else -1R, else the horizon close. Hit rate = how often +X% was banked (the MFE CDF), so you can pick the EV-optimal TP without top-ticking fantasy. Shown as a new table in the Backtest panel; the IC, calibration and momentum sweep are unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-30 16:56:32 +02:00
dennisthiessen	437ceacfc1	refactor: dedupe scheduler logging/runtime, centralize SystemSetting access, fix rankings N+1. Behavior-preserving cleanup (345 tests pass, ruff clean): - scheduler: replace 62 inline logger.x(json.dumps({...})) calls with a _log_event helper, and collapse 11 identical _job_runtime dicts into an _idle_runtime() factory over _JOB_NAMES. - settings: add app/services/settings_store.py (get_setting/get_value/get_map/upsert_setting) and route ~13 hand-rolled SystemSetting queries + two identical _settings_map helpers through it. - scoring.get_rankings: collapse the per-ticker N+1 (3-4 queries + a commit each) into 2 bulk reads + a single conditional commit; drop the redundant re-fetch. Lazy recompute-on-read is preserved. Adds first tests for get_rankings. Net ~ -245 lines across the touched modules.	2026-06-24 11:23:39 +02:00
dennisthiessen	605f95098c	momentum gate: long-only + wire the percentile onto live setups. Part 1 — long-only. The momentum edge is long top-momentum; the gate was qualifying shorts on high-momentum names (fighting the trend), which showed as the -0.13R Short(qual.) drag. While the gate is active, shorts no longer qualify (backend qualification, backtest _momentum_qualifies, and the frontend mirror). Part 2 — production wiring. Live setups now carry a real momentum rank, so the dashboard, the Track Record's qualified stats, and outcome evaluation all gate on the same value instead of deferring to floors: - new momentum_service.compute_momentum_percentiles: 12-1 momentum per ticker, ranked across the universe into a {symbol: percentile} map. - the daily R:R scan ranks the universe up front and stores each setup's percentile (new trade_setups.momentum_percentile column, migration 010). - enhance_trade_setup mutates the same row, so the percentile is preserved; _trade_setup_to_dict + TradeSetupResponse expose it to the API. Until a fresh scan runs, pre-existing setups have a null percentile and the gate falls back to floors for them (longs) / excludes them (shorts) — they fill in on the next scan. 341 backend tests pass; frontend build clean. Needs the alembic upgrade (migration 010) on deploy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 07:07:38 +02:00
dennisthiessen	7060b9a019	parallelize the backtest across worker processes (true multi-core). The replay was CPU-bound and single-core: the earlier asyncio.to_thread offload kept the API responsive but, because of the GIL, ran on one core. Per-ticker replay is independent, so fan it out across worker processes (which sidestep the GIL) for real multi-core speedup. - New `settings.backtest_workers` (default 4), capped to cpu_count-1 so a core stays free for the web server. - Uses a `forkserver` context (workers forked from a clean single-threaded server — avoids the fork-with-threads deadlock); falls back to `fork`. On spawn-only platforms (Windows) and for 1-ticker runs it uses the thread path, so dev/tests are unaffected. - Worker takes primitive column arrays (cheap to pickle), rebuilds bars, and returns (candidates, plain-dict signal series) — both picklable across the process boundary. Bars are still fetched in the event loop (ORM-safe). - Pool creation is guarded: if the pool can't start, the job falls back to the sequential thread path instead of failing. 334 backend tests pass (parallel path is POSIX/server-only, so it's covered by construction + the picklability/worker-count tests; the thread fallback is exercised by the run_backtest smoke test). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 23:20:20 +02:00
dennisthiessen	ef523474ad	replace EV activation gate with cross-sectional 12-1 momentum ranking. The 5-year backtest confirmed the EV gate adds negative value (high threshold = worst expectancy) and that 12-1 month momentum is the one price signal with a plausible, right-signed cross-sectional IC (~0.05). So "qualified" now means: clears the R:R + confidence floors AND the ticker ranks in the top `min_momentum_percentile` of the universe by 12-1 momentum that week. - qualification.py: drop expected_value_r / the EV gate; add a momentum-percentile gate (duck-typed `momentum_percentile`, only enforced when attached + threshold set, else defers to floors). Mirrored in frontend qualification.ts. - activation config/schema: min_expected_value -> min_momentum_percentile (default 80 = top quintile). ActivationSettings, DashboardPage (ranks/shows momentum instead of EV), and the BacktestPanel sweep follow. - backtest: rank each ISO week's universe by 12-1 momentum, assign a percentile, and qualify the top slice; the sweep now sweeps the percentile cutoff. Also offload the backtest's per-ticker compute to a worker thread so the heavy ~5y run no longer blocks the API event loop (the "backend offline" flicker). Production setups don't carry momentum_percentile yet — wiring the scanner to attach it (a universe momentum-rank step) is the next step; until then the live gate defers to floors while the backtest measures the momentum selection. 330 backend tests pass; frontend build clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 22:42:24 +02:00
dennisthiessen	099846513b	deepen OHLCV history + make the factor-IC pass honest about overlap/regime. Two changes so the cross-sectional signal results can actually be trusted. (a) History depth — the binding constraint. Ingestion defaulted to 365 days, so long-lookback factors (12-month momentum, 52-week high) were only computable on a handful of weeks at the tail, and every IC reflected a single market regime. - New `settings.ohlcv_history_days` (default 1825 ≈ 5y); new tickers backfill this far instead of 1 year. - New manual "data_backfill" job (Admin → Jobs) re-fetches the full window for every ticker, ignoring incremental resume — run once to deepen existing 1-year histories. Idempotent (upsert); resumes after rate limits. (b) Factor-IC honesty. The IC was averaged over weekly rebalances whose 30-day forward windows overlap, inflating the t-stat ~sqrt(6)x. - IC now measured on NON-OVERLAPPING windows (weeks thinned to ~HORIZON apart). - Each signal carries a `reliable` flag (>= 12 independent windows); BacktestPanel greys out and de-stars thin signals so a lucky 9-week IC of 0.3 can't masquerade as an edge. 332 backend tests pass; frontend build clean. No migration (config + job + an added JSON field on the cached backtest report). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 18:20:59 +02:00
dennisthiessen	402025692a	add cross-sectional signal evaluation (factor rank-IC) to the backtest. The per-setup hit-rate report can't tell whether a signal predicts returns — only how a target/stop structure built on one performs. This adds a cross-sectional factor-IC pass: each week the universe is ranked by a price-only signal and graded by its rank correlation (Spearman IC) and top-minus-bottom-quintile spread against the forward 30-day return. Candidate signals (point-in-time from price; sentiment/fundamentals have no history in the replay): 12-1/6-1/3-1 month momentum, 1-month reversal, price-vs-200d SMA, proximity to the 52-week high (George/Hwang), and 126-day realized volatility (low-vol anomaly). Reuses the existing per-ticker replay loop (no new data, no second DB pass); results land in the cached backtest_report as `signal_eval` and render as a "Signal edge" table in BacktestPanel beside the calibration curve. 330 backend tests pass (10 new in test_signal_eval); frontend build clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 17:58:40 +02:00
dennisthiessen	c34f3cb1a4	redesign activation gate to expected value + make pipelines cron-configurable. Diagnosing "no qualified signals for 5 days": setups were generated but none qualified. The gate required BOTH a high min_rr (2.0) AND a high min_target_probability (60), which became contradictory after the Jun-15 probability recalibration — probability already embeds R:R via the 1/(rr+1) ruin term, so high-R:R targets are inherently low-probability and nothing cleared both. Gate is now expected value (R): p*rr - (1-p) from the primary target's probability. R:R and confidence stay as floors; high-conviction / exclude-conflicts / min-target-probability become optional tighteners (default off). Defaults: min_expected_value=0.15, min_rr=1.2, min_confidence=55. EV is only enforced when computable. Migration 009 clears stored activation_* rows so the new defaults apply. Backtest sweeps min_expected_value instead of target probability. Scheduling: pipelines are now cron-configurable in Admin -> Jobs. daily_pipeline (full, default 0 7 * * *) plus a new light intraday_pipeline (OHLCV + outcome eval, default hourly US session) that keeps prices/live-R:R current without setup churn. Fundamentals on its own early weekly cron. Timezone configurable (default Europe/Berlin). Moving interval->CronTrigger also fixes the restart-deferral bug where an interval job's countdown resets on every process restart. 319 backend unit tests pass; frontend tsc clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 14:46:38 +02:00
dennisthiessen	050abc6f71	backtest: add min target-probability sweep. Re-applies the activation gate at several min_target_probability thresholds (60→30, other conditions fixed) over the already-replayed candidates, so the trade-off between how many setups qualify and their expectancy is visible in one table — the cheap "optimize" half of Phase 2. Candidates now carry meets_core + best_prob so the sweep needs no re-replay. New sweep table in BacktestPanel with the current threshold starred. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 06:13:30 +02:00
dennisthiessen	6df67ad7ae	add backtest harness (Phase 1): historical replay + hit-rate & calibration reports. Replays the price-derived engine over stored OHLCV: at each weekly as-of date, rebuild the setup from bars <= D (no lookahead) and walk the actual forward bars for the realized outcome. Reports realized hit-rate/expectancy of qualified setups (and all setups, by direction) plus a probability calibration curve (predicted target prob vs realized hit rate). Reuses pure functions throughout; extracted compute_technical_from_arrays / compute_momentum_from_closes from scoring_service so live and backtest stay in sync. Runs as a weekly/triggerable 'backtest' job caching the report in a SystemSetting; GET /backtest/report serves it. Sentiment/fundamentals held neutral (no point-in-time history) — calibrates the price/S-R/probability machinery. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 20:14:07 +02:00
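
The forward walk that the harness commit (6df67ad7ae) describes, combined with the gap-through-stop fill rule from 0f43e755f4, might look roughly like the sketch below for a long setup. The `Bar` type, the function name `realized_r_long`, and the choice to check the stop before the target within a bar are assumptions made for illustration; they are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Bar:
    """Hypothetical daily OHLC bar (not the project's own type)."""
    open: float
    high: float
    low: float
    close: float


def realized_r_long(entry: float, stop: float, target: float,
                    forward_bars: Sequence[Bar]) -> float:
    """Realized R multiple of a long setup over the bars AFTER the as-of date.

    Only forward bars are passed in, so the setup itself must have been
    built from bars <= D by the caller (no lookahead).
    """
    risk = entry - stop
    if risk <= 0:
        raise ValueError("stop must be below entry for a long setup")
    if not forward_bars:
        return 0.0
    for bar in forward_bars:
        # Assumption: the stop is checked first, so a bar that touches both
        # levels is counted as a loss (the conservative reading).
        if bar.low <= stop:
            # Gap-through: fill at the worse of the stop or the bar's open,
            # which is why a loss can exceed -1R.
            fill = min(stop, bar.open)
            return (fill - entry) / risk
        if bar.high >= target:
            # Targets never fill better than their level.
            return (target - entry) / risk
    # Neither level was reached: exit at the horizon close.
    return (forward_bars[-1].close - entry) / risk
```

With `entry=100`, `stop=95`, `target=110`, a bar that opens at 92 below the stop produces a loss of -1.6R rather than -1R, which is the effect the gap-through commit calls out.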

18 Commits
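
The robustness statistics named in commits 243e369e9a and 29a61cb2ca (median net R, profit factor, and expectancy with the top 5% of winners removed) can be illustrated with a minimal sketch. The function name `robustness_stats` and the trimming rule are assumptions: here the highest-R trades are dropped, numbering 5% of all trades rounded up, and the repository may define the trimmed set differently.

```python
import math
from statistics import median
from typing import Optional, Sequence


def robustness_stats(net_r: Sequence[float],
                     trim_frac: float = 0.05) -> dict[str, Optional[float]]:
    """Summarize whether an edge survives without its largest winners.

    Hypothetical helper; the keys mirror the field names in the commit
    message, but the exact definitions are assumptions.
    """
    if not net_r:
        return {"median_net_r": None, "profit_factor": None,
                "net_avg_r_ex_top5": None}
    gross_win = sum(r for r in net_r if r > 0)
    gross_loss = -sum(r for r in net_r if r < 0)
    # Profit factor: gross winnings divided by gross losses (undefined
    # when there are no losing trades).
    profit_factor = gross_win / gross_loss if gross_loss > 0 else None
    # Assumed trimming rule: drop the k highest-R trades, k = ceil(5% of n).
    k = math.ceil(len(net_r) * trim_frac)
    trimmed = sorted(net_r)[: len(net_r) - k]
    ex_top = sum(trimmed) / len(trimmed) if trimmed else None
    return {"median_net_r": median(net_r),
            "profit_factor": profit_factor,
            "net_avg_r_ex_top5": ex_top}
```

A sample of fifteen -1R losses, four +0.5R wins, and a single +20R winner has a positive mean (0.35R) but a negative median and a negative trimmed expectancy, which is the outlier-dependence case the recommendation panel is said to warn about.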