deepen OHLCV history + make the factor-IC pass honest about overlap/regime
Two changes so the cross-sectional signal results can actually be trusted.

(a) History depth: the binding constraint.
Ingestion defaulted to 365 days, so long-lookback factors (12-month momentum, 52-week high) were only computable on a handful of weeks at the tail, and every IC reflected a single market regime.
- New `settings.ohlcv_history_days` (default 1825 ≈ 5y); new tickers backfill this far instead of 1 year.
- New manual "data_backfill" job (Admin → Jobs) re-fetches the full window for every ticker, ignoring incremental resume. Run once to deepen existing 1-year histories. Idempotent (upsert); resumes after rate limits.

(b) Factor-IC honesty.
The IC was averaged over weekly rebalances whose 30-day forward windows overlap, inflating the t-stat ~sqrt(6)x.
- IC now measured on NON-OVERLAPPING windows (weeks thinned to ~HORIZON apart).
- Each signal carries a `reliable` flag (>= 12 independent windows); BacktestPanel greys out and de-stars thin signals so a lucky 9-week IC of 0.3 can't masquerade as an edge.

332 backend tests pass; frontend build clean.
No migration (config + job + an added JSON field on the cached backtest report).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
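Illustrative note on (b): a minimal sketch of how weekly rebalance dates could be thinned to non-overlapping windows and then checked against the 12-window threshold. This is not the code from this commit. The names `thin_to_non_overlapping`, `is_reliable`, `HORIZON_DAYS`, and `MIN_INDEPENDENT_WINDOWS` are hypothetical, and the horizon is assumed to be 30 calendar days (the real pass may count trading days).

```python
from datetime import date, timedelta

HORIZON_DAYS = 30             # assumption: forward-return horizon in calendar days
MIN_INDEPENDENT_WINDOWS = 12  # threshold quoted in the commit message


def thin_to_non_overlapping(rebalance_dates, horizon_days=HORIZON_DAYS):
    """Keep a date only if it is at least `horizon_days` after the last kept one,
    so that consecutive forward-return windows do not overlap."""
    kept = []
    for d in sorted(rebalance_dates):
        if not kept or (d - kept[-1]).days >= horizon_days:
            kept.append(d)
    return kept


def is_reliable(n_windows, minimum=MIN_INDEPENDENT_WINDOWS):
    """Hypothetical stand-in for the `reliable` flag: enough independent windows?"""
    return n_windows >= minimum


# One year of weekly rebalances: every 5th week survives (35 days >= 30).
weekly = [date(2024, 1, 1) + timedelta(weeks=i) for i in range(52)]
independent = thin_to_non_overlapping(weekly)

print(len(independent))               # 11
print(is_reliable(len(independent)))  # False
```

Under these assumptions a single year of weekly rebalances leaves 11 independent windows, just under the threshold, which is one way to see why the deeper history in (a) matters for (b).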
@@ -12,6 +12,7 @@ from datetime import date, timedelta
 from sqlalchemy import func, select
 from sqlalchemy.ext.asyncio import AsyncSession
 
+from app.config import settings
 from app.exceptions import NotFoundError, ProviderError, RateLimitError
 from app.models.ohlcv import OHLCVRecord
 from app.models.settings import IngestionProgress
@@ -92,20 +93,23 @@ async def fetch_and_ingest(
     if end_date is None:
         end_date = date.today()
 
-    # Resolve start_date: use progress resume or default to 1 year ago.
-    # If we have too little history, force a one-year backfill even if
-    # ingestion progress exists (upsert makes this safe and idempotent).
+    # Resolve start_date: use progress resume or backfill the configured history
+    # window. If we have too little history, force a full backfill even if
+    # ingestion progress exists (upsert makes this safe and idempotent). A caller
+    # that passes an explicit start_date (e.g. the manual deep-backfill job)
+    # bypasses this entirely.
     if start_date is None:
         progress = await _get_progress(db, ticker.id)
         bar_count = await _get_ohlcv_bar_count(db, ticker.id)
         minimum_backfill_bars = 200
+        backfill_start = end_date - timedelta(days=settings.ohlcv_history_days)
 
         if bar_count < minimum_backfill_bars:
-            start_date = end_date - timedelta(days=365)
+            start_date = backfill_start
         elif progress is not None:
             start_date = progress.last_ingested_date + timedelta(days=1)
         else:
-            start_date = end_date - timedelta(days=365)
+            start_date = backfill_start
 
     # If start > end, nothing to fetch
     if start_date > end_date:
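Illustrative note on the second hunk: the start-date resolution can be read as a small pure function. The sketch below is not part of the commit. `resolve_start_date` and its parameters are hypothetical names, the database lookups (`_get_progress`, `_get_ohlcv_bar_count`) are replaced by plain arguments, and `history_days` stands in for `settings.ohlcv_history_days`.

```python
from datetime import date, timedelta

MINIMUM_BACKFILL_BARS = 200  # same threshold as minimum_backfill_bars in the diff


def resolve_start_date(end_date, bar_count, last_ingested_date, history_days=1825):
    """Mirror the branch order in the hunk:
    1. too little stored history -> full backfill, even if progress exists
    2. progress exists           -> resume the day after the last ingested bar
    3. no progress at all        -> full backfill
    """
    backfill_start = end_date - timedelta(days=history_days)
    if bar_count < MINIMUM_BACKFILL_BARS:
        return backfill_start
    if last_ingested_date is not None:
        return last_ingested_date + timedelta(days=1)
    return backfill_start


end = date(2025, 1, 1)
# Thin history: progress is ignored and the full window is re-fetched.
thin = resolve_start_date(end, bar_count=50, last_ingested_date=date(2024, 12, 31))
# Healthy history: incremental resume from the day after the last bar.
resumed = resolve_start_date(end, bar_count=500, last_ingested_date=date(2024, 12, 30))
# Already up to date: start lands after end, so there is nothing to fetch.
up_to_date = resolve_start_date(end, bar_count=500, last_ingested_date=end)
```

Keeping the bar-count check first is what lets existing tickers with short histories deepen on their next run; the upsert noted in the comment is what makes the resulting re-fetch safe.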