sentiment: LLM buy/hold/avoid + full analysis, and search-budget scoping

Richer LLM output (same grounded call, ~no extra cost):
- All providers now also return a recommendation (buy/hold/avoid) and a
  thorough reasoning paragraph; Gemini now actually captures reasoning +
  grounding citations (it was dropping them). Stored on sentiment_scores
  (migration 008), exposed in the API; display-only, NOT fed into the
  composite/EV.
- Ticker Sentiment panel shows an "LLM view" badge and a "Full analysis &
  sources" expander with the complete reasoning + citations.

Search-budget scoping (Gemini grounding free tier = 5000/mo):
- collect_sentiment now targets only watchlist + open paper trades + top-N
  by composite, skips tickers refreshed within sentiment_fresh_hours (72h),
  and caps per run (sentiment_max_per_run). Once the relevant set is fresh,
  runs spend 0 searches until it ages out, bounding monthly usage well
  under the free tier.
- Widened sentiment lookback to 7d (scoring + display) so sparser
  collection still feeds the dimension score.

Deploy: alembic upgrade (sentiment_scores.recommendation). Switch provider
to Gemini Flash in Admin for the cost win (grounded, cheapest).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 16:34:19 +02:00
parent a69557f5d8
commit e5166ed668
16 changed files with 219 additions and 36 deletions
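
Reviewer note: the scoping rule in the commit message (skip anything refreshed within sentiment_fresh_hours, take missing/oldest first, cap at sentiment_max_per_run) can be sketched in plain Python without a database, together with a rough upper bound on monthly searches. This is an illustration under assumptions, not the committed query: TickerState, pick_sentiment_batch and max_monthly_searches are hypothetical names, the "relevant" list stands in for watchlist + open paper trades + top-N by composite, and the bound assumes one grounded search per refreshed ticker, which the commit does not state.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TickerState:
    """Hypothetical stand-in for a ticker plus its newest sentiment timestamp."""
    symbol: str
    last_sentiment: Optional[datetime]  # None: no sentiment stored yet


def pick_sentiment_batch(relevant, now, fresh_hours, max_per_run):
    """Return symbols due for a refresh: missing first, then oldest, capped."""
    cutoff = now - timedelta(hours=fresh_hours)
    # Keep only tickers with no record or with a record older than the cutoff.
    due = [
        t for t in relevant
        if t.last_sentiment is None or t.last_sentiment < cutoff
    ]
    # Sort key mirrors the ORDER BY in the diff: missing first, then oldest
    # timestamp, then symbol. `now` is only a placeholder for missing rows,
    # which all share it and therefore fall through to the symbol tiebreak.
    due.sort(key=lambda t: (t.last_sentiment is not None,
                            t.last_sentiment or now,
                            t.symbol))
    return [t.symbol for t in due[:max_per_run]]


def max_monthly_searches(relevant_count, fresh_hours, runs_per_month,
                         max_per_run, days=30):
    """Upper bound on searches per window, assuming one search per refresh."""
    # A ticker is skipped while fresher than fresh_hours, so it can be
    # refreshed at most ceil(window_hours / fresh_hours) times per window.
    per_ticker = math.ceil(days * 24 / fresh_hours)
    # The per-run cap bounds the total independently; take the tighter one.
    return min(relevant_count * per_ticker, runs_per_month * max_per_run)


if __name__ == "__main__":
    now = datetime(2026, 6, 16, 12, 0, tzinfo=timezone.utc)
    relevant = [
        TickerState("AAA", now - timedelta(hours=10)),   # fresh, skipped
        TickerState("BBB", now - timedelta(hours=100)),  # stale
        TickerState("CCC", None),                        # never collected
        TickerState("DDD", now - timedelta(hours=200)),  # stalest
    ]
    print(pick_sentiment_batch(relevant, now, fresh_hours=72, max_per_run=3))
    # Illustrative numbers only: 120 relevant tickers, hourly runs, cap of 10.
    print(max_monthly_searches(120, 72, runs_per_month=720, max_per_run=10))
```

With these illustrative inputs the per-ticker freshness window, not the per-run cap, is the binding limit; the real figure depends on the size of the relevant set and on how often collect_sentiment is scheduled, neither of which appears in this diff.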
@@ -16,10 +16,10 @@ from __future__ import annotations
 import json
 import logging
 import asyncio
-from datetime import date, datetime, timezone
+from datetime import date, datetime, timedelta, timezone

 from apscheduler.schedulers.asyncio import AsyncIOScheduler
-from sqlalchemy import case, func, select
+from sqlalchemy import case, func, or_, select
 from sqlalchemy.ext.asyncio import AsyncSession

 from app.config import settings
@@ -281,20 +281,49 @@ async def _get_ohlcv_priority_tickers(db: AsyncSession) -> list[str]:


 async def _get_sentiment_priority_tickers(db: AsyncSession) -> list[str]:
-    """Return symbols prioritized for sentiment collection.
+    """Symbols to fetch sentiment for, budgeted to stay in the free search tier.

-    Priority:
-      1) Tickers with no sentiment records
-      2) Tickers with records, oldest latest sentiment timestamp first
-      3) Alphabetical tiebreaker
+    Scope: only tickers that matter — watchlist + open paper trades + top-N by
+    composite score. Skip any refreshed within ``sentiment_fresh_hours``. Cap the
+    run at ``sentiment_max_per_run``, oldest/missing first. Once the relevant set
+    is fresh, runs make zero grounded searches until it ages out.
    """
+    from app.models.paper_trade import PaperTrade
+    from app.models.score import CompositeScore
+    from app.models.watchlist import WatchlistEntry
+
+    relevant: set[int] = set()
+    wl = await db.execute(
+        select(WatchlistEntry.ticker_id)
+        .where(WatchlistEntry.entry_type != "dismissed")
+        .distinct()
+    )
+    relevant.update(r[0] for r in wl.all())
+    pt = await db.execute(
+        select(PaperTrade.ticker_id).where(PaperTrade.status == "open").distinct()
+    )
+    relevant.update(r[0] for r in pt.all())
+    top = await db.execute(
+        select(CompositeScore.ticker_id)
+        .order_by(CompositeScore.score.desc())
+        .limit(settings.sentiment_top_composite)
+    )
+    relevant.update(r[0] for r in top.all())
+
+    if not relevant:
+        return []
+
+    cutoff = datetime.now(timezone.utc) - timedelta(hours=settings.sentiment_fresh_hours)
    latest_ts = func.max(SentimentScore.timestamp)
    missing_first = case((latest_ts.is_(None), 0), else_=1)
    result = await db.execute(
        select(Ticker.symbol)
        .outerjoin(SentimentScore, SentimentScore.ticker_id == Ticker.id)
+        .where(Ticker.id.in_(relevant))
        .group_by(Ticker.id, Ticker.symbol)
+        .having(or_(latest_ts.is_(None), latest_ts < cutoff))
        .order_by(missing_first.asc(), latest_ts.asc(), Ticker.symbol.asc())
+        .limit(settings.sentiment_max_per_run)
    )
    return list(result.scalars().all())

@@ -531,6 +560,7 @@ async def collect_sentiment() -> None:
                            timestamp=data.timestamp,
                            reasoning=data.reasoning,
                            citations=data.citations,
+                            recommendation=data.recommendation,
                        )
                        _last_successful[job_name] = symbol
                        processed += 1