Dennis Thiessen, first commit, 2026-02-20 17:31:01 +01:00

Design Document: Stock Data Backend

Overview

The Stock Data Backend is an MVP investing-signal platform built with Python/FastAPI and PostgreSQL, focused on NASDAQ stocks. It ingests OHLCV price data from a swappable market data provider, computes technical indicators (ADX, EMA, RSI, ATR, Volume Profile, Pivot Points), detects support/resistance levels, collects sentiment and fundamental data, and feeds everything into a composite scoring engine. The scoring engine ranks tickers and auto-populates a watchlist, while an R:R scanner flags asymmetric trade setups.

The system is API-first (REST, JSON envelope, versioned URLs), uses JWT auth with role-based access, and runs scheduled jobs for data collection. All computation is on-demand or scheduled — no streaming, no websockets, no real-time feeds.

Key Design Decisions

  • Single process: FastAPI app with APScheduler for scheduled jobs — no separate worker processes.
  • On-demand scoring: Composite scores are marked stale when inputs change and recomputed only when requested.
  • Simple LRU cache: In-memory functools.lru_cache (max 1000 entries) for indicator computations. No TTL — cache is invalidated when new OHLCV data is ingested for a ticker.
  • Provider abstraction: Market data provider behind a Python Protocol class for swappability.
  • Fixed indicator set: ADX, EMA, RSI, ATR, Volume Profile, Pivot Points — no plugin architecture.
  • Sentiment: Single source, weighted average with configurable time decay and lookback window.
  • Fundamentals: Single source, simple periodic fetch.
  • Watchlist cap: Auto-populated top-X (default 10) + max 10 manual additions = max 20.
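
The cache-invalidation decision can be sketched with a generation-counter approach, since functools.lru_cache cannot evict individual keys. This is an illustrative assumption about how app/cache.py could work, not confirmed code:

```python
from collections import defaultdict
from datetime import date
from functools import lru_cache

# A per-ticker generation counter is folded into the cache key. Bumping the
# counter on ingest makes every old entry for that ticker unreachable; stale
# entries then age out of the LRU naturally.
_generation: defaultdict[str, int] = defaultdict(int)

def invalidate_ticker(ticker: str) -> None:
    """Called after an OHLCV upsert for `ticker`."""
    _generation[ticker] += 1

@lru_cache(maxsize=1000)
def _cached(ticker: str, start: date, end: date, indicator: str, gen: int) -> tuple:
    return (ticker, indicator, gen)  # placeholder for the real computation

def compute_indicator(ticker: str, start: date, end: date, indicator: str) -> tuple:
    return _cached(ticker, start, end, indicator, _generation[ticker])

first = compute_indicator("AAPL", date(2024, 1, 1), date(2024, 2, 1), "rsi")
invalidate_ticker("AAPL")
second = compute_indicator("AAPL", date(2024, 1, 1), date(2024, 2, 1), "rsi")
```

The trade-off: invalidation is O(1) and needs no cache iteration, at the cost of dead entries occupying LRU slots until they are evicted.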

Architecture

High-Level Architecture

graph TB
    Client[API Client] --> API[FastAPI App /api/v1/]
    API --> Auth[Auth Service]
    API --> TickerReg[Ticker Registry]
    API --> PriceStore[Price Store]
    API --> Ingestion[Ingestion Pipeline]
    API --> TechAnalysis[Technical Analysis]
    API --> SRDetector[S/R Detector]
    API --> Scoring[Scoring Engine]
    API --> RRScanner[R:R Scanner]
    API --> Watchlist[Watchlist Service]
    API --> Admin[Admin Service]
    
    Ingestion --> Provider[Market Data Provider Protocol]
    Provider --> ExternalAPI[External Market Data API]
    
    Scheduler[APScheduler] --> Ingestion
    Scheduler --> SentimentCollector[Sentiment Collector]
    Scheduler --> FundCollector[Fundamental Collector]
    Scheduler --> RRScanner
    
    TickerReg --> DB[(PostgreSQL)]
    PriceStore --> DB
    Auth --> DB
    Scoring --> DB
    SRDetector --> DB
    Watchlist --> DB
    
    TechAnalysis --> Cache[LRU Cache max=1000]
    TechAnalysis --> PriceStore

Request Flow

sequenceDiagram
    participant C as Client
    participant A as FastAPI
    participant Auth as Auth Middleware
    participant S as Service Layer
    participant DB as PostgreSQL

    C->>A: HTTP Request + JWT
    A->>Auth: Validate token + role
    Auth-->>A: User context
    A->>S: Call service method
    S->>DB: Query/Mutate
    DB-->>S: Result
    S-->>A: Domain result
    A-->>C: JSON envelope response

Project Structure

stock-data-backend/
├── alembic/                    # DB migrations
│   ├── versions/
│   └── env.py
├── app/
│   ├── main.py                 # FastAPI app, lifespan, scheduler
│   ├── config.py               # Settings via pydantic-settings
│   ├── database.py             # SQLAlchemy engine, session factory
│   ├── models/                 # SQLAlchemy ORM models
│   │   ├── ticker.py
│   │   ├── ohlcv.py
│   │   ├── user.py
│   │   ├── sentiment.py
│   │   ├── fundamental.py
│   │   ├── score.py
│   │   ├── sr_level.py
│   │   ├── trade_setup.py
│   │   ├── watchlist.py
│   │   └── settings.py
│   ├── schemas/                # Pydantic request/response schemas
│   │   ├── common.py           # APIEnvelope, pagination
│   │   ├── ticker.py
│   │   ├── ohlcv.py
│   │   ├── auth.py
│   │   ├── indicator.py
│   │   ├── sr_level.py
│   │   ├── sentiment.py
│   │   ├── fundamental.py
│   │   ├── score.py
│   │   ├── trade_setup.py
│   │   ├── watchlist.py
│   │   └── admin.py
│   ├── routers/                # FastAPI routers (one per domain)
│   │   ├── tickers.py
│   │   ├── ohlcv.py
│   │   ├── ingestion.py
│   │   ├── indicators.py
│   │   ├── sr_levels.py
│   │   ├── sentiment.py
│   │   ├── fundamentals.py
│   │   ├── scores.py
│   │   ├── trades.py
│   │   ├── watchlist.py
│   │   ├── auth.py
│   │   ├── admin.py
│   │   └── health.py
│   ├── services/               # Business logic
│   │   ├── ticker_service.py
│   │   ├── price_service.py
│   │   ├── ingestion_service.py
│   │   ├── indicator_service.py
│   │   ├── sr_service.py
│   │   ├── sentiment_service.py
│   │   ├── fundamental_service.py
│   │   ├── scoring_service.py
│   │   ├── rr_scanner_service.py
│   │   ├── watchlist_service.py
│   │   ├── auth_service.py
│   │   └── admin_service.py
│   ├── providers/              # External data provider abstractions
│   │   ├── protocol.py         # MarketDataProvider, SentimentProvider, FundamentalProvider Protocols
│   │   ├── alpaca.py           # Alpaca OHLCV provider (alpaca-py)
│   │   ├── gemini_sentiment.py # Gemini LLM sentiment provider (google-genai + search grounding)
│   │   └── fmp.py              # Financial Modeling Prep fundamentals provider (httpx)
│   ├── scheduler.py            # APScheduler job definitions
│   ├── dependencies.py         # FastAPI dependency injection
│   ├── middleware.py            # Logging, error handling
│   └── cache.py                # LRU cache wrapper with invalidation
├── tests/
│   ├── unit/
│   ├── property/
│   └── conftest.py
├── deploy/                     # Deployment templates
│   ├── nginx.conf              # Nginx reverse proxy config for signal.thiessen.io
│   ├── stock-data-backend.service  # systemd service file
│   └── setup_db.sh             # DB creation + migration script
├── .gitea/
│   └── workflows/
│       └── deploy.yml          # Gitea Actions CI/CD pipeline
├── alembic.ini
├── pyproject.toml
└── .env.example

Components and Interfaces

1. Market Data Provider Protocol

from typing import Protocol
from datetime import date

# Record types are the Pydantic schemas under app/schemas/
from app.schemas.ohlcv import OHLCVRecord
from app.schemas.sentiment import SentimentRecord
from app.schemas.fundamental import FundamentalRecord

class MarketDataProvider(Protocol):
    async def fetch_ohlcv(
        self, ticker: str, start_date: date, end_date: date
    ) -> list[OHLCVRecord]:
        """Fetch OHLCV data for a ticker in a date range."""
        ...

class SentimentProvider(Protocol):
    async def fetch_sentiment(self, ticker: str) -> SentimentRecord:
        """Fetch current sentiment analysis for a ticker."""
        ...

class FundamentalProvider(Protocol):
    async def fetch_fundamentals(self, ticker: str) -> FundamentalRecord:
        """Fetch fundamental data for a ticker."""
        ...

Each data source has its own protocol since they come from different external services. Swapping any provider means implementing the relevant protocol — no other code changes.
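
As a sketch of the swappability claim, a minimal in-memory provider that satisfies MarketDataProvider structurally (the OHLCVRecord fields are assumed from the data model; the real schema lives in app/schemas/ohlcv.py):

```python
import asyncio
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Protocol

@dataclass
class OHLCVRecord:
    # Simplified stand-in for the schema in app/schemas/ohlcv.py
    ticker: str
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: int

class MarketDataProvider(Protocol):
    async def fetch_ohlcv(self, ticker: str, start_date: date,
                          end_date: date) -> list[OHLCVRecord]: ...

class FixtureProvider:
    """In-memory provider for tests; satisfies the protocol structurally,
    so it can stand in for the Alpaca implementation with no other code changes."""
    async def fetch_ohlcv(self, ticker: str, start_date: date,
                          end_date: date) -> list[OHLCVRecord]:
        days = (end_date - start_date).days + 1
        return [OHLCVRecord(ticker, start_date + timedelta(days=d),
                            10.0, 11.0, 9.0, 10.5, 1_000)
                for d in range(days)]

provider: MarketDataProvider = FixtureProvider()
records = asyncio.run(provider.fetch_ohlcv("AAPL", date(2024, 1, 1), date(2024, 1, 5)))
```

Because Protocol matching is structural, FixtureProvider needs no inheritance from the protocol class; any object with a matching fetch_ohlcv signature qualifies.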

Concrete Provider Implementations:

  • OHLCV: Alpaca Markets Data API (alpaca-py, API key + secret). Free tier, daily bars, good rate limits.
  • Sentiment: Gemini (gemini-2.0-flash) with Google Search grounding (google-genai, API key). The LLM analyzes live web data (news, social media) per ticker and returns structured JSON with classification + confidence. Search grounding ensures current data, not just training knowledge.
  • Fundamentals: Financial Modeling Prep (FMP) (httpx, REST, API key). Free tier covers P/E, revenue growth, earnings surprise, market cap.

Gemini Sentiment Provider Details:

The sentiment provider sends a structured prompt to Gemini with search grounding enabled:

  • Prompt asks for current market sentiment analysis for a specific ticker
  • Gemini searches the web for recent news, social media mentions, analyst opinions
  • Response is requested in JSON mode: {"classification": "bullish|bearish|neutral", "confidence": 0-100, "reasoning": "..."}
  • The reasoning field is logged but not stored — only classification and confidence are persisted as a Sentiment_Score
  • Cost: ~$0.001 per call with gemini-2.0-flash (negligible for 30-min polling of a few dozen tickers)
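
The response-handling half of the provider can be sketched without calling the API. parse_sentiment_payload is a hypothetical helper; the actual google-genai call and logging are omitted:

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

VALID_CLASSIFICATIONS = {"bullish", "bearish", "neutral"}

@dataclass
class SentimentRecord:
    classification: str
    confidence: int
    source: str
    timestamp: datetime

def parse_sentiment_payload(raw: str) -> SentimentRecord:
    """Validate the JSON-mode payload; `reasoning` is logged upstream
    but deliberately not persisted."""
    payload = json.loads(raw)
    classification = payload["classification"]
    confidence = int(payload["confidence"])
    if classification not in VALID_CLASSIFICATIONS or not 0 <= confidence <= 100:
        raise ValueError(f"malformed sentiment payload: {payload!r}")
    return SentimentRecord(classification, confidence, "gemini",
                           datetime.now(timezone.utc))

record = parse_sentiment_payload(
    '{"classification": "bullish", "confidence": 72, "reasoning": "..."}'
)
```

Validating at the boundary means a malformed LLM response fails loudly in the collector job instead of polluting stored Sentiment_Scores.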

2. Ticker Registry

  • Add ticker: Validate symbol (non-empty, uppercase, alphanumeric), check uniqueness, insert.
  • Delete ticker: Cascade delete all associated data (OHLCV, scores, SR levels, trade setups, watchlist entries, sentiment, fundamentals).
  • List tickers: Return all, sorted alphabetically.

3. Price Store

  • Upsert OHLCV: Insert or update on (ticker, date) conflict. Validates: high >= low, all prices >= 0, volume >= 0, date <= today.
  • Query: By ticker + date range. Uses composite index on (ticker, date).
  • On upsert: Invalidate LRU cache entries for the affected ticker. Mark composite score as stale.
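
The validation rules above can be expressed as one pure function (a hypothetical helper; in the service layer the violations would be raised as the custom exceptions described under Error Handling):

```python
from datetime import date

def validate_ohlcv(open_: float, high: float, low: float, close: float,
                   volume: int, day: date, today: date) -> list[str]:
    """Collects every violation so the 400 response can list them all."""
    errors: list[str] = []
    if high < low:
        errors.append("high < low")
    if min(open_, high, low, close) < 0:
        errors.append("negative price")
    if volume < 0:
        errors.append("negative volume")
    if day > today:
        errors.append("date in the future")
    return errors

ok = validate_ohlcv(10.0, 11.0, 9.0, 10.5, 1_000, date(2024, 1, 2), date(2024, 6, 1))
bad = validate_ohlcv(10.0, 9.0, 11.0, 10.5, -5, date(2099, 1, 1), date(2024, 6, 1))
```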

4. Ingestion Pipeline

  • Fetch + upsert: Calls provider, validates records, upserts into Price Store.
  • Rate limit handling: Tracks last_ingested_date per ticker in memory during a fetch. On rate limit, returns partial result with progress info. Resume continues from last_ingested_date + 1 day.
  • Error handling: Provider errors return descriptive message, no data modification.

5. Technical Analysis Service

Computes indicators from OHLCV data. Each indicator function:

  • Takes ticker + date range as input
  • Fetches OHLCV from Price Store
  • Validates minimum data requirements (e.g., RSI needs 14+ records)
  • Returns raw values + normalized 0-100 score
  • Results cached via LRU (keyed on ticker + date range + indicator type)

Indicators:

  • ADX: min 28 bars, default period 14
  • EMA: min period+1 bars, default periods 20 and 50
  • RSI: min 15 bars, default period 14
  • ATR: min 15 bars, default period 14
  • Volume Profile: min 20 bars, no period
  • Pivot Points: min 5 bars, no period
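
As an example of one indicator and its minimum-data check, a Wilder-smoothed RSI sketch (the exact smoothing variant used by indicator_service.py is an assumption):

```python
def rsi(closes: list[float], period: int = 14) -> float:
    """Wilder-smoothed RSI; needs period + 1 closes (15 bars at the default),
    and the result is already on the 0-100 scale used for normalized scores."""
    if len(closes) < period + 1:
        raise ValueError(f"RSI needs at least {period + 1} closes")
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for gain, loss in zip(gains[period:], losses[period:]):  # Wilder smoothing
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

rsi_uptrend = rsi([float(c) for c in range(1, 17)])      # 16 rising closes
rsi_downtrend = rsi([float(c) for c in range(16, 0, -1)])
```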

6. S/R Detector

  • Detection methods: Volume Profile (HVN/LVN zones) and Pivot Points (swing highs/lows).
  • Strength scoring: Count how many times price has touched/respected a level (0-100 scale).
  • Merge: Levels from different methods within configurable tolerance (default 0.5%) are merged into a single consolidated level. Merged levels combine strength scores.
  • Tagging: Each level tagged as "support" or "resistance" relative to current (latest close) price.
  • Recalculation: Triggered when new OHLCV data arrives for a ticker.
  • Output: Sorted by strength descending, includes detection method.
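
The merge step can be sketched as a single pass over price-sorted levels. Averaging the merged price and summing strengths capped at 100 are assumptions; the design only says merged levels combine strength scores:

```python
def merge_levels(levels: list[dict], tolerance: float = 0.005) -> list[dict]:
    """Merge levels whose prices are within `tolerance` (relative, default 0.5%)."""
    merged: list[dict] = []
    for level in sorted(levels, key=lambda l: l["price"]):
        if merged and abs(level["price"] - merged[-1]["price"]) / merged[-1]["price"] <= tolerance:
            prev = merged[-1]
            prev["price"] = (prev["price"] + level["price"]) / 2  # assumed midpoint
            prev["strength"] = min(prev["strength"] + level["strength"], 100)
            prev["method"] = "merged"
        else:
            merged.append(dict(level))
    return merged

consolidated = merge_levels([
    {"price": 100.0, "strength": 60, "method": "volume_profile"},
    {"price": 100.3, "strength": 50, "method": "pivot_point"},   # within 0.5%
    {"price": 104.0, "strength": 40, "method": "pivot_point"},
])
```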

7. Sentiment Service

  • Collection: Scheduled job (default every 30 min) fetches sentiment for all tracked tickers.
  • Storage: Each record has classification (bullish/bearish/neutral), confidence (0-100), source, timestamp.
  • Dimension score: Weighted average of scores within lookback window (default 24h). Time decay applied — more recent scores weighted higher. Bullish = high score, bearish = low score, neutral = 50.
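
The dimension score can be sketched as follows. The exponential half-life decay and the confidence weighting are illustrative choices; the design only requires that more recent scores weigh higher:

```python
import math
from datetime import datetime, timedelta, timezone

CLASS_SCORE = {"bullish": 100.0, "bearish": 0.0, "neutral": 50.0}

def sentiment_dimension_score(records, now, lookback_hours=24.0, half_life_hours=6.0):
    """records: (classification, confidence, timestamp) tuples."""
    num = den = 0.0
    for classification, confidence, ts in records:
        age_h = (now - ts).total_seconds() / 3600.0
        if age_h > lookback_hours:
            continue  # outside the lookback window
        weight = (confidence / 100.0) * math.exp(-math.log(2) * age_h / half_life_hours)
        num += weight * CLASS_SCORE[classification]
        den += weight
    return num / den if den else None  # None when no scores in the window

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
score = sentiment_dimension_score(
    [("bullish", 80, now - timedelta(hours=1)),
     ("bearish", 80, now - timedelta(hours=12)),
     ("neutral", 50, now - timedelta(hours=48))],  # outside 24 h window, ignored
    now,
)
```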

8. Fundamental Service

  • Collection: Scheduled job (default daily) fetches P/E, revenue growth, earnings surprise %, market cap.
  • Storage: Latest snapshot per ticker. On new data, marks fundamental dimension score as stale.
  • Error handling: On provider failure, retain existing data, log error.

9. Scoring Engine

  • Dimensions: technical, sr_quality, sentiment, fundamental, momentum — each scored 0-100.
  • Composite: Weighted average of available dimensions. Missing dimensions excluded, weights re-normalized.
  • Staleness: Scores marked stale when underlying data changes. Recomputed on-demand when requested.
  • Weight update: When user updates weights, all composite scores are recomputed.
  • Rankings: Return tickers sorted by composite score descending, all dimension scores included.

Dimension score computation:

  • Technical: Weighted combination of ADX trend strength, EMA directional alignment, RSI momentum position.
  • S/R Quality: Based on number of strong S/R levels, proximity of nearest levels to current price, and average strength.
  • Sentiment: Weighted average with time decay from sentiment service.
  • Fundamental: Normalized composite of P/E (lower is better, relative to sector), revenue growth, earnings surprise.
  • Momentum: Rate of change of price over configurable lookback periods (e.g., 5-day, 20-day).

10. R:R Scanner

  • Scan: Periodic job scans all tracked tickers.
  • Long setup: Entry = current price, target = nearest SR level above, stop = entry - (ATR × multiplier). R:R = (target - entry) / (entry - stop).
  • Short setup: Entry = current price, target = nearest SR level below, stop = entry + (ATR × multiplier). R:R = (entry - target) / (stop - entry).
  • Filter: Only setups meeting configurable R:R threshold (default 3:1).
  • Recalculation: When SR levels or price data changes, recalculate and prune invalid setups.
  • Skip: Tickers without sufficient SR levels or ATR data are skipped with logged reason.

11. Watchlist Service

  • Auto-populate: Top-X tickers by composite score (default X=10). Auto entries update when scores change.
  • Manual entries: Users can add/remove. Tagged as manual, not subject to auto-population.
  • Cap: Max size = auto count + 10 manual (default max 20).
  • Response: Each entry includes composite score, all dimension scores, R:R ratio (if setup exists), active SR levels.
  • Sorting: By composite score, any dimension score, or R:R ratio.

12. Auth Service

  • Registration: Configurable on/off. Creates user with no API access by default (admin must grant).
  • Login: Validates credentials, returns JWT (60-min expiry). Error messages don't reveal which field is wrong.
  • JWT: Contains user_id, role, expiry. Validated on every protected request.
  • Roles: user and admin. Middleware checks role for admin endpoints.
  • Password: bcrypt hashed. Never stored or returned in plaintext.

13. Admin Service

  • Default admin: Created on first startup (username: "admin", password: "admin").
  • User management: Grant/revoke access, toggle registration, list users, reset passwords, create accounts.
  • System settings: Persisted in DB. Frequencies, thresholds, weights, watchlist size.
  • Data maintenance: Delete data older than N days (OHLCV, sentiment, fundamentals). Preserves tickers, users, latest scores.
  • Job control: Enable/disable scheduled jobs, trigger manual runs.

API Envelope

All responses follow:

from typing import Any, Literal
from pydantic import BaseModel

class APIEnvelope(BaseModel):
    status: Literal["success", "error"]
    data: Any | None = None
    error: str | None = None

Dependency Injection

FastAPI's Depends() for:

  • DB session (async context manager)
  • Current user (from JWT)
  • Admin-only guard
  • Service instances (constructed with session)

Data Models

Entity Relationship Diagram

erDiagram
    User {
        int id PK
        string username UK
        string password_hash
        string role
        bool has_access
        datetime created_at
        datetime updated_at
    }

    Ticker {
        int id PK
        string symbol UK
        datetime created_at
    }

    OHLCVRecord {
        int id PK
        int ticker_id FK
        date date
        float open
        float high
        float low
        float close
        bigint volume
        datetime created_at
    }

    SentimentScore {
        int id PK
        int ticker_id FK
        string classification
        int confidence
        string source
        datetime timestamp
    }

    FundamentalData {
        int id PK
        int ticker_id FK
        float pe_ratio
        float revenue_growth
        float earnings_surprise
        float market_cap
        datetime fetched_at
    }

    SRLevel {
        int id PK
        int ticker_id FK
        float price_level
        string type
        int strength
        string detection_method
        datetime created_at
    }

    DimensionScore {
        int id PK
        int ticker_id FK
        string dimension
        float score
        bool is_stale
        datetime computed_at
    }

    CompositeScore {
        int id PK
        int ticker_id FK
        float score
        bool is_stale
        string weights_json
        datetime computed_at
    }

    TradeSetup {
        int id PK
        int ticker_id FK
        string direction
        float entry_price
        float stop_loss
        float target
        float rr_ratio
        float composite_score
        datetime detected_at
    }

    WatchlistEntry {
        int id PK
        int user_id FK
        int ticker_id FK
        string entry_type
        datetime added_at
    }

    SystemSetting {
        int id PK
        string key UK
        string value
        datetime updated_at
    }

    IngestionProgress {
        int id PK
        int ticker_id FK
        date last_ingested_date
        datetime updated_at
    }

    Ticker ||--o{ OHLCVRecord : has
    Ticker ||--o{ SentimentScore : has
    Ticker ||--o| FundamentalData : has
    Ticker ||--o{ SRLevel : has
    Ticker ||--o{ DimensionScore : has
    Ticker ||--o| CompositeScore : has
    Ticker ||--o{ TradeSetup : has
    Ticker ||--o{ WatchlistEntry : on
    User ||--o{ WatchlistEntry : owns
    Ticker ||--o| IngestionProgress : tracks

Key Model Details

OHLCVRecord

  • Composite unique constraint on (ticker_id, date).
  • Composite index on (ticker_id, date) for range queries.
  • date is date-only (no time component).
  • Validation: high >= low, all prices >= 0, volume >= 0, date <= today.

SRLevel

  • type: "support" or "resistance".
  • detection_method: "volume_profile" or "pivot_point" or "merged".
  • strength: 0-100 integer.

DimensionScore

  • dimension: one of "technical", "sr_quality", "sentiment", "fundamental", "momentum".
  • is_stale: set to True when underlying data changes, triggers recomputation on next read.

CompositeScore

  • weights_json: JSON string of the weights used for this computation (for auditability).
  • is_stale: same staleness pattern as DimensionScore.

WatchlistEntry

  • entry_type: "auto" or "manual".
  • Unique constraint on (user_id, ticker_id).

User

  • role: "user" or "admin".
  • has_access: boolean, default False. Admin must grant access after registration.

IngestionProgress

  • Tracks the last successfully ingested date per ticker for rate-limit resume.
  • Unique constraint on ticker_id.

Database Migrations

Alembic manages all schema changes. Initial migration creates all tables. Subsequent migrations handle schema evolution. Migration files are version-controlled.

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Ticker creation round-trip

For any valid NASDAQ ticker symbol (non-empty, uppercase, alphanumeric), adding it to the Ticker Registry and then listing all tickers should include that symbol in the result.

Validates: Requirements 1.1

Property 2: Duplicate ticker rejection

For any valid ticker symbol, adding it to the Ticker Registry twice should succeed the first time and return a duplicate error the second time, with the registry containing exactly one entry for that symbol.

Validates: Requirements 1.2

Property 3: Whitespace ticker rejection

For any string composed entirely of whitespace characters (including the empty string), submitting it as a ticker symbol should be rejected with a validation error, and the Ticker Registry should remain unchanged.

Validates: Requirements 1.3

Property 4: Ticker deletion cascades

For any ticker with associated OHLCV records, scores, SR levels, trade setups, sentiment, and fundamental data, deleting the ticker should remove the ticker and all associated records from the database.

Validates: Requirements 1.5

Property 5: OHLCV storage round-trip

For any valid OHLCV record (valid ticker, high >= low, all prices >= 0, volume >= 0, date <= today), storing it in the Price Store and retrieving it by (ticker, date) should return the same open, high, low, close, and volume values.

Validates: Requirements 2.1, 2.2

Property 6: OHLCV validation rejects invalid records

For any OHLCV record where high < low, or any price is negative, or volume is negative, or date is in the future, the Backend Service should reject the record with a validation error and the Price Store should remain unchanged.

Validates: Requirements 2.3

Property 7: OHLCV rejects unregistered tickers

For any OHLCV record referencing a ticker symbol not present in the Ticker Registry, the Backend Service should reject the record with an error.

Validates: Requirements 2.4

Property 8: Provider error preserves existing data

For any data type (OHLCV, sentiment, fundamentals) and any existing data state, if the market data provider returns an error or is unreachable during a fetch, all existing data in the store should remain unchanged.

Validates: Requirements 3.2, 7.3, 8.3

Property 9: Rate-limit resume continuity

For any ticker and date range where ingestion is interrupted by a rate limit after N records, resuming the fetch for the same ticker and date range should continue from the day after the last successfully ingested date, resulting in no gaps and no duplicate records across the combined ingestion.

Validates: Requirements 3.3, 3.4, 4.5
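
As an example of turning a property into a machine-checkable test, Property 9 can be exercised with random splits (plain random sampling shown here; tests/property/ would presumably use a property-based framework such as Hypothesis to generate the cases):

```python
import random
from datetime import date, timedelta

def ingest(start: date, end: date) -> list[date]:
    """Stand-in for a full fetch: every day in [start, end], inclusive."""
    return [start + timedelta(days=d) for d in range((end - start).days + 1)]

rng = random.Random(7)
for _ in range(200):
    start = date(2024, 1, 1)
    end = start + timedelta(days=rng.randint(1, 120))
    full = ingest(start, end)
    cut = rng.randint(1, len(full))               # rate limit after `cut` records
    first = full[:cut]
    resume_from = first[-1] + timedelta(days=1)   # last_ingested_date + 1 day
    second = ingest(resume_from, end) if resume_from <= end else []
    assert first + second == full                 # no gaps, no duplicates

resume_property_holds = True
```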

Property 10: Scheduled collection processes all tickers

For any set of tracked tickers, when a scheduled collection job (OHLCV, sentiment, or fundamentals) runs, it should attempt to fetch data for every tracked ticker. If one ticker fails, the remaining tickers should still be processed.

Validates: Requirements 4.1, 4.3, 7.1, 8.2

Property 11: Score bounds invariant

For any computed score in the system — indicator normalized score, SR level strength, dimension score, or composite score — the value must be in the range [0, 100].

Validates: Requirements 5.2, 6.2, 9.1

Property 12: Indicator minimum data enforcement

For any ticker with fewer OHLCV records than the minimum required for a given indicator (e.g., RSI needs 14+, ADX needs 28+), requesting that indicator should return an error specifying the minimum data requirement.

Validates: Requirements 5.4

Property 13: EMA cross directional bias

For any ticker and date range with sufficient OHLCV data, the EMA cross signal should return "bullish" when short EMA > long EMA, "bearish" when short EMA < long EMA, and "neutral" when they are equal (within floating-point tolerance).

Validates: Requirements 5.3

Property 14: Indicator computation determinism

For any valid OHLCV dataset and indicator type (ADX, EMA, RSI, ATR), computing the indicator twice with the same inputs should produce identical results.

Validates: Requirements 5.1

Property 15: SR level support/resistance tagging

For any SR level and current price, the level should be tagged "support" if the level price is below the current price, and "resistance" if the level price is above the current price.

Validates: Requirements 6.3

Property 16: SR level merging within tolerance

For any two SR levels from different detection methods whose price levels are within the configurable tolerance (default 0.5%), the SR Detector should merge them into a single consolidated level. For any two levels outside the tolerance, they should remain separate.

Validates: Requirements 6.5

Property 17: SR level detection from data

For any OHLCV dataset with sufficient data, the SR Detector should produce SR levels derived from Volume Profile (HVN/LVN) and/or Pivot Points (swing highs/lows), and each level should reference its detection method.

Validates: Requirements 6.1

Property 18: Sentiment score data shape

For any stored Sentiment Score, the classification must be one of (bullish, bearish, neutral), confidence must be in [0, 100], and source and timestamp must be non-null.

Validates: Requirements 7.2

Property 19: Sentiment dimension score uses time decay

For any set of sentiment scores within the lookback window, the sentiment dimension score should weight more recent scores higher than older ones. Specifically, given two sets of scores with identical values but different timestamps, the set with more recent timestamps should produce a higher (or equal) dimension score if bullish, or lower (or equal) if bearish.

Validates: Requirements 7.4

Property 20: Fundamental data storage round-trip

For any valid fundamental data record (P/E ratio, revenue growth, earnings surprise %, market cap), storing it and retrieving it for the same ticker should return the same values.

Validates: Requirements 8.1

Property 21: Composite score is weighted average

For any ticker with dimension scores and a set of weights, the composite score should equal the weighted average of the available dimension scores. Specifically: composite = sum(weight_i * score_i) / sum(weight_i) for all available dimensions.

Validates: Requirements 9.2

Property 22: Missing dimensions re-normalize weights

For any ticker missing one or more dimension scores, the composite score should be computed using only available dimensions with weights re-normalized to sum to 1.0, and the response should indicate which dimensions are missing.

Validates: Requirements 9.3

Property 23: Staleness marking on data change

For any ticker, when underlying data changes (new OHLCV, new sentiment, new fundamentals), the affected dimension scores and composite score should be marked as stale.

Validates: Requirements 9.4

Property 24: Stale score recomputation on demand

For any ticker with a stale composite score, requesting the score should trigger recomputation and return a fresh (non-stale) score that reflects current data.

Validates: Requirements 9.5

Property 25: Weight update triggers full recomputation

For any set of tickers with composite scores, when dimension weights are updated, all composite scores should be recomputed using the new weights.

Validates: Requirements 9.7

Property 26: Trade setup R:R threshold filtering

For any set of potential trade setups, only those with R:R ratio >= the configured threshold (default 3:1) should be returned. No setup below the threshold should appear in results.

Validates: Requirements 10.1

Property 27: Trade setup computation correctness

For any ticker with SR levels and ATR data, a long setup should have target = nearest SR level above current price and stop = entry - ATR-based distance, while a short setup should have target = nearest SR level below current price and stop = entry + ATR-based distance. The R:R ratio should equal |target - entry| / |entry - stop|.

Validates: Requirements 10.2, 10.3

Property 28: Trade setup data completeness

For any trade setup, it must include: entry price (> 0), stop-loss (> 0), target (> 0), R:R ratio (> 0), direction (one of "long" or "short"), and composite score (0-100).

Validates: Requirements 10.4

Property 29: Trade setup pruning on data change

For any existing trade setup, when underlying SR levels or price data changes such that the setup no longer meets the R:R threshold, the setup should be removed.

Validates: Requirements 10.5

Property 30: Watchlist auto-population

For any set of tickers with composite scores, the watchlist auto-populated entries should be exactly the top-X tickers by composite score (where X is configurable, default 10).

Validates: Requirements 11.1

Property 31: Watchlist entry data completeness

For any watchlist entry, the response should include composite score, all dimension scores, R:R ratio (if a trade setup exists for that ticker), and active SR levels.

Validates: Requirements 11.2

Property 32: Manual watchlist entries persist through auto-population

For any manually added watchlist entry, it should be tagged as "manual" and should not be removed or replaced when auto-population runs, regardless of the ticker's composite score ranking.

Validates: Requirements 11.3

Property 33: Watchlist size cap enforcement

For any watchlist, the total number of entries should never exceed auto-populate count + 10 manual additions (default max 20). Attempting to add a manual entry beyond the cap should be rejected.

Validates: Requirements 11.4

Property 34: Registration creates no-access user

For any valid credentials submitted when registration is enabled, the created user should have has_access = False and role = "user".

Validates: Requirements 12.1

Property 35: Registration disabled rejects all attempts

For any credentials submitted when registration is disabled, the registration should be rejected regardless of credential validity.

Validates: Requirements 12.2

Property 36: Login returns valid JWT

For any registered user with valid credentials, login should return a JWT access token that decodes to contain the user's ID, role, and an expiry time 60 minutes from issuance.

Validates: Requirements 12.3

Property 37: Invalid credentials return generic error

For any login attempt with invalid credentials (wrong username, wrong password, or both), the error response should be identical — not revealing which field was incorrect.

Validates: Requirements 12.4

Property 38: Access control enforcement

For any protected endpoint, unauthenticated requests should receive HTTP 401, and authenticated users without the required role or access should receive HTTP 403.

Validates: Requirements 12.5

Property 39: Admin user management operations

For any user account, an admin should be able to grant access, revoke access, and reset the password, with each operation correctly updating the user's state in the database.

Validates: Requirements 13.2

Property 40: Data cleanup preserves structure

For any dataset with records of various ages, admin data cleanup (delete records older than N days) should remove old OHLCV, sentiment, and fundamental records while preserving all ticker entries, user accounts, and the latest scores.

Validates: Requirements 13.4

Property 41: Sorting correctness

For any list endpoint with a defined sort order (tickers alphabetically, SR levels by strength desc, rankings by composite score desc, trade setups by R:R desc then composite desc), the returned results must be correctly sorted according to the specified order.

Validates: Requirements 1.4, 6.6, 9.6, 10.8, 11.6

Error Handling

API Error Responses

All errors use the standard JSON envelope with appropriate HTTP status codes:

{
  "status": "error",
  "data": null,
  "error": "Human-readable error message"
}

  • Validation failure (bad input): 400, "Validation error: {details}"
  • Authentication missing/expired: 401, "Authentication required" / "Token expired"
  • Insufficient permissions: 403, "Insufficient permissions"
  • Resource not found: 404, "Ticker not found: {symbol}"
  • Duplicate resource: 409, "Ticker already exists: {symbol}"
  • Provider unreachable: 502, "Market data provider unavailable"
  • Rate limited by provider: 429, "Rate limited. Ingested {n} records. Resume available."
  • Internal error: 500, "Internal server error"

Error Handling Strategies

Provider Errors

  • Wrap all provider calls in try/except.
  • On connection error or timeout: return 502 with descriptive message. Existing data is never modified.
  • On rate limit: record progress (last ingested date), return 429 with progress info.
  • On unexpected provider response: log full response, return 502.

Database Errors

  • Unique constraint violations: catch IntegrityError, return 409.
  • Connection pool exhaustion: log, return 503 "Service temporarily unavailable".
  • All DB operations within transactions — rollback on any error.

Validation Errors

  • Pydantic model validation catches schema-level errors automatically (400).
  • Business validation (e.g., high < low, future date) in service layer, raises custom exceptions mapped to 400.

Scheduled Job Errors

  • Each ticker processed independently — one failure doesn't stop others.
  • Errors logged with structured JSON (ticker, job name, error type, message).
  • Job-level errors (e.g., scheduler crash) logged and job retried on next interval.
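
The per-ticker isolation and structured JSON logging can be sketched as a generic job runner (function names are illustrative):

```python
import json
import logging

logger = logging.getLogger("scheduler")

def run_job(job_name: str, tickers: list[str], process_one) -> list[str]:
    # Each ticker is processed independently; a failure is logged as
    # structured JSON and the loop continues with the remaining tickers.
    failed = []
    for symbol in tickers:
        try:
            process_one(symbol)
        except Exception as exc:
            failed.append(symbol)
            logger.error(json.dumps({
                "ticker": symbol,
                "job": job_name,
                "error_type": type(exc).__name__,
                "message": str(exc),
            }))
    return failed

processed = []
def collect(symbol):
    if symbol == "BAD":
        raise ValueError("no data")
    processed.append(symbol)

failures = run_job("data_collector", ["AAPL", "BAD", "MSFT"], collect)
```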

Authentication Errors

  • Invalid/missing token: 401 with generic message.
  • Expired token: 401 with "Token expired" message.
  • Invalid credentials on login: 401 with generic "Invalid credentials" (never reveals which field).
  • Insufficient role: 403.
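
The "never reveals which field" rule can be sketched as follows; names are illustrative, and a real implementation would compare password hashes minted by the auth service and return a signed JWT:

```python
import hmac

class AuthenticationError(Exception):
    status_code = 401

def login(password_hashes: dict[str, str], username: str, password_hash: str) -> str:
    # One generic message covers both unknown user and wrong password,
    # so the response never reveals which field failed.
    stored = password_hashes.get(username)
    if stored is None or not hmac.compare_digest(stored, password_hash):
        raise AuthenticationError("Invalid credentials")
    return f"jwt-for-{username}"  # placeholder; real code mints a signed JWT

users = {"alice": "hash-abc"}

def attempt(username, pwd_hash):
    try:
        return login(users, username, pwd_hash)
    except AuthenticationError as exc:
        return str(exc)
```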

Exception Hierarchy

class AppError(Exception):
    """Base application error."""
    status_code: int = 500
    message: str = "Internal server error"

class ValidationError(AppError):
    status_code = 400

class NotFoundError(AppError):
    status_code = 404

class DuplicateError(AppError):
    status_code = 409

class AuthenticationError(AppError):
    status_code = 401

class AuthorizationError(AppError):
    status_code = 403

class ProviderError(AppError):
    status_code = 502

class RateLimitError(AppError):
    status_code = 429

A global exception handler registered on the FastAPI app catches AppError subclasses and formats them into the JSON envelope.
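
The handler's formatting step can be sketched as a plain function. In the app it would be registered via app.add_exception_handler(AppError, handler) and wrap the dict in a JSONResponse; returning plain values here keeps the sketch dependency-free:

```python
class AppError(Exception):
    """Base application error (mirrors the hierarchy above)."""
    status_code: int = 500
    message: str = "Internal server error"

class NotFoundError(AppError):
    status_code = 404

def format_app_error(exc: AppError) -> tuple[int, dict]:
    # Fall back to the class-level message when the exception carries no text.
    return exc.status_code, {
        "status": "error",
        "data": None,
        "error": str(exc) or exc.message,
    }

status, body = format_app_error(NotFoundError("Ticker not found: TSLA"))
```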

Deployment and Infrastructure

Target Environment

  • Production: Debian 12 with Nginx and PostgreSQL (pre-installed by operator)
  • Development: macOS with local PostgreSQL (via Homebrew or Docker)
  • CI/CD: Gitea Actions
  • Domain: signal.thiessen.io (reverse-proxied through Nginx)

Local Development (macOS)

Local dev works identically to production. Install PostgreSQL via Homebrew (brew install postgresql@16) or run it in Docker. The app is pure Python — no platform-specific dependencies.

# Setup
git clone <repo>
cd stock-data-backend
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Local DB
createdb stock_data_backend
cp .env.example .env  # Edit DATABASE_URL to point to local Postgres

# Migrations
alembic upgrade head

# Run
uvicorn app.main:app --reload --port 8000

Database Setup Script (deploy/setup_db.sh)

Creates the PostgreSQL database, user, and runs migrations. Idempotent — safe to run multiple times.

#!/bin/bash
set -e
DB_NAME="${DB_NAME:-stock_data_backend}"
DB_USER="${DB_USER:-stock_backend}"
DB_PASS="${DB_PASS:-changeme}"

sudo -u postgres psql <<EOF
DO \$\$
BEGIN
  IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = '${DB_USER}') THEN
    CREATE ROLE ${DB_USER} WITH LOGIN PASSWORD '${DB_PASS}';
  END IF;
END \$\$;
SELECT 'CREATE DATABASE ${DB_NAME} OWNER ${DB_USER}'
WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '${DB_NAME}')\gexec
GRANT ALL PRIVILEGES ON DATABASE ${DB_NAME} TO ${DB_USER};
EOF

# Run migrations
alembic upgrade head

Nginx Configuration (deploy/nginx.conf)

Reverse proxy template for signal.thiessen.io. Assumes SSL is handled externally (e.g., Certbot).

server {
    listen 80;
    server_name signal.thiessen.io;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 120s;
    }
}

Systemd Service (deploy/stock-data-backend.service)

Runs the app as a daemon with auto-restart.

[Unit]
Description=Stock Data Backend
After=network.target postgresql.service

[Service]
Type=exec
User=stock_backend
Group=stock_backend
WorkingDirectory=/opt/stock-data-backend
EnvironmentFile=/opt/stock-data-backend/.env
ExecStart=/opt/stock-data-backend/.venv/bin/uvicorn app.main:app --host 127.0.0.1 --port 8000 --workers 1
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Gitea Actions CI/CD (.gitea/workflows/deploy.yml)

Pipeline: lint → test → deploy to Debian server via SSH.

name: Deploy
on:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: ruff check .  # lint command assumed; substitute the project's linter

  test:
    needs: lint
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: test_db
          POSTGRES_USER: test_user
          POSTGRES_PASSWORD: test_pass
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: alembic upgrade head
        env:
          DATABASE_URL: postgresql+asyncpg://test_user:test_pass@localhost:5432/test_db
      - run: pytest --tb=short
        env:
          DATABASE_URL: postgresql+asyncpg://test_user:test_pass@localhost:5432/test_db

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_KEY }}
          script: |
            cd /opt/stock-data-backend
            git pull origin main
            source .venv/bin/activate
            pip install -e .
            alembic upgrade head
            sudo systemctl restart stock-data-backend

Environment Variables (.env.example)

# Database
DATABASE_URL=postgresql+asyncpg://stock_backend:changeme@localhost:5432/stock_data_backend

# Auth
JWT_SECRET=change-this-to-a-random-secret
JWT_EXPIRY_MINUTES=60

# OHLCV Provider — Alpaca Markets
ALPACA_API_KEY=
ALPACA_API_SECRET=

# Sentiment Provider — Gemini with Search Grounding
GEMINI_API_KEY=
GEMINI_MODEL=gemini-2.0-flash

# Fundamentals Provider — Financial Modeling Prep
FMP_API_KEY=

# Scheduled Jobs
DATA_COLLECTOR_FREQUENCY=daily
SENTIMENT_POLL_INTERVAL_MINUTES=30
FUNDAMENTAL_FETCH_FREQUENCY=daily
RR_SCAN_FREQUENCY=daily

# Scoring Defaults
DEFAULT_WATCHLIST_AUTO_SIZE=10
DEFAULT_RR_THRESHOLD=3.0

# Database Pool
DB_POOL_SIZE=5
DB_POOL_TIMEOUT=30

# Logging
LOG_LEVEL=INFO

Testing Strategy

Testing Framework

  • Unit/Integration tests: pytest with pytest-asyncio for async test support.
  • Property-based tests: hypothesis, the standard property-based testing library for Python.
  • Database tests: Use a test PostgreSQL database with Alembic migrations applied before test runs. Each test runs in a transaction that is rolled back.
  • Provider mocking: Mock the MarketDataProvider protocol for all tests that don't need real external calls.
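
A mock satisfying the MarketDataProvider protocol can be as small as the sketch below (the protocol shown is an illustrative subset; the real interface lives in the app):

```python
from datetime import date
from typing import Protocol

class MarketDataProvider(Protocol):
    def fetch_ohlcv(self, symbol: str, start: date, end: date) -> list[dict]: ...

class MockProvider:
    """Replays canned bars and records calls, so tests need no network."""
    def __init__(self, canned: dict[str, list[dict]]):
        self.canned = canned
        self.calls: list[str] = []

    def fetch_ohlcv(self, symbol: str, start: date, end: date) -> list[dict]:
        self.calls.append(symbol)
        return self.canned.get(symbol, [])

mock = MockProvider({"AAPL": [{"open": 1.0, "high": 2.0, "low": 1.0,
                               "close": 1.5, "volume": 100}]})
bars = mock.fetch_ohlcv("AAPL", date(2024, 1, 1), date(2024, 1, 2))
```

Because Protocol uses structural typing, MockProvider needs no inheritance; any object with a matching fetch_ohlcv satisfies the interface.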

Test Organization

tests/
├── conftest.py              # Shared fixtures (db session, test client, mock provider)
├── unit/                    # Unit tests for individual functions
│   ├── test_ticker_service.py
│   ├── test_price_service.py
│   ├── test_indicator_service.py
│   ├── test_sr_service.py
│   ├── test_scoring_service.py
│   ├── test_rr_scanner_service.py
│   ├── test_watchlist_service.py
│   ├── test_auth_service.py
│   └── test_admin_service.py
├── property/                # Property-based tests (hypothesis)
│   ├── test_ticker_properties.py
│   ├── test_ohlcv_properties.py
│   ├── test_ingestion_properties.py
│   ├── test_indicator_properties.py
│   ├── test_sr_properties.py
│   ├── test_scoring_properties.py
│   ├── test_rr_scanner_properties.py
│   ├── test_watchlist_properties.py
│   └── test_auth_properties.py
└── integration/             # API-level integration tests
    └── test_api_endpoints.py

Dual Testing Approach

Unit tests cover:

  • Specific examples that demonstrate correct behavior (e.g., known RSI calculation for a fixed dataset).
  • Edge cases: empty ticker list, zero OHLCV records, expired JWT, max watchlist size.
  • Error conditions: provider timeout, invalid input shapes, missing data.
  • Integration points: API endpoint → service → database flow.

Property-based tests cover:

  • Universal properties that hold for all valid inputs (the 41 correctness properties above).
  • Each property test uses hypothesis with @settings(max_examples=100) minimum.
  • Each property test is tagged with a comment referencing the design property:
    # Feature: stock-data-backend, Property 5: OHLCV storage round-trip
    @given(ohlcv=valid_ohlcv_records())
    @settings(max_examples=100)
    def test_ohlcv_storage_round_trip(ohlcv):
        ...
    

Unit tests and property tests are complementary:

  • Unit tests catch concrete bugs with specific, readable examples.
  • Property tests verify general correctness across randomized inputs.
  • Together they provide comprehensive coverage.

Hypothesis Custom Strategies

Custom hypothesis strategies for generating domain objects:

  • valid_ticker_symbols(): Uppercase alphanumeric strings, 1-5 chars.
  • whitespace_strings(): Strings composed entirely of whitespace (including empty).
  • valid_ohlcv_records(): Records where high >= low, all prices >= 0, volume >= 0, date <= today.
  • invalid_ohlcv_records(): Records violating at least one constraint.
  • dimension_scores(): Floats in [0, 100] for each dimension.
  • weight_configs(): Dicts of dimension → positive float weight.
  • sr_levels(): Levels with valid price, type, strength, method.
  • sentiment_scores(): Records with valid classification, confidence, source, timestamp.
  • trade_setups(): Setups with valid entry, stop, target, direction, R:R.
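
One way to build valid_ohlcv_records() so the high >= low invariant holds by construction rather than by filtering (a sketch; field names and bounds are illustrative):

```python
from datetime import date
from hypothesis import strategies as st

def valid_ohlcv_records() -> st.SearchStrategy:
    prices = st.floats(min_value=0, max_value=1e6,
                       allow_nan=False, allow_infinity=False)
    # Draw low and a non-negative delta, then set high = low + delta,
    # so every generated record satisfies high >= low.
    return st.builds(
        lambda low, delta, open_, close, volume, day: {
            "open": open_, "high": low + delta, "low": low,
            "close": close, "volume": volume, "date": day,
        },
        low=prices,
        delta=prices,
        open_=prices,
        close=prices,
        volume=st.integers(min_value=0),
        day=st.dates(max_value=date.today()),
    )
```

invalid_ohlcv_records() can then be derived by deliberately breaking one constraint, e.g. swapping high and low when they differ.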

Property-to-Test Mapping

Each of the 41 correctness properties maps to exactly one property-based test. The tag format is:

Feature: stock-data-backend, Property {number}: {property_title}

For example:

  • Feature: stock-data-backend, Property 1: Ticker creation round-trip
  • Feature: stock-data-backend, Property 6: OHLCV validation rejects invalid records
  • Feature: stock-data-backend, Property 21: Composite score is weighted average
  • Feature: stock-data-backend, Property 41: Sorting correctness