# Design Document: Stock Data Backend
## Overview
The Stock Data Backend is an MVP investing-signal platform built with Python/FastAPI and PostgreSQL, focused on NASDAQ stocks. It ingests OHLCV price data from a swappable market data provider, computes technical indicators (ADX, EMA, RSI, ATR, Volume Profile, Pivot Points), detects support/resistance levels, collects sentiment and fundamental data, and feeds everything into a composite scoring engine. The scoring engine ranks tickers, auto-populates a watchlist, and an R:R scanner flags asymmetric trade setups.
The system is API-first (REST, JSON envelope, versioned URLs), uses JWT auth with role-based access, and runs scheduled jobs for data collection. All computation is on-demand or scheduled — no streaming, no websockets, no real-time feeds.
### Key Design Decisions
- **Single process**: FastAPI app with APScheduler for scheduled jobs — no separate worker processes.
- **On-demand scoring**: Composite scores are marked stale when inputs change and recomputed only when requested.
- **Simple LRU cache**: In-memory LRU cache (max 1000 entries) for indicator computations, implemented in `app/cache.py` as a small wrapper because `functools.lru_cache` can only clear the whole cache, not individual entries. No TTL; a ticker's entries are invalidated when new OHLCV data is ingested for it.
- **Provider abstraction**: Market data provider behind a Python Protocol class for swappability.
- **Fixed indicator set**: ADX, EMA, RSI, ATR, Volume Profile, Pivot Points — no plugin architecture.
- **Sentiment**: Single source, weighted average with configurable time decay and lookback window.
- **Fundamentals**: Single source, simple periodic fetch.
- **Watchlist cap**: Auto-populated top-X (default 10) + max 10 manual additions = max 20.
## Architecture
### High-Level Architecture
```mermaid
graph TB
Client[API Client] --> API[FastAPI App /api/v1/]
API --> Auth[Auth Service]
API --> TickerReg[Ticker Registry]
API --> PriceStore[Price Store]
API --> Ingestion[Ingestion Pipeline]
API --> TechAnalysis[Technical Analysis]
API --> SRDetector[S/R Detector]
API --> Scoring[Scoring Engine]
API --> RRScanner[R:R Scanner]
API --> Watchlist[Watchlist Service]
API --> Admin[Admin Service]
Ingestion --> Provider[Market Data Provider Protocol]
Provider --> ExternalAPI[External Market Data API]
Scheduler[APScheduler] --> Ingestion
Scheduler --> SentimentCollector[Sentiment Collector]
Scheduler --> FundCollector[Fundamental Collector]
Scheduler --> RRScanner
TickerReg --> DB[(PostgreSQL)]
PriceStore --> DB
Auth --> DB
Scoring --> DB
SRDetector --> DB
Watchlist --> DB
TechAnalysis --> Cache[LRU Cache max=1000]
TechAnalysis --> PriceStore
```
### Request Flow
```mermaid
sequenceDiagram
participant C as Client
participant A as FastAPI
participant Auth as Auth Middleware
participant S as Service Layer
participant DB as PostgreSQL
C->>A: HTTP Request + JWT
A->>Auth: Validate token + role
Auth-->>A: User context
A->>S: Call service method
S->>DB: Query/Mutate
DB-->>S: Result
S-->>A: Domain result
A-->>C: JSON envelope response
```
### Project Structure
```
stock-data-backend/
├── alembic/ # DB migrations
│ ├── versions/
│ └── env.py
├── app/
│ ├── main.py # FastAPI app, lifespan, scheduler
│ ├── config.py # Settings via pydantic-settings
│ ├── database.py # SQLAlchemy engine, session factory
│ ├── models/ # SQLAlchemy ORM models
│ │ ├── ticker.py
│ │ ├── ohlcv.py
│ │ ├── user.py
│ │ ├── sentiment.py
│ │ ├── fundamental.py
│ │ ├── score.py
│ │ ├── sr_level.py
│ │ ├── trade_setup.py
│ │ ├── watchlist.py
│ │ └── settings.py
│ ├── schemas/ # Pydantic request/response schemas
│ │ ├── common.py # APIEnvelope, pagination
│ │ ├── ticker.py
│ │ ├── ohlcv.py
│ │ ├── auth.py
│ │ ├── indicator.py
│ │ ├── sr_level.py
│ │ ├── sentiment.py
│ │ ├── fundamental.py
│ │ ├── score.py
│ │ ├── trade_setup.py
│ │ ├── watchlist.py
│ │ └── admin.py
│ ├── routers/ # FastAPI routers (one per domain)
│ │ ├── tickers.py
│ │ ├── ohlcv.py
│ │ ├── ingestion.py
│ │ ├── indicators.py
│ │ ├── sr_levels.py
│ │ ├── sentiment.py
│ │ ├── fundamentals.py
│ │ ├── scores.py
│ │ ├── trades.py
│ │ ├── watchlist.py
│ │ ├── auth.py
│ │ ├── admin.py
│ │ └── health.py
│ ├── services/ # Business logic
│ │ ├── ticker_service.py
│ │ ├── price_service.py
│ │ ├── ingestion_service.py
│ │ ├── indicator_service.py
│ │ ├── sr_service.py
│ │ ├── sentiment_service.py
│ │ ├── fundamental_service.py
│ │ ├── scoring_service.py
│ │ ├── rr_scanner_service.py
│ │ ├── watchlist_service.py
│ │ ├── auth_service.py
│ │ └── admin_service.py
│ ├── providers/ # External data provider abstractions
│ │ ├── protocol.py # MarketDataProvider, SentimentProvider, FundamentalProvider Protocols
│ │ ├── alpaca.py # Alpaca OHLCV provider (alpaca-py)
│ │ ├── gemini_sentiment.py # Gemini LLM sentiment provider (google-genai + search grounding)
│ │ └── fmp.py # Financial Modeling Prep fundamentals provider (httpx)
│ ├── scheduler.py # APScheduler job definitions
│ ├── dependencies.py # FastAPI dependency injection
│ ├── middleware.py # Logging, error handling
│ └── cache.py # LRU cache wrapper with invalidation
├── tests/
│ ├── unit/
│ ├── property/
│ └── conftest.py
├── deploy/ # Deployment templates
│ ├── nginx.conf # Nginx reverse proxy config for signal.thiessen.io
│ ├── stock-data-backend.service # systemd service file
│ └── setup_db.sh # DB creation + migration script
├── .gitea/
│ └── workflows/
│ └── deploy.yml # Gitea Actions CI/CD pipeline
├── alembic.ini
├── pyproject.toml
└── .env.example
```
## Components and Interfaces
### 1. Market Data Provider Protocol
```python
from typing import Protocol
from datetime import date


class MarketDataProvider(Protocol):
    async def fetch_ohlcv(
        self, ticker: str, start_date: date, end_date: date
    ) -> list[OHLCVRecord]:
        """Fetch OHLCV data for a ticker in a date range."""
        ...


class SentimentProvider(Protocol):
    async def fetch_sentiment(self, ticker: str) -> SentimentRecord:
        """Fetch current sentiment analysis for a ticker."""
        ...


class FundamentalProvider(Protocol):
    async def fetch_fundamentals(self, ticker: str) -> FundamentalRecord:
        """Fetch fundamental data for a ticker."""
        ...
```
Each data source has its own protocol since they come from different external services. Swapping any provider means implementing the relevant protocol — no other code changes.
**Concrete Provider Implementations:**
| Data Type | Provider | SDK/Library | Auth | Notes |
|-----------|----------|-------------|------|-------|
| OHLCV | Alpaca Markets Data API | `alpaca-py` | API key + secret | Free tier, daily bars, good rate limits |
| Sentiment | Gemini (gemini-2.0-flash) with Google Search grounding | `google-genai` | API key | LLM analyzes live web data (news, social media) per ticker. Returns structured JSON with classification + confidence. Search grounding ensures current data, not just training knowledge. |
| Fundamentals | Financial Modeling Prep (FMP) | `httpx` (REST) | API key | Free tier: P/E, revenue growth, earnings surprise, market cap |
**Gemini Sentiment Provider Details:**
The sentiment provider sends a structured prompt to Gemini with search grounding enabled:
- Prompt asks for current market sentiment analysis for a specific ticker
- Gemini searches the web for recent news, social media mentions, analyst opinions
- Response is requested in JSON mode: `{"classification": "bullish|bearish|neutral", "confidence": 0-100, "reasoning": "..."}`
- The `reasoning` field is logged but not stored — only classification and confidence are persisted as a Sentiment_Score
- Cost: ~$0.001 per call with gemini-2.0-flash (negligible for 30-min polling of a few dozen tickers)
### 2. Ticker Registry
- **Add ticker**: Validate symbol (non-empty, uppercase, alphanumeric), check uniqueness, insert.
- **Delete ticker**: Cascade delete all associated data (OHLCV, scores, SR levels, trade setups, watchlist entries, sentiment, fundamentals).
- **List tickers**: Return all, sorted alphabetically.
### 3. Price Store
- **Upsert OHLCV**: Insert or update on (ticker, date) conflict. Validates: high >= low, all prices >= 0, volume >= 0, date <= today.
- **Query**: By ticker + date range. Uses composite index on (ticker, date).
- **On upsert**: Invalidate LRU cache entries for the affected ticker. Mark composite score as stale.
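The validation rules above can be expressed as a small pure function. This is a sketch; the function name and error strings are illustrative, and in the real service layer these checks would raise the custom exceptions mapped to HTTP 400 described under Error Handling.

```python
from datetime import date


def validate_ohlcv(
    open_: float, high: float, low: float, close: float,
    volume: int, bar_date: date,
) -> list[str]:
    """Return a list of validation errors (empty means the record is valid).

    Mirrors the Price Store rules: high >= low, all prices >= 0,
    volume >= 0, date <= today.
    """
    errors: list[str] = []
    if high < low:
        errors.append("high must be >= low")
    if any(p < 0 for p in (open_, high, low, close)):
        errors.append("prices must be >= 0")
    if volume < 0:
        errors.append("volume must be >= 0")
    if bar_date > date.today():
        errors.append("date must not be in the future")
    return errors
```

The upsert itself would likely use PostgreSQL's `INSERT ... ON CONFLICT (ticker_id, date) DO UPDATE` (available in SQLAlchemy via the `postgresql.insert` construct), so repeated ingestion of the same bar is idempotent.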
### 4. Ingestion Pipeline
- **Fetch + upsert**: Calls provider, validates records, upserts into Price Store.
- **Rate limit handling**: Tracks `last_ingested_date` per ticker in memory during a fetch. On rate limit, returns partial result with progress info. Resume continues from `last_ingested_date + 1 day`.
- **Error handling**: Provider errors return descriptive message, no data modification.
### 5. Technical Analysis Service
Computes indicators from OHLCV data. Each indicator function:
- Takes ticker + date range as input
- Fetches OHLCV from Price Store
- Validates minimum data requirements (e.g., RSI needs 14+ records)
- Returns raw values + normalized 0-100 score
- Results cached via LRU (keyed on ticker + date range + indicator type)
Indicators:
| Indicator | Min Data | Default Period |
|-----------|----------|----------------|
| ADX | 28 bars | 14 |
| EMA | period+1 | 20, 50 |
| RSI | 15 bars | 14 |
| ATR | 15 bars | 14 |
| Volume Profile | 20 bars | N/A |
| Pivot Points | 5 bars | N/A |
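As one concrete example of an indicator function, a Wilder-smoothed RSI matches the table's 15-bar minimum for the default 14 period. This is a standard-formula sketch, not code from the repository:

```python
def rsi(closes: list[float], period: int = 14) -> float:
    """Wilder-smoothed RSI of the final bar in the series.

    Needs period + 1 closes (15 bars for the default period of 14),
    matching the minimum-data rule in the table above.
    """
    if len(closes) < period + 1:
        raise ValueError(f"RSI requires at least {period + 1} closes")
    gains, losses = [], []
    for prev, curr in zip(closes, closes[1:]):
        change = curr - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    # Seed with simple averages, then apply Wilder smoothing.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0  # no down moves in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

RSI is already on a 0-100 scale, so it needs no extra normalization step; the other indicators would each get their own normalization into the shared 0-100 score range.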
### 6. S/R Detector
- **Detection methods**: Volume Profile (HVN/LVN zones) and Pivot Points (swing highs/lows).
- **Strength scoring**: Count how many times price has touched/respected a level (0-100 scale).
- **Merge**: Levels from different methods within configurable tolerance (default 0.5%) are merged into a single consolidated level. Merged levels combine strength scores.
- **Tagging**: Each level tagged as "support" or "resistance" relative to current (latest close) price.
- **Recalculation**: Triggered when new OHLCV data arrives for a ticker.
- **Output**: Sorted by strength descending, includes detection method.
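The merge step can be sketched as a single pass over price-sorted levels. The dict shape and strength-weighted mean are assumptions (the document only requires merging within tolerance and combining strengths); strengths are assumed to be at least 1:

```python
def merge_levels(levels: list[dict], tolerance: float = 0.005) -> list[dict]:
    """Merge S/R levels whose prices lie within `tolerance` (relative).

    Assumed level shape: {"price": float, "strength": int, "method": str},
    with strength >= 1. Merged levels take the strength-weighted mean
    price, sum their strengths (capped at 100), and are tagged "merged".
    """
    merged: list[dict] = []
    for lvl in sorted(levels, key=lambda l: l["price"]):
        if merged and abs(lvl["price"] - merged[-1]["price"]) / merged[-1]["price"] <= tolerance:
            prev = merged[-1]
            total = prev["strength"] + lvl["strength"]
            # Strength-weighted mean keeps the consolidated level near the stronger input.
            prev["price"] = (
                prev["price"] * prev["strength"] + lvl["price"] * lvl["strength"]
            ) / total
            prev["strength"] = min(total, 100)
            prev["method"] = "merged"
        else:
            merged.append(dict(lvl))
    return merged
```

Sorting first means each level only needs comparing against its nearest lower-priced neighbor, which keeps the merge deterministic (Property 16 then checks both the merge and the keep-separate cases).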
### 7. Sentiment Service
- **Collection**: Scheduled job (default every 30 min) fetches sentiment for all tracked tickers.
- **Storage**: Each record has classification (bullish/bearish/neutral), confidence (0-100), source, timestamp.
- **Dimension score**: Weighted average of scores within lookback window (default 24h). Time decay applied — more recent scores weighted higher. Bullish = high score, bearish = low score, neutral = 50.
### 8. Fundamental Service
- **Collection**: Scheduled job (default daily) fetches P/E, revenue growth, earnings surprise %, market cap.
- **Storage**: Latest snapshot per ticker. On new data, marks fundamental dimension score as stale.
- **Error handling**: On provider failure, retain existing data, log error.
### 9. Scoring Engine
- **Dimensions**: technical, sr_quality, sentiment, fundamental, momentum — each scored 0-100.
- **Composite**: Weighted average of available dimensions. Missing dimensions excluded, weights re-normalized.
- **Staleness**: Scores marked stale when underlying data changes. Recomputed on-demand when requested.
- **Weight update**: When user updates weights, all composite scores are recomputed.
- **Rankings**: Return tickers sorted by composite score descending, all dimension scores included.
**Dimension score computation**:
- **Technical**: Weighted combination of ADX trend strength, EMA directional alignment, RSI momentum position.
- **S/R Quality**: Based on number of strong S/R levels, proximity of nearest levels to current price, and average strength.
- **Sentiment**: Weighted average with time decay from sentiment service.
- **Fundamental**: Normalized composite of P/E (lower is better, relative to sector), revenue growth, earnings surprise.
- **Momentum**: Rate of change of price over configurable lookback periods (e.g., 5-day, 20-day).
### 10. R:R Scanner
- **Scan**: Periodic job scans all tracked tickers.
- **Long setup**: Entry = current price, target = nearest SR level above, stop = entry - (ATR × multiplier). R:R = (target - entry) / (entry - stop).
- **Short setup**: Entry = current price, target = nearest SR level below, stop = entry + (ATR × multiplier). R:R = (entry - target) / (stop - entry).
- **Filter**: Only setups meeting configurable R:R threshold (default 3:1).
- **Recalculation**: When SR levels or price data changes, recalculate and prune invalid setups.
- **Skip**: Tickers without sufficient SR levels or ATR data are skipped with logged reason.
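The setup formulas above translate directly into code. The ATR multiplier default of 1.5 is an assumption (the document leaves it configurable without naming a default):

```python
def long_setup(entry: float, target: float, atr: float, atr_mult: float = 1.5) -> tuple[float, float]:
    """Long: stop sits below entry by ATR * multiplier; R:R = reward / risk."""
    stop = entry - atr * atr_mult
    rr = (target - entry) / (entry - stop)
    return stop, rr


def short_setup(entry: float, target: float, atr: float, atr_mult: float = 1.5) -> tuple[float, float]:
    """Short: stop sits above entry by ATR * multiplier."""
    stop = entry + atr * atr_mult
    rr = (entry - target) / (stop - entry)
    return stop, rr


def passes_threshold(rr: float, threshold: float = 3.0) -> bool:
    """Default 3:1 filter from the configuration."""
    return rr >= threshold
```

For example, a long at 100 with the nearest resistance at 110 and ATR 2.0 gives a stop of 97 and an R:R of about 3.33, which clears the default 3:1 filter.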
### 11. Watchlist Service
- **Auto-populate**: Top-X tickers by composite score (default X=10). Auto entries update when scores change.
- **Manual entries**: Users can add/remove. Tagged as manual, not subject to auto-population.
- **Cap**: Max size = auto count + 10 manual (default max 20).
- **Response**: Each entry includes composite score, all dimension scores, R:R ratio (if setup exists), active SR levels.
- **Sorting**: By composite score, any dimension score, or R:R ratio.
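Auto-population with surviving manual entries can be sketched as below. The handling of a ticker that is both manually added and top-ranked (it keeps its manual tag rather than being listed twice) is an assumption not pinned down by the document:

```python
def auto_populate(
    rankings: list[tuple[str, float]],
    manual: set[str],
    top_x: int = 10,
) -> list[tuple[str, str]]:
    """Build (ticker, entry_type) watchlist entries.

    Manual entries always survive auto-population; the auto slots go to
    the top-X tickers by composite score, skipping manual tickers so no
    ticker appears twice.
    """
    ranked = sorted(rankings, key=lambda r: r[1], reverse=True)
    entries = [(t, "manual") for t in sorted(manual)]
    auto_count = 0
    for ticker, _score in ranked:
        if auto_count >= top_x:
            break
        if ticker not in manual:
            entries.append((ticker, "auto"))
            auto_count += 1
    return entries
```

The cap from the decisions section (auto count + 10 manual) would be enforced separately at manual-add time, before this rebuild runs.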
### 12. Auth Service
- **Registration**: Configurable on/off. Creates user with no API access by default (admin must grant).
- **Login**: Validates credentials, returns JWT (60-min expiry). Error messages don't reveal which field is wrong.
- **JWT**: Contains user_id, role, expiry. Validated on every protected request.
- **Roles**: `user` and `admin`. Middleware checks role for admin endpoints.
- **Password**: bcrypt hashed. Never stored or returned in plaintext.
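To make the token contents concrete, here is a stdlib-only HS256 sketch of the claims the Auth Service issues. Production code would use a maintained library such as PyJWT rather than hand-rolling signing, and the `sub` claim name for the user ID is an assumption:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Unpadded URL-safe base64, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_jwt(user_id: int, role: str, secret: str, ttl_minutes: int = 60) -> str:
    """Minimal HS256 JWT carrying user id, role, and a 60-minute expiry."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "role": role, "exp": int(time.time()) + ttl_minutes * 60}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def decode_claims(token: str) -> dict:
    """Decode the payload without signature verification (inspection only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

The middleware's job is then to verify the signature, reject expired tokens with 401, and attach `sub`/`role` to the request context for the role checks.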
### 13. Admin Service
- **Default admin**: Created on first startup (username: "admin", password: "admin").
- **User management**: Grant/revoke access, toggle registration, list users, reset passwords, create accounts.
- **System settings**: Persisted in DB. Frequencies, thresholds, weights, watchlist size.
- **Data maintenance**: Delete data older than N days (OHLCV, sentiment, fundamentals). Preserves tickers, users, latest scores.
- **Job control**: Enable/disable scheduled jobs, trigger manual runs.
### API Envelope
All responses follow:
```python
from typing import Any, Literal

from pydantic import BaseModel


class APIEnvelope(BaseModel):
    status: Literal["success", "error"]
    data: Any | None = None
    error: str | None = None
```
### Dependency Injection
FastAPI's `Depends()` for:
- DB session (async context manager)
- Current user (from JWT)
- Admin-only guard
- Service instances (constructed with session)
## Data Models
### Entity Relationship Diagram
```mermaid
erDiagram
User {
int id PK
string username UK
string password_hash
string role
bool has_access
datetime created_at
datetime updated_at
}
Ticker {
int id PK
string symbol UK
datetime created_at
}
OHLCVRecord {
int id PK
int ticker_id FK
date date
float open
float high
float low
float close
bigint volume
datetime created_at
}
SentimentScore {
int id PK
int ticker_id FK
string classification
int confidence
string source
datetime timestamp
}
FundamentalData {
int id PK
int ticker_id FK
float pe_ratio
float revenue_growth
float earnings_surprise
float market_cap
datetime fetched_at
}
SRLevel {
int id PK
int ticker_id FK
float price_level
string type
int strength
string detection_method
datetime created_at
}
DimensionScore {
int id PK
int ticker_id FK
string dimension
float score
bool is_stale
datetime computed_at
}
CompositeScore {
int id PK
int ticker_id FK
float score
bool is_stale
string weights_json
datetime computed_at
}
TradeSetup {
int id PK
int ticker_id FK
string direction
float entry_price
float stop_loss
float target
float rr_ratio
float composite_score
datetime detected_at
}
WatchlistEntry {
int id PK
int user_id FK
int ticker_id FK
string entry_type
datetime added_at
}
SystemSetting {
int id PK
string key UK
string value
datetime updated_at
}
IngestionProgress {
int id PK
int ticker_id FK
date last_ingested_date
datetime updated_at
}
Ticker ||--o{ OHLCVRecord : has
Ticker ||--o{ SentimentScore : has
Ticker ||--o| FundamentalData : has
Ticker ||--o{ SRLevel : has
Ticker ||--o{ DimensionScore : has
Ticker ||--o| CompositeScore : has
Ticker ||--o{ TradeSetup : has
Ticker ||--o{ WatchlistEntry : on
User ||--o{ WatchlistEntry : owns
Ticker ||--o| IngestionProgress : tracks
```
### Key Model Details
**OHLCVRecord**
- Composite unique constraint on `(ticker_id, date)`.
- Composite index on `(ticker_id, date)` for range queries.
- `date` is date-only (no time component).
- Validation: `high >= low`, all prices `>= 0`, `volume >= 0`, `date <= today`.
**SRLevel**
- `type`: "support" or "resistance".
- `detection_method`: "volume_profile" or "pivot_point" or "merged".
- `strength`: 0-100 integer.
**DimensionScore**
- `dimension`: one of "technical", "sr_quality", "sentiment", "fundamental", "momentum".
- `is_stale`: set to `True` when underlying data changes, triggers recomputation on next read.
**CompositeScore**
- `weights_json`: JSON string of the weights used for this computation (for auditability).
- `is_stale`: same staleness pattern as DimensionScore.
**WatchlistEntry**
- `entry_type`: "auto" or "manual".
- Unique constraint on `(user_id, ticker_id)`.
**User**
- `role`: "user" or "admin".
- `has_access`: boolean, default `False`. Admin must grant access after registration.
**IngestionProgress**
- Tracks the last successfully ingested date per ticker for rate-limit resume.
- Unique constraint on `ticker_id`.
### Database Migrations
Alembic manages all schema changes. Initial migration creates all tables. Subsequent migrations handle schema evolution. Migration files are version-controlled.
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Ticker creation round-trip
*For any* valid NASDAQ ticker symbol (non-empty, uppercase, alphanumeric), adding it to the Ticker Registry and then listing all tickers should include that symbol in the result.
**Validates: Requirements 1.1**
### Property 2: Duplicate ticker rejection
*For any* valid ticker symbol, adding it to the Ticker Registry twice should succeed the first time and return a duplicate error the second time, with the registry containing exactly one entry for that symbol.
**Validates: Requirements 1.2**
### Property 3: Whitespace ticker rejection
*For any* string composed entirely of whitespace characters (including the empty string), submitting it as a ticker symbol should be rejected with a validation error, and the Ticker Registry should remain unchanged.
**Validates: Requirements 1.3**
### Property 4: Ticker deletion cascades
*For any* ticker with associated OHLCV records, scores, SR levels, trade setups, sentiment, and fundamental data, deleting the ticker should remove the ticker and all associated records from the database.
**Validates: Requirements 1.5**
### Property 5: OHLCV storage round-trip
*For any* valid OHLCV record (valid ticker, high >= low, all prices >= 0, volume >= 0, date <= today), storing it in the Price Store and retrieving it by (ticker, date) should return the same open, high, low, close, and volume values.
**Validates: Requirements 2.1, 2.2**
### Property 6: OHLCV validation rejects invalid records
*For any* OHLCV record where high < low, or any price is negative, or volume is negative, or date is in the future, the Backend Service should reject the record with a validation error and the Price Store should remain unchanged.
**Validates: Requirements 2.3**
### Property 7: OHLCV rejects unregistered tickers
*For any* OHLCV record referencing a ticker symbol not present in the Ticker Registry, the Backend Service should reject the record with an error.
**Validates: Requirements 2.4**
### Property 8: Provider error preserves existing data
*For any* data type (OHLCV, sentiment, fundamentals) and any existing data state, if the market data provider returns an error or is unreachable during a fetch, all existing data in the store should remain unchanged.
**Validates: Requirements 3.2, 7.3, 8.3**
### Property 9: Rate-limit resume continuity
*For any* ticker and date range where ingestion is interrupted by a rate limit after N records, resuming the fetch for the same ticker and date range should continue from the day after the last successfully ingested date, resulting in no gaps and no duplicate records across the combined ingestion.
**Validates: Requirements 3.3, 3.4, 4.5**
### Property 10: Scheduled collection processes all tickers
*For any* set of tracked tickers, when a scheduled collection job (OHLCV, sentiment, or fundamentals) runs, it should attempt to fetch data for every tracked ticker. If one ticker fails, the remaining tickers should still be processed.
**Validates: Requirements 4.1, 4.3, 7.1, 8.2**
### Property 11: Score bounds invariant
*For any* computed score in the system — indicator normalized score, SR level strength, dimension score, or composite score — the value must be in the range [0, 100].
**Validates: Requirements 5.2, 6.2, 9.1**
### Property 12: Indicator minimum data enforcement
*For any* ticker with fewer OHLCV records than the minimum required for a given indicator (e.g., RSI needs 14+, ADX needs 28+), requesting that indicator should return an error specifying the minimum data requirement.
**Validates: Requirements 5.4**
### Property 13: EMA cross directional bias
*For any* ticker and date range with sufficient OHLCV data, the EMA cross signal should return "bullish" when short EMA > long EMA, "bearish" when short EMA < long EMA, and "neutral" when they are equal (within floating-point tolerance).
**Validates: Requirements 5.3**
### Property 14: Indicator computation determinism
*For any* valid OHLCV dataset and indicator type (ADX, EMA, RSI, ATR), computing the indicator twice with the same inputs should produce identical results.
**Validates: Requirements 5.1**
### Property 15: SR level support/resistance tagging
*For any* SR level and current price, the level should be tagged "support" if the level price is below the current price, and "resistance" if the level price is above the current price.
**Validates: Requirements 6.3**
### Property 16: SR level merging within tolerance
*For any* two SR levels from different detection methods whose price levels are within the configurable tolerance (default 0.5%), the SR Detector should merge them into a single consolidated level. For any two levels outside the tolerance, they should remain separate.
**Validates: Requirements 6.5**
### Property 17: SR level detection from data
*For any* OHLCV dataset with sufficient data, the SR Detector should produce SR levels derived from Volume Profile (HVN/LVN) and/or Pivot Points (swing highs/lows), and each level should reference its detection method.
**Validates: Requirements 6.1**
### Property 18: Sentiment score data shape
*For any* stored Sentiment Score, the classification must be one of (bullish, bearish, neutral), confidence must be in [0, 100], and source and timestamp must be non-null.
**Validates: Requirements 7.2**
### Property 19: Sentiment dimension score uses time decay
*For any* set of sentiment scores within the lookback window, the sentiment dimension score should weight more recent scores higher than older ones. Specifically, given two sets of scores with identical values but different timestamps, the set with more recent timestamps should produce a higher (or equal) dimension score if bullish, or lower (or equal) if bearish.
**Validates: Requirements 7.4**
### Property 20: Fundamental data storage round-trip
*For any* valid fundamental data record (P/E ratio, revenue growth, earnings surprise %, market cap), storing it and retrieving it for the same ticker should return the same values.
**Validates: Requirements 8.1**
### Property 21: Composite score is weighted average
*For any* ticker with dimension scores and a set of weights, the composite score should equal the weighted average of the available dimension scores. Specifically: `composite = sum(weight_i * score_i) / sum(weight_i)` for all available dimensions.
**Validates: Requirements 9.2**
### Property 22: Missing dimensions re-normalize weights
*For any* ticker missing one or more dimension scores, the composite score should be computed using only available dimensions with weights re-normalized to sum to 1.0, and the response should indicate which dimensions are missing.
**Validates: Requirements 9.3**
### Property 23: Staleness marking on data change
*For any* ticker, when underlying data changes (new OHLCV, new sentiment, new fundamentals), the affected dimension scores and composite score should be marked as stale.
**Validates: Requirements 9.4**
### Property 24: Stale score recomputation on demand
*For any* ticker with a stale composite score, requesting the score should trigger recomputation and return a fresh (non-stale) score that reflects current data.
**Validates: Requirements 9.5**
### Property 25: Weight update triggers full recomputation
*For any* set of tickers with composite scores, when dimension weights are updated, all composite scores should be recomputed using the new weights.
**Validates: Requirements 9.7**
### Property 26: Trade setup R:R threshold filtering
*For any* set of potential trade setups, only those with R:R ratio >= the configured threshold (default 3:1) should be returned. No setup below the threshold should appear in results.
**Validates: Requirements 10.1**
### Property 27: Trade setup computation correctness
*For any* ticker with SR levels and ATR data, a long setup should have target = nearest SR level above current price and stop = entry - ATR-based distance, while a short setup should have target = nearest SR level below current price and stop = entry + ATR-based distance. The R:R ratio should equal `|target - entry| / |entry - stop|`.
**Validates: Requirements 10.2, 10.3**
### Property 28: Trade setup data completeness
*For any* trade setup, it must include: entry price (> 0), stop-loss (> 0), target (> 0), R:R ratio (> 0), direction (one of "long" or "short"), and composite score (0-100).
**Validates: Requirements 10.4**
### Property 29: Trade setup pruning on data change
*For any* existing trade setup, when underlying SR levels or price data changes such that the setup no longer meets the R:R threshold, the setup should be removed.
**Validates: Requirements 10.5**
### Property 30: Watchlist auto-population
*For any* set of tickers with composite scores, the watchlist auto-populated entries should be exactly the top-X tickers by composite score (where X is configurable, default 10).
**Validates: Requirements 11.1**
### Property 31: Watchlist entry data completeness
*For any* watchlist entry, the response should include composite score, all dimension scores, R:R ratio (if a trade setup exists for that ticker), and active SR levels.
**Validates: Requirements 11.2**
### Property 32: Manual watchlist entries persist through auto-population
*For any* manually added watchlist entry, it should be tagged as "manual" and should not be removed or replaced when auto-population runs, regardless of the ticker's composite score ranking.
**Validates: Requirements 11.3**
### Property 33: Watchlist size cap enforcement
*For any* watchlist, the total number of entries should never exceed auto-populate count + 10 manual additions (default max 20). Attempting to add a manual entry beyond the cap should be rejected.
**Validates: Requirements 11.4**
### Property 34: Registration creates no-access user
*For any* valid credentials submitted when registration is enabled, the created user should have `has_access = False` and role = "user".
**Validates: Requirements 12.1**
### Property 35: Registration disabled rejects all attempts
*For any* credentials submitted when registration is disabled, the registration should be rejected regardless of credential validity.
**Validates: Requirements 12.2**
### Property 36: Login returns valid JWT
*For any* registered user with valid credentials, login should return a JWT access token that decodes to contain the user's ID, role, and an expiry time 60 minutes from issuance.
**Validates: Requirements 12.3**
### Property 37: Invalid credentials return generic error
*For any* login attempt with invalid credentials (wrong username, wrong password, or both), the error response should be identical — not revealing which field was incorrect.
**Validates: Requirements 12.4**
### Property 38: Access control enforcement
*For any* protected endpoint, unauthenticated requests should receive HTTP 401, and authenticated users without the required role or access should receive HTTP 403.
**Validates: Requirements 12.5**
### Property 39: Admin user management operations
*For any* user account, an admin should be able to grant access, revoke access, and reset the password, with each operation correctly updating the user's state in the database.
**Validates: Requirements 13.2**
### Property 40: Data cleanup preserves structure
*For any* dataset with records of various ages, admin data cleanup (delete records older than N days) should remove old OHLCV, sentiment, and fundamental records while preserving all ticker entries, user accounts, and the latest scores.
**Validates: Requirements 13.4**
### Property 41: Sorting correctness
*For any* list endpoint with a defined sort order (tickers alphabetically, SR levels by strength desc, rankings by composite score desc, trade setups by R:R desc then composite desc), the returned results must be correctly sorted according to the specified order.
**Validates: Requirements 1.4, 6.6, 9.6, 10.8, 11.6**
## Error Handling
### API Error Responses
All errors use the standard JSON envelope with appropriate HTTP status codes:
```json
{
"status": "error",
"data": null,
"error": "Human-readable error message"
}
```
| Scenario | HTTP Status | Error Message Pattern |
|----------|-------------|----------------------|
| Validation failure (bad input) | 400 | "Validation error: {details}" |
| Authentication missing/expired | 401 | "Authentication required" / "Token expired" |
| Insufficient permissions | 403 | "Insufficient permissions" |
| Resource not found | 404 | "Ticker not found: {symbol}" |
| Duplicate resource | 409 | "Ticker already exists: {symbol}" |
| Provider unreachable | 502 | "Market data provider unavailable" |
| Rate limited by provider | 429 | "Rate limited. Ingested {n} records. Resume available." |
| Internal error | 500 | "Internal server error" |
### Error Handling Strategies
**Provider Errors**
- Wrap all provider calls in try/except.
- On connection error or timeout: return 502 with descriptive message. Existing data is never modified.
- On rate limit: record progress (last ingested date), return 429 with progress info.
- On unexpected provider response: log full response, return 502.
**Database Errors**
- Unique constraint violations: catch `IntegrityError`, return 409.
- Connection pool exhaustion: log, return 503 "Service temporarily unavailable".
- All DB operations within transactions — rollback on any error.
**Validation Errors**
- Pydantic model validation catches schema-level errors automatically (400).
- Business validation (e.g., high < low, future date) in service layer, raises custom exceptions mapped to 400.
**Scheduled Job Errors**
- Each ticker processed independently — one failure doesn't stop others.
- Errors logged with structured JSON (ticker, job name, error type, message).
- Job-level errors (e.g., scheduler crash) logged and job retried on next interval.
**Authentication Errors**
- Invalid/missing token: 401 with generic message.
- Expired token: 401 with "Token expired" message.
- Invalid credentials on login: 401 with generic "Invalid credentials" (never reveals which field).
- Insufficient role: 403.
### Exception Hierarchy
```python
class AppError(Exception):
    """Base application error."""
    status_code: int = 500
    message: str = "Internal server error"


class ValidationError(AppError):  # distinct from pydantic.ValidationError
    status_code = 400


class NotFoundError(AppError):
    status_code = 404


class DuplicateError(AppError):
    status_code = 409


class AuthenticationError(AppError):
    status_code = 401


class AuthorizationError(AppError):
    status_code = 403


class ProviderError(AppError):
    status_code = 502


class RateLimitError(AppError):
    status_code = 429
```
A global exception handler registered on the FastAPI app catches `AppError` subclasses and formats them into the JSON envelope.
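A minimal stdlib sketch of the envelope formatting (the FastAPI wiring via `@app.exception_handler(AppError)` returning a `JSONResponse` is omitted; the `AppError` fields follow the hierarchy above):

```python
class AppError(Exception):
    status_code: int = 500
    message: str = "Internal server error"

class NotFoundError(AppError):
    status_code = 404
    def __init__(self, message: str) -> None:
        self.message = message

def to_envelope(exc: AppError) -> tuple[int, dict]:
    """Format an AppError into the (status, JSON envelope) pair the
    global exception handler would return to the client."""
    return exc.status_code, {"success": False, "data": None, "error": exc.message}

status, body = to_envelope(NotFoundError("Ticker not found: ZZZZ"))
print(status, body["error"])  # 404 Ticker not found: ZZZZ
```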
## Deployment and Infrastructure
### Target Environment
- **Production**: Debian 12 with Nginx and PostgreSQL (pre-installed by operator)
- **Development**: macOS with local PostgreSQL (via Homebrew or Docker)
- **CI/CD**: Gitea Actions
- **Domain**: `signal.thiessen.io` (reverse-proxied through Nginx)
### Local Development (macOS)
Local dev works identically to production. Install PostgreSQL via Homebrew (`brew install postgresql@16`) or run it in Docker. The app is pure Python — no platform-specific dependencies.
```bash
# Setup
git clone <repo>
cd stock-data-backend
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# Local DB
createdb stock_data_backend
cp .env.example .env # Edit DATABASE_URL to point to local Postgres
# Migrations
alembic upgrade head
# Run
uvicorn app.main:app --reload --port 8000
```
### Database Setup Script (`deploy/setup_db.sh`)
Creates the PostgreSQL database, user, and runs migrations. Idempotent — safe to run multiple times.
```bash
#!/bin/bash
set -e
DB_NAME="${DB_NAME:-stock_data_backend}"
DB_USER="${DB_USER:-stock_backend}"
DB_PASS="${DB_PASS:-changeme}"
sudo -u postgres psql <<EOF
DO \$\$
BEGIN
IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = '${DB_USER}') THEN
CREATE ROLE ${DB_USER} WITH LOGIN PASSWORD '${DB_PASS}';
END IF;
END \$\$;
SELECT 'CREATE DATABASE ${DB_NAME} OWNER ${DB_USER}'
WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '${DB_NAME}')\gexec
GRANT ALL PRIVILEGES ON DATABASE ${DB_NAME} TO ${DB_USER};
EOF
# Run migrations (run from the application directory with DATABASE_URL set)
alembic upgrade head
```
### Nginx Configuration (`deploy/nginx.conf`)
Reverse proxy template for `signal.thiessen.io`. Assumes SSL is handled externally (e.g., Certbot).
```nginx
server {
listen 80;
server_name signal.thiessen.io;
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_read_timeout 120s;
}
}
```
### Systemd Service (`deploy/stock-data-backend.service`)
Runs the app as a daemon with auto-restart.
```ini
[Unit]
Description=Stock Data Backend
After=network.target postgresql.service
[Service]
Type=exec
User=stock_backend
Group=stock_backend
WorkingDirectory=/opt/stock-data-backend
EnvironmentFile=/opt/stock-data-backend/.env
ExecStart=/opt/stock-data-backend/.venv/bin/uvicorn app.main:app --host 127.0.0.1 --port 8000 --workers 1
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
```
### Gitea Actions CI/CD (`.gitea/workflows/deploy.yml`)
Pipeline: lint → test → deploy to Debian server via SSH.
```yaml
name: Deploy
on:
push:
branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      # Linter command assumed; substitute the project's configured linter.
      - run: ruff check .
  test:
    needs: lint
    runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: test_db
POSTGRES_USER: test_user
POSTGRES_PASSWORD: test_pass
ports:
- 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- run: pip install -e ".[dev]"
- run: alembic upgrade head
env:
DATABASE_URL: postgresql+asyncpg://test_user:test_pass@localhost:5432/test_db
- run: pytest --tb=short
env:
DATABASE_URL: postgresql+asyncpg://test_user:test_pass@localhost:5432/test_db
deploy:
needs: test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Deploy via SSH
uses: appleboy/ssh-action@v1
with:
host: ${{ secrets.DEPLOY_HOST }}
username: ${{ secrets.DEPLOY_USER }}
key: ${{ secrets.DEPLOY_KEY }}
script: |
cd /opt/stock-data-backend
git pull origin main
source .venv/bin/activate
pip install -e .
alembic upgrade head
sudo systemctl restart stock-data-backend
```
### Environment Variables (`.env.example`)
```env
# Database
DATABASE_URL=postgresql+asyncpg://stock_backend:changeme@localhost:5432/stock_data_backend
# Auth
JWT_SECRET=change-this-to-a-random-secret
JWT_EXPIRY_MINUTES=60
# OHLCV Provider — Alpaca Markets
ALPACA_API_KEY=
ALPACA_API_SECRET=
# Sentiment Provider — Gemini with Search Grounding
GEMINI_API_KEY=
GEMINI_MODEL=gemini-2.0-flash
# Fundamentals Provider — Financial Modeling Prep
FMP_API_KEY=
# Scheduled Jobs
DATA_COLLECTOR_FREQUENCY=daily
SENTIMENT_POLL_INTERVAL_MINUTES=30
FUNDAMENTAL_FETCH_FREQUENCY=daily
RR_SCAN_FREQUENCY=daily
# Scoring Defaults
DEFAULT_WATCHLIST_AUTO_SIZE=10
DEFAULT_RR_THRESHOLD=3.0
# Database Pool
DB_POOL_SIZE=5
DB_POOL_TIMEOUT=30
# Logging
LOG_LEVEL=INFO
```
## Testing Strategy
### Testing Framework
- **Unit/Integration tests**: `pytest` with `pytest-asyncio` for async test support.
- **Property-based tests**: `hypothesis` — the standard PBT library for Python.
- **Database tests**: Use a test PostgreSQL database with Alembic migrations applied before test runs. Each test runs in a transaction that is rolled back.
- **Provider mocking**: Mock the `MarketDataProvider` protocol for all tests that don't need real external calls.
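The provider-mocking approach can be sketched with a `Protocol` and an in-memory stub (the `fetch_ohlcv` method shape shown here is an assumption for illustration, not the actual interface):

```python
from datetime import date
from typing import Protocol

class MarketDataProvider(Protocol):
    # Assumed method shape for illustration.
    def fetch_ohlcv(self, symbol: str, start: date, end: date) -> list[dict]: ...

class FakeProvider:
    """Deterministic stub used in tests instead of the real provider."""
    def __init__(self, canned: dict[str, list[dict]]) -> None:
        self.canned = canned
        self.calls: list[str] = []

    def fetch_ohlcv(self, symbol: str, start: date, end: date) -> list[dict]:
        self.calls.append(symbol)
        return self.canned.get(symbol, [])

def ingest(provider: MarketDataProvider, symbol: str) -> int:
    """Hypothetical service function that depends only on the protocol."""
    rows = provider.fetch_ohlcv(symbol, date(2024, 1, 1), date(2024, 1, 31))
    return len(rows)

fake = FakeProvider({"AAPL": [{"open": 1.0, "high": 2.0, "low": 0.5,
                               "close": 1.5, "volume": 100}]})
print(ingest(fake, "AAPL"), fake.calls)  # 1 ['AAPL']
```

Because `MarketDataProvider` is a structural Protocol, `FakeProvider` satisfies it without inheritance, so test fixtures can swap it in anywhere the real provider is injected.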
### Test Organization
```
tests/
├── conftest.py # Shared fixtures (db session, test client, mock provider)
├── unit/ # Unit tests for individual functions
│ ├── test_ticker_service.py
│ ├── test_price_service.py
│ ├── test_indicator_service.py
│ ├── test_sr_service.py
│ ├── test_scoring_service.py
│ ├── test_rr_scanner_service.py
│ ├── test_watchlist_service.py
│ ├── test_auth_service.py
│ └── test_admin_service.py
├── property/ # Property-based tests (hypothesis)
│ ├── test_ticker_properties.py
│ ├── test_ohlcv_properties.py
│ ├── test_ingestion_properties.py
│ ├── test_indicator_properties.py
│ ├── test_sr_properties.py
│ ├── test_scoring_properties.py
│ ├── test_rr_scanner_properties.py
│ ├── test_watchlist_properties.py
│ └── test_auth_properties.py
└── integration/ # API-level integration tests
└── test_api_endpoints.py
```
### Dual Testing Approach
**Unit tests** cover:
- Specific examples that demonstrate correct behavior (e.g., known RSI calculation for a fixed dataset).
- Edge cases: empty ticker list, zero OHLCV records, expired JWT, max watchlist size.
- Error conditions: provider timeout, invalid input shapes, missing data.
- Integration points: API endpoint → service → database flow.
**Property-based tests** cover:
- Universal properties that hold for all valid inputs (the 41 correctness properties above).
- Each property test uses `hypothesis` with `@settings(max_examples=100)` minimum.
- Each property test is tagged with a comment referencing the design property:
```python
# Feature: stock-data-backend, Property 5: OHLCV storage round-trip
@given(ohlcv=valid_ohlcv_records())
@settings(max_examples=100)
def test_ohlcv_storage_round_trip(ohlcv):
...
```
**Unit tests and property tests are complementary**:
- Unit tests catch concrete bugs with specific, readable examples.
- Property tests verify general correctness across randomized inputs.
- Together they provide comprehensive coverage.
### Hypothesis Custom Strategies
Custom `hypothesis` strategies for generating domain objects:
- `valid_ticker_symbols()`: Uppercase alphanumeric strings, 1-5 chars.
- `whitespace_strings()`: Strings composed entirely of whitespace (including empty).
- `valid_ohlcv_records()`: Records where high >= low, all prices >= 0, volume >= 0, date <= today.
- `invalid_ohlcv_records()`: Records violating at least one constraint.
- `dimension_scores()`: Floats in [0, 100] for each dimension.
- `weight_configs()`: Dicts of dimension → positive float weight.
- `sr_levels()`: Levels with valid price, type, strength, method.
- `sentiment_scores()`: Records with valid classification, confidence, source, timestamp.
- `trade_setups()`: Setups with valid entry, stop, target, direction, R:R.
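One of these strategies can be sketched with `st.composite`; this version of `valid_ohlcv_records()` additionally constrains open/close to the [low, high] range, which is stricter than the stated invariants but still valid (field names and bounds are illustrative):

```python
from datetime import date
from hypothesis import strategies as st

@st.composite
def valid_ohlcv_records(draw):
    """Generate records with high >= low, prices >= 0, volume >= 0, date <= today."""
    low = draw(st.floats(min_value=0.0, max_value=1e6,
                         allow_nan=False, allow_infinity=False))
    high = draw(st.floats(min_value=low, max_value=1e6 + 1.0,
                          allow_nan=False, allow_infinity=False))
    return {
        "date": draw(st.dates(max_value=date.today())),
        "open": draw(st.floats(min_value=low, max_value=high)),
        "high": high,
        "low": low,
        "close": draw(st.floats(min_value=low, max_value=high)),
        "volume": draw(st.integers(min_value=0)),
    }
```

Drawing `high` with `min_value=low` makes the high/low invariant hold by construction, so hypothesis never wastes examples on rejected records.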
### Property-to-Test Mapping
Each of the 41 correctness properties maps to exactly one property-based test. The tag format is:
```
Feature: stock-data-backend, Property {number}: {property_title}
```
For example:
- `Feature: stock-data-backend, Property 1: Ticker creation round-trip`
- `Feature: stock-data-backend, Property 6: OHLCV validation rejects invalid records`
- `Feature: stock-data-backend, Property 21: Composite score is weighted average`
- `Feature: stock-data-backend, Property 41: Sorting correctness`