major update
Some checks failed
Deploy / lint (push) Failing after 8s
Deploy / test (push) Has been skipped
Deploy / deploy (push) Has been skipped

Dennis Thiessen
2026-02-27 16:08:09 +01:00
parent 61ab24490d
commit 181cfe6588
71 changed files with 7647 additions and 281 deletions


@@ -0,0 +1 @@
{"specId": "997fa90b-08bc-4b72-b099-ecc0ad611b06", "workflowType": "requirements-first", "specType": "bugfix"}


@@ -0,0 +1,39 @@
# Bugfix Requirements Document
## Introduction
The R:R scanner's `scan_ticker` function selects trade setup targets by picking whichever S/R level yields the highest R:R ratio. Because R:R = reward / risk and risk is fixed (ATR-based stop), this always favors the most distant S/R level. The result is unrealistic trade setups targeting far-away levels that price is unlikely to reach. The scanner should instead select the highest-quality target by balancing R:R ratio with level strength and proximity to current price.
## Bug Analysis
### Current Behavior (Defect)
1.1 WHEN scanning for long setups THEN the system iterates all resistance levels above entry price and selects the one with the maximum R:R ratio, which is always the most distant level since risk is fixed
1.2 WHEN scanning for short setups THEN the system iterates all support levels below entry price and selects the one with the maximum R:R ratio, which is always the most distant level since risk is fixed
1.3 WHEN multiple S/R levels exist at varying distances with different strength values THEN the system ignores the `strength` field entirely and selects based solely on R:R magnitude
1.4 WHEN a weak, distant S/R level exists alongside a strong, nearby S/R level THEN the system selects the weak distant level because it produces a higher R:R ratio, resulting in an unrealistic trade setup
### Expected Behavior (Correct)
2.1 WHEN scanning for long setups THEN the system SHALL compute a quality score for each candidate resistance level that factors in R:R ratio, S/R level strength, and proximity to entry price, and select the level with the highest quality score
2.2 WHEN scanning for short setups THEN the system SHALL compute a quality score for each candidate support level that factors in R:R ratio, S/R level strength, and proximity to entry price, and select the level with the highest quality score
2.3 WHEN multiple S/R levels exist at varying distances with different strength values THEN the system SHALL weight stronger levels higher in the quality score, favoring targets that price is more likely to reach
2.4 WHEN a weak, distant S/R level exists alongside a strong, nearby S/R level THEN the system SHALL prefer the strong nearby level unless the distant level's combined quality score (considering its lower proximity and strength factors) still exceeds the nearby level's score
### Unchanged Behavior (Regression Prevention)
3.1 WHEN no S/R levels exist above entry price for longs (or below for shorts) THEN the system SHALL CONTINUE TO produce no setup for that direction
3.2 WHEN no candidate level meets the R:R threshold THEN the system SHALL CONTINUE TO produce no setup for that direction
3.3 WHEN only one S/R level exists in the target direction THEN the system SHALL CONTINUE TO evaluate it against the R:R threshold and produce a setup if it qualifies
3.4 WHEN scanning all tickers THEN the system SHALL CONTINUE TO process each ticker independently and persist results to the database
3.5 WHEN fetching stored trade setups THEN the system SHALL CONTINUE TO return them sorted by R:R ratio descending with composite score as secondary sort


@@ -0,0 +1,209 @@
# R:R Scanner Target Quality Bugfix Design
## Overview
The `scan_ticker` function in `app/services/rr_scanner_service.py` selects trade setup targets by iterating candidate S/R levels and picking the one with the highest R:R ratio. Because risk is fixed (ATR × multiplier), R:R is a monotonically increasing function of distance from entry price. This means the scanner always selects the most distant S/R level, producing unrealistic trade setups.
The fix replaces the `max(rr)` selection with a quality score that balances three factors: R:R ratio, S/R level strength (0-100), and proximity to current price. The quality score is computed as a weighted sum of normalized components, and the candidate with the highest quality score is selected as the target.
## Glossary
- **Bug_Condition (C)**: Multiple candidate S/R levels exist in the target direction, and the current code selects the most distant one purely because it has the highest R:R ratio, ignoring strength and proximity
- **Property (P)**: The scanner should select the candidate with the highest quality score (a weighted combination of R:R ratio, strength, and proximity) rather than the highest raw R:R ratio
- **Preservation**: All behavior for single-candidate scenarios, no-candidate scenarios, R:R threshold filtering, database persistence, and `get_trade_setups` sorting must remain unchanged
- **scan_ticker**: The function in `app/services/rr_scanner_service.py` that scans a single ticker for long and short trade setups
- **SRLevel.strength**: An integer from 0 to 100 representing how many times price has touched this level relative to total bars (computed by `sr_service._strength_from_touches`)
- **quality_score**: New scoring metric: `w_rr * norm_rr + w_strength * norm_strength + w_proximity * norm_proximity`
## Bug Details
### Fault Condition
The bug manifests when multiple S/R levels exist in the target direction (above entry for longs, below entry for shorts) and the scanner selects the most distant level because it has the highest R:R ratio, even though a closer, stronger level would be a more realistic target.
**Formal Specification:**
```
FUNCTION isBugCondition(input)
    INPUT: input of type {entry_price, risk, candidate_levels: list[{price_level, strength}]}
    OUTPUT: boolean
    candidates := [lv for lv in candidate_levels where reward(lv) / risk >= rr_threshold]
    IF len(candidates) < 2 THEN RETURN false
    max_rr_level := argmax(candidates, key=lambda lv: reward(lv) / risk)
    max_quality_level := argmax(candidates, key=lambda lv: quality_score(lv, entry_price, risk))
    RETURN max_rr_level != max_quality_level
END FUNCTION
```
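For readers who want to execute the fault condition, here is a Python rendering of `isBugCondition`. It is a sketch under stated assumptions: `candidate_levels` is already restricted to the target direction, `reward(lv)` is taken as the absolute distance from entry (so one function covers longs and shorts), and the `Candidate` record and the `is_bug_condition` / `quality_score` names are illustrative, not taken from the codebase. The scoring copies the weighted sum and default weights proposed for `_compute_quality_score` in the Fix Implementation section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Illustrative stand-in for an S/R level already on the target side of entry."""
    price_level: float
    strength: int  # 0-100


def quality_score(lv: Candidate, entry_price: float, risk: float) -> float:
    """Same weighted sum as the proposed _compute_quality_score, default weights."""
    distance = abs(lv.price_level - entry_price)
    norm_rr = min((distance / risk) / 10.0, 1.0)  # rr_cap = 10
    norm_strength = lv.strength / 100.0
    norm_proximity = 1.0 - min(distance / entry_price, 1.0)
    return 0.35 * norm_rr + 0.35 * norm_strength + 0.30 * norm_proximity


def is_bug_condition(
    entry_price: float, risk: float, candidate_levels: list[Candidate], rr_threshold: float
) -> bool:
    def rr(lv: Candidate) -> float:
        # reward(lv) / risk, with reward measured as distance from entry
        return abs(lv.price_level - entry_price) / risk

    candidates = [lv for lv in candidate_levels if rr(lv) >= rr_threshold]
    if len(candidates) < 2:
        return False  # zero or one qualifying level: old and new selection cannot differ
    max_rr_level = max(candidates, key=rr)
    max_quality_level = max(candidates, key=lambda lv: quality_score(lv, entry_price, risk))
    return max_rr_level != max_quality_level


print(is_bug_condition(100.0, 3.0, [Candidate(103.0, 80), Candidate(115.0, 10)], 1.0))  # True
```

Raising the threshold to 2.0 in the call above leaves only the 115 level as a candidate, so the function returns `False`. The threshold therefore decides whether an input falls under the bug condition at all.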
### Examples
- **Long, 2 resistance levels**: Entry=100, ATR-stop=97 (risk=3). Level A: price=103, strength=80 (R:R=1.0). Level B: price=115, strength=10 (R:R=5.0). Current code picks B (highest R:R). Expected: picks A (strong, nearby, realistic), assuming the R:R threshold is at most 1.0 so that A qualifies.
- **Long, 3 resistance levels**: Entry=50, risk=2. Level A: price=53, strength=90 (R:R=1.5). Level B: price=58, strength=40 (R:R=4.0). Level C: price=70, strength=5 (R:R=10.0). Current code picks C. Expected: picks A or B depending on quality weights (assuming an R:R threshold of at most 1.5).
- **Short, 2 support levels**: Entry=200, risk=5. Level A: price=192, strength=70 (R:R=1.6). Level B: price=170, strength=15 (R:R=6.0). Current code picks B. Expected: picks A (assuming an R:R threshold of at most 1.6).
- **Single candidate (no bug)**: Entry=100, risk=3. Only Level A: price=106, strength=50 (R:R=2.0). Both old and new code select A — no divergence.
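The expected picks in the examples can be checked numerically. The sketch below applies the proposed quality formula with its default weights (0.35 / 0.35 / 0.30, `rr_cap` = 10) to the three multi-level examples. It assumes every listed level clears the R:R threshold. `score_levels` is a hypothetical helper, not part of the service.

```python
def quality(rr: float, strength: int, distance: float, entry_price: float) -> float:
    # Default weights and rr_cap from the proposed _compute_quality_score.
    norm_rr = min(rr / 10.0, 1.0)
    norm_strength = strength / 100.0
    norm_proximity = 1.0 - min(distance / entry_price, 1.0)
    return 0.35 * norm_rr + 0.35 * norm_strength + 0.30 * norm_proximity


def score_levels(entry_price: float, risk: float, levels: list[tuple[float, int]]) -> dict[float, float]:
    """Map each (price, strength) level in the target direction to its quality score."""
    scores = {}
    for price, strength in levels:
        distance = abs(price - entry_price)  # works for resistance (long) and support (short)
        scores[price] = round(quality(distance / risk, strength, distance, entry_price), 4)
    return scores


examples = {
    "long, 2 levels": (100.0, 3.0, [(103.0, 80), (115.0, 10)]),
    "long, 3 levels": (50.0, 2.0, [(53.0, 90), (58.0, 40), (70.0, 5)]),
    "short, 2 levels": (200.0, 5.0, [(192.0, 70), (170.0, 15)]),
}
for name, (entry, risk, levels) in examples.items():
    scores = score_levels(entry, risk, levels)
    print(name, scores, "-> pick", max(scores, key=scores.get))
# picks 103.0, 53.0 and 192.0 respectively
```

With these defaults, the three-level example resolves to A (0.6495). C (0.5475) still edges out B (0.532) because C's R:R of 10 earns the full capped R:R weight of 0.35. This ordering is worth checking if the weights are ever tuned.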
## Expected Behavior
### Preservation Requirements
**Unchanged Behaviors:**
- When no S/R levels exist in the target direction, no setup is produced for that direction
- When no candidate level meets the R:R threshold, no setup is produced
- When only one S/R level exists in the target direction, it is evaluated against the R:R threshold and used if it qualifies
- `scan_all_tickers` processes each ticker independently; one failure does not stop others
- `get_trade_setups` returns results sorted by R:R ratio descending with composite score as secondary sort
- Database persistence: old setups are deleted and new ones inserted per ticker
- ATR computation, OHLCV fetching, and stop-loss calculation remain unchanged
- The TradeSetup model fields and their rounding (4 decimal places) remain unchanged
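The ordering that `get_trade_setups` must keep is a two-key descending sort. The sketch below shows it in plain Python, assuming an illustrative `SetupRow` record. `rr_ratio` appears in the source, while `symbol` and `composite_score` are assumed field names. The real function may well sort in its database query instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupRow:
    """Illustrative subset of a stored trade setup; field names are assumptions."""
    symbol: str
    rr_ratio: float
    composite_score: float


def sort_setups(rows: list[SetupRow]) -> list[SetupRow]:
    # R:R descending first; composite score descending breaks ties.
    return sorted(rows, key=lambda r: (-r.rr_ratio, -r.composite_score))


rows = [SetupRow("AAA", 2.0, 55.0), SetupRow("BBB", 3.5, 40.0), SetupRow("CCC", 2.0, 70.0)]
print([r.symbol for r in sort_setups(rows)])  # ['BBB', 'CCC', 'AAA']
```

This sort key is why `rr_ratio` must keep holding the real R:R of the selected level. If the quality score were stored there, this ordering would silently change.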
**Scope:**
All inputs with zero or one qualifying candidate S/R level in the target direction (a level that meets the R:R threshold) are unaffected by this fix. The fix only changes the selection logic when multiple qualifying candidates exist.
## Hypothesized Root Cause
Based on the bug description, the root cause is straightforward:
1. **Selection by max R:R only**: The inner loop in `scan_ticker` tracks `best_rr` and `best_target`, selecting whichever level produces the highest `rr = reward / risk`. Since `risk` is constant (ATR-based), `rr` is proportional to distance. The code has no mechanism to factor in `SRLevel.strength` or proximity.
2. **No quality scoring exists**: The `SRLevel.strength` field (0-100) is available in the database and loaded by the query, but the selection loop never reads it. There is no quality score computation anywhere in the codebase.
3. **No proximity normalization**: Distance from entry is used only to compute reward, never as a penalty. Closer levels are always disadvantaged.
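The root cause can be reproduced in isolation. The following sketch mirrors the described long-side loop (`best_rr` / `best_target`), assuming a simplified `Level` record and a hypothetical `select_target_max_rr` function. Neither is the actual code in `rr_scanner_service.py`, and the starting value `best_rr = 0.0` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    """Hypothetical stand-in for SRLevel with only the fields selection reads."""
    price_level: float
    strength: int  # 0-100; the original loop never looks at it


def select_target_max_rr(
    entry_price: float, risk: float, levels: list[Level], rr_threshold: float
) -> Optional[Level]:
    """Assumed shape of the original long-side loop: keep the highest qualifying R:R."""
    best_rr = 0.0
    best_target: Optional[Level] = None
    for lv in levels:
        if lv.price_level <= entry_price:
            continue  # longs only target resistance above entry
        # risk is constant, so rr grows linearly with distance from entry
        rr = (lv.price_level - entry_price) / risk
        if rr >= rr_threshold and rr > best_rr:
            best_rr, best_target = rr, lv
    return best_target


levels = [Level(103.0, 80), Level(115.0, 10)]
print(select_target_max_rr(100.0, 3.0, levels, rr_threshold=1.0))
# Level(price_level=115.0, strength=10): the farthest level wins despite strength 10
```

Because the loop compares only `rr`, the order of `levels` does not matter: the farthest qualifying level always wins.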
## Correctness Properties
Property 1: Fault Condition - Quality Score Selection Replaces Max R:R
_For any_ input where multiple candidate S/R levels exist in the target direction and meet the R:R threshold, the fixed `scan_ticker` function SHALL select the candidate with the highest quality score (weighted combination of normalized R:R, normalized strength, and normalized proximity) rather than the candidate with the highest raw R:R ratio.
**Validates: Requirements 2.1, 2.2, 2.3, 2.4**
Property 2: Preservation - Single/Zero Candidate Behavior Unchanged
_For any_ input where zero or one candidate S/R levels exist in the target direction, the fixed `scan_ticker` function SHALL produce the same result as the original function, preserving the existing filtering, persistence, and output behavior.
**Validates: Requirements 3.1, 3.2, 3.3, 3.4, 3.5**
## Fix Implementation
### Changes Required
Assuming our root cause analysis is correct:
**File**: `app/services/rr_scanner_service.py`
**Function**: `scan_ticker`
**Specific Changes**:
1. **Add `_compute_quality_score` helper function**: A new module-level function that computes the quality score for a candidate S/R level given entry price, risk, and configurable weights.
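```python
def _compute_quality_score(
    rr: float,
    strength: int,
    distance: float,
    entry_price: float,
    *,
    w_rr: float = 0.35,
    w_strength: float = 0.35,
    w_proximity: float = 0.30,
    rr_cap: float = 10.0,
) -> float:
    """Score a candidate S/R target; higher is better.

    With the default weights (sum 1.0) and in-range inputs, the result lies in [0, 1].
    """
    # R:R scaled by rr_cap and clipped at 1.0, so very distant levels stop gaining credit.
    norm_rr = min(rr / rr_cap, 1.0)
    # Strength is already an integer in 0-100.
    norm_strength = strength / 100.0
    # Distance as a fraction of entry price (entry_price must be > 0); closer scores higher.
    norm_proximity = 1.0 - min(distance / entry_price, 1.0)
    return w_rr * norm_rr + w_strength * norm_strength + w_proximity * norm_proximity
```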
- `norm_rr`: R:R divided by `rr_cap` (default 10) and capped at 1.0, giving a 0-1 range
- `norm_strength`: Strength divided by 100 (strength is already an integer in 0-100)
- `norm_proximity`: `1 - min(distance / entry_price, 1.0)`, so closer levels score higher and any level at least `entry_price` away scores 0
- Default weights: 0.35 R:R, 0.35 strength, 0.30 proximity (sum = 1.0)
2. **Replace long setup selection loop**: Instead of tracking `best_rr` / `best_target`, iterate candidates, compute quality score for each, and track `best_quality` / `best_candidate`. Still filter by `rr >= rr_threshold` before scoring. Store the selected level's R:R in the TradeSetup (not the quality score — R:R remains the reported metric).
3. **Replace short setup selection loop**: Same change as longs but for levels below entry.
4. **Pass `SRLevel` object through selection**: The loop already has access to `lv.strength` from the query. No additional DB queries needed.
5. **No changes to `get_trade_setups`**: Sorting by `rr_ratio` descending remains. The `rr_ratio` stored in TradeSetup is the actual R:R of the selected level, not the quality score.
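The replacement loop is sketched below as a single hypothetical `select_target` helper covering both directions. The planned change edits the two loops inside `scan_ticker` in place rather than adding such a helper. `Level` and `Selection` are illustrative records. The scoring function copies the proposed `_compute_quality_score`, and the 4-decimal rounding of `rr_ratio` follows the preserved TradeSetup behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    price_level: float
    strength: int  # 0-100


@dataclass(frozen=True)
class Selection:
    target: float
    rr_ratio: float  # the actual R:R of the chosen level, not its quality score


def _compute_quality_score(
    rr: float, strength: int, distance: float, entry_price: float, *,
    w_rr: float = 0.35, w_strength: float = 0.35, w_proximity: float = 0.30, rr_cap: float = 10.0,
) -> float:
    norm_rr = min(rr / rr_cap, 1.0)
    norm_strength = strength / 100.0
    norm_proximity = 1.0 - min(distance / entry_price, 1.0)
    return w_rr * norm_rr + w_strength * norm_strength + w_proximity * norm_proximity


def select_target(
    entry_price: float, risk: float, levels: list[Level], rr_threshold: float, direction: str
) -> Optional[Selection]:
    best_quality = float("-inf")
    best: Optional[Selection] = None
    for lv in levels:
        if direction == "long":
            distance = lv.price_level - entry_price  # resistance above entry
        elif direction == "short":
            distance = entry_price - lv.price_level  # support below entry
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        if distance <= 0:
            continue  # level is on the wrong side of entry
        rr = distance / risk
        if rr < rr_threshold:
            continue  # the R:R threshold still filters before any scoring
        quality = _compute_quality_score(rr, lv.strength, distance, entry_price)
        if quality > best_quality:  # strict '>' keeps the first level on ties
            best_quality = quality
            best = Selection(target=lv.price_level, rr_ratio=round(rr, 4))
    return best


print(select_target(100.0, 3.0, [Level(103.0, 80), Level(115.0, 10)], 1.0, "long"))
# Selection(target=103.0, rr_ratio=1.0)
```

When only one level qualifies, this loop selects it just as the max-R:R loop did, which is why the preservation properties are expected to hold.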
## Testing Strategy
### Validation Approach
The testing strategy follows a two-phase approach: first, surface counterexamples that demonstrate the bug on unfixed code, then verify the fix works correctly and preserves existing behavior.
### Exploratory Fault Condition Checking
**Goal**: Surface counterexamples that demonstrate the bug BEFORE implementing the fix, to confirm or refute the root cause analysis. If the analysis is refuted, we will need to re-hypothesize.
**Test Plan**: Create mock scenarios with multiple S/R levels of varying strength and distance. Run `scan_ticker` on unfixed code and assert that the selected target is NOT the most distant level. These tests will fail on unfixed code, confirming the bug.
**Test Cases**:
1. **Long with strong-near vs weak-far**: Entry=100, risk=3. Near level (103, strength=80) vs far level (115, strength=10). Assert selected target != 115 (will fail on unfixed code)
2. **Short with strong-near vs weak-far**: Entry=200, risk=5. Near level (192, strength=70) vs far level (170, strength=15). Assert selected target != 170 (will fail on unfixed code)
3. **Three candidates with varying profiles**: Entry=50, risk=2. Three levels at different distances/strengths. Assert selection is not purely distance-based (will fail on unfixed code)
**Expected Counterexamples**:
- The unfixed code always selects the most distant level regardless of strength
- Root cause confirmed: selection loop only tracks `best_rr` which is proportional to distance
### Fix Checking
**Goal**: Verify that for all inputs where the bug condition holds, the fixed function produces the expected behavior.
**Pseudocode:**
```
FOR ALL input WHERE isBugCondition(input) DO
    candidates := levels of input that meet the R:R threshold
    result := scan_ticker_fixed(input)
    selected_level := result.target
    ASSERT selected_level == argmax(candidates, key=quality_score)
    ASSERT quality_score(selected_level) >= quality_score(any_other_candidate)
END FOR
```
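Because `scan_ticker` needs a database session, this stdlib-only sketch checks the fix property on a hypothetical pure stand-in, `select_by_quality` (long side only). It generates random multi-candidate inputs and asserts that the selected level has the top quality score among the qualifying levels. The generated inputs are a superset of the isBugCondition inputs, since the property should hold either way. The `(price_level, strength)` tuple representation and all names are assumptions.

```python
import random
from typing import Optional

Level = tuple[float, int]  # (price_level, strength); illustrative representation


def _quality(rr: float, strength: int, distance: float, entry: float) -> float:
    # Proposed formula, default weights (0.35 / 0.35 / 0.30) and rr_cap = 10.
    return (0.35 * min(rr / 10.0, 1.0) + 0.35 * strength / 100.0
            + 0.30 * (1.0 - min(distance / entry, 1.0)))


def score_of(lv: Level, entry: float, risk: float) -> float:
    distance = lv[0] - entry
    return _quality(distance / risk, lv[1], distance, entry)


def select_by_quality(entry: float, risk: float, levels: list[Level], thr: float) -> Optional[Level]:
    """Hypothetical long-side stand-in for the fixed selection in scan_ticker."""
    best, best_q = None, float("-inf")
    for lv in levels:
        distance = lv[0] - entry
        if distance <= 0 or distance / risk < thr:
            continue
        q = score_of(lv, entry, risk)
        if q > best_q:
            best, best_q = lv, q
    return best


def check_fix_property(trials: int = 1000, seed: int = 0, thr: float = 1.0) -> int:
    """For multi-candidate inputs, the selected level must have the top quality score."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        entry = rng.uniform(10.0, 500.0)
        risk = rng.uniform(0.5, 10.0)
        levels = [(entry + rng.uniform(0.1, entry), rng.randint(0, 100))
                  for _ in range(rng.randint(2, 6))]
        qualifying = [lv for lv in levels if (lv[0] - entry) / risk >= thr]
        if len(qualifying) < 2:
            continue  # isBugCondition requires at least two qualifying candidates
        chosen = select_by_quality(entry, risk, levels, thr)
        assert chosen is not None
        assert all(score_of(chosen, entry, risk) >= score_of(o, entry, risk) for o in qualifying)
        checked += 1
    return checked
```

In the real suite, the generator would feed a mocked DB and call `scan_ticker` itself. A property-based testing library could also replace the hand-rolled `random` loop and shrink any failing input.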
### Preservation Checking
**Goal**: Verify that for all inputs where the bug condition does NOT hold, the fixed function produces the same result as the original function.
**Pseudocode:**
```
FOR ALL input WHERE NOT isBugCondition(input) DO
    ASSERT scan_ticker_original(input) == scan_ticker_fixed(input)
END FOR
```
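The preservation property can be exercised the same way without a database. This sketch compares two hypothetical pure stand-ins on long-side inputs that have at most one qualifying level. `select_original` follows the max-R:R loop described under Hypothesized Root Cause (its starting `best_rr = 0.0` is an assumption), and `select_fixed` is the quality-score loop. Each generated input also includes levels on the wrong side of entry and levels just below the threshold.

```python
import random
from typing import Optional

Level = tuple[float, int]  # (price_level, strength); illustrative representation


def _quality(rr: float, strength: int, distance: float, entry: float) -> float:
    # Proposed formula, default weights (0.35 / 0.35 / 0.30) and rr_cap = 10.
    return (0.35 * min(rr / 10.0, 1.0) + 0.35 * strength / 100.0
            + 0.30 * (1.0 - min(distance / entry, 1.0)))


def select_original(entry: float, risk: float, levels: list[Level], thr: float) -> Optional[Level]:
    """Assumed shape of the unfixed long-side loop: highest qualifying R:R wins."""
    best, best_rr = None, 0.0
    for price, strength in levels:
        rr = (price - entry) / risk
        if price > entry and rr >= thr and rr > best_rr:
            best, best_rr = (price, strength), rr
    return best


def select_fixed(entry: float, risk: float, levels: list[Level], thr: float) -> Optional[Level]:
    """Quality-score selection with the same direction and threshold filters."""
    best, best_q = None, float("-inf")
    for price, strength in levels:
        distance = price - entry
        rr = distance / risk
        if distance > 0 and rr >= thr:
            q = _quality(rr, strength, distance, entry)
            if q > best_q:
                best, best_q = (price, strength), q
    return best


def check_preservation(trials: int = 1000, seed: int = 1, thr: float = 1.5) -> int:
    """Inputs with at most one qualifying level must give identical results."""
    rng = random.Random(seed)
    for _ in range(trials):
        entry = rng.uniform(10.0, 500.0)
        risk = rng.uniform(0.5, 10.0)
        min_reward = thr * risk  # smallest distance that clears the threshold
        # Levels on the wrong side of entry, plus levels too close to qualify...
        levels = [(entry - rng.uniform(0.1, entry / 2), rng.randint(0, 100))
                  for _ in range(rng.randint(0, 2))]
        levels += [(entry + min_reward * rng.uniform(0.01, 0.99), rng.randint(0, 100))
                   for _ in range(rng.randint(0, 4))]
        # ...and, half the time, exactly one level that does qualify.
        if rng.random() < 0.5:
            levels.append((entry + min_reward * rng.uniform(1.01, 5.0), rng.randint(0, 100)))
        rng.shuffle(levels)
        assert select_original(entry, risk, levels, thr) == select_fixed(entry, risk, levels, thr)
    return trials
```

The 1% margins around `min_reward` keep floating-point rounding from pushing a level across the threshold. This way, the check tests the selection logic rather than rounding noise.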
**Testing Approach**: Property-based testing is recommended for preservation checking because:
- It generates many test cases automatically across the input domain
- It catches edge cases that manual unit tests might miss
- It gives high confidence (though not a proof) that behavior is unchanged across non-buggy inputs
**Test Plan**: Observe behavior on UNFIXED code first for zero-candidate and single-candidate scenarios, then write property-based tests capturing that behavior.
**Test Cases**:
1. **Zero candidates preservation**: Generate random tickers with no S/R levels in target direction. Verify no setup is produced (same as original).
2. **Single candidate preservation**: Generate random tickers with exactly one qualifying S/R level. Verify same setup is produced as original.
3. **Below-threshold preservation**: Generate random tickers where all candidates have R:R below threshold. Verify no setup is produced.
4. **Database persistence preservation**: Verify old setups are deleted and new ones inserted identically.
### Unit Tests
- Test `_compute_quality_score` with known inputs and verify output matches expected formula
- Test that quality score components are properly normalized to the 0-1 range
- Test that `rr_cap` correctly caps the R:R normalization
- Test edge cases: strength=0, strength=100, distance=0, single candidate
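The unit tests above could take roughly the following pytest-style shape, for example in `tests/unit/test_rr_scanner_quality_score.py`. The formula is copied inline so the sketch runs standalone; the real file would import `_compute_quality_score` from `app/services/rr_scanner_service.py` instead. The expected values follow directly from the formula and default weights.

```python
import math
import random


def _compute_quality_score(rr, strength, distance, entry_price, *,
                           w_rr=0.35, w_strength=0.35, w_proximity=0.30, rr_cap=10.0):
    # Inline copy of the proposed helper so this sketch is self-contained.
    norm_rr = min(rr / rr_cap, 1.0)
    norm_strength = strength / 100.0
    norm_proximity = 1.0 - min(distance / entry_price, 1.0)
    return w_rr * norm_rr + w_strength * norm_strength + w_proximity * norm_proximity


def test_known_input():
    # Example level A (entry 100, risk 3): 0.35*0.1 + 0.35*0.8 + 0.30*0.97 = 0.606
    assert math.isclose(_compute_quality_score(1.0, 80, 3.0, 100.0), 0.606)


def test_rr_cap():
    # At and above rr_cap the R:R component saturates, so the scores match exactly.
    assert _compute_quality_score(10.0, 50, 5.0, 100.0) == _compute_quality_score(25.0, 50, 5.0, 100.0)


def test_strength_and_distance_extremes():
    # strength=100, distance=0 and rr at the cap: every component is 1.0.
    assert math.isclose(_compute_quality_score(10.0, 100, 0.0, 100.0), 1.0)
    # strength=0 and distance beyond entry_price: only the capped R:R term remains.
    assert math.isclose(_compute_quality_score(20.0, 0, 150.0, 100.0), 0.35)


def test_score_stays_in_unit_range():
    rng = random.Random(42)
    for _ in range(1000):
        score = _compute_quality_score(rng.uniform(0.0, 50.0), rng.randint(0, 100),
                                       rng.uniform(0.0, 1000.0), rng.uniform(1.0, 500.0))
        assert 0.0 <= score <= 1.0 + 1e-12  # tiny tolerance for float rounding
```

Running `pytest` on the file collects the four `test_*` functions automatically, and no pytest import is required.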
### Property-Based Tests
- Generate random sets of S/R levels with varying strengths and distances; verify the selected target always has the highest quality score among candidates
- Generate random single-candidate scenarios; verify output matches what the original function would produce
- Generate random inputs with all candidates below R:R threshold; verify no setup is produced
### Integration Tests
- Test full `scan_ticker` flow with mocked DB containing multiple S/R levels of varying quality
- Test `scan_all_tickers` still processes each ticker independently
- Test that `get_trade_setups` returns correct sorting after fix


@@ -0,0 +1,35 @@
# Tasks
## 1. Add quality score helper function
- [x] 1.1 Create `_compute_quality_score(rr, strength, distance, entry_price, *, w_rr=0.35, w_strength=0.35, w_proximity=0.30, rr_cap=10.0) -> float` function in `app/services/rr_scanner_service.py` that computes a weighted sum of normalized R:R, normalized strength, and normalized proximity
- [x] 1.2 Implement normalization: `norm_rr = min(rr / rr_cap, 1.0)`, `norm_strength = strength / 100.0`, `norm_proximity = 1.0 - min(distance / entry_price, 1.0)`
- [x] 1.3 Return `w_rr * norm_rr + w_strength * norm_strength + w_proximity * norm_proximity`
## 2. Replace long setup selection logic
- [x] 2.1 In `scan_ticker`, replace the long setup loop that tracks `best_rr` / `best_target` with a loop that computes `quality_score` for each candidate via `_compute_quality_score` and tracks `best_quality` / `best_candidate_rr` / `best_candidate_target`
- [x] 2.2 Keep the `rr >= rr_threshold` filter — only candidates meeting the threshold are scored
- [x] 2.3 Store the selected candidate's actual R:R ratio (not the quality score) in `TradeSetup.rr_ratio`
## 3. Replace short setup selection logic
- [x] 3.1 Apply the same quality-score selection change to the short setup loop, mirroring the long setup changes
- [x] 3.2 Ensure distance is computed as `entry_price - lv.price_level` for short candidates
## 4. Write unit tests for `_compute_quality_score`
- [x] 4.1 Create `tests/unit/test_rr_scanner_quality_score.py` with tests for known inputs verifying the formula output
- [x] 4.2 Test edge cases: strength=0, strength=100, distance=0, rr at cap, rr above cap
- [x] 4.3 Test that all normalized components stay in the 0-1 range
## 5. Write exploratory bug-condition tests (run on unfixed code to confirm bug)
- [x] 5.1 [PBT-exploration] Create `tests/unit/test_rr_scanner_bug_exploration.py` with a property test that generates multiple S/R levels with varying strengths and distances, calls `scan_ticker`, and asserts the selected target is NOT always the most distant level — expected to FAIL on unfixed code, confirming the bug
## 6. Write fix-checking tests
- [x] 6.1 [PBT-fix] Create `tests/unit/test_rr_scanner_fix_check.py` with a property test that generates multiple candidate S/R levels meeting the R:R threshold, calls `scan_ticker` on fixed code, and asserts the selected target has the highest quality score among all candidates
## 7. Write preservation tests
- [x] 7.1 [PBT-preservation] Create `tests/unit/test_rr_scanner_preservation.py` with a property test that generates zero-candidate and single-candidate scenarios and asserts the fixed function produces the same output as the original (no setup for zero candidates, same setup for single candidate)
- [x] 7.2 Add unit test verifying that when no S/R levels exist, no setup is produced (unchanged)
- [x] 7.3 Add unit test verifying that when only one candidate meets threshold, it is selected (unchanged)
- [x] 7.4 Add unit test verifying `get_trade_setups` sorting is unchanged (R:R desc, composite desc)
## 8. Integration test
- [x] 8.1 Add integration test in `tests/unit/test_rr_scanner_integration.py` that mocks DB with multiple S/R levels of varying quality, runs `scan_ticker`, and verifies the full flow: quality-based selection, correct TradeSetup fields, database persistence