feat(job_scout): add 6 Swiss/EU companies, new adapters, and scan-stats table

Automate Palantir, QuantCo, Swissgrid, RUAG, SBB, BKW (drop BFH/Dialectic); 25 companies automated, 0 manual.

- adapters: lever (Palantir/QuantCo), generic json (Swissgrid), sbb, bkw
- fetch_playwright: optional ?page=N pagination (page_param/max_pages) for RUAG
- location_matches: treat pan-EU "Europe"/"EMEA" postings as eligible
- per-company _score_floor so pre-filtered German-language boards stay visible
- POSITIVE_KEYWORDS: add data scientist / data science (medium)
- report: scan-stats table (scraped / CH-remote / match>=2 / newest / time) + totals

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 15:15:22 +02:00
parent 49ba42138d
commit da66443aa8
1 changed files with 383 additions and 49 deletions
@@ -1,7 +1,7 @@
 """Job scout for Dennis's quarterly target companies.
 Pulls latest openings from companies via public ATS APIs (Workday/Ashby/Greenhouse/
-SmartRecruiters/Eightfold/RSS) and, for JS-rendered careers sites, a headless-browser
+SmartRecruiters/Lever/Eightfold/RSS) and, for JS-rendered careers sites, a headless-browser
 (playwright) adapter. Filters by Swiss location or remote eligibility, scores fit against
 profile keywords, tracks which job IDs we've already seen, writes a markdown report.
@@ -22,6 +22,7 @@ See the adapter-coverage notes at the bottom for the current automated/manual sp
 import json
 import re
 import sys
 import time
 from functools import lru_cache
 import urllib.error
 import urllib.parse
@@ -59,6 +60,9 @@ POSITIVE_KEYWORDS = {
    "applied ai": 3, "applied ml": 3, "ai engineer": 3, "ml engineer": 3,
    "mlops": 3, "ai platform": 3, "ml platform": 3,
    "python": 2, "java": 2, "data engineer": 2, "data engineering": 2,
    # "data scientist" scored modestly (medium, not strong) — secondary to his data-eng/
    # platform thesis, but the targeted band at boutiques like QuantCo (see target memory).
    "data scientist": 2, "data science": 2,
    "solutions architect": 2, "platform engineer": 2,
    "ai infrastructure": 2, "inference": 2, "rag": 2, "agentic": 2,
    "kubernetes": 1, "docker": 1, "etl": 1, "pipeline": 1,
@@ -227,11 +231,72 @@ COMPANIES = [
        "scroll_count": 5,
        "use_inner_text_as_blob": True,
    }),
    # --- Zürich/Zug high-comp additions (2026-05-31 list review) ---
    # Palantir (Lever). Verified: 221 postings on the public board. It's US/London-heavy, so
    # Swiss/Schwyz roles are rare but self-surface when posted (the location filter drops the
    # US/London bulk). No title filter: his target titles (Forward Deployed Software Engineer,
    # Deployment Strategist) aren't in ENG_TITLE_FILTER, so filtering would hide them.
    ("palantir",  "Palantir",  "lever", {"slug": "palantir"}),
    # QuantCo (Lever — note the trailing-hyphen slug "quantco-"). ~16 roles, most tagged
    # "Europe" (hybrid); QuantCo's continental hub is Zürich, so the EU-wide rule in
    # location_matches surfaces them. No title filter: the target band is DS/Quant/AI/Cloud
    # (see comp analysis), which ENG_TITLE_FILTER would drop; interns/frontend are caught by
    # NEGATIVE_KEYWORDS instead.
    ("quantco",   "QuantCo",   "lever", {"slug": "quantco-"}),
    # --- Bern/Thun local tier — WLB & proximity exception (comp bar relaxed; 2026-06-01) ---
    # Wired after live endpoint discovery. ⚠️ German citizen: RUAG classified work may require
    # Swiss citizenship — verify per-role before tailoring (see project_target_companies).
    # Swissgrid (Aarau): Magnolia CMS JSON endpoint (verified). placeOfWork is a bare city
    # (Aarau/Prilly/...), so loc_suffix tags it Switzerland for the CH filter. No title filter
    # (small board ~13 roles; lets Data Scientist / Applied-ML roles surface).
    ("swissgrid", "Swissgrid (Aarau)", "json", {
        "url": "https://www.swissgrid.ch/.rest/cloud/component-data?path=%2Fswissgrid%2Fen%2Fhome%2Fcareer%2Fjobs%2Fmain%2Fjoblist_transferred_11",
        "jobs_key": "jobs",
        "field_title": "title", "field_location": "placeOfWork",
        "field_url": "descriptionUrl", "field_date": "onlineSince",
        "loc_suffix": " Switzerland",
        "desc_keys": ["department", "typeOfEmployment", "entryLevel"],
    }),
    # RUAG (Thun/Bern/Emmen). Jobs render on the portal as anchors to jobs.ruag.ch; the first
    # line of each anchor is the title. All sites are Swiss, so default_location=Switzerland
    # passes the CH filter. ENG_TITLE_FILTER cuts the apprenticeship/Lehrstelle bulk.
    # Drupal portal: 20 jobs/page, server-rendered, paginated via ?page=N (0-indexed). The
    # first page is apprenticeship-heavy; eng roles (DevOps/Data/Cloud) are on later pages,
    # so we page through until a page adds nothing new (~5-6 pages).
    ("ruag",      "RUAG (Thun/Bern)", "playwright", {
        "url": "https://www.ruag.ch/en/working-us/job-portal",
        "wait_for": "a[href*='/offene-stellen/']",
        "card": "a[href*='/offene-stellen/']",
        "title_attr": "text",
        "link_attr": "href",
        "default_location": "Switzerland",
        "scroll_count": 1,
        "page_param": "page",
        "max_pages": 10,
        "_title_filter": ENG_TITLE_FILTER,
    }),
    # SBB (company.sbb.ch — the correct host; company-jobs.sbb.ch was wrong). AEM job filter
    # served as a flat JSON list; the fetch_sbb adapter replicates the user's IT + Bern-region
    # filter. German/generic titles, so _score_floor keeps the pre-filtered results visible.
    # ⚠️ DE-citizen limits may apply to some SBB security/critical-infra roles.
    ("sbb",       "SBB",       "sbb", {
        "topic": "IT / Telekommunikation",
        "region": "Bern Mittelland",
        "_score_floor": 2,
    }),
    # BKW Group (jobs.bkw.com — the real ATS host). PMS structured-data API; ~600 roles
    # group-wide, so fetch_bkw keeps only Berufsfeld categories Informatik/Trading/Finanzen
    # (IT/data + energy-trading, incl. the flagged Energiehandel roles). German/generic
    # titles, so _score_floor keeps the pre-filtered set visible.
    ("bkw",       "BKW (Bern)", "bkw", {"_score_floor": 2}),
 ]
 # Companies where adapter probing did not yield a reliable scrape. Reasons noted.
 # These surface as a clickable checklist in the report so they're not forgotten.
-# (Empty — all current target companies are automated.)
+# Companies that resist scraping stay here as a clickable report checklist. Currently empty —
 # every target company is automated. (Dropped 2026-06-01: BFH — academic FH pay below even the
 # relaxed Bern/Thun floor, research-leaning, 403s anyway; Dialectic — ~50-person crypto VC,
 # 0 open roles, crypto angle already covered by Kraken/Bitcoin Suisse/Coinbase Ventures.)
 MANUAL_CHECK = []
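
The Swissgrid entry above relies on configurable field names plus a `loc_suffix`. A minimal sketch of that mapping, assuming a payload shaped like `{"jobs": [...]}`; the `normalize` helper and the sample payload are hypothetical and only illustrate the idea behind `fetch_json`:

```python
# Illustrative sketch only: `normalize` and the sample payload are hypothetical,
# not code from this commit.
def normalize(payload, cfg):
    """Map site-specific field names onto a common job dict."""
    rows = payload.get(cfg.get("jobs_key", "jobs"), [])
    suffix = cfg.get("loc_suffix", "")
    f_title = cfg.get("field_title", "title")
    f_loc = cfg.get("field_location", "location")
    f_url = cfg.get("field_url", "url")
    jobs = []
    for row in rows:
        url = row.get(f_url, "") or ""
        jobs.append({
            "id": str(row.get("id") or url),
            "title": row.get(f_title, ""),
            # A bare city gets the suffix so a country-level filter can match it.
            "location": (row.get(f_loc, "") or "").strip() + suffix,
            "url": url,
        })
    return jobs


cfg = {"jobs_key": "jobs", "field_location": "placeOfWork",
       "field_url": "descriptionUrl", "loc_suffix": " Switzerland"}
payload = {"jobs": [{"id": 7, "title": "Data Scientist", "placeOfWork": "Aarau ",
                     "descriptionUrl": "https://example.invalid/jobs/7"}]}
jobs = normalize(payload, cfg)
print(jobs[0]["location"])
```

The suffix is appended after stripping, so trailing whitespace in the source field does not produce a double space.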
@@ -509,6 +574,145 @@ def fetch_onlyfy(args):
    return jobs
 def fetch_lever(args):
    """Lever public postings API. Palantir uses this. The board is US/London-heavy;
    Swiss/Zurich (Schwyz hub) roles are rare on it but will surface here when posted —
    location filtering downstream drops the US/London bulk. categories.allLocations
    captures multi-location postings; createdAt is epoch-ms."""
    slug = args["slug"]
    data = http_get_json(f"https://api.lever.co/v0/postings/{slug}?mode=json")
    jobs = []
    for j in data:
        cats = j.get("categories") or {}
        all_locs = cats.get("allLocations") or []
        loc_blob = " | ".join(x for x in ([cats.get("location") or ""] + [str(a) for a in all_locs]) if x)
        ts = j.get("createdAt")
        posted = ""
        if isinstance(ts, (int, float)):
            posted = datetime.fromtimestamp(ts / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
        jobs.append({
            "id": j.get("id"),
            "title": j.get("text", ""),
            "location": loc_blob,
            "url": j.get("hostedUrl"),
            "posted": posted,
            "description": (j.get("descriptionPlain") or "")[:2500],
        })
    return jobs
 def fetch_json(args):
    """Generic JSON jobs API with configurable field names, for employer sites that expose
    a clean public endpoint. Verified use: Swissgrid (Magnolia CMS
    /.rest/cloud/component-data — {config, jobs:[...], filters}). Field names vary by site,
    so they're configurable: field_title/field_location/field_url/field_date. loc_suffix
    appends e.g. ' Switzerland' so the CH location filter matches city-only values such as
    "Aarau"/"Prilly" (not every Swiss town is in CH_LOCATION_KEYWORDS). desc_keys fold extra
    fields (department, employment type) into the description for keyword scoring.
    Args: url, jobs_key (default "jobs"), field_* (defaults title/location/url/date),
          url_prefix, loc_suffix, desc_keys."""
    data = http_get_json(args["url"])
    arr = data.get(args.get("jobs_key", "jobs"), []) if isinstance(data, dict) else (data or [])
    ft, fl = args.get("field_title", "title"), args.get("field_location", "location")
    fu, fd = args.get("field_url", "url"), args.get("field_date", "date")
    prefix, suffix = args.get("url_prefix", ""), args.get("loc_suffix", "")
    desc_keys = args.get("desc_keys", [])
    jobs = []
    for j in arr:
        url = j.get(fu, "") or ""
        if url and not url.startswith("http") and prefix:
            url = prefix.rstrip("/") + "/" + url.lstrip("/")
        loc = (j.get(fl, "") or "").strip() + suffix
        desc = " ".join(str(j.get(k)) for k in desc_keys if j.get(k))
        jobs.append({
            "id": str(j.get("id") or url),
            "title": j.get(ft, ""),
            "location": loc,
            "url": url,
            "posted": j.get(fd, "") or "",
            "description": desc[:500],
        })
    return jobs
 def fetch_sbb(args):
    """SBB (company.sbb.ch) AEM job filter. The whole board is served as a flat JSON list
    at .../jobfilter.results.json (~145 roles); the website filters client-side via each
    job's numbered `attributes`: '20'=Berufsfeld/topic, '110'=region, '100'=city,
    'links.directlink'=the jobs.sbb.ch URL. We replicate the user's IT + Bern-region filter
    so only commutable IT roles surface. Titles are German/generic (Application Engineer,
    Network Security Engineer, OT Architekt) and won't match ENG_TITLE_FILTER or the keyword
    scorer, so this company is given a _score_floor in COMPANIES to keep its pre-filtered
    results visible. topic/region are configurable substrings."""
    url = args.get("url", ("https://company.sbb.ch/content/internet/corporate/de/"
                           "jobs-karriere/jobs/job-suche/jcr:content/parmain/"
                           "jobfilter.results.json"))
    topic = args.get("topic", "IT / Telekommunikation")
    region = args.get("region", "Bern Mittelland")
    data = http_get_json(url)
    arr = data if isinstance(data, list) else (data.get("results") or data.get("jobs") or [])
    jobs = []
    for j in arr:
        a = j.get("attributes", {}) or {}
        blob = " ".join(str(x) for v in a.values() for x in (v if isinstance(v, list) else [v]))
        if topic and topic not in blob:
            continue
        if region and region not in blob:
            continue
        region_v = " ".join(a.get("110", []) or [])
        city_v = " ".join(a.get("100", []) or [])
        field_v = " ".join(a.get("20", []) or [])
        jobs.append({
            "id": str(j.get("id") or j.get("viewkey") or ""),
            "title": j.get("title", ""),
            "location": f"{city_v} {region_v} Schweiz".strip(),
            "url": (j.get("links") or {}).get("directlink", ""),
            "posted": j.get("start_date", "") or "",
            "description": (field_v + " " + (j.get("text", "") or ""))[:400],
        })
    return jobs
 def fetch_bkw(args):
    """BKW Group (jobs.bkw.com) PMS structured-data API. The whole-group board is ~600 roles
    dominated by building-tech / electrical / civil-engineering trades; we keep only the
    Berufsfeld categories relevant to the user (Informatik / Trading / Finanzen), which
    surfaces IT/data plus the energy-trading roles (Quant Risk Modeller, Solution Architect
    Energiehandel, Energy Derivatives/Market-Risk analysts). locations[].address gives
    city/country. Pre-filtered + German/generic titles, so paired with a _score_floor in
    COMPANIES. The category allowlist is configurable."""
    url = args.get("url", ("https://jobs.bkw.com/_api/v1/structureddata?"
                           "configFromContentElement=82381&language=de-ch"))
    allow = [c.lower() for c in args.get("categories", ["Informatik", "Trading", "Finanzen"])]
    data = http_get_json(url)
    arr = data if isinstance(data, list) else []
    if not arr and isinstance(data, dict):
        for v in data.values():
            if isinstance(v, list) and v and isinstance(v[0], dict) and "title" in v[0]:
                arr = v
                break
    jobs = []
    for j in arr:
        if j.get("type") and j.get("type") != "jobs":
            continue
        cats = [c.get("title", "") for c in (j.get("relations", {}) or {}).get("Berufsfeld", []) or []]
        if allow and not any(any(a in c.lower() for a in allow) for c in cats):
            continue
        locs = j.get("locations") or []
        addr = (locs[0].get("address") if locs and isinstance(locs[0], dict) else {}) or {}
        loc = " ".join(x for x in [addr.get("city", ""), addr.get("country", "")] if x) or "Schweiz"
        jobs.append({
            "id": str(j.get("id") or j.get("url") or ""),
            "title": j.get("title", ""),
            "location": loc,
            "url": j.get("url", ""),
            "posted": "",
            "description": " ".join(cats + [j.get("subtitle", "") or ""])[:300],
        })
    return jobs
 # Injected before page scripts run, to mask the most common headless-detection signals.
 # Required for Google; harmless for the other sites.
 STEALTH_JS = """
@@ -577,18 +781,12 @@ def fetch_playwright(args):
    ctx.add_init_script(STEALTH_JS)
    page = ctx.new_page()
    jobs = []
-    try:
-        page.goto(args["url"], timeout=45000, wait_until="domcontentloaded")
-        # Optional cookie banner acceptance
-        for sel in args.get("cookie_accept", []) or []:
-            try:
-                btn = page.locator(sel).first
-                if btn.is_visible(timeout=2000):
-                    btn.click()
-                    page.wait_for_timeout(500)
-            except Exception:
-                pass
+    seen_ids = set()
+
+    def scrape_current():
+        """Extract cards from the currently-loaded page; append new ones to `jobs`.
+        Returns the count of newly-added (not-yet-seen) cards so a pagination loop can
+        stop once a page contributes nothing new."""
        # Wait for job content to render
        wait_for = args.get("wait_for")
        if wait_for:
            try:
@@ -605,6 +803,7 @@ def fetch_playwright(args):
        cards = page.locator(args["card"])
        n = min(cards.count(), args.get("max_cards", 150))
        added = 0
        for i in range(n):
            card = cards.nth(i)
            try:
@@ -638,6 +837,11 @@ def fetch_playwright(args):
                if not title:
                    continue
                jid = href or f"{page.url}#{i}"
                if jid in seen_ids:
                    continue
                seen_ids.add(jid)
                added += 1
                description = ""
                if args.get("use_inner_text_as_blob"):
                    # Use the full card text as both location source and description
@@ -646,26 +850,47 @@ def fetch_playwright(args):
                    if not location:
                        location = full[:300]
                jobs.append({
-                    "id": href or f"{args['url']}#{i}",
+                    "id": jid,
                    "title": title,
                    "location": location,
-                    "url": href or args["url"],
+                    "url": href or page.url,
                    "posted": "",
                    "description": description,
                })
            except Exception:
                continue
        return added
    try:
        page.goto(args["url"], timeout=45000, wait_until="domcontentloaded")
        # Optional cookie banner acceptance (once, on the first page)
        for sel in args.get("cookie_accept", []) or []:
            try:
                btn = page.locator(sel).first
                if btn.is_visible(timeout=2000):
                    btn.click()
                    page.wait_for_timeout(500)
            except Exception:
                pass
        # Optional query-param pagination (e.g. Drupal "?page=N", 0-indexed). The base URL is
        # page 0 (already loaded); fetch successive pages until one adds no new cards.
        page_param = args.get("page_param")
        if page_param:
            base = args["url"]
            joiner = "&" if "?" in base else "?"
            for p in range(args.get("max_pages", 8)):
                if p > 0:
                    page.goto(f"{base}{joiner}{page_param}={p}", timeout=45000,
                              wait_until="domcontentloaded")
                added = scrape_current()
                if p > 0 and added == 0:
                    break
        else:
            scrape_current()
    finally:
        ctx.close()
-    # Deduplicate within a single company by id
-    seen, deduped = set(), []
-    for j in jobs:
-        if j["id"] in seen:
-            continue
-        seen.add(j["id"])
-        deduped.append(j)
-    return deduped
+    return jobs
 ADAPTERS = {
@@ -678,6 +903,10 @@ ADAPTERS = {
    "rss": fetch_rss,
    "getro": fetch_getro,
    "onlyfy": fetch_onlyfy,
    "lever": fetch_lever,
    "json": fetch_json,
    "sbb": fetch_sbb,
    "bkw": fetch_bkw,
    "playwright": fetch_playwright,
 }
@@ -690,9 +919,12 @@ def location_matches(loc_text):
    has_remote = any(k in low for k in REMOTE_KEYWORDS)
    is_us_only = any(p in low for p in US_ONLY_PATTERNS) and not in_ch
    has_eu_hint = any(k in low for k in EU_HINT_KEYWORDS)
-    # Count as remote-eligible only if it isn't a US-only remote listing
-    # and it has at least one EU/global hint
-    is_remote = has_remote and not is_us_only and has_eu_hint
+    # Pan-European postings (location literally "Europe"/"EMEA", e.g. QuantCo's Lever board)
+    # are reachable for a DACH-based candidate even without an explicit "remote" keyword, so
+    # treat them as eligible too. City-specific EU roles (e.g. "Berlin or Munich") stay out.
    is_eu_wide = any(k in low for k in ("europe", "emea")) and not is_us_only
    # Count as remote/EU-eligible only if it isn't a US-only listing and has an EU/global hint
    is_remote = (has_remote or is_eu_wide) and not is_us_only and has_eu_hint
    return in_ch, is_remote
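
To make the new EU-wide rule concrete, here is a small standalone sketch of the same boolean logic. The keyword tuples are assumptions for illustration; the module's real `CH_LOCATION_KEYWORDS`, `REMOTE_KEYWORDS`, `US_ONLY_PATTERNS` and `EU_HINT_KEYWORDS` are not shown in this diff and may differ.

```python
# Assumed keyword lists, for illustration only; the real constants are defined
# elsewhere in the module and are not visible in this diff.
CH_KEYWORDS = ("switzerland", "schweiz", "zürich", "zurich", "bern")
REMOTE_KEYWORDS = ("remote",)
US_ONLY_PATTERNS = ("united states", "us only")
EU_HINT_KEYWORDS = ("europe", "emea", "worldwide", "global")


def location_matches_sketch(loc_text):
    """Return (in_ch, is_remote) using the rule shape from the hunk above."""
    low = (loc_text or "").lower()
    in_ch = any(k in low for k in CH_KEYWORDS)
    has_remote = any(k in low for k in REMOTE_KEYWORDS)
    is_us_only = any(p in low for p in US_ONLY_PATTERNS) and not in_ch
    has_eu_hint = any(k in low for k in EU_HINT_KEYWORDS)
    # Pan-European tags count even without an explicit "remote" keyword.
    is_eu_wide = any(k in low for k in ("europe", "emea")) and not is_us_only
    is_remote = (has_remote or is_eu_wide) and not is_us_only and has_eu_hint
    return in_ch, is_remote


for text in ("Europe", "Remote (EMEA)", "Berlin or Munich",
             "Remote, United States", "Zürich, Switzerland"):
    print(text, location_matches_sketch(text))
```

Under these assumed lists, a bare "Europe" tag becomes eligible while a city-only EU posting does not, which is the behaviour the comment in the hunk describes.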
@@ -747,14 +979,65 @@ def save_seen(seen):
    STATE_FILE.write_text(json.dumps(seen, indent=2, ensure_ascii=False), encoding="utf-8")
-def write_report(path, results, errors, new_only, include_weak):
+def _parse_posted(s):
    """Best-effort parse of an adapter's `posted` field into a date, across the mix of
    formats the boards use (ISO 8601 incl. trailing Z, YYYY-MM-DD, DD.MM.YYYY). Returns None
    for unparseable values (e.g. Workday's relative "Posted 5 Days Ago", or empty)."""
    if not s or not isinstance(s, str):
        return None
    s = s.strip()
    try:
        return datetime.fromisoformat(s.replace("Z", "+00:00")).date()
    except ValueError:
        pass
    for fmt in ("%Y-%m-%d", "%d.%m.%Y", "%Y/%m/%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(s[:10], fmt).date()
        except ValueError:
            pass
    m = re.search(r"\d{4}-\d{2}-\d{2}", s)
    if m:
        try:
            return datetime.strptime(m.group(0), "%Y-%m-%d").date()
        except ValueError:
            pass
    return None
 def write_stats_table(stats, total_secs):
    """Render the per-company scan stats as a markdown table (+ a totals row)."""
    out = ["## Scan stats\n",
           "| Company | Scraped | CH/Remote | Match ≥2 | Newest posting | Time (s) |",
           "|---|--:|--:|--:|:--|--:|"]
    t_scraped = t_elig = t_match = 0
    newest_all = None
    for s in stats:
        name = s["company"] + (" ⚠️" if s.get("error") else "")
        newest = s["newest"].isoformat() if s["newest"] else "—"
        out.append(f"| {name} | {s['scraped']:,} | {s['eligible']:,} | "
                   f"{s['match']:,} | {newest} | {s['secs']:.1f} |")
        t_scraped += s["scraped"]; t_elig += s["eligible"]; t_match += s["match"]
        if s["newest"] and (newest_all is None or s["newest"] > newest_all):
            newest_all = s["newest"]
    out.append(f"| **Total ({len(stats)})** | **{t_scraped:,}** | **{t_elig:,}** | "
               f"**{t_match:,}** | **{newest_all.isoformat() if newest_all else '—'}** | "
               f"**{total_secs:.1f}** |")
    out.append("")
    return out
 def write_report(path, results, errors, new_only, include_weak, stats=None, total_secs=0.0):
    today = datetime.now().strftime("%Y-%m-%d")
    n_new = sum(1 for r in results if r["is_new"])
    n_match = sum(1 for r in results if r["score"] >= 2)
    lines = [
        f"# Job scout report {today}{' (new only)' if new_only else ''}\n",
        f"Automated coverage: **{len(COMPANIES)}** companies. Manual checks: **{len(MANUAL_CHECK)}**.",
-        f"Total matches from automated companies: **{len(results)}** ({n_new} new since last run)\n",
+        f"Eligible (CH/remote): **{len(results)}** · interest matches (score ≥ 2): "
        f"**{n_match}** · **{n_new}** new since last run\n",
    ]
    if stats:
        lines += write_stats_table(stats, total_secs)
    if errors:
        lines.append("## Errors\n")
        for company, err in errors:
@@ -814,29 +1097,43 @@ def main():
    seen = load_seen()
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
-    all_results, errors = [], []
+    all_results, errors, stats = [], [], []
    run_start = time.perf_counter()
    for cid, display, adapter, args in COMPANIES:
        if only and cid != only:
            continue
        print(f"Fetching {display}...", file=sys.stderr)
        t0 = time.perf_counter()
        try:
            jobs = ADAPTERS[adapter](args)
        except (urllib.error.URLError, urllib.error.HTTPError, ValueError) as e:
            errors.append((display, repr(e)))
            stats.append({"company": display, "scraped": 0, "eligible": 0,
                          "match": 0, "newest": None, "secs": time.perf_counter() - t0,
                          "error": True})
            continue
        except Exception as e:
            errors.append((display, f"unexpected: {e!r}"))
            stats.append({"company": display, "scraped": 0, "eligible": 0,
                          "match": 0, "newest": None, "secs": time.perf_counter() - t0,
                          "error": True})
            continue
        scraped = len(jobs)
        # Optional per-company title prefilter for high-volume boards
        title_filter = args.get("_title_filter")
        if title_filter:
            jobs = [j for j in jobs
                    if any(_kw_in(k, (j.get("title") or "").lower()) for k in title_filter)]
        # Newest posting on the board (board freshness), across parseable dates.
        dates = [d for j in jobs if (d := _parse_posted(j.get("posted")))]
        newest = max(dates) if dates else None
        company_seen = seen.setdefault(cid, {})
        title_seen = set()
        eligible = match = 0
        for j in jobs:
            jid = str(j.get("id") or j.get("url"))
            in_ch, is_remote = location_matches(j.get("location", ""))
@@ -848,8 +1145,17 @@ def main():
            if norm_title in title_seen:
                continue
            title_seen.add(norm_title)
            eligible += 1
            is_new = jid not in company_seen
            score, pos, neg = score_job(j, title_only=bool(title_filter))
            # Pre-filtered boards (e.g. SBB, already narrowed to IT+Bern by the adapter) carry
            # German/generic titles the profile scorer can't read; a _score_floor keeps their
            # already-relevant results out of the hidden weak bucket.
            floor = args.get("_score_floor")
            if floor is not None and score < floor:
                score = floor
            if score >= 2:
                match += 1
            all_results.append({
                "company": display, "company_id": cid,
                "title": j["title"], "location": j["location"],
@@ -859,8 +1165,13 @@ def main():
            })
            company_seen[jid] = {"title": j["title"], "first_seen": today}
        stats.append({"company": display, "scraped": scraped, "eligible": eligible,
                      "match": match, "newest": newest,
                      "secs": time.perf_counter() - t0, "error": False})
    save_seen(seen)
    _close_browser()
    total_secs = time.perf_counter() - run_start
    if new_only:
        all_results = [r for r in all_results if r["is_new"]]
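
The `_score_floor` clamp and the per-company `match` counter interact in a simple way: a floor at or above the threshold makes every eligible job count as a match. A minimal sketch, assuming plain integer scores; `apply_floor` and `company_counts` are hypothetical helper names, not functions from this commit:

```python
# Hypothetical helpers illustrating the floor-then-count logic from main().
def apply_floor(score, floor=None):
    """Raise a score to the floor when one is configured; otherwise keep it."""
    if floor is not None and score < floor:
        return floor
    return score


def company_counts(scores, floor=None, threshold=2):
    """Count eligible jobs and those whose (floored) score meets the threshold."""
    floored = [apply_floor(s, floor) for s in scores]
    return {"eligible": len(floored),
            "match": sum(1 for s in floored if s >= threshold)}


raw_scores = [0, 1, 3, -1]           # e.g. German titles the keyword scorer cannot read
print(company_counts(raw_scores))            # no floor configured
print(company_counts(raw_scores, floor=2))   # pre-filtered board with _score_floor: 2
```

One consequence worth keeping in mind when reading the stats table: for boards with a floor, the "Match ≥2" column equals the eligible count, so it reflects the adapter's pre-filter, not the keyword scorer.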
@@ -869,43 +1180,66 @@ def main():
    REPORTS_DIR.mkdir(parents=True, exist_ok=True)
    report_path = REPORTS_DIR / f"{today}.md"
-    write_report(report_path, all_results, errors, new_only, include_weak)
+    write_report(report_path, all_results, errors, new_only, include_weak,
                 stats=stats, total_secs=total_secs)
    n_new = sum(1 for r in all_results if r["is_new"])
    print(f"\nReport written: {report_path}", file=sys.stderr)
-    print(f"Total matches: {len(all_results)} ({n_new} new)", file=sys.stderr)
+    print(f"Total matches: {len(all_results)} ({n_new} new) | "
          f"scanned {len(stats)} companies in {total_secs:.1f}s", file=sys.stderr)
    if errors:
        print(f"Errors: {len(errors)} - see report", file=sys.stderr)
-# === Adapter coverage (refreshed 2026-05-24) ==================================
+# === Adapter coverage (refreshed 2026-06-01) ==================================
-# 22 companies automated across 10 adapter types; 0 remain in MANUAL_CHECK.
+# 25 companies automated across 13 adapter types; MANUAL_CHECK is empty.
 #
 # Automated (COMPANIES above):
 #   workday        nvidia, novartis
 #   ashby          kraken, openai, confluent
-#   greenhouse     anthropic, gitlab, clickhouse, grafana
+#   greenhouse     anthropic, gitlab, grafana
 #   pcsx           microsoft                    (Eightfold position-search endpoint)
-#   wp_ajax        sygnum                        (WordPress admin-ajax JSON)
-#   smartrecruiters metgroup, vitol, ldc
+#   smartrecruiters metgroup, ldc
 #   rss            bis                           (vacancies.rss — RSS 1.0/RDF)
 #   getro          coinbase_ventures             (web3 portfolio network, collection 1625)
 #   onlyfy         bitcoin_suisse                (onlyfy.jobs ajax_list HTML fragment)
-#   playwright     google, apple, meta, roche, cisco  (headless browser, 3-15s each)
+#   lever          palantir, quantco             (api.lever.co; QuantCo slug is "quantco-")
 #   json           swissgrid                     (Magnolia /.rest/cloud/component-data)
 #   sbb            sbb                           (company.sbb.ch AEM jobfilter.results.json)
 #   bkw            bkw                           (jobs.bkw.com PMS structureddata API)
 #   playwright     google, apple, meta, roche, cisco, ruag  (headless browser, 3-15s each)
 #
-# Since the 2026-05-21 probe, six originally-manual sites moved to automated:
-# Google/Apple/Meta/Roche/Cisco via the playwright adapter, Microsoft via pcsx, and
-# Sygnum via its WordPress AJAX endpoint. BIS was added via the new rss adapter, the
-# Coinbase Ventures web3 portfolio network via the new getro adapter, and Bitcoin Suisse
-# via the new onlyfy adapter (its bitcoinsuisse.com page is a JS SPA, but the underlying
-# onlyfy.jobs ATS serves a plain HTML list with locations). IBM Research and Sonova were
-# dropped from the target list (no API / low fit; Sonova is MedTech, off-thesis).
+# 2026-06-01 list review (verified live):
+#   - Palantir (lever): 221 postings, US/London-heavy so Swiss/Schwyz roles are rare but
+#     self-surface (FDSE/Deployment-Strategist titles map to his FDE drafts).
+#   - Swissgrid (json): Magnolia CMS endpoint; placeOfWork is bare city, so loc_suffix tags
+#     it Switzerland for the CH filter. ~13 roles incl. Data Scientist / Applied-ML.
+#   - RUAG (playwright + page_param): Drupal portal, 20 jobs/page, paginated ?page=N. Page 0
+#     is apprenticeship-heavy; eng roles (DevOps/Data/Software) are on later pages, so we
 #     page through (max_pages). ENG_TITLE_FILTER cuts the Lehrstelle bulk. ⚠️ DE-citizen
 #     limits on RUAG classified roles — verify per-role.
 #   - SBB (sbb): correct host is company.sbb.ch (not company-jobs.sbb.ch). Flat JSON list;
 #     fetch_sbb replicates the user's IT + Bern-region filter. German/generic titles, so a
 #     _score_floor keeps the pre-filtered results visible. ⚠️ DE-citizen limits possible.
 #   - BKW (bkw): real host is jobs.bkw.com (PMS structureddata API), ~600 group-wide roles;
 #     fetch_bkw keeps Berufsfeld categories Informatik/Trading/Finanzen (IT/data + energy
 #     trading: Quant Risk, Solution Architect Energiehandel, ...). _score_floor as above.
 #   - QuantCo (lever, slug "quantco-"): ~16 roles, most tagged "Europe" (hybrid; Zürich is
 #     QuantCo's continental hub), surfaced via the EU-wide rule in location_matches. Strong:
 #     AI Engineer; medium: Cloud Engineer, AI Applied Scientist, Data Scientist, Quant
 #     Researcher, Software Engineer. Interns/frontend suppressed by NEGATIVE_KEYWORDS.
 #   The Bern/Thun tier intentionally relaxes the comp bar (see user_comp_bar memory).
 #
-# Note: the Coinbase Ventures board (getro) covers PORTFOLIO companies, not Coinbase
-# itself. Coinbase-the-employer was dropped (mass layoffs / hiring freeze as of 2026-05;
-# re-add coinbase.com/careers if they reopen). AMINA Bank was dropped (poor Glassdoor).
+# MANUAL_CHECK is empty — every target company is automated. Dropped 2026-06-01: BFH
+# (academic FH pay below the relaxed Bern/Thun floor, research-leaning, 403s anyway) and
+# Dialectic (~50-person crypto VC, 0 open roles; crypto already covered by Kraken / Bitcoin
 # Suisse / Coinbase Ventures).
 #
-# MANUAL_CHECK is now empty — every current target company is automated.
+# Earlier history: Google/Apple/Meta/Roche/Cisco automated via playwright; Microsoft via
 # pcsx; BIS via rss; Coinbase Ventures via getro; Bitcoin Suisse via onlyfy. Dropped:
 # ClickHouse, Vitol, Sygnum (Glassdoor/comp red flags), IBM Research + Sonova (low fit),
 # Coinbase-the-employer (hiring freeze), AMINA (poor Glassdoor), Canonical (pay+culture).
 # The Coinbase Ventures board (getro) covers PORTFOLIO companies, not Coinbase itself.
 # ==============================================================================
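
The RUAG note above describes paging until a page adds nothing new. That stop condition can be sketched without a browser. In the sketch below, `paginate`, `fake_fetch` and the `PAGES` data are hypothetical stand-ins; the commit's actual loop lives in `fetch_playwright` and drives Playwright's `page.goto`.

```python
# Browser-free sketch of "page until nothing new". All names here are
# hypothetical; this is not the commit's implementation.
def paginate(fetch_page, max_pages=8):
    """Call fetch_page(p) for p = 0..max_pages-1, collecting unseen (id, title)
    pairs, and stop early once a page after the first contributes nothing new."""
    seen, items = set(), []
    for p in range(max_pages):
        added = 0
        for jid, title in fetch_page(p):
            if jid in seen:
                continue
            seen.add(jid)
            items.append((jid, title))
            added += 1
        if p > 0 and added == 0:
            break
    return items


# Assumed behaviour for the fake board: out-of-range pages repeat the last page.
PAGES = [
    [("a", "Apprentice Polymechanic"), ("b", "Apprentice Logistics")],
    [("c", "DevOps Engineer")],
]
calls = []


def fake_fetch(p):
    calls.append(p)
    return PAGES[min(p, len(PAGES) - 1)]


result = paginate(fake_fetch, max_pages=10)
print(len(result), calls)
```

Tracking seen IDs inside the loop is what makes the early exit possible, and it also removes the need for a separate deduplication pass afterwards.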