feat(job_scout): add getro adapter for Coinbase Ventures web3 network

Add a getro adapter (POST JSON search API) and wire up the Coinbase Ventures portfolio talent network (collection 1625), CH + eng-title filtered. Note this covers portfolio companies (Ashby, Notion, VALR, World, ...), not Coinbase itself, which doesn't list on its Ventures board; Coinbase-the-employer stays in MANUAL_CHECK.

Also clean up stale comments: drop Sonova (MedTech, off-thesis, dead scrape) from MANUAL_CHECK, remove the dangling BIS comment now that BIS is automated via rss, and refresh the adapter-coverage notes and module docstring to the current 21-automated / 3-manual state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
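
For reviewers who want the request shape at a glance, here is a minimal sketch of how one page of the Getro search call is assembled. The URL pattern and payload field names are copied from the `fetch_getro` hunk in this commit; `build_getro_request` is a hypothetical helper name used only for illustration and is not part of the change.

```python
def build_getro_request(collection, page=0, locations=None, job_functions=None):
    """Return (url, payload) for one page of a Getro collection job search.

    Hypothetical helper: mirrors the request built inline in fetch_getro.
    """
    url = f"https://api.getro.com/api/v2/collections/{collection}/search/jobs"
    filters = {}
    # Filters are only sent when configured, as in the adapter.
    if locations:
        filters["searchable_locations"] = list(locations)
    if job_functions:
        filters["job_functions"] = list(job_functions)
    payload = {"hitsPerPage": 100, "page": page, "query": "", "filters": filters}
    return url, payload


# Same arguments as the coinbase_ventures entry added to COMPANIES.
url, payload = build_getro_request(
    1625,
    locations=["Switzerland"],
    job_functions=["Software Engineering", "IT", "Data Science"],
)
```

The title filter (`_title_filter`) is not part of this payload; per the config comment it is applied on top of the server-side location and job-function filters.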
+88
-26
@@ -1,8 +1,9 @@
 """Job scout for Dennis's quarterly target companies.
 
-Pulls latest openings from companies with known public ATS APIs (Workday/Ashby/Greenhouse),
-filters by Swiss location or remote eligibility, scores fit against profile keywords, tracks
-which job IDs we've already seen, writes a markdown report.
+Pulls latest openings from companies via public ATS APIs (Workday/Ashby/Greenhouse/
+SmartRecruiters/Eightfold/RSS) and, for JS-rendered careers sites, a headless-browser
+(playwright) adapter. Filters by Swiss location or remote eligibility, scores fit against
+profile keywords, tracks which job IDs we've already seen, writes a markdown report.
 
 Usage:
 py scout.py # Pull all configured companies (strong + medium only)
@@ -13,9 +14,9 @@ Usage:
 State : state/seen_jobs.json
 Output: reports/YYYY-MM-DD.md
 
-To add a company: append to COMPANIES with one of the existing adapter types.
-For companies behind custom careers sites (Google, MS, Meta, Apple, Roche, Novartis, IBM,
-Cisco, Sonova, Sygnum) — see TODO_ADAPTERS at the bottom.
+To add a company: append to COMPANIES with one of the existing adapter types. A few sites
+resist scraping even headless and stay in MANUAL_CHECK (surfaced as a report checklist).
+See the adapter-coverage notes at the bottom for the current automated/manual split.
 """
 
 import json
@@ -131,6 +132,15 @@ COMPANIES = [
         "url": "https://www.bis.org/doclist/vacancies.rss",
         "default_location": "Basel, Switzerland",
     }),
+    # Coinbase Ventures web3 talent network (Getro collection 1625). Aggregates roles
+    # across portfolio companies (Notion, Ashby, VALR, World, ...), NOT Coinbase itself —
+    # see fetch_getro. CH-filtered + eng title-filtered to stay relevant.
+    ("coinbase_ventures", "Coinbase Ventures (web3)", "getro", {
+        "collection": 1625,
+        "locations": ["Switzerland"],
+        "job_functions": ["Software Engineering", "IT", "Data Science"],
+        "_title_filter": ENG_TITLE_FILTER,
+    }),
     # Headless-browser scrapers — slower (3-15s per company) but covers JS-rendered sites.
     # Google actively bot-detects; the STEALTH_JS init script (applied to every context)
     # is what makes its job list render. Cards are <li> with a "Learn more about <title>"
@@ -207,16 +217,12 @@ COMPANIES = [
 # Companies where adapter probing did not yield a reliable scrape. Reasons noted.
 # These surface as a clickable checklist in the report so they're not forgotten.
 MANUAL_CHECK = [
-    ("Sonova", "PhenomPeople serves empty shell to automation (body never renders); widgets API rejects requests",
-     "https://careers.sonova.com/us/en/search-results?keywords=Switzerland"),
     ("Coinbase", "/careers/positions 302-redirects to landing; no job links or ATS API exposed even with stealth",
      "https://www.coinbase.com/careers"),
     ("AMINA Bank", "jobs are at /careers/ (#positions) via JS widget; only ~4 apply links, no scrapable list",
      "https://aminagroup.com/careers/#positions"),
     ("Bitcoin Suisse", "jobs under /careers#open-positions load via JS widget; section empty at scrape time (likely no/few openings)",
      "https://bitcoinsuisse.com/careers#open-positions"),
-    # International org — qualifies (Basel, commutable from Bern, salary net of Swiss tax),
-    # but uses a JS-heavy Taleo widget that doesn't render requisitions headless. Manual check.
 ]
 
 
@@ -416,6 +422,51 @@ def fetch_wp_ajax(args):
     return jobs
 
 
+def fetch_getro(args):
+    """Getro network job-board search API (POST JSON). Powers VC portfolio talent
+    networks — here the Coinbase Ventures web3 network (collection 1625). Returns roles
+    across ALL portfolio companies (Notion, Ashby, VALR, World, ...), NOT Coinbase itself;
+    Coinbase doesn't list its own openings on its Ventures board. Server-side filters:
+    searchable_locations and job_functions. Org name is folded into the title since this
+    is a multi-company board."""
+    collection = args["collection"]
+    url = f"https://api.getro.com/api/v2/collections/{collection}/search/jobs"
+    filters = {}
+    if args.get("locations"):
+        filters["searchable_locations"] = args["locations"]
+    if args.get("job_functions"):
+        filters["job_functions"] = args["job_functions"]
+    jobs, page = [], 0
+    while True:
+        data = http_get_json(url, method="POST", data={
+            "hitsPerPage": 100, "page": page, "query": "", "filters": filters,
+        })
+        res = data.get("results", {}) or {}
+        batch = res.get("jobs", []) or []
+        for j in batch:
+            org = (j.get("organization") or {}).get("name", "")
+            locs = j.get("searchable_locations") or j.get("locations") or []
+            loc_str = " | ".join(locs) if isinstance(locs, list) else str(locs)
+            ts = j.get("created_at")
+            posted = ""
+            if isinstance(ts, (int, float)):
+                posted = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
+            title = j.get("title", "")
+            jobs.append({
+                "id": str(j.get("id")),
+                "title": f"{title} @ {org}" if org else title,
+                "location": loc_str,
+                "url": j.get("url", ""),
+                "posted": posted,
+                "description": " ".join(filter(None, [org] + (j.get("skills") or []))),
+            })
+        total = res.get("count", 0)
+        page += 1
+        if not batch or len(jobs) >= total or page >= 10:
+            break
+    return jobs
+
+
 # Injected before page scripts run, to mask the most common headless-detection signals.
 # Required for Google; harmless for the other sites.
 STEALTH_JS = """
@@ -583,6 +634,7 @@ ADAPTERS = {
     "wp_ajax": fetch_wp_ajax,
     "smartrecruiters": fetch_smartrecruiters,
     "rss": fetch_rss,
+    "getro": fetch_getro,
     "playwright": fetch_playwright,
 }
 
@@ -762,23 +814,33 @@ def main():
         print(f"Errors: {len(errors)} - see report", file=sys.stderr)
 
 
-# === Adapter probe results (2026-05-21) =======================================
-# Tested all 15 target companies. The 5 working adapters are in COMPANIES above.
-# The remaining 10 are in MANUAL_CHECK. To upgrade one of those from manual to
-# automated, you'd need Playwright/Selenium (real browser) — different project.
+# === Adapter coverage (refreshed 2026-05-24) ==================================
+# 21 companies automated across 9 adapter types; 3 remain in MANUAL_CHECK.
 #
-# Google          careers.google.com          404 on documented API; auth-gated
-# Microsoft       gcsservices.careers.ms.com  TLS handshake hangs from non-MS clients
-# Apple           jobs.apple.com/api/v1       endpoint exists, location filter codes opaque
-# Meta            metacareers.com             GraphQL with auth token
-# Roche           careers.roche.com           PhenomPeople/Eightfold, JS-rendered
-# IBM Research    research.ibm.com            static page, no API
-# Cisco           jobs.cisco.com              JS-rendered SPA
-# Sonova          careers.sonova.com          PhenomPeople SaaS, no public JSON
-# Sygnum          sygnum.com/careers          Cloudflare-protected
-# AMINA           aminagroup.com/career       static, low volume
-# Bitcoin Suisse  bitcoinsuisse.com/careers   static, low volume
-# Coinbase        coinbase.com/careers        Cloudflare-protected
+# Automated (COMPANIES above):
+#   workday          nvidia, novartis
+#   ashby            kraken, openai, confluent
+#   greenhouse       anthropic, gitlab, clickhouse, grafana
+#   pcsx             microsoft (Eightfold position-search endpoint)
+#   wp_ajax          sygnum (WordPress admin-ajax JSON)
+#   smartrecruiters  metgroup, vitol, ldc
+#   rss              bis (vacancies.rss — RSS 1.0/RDF)
+#   getro            coinbase_ventures (web3 portfolio network, collection 1625)
+#   playwright       google, apple, meta, roche, cisco (headless browser, 3-15s each)
+#
+# Since the 2026-05-21 probe, six originally-manual sites moved to automated:
+# Google/Apple/Meta/Roche/Cisco via the playwright adapter, Microsoft via pcsx, and
+# Sygnum via its WordPress AJAX endpoint. BIS was added via the new rss adapter, and the
+# Coinbase Ventures web3 portfolio network via the new getro adapter. IBM Research and
+# Sonova were dropped from the target list (no API / low fit; Sonova is MedTech, off-thesis).
+#
+# Note: the Coinbase Ventures board (getro) covers PORTFOLIO companies, not Coinbase
+# itself — Coinbase-the-employer's own careers site stays in MANUAL_CHECK below.
+#
+# Still manual (MANUAL_CHECK above) — to automate, each needs a real-browser probe:
+#   Coinbase        coinbase.com/careers       Cloudflare-protected, 302 to landing
+#   AMINA           aminagroup.com/careers     JS widget, ~4 apply links, low volume
+#   Bitcoin Suisse  bitcoinsuisse.com/careers  JS widget, empty at scrape time, low volume
 # ==============================================================================
 
 
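
The paging and record-mapping behaviour of `fetch_getro` can be exercised offline with a stubbed fetcher. The sketch below restates the loop from the diff under a few assumptions: `normalize_job`, `collect_jobs`, and `fake_fetch` are hypothetical names introduced here, the stub data is invented for illustration, and the response shape (`results.jobs`, `results.count`) is taken from the adapter code rather than from Getro documentation.

```python
from datetime import datetime, timezone

MAX_PAGES = 10  # same page cap as the loop in fetch_getro


def normalize_job(j):
    """Map one raw Getro job dict to the scout's job record (as in the diff)."""
    org = (j.get("organization") or {}).get("name", "")
    locs = j.get("searchable_locations") or j.get("locations") or []
    loc_str = " | ".join(locs) if isinstance(locs, list) else str(locs)
    ts = j.get("created_at")
    posted = ""
    if isinstance(ts, (int, float)):  # epoch seconds -> UTC date
        posted = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    title = j.get("title", "")
    return {
        "id": str(j.get("id")),
        "title": f"{title} @ {org}" if org else title,  # fold org into title
        "location": loc_str,
        "url": j.get("url", ""),
        "posted": posted,
        "description": " ".join(filter(None, [org] + (j.get("skills") or []))),
    }


def collect_jobs(fetch_page):
    """Page through results until empty batch, reported count reached, or page cap."""
    jobs, page = [], 0
    while True:
        res = fetch_page(page).get("results") or {}
        batch = res.get("jobs") or []
        jobs.extend(normalize_job(j) for j in batch)
        page += 1
        if not batch or len(jobs) >= res.get("count", 0) or page >= MAX_PAGES:
            break
    return jobs


# Stubbed two-page response standing in for the live API (illustrative data only).
PAGES = [
    {"results": {"count": 3, "jobs": [
        {"id": 1, "title": "Backend Engineer", "organization": {"name": "ExampleCo"},
         "searchable_locations": ["Zurich", "Switzerland"], "created_at": 86400,
         "url": "https://example.com/1", "skills": ["Go"]},
        {"id": 2, "title": "Data Scientist", "locations": "Remote", "created_at": None},
    ]}},
    {"results": {"count": 3, "jobs": [{"id": 3, "title": "SRE"}]}},
]
calls = []


def fake_fetch(page):
    calls.append(page)
    return PAGES[page]


jobs = collect_jobs(fake_fetch)
```

One property worth noting when reviewing: if a response omits `count`, the default of 0 makes `len(jobs) >= total` true immediately, so the loop stops after the first page.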