docs(skills): enforce JD integrity — real posting verbatim + Playwright recipe

Add the JD Integrity section to shared_ops (no reconstructed/inferred JDs; WebFetch is JS-blind on careers boards; scrape JS-gated postings via the job_scout Playwright venv; STOP and ask if the real text is unobtainable). Wire the rule into /make-resume and /critique. Allow cisco/bkw WebFetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:46:10 +02:00
parent 7ed9e5b615
commit d576254971
4 changed files with 39 additions and 2 deletions
@@ -5,6 +5,36 @@

 ---

+## JD Integrity (MANDATORY — applies to every skill)
+
+The job description is **ground truth**. Every requirement classification, framing decision, ATS keyword, and critique is derived from it. A wrong JD silently corrupts the entire package. Therefore:
+
+1. **The JD must be the real posting, verbatim.** Use the exact text of the live posting. Never invent, "reconstruct," paraphrase, summarize, infer, or fill gaps from training knowledge or a JD "template." There is **no such thing as a reconstructed JD** — if you don't have the real text, you don't have a JD.
+2. **A URL is not a JD.** If the user gives a link, you must fetch the actual posting text from it before doing anything else.
+3. **`WebFetch` is JS-blind on careers boards** (Google, Cisco, Apple, Meta, Workday/Phenom/Greenhouse/Lever/Recruitee SPAs). It returns the static shell or a stale search cache — do NOT trust it for JD text. Use the headless-browser scraper instead (recipe below).
+4. **If you cannot obtain the real JD text, STOP and ask the user to paste it.** Do not proceed to Phase 0/bullets/critique on a guessed JD. Blocking is correct; fabricating is not.
+5. **Record provenance.** The session file `JD source` line must state how the JD was obtained: `pasted by user` / `live scrape <date> via Playwright` / `file provided`. Never label a JD as authoritative unless it is the real text.
+
+### Fetching a JS-gated JD (Playwright recipe)
+
+The job_scout repo ships a Chromium + Playwright venv: `C:\Workspace\claude-resume-kit\job_scout\.venv\Scripts\python.exe` (the bare `py`/`python` on PATH do NOT have Playwright). To pull a single posting's full text:
+
+```bash
+cd "C:/Workspace/claude-resume-kit/job_scout" && .venv/Scripts/python.exe -c "
+import scout, io
+b = scout._get_browser(); ctx = b.new_context(); p = ctx.new_page()
+p.goto('<JOB_URL>', timeout=45000, wait_until='domcontentloaded')
+p.wait_for_timeout(5000)
+io.open('jd_dump.txt','w',encoding='utf-8').write(p.inner_text('body'))
+scout._close_browser()"
+```
+
+Then Read `job_scout/jd_dump.txt`, extract the posting body (Minimum/Preferred qualifications, About, Responsibilities), and save it verbatim to `output/<FolderName>/JD_<name>.txt`. (Windows console can't print some Unicode — always write to a UTF-8 file, then Read it; don't `print()` the body.) See `[[reference_live_posting_check]]` in memory.
+
+If the scrape fails (selector/timeout/captcha), fall back to rule 4: ask the user to paste the JD.
+
+---
+
 ## Three-Session Workflow

 Standard JD pipeline uses 3 sessions for token efficiency + quality: