feat: add AI fingerprint avoidance rules and fix em-dash patterns

- Add ai_fingerprint_rules.md with banned words, structural rules, and 12-item post-gen checklist
- Fix Fellowships/Honors template format: --- to period separator
- Fix Publications under-review template format
- Update all 4 skills to load fingerprint rules during generation
- Add AI scan section to critique framework
- Update resume_reference and cl_reference with em-dash limits
- Reduce em-dashes in example files

Co-Authored-By: Akhil Peeketi <peeketiakhilreddy@gmail.com>
2026-03-09 05:08:57 -06:00
parent c51b49882f
commit 9a7c627cc3
12 changed files with 168 additions and 12 deletions
@@ -60,6 +60,7 @@ Find and read the session file for the .tex being critiqued (use derivation prot
   - **Critique Context** → reviewer persona, competitive landscape, domain vocabulary
   - If session file lacks Company Context or Critique Context: do 1-2 web searches to fill gaps
 2. Read `resume_builder/reference/critique_framework.md`
+3. Read `resume_builder/support/ai_fingerprint_rules.md` — use Section 6 checklist in Part 7 verification
 3. Read the .tex file(s) — derive paths from session file Output Files, or from `$ARGUMENTS`
 4. Read the JD (path from `$ARGUMENTS` or session file)
 5. Read the relevant bundle (`resume_builder/bundles/bundle_[role_type].md` — from session file)
@@ -121,6 +121,7 @@ Proceeding without confirmation may make unwanted edits that break package consi

 Load ONLY what the confirmed edits need:

+- **All edits:** `resume_builder/support/ai_fingerprint_rules.md` — scan for banned words/patterns before and after edits
 - **Bullet expand/rewrite/add:** `resume_builder/experience/` files + matching bundle + `resume_builder/support/achievement_reframing_guide.md`
 - **Summary rewrite:** Bundle (S2 summary guide) + `resume_builder/support/skills_taxonomy.md`
 - **Cover letter edits:** `resume_builder/support/significance_*.md` + `resume_builder/reference/cl_reference.md`
@@ -58,7 +58,8 @@ Read in this order:
 1. **Session file** — specifically: Company Context, Cover Letter Plan, Framing Strategy, ATS Keywords
 2. **Finished resume/CV .tex** — path from session file Output Files. Read to understand what CL must complement.
 3. `resume_builder/reference/cl_reference.md` — CL format rules, paragraph templates, anti-patterns
-4. The matching bundle from session file role type → `resume_builder/bundles/bundle_[role_type].md` — Section 5 (Cover Letter)
+4. `resume_builder/support/ai_fingerprint_rules.md` — Banned words, structural rules (CLs are most vulnerable)
+5. The matching bundle from session file role type → `resume_builder/bundles/bundle_[role_type].md` — Section 5 (Cover Letter)
 5. All significance files from `resume_builder/support/significance_*.md`

 Update session file Status: `Cover Letter: IN_PROGRESS`
@@ -174,6 +174,7 @@ If you proceed without confirmation, you will generate bullets the user didn't a
 **Re-read to restore context after compaction:**
 1. `output/<FolderName>/session_<name>.md` (framing + confirmed bullet plan)
 2. `resume_builder/reference/critical_rules.md` — Character Limits, Bold Width Penalty, Orphan rules
+3. `resume_builder/support/ai_fingerprint_rules.md` — Banned words, structural rules, post-gen checklist

 **Read template:** `resume_builder/templates/resume_template.tex` or `cv_template.tex` + `.cls`
 FIXED sections (from `config.md` FIXED Sections) are template-locked — only generate VARIABLE sections (Summary, Skills, Experience bullets/headers).
@@ -19,7 +19,7 @@

 **Positioning strategy:** Lead with ML pipeline development and independent protein engineering results. Emphasize broadly applicable computational skills (protein language models, MD simulations, free energy methods). Show evidence of independence (first-author papers, open-source tools) alongside collaboration (experimental validation, mentorship).

-**Differentiation angle:** Not just an MD user or an ML practitioner --- a bridge between biomolecular simulation and data-driven protein design, with production-quality software skills.
+**Differentiation angle:** Not just an MD user or an ML practitioner, a bridge between biomolecular simulation and data-driven protein design, with production-quality software skills.

 ---

@@ -93,7 +93,7 @@
 **Opening hook options (pick one):**
 - Method-development hook: "My research develops ML-guided protein engineering pipelines that compress months of experimental screening into hours, enabling rapid discovery of thermostable enzymes and high-affinity binders."
 - Scale hook: "In the past two years, I have screened over 8,500 enzyme variants using protein language models I fine-tuned, identifying 5 experimentally confirmed thermostable candidates."
-- Vision hook: "The intersection of machine learning and biomolecular simulation --- where I have built my research program --- aligns closely with [Department]'s strengths in [specific area]."
+- Vision hook: "The intersection of machine learning and biomolecular simulation, where I have built my research program, aligns closely with [Department]'s strengths in [specific area]."

 **Paragraph 1 -- Research fit (3-4 sentences):**
 Connect your ML protein engineering work to the department's research strengths. Name the faculty or group if known. Reference one concrete result (e.g., 3,000x throughput, 5 confirmed hits).
@@ -1,3 +1,7 @@
+<!-- NOTE: Example bullets below show em-dashes (---) for parenthetical breaks. -->
+<!-- In actual generation, limit to max 2 em-dashes per full document. -->
+<!-- Prefer commas, semicolons, or parentheses for mid-bullet breaks. -->
+
 # Position: Postdoctoral Research Associate at Lakewood University

 ## Dates: Aug 2023 -- Present
@@ -89,3 +89,4 @@
 - No credential dump in closing paragraph
 - No repeating resume bullets verbatim — CL deepens, doesn't duplicate
 - Limit quantified claims to 3-5 per CL
+- **Em-dashes in CLs:** Max 2 per document. CLs are prose-heavy and em-dashes compound quickly. Use commas for parenthetical asides, colons for elaborations, periods for new sentences. Paired em-dashes (X --- detail --- Y) should use commas or parentheses instead.
@@ -437,6 +437,18 @@ If a cover letter was generated in the same session, run all checks below. Detec

 ---

+## Part 6G: AI Fingerprint Scan
+
+Run the 12-item checklist from `resume_builder/support/ai_fingerprint_rules.md` Section 6. Key scans:
+- Count em-dashes (`---`) in full document — flag if >2
+- Scan all bullet endings for -ing analysis phrases (the #1 structural AI marker)
+- Search for any Tier 1 banned word (delve, tapestry, multifaceted, pivotal, etc.)
+- Check CL for generic opener and uniform sentence length
+
+Any failure is a Tier 1 fix in Part 4.
+
+---
+
 ## Part 7: Post-Generation Verification

 Final mechanical checklist. Run AFTER all other critique parts. These are pass/fail checks, not scored dimensions.
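
The key scans listed in Part 6G are mechanical enough to script. The following is a minimal Python sketch, assuming the document is available as a `.tex` string. The function names, the short `TIER1_WORDS` subset, and the comma-clause heuristic for -ing endings are illustrative assumptions and are not part of this commit.

```python
import re

MAX_EM_DASHES = 2
# Assumption: a short subset of the Tier 1 list; the full list lives in Section 1.
TIER1_WORDS = ("delve", "tapestry", "multifaceted", "pivotal", "realm", "synergy")


def strip_tex_comments(tex):
    """Drop everything from an unescaped % to the end of each line."""
    return re.sub(r"(?<!\\)%.*", "", tex)


def count_em_dashes(tex):
    """Count runs of exactly three hyphens (the LaTeX em-dash)."""
    return len(re.findall(r"(?<!-)---(?!-)", strip_tex_comments(tex)))


def find_tier1_words(tex):
    """Return Tier 1 words present as whole words, case-insensitively."""
    body = strip_tex_comments(tex)
    return [w for w in TIER1_WORDS if re.search(rf"\b{w}\b", body, re.IGNORECASE)]


def vague_ing_endings(tex):
    """Heuristic: flag item lines whose final comma clause opens with an
    -ing word and contains no digit (that is, no metric)."""
    flagged = []
    for line in strip_tex_comments(tex).splitlines():
        line = line.strip()
        if not line.startswith(r"\item"):
            continue
        tail = line.rsplit(",", 1)[-1].strip()
        first = tail.split(" ", 1)[0] if tail else ""
        if first.lower().endswith("ing") and not re.search(r"\d", tail):
            flagged.append(line)
    return flagged
```

A document would fail the scan when `count_em_dashes(tex) > MAX_EM_DASHES`, or when either of the other two functions returns a non-empty list. Comment lines are stripped first so that rule lines such as `%-----` and format hints in comments are not counted. The -ing check is deliberately rough and will miss endings that are not set off by a comma.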
@@ -237,6 +237,8 @@ Run this checklist after compile gate passes, before critique. Also used as Part
 Before presenting final output, verify:

 - [ ] All mechanical checks pass (chars, orphans, page fill, no submitted, sequences, variants)
+- [ ] Em-dash count: max 2 per document (resume or CL). Fellowships items use `. ` not `---`.
+- [ ] No -ing analysis endings on bullets ("...advancing the field", "...contributing to Y"). Restructure to end with a concrete result or metric.
 - [ ] All content checks pass (ATS, terms, inflation, provenance, pubs, cover letter)
 - [ ] All narrative checks pass (scan test, per-position flow, cross-position arc, CV sub-headers)
 - [ ] Company/institution name spelled correctly throughout
@@ -0,0 +1,133 @@
+# AI Fingerprint Avoidance Rules
+
+> **Architecture note:** The primary defense against AI detection is the generation protocol — specific facts from experience files, char limits, JD-specific vocabulary, named entities. This file is a secondary safety net for word/phrase/structural patterns.
+
+---
+
+## 1. Banned Words
+
+**Tier 1 — Dead Giveaways (NEVER use in any output):**
+delve, tapestry, multifaceted, pivotal, realm, synergy, paradigm, holistic, nuanced, foster, embark, leverage (as verb), utilize, harness, spearhead, cornerstone, landscape (metaphorical), journey (metaphorical), cutting-edge, novel, innovative (unless quoting a JD), groundbreaking
+
+**Banned Adjectives (use replacement):**
+
+| Banned | Replacement |
+|--------|-------------|
+| robust | strong, reliable |
+| comprehensive | thorough, broad |
+| innovative | new, original (or omit) |
+| pivotal | key, central |
+| meticulous | careful, precise |
+| diverse | varied, wide-ranging |
+| extensive | broad, deep, 10+ years of |
+
+**Banned Verbs (use replacement):**
+
+| Banned | Replacement |
+|--------|-------------|
+| leverage | use, apply, draw on |
+| utilize | use |
+| harness | apply, use, draw on |
+| spearhead | lead, start, launch |
+| foster | support, build, grow |
+| facilitate | run, lead, coordinate, enable |
+| showcase | show, demonstrate |
+| underscore | show, highlight |
+| bolster | strengthen, support |
+
+**Banned Adverbs:** meticulously, notably, subsequently (use "then" or "later"), remarkably, seamlessly, thereby
+
+**Banned Nouns (metaphorical use):** tapestry, landscape, journey, realm, synergy, paradigm, cornerstone
+
+**Technical exceptions:** "landscape" is fine when literal (e.g., "free energy landscape," "threat landscape"). "Novel" is fine when quoting a JD verbatim. Judge by context.
+
+---
+
+## 2. Banned Phrases
+
+**Opening / transition phrases:**
+- "In today's rapidly evolving..."
+- "At the forefront of..."
+- "It is worth noting that..."
+- "This experience has taught me..."
+- "I am uniquely positioned to..."
+- "In an era of..."
+
+**Resume / CL specific:**
+- "proven track record"
+- "passionate about" (use specific interest instead)
+- "I am excited to apply" (use concrete reason instead)
+- "demonstrated ability to" (just state what you did)
+- "strong foundation in"
+- "well-versed in"
+- "adept at"
+
+**Academic / research:**
+- "groundbreaking research"
+- "cutting-edge methodology"
+- "novel approach" (say what is new about it)
+- "significant contributions to the field"
+- "at the intersection of X and Y" (name the specific intersection)
+
+---
+
+## 3. Structural Rules
+
+### Sentence-Level
+- **No reframe pattern:** Never use "It's not X — it's Y" constructions
+- **No rhetorical Q+A:** Never ask a question then answer it ("What makes this unique? The answer is...")
+- **No gerund fragment stacking:** Avoid sequences of 3+ "-ing" phrases ("developing, testing, and deploying...")
+- **No -ing analysis endings on bullets:** This is the **#1 structural AI marker**. Bullets must NOT end with "-ing" phrases like "...advancing the field," "...contributing to improved Y," "...enabling new Z." Fix: restructure so the bullet ends with a concrete result, metric, or object. Example: "...contributing to a 15% reduction" is fine (ends with metric); "...contributing to improved efficiency" is not (vague -ing ending).
+- **Max 2 em-dashes per document:** Count all `---` in the full .tex file (resume or CL). If more than 2, replace extras with commas, semicolons, or parentheses. Fellowships/Honors items use `. ` not `---`.
+- **Post-gen scan:** After generating any document, scan all bullets for -ing endings. Flag and fix any found.
+
+### Prose-Level
+- **Vary sentence length:** Mix short (8-12 words) with long (20-30 words). Three consecutive same-length sentences flag as AI.
+- **No same-structure paragraph starts:** If P1 opens "My research...", P2 must NOT open "My experience..." P3 must NOT open "My approach..."
+- **No constant triplet structures:** Avoid "X, Y, and Z" in more than 2 sentences per document. Use pairs, single items, or lists of 4+.
+
+---
+
+## 4. Positive Markers (signals of human writing)
+
+1. **Specific details:** "Ran 847 MD simulations on protein variants" not "Conducted extensive simulations"
+2. **Front-loaded specifics:** Lead with the concrete thing, not the framing
+3. **Named entities:** Tool names, method names, journal names, institution names
+4. **Audience-appropriate jargon:** Use the JD's vocabulary, not generic synonyms
+5. **Short connecting words:** "so," "but," "and," "then" — not "consequently," "however," "additionally," "subsequently"
+6. **First-person specificity in CLs:** "I built" not "Was responsible for building"
+7. **Inside knowledge:** Reference specific group names, facility names, programmatic areas
+8. **Sentence length variety:** Deliberate mix of 8-word and 25-word sentences
+9. **Occasional "And"/"But" sentence openers** in CLs (1-2 per page max)
+10. **Contractions in CLs:** "I've" and "didn't" are acceptable in industry CLs (not academic)
+11. **One human detail per CL page:** A specific lab memory, a conference conversation, a problem that kept you up — concrete and brief
+
+---
+
+## 5. CL-Specific Note
+
+Cover letters are the most vulnerable document to AI detection because they are prose-heavy and readers have strong intuitions about "how people write." All rules above apply with extra weight in CLs. Pay special attention to:
+- Opening sentence (must be specific to the company, not generic)
+- Sentence length variety (CLs with uniform 15-20 word sentences read as AI)
+- Em-dash usage (CLs accumulate em-dashes fastest — max 2 for the entire letter)
+
+---
+
+## 6. Post-Generation Critique Scan Checklist
+
+Run this 12-item scan on every generated document before presenting to the user:
+
+1. [ ] Any Tier 1 banned word present? (Search for each)
+2. [ ] Any banned phrase from Section 2?
+3. [ ] More than 2 em-dashes (`---`) in the document?
+4. [ ] Any bullet ending with an -ing analysis phrase?
+5. [ ] Three or more consecutive sentences of similar length?
+6. [ ] Paragraph starts repeat the same structure (e.g., "My research...", "My experience...")?
+7. [ ] More than 2 "X, Y, and Z" triplet structures in the document?
+8. [ ] CL opens with a generic phrase instead of a company-specific reference?
+9. [ ] Any metaphorical use of "landscape," "journey," "realm," or "tapestry"?
+10. [ ] Passive voice in more than 20% of bullet verbs?
+11. [ ] Fellowships/Honors items use `---` instead of `. `?
+12. [ ] Any adverb from the banned list (meticulously, notably, subsequently, etc.)?
+
+**If any item fails:** Fix before presenting. These are not optional polish — they are detectable AI patterns.
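
Checklist items 5 and 6 can be approximated in code. This is a hedged Python sketch: the rules file does not define "similar length", so the 3-word tolerance (`SIMILAR_TOLERANCE`) is an assumption, as are the function names and the naive sentence splitter.

```python
import re

SIMILAR_TOLERANCE = 3  # assumption: lengths within 3 words count as "similar"


def sentence_lengths(text):
    """Split on ., !, or ? followed by whitespace; return words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def has_uniform_run(lengths, run=3, tol=SIMILAR_TOLERANCE):
    """True if any `run` consecutive sentences differ by at most `tol` words."""
    for i in range(len(lengths) - run + 1):
        window = lengths[i:i + run]
        if max(window) - min(window) <= tol:
            return True
    return False


def repeated_openers(paragraphs):
    """Return first words that open two consecutive paragraphs."""
    firsts = [p.split()[0].lower() for p in paragraphs if p.split()]
    return [a for a, b in zip(firsts, firsts[1:]) if a == b]
```

The splitter treats every period followed by whitespace as a sentence boundary, so abbreviations such as "e.g." will inflate the count; a flagged passage still needs a human read.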
@@ -213,13 +213,13 @@
 %  FELLOWSHIPS & HONORS — FIXED
 %========================================================================================
 % Fill with your actual fellowships and honors.
-% Format: \item \textbf{Name}, Granting Body (Year)---context.
+% Format: \item \textbf{Name}, Granting Body (Year). Context.
 % Target: 2 rendered lines per entry.

 \begin{rSection2}{Fellowships \& Honors}
-\item \textbf{[FIXED: Fellowship/Award]}, [FIXED: Body] ([FIXED: Year])---[FIXED: context and significance].
-\item \textbf{[FIXED: Fellowship/Award]}, [FIXED: Body] ([FIXED: Year])---[FIXED: context].
-\item \textbf{[FIXED: Fellowship/Award]}, [FIXED: Body] ([FIXED: Year])---[FIXED: context].
+\item \textbf{[FIXED: Fellowship/Award]}, [FIXED: Body] ([FIXED: Year]). [FIXED: Context and significance].
+\item \textbf{[FIXED: Fellowship/Award]}, [FIXED: Body] ([FIXED: Year]). [FIXED: Context].
+\item \textbf{[FIXED: Fellowship/Award]}, [FIXED: Body] ([FIXED: Year]). [FIXED: Context].
 \end{rSection2}

 %========================================================================================
@@ -249,7 +249,7 @@ $\dagger$ - equal contribution as first author.
 % Optional: Under Review section (only if actually under review -- check config.md provenance)
 % \textbf{Under Review}
 % \begin{enumerate}[leftmargin=1.5em, labelsep=0.5em, itemsep=0.1em]
-% \item[--] [FIXED: Author list. ``Title.'' \textit{Journal}---under review.]
+% \item[--] [FIXED: Author list. ``Title.'' \textit{Journal}. Under review.]
 % \end{enumerate}
 \end{rSection}

@@ -213,13 +213,13 @@
 %----------------------------------------------------------------------------------------
 %	HONORS & AWARDS — FIXED: Fill with your actual awards
 %----------------------------------------------------------------------------------------
-% Format: \item \textbf{Award}, Granting Body (Year)---brief context.
+% Format: \item \textbf{Award}, Granting Body (Year). Brief context.
 % Aim for 1 rendered line each. Adjust count to fit page budget.

 \begin{rSection2}{Honors \& Awards}
-\item \textbf{[FIXED: Award]}, [FIXED: Body] ([FIXED: Year])---[FIXED: context].
-\item \textbf{[FIXED: Award]}, [FIXED: Body] ([FIXED: Year])---[FIXED: context].
-\item \textbf{[FIXED: Award]}, [FIXED: Body] ([FIXED: Year])---[FIXED: context].
+\item \textbf{[FIXED: Award]}, [FIXED: Body] ([FIXED: Year]). [FIXED: context].
+\item \textbf{[FIXED: Award]}, [FIXED: Body] ([FIXED: Year]). [FIXED: context].
+\item \textbf{[FIXED: Award]}, [FIXED: Body] ([FIXED: Year]). [FIXED: context].
 \end{rSection2}
 \vspace{-0.1cm}
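
The template changes above replace `---` with a period separator in Fellowships/Honors items (checklist item 11). A small Python sketch of that check follows, assuming these sections are always wrapped in an `rSection2` environment whose title contains "Fellowships" or "Honors", as in the two templates shown; the function name is hypothetical.

```python
import re

# Matches rSection2 blocks titled with "Fellowships" or "Honors" and captures the body.
HONORS_BLOCK = re.compile(
    r"\\begin\{rSection2\}\{[^}]*(?:Fellowships|Honors)[^}]*\}(.*?)\\end\{rSection2\}",
    re.DOTALL,
)


def honors_items_with_em_dash(tex):
    """Return item lines inside Fellowships/Honors blocks that still contain '---'."""
    offenders = []
    for block in HONORS_BLOCK.findall(tex):
        for line in block.splitlines():
            stripped = line.strip()
            if stripped.startswith(r"\item") and "---" in stripped:
                offenders.append(stripped)
    return offenders
```

An empty return value means every item in those blocks already uses the `. ` separator.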