Initial release — claude-resume-kit v1.0

Complete AI-assisted resume/CV generation framework:
- 6 Claude Code skills (setup-extract, setup-build-kb, make-resume, make-cl, edit-resume, critique)
- LaTeX templates (resume, CV, cover letter) with .cls class files
- 6 reference docs (shared_ops, resume_reference, cl_reference, critical_rules, session_file_template, critique_framework)
- Fictional Dr. Jordan Chen examples (extraction, experience, bundle, config, session, JD)
- Knowledge base scaffolding and config template
- README with setup guide and workflow documentation
2026-03-09 01:55:15 -06:00
commit c51b49882f
38 changed files with 4837 additions and 0 deletions
@@ -0,0 +1,118 @@
+# Bundle: Academia
+
+> Role-type positioning guide for university faculty and research professor positions.
+
+---
+
+## S1: Role Profile
+
+**Target employers:** R1 research universities, liberal arts colleges with research programs, international universities
+**Typical titles:** Assistant Professor, Associate Professor, Research Assistant Professor, Lecturer, Postdoctoral Fellow
+**What they value (ranked):**
+1. Independent research capability with publication record
+2. Teaching experience or potential
+3. Method development (not just method application)
+4. Cross-disciplinary breadth (computational + experimental collaboration)
+5. Mentorship and advising evidence
+6. Grant-writing experience or potential for external funding (NIH, NSF)
+7. Open-source contributions and community engagement
+
+**Positioning strategy:** Lead with ML pipeline development and independent protein engineering results. Emphasize broadly applicable computational skills (protein language models, MD simulations, free energy methods). Show evidence of independence (first-author papers, open-source tools) alongside collaboration (experimental validation, mentorship).
+
+**Differentiation angle:** Not just an MD user or an ML practitioner --- a bridge between biomolecular simulation and data-driven protein design, with production-quality software skills.
+
+---
+
+## S2: Summary Guide
+
+**Tagline pattern:** [Method developer] + [application domain] + [scale/impact metric]
+
+**Building blocks (pick 3-4 for summary):**
+- ML-guided protein stability prediction (ESM-2, transfer learning)
+- High-throughput virtual screening (8,500+ enzyme variants)
+- Transfer learning for low-data protein property prediction
+- Enhanced sampling MD (metadynamics, replica exchange, FEP)
+- Enzyme solvent tolerance prediction
+- Open-source tool development (200+ GitHub stars)
+- Automated screening pipeline (Snakemake, SLURM)
+- Consistent domain: enzyme engineering, protein stability, folding thermodynamics
+
+**Summary do's:**
+- Open with "Computational biologist" or "Protein engineer"
+- Include one quantified throughput/scale metric
+- Name 2-3 specific methods/tools
+- Close with a research vision statement
+
+**Summary don'ts:**
+- Do not open with "Passionate" or "Motivated"
+- Do not list more than 3 software tools in the summary
+- Do not use buzzwords without concrete backing ("cutting-edge", "novel", "innovative")
+
+---
+
+## S3: Achievement Reframing Map
+
+**Priority matrix for academic roles:**
+
+| Priority | Achievement | Why | Reframing Notes |
+|----------|------------|-----|-----------------|
+| 1 (must) | L1: Enzyme Stability Screening | Core ML pipeline development + high-impact application | Lead bullet. Emphasize 3,000x throughput and independent development. |
+| 2 (must) | L4: Transfer Learning Framework | Open-source impact, community adoption | Highlight GitHub stars and external adoption as evidence of research maturity. |
+| 3 (must) | L3: Automated Screening Pipeline | Infrastructure contribution, team enablement | Frame as "enabling 6 researchers" -- departments value force multipliers. |
+| 4 (strong) | L2: Enzyme Solvent Tolerance | Deeper enzyme engineering expertise | Natural extension of stability work into industrial conditions. Note under-review status. |
+| 5 (strong) | L5: Unfolding Pathway Analysis | Mechanistic insight from simulations | Use if JD mentions dynamics, thermodynamics, or structural biology. |
+| 6 (if room) | L6: Mentorship | Teaching and advising fit | Include for faculty positions; optional for postdoc applications. |
+
+**Omit from academic resumes:** Undergraduate coursework projects, non-research achievements.
+
+---
+
+## S4: Skills Guide
+
+**Bold tools (tools the JD will likely name or ATS will scan):**
+- **GROMACS**, **Python**, **PyTorch**, **SLURM**
+- **Machine learning** (or **protein language models** if JD uses that phrase)
+
+**Include but do not bold:**
+- AlphaFold2, Rosetta, OpenMM, RDKit, BioPython, MDAnalysis
+- Snakemake, Git, Bash, PostgreSQL, Linux
+
+**Group strategy (for skills section):**
+- Group 1 -- Simulation & Modeling: GROMACS, OpenMM, AMBER, AutoDock Vina
+- Group 2 -- Machine Learning: Protein language models (ESM-2), graph neural networks, transfer learning, PyTorch
+- Group 3 -- Programming & HPC: Python, Bash, SLURM, Snakemake, Git
+- Group 4 -- Analysis & Visualization: BioPython, MDAnalysis, ProDy, PyMOL, matplotlib
+- Group 5 -- Domain Knowledge: protein engineering, drug discovery, free energy methods, enhanced sampling
+
+**Skills to omit for academia:** Excel, PowerPoint, basic office tools (assumed; wastes space).
+
+---
+
+## S5: Cover Letter Guide
+
+**Opening hook options (pick one):**
+- Method-development hook: "My research develops ML-guided protein engineering pipelines that compress months of experimental screening into hours, enabling rapid discovery of thermostable enzymes and high-affinity binders."
+- Scale hook: "In the past two years, I have screened over 8,500 enzyme variants using protein language models I fine-tuned, identifying 5 experimentally confirmed thermostable candidates."
+- Vision hook: "The intersection of machine learning and biomolecular simulation --- where I have built my research program --- aligns closely with [Department]'s strengths in [specific area]."
+
+**Paragraph 1 -- Research fit (3-4 sentences):**
+Connect your ML protein engineering work to the department's research strengths. Name the faculty or group if known. Reference one concrete result (e.g., 3,000x throughput, 5 confirmed hits).
+
+**Paragraph 2 -- Technical depth (3-4 sentences):**
+Go deeper on method development. Mention protein language model fine-tuning, transfer learning, or solvent tolerance extension. Reference the open-source tool and its adoption.
+
+**Paragraph 3 -- Teaching and collaboration (2-3 sentences):**
+Mention mentorship of 3 students, courses you could teach, and collaborative research plans. State what you want to do next at their institution.
+
+**Closing (1-2 sentences):**
+Express enthusiasm for the specific position. Reference the JD title and department name.
+
+**Anti-patterns:**
+- Do not restate the resume bullet-for-bullet
+- Do not begin with "I am writing to apply for..."
+- Do not use more than one exclamation mark in the entire letter
+- Do not name-drop software without saying what you did with it
+
+---
+
+*Source: experience_postdoc_lakewood.md, experience_phd_westfield.md, skills_taxonomy.md*
@@ -0,0 +1,101 @@
+# Configuration
+
+> Edit this file with your personal details. Every skill reads this file.
+
+---
+
+## Personal Info
+
+- **Name:** Jordan Chen
+- **Degree suffix:** Ph.D.
+- **Email:** jordan.chen@email.com
+- **Phone:** +1 5551234567
+- **Location:** Richland, WA 99354
+- **LinkedIn:** linkedin.com/in/jordanchen
+- **Google Scholar:** scholar.google.com/citations?user=XXXXXXXXX
+- **ORCID:** orcid.org/0000-0002-XXXX-XXXX
+- **Website:**
+
+---
+
+## Document Preferences
+
+- **Resume pages:** 2
+- **CV pages:** 5
+- **Resume bullet variant:** 2L (all variable bullets are 2-line)
+- **CV bullet variant:** 2L/3L mix
+- **Skills config (resume):** 4-3-2-2-2 (13 lines, 5 groups)
+- **Skills config (CV):** 4-4-3-3-3 (17 lines, 5 groups)
+- **Immigration line:** Yes | "Authorized to work in the United States"
+
+---
+
+## Provenance Flags
+
+Track the publication status of your work. Skills check this table before every output.
+
+| Item | Status | Correct Framing |
+|------|--------|----------------|
+| Enzyme solvent tolerance paper (Chen, Yamamoto, Holmberg) | under review at Proteins | "under review" -- never say "published" |
+| Screening pipeline tool | unpublished internal tool | "computational infrastructure I developed" -- never imply peer-reviewed |
+| Stability database preprint | preprint on bioRxiv, not yet submitted | "preprint" -- do not say "published" or "under review" |
+
+---
+
+## KB Corrections Log
+
+Verified errors to never re-introduce. Add entries as you catch mistakes.
+
+| Correction | Details |
+|-----------|---------|
+| Transfer learning framework credit | Co-developed with M. Rivera. Always use "Co-developed", never "Developed" alone. |
+| ESM-2 stability prediction accuracy | 0.82 Spearman (not 0.85). Confirmed in published Table 2. |
+
+---
+
+## Role Types
+
+Define the role types you're targeting. Each gets a bundle during setup.
+
+| Role Name | Target Employers | Tier | Bundle File |
+|-----------|-----------------|------|-------------|
+| Academic | R1 research universities, teaching-focused colleges | 1 | bundle_academic.md |
+| Industry R&D | Biotech/pharma companies | 2 | bundle_industry_rd.md |
+
+**Tier guide:** 1 = strongest evidence, full portfolio | 2 = strong with targeted emphasis | 3 = viable with careful framing
+
+---
+
+## Role-Type Decision Tree
+
+Customize this to map JD keywords to your role types.
+
+| If JD mentions... | Primary profile | Secondary (hybrid) |
+|-------------------|----------------|-------------------|
+| tenure-track, faculty, assistant professor, teaching | Academic | -- |
+| university, department, graduate students, NSF, NIH | Academic | Industry R&D |
+| ML, machine learning, data science, R&D | Industry R&D | Academic |
+| protein engineering, drug discovery, biologics | Academic | Industry R&D |
+| pharma, biotech, clinical pipeline, GMP | Industry R&D | -- |
+
+---
+
+## FIXED Sections
+
+List template sections that should NEVER be modified during generation.
+These are copied verbatim from your template every time.
+
+- Education
+- Publications (CV)
+- Honors & Awards
+- Header block (name, contact, links)
+- Undergraduate Research Experience (2 bullets, never changes)
+
+---
+
+## Output Rules
+
+- **Email in all outputs:** jordan.chen@email.com
+- **Resume package:** 2 pages + 1-page cover letter
+- **CV package:** 5 pages + 1-2 page cover letter
+- **Output .tex files ONLY** -- user compiles locally
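The Role-Type Decision Tree in the configuration file above maps JD keywords to a primary profile and an optional secondary profile. A minimal Python sketch of that lookup follows. The function name `route_jd`, the first-match-wins row order, and the whole-word matching are assumptions made for illustration; the config does not say how the skills resolve a JD that hits several rows.

```python
import re

# Rows copied from the Role-Type Decision Tree table, in table order.
# Each row: (keywords, primary profile, secondary profile or None).
DECISION_TREE = [
    (["tenure-track", "faculty", "assistant professor", "teaching"], "Academic", None),
    (["university", "department", "graduate students", "NSF", "NIH"], "Academic", "Industry R&D"),
    (["ML", "machine learning", "data science", "R&D"], "Industry R&D", "Academic"),
    (["protein engineering", "drug discovery", "biologics"], "Academic", "Industry R&D"),
    (["pharma", "biotech", "clinical pipeline", "GMP"], "Industry R&D", None),
]


def _mentions(text, keyword):
    # Whole-word, case-insensitive match so "ML" does not fire inside "HTML".
    pattern = r"(?<![A-Za-z0-9])" + re.escape(keyword) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def route_jd(jd_text):
    """Return (primary, secondary) for the first row with a keyword hit.

    First-match-wins is an assumed tie-breaking rule, not one stated in the config.
    """
    for keywords, primary, secondary in DECISION_TREE:
        if any(_mentions(jd_text, kw) for kw in keywords):
            return primary, secondary
    return None, None
```

Because rows are checked in table order, a JD that mentions both "tenure-track" and "machine learning" resolves to Academic with no secondary profile under this sketch; a scoring scheme that counts hits per row would be an equally plausible reading of the table.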
@@ -0,0 +1,75 @@
+# Session: Whitfield University -- Assistant Professor, Computational Protein Engineering
+
+## Metadata
+- **JD file:** `JDs/whitfield_asst_prof_2026.txt`
+- **Output folder:** `output/Whitfield_ProteinEng/`
+- **Document type:** CV (5-page)
+- **Role type:** Academic
+- **Secondary:** --
+- **Created:** 2026-03-09
+- **Status:** Phase 2 complete
+
+---
+
+## Phase 0: JD Analysis
+
+**Position:** Assistant Professor, Department of Biomedical Engineering
+**Institution:** Whitfield University (R1 research university)
+**Key requirements:**
+- ML models for protein stability or design
+- Molecular dynamics simulations (GROMACS, OpenMM)
+- Protein structure prediction or molecular docking
+- Python, HPC, collaborative research
+- Publication record in computational biology
+- Teaching ability or potential
+- Independent research program
+
+**ATS keywords identified:**
+machine learning, protein engineering, protein language model, molecular dynamics, GROMACS, drug discovery, free energy, HPC, Python, virtual screening, enhanced sampling, tenure-track
+
+**Bundle selected:** `bundle_academic.md`
+**Experience files loaded:** `experience_postdoc_lakewood.md`, `experience_phd_westfield.md`
+
+---
+
+## Phase 1: Bullet Plan
+
+### Postdoc -- Lakewood University (Aug 2023 -- Present) [4 variable bullets]
+
+| Slot | Achievement | Variant | Rationale |
+|------|------------|---------|-----------|
+| 1 | L1: Enzyme Stability Screening | 2L | Lead bullet -- direct JD match (ML + protein engineering) |
+| 2 | L4: Transfer Learning Framework | 2L | Open-source tool, community adoption, JD mentions "collaborative" |
+| 3 | L2: Enzyme Solvent Tolerance | 2L | Deepens enzyme engineering focus; industrial applications |
+| 4 | L3: Automated Screening Pipeline | 2L | JD requires HPC; infrastructure contribution |
+
+### PhD -- Westfield (Aug 2018 -- Jul 2023) [3 variable bullets]
+
+| Slot | Achievement | Variant | Rationale |
+|------|------------|---------|-----------|
+| 1 | P1: Enhanced Sampling for Folding | 2L | Method development -- PhD flagship result |
+| 2 | P3: Ligand Binding Free Energy | 2L | Shows drug discovery breadth |
+| 3 | P4: Stability Database Pipeline | 2L | Data infrastructure; directly enabled postdoc ML work |
+
+### Undergrad Research -- Eastgate (2016 -- 2018) [FIXED, 2 bullets]
+
+**Summary headline:** Computational biologist specializing in ML-guided protein engineering and biomolecular simulation, with 15 publications and open-source tools adopted by 4 external groups.
+
+**Skills section:** 5 groups, 17 lines (4-4-3-3-3 config)
+
+---
+
+## Phase 2: Generation
+
+- **Output file:** `output/Whitfield_ProteinEng/e2e_whitfield_proteineng_cv.tex`
+- **Char counts verified:** All 2L bullets within 170--210 rendered chars
+- **Page count:** 5 pages (confirmed via budget card)
+
+---
+
+## Decisions Log
+
+1. Chose L1 over L5 as lead bullet -- L5 is a secondary result from the same paper, L1 is the primary contribution.
+2. Omitted L6 (mentorship) -- will highlight in teaching statement instead; space better used for L2.
+3. Used "Co-developed" for L4 per the config.md KB Corrections Log (shared with M. Rivera).
+4. Solvent tolerance bullet notes "under review" status per config.md provenance table.
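Phase 2 of the session file above records that all 2L bullets fall within 170--210 rendered characters. A rough sketch of such a check is shown here in Python. The substitution table and the function names are hypothetical, and the table covers only the LaTeX fragments that appear in the example bullets (`$\times$`, `$^\circ$`, `$+$`, `\%`, `---`, `--`); the kit's actual counting method is not shown in this commit.

```python
# Assumed LaTeX-to-rendered substitutions; covers only macros seen in the example bullets.
SUBSTITUTIONS = [
    (r"$\times$", "\u00d7"),  # multiplication sign
    (r"$^\circ$", "\u00b0"),  # degree sign
    (r"$+$", "+"),
    (r"\%", "%"),
    ("---", "\u2014"),        # must be replaced before "--"
    ("--", "\u2013"),
]


def rendered_text(latex):
    """Approximate the rendered text of a bullet by applying the substitutions in order."""
    for source, rendered in SUBSTITUTIONS:
        latex = latex.replace(source, rendered)
    return latex


def within_window(latex, low=170, high=210):
    """True if the approximate rendered length lies in [low, high] characters."""
    return low <= len(rendered_text(latex)) <= high
```

Counting the rendered form rather than the raw source matters because a fragment such as `3,000$\times$` occupies 13 source characters but only 6 on the page.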
@@ -0,0 +1,127 @@
+# Position: Postdoctoral Research Associate at Lakewood University
+
+## Dates: Aug 2023 -- Present
+
+## Cross-Position Themes (for cover letters)
+- Research trajectory: classical protein simulation (PhD) to ML-accelerated protein engineering (postdoc)
+- Recurring architecture pattern: experimental data -> ML surrogate -> large-scale computational screening
+- Consistent focus: protein stability and folding thermodynamics throughout career
+
+---
+
+## Achievements
+
+### L1: ML-Guided Enzyme Stability Screening
+**Source:** Chen et al., ACS Catalysis 2025
+**Methods:** ESM-2 protein language model, GROMACS, replica exchange MD, Python/BioPython
+**Quantitative:** 0.82 Spearman on stability prediction, 3,000x throughput vs experiment, 8,500 variants screened, 5 confirmed hits
+**Bullet (2L):** Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures, achieving 0.82 Spearman correlation and enabling 3,000$\times$ throughput screening of 8,500 enzyme variants for industrial thermostability.
+**Bullet (3L):** Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures with transfer learning, achieving 0.82 Spearman correlation and 3,000$\times$ throughput over experimental screening --- identified 7 thermostable lipase variants with 15$+$ $^\circ$C stability gain, 5 experimentally confirmed via differential scanning calorimetry.
+**Tags:** academic, industry_rd
+**Significance:** Demonstrates independent ML pipeline development and protein engineering impact. 3,000x speedup is a concrete metric. Published first-author in high-impact journal.
+
+### L2: Enzyme Solvent Tolerance Prediction
+**Source:** Chen, Yamamoto, Holmberg, Proteins: Structure, Function, and Bioinformatics 2025 (under review)
+**Methods:** ESM-2 fine-tuning, GROMACS, explicit solvent MD, MM/PBSA free energy
+**Quantitative:** 0.78 Spearman on solvent tolerance, 50-ns MD of 80 enzyme-solvent systems, 4 solvent-tolerant variants identified
+**Bullet (2L):** Extended protein language model to predict enzyme solvent tolerance across 8 organic co-solvent systems, validating against 50-ns explicit-solvent MD for 80 enzyme variants and identifying 4 candidates for green chemistry applications.
+**Bullet (3L):** Extended protein language model to predict enzyme solvent tolerance across 8 organic co-solvent systems (0.78 Spearman on held-out set), validated against 50-ns explicit-solvent molecular dynamics and MM/PBSA free energy calculations for 80 enzyme variants --- identified 4 solvent-tolerant lipase candidates now under experimental characterization for green chemistry applications.
+**Tags:** academic, industry_rd
+**Significance:** Deepens enzyme engineering expertise into industrial conditions. Natural extension of thermostability work. Under-review status must be stated clearly.
+
+### L3: Automated Screening Pipeline
+**Source:** Internal infrastructure project (unpublished)
+**Methods:** Python, Snakemake, SLURM, GROMACS automation, PostgreSQL
+**Quantitative:** Automated sequence-to-simulation pipeline for 6 researchers, reduced per-variant setup from 4 hours to 10 minutes
+**Bullet (2L):** Automated sequence-to-simulation computational pipeline using Snakemake workflow manager, reducing per-variant setup from 4 hours to 10 minutes and supporting 6 researchers across 3 active projects.
+**Bullet (3L):** Designed and deployed automated sequence-to-simulation pipeline integrating AlphaFold2, GROMACS, and Snakemake with SLURM job scheduling --- reduced per-variant computational setup from 4 hours to 10 minutes and currently supports 6 researchers across 3 active protein engineering projects.
+**Tags:** academic, industry_rd
+**Significance:** Demonstrates software engineering and team-enabling skills beyond pure research. "6 researchers" shows collaborative impact. Unpublished -- never imply this is peer-reviewed.
+
+### L4: Transfer Learning Framework for Protein Properties
+**Source:** Chen, Rivera, Holmberg, Bioinformatics 2024
+**Methods:** ESM-2 embeddings, regression heads, active learning, Python/PyTorch
+**Quantitative:** 60% less labeled data needed, benchmarked on 5 protein families, open-source release (200+ GitHub stars)
+**Bullet (2L):** Co-developed transfer learning framework from protein language models reducing labeled training data by 60\% across 5 enzyme families, released as open-source tool with 200+ GitHub stars.
+**Bullet (3L):** Co-developed transfer learning framework leveraging ESM-2 protein language model embeddings with task-specific regression heads, reducing labeled training data requirements by 60\% across 5 enzyme families --- released as open-source Python package adopted by 4 external research groups (200+ GitHub stars).
+**Tags:** academic, industry_rd
+**Significance:** Open-source impact is strong evidence of community value. "Co-developed" verb is mandatory (shared with M. Rivera). GitHub stars provide an external validation metric.
+
+### L5: Enzyme Unfolding Pathway Analysis
+**Source:** Chen et al., ACS Catalysis 2025 (same paper as L1, secondary result)
+**Methods:** Replica exchange MD, hydrogen bond analysis, principal component analysis, MDAnalysis
+**Quantitative:** 200-ns trajectories at 300--400 K for 14 variants, discovered unfolding pathway divergence at 340 K
+**Bullet (2L):** Revealed sequence-dependent enzyme unfolding pathway divergence at 340 K through 200-ns replica exchange MD simulations, identifying stabilizing salt bridge networks that informed rational design criteria.
+**Bullet (3L):** Revealed sequence-dependent unfolding pathway divergence in 14 lipase B variants through 200-ns replica exchange MD at 300--400 K, discovering critical conformational transition at 340 K and mapping stabilizing salt bridge networks that established rational design criteria for next-generation thermostable enzymes.
+**Tags:** academic
+**Significance:** Shows ability to extract mechanistic insight from large-scale simulations, not just run them. Salt bridge analysis is an actionable design metric.
+
+### L6: Mentorship and Collaboration
+**Source:** Group activities (ongoing)
+**Methods:** N/A
+**Quantitative:** Mentored 3 graduate students, 1 co-authored publication, organized weekly group seminar
+**Bullet (2L):** Mentored 3 graduate students on protein ML pipelines and MD simulation workflows, with 1 student co-authoring a peer-reviewed publication within 8 months of joining.
+**Bullet (3L):** Mentored 3 graduate students on protein language models, MD simulation best practices, and HPC workflows --- 1 student co-authored peer-reviewed publication within 8 months; organized weekly computational biology seminar attended by 12 group members across 2 research groups.
+**Tags:** academic
+**Significance:** Mentorship evidence is critical for faculty positions. Concrete outcome (co-authored pub) is stronger than vague "guided students."
+
+---
+---
+
+# Position: Ph.D. Researcher at Westfield Institute of Technology
+
+## Dates: Aug 2018 -- Jul 2023
+
+## Cross-Position Themes (for cover letters)
+- Foundation in classical biomolecular simulation before pivoting to ML-accelerated methods
+- Built core MD and free energy skills that underpin the postdoctoral ML protein engineering work
+- Dissertation: "Enhanced Sampling Methods for Protein Folding and Ligand Binding Thermodynamics"
+
+---
+
+## Achievements
+
+### P1: Enhanced Sampling for Protein Folding
+**Source:** Chen, Alvarez, J. Chem. Theory Comput. 2022
+**Methods:** Metadynamics, GROMACS, collective variable design, Python
+**Quantitative:** Characterized folding free energy landscapes for 6 small proteins, predicted folding temperatures within 8 K of experiment
+**Bullet (2L):** Developed metadynamics-based enhanced sampling protocol for protein folding free energy landscapes, predicting folding temperatures within 8 K of experiment across 6 small proteins.
+**Bullet (3L):** Developed metadynamics-based enhanced sampling protocol for protein folding using GROMACS, designing collective variables to capture folding reaction coordinates across 6 small proteins --- predicted folding temperatures within 8 K of experimental circular dichroism measurements, establishing computational screening protocol for protein stability.
+**Tags:** academic, industry_rd
+**Significance:** Dissertation flagship result. Shows deep MD expertise predating the ML pivot. "Within 8 K" is a concrete validation metric.
+
+### P2: Force Field Benchmarking for Intrinsically Disordered Proteins
+**Source:** Chen, Alvarez, Kowalski, J. Chem. Theory Comput. 2021
+**Methods:** GROMACS (CHARMM36m, AMBER ff19SB, OPLS-AA/M, a99SB-disp), convergence testing, statistical analysis
+**Quantitative:** Benchmarked 4 force fields on 15 disordered protein sequences, established CHARMM36m as optimal for IDP ensembles
+**Bullet (2L):** Benchmarked 4 protein force fields on 15 intrinsically disordered protein sequences, establishing CHARMM36m as the optimal choice for IDP conformational ensemble prediction with 40\% better agreement with SAXS data.
+**Bullet (3L):** Benchmarked 4 protein force fields (CHARMM36m, AMBER ff19SB, OPLS-AA/M, a99SB-disp) on 15 intrinsically disordered protein sequences against NMR chemical shift data, establishing CHARMM36m as optimal for IDP ensembles --- 40\% better agreement with experimental SAXS profiles while maintaining comparable computational cost.
+**Tags:** academic, industry_rd
+**Significance:** Systematic benchmarking shows methodological rigor. Force field selection expertise is broadly applicable. Good for academic positions.
+
+### P3: Ligand Binding Free Energy Calculations
+**Source:** Chen, Alvarez, J. Med. Chem. 2023
+**Methods:** Free energy perturbation (FEP), GROMACS, PMX for alchemical transformations, enhanced sampling
+**Quantitative:** Calculated relative binding free energies for 40 congeneric ligand pairs, RMSE of 0.9 kcal/mol vs experiment
+**Bullet (2L):** Calculated relative binding free energies for 40 congeneric ligand pairs via free energy perturbation, achieving 0.9 kcal/mol RMSE against experimental IC50 data across 3 drug target families.
+**Bullet (3L):** Calculated relative binding free energies for 40 congeneric ligand pairs across 3 drug target families using free energy perturbation with enhanced sampling in GROMACS --- achieved 0.9 kcal/mol RMSE against experimental IC50 data, enabling prospective ranking of 12 novel candidates for medicinal chemistry follow-up.
+**Tags:** academic, industry_rd
+**Significance:** Shows drug discovery application of simulation skills. FEP is a high-demand technique. Complements the protein-focused work of the postdoc.
+
+### P4: Protein Stability Database and Analysis Pipeline
+**Source:** Chen, Kowalski, Alvarez, Bioinformatics 2021
+**Methods:** Python, PostgreSQL, BioPython, statistical analysis, automated data curation
+**Quantitative:** Curated 12,000 experimental melting temperatures from 3 databases, built analysis pipeline, used by 8 lab members
+**Bullet (2L):** Built curated protein thermostability database integrating 12,000 experimental melting temperatures from 3 public sources, with automated quality filters adopted by 8 lab members for ML training set construction.
+**Bullet (3L):** Built curated protein thermostability database integrating 12,000 experimental melting temperatures from ProTherm, FireProtDB, and Meltome Atlas with automated quality filters and outlier detection --- adopted by 8 lab members for ML training set construction and directly enabled postdoctoral ESM-2 fine-tuning work.
+**Tags:** academic
+**Significance:** Infrastructure work that enabled later ML research. Shows data engineering skills. Directly connects PhD to postdoc research arc.
+
+### P5: Teaching and Outreach
+**Source:** Department records (2019--2023)
+**Methods:** N/A
+**Quantitative:** TA for 4 semesters, 120+ students total, developed 3 computational lab modules
+**Bullet (2L):** Served as teaching assistant for computational biology courses across 4 semesters, developing 3 hands-on simulation lab modules adopted department-wide for 120+ students.
+**Bullet (3L):** Served as teaching assistant for computational biology courses across 4 semesters (120+ students total), developing 3 hands-on GROMACS/Python simulation lab modules subsequently adopted department-wide and contributing to course receiving highest student evaluation score in department.
+**Tags:** academic
+**Significance:** Teaching evidence for academic applications. "Adopted department-wide" shows lasting impact beyond the TA role. Omit for industry resumes.
@@ -0,0 +1,88 @@
+# Deep Learning-Guided Screening of Thermostable Enzyme Variants for Industrial Biocatalysis
+
+## Metadata
+
+- **Authors:** J. Chen, R. Nakamura, S. Patel, K. Holmberg, M. Rivera
+- **Year:** 2025
+- **Journal:** ACS Catalysis
+- **DOI:** 10.1021/acscatal.2025.XXXXX
+- **Author position:** First author
+- **Status:** Published (online Jan 2025)
+- **Citations:** 12 (as of Mar 2026)
+
+## Methods & Tools
+
+- **Protein structure:** AlphaFold2 for initial structure prediction, Rosetta for refinement
+- **ML framework:** Fine-tuned protein language model (ESM-2, 650M parameters)
+  - Architecture: transformer encoder with task-specific regression head
+  - Training data: ~45,000 experimentally measured melting temperatures from ProTherm/FireProtDB
+  - Training/validation/test split: 70/15/15
+- **MD engine:** GROMACS 2023 with CHARMM36m force field
+- **Enhanced sampling:** Replica exchange MD (T-REMD) for conformational landscape mapping
+- **Docking:** AutoDock Vina for substrate binding pose prediction
+- **Analysis:** Python (BioPython, MDAnalysis, ProDy), PyMOL for visualization
+- **Plotting:** matplotlib, seaborn for fitness landscapes and stability distributions
+- **Hardware:** 320 GPU-hours on university HPC (NVIDIA A100)
+- **Workflow:** Snakemake pipeline for automated screen-simulate-validate cycles
+- **Version control:** Git, DVC for dataset versioning
+
+## Key Results (with numbers)
+
+- Fine-tuned ESM-2 model achieving Spearman correlation of 0.82 on melting temperature prediction across 12 enzyme families
+- Validation on held-out test set: MAE = 2.3 degrees C, R-squared = 0.79
+- Screened 8,500 single- and double-mutant variants in silico in 48 hours (vs. estimated 14 months experimentally)
+- Identified 7 thermostable variants of lipase B with predicted melting temperature 15+ degrees C above wild type
+- Experimental collaborators confirmed stability improvement for 5 of 7 candidates (differential scanning calorimetry)
+- 200-ns replica exchange MD simulations revealed stabilizing salt bridge networks absent in wild type
+- Discovered sequence-dependent unfolding pathway divergence above 340 K across the variant library
+- Achieved 3,000x throughput improvement over experimental screening for equivalent hit rate
+- Transfer learning from ESM-2 reduced required training data by 60% compared to training from scratch
+- Total compute: 320 GPU-hours (training) + 1,200 CPU-hours (MD validation) vs. estimated 18 months wet-lab
+
+## Collaboration & Scope
+
+- **PI / Senior author:** K. Holmberg (Lakewood University, computational biology group lead)
+- **J. Chen's role:** Designed ML pipeline, fine-tuned protein language model, ran all MD simulations, wrote manuscript draft
+- **R. Nakamura:** Curated training data from ProTherm/FireProtDB databases
+- **S. Patel:** Experimental validation of top-7 candidates (DSC and activity assays)
+- **M. Rivera:** Snakemake workflow design (co-developed with J. Chen)
+- **Scope:** Single-lab project with experimental validation collaboration
+
+## Provenance
+
+- **Publication status:** Published, peer-reviewed
+- **Peer review notes:** 3 reviewers, 1 revision cycle, accepted after minor revisions
+- **Claiming rules:**
+  - FULL ownership: ML pipeline design, model fine-tuning, MD simulations, manuscript writing
+  - SHARED ownership: Snakemake workflow (co-developed with M. Rivera)
+  - NO ownership: Training data curation (R. Nakamura), experimental validation (S. Patel)
+- **Safe verbs for bullets:** Developed, Designed, Built, Fine-tuned (for ML work); Co-developed (for workflow)
+- **Unsafe claims:** Cannot claim experimental validation; cannot claim sole credit for workflow automation
+- **Data availability:** Trained model weights deposited on Hugging Face (open access)
+- **Code availability:** Screening pipeline on GitHub (public repo, MIT license)
+
+## Resume Bullet Seeds
+
+1. **[STAR: Protein language model for stability prediction]**
+   Situation: Enzyme thermostability screening bottlenecked by experimental throughput.
+   Task: Build ML model for rapid stability prediction across enzyme families.
+   Action: Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures.
+   Result: 0.82 Spearman correlation, screened 8,500 variants in 48 hrs, 5/7 top hits confirmed.
+
+2. **[STAR: Thermostable enzyme discovery]**
+   Situation: Industrial biocatalysis requires enzymes stable above 70 degrees C.
+   Task: Identify lipase B variants with substantially improved thermostability.
+   Action: Combined ML-accelerated screening with 200-ns replica exchange MD validation.
+   Result: Identified 7 variants with 15+ degrees C stability gain, 5 experimentally confirmed.
+
+3. **[STAR: Transfer learning pipeline]**
+   Situation: Limited labeled data for enzyme stability prediction.
+   Task: Reduce training data requirements while maintaining accuracy.
+   Action: Co-developed transfer learning pipeline from ESM-2 pretrained representations.
+   Result: 60% reduction in required training data while maintaining sub-3 degrees C MAE.
+
+4. **[STAR: Conformational dynamics]**
+   Situation: Static structure predictions cannot capture unfolding pathways.
+   Task: Reveal stabilizing interactions in engineered enzyme variants.
+   Action: Ran 200-ns T-REMD simulations of wild-type and 7 top variants at 300--400 K.
+   Result: Discovered stabilizing salt bridge networks and sequence-dependent unfolding divergence at 340 K.
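The claiming rules in the Provenance section of the extraction above tie each ownership level to a set of safe verbs. One way to enforce that on a draft bullet is sketched below in Python. The `SAFE_VERBS` mapping mirrors the "Safe verbs for bullets" line; the ownership labels `full`, `shared`, and `none`, the function name, and the first-word heuristic are assumptions for illustration only.

```python
# Safe verbs copied from the extraction's claiming rules; ownership keys are assumed labels.
SAFE_VERBS = {
    "full": {"Developed", "Designed", "Built", "Fine-tuned"},
    "shared": {"Co-developed"},
}


def leading_verb_ok(bullet, ownership):
    """True if the bullet's first word is a safe verb for the given ownership level.

    Work with no ownership (for example, experimental validation) has no safe
    verbs, so any bullet claiming it fails the check.
    """
    words = bullet.split()
    first = words[0] if words else ""
    return first in SAFE_VERBS.get(ownership, set())
```

A check this strict would also flag verbs such as "Ran" or "Combined" that the bullet seeds use, so in practice the verb sets would need to be extended or the check treated as a warning rather than a hard failure.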