Initial release — claude-resume-kit v1.0

Complete AI-assisted resume/CV generation framework:
- 6 Claude Code skills (setup-extract, setup-build-kb, make-resume, make-cl, edit-resume, critique)
- LaTeX templates (resume, CV, cover letter) with .cls class files
- 6 reference docs (shared_ops, resume_reference, cl_reference, critical_rules, session_file_template, critique_framework)
- Fictional Dr. Jordan Chen examples (extraction, experience, bundle, config, session, JD)
- Knowledge base scaffolding and config template
- README with setup guide and workflow documentation
Akhil Reddy Peeketi
2026-03-09 01:55:15 -06:00
commit c51b49882f
38 changed files with 4837 additions and 0 deletions
@@ -0,0 +1,118 @@
# Bundle: Academia
> Role-type positioning guide for university faculty and research professor positions.
---
## S1: Role Profile
**Target employers:** R1 research universities, liberal arts colleges with research programs, international universities
**Typical titles:** Assistant Professor, Associate Professor, Research Assistant Professor, Lecturer, Postdoctoral Fellow
**What they value (ranked):**
1. Independent research capability with publication record
2. Teaching experience or potential
3. Method development (not just method application)
4. Cross-disciplinary breadth (computational + experimental collaboration)
5. Mentorship and advising evidence
6. Grant-writing experience or potential for external funding (NIH, NSF)
7. Open-source contributions and community engagement
**Positioning strategy:** Lead with ML pipeline development and independent protein engineering results. Emphasize broadly applicable computational skills (protein language models, MD simulations, free energy methods). Show evidence of independence (first-author papers, open-source tools) alongside collaboration (experimental validation, mentorship).
**Differentiation angle:** Not just an MD user or an ML practitioner --- a bridge between biomolecular simulation and data-driven protein design, with production-quality software skills.
---
## S2: Summary Guide
**Tagline pattern:** [Method developer] + [application domain] + [scale/impact metric]
**Building blocks (pick 3-4 for summary):**
- ML-guided protein stability prediction (ESM-2, transfer learning)
- High-throughput virtual screening (8,500+ enzyme variants)
- Transfer learning for low-data protein property prediction
- Enhanced sampling MD (metadynamics, replica exchange, FEP)
- Enzyme solvent tolerance prediction
- Open-source tool development (200+ GitHub stars)
- Automated screening pipeline (Snakemake, SLURM)
- Consistent domain: enzyme engineering, protein stability, folding thermodynamics
**Summary do's:**
- Open with "Computational biologist" or "Protein engineer"
- Include one quantified throughput/scale metric
- Name 2-3 specific methods/tools
- Close with a research vision statement
**Summary don'ts:**
- Do not open with "Passionate" or "Motivated"
- Do not list more than 3 software tools in the summary
- Do not use buzzwords without concrete backing ("cutting-edge", "novel", "innovative")
---
## S3: Achievement Reframing Map
**Priority matrix for academic roles:**
| Priority | Achievement | Why | Reframing Notes |
|----------|------------|-----|-----------------|
| 1 (must) | L1: Enzyme Stability Screening | Core ML pipeline development + high-impact application | Lead bullet. Emphasize 3,000x throughput and independent development. |
| 2 (must) | L4: Transfer Learning Framework | Open-source impact, community adoption | Highlight GitHub stars and external adoption as evidence of research maturity. |
| 3 (must) | L3: Automated Screening Pipeline | Infrastructure contribution, team enablement | Frame as "enabling 6 researchers" -- departments value force multipliers. |
| 4 (strong) | L2: Enzyme Solvent Tolerance | Deeper enzyme engineering expertise | Natural extension of stability work into industrial conditions. Note under-review status. |
| 5 (strong) | L5: Unfolding Pathway Analysis | Mechanistic insight from simulations | Use if JD mentions dynamics, thermodynamics, or structural biology. |
| 6 (if room) | L6: Mentorship | Teaching and advising fit | Include for faculty positions; optional for postdoc applications. |
**Omit from academic resumes:** Undergraduate coursework projects, non-research achievements.
---
## S4: Skills Guide
**Bold tools (tools the JD will likely name or ATS will scan):**
- **GROMACS**, **Python**, **PyTorch**, **SLURM**
- **Machine learning** (or **protein language models** if JD uses that phrase)
**Include but do not bold:**
- AlphaFold2, Rosetta, OpenMM, RDKit, BioPython, MDAnalysis
- Snakemake, Git, Bash, PostgreSQL, Linux
**Group strategy (for skills section):**
- Group 1 -- Simulation & Modeling: GROMACS, OpenMM, AMBER, AutoDock Vina
- Group 2 -- Machine Learning: Protein language models (ESM-2), graph neural networks, transfer learning, PyTorch
- Group 3 -- Programming & HPC: Python, Bash, SLURM, Snakemake, Git
- Group 4 -- Analysis & Visualization: BioPython, MDAnalysis, ProDy, PyMOL, matplotlib
- Group 5 -- Domain Knowledge: protein engineering, drug discovery, free energy methods, enhanced sampling
**Skills to omit for academia:** Excel, PowerPoint, basic office tools (assumed; wastes space).
---
## S5: Cover Letter Guide
**Opening hook options (pick one):**
- Method-development hook: "My research develops ML-guided protein engineering pipelines that compress months of experimental screening into hours, enabling rapid discovery of thermostable enzymes and high-affinity binders."
- Scale hook: "In the past two years, I have screened over 8,500 enzyme variants using protein language models I fine-tuned, identifying 5 experimentally confirmed thermostable candidates."
- Vision hook: "The intersection of machine learning and biomolecular simulation --- where I have built my research program --- aligns closely with [Department]'s strengths in [specific area]."
**Paragraph 1 -- Research fit (3-4 sentences):**
Connect your ML protein engineering work to the department's research strengths. Name the faculty or group if known. Reference one concrete result (e.g., 3,000x throughput, 5 confirmed hits).
**Paragraph 2 -- Technical depth (3-4 sentences):**
Go deeper on method development. Mention protein language model fine-tuning, transfer learning, or solvent tolerance extension. Reference the open-source tool and its adoption.
**Paragraph 3 -- Teaching and collaboration (2-3 sentences):**
Mention mentorship of 3 students, courses you could teach, and collaborative research plans. State what you want to do next at their institution.
**Closing (1-2 sentences):**
Express enthusiasm for the specific position. Reference the JD title and department name.
**Anti-patterns:**
- Do not restate the resume bullet-for-bullet
- Do not begin with "I am writing to apply for..."
- Do not use more than one exclamation mark in the entire letter
- Do not name-drop software without saying what you did with it
---
*Source: experience_postdoc_lakewood.md, experience_phd_westfield.md, skills_taxonomy.md*
@@ -0,0 +1,101 @@
# Configuration
> Edit this file with your personal details. Every skill reads this file.
---
## Personal Info
- **Name:** Jordan Chen
- **Degree suffix:** Ph.D.
- **Email:** jordan.chen@email.com
- **Phone:** +1 5551234567
- **Location:** Richland, WA 99354
- **LinkedIn:** linkedin.com/in/jordanchen
- **Google Scholar:** scholar.google.com/citations?user=XXXXXXXXX
- **ORCID:** orcid.org/0000-0002-XXXX-XXXX
- **Website:**
---
## Document Preferences
- **Resume pages:** 2
- **CV pages:** 5
- **Resume bullet variant:** 2L (all variable bullets are 2-line)
- **CV bullet variant:** 2L/3L mix
- **Skills config (resume):** 4-3-2-2-2 (13 lines, 5 groups)
- **Skills config (CV):** 4-4-3-3-3 (17 lines, 5 groups)
- **Immigration line:** Yes | "Authorized to work in the United States"
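The skills config strings above appear to encode the number of lines per skills group, joined by dashes. A minimal sketch of parsing one and checking it against the stated totals follows; `parse_skills_config` and `check_skills_config` are hypothetical helper names, not part of the kit.

```python
def parse_skills_config(config: str) -> list[int]:
    """Split a dash-separated config such as "4-3-2-2-2" into per-group line counts."""
    return [int(part) for part in config.split("-")]


def check_skills_config(config: str, expected_lines: int, expected_groups: int) -> bool:
    """Return True when the config adds up to the stated line and group totals."""
    counts = parse_skills_config(config)
    return sum(counts) == expected_lines and len(counts) == expected_groups
```

Under these assumptions, `check_skills_config("4-3-2-2-2", 13, 5)` and `check_skills_config("4-4-3-3-3", 17, 5)` both hold, matching the resume and CV entries above.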
---
## Provenance Flags
Track the publication status of your work. Skills check this table before every output.
| Item | Status | Correct Framing |
|------|--------|----------------|
| Enzyme solvent tolerance paper (Chen, Yamamoto, Holmberg) | under review at Proteins | "under review" -- never say "published" |
| Screening pipeline tool | unpublished internal tool | "computational infrastructure I developed" -- never imply peer-reviewed |
| Stability database preprint | preprint on bioRxiv, not yet submitted | "preprint" -- do not say "published" or "under review" |
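The provenance table above can be read as a list of phrases to avoid whenever a given item is mentioned. The sketch below shows one way such a check might work. It assumes crude keyword matching on the item name, and the `FORBIDDEN` mapping and `provenance_violations` function are illustrative names, not how the skills actually implement the check.

```python
import re

# Hypothetical encoding of the table: item keyword -> phrases that must not
# appear in a bullet that mentions that item.
FORBIDDEN = {
    "solvent tolerance": ["published"],
    "screening pipeline": ["peer-reviewed", "published"],
    "stability database": ["published", "under review"],
}


def provenance_violations(bullet: str) -> list[str]:
    """List forbidden phrasings found in a bullet, matched as whole words."""
    text = bullet.lower()
    hits = []
    for item, phrases in FORBIDDEN.items():
        if item not in text:
            continue
        for phrase in phrases:
            # Word boundaries keep "unpublished" from matching "published".
            if re.search(rf"\b{re.escape(phrase)}\b", text):
                hits.append(f"{item}: avoid '{phrase}'")
    return hits
```

Matching whole words matters here because the correct framing for the pipeline tool ("unpublished") contains a forbidden word as a substring.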
---
## KB Corrections Log
Verified errors to never re-introduce. Add entries as you catch mistakes.
| Correction | Details |
|-----------|---------|
| Transfer learning framework credit | Co-developed with M. Rivera. Always use "Co-developed", never "Developed" alone. |
| ESM-2 stability prediction accuracy | 0.82 Spearman (not 0.85). Confirmed in published Table 2. |
---
## Role Types
Define the role types you're targeting. Each gets a bundle during setup.
| Role Name | Target Employers | Tier | Bundle File |
|-----------|-----------------|------|-------------|
| Academic | R1 research universities, teaching-focused colleges | 1 | bundle_academic.md |
| Industry R&D | Biotech/pharma companies | 2 | bundle_industry_rd.md |
**Tier guide:** 1 = strongest evidence, full portfolio | 2 = strong with targeted emphasis | 3 = viable with careful framing
---
## Role-Type Decision Tree
Customize this to map JD keywords to your role types.
| If JD mentions... | Primary profile | Secondary (hybrid) |
|-------------------|----------------|-------------------|
| tenure-track, faculty, assistant professor, teaching | Academic | -- |
| university, department, graduate students, NSF, NIH | Academic | Industry R&D |
| ML, machine learning, data science, R&D | Industry R&D | Academic |
| protein engineering, drug discovery, biologics | Academic | Industry R&D |
| pharma, biotech, clinical pipeline, GMP | Industry R&D | -- |
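The table above does not say how to resolve a JD that matches several rows. As a sketch under one assumption (the row with the most keyword hits wins, and ties go to the earlier row), the lookup could be expressed as follows. `DECISION_ROWS` and `pick_profile` are hypothetical names.

```python
import re

# Rows mirror the table: (keywords, primary profile, secondary profile or None).
DECISION_ROWS = [
    (["tenure-track", "faculty", "assistant professor", "teaching"], "Academic", None),
    (["university", "department", "graduate students", "nsf", "nih"], "Academic", "Industry R&D"),
    (["ml", "machine learning", "data science", "r&d"], "Industry R&D", "Academic"),
    (["protein engineering", "drug discovery", "biologics"], "Academic", "Industry R&D"),
    (["pharma", "biotech", "clinical pipeline", "gmp"], "Industry R&D", None),
]


def pick_profile(jd_text: str):
    """Return (primary, secondary) for the best-matching row, or None if nothing matches."""
    text = jd_text.lower()
    best, best_hits = None, 0
    for keywords, primary, secondary in DECISION_ROWS:
        # Whole-word matching avoids short keywords such as "ml" firing inside other words.
        hits = sum(1 for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", text))
        if hits > best_hits:  # strict ">" keeps the earlier row on ties
            best, best_hits = (primary, secondary), hits
    return best
```

A different tie-breaking rule (for example, preferring the higher-tier role type) would be equally defensible; the point is only that the table maps cleanly onto data plus a small scoring loop.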
---
## FIXED Sections
List template sections that should NEVER be modified during generation.
These are copied verbatim from your template every time.
- Education
- Publications (CV)
- Honors & Awards
- Header block (name, contact, links)
- Undergraduate Research Experience (2 bullets, never changes)
---
## Output Rules
- **Email in all outputs:** jordan.chen@email.com
- **Resume package:** 2 pages + 1-page cover letter
- **CV package:** 5 pages + 1-2 page cover letter
- **Output .tex files ONLY** -- user compiles locally
@@ -0,0 +1,75 @@
# Session: Whitfield University -- Assistant Professor, Computational Protein Engineering
## Metadata
- **JD file:** `JDs/whitfield_asst_prof_2026.txt`
- **Output folder:** `output/Whitfield_ProteinEng/`
- **Document type:** CV (5-page)
- **Role type:** Academic
- **Secondary:** --
- **Created:** 2026-03-09
- **Status:** Phase 2 complete
---
## Phase 0: JD Analysis
**Position:** Assistant Professor, Department of Biomedical Engineering
**Institution:** Whitfield University (R1 research university)
**Key requirements:**
- ML models for protein stability or design
- Molecular dynamics simulations (GROMACS, OpenMM)
- Protein structure prediction or molecular docking
- Python, HPC, collaborative research
- Publication record in computational biology
- Teaching ability or potential
- Independent research program
**ATS keywords identified:**
machine learning, protein engineering, protein language model, molecular dynamics, GROMACS, drug discovery, free energy, HPC, Python, virtual screening, enhanced sampling, tenure-track
**Bundle selected:** `bundle_academic.md`
**Experience files loaded:** `experience_postdoc_lakewood.md`, `experience_phd_westfield.md`
---
## Phase 1: Bullet Plan
### Postdoc -- Lakewood University (Aug 2023 -- Present) [4 variable bullets]
| Slot | Achievement | Variant | Rationale |
|------|------------|---------|-----------|
| 1 | L1: Enzyme Stability Screening | 2L | Lead bullet -- direct JD match (ML + protein engineering) |
| 2 | L4: Transfer Learning Framework | 2L | Open-source tool, community adoption, JD mentions "collaborative" |
| 3 | L2: Enzyme Solvent Tolerance | 2L | Deepens enzyme engineering focus; industrial applications |
| 4 | L3: Automated Screening Pipeline | 2L | JD requires HPC; infrastructure contribution |
### PhD -- Westfield (Aug 2018 -- Jul 2023) [3 variable bullets]
| Slot | Achievement | Variant | Rationale |
|------|------------|---------|-----------|
| 1 | P1: Enhanced Sampling for Folding | 2L | Method development -- PhD flagship result |
| 2 | P3: Ligand Binding Free Energy | 2L | Shows drug discovery breadth |
| 3 | P4: Stability Database Pipeline | 2L | Data infrastructure; directly enabled postdoc ML work |
### Undergrad Research -- Eastgate (2016 -- 2018) [FIXED, 2 bullets]
**Summary headline:** Computational biologist specializing in ML-guided protein engineering and biomolecular simulation, with 15 publications and open-source tools adopted by 4 external groups.
**Skills section:** 5 groups, 17 lines (4-4-3-3-3 config)
---
## Phase 2: Generation
- **Output file:** `output/Whitfield_ProteinEng/e2e_whitfield_proteineng_cv.tex`
- **Char counts verified:** All 2L bullets within 170--210 rendered chars
- **Page count:** 5 pages (confirmed via budget card)
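The character-count check above depends on how "rendered chars" is defined, which this file does not spell out. The sketch below assumes each LaTeX fragment seen in the example bullets renders as a single character and that the 2L budget is 170 to 210 inclusive. `REPLACEMENTS`, `rendered_length`, and `within_2l_budget` are illustrative names only.

```python
# Assumed substitutions for LaTeX fragments that appear in the example bullets.
# Each fragment is replaced by a one-character stand-in; the kit's real rule may differ.
REPLACEMENTS = [
    (r"$\times$", "x"),
    (r"$^\circ$", "o"),
    (r"$+$", "+"),
    (r"\%", "%"),
    ("---", "-"),  # must run before the "--" rule
    ("--", "-"),
]


def rendered_length(latex: str) -> int:
    """Approximate the rendered character count of a LaTeX bullet."""
    text = latex
    for src, dst in REPLACEMENTS:
        text = text.replace(src, dst)
    return len(text)


def within_2l_budget(latex: str, low: int = 170, high: int = 210) -> bool:
    """Check a bullet against the assumed 2-line budget (inclusive bounds)."""
    return low <= rendered_length(latex) <= high
```

Running `within_2l_budget` over each 2L bullet before compiling would flag overlong bullets early, though actual line breaks still depend on font and column width.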
---
## Decisions Log
1. Chose L1 over L5 as lead bullet -- L5 is a secondary result from the same paper, L1 is the primary contribution.
2. Omitted L6 (mentorship) -- will highlight in teaching statement instead; space better used for L2.
3. Used "Co-developed" for L4 per the KB Corrections Log in config.md (shared with M. Rivera).
4. Solvent tolerance bullet notes "under review" status per config.md provenance table.
@@ -0,0 +1,127 @@
# Position: Postdoctoral Research Associate at Lakewood University
## Dates: Aug 2023 -- Present
## Cross-Position Themes (for cover letters)
- Research trajectory: classical protein simulation (PhD) to ML-accelerated protein engineering (postdoc)
- Recurring architecture pattern: experimental data -> ML surrogate -> large-scale computational screening
- Consistent focus: protein stability and folding thermodynamics throughout career
---
## Achievements
### L1: ML-Guided Enzyme Stability Screening
**Source:** Chen et al., ACS Catalysis 2025
**Methods:** ESM-2 protein language model, GROMACS, replica exchange MD, Python/BioPython
**Quantitative:** 0.82 Spearman on stability prediction, 3,000x throughput vs experiment, 8,500 variants screened, 5 confirmed hits
**Bullet (2L):** Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures, achieving 0.82 Spearman correlation and enabling 3,000$\times$ throughput screening of 8,500 enzyme variants for industrial thermostability.
**Bullet (3L):** Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures with transfer learning, achieving 0.82 Spearman correlation and 3,000$\times$ throughput over experimental screening --- identified 7 thermostable lipase variants with 15$+$ $^\circ$C stability gain, 5 experimentally confirmed via differential scanning calorimetry.
**Tags:** academic, industry_rd
**Significance:** Demonstrates independent ML pipeline development and protein engineering impact. 3,000x speedup is a concrete metric. Published first-author in high-impact journal.
### L2: Enzyme Solvent Tolerance Prediction
**Source:** Chen, Yamamoto, Holmberg, Proteins: Structure, Function, and Bioinformatics 2025 (under review)
**Methods:** ESM-2 fine-tuning, GROMACS, explicit solvent MD, MM/PBSA free energy
**Quantitative:** 0.78 Spearman on solvent tolerance, 50-ns MD of 80 enzyme-solvent systems, 4 solvent-tolerant variants identified
**Bullet (2L):** Extended protein language model to predict enzyme solvent tolerance across 8 organic co-solvent systems, validating against 50-ns explicit-solvent MD for 80 enzyme variants and identifying 4 candidates for green chemistry applications.
**Bullet (3L):** Extended protein language model to predict enzyme solvent tolerance across 8 organic co-solvent systems (0.78 Spearman on held-out set), validated against free energy calculations from 50-ns explicit-solvent molecular dynamics for 80 enzyme variants --- identified 4 solvent-tolerant lipase candidates now under experimental characterization for green chemistry applications.
**Tags:** academic, industry_rd
**Significance:** Deepens enzyme engineering expertise into industrial conditions. Natural extension of thermostability work. Under-review status must be stated clearly.
### L3: Automated Screening Pipeline
**Source:** Internal infrastructure project (unpublished)
**Methods:** Python, Snakemake, SLURM, GROMACS automation, PostgreSQL
**Quantitative:** Automated sequence-to-simulation pipeline for 6 researchers, reduced per-variant setup from 4 hours to 10 minutes
**Bullet (2L):** Automated sequence-to-simulation computational pipeline using Snakemake workflow manager, reducing per-variant setup from 4 hours to 10 minutes and supporting 6 researchers across 3 active projects.
**Bullet (3L):** Designed and deployed automated sequence-to-simulation pipeline integrating AlphaFold2, GROMACS, and Snakemake with SLURM job scheduling --- reduced per-variant computational setup from 4 hours to 10 minutes and currently supports 6 researchers across 3 active protein engineering projects.
**Tags:** academic, industry_rd
**Significance:** Demonstrates software engineering and team-enabling skills beyond pure research. "6 researchers" shows collaborative impact. Unpublished -- never imply this is peer-reviewed.
### L4: Transfer Learning Framework for Protein Properties
**Source:** Chen, Rivera, Holmberg, Bioinformatics 2024
**Methods:** ESM-2 embeddings, regression heads, active learning, Python/PyTorch
**Quantitative:** 60% less labeled data needed, benchmarked on 5 protein families, open-source release (200+ GitHub stars)
**Bullet (2L):** Co-developed transfer learning framework from protein language models reducing labeled training data by 60\% across 5 enzyme families, released as open-source tool with 200+ GitHub stars.
**Bullet (3L):** Co-developed transfer learning framework leveraging ESM-2 protein language model embeddings with task-specific regression heads, reducing labeled training data requirements by 60\% across 5 enzyme families --- released as open-source Python package adopted by 4 external research groups (200+ GitHub stars).
**Tags:** academic, industry_rd
**Significance:** Open-source impact is strong evidence of community value. "Co-developed" verb is mandatory (shared with M. Rivera). GitHub stars provide external validation metric.
### L5: Enzyme Unfolding Pathway Analysis
**Source:** Chen et al., ACS Catalysis 2025 (same paper as L1, secondary result)
**Methods:** Replica exchange MD, hydrogen bond analysis, principal component analysis, MDAnalysis
**Quantitative:** 200-ns trajectories at 300--400 K for 14 variants, discovered unfolding pathway divergence at 340 K
**Bullet (2L):** Revealed sequence-dependent enzyme unfolding pathway divergence at 340 K through 200-ns replica exchange MD simulations, identifying stabilizing salt bridge networks that informed rational design criteria.
**Bullet (3L):** Revealed sequence-dependent unfolding pathway divergence in 14 lipase B variants through 200-ns replica exchange MD at 300--400 K, discovering critical conformational transition at 340 K and mapping stabilizing salt bridge networks that established rational design criteria for next-generation thermostable enzymes.
**Tags:** academic
**Significance:** Shows ability to extract mechanistic insight from large-scale simulations, not just run them. Salt bridge analysis is an actionable design metric.
### L6: Mentorship and Collaboration
**Source:** Group activities (ongoing)
**Methods:** N/A
**Quantitative:** Mentored 3 graduate students, 1 co-authored publication, organized weekly group seminar
**Bullet (2L):** Mentored 3 graduate students on protein ML pipelines and MD simulation workflows, with 1 student co-authoring a peer-reviewed publication within 8 months of joining.
**Bullet (3L):** Mentored 3 graduate students on protein language models, MD simulation best practices, and HPC workflows --- 1 student co-authored peer-reviewed publication within 8 months; organized weekly computational biology seminar attended by 12 group members across 2 research groups.
**Tags:** academic
**Significance:** Mentorship evidence is critical for faculty positions. Concrete outcome (co-authored pub) is stronger than vague "guided students."
---
---
# Position: Ph.D. Researcher at Westfield Institute of Technology
## Dates: Aug 2018 -- Jul 2023
## Cross-Position Themes (for cover letters)
- Foundation in classical biomolecular simulation before pivoting to ML-accelerated methods
- Built core MD and free energy skills that underpin the postdoc's ML protein engineering work
- Dissertation: "Enhanced Sampling Methods for Protein Folding and Ligand Binding Thermodynamics"
---
## Achievements
### P1: Enhanced Sampling for Protein Folding
**Source:** Chen, Alvarez, J. Chem. Theory Comput. 2022
**Methods:** Metadynamics, GROMACS, collective variable design, Python
**Quantitative:** Characterized folding free energy landscapes for 6 small proteins, predicted folding temperatures within 8 K of experiment
**Bullet (2L):** Developed metadynamics-based enhanced sampling protocol for protein folding free energy landscapes, predicting folding temperatures within 8 K of experiment across 6 small proteins.
**Bullet (3L):** Developed metadynamics-based enhanced sampling protocol for protein folding using GROMACS, designing collective variables to capture folding reaction coordinates across 6 small proteins --- predicted folding temperatures within 8 K of experimental circular dichroism measurements, establishing computational screening protocol for protein stability.
**Tags:** academic, industry_rd
**Significance:** Dissertation flagship result. Shows deep MD expertise predating the ML pivot. "Within 8 K" is a concrete validation metric.
### P2: Force Field Benchmarking for Intrinsically Disordered Proteins
**Source:** Chen, Alvarez, Kowalski, J. Chem. Theory Comput. 2021
**Methods:** GROMACS (CHARMM36m, AMBER ff19SB, OPLS-AA/M, a99SB-disp), convergence testing, statistical analysis
**Quantitative:** Benchmarked 4 force fields on 15 disordered protein sequences, established CHARMM36m as optimal for IDP ensembles
**Bullet (2L):** Benchmarked 4 protein force fields on 15 intrinsically disordered protein sequences, establishing CHARMM36m as the optimal choice for IDP conformational ensemble prediction with 40\% better agreement with SAXS data.
**Bullet (3L):** Benchmarked 4 protein force fields (CHARMM36m, AMBER ff19SB, OPLS-AA/M, a99SB-disp) on 15 intrinsically disordered protein sequences against NMR chemical shift data, establishing CHARMM36m as optimal for IDP ensembles --- 40\% better agreement with experimental SAXS profiles while maintaining comparable computational cost.
**Tags:** academic, industry_rd
**Significance:** Systematic benchmarking shows methodological rigor. Force field selection expertise is broadly applicable. Good for academic positions.
### P3: Ligand Binding Free Energy Calculations
**Source:** Chen, Alvarez, J. Med. Chem. 2023
**Methods:** Free energy perturbation (FEP), GROMACS, PMX for alchemical transformations, enhanced sampling
**Quantitative:** Calculated relative binding free energies for 40 congeneric ligand pairs, RMSE of 0.9 kcal/mol vs experiment
**Bullet (2L):** Calculated relative binding free energies for 40 congeneric ligand pairs via free energy perturbation, achieving 0.9 kcal/mol RMSE against experimental IC50 data across 3 drug target families.
**Bullet (3L):** Calculated relative binding free energies for 40 congeneric ligand pairs across 3 drug target families using free energy perturbation with enhanced sampling in GROMACS --- achieved 0.9 kcal/mol RMSE against experimental IC50 data, enabling prospective ranking of 12 novel candidates for medicinal chemistry follow-up.
**Tags:** academic, industry_rd
**Significance:** Shows drug discovery application of simulation skills. FEP is a high-demand technique. Complements the protein-focused work of the postdoc.
### P4: Protein Stability Database and Analysis Pipeline
**Source:** Chen, Kowalski, Alvarez, Bioinformatics 2021
**Methods:** Python, PostgreSQL, BioPython, statistical analysis, automated data curation
**Quantitative:** Curated 12,000 experimental melting temperatures from 3 databases, built analysis pipeline, used by 8 lab members
**Bullet (2L):** Built curated protein thermostability database integrating 12,000 experimental melting temperatures from 3 public sources, with automated quality filters adopted by 8 lab members for ML training set construction.
**Bullet (3L):** Built curated protein thermostability database integrating 12,000 experimental melting temperatures from ProTherm, FireProtDB, and Meltome Atlas with automated quality filters and outlier detection --- adopted by 8 lab members for ML training set construction and directly enabled postdoctoral ESM-2 fine-tuning work.
**Tags:** academic
**Significance:** Infrastructure work that enabled later ML research. Shows data engineering skills. Directly connects PhD to postdoc research arc.
### P5: Teaching and Outreach
**Source:** Department records (2019--2023)
**Methods:** N/A
**Quantitative:** TA for 4 semesters, 120+ students total, developed 3 computational lab modules
**Bullet (2L):** Served as teaching assistant for computational biology courses across 4 semesters, developing 3 hands-on simulation lab modules adopted department-wide for 120+ students.
**Bullet (3L):** Served as teaching assistant for computational biology courses across 4 semesters (120+ students total), developing 3 hands-on GROMACS/Python simulation lab modules subsequently adopted department-wide and contributing to course receiving highest student evaluation score in department.
**Tags:** academic
**Significance:** Teaching evidence for academic applications. "Adopted department-wide" shows lasting impact beyond the TA role. Omit for industry resumes.
@@ -0,0 +1,88 @@
# Deep Learning-Guided Screening of Thermostable Enzyme Variants for Industrial Biocatalysis
## Metadata
- **Authors:** J. Chen, R. Nakamura, S. Patel, K. Holmberg, M. Rivera
- **Year:** 2025
- **Journal:** ACS Catalysis
- **DOI:** 10.1021/acscatal.2025.XXXXX
- **Author position:** First author
- **Status:** Published (online Jan 2025)
- **Citations:** 12 (as of Mar 2026)
## Methods & Tools
- **Protein structure:** AlphaFold2 for initial structure prediction, Rosetta for refinement
- **ML framework:** Fine-tuned protein language model (ESM-2, 650M parameters)
- Architecture: transformer encoder with task-specific regression head
- Training data: ~45,000 experimentally measured melting temperatures from ProTherm/FireProtDB
- Training/validation/test split: 70/15/15
- **MD engine:** GROMACS 2023 with CHARMM36m force field
- **Enhanced sampling:** Replica exchange MD (T-REMD) for conformational landscape mapping
- **Docking:** AutoDock Vina for substrate binding pose prediction
- **Analysis:** Python (BioPython, MDAnalysis, ProDy), PyMOL for visualization
- **Plotting:** matplotlib, seaborn for fitness landscapes and stability distributions
- **Hardware:** 320 GPU-hours on university HPC (NVIDIA A100)
- **Workflow:** Snakemake pipeline for automated screen-simulate-validate cycles
- **Version control:** Git, DVC for dataset versioning
## Key Results (with numbers)
- Fine-tuned ESM-2 model achieving Spearman correlation of 0.82 on melting temperature prediction across 12 enzyme families
- Validation on held-out test set: MAE = 2.3 degrees C, R-squared = 0.79
- Screened 8,500 single- and double-mutant variants in silico in 48 hours (vs. estimated 14 months experimentally)
- Identified 7 thermostable variants of lipase B with predicted melting temperature 15+ degrees C above wild type
- Experimental collaborators confirmed stability improvement for 5 of 7 candidates (differential scanning calorimetry)
- 200-ns replica exchange MD simulations revealed stabilizing salt bridge networks absent in wild type
- Discovered sequence-dependent unfolding pathway divergence above 340 K across the variant library
- Achieved 3,000x throughput improvement over experimental screening for equivalent hit rate
- Transfer learning from ESM-2 reduced required training data by 60% compared to training from scratch
- Total compute: 320 GPU-hours (training) + 1,200 CPU-hours (MD validation) vs. estimated 18 months wet-lab
## Collaboration & Scope
- **PI / Senior author:** K. Holmberg (Lakewood University, computational biology group lead)
- **J. Chen's role:** Designed ML pipeline, fine-tuned protein language model, ran all MD simulations, wrote manuscript draft
- **R. Nakamura:** Curated training data from ProTherm/FireProtDB databases
- **S. Patel:** Experimental validation of top-7 candidates (DSC and activity assays)
- **M. Rivera:** Snakemake workflow design (co-developed with J. Chen)
- **Scope:** Single-lab project with experimental validation collaboration
## Provenance
- **Publication status:** Published, peer-reviewed
- **Peer review notes:** 3 reviewers, 1 revision cycle, accepted after minor revisions
- **Claiming rules:**
- FULL ownership: ML pipeline design, model fine-tuning, MD simulations, manuscript writing
- SHARED ownership: Snakemake workflow (co-developed with M. Rivera)
- NO ownership: Training data curation (R. Nakamura), experimental validation (S. Patel)
- **Safe verbs for bullets:** Developed, Designed, Built, Fine-tuned (for ML work); Co-developed (for workflow)
- **Unsafe claims:** Cannot claim experimental validation; cannot claim sole credit for workflow automation
- **Data availability:** Trained model weights deposited on Hugging Face (open access)
- **Code availability:** Screening pipeline on GitHub (public repo, MIT license)
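The claiming rules above lend themselves to a simple lint on the leading verb of each bullet. The sketch below makes two assumptions: the first word of a bullet is its verb, and any mention of "workflow" marks shared-ownership work. The names `ML_VERBS`, `WORKFLOW_VERB`, and `leading_verb_ok` are hypothetical.

```python
# Verbs taken from the "Safe verbs for bullets" line above.
ML_VERBS = {"developed", "designed", "built", "fine-tuned"}
WORKFLOW_VERB = "co-developed"


def leading_verb_ok(bullet: str) -> bool:
    """Check that a bullet opens with a verb allowed by the claiming rules."""
    words = bullet.split()
    if not words:
        return False
    verb = words[0].lower()
    if "workflow" in bullet.lower():
        # Shared ownership: only the co-credit verb is acceptable.
        return verb == WORKFLOW_VERB
    # Full-ownership work: sole-credit verbs are fine, and co-credit never overclaims.
    return verb in ML_VERBS or verb == WORKFLOW_VERB
```

A bullet such as "Validated 5 candidates experimentally" fails this check, which is consistent with the rule that experimental validation cannot be claimed.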
## Resume Bullet Seeds
1. **[STAR: Protein language model for stability prediction]**
Situation: Enzyme thermostability screening bottlenecked by experimental throughput.
Task: Build ML model for rapid stability prediction across enzyme families.
Action: Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures.
Result: 0.82 Spearman correlation, screened 8,500 variants in 48 hrs, 5/7 top hits confirmed.
2. **[STAR: Thermostable enzyme discovery]**
Situation: Industrial biocatalysis requires enzymes stable above 70 degrees C.
Task: Identify lipase B variants with substantially improved thermostability.
Action: Combined ML-accelerated screening with 200-ns replica exchange MD validation.
Result: Identified 7 variants with 15+ degrees C stability gain, 5 experimentally confirmed.
3. **[STAR: Transfer learning pipeline]**
Situation: Limited labeled data for enzyme stability prediction.
Task: Reduce training data requirements while maintaining accuracy.
Action: Co-developed transfer learning pipeline from ESM-2 pretrained representations.
Result: 60% reduction in required training data while maintaining sub-3 degrees C MAE.
4. **[STAR: Conformational dynamics]**
Situation: Static structure predictions cannot capture unfolding pathways.
Task: Reveal stabilizing interactions in engineered enzyme variants.
Action: Ran 200-ns T-REMD simulations of wild-type and 7 top variants at 300--400 K.
Result: Discovered stabilizing salt bridge networks and sequence-dependent unfolding divergence at 340 K.