Initial release — claude-resume-kit v1.0
Complete AI-assisted resume/CV generation framework:

- 6 Claude Code skills (setup-extract, setup-build-kb, make-resume, make-cl, edit-resume, critique)
- LaTeX templates (resume, CV, cover letter) with .cls class files
- 6 reference docs (shared_ops, resume_reference, cl_reference, critical_rules, session_file_template, critique_framework)
- Fictional Dr. Jordan Chen examples (extraction, experience, bundle, config, session, JD)
- Knowledge base scaffolding and config template
- README with setup guide and workflow documentation
@@ -0,0 +1,118 @@
# Bundle: Academia

> Role-type positioning guide for university faculty and research professor positions.

---
## S1: Role Profile

**Target employers:** R1 research universities, liberal arts colleges with research programs, international universities
**Typical titles:** Assistant Professor, Associate Professor, Research Assistant Professor, Lecturer, Postdoctoral Fellow
**What they value (ranked):**
1. Independent research capability with publication record
2. Teaching experience or potential
3. Method development (not just method application)
4. Cross-disciplinary breadth (computational + experimental collaboration)
5. Mentorship and advising evidence
6. Grant-writing experience or potential for external funding (NIH, NSF)
7. Open-source contributions and community engagement

**Positioning strategy:** Lead with ML pipeline development and independent protein engineering results. Emphasize broadly applicable computational skills (protein language models, MD simulations, free energy methods). Show evidence of independence (first-author papers, open-source tools) alongside collaboration (experimental validation, mentorship).

**Differentiation angle:** Not just an MD user or an ML practitioner, but a bridge between biomolecular simulation and data-driven protein design, with production-quality software skills.

---
## S2: Summary Guide

**Tagline pattern:** [Method developer] + [application domain] + [scale/impact metric]

**Building blocks (pick 3-4 for summary):**
- ML-guided protein stability prediction (ESM-2, transfer learning)
- High-throughput virtual screening (8,500 enzyme variants)
- Transfer learning for low-data protein property prediction
- Enhanced sampling MD (metadynamics, replica exchange, FEP)
- Enzyme solvent tolerance prediction
- Open-source tool development (200+ GitHub stars)
- Automated screening pipeline (Snakemake, SLURM)
- Consistent domain: enzyme engineering, protein stability, folding thermodynamics

**Summary do's:**
- Open with "Computational biologist" or "Protein engineer"
- Include one quantified throughput/scale metric
- Name 2-3 specific methods/tools
- Close with a research vision statement

**Summary don'ts:**
- Do not open with "Passionate" or "Motivated"
- Do not list more than 3 software tools in the summary
- Do not use buzzwords without concrete backing ("cutting-edge", "novel", "innovative")
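
Most of the do's and don'ts above are mechanical, so a draft summary can be screened before it reaches the template. Below is a minimal sketch that assumes the summary is plain text. `check_summary` and the `KNOWN_TOOLS` list are illustrative assumptions and are not part of the kit's skills. The sketch also flags every buzzword occurrence, not only unbacked ones, which is a deliberately conservative choice.

```python
import re

# Rule data from the do's/don'ts above; KNOWN_TOOLS is an illustrative assumption.
ALLOWED_OPENERS = ("Computational biologist", "Protein engineer")
BANNED_OPENERS = ("Passionate", "Motivated")
BUZZWORDS = ("cutting-edge", "novel", "innovative")
KNOWN_TOOLS = ("GROMACS", "OpenMM", "PyTorch", "AlphaFold2", "Rosetta", "Snakemake", "SLURM", "ESM-2")


def check_summary(summary: str) -> list:
    """Return rule violations for a draft summary; an empty list means it passes."""
    problems = []
    text = summary.strip()
    if text.startswith(BANNED_OPENERS):
        problems.append("opens with a banned word")
    elif not text.startswith(ALLOWED_OPENERS):
        problems.append("does not open with an allowed role noun")
    if not re.search(r"\d", text):
        problems.append("no quantified metric")
    # Case-insensitive substring match, so "gromacs" and "GROMACS" both count.
    tools = [t for t in KNOWN_TOOLS if t.lower() in text.lower()]
    if len(tools) > 3:
        problems.append(f"names {len(tools)} tools (max 3)")
    # Conservative: flags every buzzword, even one that is backed by a number.
    for word in BUZZWORDS:
        if re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE):
            problems.append(f"buzzword: {word}")
    return problems
```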

---
## S3: Achievement Reframing Map

**Priority matrix for academic roles:**

| Priority | Achievement | Why | Reframing Notes |
|----------|-------------|-----|-----------------|
| 1 (must) | L1: Enzyme Stability Screening | Core ML pipeline development + high-impact application | Lead bullet. Emphasize 3,000x throughput and independent development. |
| 2 (must) | L4: Transfer Learning Framework | Open-source impact, community adoption | Highlight GitHub stars and external adoption as evidence of research maturity. Always use "Co-developed" (shared with M. Rivera). |
| 3 (must) | L3: Automated Screening Pipeline | Infrastructure contribution, team enablement | Frame as "enabling 6 researchers"; departments value force multipliers. Unpublished internal tool, so never imply peer review. |
| 4 (strong) | L2: Enzyme Solvent Tolerance | Deeper enzyme engineering expertise | Natural extension of stability work into industrial conditions. Note under-review status. |
| 5 (strong) | L5: Unfolding Pathway Analysis | Mechanistic insight from simulations | Use if JD mentions dynamics, thermodynamics, or structural biology. |
| 6 (if room) | L6: Mentorship | Teaching and advising fit | Include for faculty positions; optional for postdoc applications. |

**Omit from academic resumes:** Undergraduate coursework projects, non-research achievements.
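
Filling a fixed number of bullet slots from the matrix amounts to a walk in priority order. The only complication is one conditional row: L5 is used only when the JD mentions dynamics, thermodynamics, or structural biology. Below is a minimal sketch. `plan_slots` and the tuple layout are hypothetical. Treating "if room" as simply "last in line" is an assumption about how the matrix is meant to be applied.

```python
# (priority, achievement ID, JD trigger terms or None) from the matrix above.
PRIORITY_MATRIX = [
    (1, "L1", None),
    (2, "L4", None),
    (3, "L3", None),
    (4, "L2", None),
    (5, "L5", ("dynamics", "thermodynamics", "structural biology")),
    (6, "L6", None),
]


def plan_slots(jd_text: str, n_slots: int) -> list:
    """Return up to n_slots achievement IDs in priority order."""
    jd = jd_text.lower()
    chosen = []
    for _priority, achievement, triggers in sorted(PRIORITY_MATRIX, key=lambda row: row[0]):
        # Conditional rows are skipped unless one of their trigger terms appears in the JD.
        if triggers and not any(term in jd for term in triggers):
            continue
        chosen.append(achievement)
        if len(chosen) == n_slots:
            break
    return chosen
```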

---
## S4: Skills Guide

**Bold tools (tools the JD is likely to name or an ATS will scan for):**
- **GROMACS**, **Python**, **PyTorch**, **SLURM**
- **Machine learning** (or **protein language models** if the JD uses that phrase)

**Include but do not bold:**
- AlphaFold2, Rosetta, OpenMM, RDKit, BioPython, MDAnalysis
- Snakemake, Git, Bash, PostgreSQL, Linux

**Group strategy (for skills section):**
- Group 1 -- Simulation & Modeling: GROMACS, OpenMM, AMBER, AutoDock Vina
- Group 2 -- Machine Learning: Protein language models (ESM-2), graph neural networks, transfer learning, PyTorch
- Group 3 -- Programming & HPC: Python, Bash, SLURM, Snakemake, Git
- Group 4 -- Analysis & Visualization: BioPython, MDAnalysis, ProDy, PyMOL, matplotlib
- Group 5 -- Domain Knowledge: protein engineering, drug discovery, free energy methods, enhanced sampling

**Skills to omit for academia:** Excel, PowerPoint, basic office tools (these are assumed; listing them wastes space).
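
When a skills group is rendered into the LaTeX template, the bold/plain split above can be applied mechanically. Below is a minimal sketch. It assumes each group is emitted as one comma-separated line and that bold text uses `\textbf{}`. `render_skill_group`, `escape_latex`, and the `BOLD_TOOLS` set are illustrative, and the kit's `.cls` templates may lay out skills differently. An `&` in a group name such as "Simulation & Modeling" must be escaped, or LaTeX will fail to compile.

```python
# Tools the guide says to bold; both ML phrasings are included for simplicity.
BOLD_TOOLS = {"GROMACS", "Python", "PyTorch", "SLURM", "Machine learning", "Protein language models"}
# A few characters that would otherwise break LaTeX compilation.
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#"}


def escape_latex(text: str) -> str:
    """Escape the LaTeX special characters listed in LATEX_SPECIALS."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_skill_group(label: str, skills: list) -> str:
    """Render one skills group as a LaTeX line, bolding only the designated tools."""
    rendered = [
        rf"\textbf{{{escape_latex(s)}}}" if s in BOLD_TOOLS else escape_latex(s)
        for s in skills
    ]
    return f"{escape_latex(label)}: " + ", ".join(rendered)
```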

---
## S5: Cover Letter Guide

**Opening hook options (pick one):**
- Method-development hook: "My research develops ML-guided protein engineering pipelines that compress months of experimental screening into hours, enabling rapid discovery of thermostable enzyme variants."
- Scale hook: "In the past two years, I have screened 8,500 enzyme variants using a protein language model I fine-tuned, identifying 7 thermostable candidates, 5 of which collaborators confirmed experimentally."
- Vision hook: "The intersection of machine learning and biomolecular simulation, where I have built my research program, aligns closely with [Department]'s strengths in [specific area]."

**Paragraph 1 -- Research fit (3-4 sentences):**
Connect your ML protein engineering work to the department's research strengths. Name the faculty or group if known. Reference one concrete result (e.g., 3,000x throughput, 5 confirmed hits).

**Paragraph 2 -- Technical depth (3-4 sentences):**
Go deeper on method development. Mention protein language model fine-tuning, transfer learning, or the solvent tolerance extension. Reference the open-source tool and its adoption.

**Paragraph 3 -- Teaching and collaboration (2-3 sentences):**
Mention mentorship of 3 graduate students, courses you could teach, and collaborative research plans. State what you want to do next at their institution.

**Closing (1-2 sentences):**
Express enthusiasm for the specific position. Reference the JD title and department name.

**Anti-patterns:**
- Do not restate the resume bullet-for-bullet
- Do not begin with "I am writing to apply for..."
- Do not use more than one exclamation mark in the entire letter
- Do not name-drop software without saying what you did with it
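
Two of the anti-patterns can be checked mechanically: the stock opening and the exclamation-mark limit. Verbatim reuse of a resume bullet is also a cheap proxy for "restating the resume." Below is a minimal sketch; `lint_cover_letter` is a hypothetical helper. The name-dropping rule is left to human or model review, because it depends on meaning rather than wording.

```python
def lint_cover_letter(letter: str, resume_bullets: list) -> list:
    """Return warnings for mechanical cover-letter anti-patterns."""
    warnings = []
    text = " ".join(letter.split())  # collapse line breaks and repeated spaces
    if text.lower().startswith("i am writing to apply"):
        warnings.append("opens with 'I am writing to apply...'")
    exclamations = text.count("!")
    if exclamations > 1:
        warnings.append(f"{exclamations} exclamation marks (max 1)")
    for bullet in resume_bullets:
        # Only exact (whitespace-normalized) reuse is caught; paraphrases are not.
        if " ".join(bullet.split()) in text:
            warnings.append(f"restates resume bullet verbatim: {bullet[:40]}...")
    return warnings
```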

---

*Source: experience_postdoc_lakewood.md, experience_phd_westfield.md, skills_taxonomy.md*
@@ -0,0 +1,101 @@
# Configuration

> Edit this file with your personal details. Every skill reads this file.

---
## Personal Info

- **Name:** Jordan Chen
- **Degree suffix:** Ph.D.
- **Email:** jordan.chen@email.com
- **Phone:** +1 5551234567
- **Location:** Richland, WA 99354
- **LinkedIn:** linkedin.com/in/jordanchen
- **Google Scholar:** scholar.google.com/citations?user=XXXXXXXXX
- **ORCID:** orcid.org/0000-0002-XXXX-XXXX
- **Website:**

---
## Document Preferences

- **Resume pages:** 2
- **CV pages:** 5
- **Resume bullet variant:** 2L (all variable bullets are 2-line)
- **CV bullet variant:** 2L/3L mix
- **Skills config (resume):** 4-3-2-2-2 (13 lines, 5 groups)
- **Skills config (CV):** 4-4-3-3-3 (17 lines, 5 groups)
- **Immigration line:** Yes | "Authorized to work in the United States"
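
Each skills config appears to list lines per group, so a spec can be checked against its stated totals before generation. That reading matches both totals above: 4+3+2+2+2 = 13 and 4+4+3+3+3 = 17. Below is a minimal sketch under that assumption; `parse_skills_config` is a hypothetical helper.

```python
def parse_skills_config(spec: str, expected_lines: int, expected_groups: int) -> list:
    """Parse a dash-separated lines-per-group spec like '4-3-2-2-2' and check its totals."""
    counts = [int(part) for part in spec.split("-")]
    if len(counts) != expected_groups:
        raise ValueError(f"{spec}: {len(counts)} groups, expected {expected_groups}")
    if sum(counts) != expected_lines:
        raise ValueError(f"{spec}: {sum(counts)} lines, expected {expected_lines}")
    return counts
```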

---
## Provenance Flags

Track the publication status of your work. Skills check this table before every output.

| Item | Status | Correct Framing |
|------|--------|-----------------|
| Enzyme solvent tolerance paper (Chen, Yamamoto, Holmberg) | under review at Proteins | "under review" -- never say "published" |
| Screening pipeline tool | unpublished internal tool | "computational infrastructure I developed" -- never imply peer-reviewed |
| Stability database preprint | preprint on bioRxiv, not yet submitted to a journal | "preprint" -- do not say "published" or "under review" |
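
Before output, a skill can scan any text that mentions a flagged item for wording the table forbids. Below is a minimal sketch. The short status keys (`under review`, `unpublished internal tool`, `preprint`) and `provenance_violations` are illustrative. A keyword scan will not catch paraphrases such as "appeared in Proteins."

```python
import re

# Forbidden wording per status, following the "Correct Framing" column above.
FORBIDDEN = {
    "under review": [r"\bpublished\b"],
    "unpublished internal tool": [r"\bpeer[- ]reviewed\b", r"\bpublished\b"],
    "preprint": [r"\bpublished\b", r"\bunder review\b"],
}


def provenance_violations(text: str, status: str) -> list:
    """Return the forbidden patterns that match text describing an item with this status."""
    lowered = text.lower()
    # \b keeps "unpublished" from matching the "published" pattern.
    return [pattern for pattern in FORBIDDEN.get(status, []) if re.search(pattern, lowered)]
```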

---
## KB Corrections Log

Verified errors to never re-introduce. Add entries as you catch mistakes.

| Correction | Details |
|------------|---------|
| Transfer learning framework credit | Co-developed with M. Rivera. Always use "Co-developed", never "Developed" alone. |
| ESM-2 stability prediction accuracy | 0.82 Spearman (not 0.85). Confirmed in published Table 2. |
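
Each log entry can double as a regression check run over generated text. Below is a minimal sketch; `kb_errors` is a hypothetical helper. The regular expressions are illustrative approximations of the two entries above: they catch common phrasings, not every possible one.

```python
import re

# (description, pattern signalling that the logged error has reappeared)
CORRECTIONS = [
    (
        "Transfer learning framework must be 'Co-developed'",
        # "developed" not preceded by "co-", followed in the same sentence by the framework name
        re.compile(r"(?<!co-)\bdeveloped\b(?=[^.]*transfer learning framework)", re.IGNORECASE),
    ),
    (
        "ESM-2 stability Spearman is 0.82, not 0.85",
        re.compile(r"0\.85\s+spearman|spearman[^.]{0,30}0\.85", re.IGNORECASE),
    ),
]


def kb_errors(text: str) -> list:
    """Return the description of every logged correction that text violates."""
    return [description for description, pattern in CORRECTIONS if pattern.search(text)]
```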

---
## Role Types

Define the role types you're targeting. Each gets a bundle during setup.

| Role Name | Target Employers | Tier | Bundle File |
|-----------|------------------|------|-------------|
| Academic | R1 research universities, teaching-focused colleges | 1 | bundle_academic.md |
| Industry R&D | Biotech/pharma companies | 2 | bundle_industry_rd.md |

**Tier guide:** 1 = strongest evidence, full portfolio | 2 = strong with targeted emphasis | 3 = viable with careful framing

---
## Role-Type Decision Tree

Customize this to map JD keywords to your role types.

| If JD mentions... | Primary profile | Secondary (hybrid) |
|-------------------|-----------------|--------------------|
| tenure-track, faculty, assistant professor, teaching | Academic | -- |
| university, department, graduate students, NSF, NIH | Academic | Industry R&D |
| ML, machine learning, data science, R&D | Industry R&D | Academic |
| protein engineering, drug discovery, biologics | Academic | Industry R&D |
| pharma, biotech, clinical pipeline, GMP | Industry R&D | -- |
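
The table can be applied top-down, taking the first row whose keywords appear in the JD. Below is a minimal sketch; `classify_jd` is a hypothetical helper. Reading the table as first-match-wins, rather than as a vote across rows, is an assumption. Word boundaries keep short keywords such as "ML" from matching inside unrelated words.

```python
import re
from typing import Optional, Tuple

# (trigger keywords, primary profile, secondary profile or None), in table order.
DECISION_TREE = [
    (("tenure-track", "faculty", "assistant professor", "teaching"), "Academic", None),
    (("university", "department", "graduate students", "nsf", "nih"), "Academic", "Industry R&D"),
    (("ml", "machine learning", "data science", "r&d"), "Industry R&D", "Academic"),
    (("protein engineering", "drug discovery", "biologics"), "Academic", "Industry R&D"),
    (("pharma", "biotech", "clinical pipeline", "gmp"), "Industry R&D", None),
]


def classify_jd(jd_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (primary, secondary) for the first matching row, or None if nothing matches."""
    text = jd_text.lower()
    for keywords, primary, secondary in DECISION_TREE:
        if any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords):
            return primary, secondary
    return None
```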

---
## FIXED Sections

List template sections that should NEVER be modified during generation.
These are copied verbatim from your template every time.

- Education
- Publications (CV)
- Honors & Awards
- Header block (name, contact, links)
- Undergraduate Research Experience (2 bullets, never changes)

---

## Output Rules

- **Email in all outputs:** jordan.chen@email.com
- **Resume package:** 2 pages + 1-page cover letter
- **CV package:** 5 pages + 1-2 page cover letter
- **Output .tex files ONLY** -- user compiles locally
@@ -0,0 +1,75 @@
# Session: Whitfield University -- Assistant Professor, Computational Protein Engineering

## Metadata
- **JD file:** `JDs/whitfield_asst_prof_2026.txt`
- **Output folder:** `output/Whitfield_ProteinEng/`
- **Document type:** CV (5-page)
- **Role type:** Academic
- **Secondary:** --
- **Created:** 2026-03-09
- **Status:** Phase 2 complete

---
## Phase 0: JD Analysis

**Position:** Assistant Professor, Department of Biomedical Engineering
**Institution:** Whitfield University (R1 research university)
**Key requirements:**
- ML models for protein stability or design
- Molecular dynamics simulations (GROMACS, OpenMM)
- Protein structure prediction or molecular docking
- Python, HPC, collaborative research
- Publication record in computational biology
- Teaching ability or potential
- Independent research program

**ATS keywords identified:**
machine learning, protein engineering, protein language model, molecular dynamics, GROMACS, drug discovery, free energy, HPC, Python, virtual screening, enhanced sampling, tenure-track

**Bundle selected:** `bundle_academic.md`
**Experience files loaded:** `experience_postdoc_lakewood.md`, `experience_phd_westfield.md`

---
## Phase 1: Bullet Plan

### Postdoc -- Lakewood University (Aug 2023 -- Present) [4 variable bullets]

| Slot | Achievement | Variant | Rationale |
|------|-------------|---------|-----------|
| 1 | L1: Enzyme Stability Screening | 2L | Lead bullet -- direct JD match (ML + protein engineering) |
| 2 | L4: Transfer Learning Framework | 2L | Open-source tool, community adoption, JD mentions "collaborative" |
| 3 | L2: Enzyme Solvent Tolerance | 2L | Deepens enzyme engineering focus; industrial applications |
| 4 | L3: Automated Screening Pipeline | 2L | JD requires HPC; infrastructure contribution |

### PhD -- Westfield (Aug 2018 -- Jul 2023) [3 variable bullets]

| Slot | Achievement | Variant | Rationale |
|------|-------------|---------|-----------|
| 1 | P1: Enhanced Sampling for Folding | 2L | Method development -- PhD flagship result |
| 2 | P3: Ligand Binding Free Energy | 2L | Shows drug discovery breadth |
| 3 | P4: Stability Database Pipeline | 2L | Data infrastructure; directly enabled postdoc ML work |

### Undergrad Research -- Eastgate (2016 -- 2018) [FIXED, 2 bullets]

**Summary headline:** Computational biologist specializing in ML-guided protein engineering and biomolecular simulation, with 15 publications and open-source tools adopted by 4 external groups.

**Skills section:** 5 groups, 17 lines (4-4-3-3-3 CV config)

---
## Phase 2: Generation

- **Output file:** `output/Whitfield_ProteinEng/e2e_whitfield_proteineng_cv.tex`
- **Char counts verified:** All 2L bullets within 170--210 rendered chars
- **Page count:** 5 pages (confirmed via budget card)
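
The 170-210 window counts rendered characters, so LaTeX markup has to be collapsed before counting. Below is a minimal sketch. It assumes that only the constructs appearing in the KB bullets need handling: `$\times$`, `$+$`, `$^\circ$`, `\%`, `---`, `--`, and `\textbf{}`. Each math symbol or dash is counted as one glyph. `rendered_length` and `within_budget` are hypothetical helpers.

```python
import re

# LaTeX source -> rendered text for the constructs used in the KB bullets.
# Order matters: "---" must be replaced before "--".
REPLACEMENTS = [
    (r"$\times$", "x"),
    (r"$+$", "+"),
    (r"$^\circ$", "°"),
    (r"\%", "%"),
    ("---", "-"),
    ("--", "-"),
]


def rendered_length(latex: str) -> int:
    """Approximate the number of characters a bullet occupies once compiled."""
    text = latex
    for source, rendered in REPLACEMENTS:
        text = text.replace(source, rendered)
    text = re.sub(r"\\textbf\{([^}]*)\}", r"\1", text)  # keep bold text, drop the command
    return len(text)


def within_budget(latex: str, low: int = 170, high: int = 210) -> bool:
    """True if the rendered length falls inside the 2L character window."""
    return low <= rendered_length(latex) <= high
```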

---
## Decisions Log

1. Chose L1 over L5 as lead bullet -- L5 is a secondary result from the same paper, L1 is the primary contribution.
2. Omitted L6 (mentorship) -- will highlight in teaching statement instead; space better used for L2.
3. Used "Co-developed" for L4 per the KB Corrections Log in config.md (shared with M. Rivera).
4. Solvent tolerance bullet notes "under review" status per config.md provenance table.
@@ -0,0 +1,127 @@
# Position: Postdoctoral Research Associate at Lakewood University

## Dates: Aug 2023 -- Present

## Cross-Position Themes (for cover letters)
- Research trajectory: classical protein simulation (PhD) to ML-accelerated protein engineering (postdoc)
- Recurring architecture pattern: experimental data -> ML surrogate -> large-scale computational screening
- Consistent focus: protein stability and folding thermodynamics throughout career

---
## Achievements

### L1: ML-Guided Enzyme Stability Screening
**Source:** Chen et al., ACS Catalysis 2025
**Methods:** ESM-2 protein language model, GROMACS, replica exchange MD, Python/BioPython
**Quantitative:** 0.82 Spearman on stability prediction, 3,000x throughput vs experiment, 8,500 variants screened, 5 confirmed hits
**Bullet (2L):** Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures, achieving 0.82 Spearman correlation and enabling 3,000$\times$ throughput screening of 8,500 enzyme variants for industrial thermostability.
**Bullet (3L):** Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures with transfer learning, achieving 0.82 Spearman correlation and 3,000$\times$ throughput over experimental screening --- identified 7 thermostable lipase variants with 15$+$ $^\circ$C stability gain, 5 experimentally confirmed via differential scanning calorimetry.
**Tags:** academic, industry_rd
**Significance:** Demonstrates independent ML pipeline development and protein engineering impact. 3,000x speedup is a concrete metric. Published as first author in a high-impact journal.

### L2: Enzyme Solvent Tolerance Prediction
**Source:** Chen, Yamamoto, Holmberg, Proteins: Structure, Function, and Bioinformatics 2025 (under review)
**Methods:** ESM-2 fine-tuning, GROMACS, explicit solvent MD, MM/PBSA free energy
**Quantitative:** 0.78 Spearman on solvent tolerance, 50-ns MD of 80 enzyme-solvent systems, 4 solvent-tolerant variants identified
**Bullet (2L):** Extended protein language model to predict enzyme solvent tolerance across 8 organic co-solvent systems, validating against 50-ns explicit-solvent MD for 80 enzyme variants and identifying 4 candidates for green chemistry applications.
**Bullet (3L):** Extended protein language model to predict enzyme solvent tolerance across 8 organic co-solvent systems (0.78 Spearman on a held-out set), validated against 50-ns explicit-solvent molecular dynamics free energy calculations for 80 enzyme variants --- identified 4 solvent-tolerant lipase candidates now under experimental characterization for green chemistry applications.
**Tags:** academic, industry_rd
**Significance:** Deepens enzyme engineering expertise into industrial conditions. Natural extension of thermostability work. Under-review status must be stated clearly.

### L3: Automated Screening Pipeline
**Source:** Internal infrastructure project (unpublished)
**Methods:** Python, Snakemake, SLURM, GROMACS automation, PostgreSQL
**Quantitative:** Automated sequence-to-simulation pipeline for 6 researchers, reduced per-variant setup from 4 hours to 10 minutes
**Bullet (2L):** Automated sequence-to-simulation computational pipeline using Snakemake workflow manager, reducing per-variant setup from 4 hours to 10 minutes and supporting 6 researchers across 3 active projects.
**Bullet (3L):** Designed and deployed automated sequence-to-simulation pipeline integrating AlphaFold2, GROMACS, and Snakemake with SLURM job scheduling --- reduced per-variant computational setup from 4 hours to 10 minutes and currently supports 6 researchers across 3 active protein engineering projects.
**Tags:** academic, industry_rd
**Significance:** Demonstrates software engineering and team-enabling skills beyond pure research. "6 researchers" shows collaborative impact. Unpublished -- never imply this is peer-reviewed.

### L4: Transfer Learning Framework for Protein Properties
**Source:** Chen, Rivera, Holmberg, Bioinformatics 2024
**Methods:** ESM-2 embeddings, regression heads, active learning, Python/PyTorch
**Quantitative:** 60% less labeled data needed, benchmarked on 5 protein families, open-source release (200+ GitHub stars)
**Bullet (2L):** Co-developed transfer learning framework from protein language models, reducing labeled training data by 60\% across 5 enzyme families, released as open-source tool with 200+ GitHub stars.
**Bullet (3L):** Co-developed transfer learning framework leveraging ESM-2 protein language model embeddings with task-specific regression heads, reducing labeled training data requirements by 60\% across 5 enzyme families --- released as open-source Python package adopted by 4 external research groups (200+ GitHub stars).
**Tags:** academic, industry_rd
**Significance:** Open-source impact is strong evidence of community value. "Co-developed" verb is mandatory (shared with M. Rivera). GitHub stars provide external validation metric.

### L5: Enzyme Unfolding Pathway Analysis
**Source:** Chen et al., ACS Catalysis 2025 (same paper as L1, secondary result)
**Methods:** Replica exchange MD, hydrogen bond analysis, principal component analysis, MDAnalysis
**Quantitative:** 200-ns trajectories at 300--400 K for 14 variants, discovered unfolding pathway divergence at 340 K
**Bullet (2L):** Revealed sequence-dependent enzyme unfolding pathway divergence at 340 K through 200-ns replica exchange MD simulations, identifying stabilizing salt bridge networks that informed rational design criteria.
**Bullet (3L):** Revealed sequence-dependent unfolding pathway divergence in 14 lipase B variants through 200-ns replica exchange MD at 300--400 K, discovering critical conformational transition at 340 K and mapping stabilizing salt bridge networks that established rational design criteria for next-generation thermostable enzymes.
**Tags:** academic
**Significance:** Shows ability to extract mechanistic insight from large-scale simulations, not just run them. Salt bridge analysis is an actionable design metric.

### L6: Mentorship and Collaboration
**Source:** Group activities (ongoing)
**Methods:** N/A
**Quantitative:** Mentored 3 graduate students, 1 co-authored publication, organized weekly group seminar
**Bullet (2L):** Mentored 3 graduate students on protein ML pipelines and MD simulation workflows, with 1 student co-authoring a peer-reviewed publication within 8 months of joining.
**Bullet (3L):** Mentored 3 graduate students on protein language models, MD simulation best practices, and HPC workflows --- 1 student co-authored peer-reviewed publication within 8 months; organized weekly computational biology seminar attended by 12 group members across 2 research groups.
**Tags:** academic
**Significance:** Mentorship evidence is critical for faculty positions. Concrete outcome (co-authored pub) is stronger than vague "guided students."

---

# Position: Ph.D. Researcher at Westfield Institute of Technology

## Dates: Aug 2018 -- Jul 2023

## Cross-Position Themes (for cover letters)
- Foundation in classical biomolecular simulation before pivoting to ML-accelerated methods
- Built the core MD and free energy skills that underpin the postdoctoral ML protein engineering work
- Dissertation: "Enhanced Sampling Methods for Protein Folding and Ligand Binding Thermodynamics"

---

## Achievements
### P1: Enhanced Sampling for Protein Folding
**Source:** Chen, Alvarez, J. Chem. Theory Comput. 2022
**Methods:** Metadynamics, GROMACS, collective variable design, Python
**Quantitative:** Characterized folding free energy landscapes for 6 small proteins, predicted folding temperatures within 8 K of experiment
**Bullet (2L):** Developed metadynamics-based enhanced sampling protocol for protein folding free energy landscapes, predicting folding temperatures within 8 K of experiment across 6 small proteins.
**Bullet (3L):** Developed metadynamics-based enhanced sampling protocol for protein folding using GROMACS, designing collective variables to capture folding reaction coordinates across 6 small proteins --- predicted folding temperatures within 8 K of experimental circular dichroism measurements, establishing computational screening protocol for protein stability.
**Tags:** academic, industry_rd
**Significance:** Dissertation flagship result. Shows deep MD expertise predating the ML pivot. "Within 8 K" is a concrete validation metric.

### P2: Force Field Benchmarking for Intrinsically Disordered Proteins
**Source:** Chen, Alvarez, Kowalski, J. Chem. Theory Comput. 2021
**Methods:** GROMACS (CHARMM36m, AMBER ff19SB, OPLS-AA/M, a99SB-disp), convergence testing, statistical analysis
**Quantitative:** Benchmarked 4 force fields on 15 disordered protein sequences, established CHARMM36m as optimal for IDP ensembles
**Bullet (2L):** Benchmarked 4 protein force fields on 15 intrinsically disordered protein sequences, establishing CHARMM36m as the optimal choice for IDP conformational ensemble prediction with 40\% better agreement with SAXS data.
**Bullet (3L):** Benchmarked 4 protein force fields (CHARMM36m, AMBER ff19SB, OPLS-AA/M, a99SB-disp) on 15 intrinsically disordered protein sequences against NMR chemical shift data, establishing CHARMM36m as optimal for IDP ensembles --- 40\% better agreement with experimental SAXS profiles while maintaining comparable computational cost.
**Tags:** academic, industry_rd
**Significance:** Systematic benchmarking shows methodological rigor. Force field selection expertise is broadly applicable. Good for academic positions.

### P3: Ligand Binding Free Energy Calculations
**Source:** Chen, Alvarez, J. Med. Chem. 2023
**Methods:** Free energy perturbation (FEP), GROMACS, PMX for alchemical transformations, enhanced sampling
**Quantitative:** Calculated relative binding free energies for 40 congeneric ligand pairs, RMSE of 0.9 kcal/mol vs experiment
**Bullet (2L):** Calculated relative binding free energies for 40 congeneric ligand pairs via free energy perturbation, achieving 0.9 kcal/mol RMSE against experimental IC50 data across 3 drug target families.
**Bullet (3L):** Calculated relative binding free energies for 40 congeneric ligand pairs across 3 drug target families using free energy perturbation with enhanced sampling in GROMACS --- achieved 0.9 kcal/mol RMSE against experimental IC50 data, enabling prospective ranking of 12 novel candidates for medicinal chemistry follow-up.
**Tags:** academic, industry_rd
**Significance:** Shows drug discovery application of simulation skills. FEP is a high-demand technique. Complements the protein-focused postdoctoral work.

### P4: Protein Stability Database and Analysis Pipeline
**Source:** Chen, Kowalski, Alvarez, Bioinformatics 2021
**Methods:** Python, PostgreSQL, BioPython, statistical analysis, automated data curation
**Quantitative:** Curated 12,000 experimental melting temperatures from 3 databases, built analysis pipeline, used by 8 lab members
**Bullet (2L):** Built curated protein thermostability database integrating 12,000 experimental melting temperatures from 3 public sources, with automated quality filters adopted by 8 lab members for ML training set construction.
**Bullet (3L):** Built curated protein thermostability database integrating 12,000 experimental melting temperatures from ProTherm, FireProtDB, and Meltome Atlas with automated quality filters and outlier detection --- adopted by 8 lab members for ML training set construction and directly enabled postdoctoral ESM-2 fine-tuning work.
**Tags:** academic
**Significance:** Infrastructure work that enabled later ML research. Shows data engineering skills. Directly connects PhD to postdoc research arc.

### P5: Teaching and Outreach
**Source:** Department records (2019--2023)
**Methods:** N/A
**Quantitative:** TA for 4 semesters, 120+ students total, developed 3 computational lab modules
**Bullet (2L):** Served as teaching assistant for computational biology courses across 4 semesters, developing 3 hands-on simulation lab modules adopted department-wide for 120+ students.
**Bullet (3L):** Served as teaching assistant for computational biology courses across 4 semesters (120+ students total), developing 3 hands-on GROMACS/Python simulation lab modules subsequently adopted department-wide and contributing to course receiving highest student evaluation score in department.
**Tags:** academic
**Significance:** Teaching evidence for academic applications. "Adopted department-wide" shows lasting impact beyond the TA role. Omit for industry resumes.
@@ -0,0 +1,88 @@
# Deep Learning-Guided Screening of Thermostable Enzyme Variants for Industrial Biocatalysis

## Metadata

- **Authors:** J. Chen, R. Nakamura, S. Patel, K. Holmberg, M. Rivera
- **Year:** 2025
- **Journal:** ACS Catalysis
- **DOI:** 10.1021/acscatal.2025.XXXXX
- **Author position:** First author
- **Status:** Published (online Jan 2025)
- **Citations:** 12 (as of Mar 2026)
## Methods & Tools

- **Protein structure:** AlphaFold2 for initial structure prediction, Rosetta for refinement
- **ML framework:** Fine-tuned protein language model (ESM-2, 650M parameters)
  - Architecture: transformer encoder with task-specific regression head
  - Training data: ~45,000 experimentally measured melting temperatures from ProTherm/FireProtDB
  - Training/validation/test split: 70/15/15
- **MD engine:** GROMACS 2023 with CHARMM36m force field
- **Enhanced sampling:** Replica exchange MD (T-REMD) for conformational landscape mapping
- **Docking:** AutoDock Vina for substrate binding pose prediction
- **Analysis:** Python (BioPython, MDAnalysis, ProDy), PyMOL for visualization
- **Plotting:** matplotlib, seaborn for fitness landscapes and stability distributions
- **Hardware:** 320 GPU-hours on university HPC (NVIDIA A100)
- **Workflow:** Snakemake pipeline for automated screen-simulate-validate cycles
- **Version control:** Git, DVC for dataset versioning
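
Applying the 70/15/15 split to roughly 45,000 training records gives about 31,500 / 6,750 / 6,750 examples. The arithmetic is shown below as a small sketch. Integer percentages, and sending any rounding remainder to the test set, are assumptions rather than the paper's documented procedure.

```python
def split_sizes(n_total: int, percents: tuple = (70, 15, 15)) -> tuple:
    """Split n_total examples by integer percentages; any rounding remainder goes to the test set."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    train = n_total * percents[0] // 100
    val = n_total * percents[1] // 100
    test = n_total - train - val  # guarantees the three parts sum to n_total
    return train, val, test
```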
## Key Results (with numbers)

- Fine-tuned ESM-2 model achieving Spearman correlation of 0.82 on melting temperature prediction across 12 enzyme families
- Validation on held-out test set: MAE = 2.3 degrees C, R-squared = 0.79
- Screened 8,500 single- and double-mutant variants in silico in 48 hours (vs. estimated 14 months experimentally)
- Identified 7 thermostable variants of lipase B with predicted melting temperature 15+ degrees C above wild type
- Experimental collaborators confirmed stability improvement for 5 of 7 candidates (differential scanning calorimetry)
- 200-ns replica exchange MD simulations revealed stabilizing salt bridge networks absent in wild type
- Discovered sequence-dependent unfolding pathway divergence above 340 K across the variant library
- Achieved 3,000x throughput improvement over experimental screening for equivalent hit rate
- Transfer learning from ESM-2 reduced required training data by 60% compared to training from scratch
- Total compute: 320 GPU-hours (training) + 1,200 CPU-hours (MD validation) vs. estimated 18 months wet-lab
## Collaboration & Scope

- **PI / Senior author:** K. Holmberg (Lakewood University, computational biology group lead)
- **J. Chen's role:** Designed ML pipeline, fine-tuned protein language model, ran all MD simulations, wrote manuscript draft
- **R. Nakamura:** Curated training data from ProTherm/FireProtDB databases
- **S. Patel:** Experimental validation of top-7 candidates (DSC and activity assays)
- **M. Rivera:** Snakemake workflow design (co-developed with J. Chen)
- **Scope:** Single-lab project with experimental validation collaboration
## Provenance

- **Publication status:** Published, peer-reviewed
- **Peer review notes:** 3 reviewers, 1 revision cycle, accepted after minor revisions
- **Claiming rules:**
  - FULL ownership: ML pipeline design, model fine-tuning, MD simulations, manuscript writing
  - SHARED ownership: Snakemake workflow (co-developed with M. Rivera)
  - NO ownership: Training data curation (R. Nakamura), experimental validation (S. Patel)
- **Safe verbs for bullets:** Developed, Designed, Built, Fine-tuned (for ML work); Co-developed (for workflow)
- **Unsafe claims:** Cannot claim experimental validation; cannot claim sole credit for workflow automation
- **Data availability:** Trained model weights deposited on Hugging Face (open access)
- **Code availability:** Screening pipeline on GitHub (public repo, MIT license)
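
The safe-verb rule can be enforced by checking each bullet's leading verb against its ownership level. Below is a minimal sketch; the `full`/`shared` labels and `check_bullet_verb` are hypothetical names. A first-word check will not catch an unsafe claim made later in the sentence.

```python
# Safe leading verbs per ownership level, from the claiming rules above.
# The "full"/"shared" labels are hypothetical names for FULL/SHARED ownership.
SAFE_VERBS = {
    "full": {"Developed", "Designed", "Built", "Fine-tuned"},
    "shared": {"Co-developed"},
}


def check_bullet_verb(bullet: str, ownership: str) -> bool:
    """True if the bullet's first word is a safe verb for the given ownership level.

    Work with no ownership (e.g. experimental validation) has no safe verbs,
    so any bullet claiming it fails.
    """
    words = bullet.split()
    if not words:
        return False
    return words[0] in SAFE_VERBS.get(ownership, set())
```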
## Resume Bullet Seeds

1. **[STAR: Protein language model for stability prediction]**
   Situation: Enzyme thermostability screening bottlenecked by experimental throughput.
   Task: Build ML model for rapid stability prediction across enzyme families.
   Action: Fine-tuned ESM-2 protein language model on 45K experimental melting temperatures.
   Result: 0.82 Spearman correlation, screened 8,500 variants in 48 hrs, 5/7 top hits confirmed.

2. **[STAR: Thermostable enzyme discovery]**
   Situation: Industrial biocatalysis requires enzymes stable above 70 degrees C.
   Task: Identify lipase B variants with substantially improved thermostability.
   Action: Combined ML-accelerated screening with 200-ns replica exchange MD validation.
   Result: Identified 7 variants with 15+ degrees C stability gain, 5 experimentally confirmed.

3. **[STAR: Transfer learning pipeline]**
   Situation: Limited labeled data for enzyme stability prediction.
   Task: Reduce training data requirements while maintaining accuracy.
   Action: Co-developed transfer learning pipeline from ESM-2 pretrained representations.
   Result: 60% reduction in required training data while maintaining sub-3 degrees C MAE.

4. **[STAR: Conformational dynamics]**
   Situation: Static structure predictions cannot capture unfolding pathways.
   Task: Reveal stabilizing interactions in engineered enzyme variants.
   Action: Ran 200-ns T-REMD simulations of wild-type and 7 top variants at 300--400 K.
   Result: Discovered stabilizing salt bridge networks and sequence-dependent unfolding divergence at 340 K.