Review, March 18, 2026

Artificial Intelligence in Chemical Engineering: Protein Design from First Principles to Structural Prediction

Joseph S. Bailey, Jr.
Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
Søren C. Spina
Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
Andrew Hu
College of Medicine, The Ohio State University, 460 W 10th Avenue, Columbus, Ohio 43210, United States
Nathan Phan
Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
Rachel B. Getman
Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
https://orcid.org/0000-0003-0755-0534
Blaise R. Kimmel*
Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
Center for Cancer Engineering, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States
Pelotonia Institute for Immuno-Oncology, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States
*Email: [email protected]
https://orcid.org/0000-0002-9582-9887

ACS Engineering Au

Cite this: ACS Eng. Au 2026, XXXX, XXX, XXX-XXX

https://doi.org/10.1021/acsengineeringau.5c00099

Published March 18, 2026

CC-BY-NC-ND 4.0 .

Abstract

Machine learning and artificial intelligence are improving the speed and accuracy of every step of the protein design process. Early computational strategies relied on physics-based modeling and energy functions to identify amino acid sequences and desired folds. Recent advances in deep-learning structure prediction, diffusion-based backbone generation, and graph-based sequence design now allow researchers to explore protein sequence and structural space more efficiently. These developments allow proteins to be used as fundamental systems whose components can be engineered with high precision. However, computational predictions still struggle to properly account for conformational dynamics, catalytic environments, external interactions, and the broader chemical diversity present in natural enzymes. This review covers the progression from physics-based methods to deep-learning and generative methods and includes current strategies for evaluating stability and function in silico and experimentally.

This publication is licensed under

CC-BY-NC-ND 4.0 .

License Summary*
You are free to share(copy and redistribute) this article in any medium or format within the parameters below:
- Creative Commons (CC): This is a Creative Commons license.
- Attribution (BY): Credit must be given to the creator.
- Non-Commercial (NC): Only non-commercial uses of the work are permitted.
- No Derivatives (ND): Derivative works may be created for non-commercial purposes, but sharing is prohibited.
View full license
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.

Special Issue

Published as part of ACS Engineering Au special issue “AI and Machine Learning in Chemical Engineering: Breakthroughs and Applications”.

Introduction

At the atomic scale, enzymes can be understood as molecule-scale bioreactors that catalyze reactions through fundamental chemical engineering principles such as thermodynamics, kinetics, transport phenomena, and mass and energy balances. Engineered metalloenzymes now perform unique transformations under conditions that once required high temperatures, (1−3) a trend mirrored by machine-learning (ML) engineered hydrolases and proteomics platforms that streamline plastic depolymerization at ambient temperatures (3,4) and peptide discovery with minimal costly trial-and-error screens. (5) Industrial protein engineering pipelines extend this trajectory via the production of vitamins, biofuels, and specialty chemicals with improved stability and selectivity via rational and de novo computational protein design. (6) These advances show the application of chemical engineering principles to programmable biomaterials.

Protein engineering involves systematically modifying natural proteins to investigate, alter, or repurpose their inherent biological functions and to design novel proteins for specific applications. Over the past four decades, the field has moved from making incremental changes to naturally occurring proteins toward de novo protein design, the creation of proteins with defined structures and precise functionalities from scratch, (7) now operating at a scale and level of precision that was previously unrealistic. Advances in computational modeling, structural biology, and directed evolution have reshaped protein design, making it possible to build enzymes with atomic-level structural accuracy, including folds and functions not yet found in nature. (8,9)

For much of its history, structural uncertainty has limited the extent to which proteins can be used as tunable bioreactors due to a fundamental constraint known as the protein folding problem. This problem explores how a linear amino acid sequence defines a specified three-dimensional (3D) structure and thus determines the protein function (Figure 1). Protein folding is thermodynamically driven toward low free-energy states, but the expansive conformational search space described by Levinthal’s paradox, which says that if a protein sampled all possible conformations randomly it would take longer than the age of the universe to fold, makes uninformed structural prediction impossible. (10,11) For decades, the gap between physical theory and prediction constrained rational protein engineering, defining one of the greatest unsolved questions in biology. (12) Structural insight in the 1980s and 1990s was limited and in many cases simply unavailable, forcing researchers to rely on mechanistic intuition via “minimal” design, where simplified protein-like structures that capture only the most essential features of folding are used for “rational” design. This intuitive approach uses iterative trial-and-error mutation rounds to slowly progress toward functional improvement. (13) Directed evolution provided a powerful alternative but is limited in widespread use due to the need for advanced experimental screens, which are costly, labor-intensive, and time-consuming. (7,9)
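
The scale of Levinthal’s paradox can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the chain length (100 residues), the number of conformations per residue (3), and the sampling rate (10^13 conformations per second) are round-number assumptions often used to state the paradox and are not taken from this review.

```python
import math

# Illustrative assumptions (not values from the review).
N_RESIDUES = 100              # chain length
STATES_PER_RESIDUE = 3        # conformations available to each residue
SAMPLES_PER_SECOND = 1e13     # conformations visited per second
AGE_OF_UNIVERSE_YEARS = 1.38e10
SECONDS_PER_YEAR = 3.156e7


def exhaustive_search_years(n_residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation once by blind enumeration."""
    # Work in log10 space so the intermediate numbers stay readable.
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
    return 10 ** log10_years


years = exhaustive_search_years(N_RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"exhaustive search: about 10^{math.log10(years):.0f} years")
print(f"ratio to the age of the universe: about 10^{math.log10(ratio):.0f}")
```

Under these assumptions the enumeration time exceeds the age of the universe by many orders of magnitude, which is why folding cannot proceed by unguided search and why prediction methods need physical or learned priors.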

Figure 1. From sequence to protein structure and conformational behavior. (A) Biological information transfer follows a deterministic pathway from DNA to RNA to protein, linking the encoded sequence information to the emergent molecular function and dynamics. (B) Input amino acid sequences serve as the basis for predictive modeling frameworks. (C) Sequence-informed A.I./ML frameworks trained on sequence and structural ensemble data learn the mapping between linear sequences and conformation space. (D) The resulting structural ensemble offers a data-driven view of protein flexibility and structural diversity derived directly from the amino acid sequence. Reprinted or adapted with permission under a CC-BY 3.0 License from Ille et al. (14) Copyright 2025 *AIP Publishing*.

Identifying sequences that reliably define a desired structure led to the inverse folding problem. (12,15) Early computational approaches neglected flexibility, solvation, and entropic effects, often leading to unstable or misfolded structures despite the use of fixed-backbone design and rotamer libraries intended to make protein folding more manageable. Although advances in algorithm development and statistical mechanics in the early 2000s enabled credible de novo designs, structural prediction continued to constrain the shift from conceptual feasibility to routine engineering. (13,17)

The shift toward functional design began when physics-based platforms, such as ROSETTA and PyRosetta, improved conformational sampling by using explicit energy functions. Deep-learning approaches, including AlphaFold and RoseTTAFold, subsequently achieved near-experimental accuracy. (18,19) In parallel, large-scale protein language models (LLMs) such as ESM and ProGen are trained directly on sequence data. These transformer-based models show that multiple sequence alignments (MSAs) are not always necessary and that structural and functional information can be inferred solely from sequence space. Diffusion-based models further expanded generative design in both structure and sequence space by reframing folding, docking, and binder generation as probabilistic sampling problems. Developments that allow sequence, structure, and function to be optimized in tandem result in the generation of proteins with diverse folds and improved functionality (7,8,13) (Figure 2).

Figure 2. Conceptual view of the protein functional universe. The diagram maps the relationships among sequence, structure, and function spaces. Each circle represents an individual protein defined by its amino acid sequence, 3D folds, and biological activity. The blue circles correspond to proteins accessible through natural evolution or traditional protein engineering, primarily clustered within well-explored regions (yellow). Gray circles indicate proteins that remain uncharacterized and lie within the unexplored sequence–structure–function space. The red circles represent proteins accessible through ML-driven *de novo* design, which extends exploration beyond natural boundaries into previously inaccessible regions. In this framework, sequence space (top layer) is linked to structure space (middle layer) and ultimately to function space (bottom layer), with A.I. methods systematically probing across all three layers. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

The integration of advances in structural prediction with engineering principles has increased protein tunability, and synthetic proteins are emerging as realistic alternatives to natural proteins in medicine, biotechnology, and sustainability. Active sites, binding interfaces, and stability profiles can now be co-optimized within defined thermodynamic and kinetic limits, resulting in designed enzymes that are being applied to green chemistry, protein binders for clinical pipelines, and orthogonal tRNA-synthetase pairs for the expansion of the genetic code and the incorporation of noncanonical amino acids (ncAAs). (13,20,21) Modern design tools support binder design, docking analysis, interaction scoring, and interface confidence measures for the predicted complexes. Working alongside first-principles approaches, ML expands classical chemical engineering into synthetic biology by enabling exploration of sequence space at previously inaccessible scales. Applications to defined biological targets, from generating programmable binders to repurposing existing scaffolds, show that generative modeling is a rapidly expanding field in which biological structures can be designed for highly specialized applications. These tools are the “engines” driving a new era of biochemical innovation: they open the door to expanded genetic codes, new biocatalysts, and programmable protein machines by connecting molecular design to industrial applications in medicine, sustainability, and synthetic biology.

Foundational Structure Prediction and Design Frameworks

ROSETTA and PyRosetta

Introduced in the late 1990s by the Baker laboratory, ROSETTA remains one of the most influential physics-based platforms in computational protein design. It laid the foundation for modern docking and design protocols with a fragment assembly approach, in which short peptide fragments from known protein structures are recombined to approximate unknown folds (Table 1). (23,24) ROSETTA uses Monte Carlo sampling together with hybrid energy functions that integrate physics-based terms, such as van der Waals interactions, hydrogen bonding, electrostatics, and solvation, with knowledge-based statistical potentials derived from structural data sets to predict unknown folds without requiring an exhaustive search over all conformations. (25) This approach made ab initio folding computationally practical. (26) Over time, the suite has expanded well beyond its original capabilities and now includes protein–protein docking, flexible small-molecule docking with RosettaLigand, (27) antibody modeling, enzyme active-site design, RosettaMatch-based metal coordination tuning, and covalent docking extensions. (25,27,28) Applications of the ROSETTA toolkit include analyses of resistance mutations against HIV-1 protease inhibitors, backbone redesign to create more thermostable metalloenzymes, and the creation of biocatalysts for plastic degradation and green chemical synthesis. (25,26,28)
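
The Monte Carlo sampling described above rests on the Metropolis acceptance criterion, which can be sketched in a few lines. This is a toy illustration, not ROSETTA code: `toy_energy` is a hypothetical stand-in for a real score function, the Gaussian torsion perturbation stands in for fragment insertion, and the temperature and step count are arbitrary assumptions.

```python
import math
import random


def toy_energy(torsions):
    """Hypothetical score: zero when every torsion is -60 degrees, higher otherwise."""
    return sum(1.0 - math.cos(math.radians(t + 60.0)) for t in torsions)


def metropolis_accept(delta_e, kt, rng):
    """Metropolis criterion: always accept downhill moves, sometimes accept uphill ones."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def monte_carlo_search(n_angles=10, n_steps=5000, kt=0.2, seed=0):
    """Return (starting energy, lowest accepted energy) for a random torsion chain."""
    rng = random.Random(seed)
    torsions = [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
    start_energy = energy = toy_energy(torsions)
    best_energy = energy
    for _ in range(n_steps):
        i = rng.randrange(n_angles)
        previous = torsions[i]
        torsions[i] = previous + rng.gauss(0.0, 20.0)  # perturb one torsion
        trial_energy = toy_energy(torsions)
        if metropolis_accept(trial_energy - energy, kt, rng):
            energy = trial_energy
            best_energy = min(best_energy, energy)
        else:
            torsions[i] = previous  # rejected: restore the old conformation
    return start_energy, best_energy


start_energy, best_energy = monte_carlo_search()
print(f"start {start_energy:.2f} -> best {best_energy:.2f}")
```

Occasionally accepting uphill moves is what lets the search escape local minima without enumerating every conformation.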

Table 1. Summary of the Foundational Prediction and Validation Framework

package	primary role	integration	reported accuracy/benchmarks	key architecture	limitations
ROSETTA (1998)	physics-based modeling, de novo folding, docking, enzyme/binder design	(1) modular protocols (RosettaScripts)	(1) ab initio folding within 2–4 Å RMSD for small proteins	(1) Monte Carlo sampling with fragment assembly	(1) computationally expensive
		(2) integrates with PyRosetta and experimental pipelines	(2) accurate ligand docking (≤2 Å in RosettaLigand); successful in antibody modeling and enzyme design (24−26)	(2) hybrid energy function (physics and knowledge-based potentials)	(2) limited backbone flexibility in fixed-backbone design
					(3) underrepresents entropy and solvent dynamics
					(4) requires large sampling for success
PyRosetta (2010)	scriptable interface for custom design workflows	(1) Python API to ROSETTA core	(1) comparable accuracy to ROSETTA protocols	exposes ROSETTA “Pose” object, scoring functions, and movers to Python	(1) requires user scripting; limited scalability without HPC
		(2) integrates with NumPy/pandas/ML tools	(2) flexible pipelines for alanine scanning, ΔΔG, interface mapping (29,30)		(2) inherits ROSETTA’s scoring-function biases
					(3) not inherently generative
AlphaFold2 (2020)	high-accuracy structure prediction from sequence	used in nearly all modern pipelines as a validation filter	(1) CASP14: median GDT_TS > 90	deep attention networks (Evoformer and structure module) with iterative refinement	(1) deterministic outputs
			(2) subangstrom accuracy for many folds		(2) limited conformational diversity
			(3) proteome-scale modeling (34,37,47,48)		(3) no motif conditioning
					(4) no explicit ligand/cofactor modeling
ColabFold (2021)	accessible high-throughput structure prediction	(1) integrates AlphaFold2/RF models	(1) CASP14 free modeling accuracy close to AlphaFold2	(1) AlphaFold2/RF backbone adapted to Colab notebook	(1) dependent on MSA quality
		(2) uses MMseqs2 for fast MSA generation	(2) ≥40× faster MSA generation; robust on toxin families and multimer predictions (40−42,49)	(2) MMseqs2 for sequence search	(2) reduced precision vs AlphaFold2
		(3) used on Google Colab/local			(3) deterministic
					(4) limited support for rare folds or novel chemistries
RoseTTAFold (2021–2023)	multitrack prediction (RF), nucleic acid complexes (RFNA), all-atom assemblies (RFAA)	(1) extends to motif scaffolding, protein–ligand/nucleic acid complexes	(1) RF: three-track models within 2–3 Å	(1) three-track neural network (RF)	(1) deterministic (RF/RFAA/RFNA)
		(2) paired with ProteinMPNN/LigandMPNN	(2) RFAA: subangstrom ligand placement	(2) graph-based all-atom encoding (RFAA)	(2) limited dynamics
			(3) RFNA: improved protein–DNA/RNA accuracy (44−46)	(3) sequence and structure alphabet expansion (RFNA)	(3) incomplete coverage of novel chemistries

First introduced in 2010, PyRosetta exposed ROSETTA’s capabilities through a Python interface. (29) By granting direct access to the internal Pose object (the in-memory representation of a structure, typically loaded from a Protein Data Bank file), scoring functions, and movers, it enabled rapid prototyping of custom pipelines without modifying the C++ core. PyRosetta functions as an interface layer that accelerates hypothesis-driven design and has been widely adopted as a platform for research and education, highlighted by Jupyter notebook-based teaching modules that guide users through tasks such as protein folding, protein–protein and protein–ligand docking, and antibody design. (30) In industrial settings, PyRosetta has been used to optimize enzyme stability, where ΔΔG scanning evaluates the effects of hundreds of mutations in silico before experimental screening. (29) In immunoengineering, alanine-scanning and ΔΔG protocols have identified critical hotspot residues at the antibody–antigen interface. (30) PyRosetta has also been used for mutagenesis of synthetase residues, freezing of tRNA backbones, and binding energy calculations for ncAAs, providing a computational platform for genetic code expansion. (31) This flexibility allows integration with statistical analysis and ML libraries, custom energy functions, metalloenzyme development, and covalent hotspot evaluation. (29−33) Despite its broad capabilities and flexibility as a “general-purpose” modeling suite, ROSETTA is still computationally demanding and heavily reliant on sampling. Fixed-backbone models tend to miss contributions from entropy, solvent interactions, and conformational flexibility. Failure to capture the full dynamics of the system limits how thoroughly sequence and conformational space are explored and thus directly affects model success.
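
The logic of an in silico alanine scan can be sketched without any modeling library. The example below is a hedged toy, not PyRosetta code: `toy_score` and the per-residue values in `TOY_RESIDUE_SCORE` are hypothetical placeholders for a full-atom energy evaluated on a repacked structure, and the six-residue sequence is invented for illustration.

```python
# Hypothetical per-residue stabilities in arbitrary units (NOT Rosetta energies).
TOY_RESIDUE_SCORE = {"A": -0.5, "L": -1.5, "W": -2.5, "K": -0.8, "E": -0.7, "G": 0.0}


def toy_score(sequence):
    """Toy stand-in for a full-atom score function: lower means more stable."""
    return sum(TOY_RESIDUE_SCORE[aa] for aa in sequence)


def alanine_scan(sequence, score_fn=toy_score):
    """Return (position, wild-type residue, ddG) tuples, most destabilizing first."""
    wild_type_score = score_fn(sequence)
    results = []
    for i, aa in enumerate(sequence):
        if aa in ("A", "G"):
            continue  # alanine is unchanged; glycine is conventionally skipped
        mutant = sequence[:i] + "A" + sequence[i + 1:]
        ddg = score_fn(mutant) - wild_type_score  # positive = destabilizing
        results.append((i + 1, aa, ddg))
    return sorted(results, key=lambda r: r[2], reverse=True)


hotspots = alanine_scan("KLWEAG")
for position, residue, ddg in hotspots:
    print(f"{residue}{position}A  ddG = {ddg:+.2f}")
```

Positions with the largest positive ΔΔG are the candidate hotspots; in a real workflow the scoring call would be replaced by an energy evaluation on the mutated and relaxed structure.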

AlphaFold: Structure Prediction at Scale

Upon its release in 2020, AlphaFold2 (AF2) set new field-wide standards in CASP14 (Critical Assessment of Structure Prediction) by achieving near-experimental accuracy for most targets. In this blind benchmark, structures must be predicted before experimental coordinates are released. (34−36) AlphaFold’s major impact has been the reduction of structural prediction uncertainty, and its precision stems from the integration of evolutionary information with an attention-based deep-learning architecture, the Evoformer, coupled to an end-to-end coordinate generation model that enforces three-dimensional (3D) spatial constraints. Using this framework, AF2 outperformed competing methods and achieved Global Distance Test Total Scores (GDT_TS) above 90 for most targets; GDT_TS measures how closely a predicted structure matches the overall fold and backbone geometry of the experimental model. By resolving many long-standing challenges in the “protein folding problem”, AlphaFold is now routinely used as a structural filter in protein engineering workflows. Sequences are first screened in silico, and only those predicted to refold into high-confidence conformations move forward to experimental characterization. (37) AlphaFold has also contributed to cryo-EM map interpretation, molecular replacement strategies, protein complex structure prediction, and validation of de novo designs. (34,35,37) AlphaFold2 has also been applied at the proteome scale, providing structural coverage of nearly the entire human proteome and thousands of proteins across diverse organisms. (35) Despite its broad utility, AlphaFold predictions are largely deterministic and offer limited conformational diversity. There are still limitations in motif conditioning, ligand placement, and explicit modeling of ncAAs or cofactors. (38,39) Confidence metrics like pLDDT speak to structural reliability, but they do not predict whether a design will express, remain soluble, or function catalytically.
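
For readers unfamiliar with GDT_TS, the following sketch shows how the score is assembled from four distance cutoffs. It is a simplification under a stated assumption: the two Cα traces are taken to be already superposed, whereas the full metric searches over superpositions to maximize each fraction. The coordinates are invented for illustration.

```python
import math

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms


def gdt_ts(predicted, reference):
    """Simplified GDT_TS (0-100) for two already-superposed C-alpha traces."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then the mean of the four fractions.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in GDT_TS_CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Invented four-residue example: per-residue deviations of 0.5, 1.5, 3.0, and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_ts(predicted, reference)
print(f"GDT_TS = {score:.2f}")
```

Because the score averages over loose and tight cutoffs, a single badly placed residue lowers it only modestly, while a value above 90 requires nearly every residue to fall within about 1 to 2 Å.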

ColabFold: Standardized Prediction

ColabFold adapted AlphaFold for rapid, user-friendly execution by pairing fast MSA generation with the Google Colaboratory (Colab) environment. (40) Colab is a free, cloud-based platform hosted by Google that runs Jupyter Notebooks in a web browser, providing users with access to GPUs without the need for local installation. By taking advantage of cloud GPUs, ColabFold makes cutting-edge structure prediction available to laboratories and classrooms worldwide, delivering results significantly faster while achieving comparable accuracy. (41) The notebook interface allows beginners to run protein predictions with minimal setup, while advanced users can use command line tools for batch processing and parameter tuning, contributing to the routine use of the platform for teaching, prototyping, and large-scale protein modeling. (40) Although slightly less precise than full AF2 or RoseTTAFold pipelines, ColabFold trades marginally reduced precision for greatly increased throughput, which makes it well-suited for exploratory and comparative studies in which researchers must screen hundreds of candidate scaffolds or assess entire protein families, cases that would be impractical with AlphaFold or RoseTTAFold alone. (42) Like AlphaFold, ColabFold produces deterministic outputs, depends on sequence alignment quality and database size, and offers limited conditioning flexibility; as a result, it is also best viewed as a scalable front-end filter. Although ColabFold shares AlphaFold’s constraints, its strength lies in throughput and accessibility, making it an entry point for the broad application of A.I.-based protein structure modeling across research, education, and design applications. (41)

RoseTTAFold: Expansion to All-Atom Modeling

RoseTTAFold (RF) introduced a three-track architecture that combines sequence, pairwise geometry, and 3D coordinate information. (43) For monomeric targets, RF is comparable in accuracy to AlphaFold but also provides structural flexibility for motif scaffolding and downstream adaptations. Early adaptations highlighted the potential of the RF framework for motif scaffolding, in which functional sites or short structural motifs can be embedded in novel backbones in a single step. Subsequent extensions such as RoseTTAFoldNA (RFNA) and RoseTTAFold All-Atom (RFAA) broaden the scope through the incorporation of nucleic acids for protein–DNA/RNA modeling and expanded descriptions of ligands, metals, and covalent modifications using graph-based atomic encodings. (44,45) The combination of sequence-based representation of proteins with graph-based atomic representation of ligands enabled the modeling of metalloenzymes, glycosylated antibodies, and small-molecule complexes within a unified framework. Cross-model consistency between AlphaFold2 and RoseTTAFold correlates with experimental foldability and solubility, providing a secondary in silico filter for the predicted de novo structure. (46) The generative extension RFDiffusion reframed protein design as a denoising process in joint sequence–structure space, enabling conditioning on motifs, symmetry, and functional constraints (Figure 3). This transition from accurate monomer prediction to all-atom biomolecular assemblies via controlled sampling represents a conceptual shift toward probabilistic generative design.
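
One simple way to quantify cross-model consistency is to compare the internal distance matrices of two predicted Cα traces, which avoids the need for superposition. The sketch below uses distance-matrix RMSD (dRMSD) as an assumed stand-in for whatever agreement measure a given pipeline adopts; the `models_agree` helper and its 2.0 Å threshold are hypothetical, and the coordinates are invented.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD (angstroms) between two C-alpha traces of equal length."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two traces of equal length with at least two residues")
    squared_diffs = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


def models_agree(coords_a, coords_b, threshold=2.0):
    """Hypothetical filter: keep a design only if two predictors agree within threshold."""
    return drmsd(coords_a, coords_b) <= threshold


# Invented three-residue traces standing in for two predictors' outputs.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in model_a]  # same shape, translated
model_c = [(2.0 * x, y, z) for x, y, z in model_a]              # stretched chain

print(models_agree(model_a, model_b), models_agree(model_a, model_c))
```

Because dRMSD depends only on internal distances, it is unaffected by rigid-body placement; it also cannot distinguish a structure from its mirror image, so a superposition-based RMSD may be preferable where chirality matters.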

Figure 3. Overview of the current protein design dogma. Traditional protein science is often described as a one-way flow in which (A) amino acid sequences give rise to (B) folded structures, which in turn underpin (C) biological function. Modern *de novo* design inverts this logic: researchers now begin with the desired function and work backward to identify compatible folds and sequences. Current computational frameworks align with three broad strategies: (1) two-stage design, in which structural generators such as *ROSETTA, RoseTTAFold*, or *PyRosetta* first propose candidate protein backbones that are then optimized by sequence design engines; (2) sequence-driven methods, exemplified by *AlphaFold2* and *ColabFold*, which predict protein structures directly from amino acid sequence information and are widely used to validate or filter design candidates; and (3) coguided approaches, including multitrack RoseTTAFold variants (*RF, RFNA, RFAA*) and diffusion-based models (*RFDiffusion*), which integrate amino acid sequence and protein structure generation simultaneously. These complementary strategies extend the protein design beyond natural sequence–structure relationships, enabling a function-first exploration of protein space. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

Shared Constraints of Foundational Frameworks

Limitations of de novo protein design, such as deterministic results, are evident across the foundational frameworks. These methods typically generate a single or narrow set of static structural “snapshots” rather than dynamic ensembles, emphasizing backbone geometry over solvent dynamics, entropy, and conformational transitions. Though useful for validation, static predictions limit exploration of alternative conformational changes or functional states, which are often critical in systems where flexibility and allostery are central to successful design. In addition, generalization to ncAAs, metals, and post-translational modifications (PTMs) remains uneven, and computational accuracy does not guarantee experimental foldability or function. Even with advances such as RFAA, predictive reliability for small molecules, metal centers, or covalent modifications is not yet comparable to that of physics-based docking or QM/MM refinement.

There is a trade-off between computational resources and predictive depth. Highly accurate methods often require substantial time and GPU power, whereas faster, more accessible approaches, such as ColabFold, are less precise and less flexible. These overlapping limitations indicate that current tools excel at answering whether a sequence will fold but are less well-equipped to determine which sequence should be engineered to achieve a desired function or binding outcome. This distinction drove the development of both generative methods that operate directly in sequence space and diffusion-based approaches that treat folding, docking, and binder generation as probabilistic sampling problems, which together represent the next step in protein design.

Generative Backbone and Sequence Design

Energy minimization has dominated protein design for decades. Given a backbone structure, the residue and rotamer spaces are explored in search of variants that lower the overall protein score. High computational costs and an expansive conformational landscape limit how efficiently the available search space can be sampled. To address this limitation, newer ML models use probabilistic methods, graph-based neural networks, and other machine-learning architectures to generate proteins with structural and functional diversity. This approach places value on a strong training data set rather than on heuristics and physics, building upon a structure-and-sequence-only design. By combining symmetry and three-dimensional constraints, novel methods harness sequence–structure relationships to create stable scaffolds that support catalytic motifs, bind small molecules, and form higher-order assemblies with increased precision (Figure 4).

Figure 4. Overview of the A.I.-driven protein design toolbox. According to their functional roles in A.I.-driven generative protein design, the protein design toolbox can be divided into five categories: (A) structure prediction frameworks (e.g., AlphaFold2, RoseTTAFold) that validate fold accuracy; (B) de novo backbone generators (RFDiffusion, RFDiffusionAA) that embed motifs or active sites into novel folds; (C) fixed-backbone sequence designers (LigandMPNN) that optimize sequences against a defined structural context; (D) sequence generation models (ProteinMPNN), which not only perform fixed-backbone optimization but also function as a generative sampler of amino acid sequences; and (E) sequence–structure cogeneration and refinement frameworks (PLACER), which jointly optimize side chains, ligands, and active-site geometry. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

Diffusion: Backbone Construction and Modeling as a Generative Process

RoseTTAFold Diffusion (RFDiffusion) is a generative deep-learning model that uses denoising diffusion probabilistic models (DDPMs), or “diffusion models”, for protein backbone generation. (19) DDPMs reframe backbone design from an optimization problem into a generative sampling problem. Operating in a residue-centered frame representation, RFDiffusion refines noisy residue “frames” into geometrically consistent protein structures through learned denoising steps. By treating each residue as a separate rigid frame, local backbone geometry is preserved while the protein retains global flexibility, resulting in realistic protein backbones and diverse protein structures under minimal structural constraints. (50) This approach maintains a balance between global flexibility and backbone geometry, which is important for generating symmetric assemblies, multimeric complexes, and scaffolds built on functional motifs. RFDiffusion All-Atom (RFDiffusionAA) builds upon the original framework by adding explicit all-atom parameters (Table 2). This allows for the reshaping of binding pockets around small molecules and for ligand-aware conditioning in uses such as designing catalytic centers with an expansive molecular library. (45,51)
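
The forward (noising) half of a generic DDPM can be illustrated on Cα coordinates. This is a minimal sketch of the textbook Gaussian formulation only: RFDiffusion itself applies noise to residue frames (translations and rotations), which is not reproduced here, and the linear schedule endpoints, step count, and coordinates below are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T rising linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def cumulative_alpha(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coordinates(coords, alpha_bar_t, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [
        tuple(signal * x + sigma * rng.gauss(0.0, 1.0) for x in point)
        for point in coords
    ]


rng = random.Random(0)
alpha_bars = cumulative_alpha(linear_beta_schedule(1000))
x0 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # invented C-alpha trace
x_early = noise_coordinates(x0, alpha_bars[0], rng)   # almost unchanged
x_late = noise_coordinates(x0, alpha_bars[-1], rng)   # close to pure Gaussian noise
print(f"signal kept at t=1: {math.sqrt(alpha_bars[0]):.4f}")
print(f"signal kept at t=T: {math.sqrt(alpha_bars[-1]):.4f}")
```

A generative model is trained to reverse this process step by step, so sampling starts from noise and ends at a plausible backbone; conditioning on motifs or symmetry enters through the learned reverse steps rather than through the forward process shown here.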

Table 2. Summary of Generative ML Frameworks for Protein Design

package	primary role	integration	reported accuracy/benchmarks	key architecture	limitations
RFDiffusion (2023)	De novo protein backbone and functional motif design	generates protein scaffolds for motif/catalyst embedding	(1) diverse novel folds (up to 600 residues)	(1) 3D frame-based denoising diffusion model using RoseTTAFold	(1) high GPU cost
			(2) RMSD ∼ 2 Å for motif placement; 42–54% success for TIM barrels	(2) supports symmetric design, self-conditioning, and partial motif constraints	(2) sampling variance
			(3) 19% hit rate for binders		(3) limited explicit ligand handling (addressed in RFDiffusionAA)
			(4) 23/25 success rate in motif scaffolding		(4) sensitive to motif constraints
			(5) improved interface and side-chain quality in RFDiffusionAA (19)		(5) challenges with polar interfaces
					(6) stochastic outputs may vary
RFDiffusionAA (2024)	active-site-aware protein backbone generation and binder design	used for enzyme pocket design, synthetase-ligand scaffolding, and interface tuning	(1) >20% increase in ΔΔG success	(1) RFDiffusion fine-tuned on active-site data	(1) requires detailed active-site input
			(2) supports joint active-site and motif design	(2) supports per-residue conditioning, side-chain aware diffusion, and flexible residue input	(2) no end-to-end sequence optimization (must be coupled with ProteinMPNN and LigandMPNN)
			(3) improved hallucination accuracy (19,45)
ProteinMPNN (2022)	amino acid sequence design for fixed backbones	follows RF/RFDiffusion scaffold generation	(1) ∼50–55% native sequence recovery overall	message passing graph neural network on protein backbone context	(1) fixed backbone
			(2) ∼90–95% for buried residues; 200× faster than ROSETTA (57,58)		(2) no ligand/cofactor support
					(3) no noncanonical AA modeling
					(4) lacks backbone flexibility
LigandMPNN (2025)	sequence optimization in the presence of ligands	pocket-specific redesign postdocking or PLACER-generated poses	(1) 63.3% sequence recovery (small molecules), 50.5% nucleotides, 77.5% metals	(1) dual-graph neural network linking ligand atoms and protein residues	(1) requires accurate initial ligand pose and placement
			(2) Chi1 recovery ∼86% (59)	(2) ligand-aware autoregressive design and side-chain packing	(2) sparse data for rare chemotypes
PLACER (2025)	active-site evaluation and pose refinement	filters/optimizes RFDiffusion and LigandMPNN output	(1) RMSD ≈ 1.1 Å for ligand active-site alignment	SE(3)-equivariant graph transformer and denoising-based side-chain and ligand optimization	(1) requires known ligand pose or transition-state geometry
			(2) improves functional design success by 3–5× in catalytic benchmarks (63,64)		(2) limited support for de novo ligand generation
					(3) sensitive to backbone geometry errors

Protein backbones are constructed around functional residues given user-input features, such as symmetry, catalytic information, and 3D constraints. Iterative backbone creation allows for the exploration of backbone diversity while retaining global flexibility. In recent benchmarks, RFDiffusion has shown successful monomer generation, cyclic and polyhedral assemblies, and motif scaffolding without requiring symmetry templates. Experimental validation confirmed correct folding and oligomerization. (19) However, despite these improvements, the performance varies across systems. Modeling of polar interfaces, noncanonical residues and RNA-associated systems remains challenging. Ongoing developments aim to address these constraints to enable enhanced control over the active-site design of proteins in future releases. (51,52)

Diffusion methods have also recently proven successful for molecular docking and design (Table 2). DiffDock integrates diffusion and molecular docking by denoising ligand translations, rotations, and torsions, then ranking the resulting structures to obtain a final prediction. (53) DiffDock showed substantial improvement over previous methods for both traditional docking and docking with de novo structures. Similarly, diffusion models such as EvoDiff use evolutionary sequence data to design proteins relative to natural sequence and functional space. (54) An advantage of diffusion methods is user control: models can be conditioned on specific inputs and outputs to generate a variety of biologically relevant proteins. Building upon these sequence-only methods, Protpardelle is a diffusion model that codesigns sequence and structure by focusing on the side-chain positions at multiple states before collapsing into a single state. (55) By denoising side chains and backbones together, all-atom frameworks such as Protpardelle can be conditioned strictly on side-chain functional groups. Use of both diffusion and evolutionary-scale data has led to substantial improvements over previous frameworks, yielding functionally diverse and biologically relevant natural sequences.

ProteinMPNN: Sequence Design as Geometric Prediction

Once a backbone is defined, sequence assignment becomes the constraint. ProteinMPNN uses a “message passing” neural network that replaces a combinatorial search with conditional probabilities, reframing residue assignment as geometric prediction. Trained on approximately 20,000 high-resolution protein structures, ProteinMPNN revealed that local protein backbone context is a major determinant of amino acid residue identity. In MPNN frameworks, each residue is treated as a node and exchanges information with neighboring residues. Much like friends in a social network, residues update each other about their surroundings. Sequences are predicted from N-terminus to C-terminus and conditioned on geometric features (such as α-carbon distances and side-chain orientations), resulting in context-aware predictions for amino acid residue identity. (56) Using this graph-based architecture to customize the decoding order and identify constraints across chains, backbone contexts are inferred and computational costs are reduced, resulting in native sequence recovery increases of up to nearly 10% relative to ROSETTA fixed-backbone energy minimization methods.
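The node-and-neighbor picture described above can be sketched in a few lines of Python. This is a toy illustration only, assuming a k-nearest-neighbor graph built from Cα coordinates and a simple averaging update; ProteinMPNN's actual encoder uses learned edge and node updates, and the function names, feature vectors, and coordinates here are hypothetical.

```python
import math


def knn_graph(ca_coords, k):
    """For each residue, list the indices of its k nearest residues by Cα distance."""
    neighbors = []
    for i, a in enumerate(ca_coords):
        # Sort by (distance, index) so ties are broken deterministically.
        ranked = sorted(
            (math.dist(a, b), j) for j, b in enumerate(ca_coords) if j != i
        )
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def message_passing_step(features, neighbors):
    """Blend each node's feature vector with the mean of its neighbors' vectors."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            updated.append(list(features[i]))
            continue
        dim = len(features[i])
        mean_nbr = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        updated.append([0.5 * (features[i][d] + mean_nbr[d]) for d in range(dim)])
    return updated


if __name__ == "__main__":
    # Hypothetical Cα positions; the last residue sits far from the others.
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    graph = knn_graph(coords, k=1)
    feats = [[1.0], [0.0], [0.0], [0.0]]
    print(graph)
    print(message_passing_step(feats, graph))
```

Stacking several such rounds lets information travel beyond immediate neighbors, which is the sense in which each residue becomes aware of its wider structural context.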

ProteinMPNN sequences have been shown to frequently refold successfully under AlphaFold validation, particularly in the absence of MSAs. (57) It accomplishes this by using noise augmentation during training, which increases tolerance for imperfect backbones. This allows for better accommodation of symmetry-aware and multichain design. (58) The primary limitation of this framework is the assumption of structural rigidity. Backbone flexibility, induced fit, and explicit ligand interactions are outside the model’s core assumptions. Despite these pitfalls, ProteinMPNN has become an effective second-stage filter in modern design pipelines.

LigandMPNN: Incorporating Chemical Context into Sequence Design

LigandMPNN extends sequence design in the presence of small molecules, nucleotides, and metals critical for enzyme and binding site engineering. (59) The key innovation is a dual-graph architecture. One graph encodes protein residues and the other encodes ligand atoms, allowing for information transfer between the ligand and protein. With this, residue identities and side-chain orientations are refined based on the ligand chemistry and the overall local environment. (59) In essence, this process is similar to planning a dinner table for a multicourse meal, where guests (protein residues) choose their seats based not only on nearby friends (local protein backbone context) but also on the range of courses offered (the ligand and its atoms) and the order in which the dishes on the menu are served. This approach improves sequence recovery and packing accuracy at binding interfaces relative to backbone-only approaches.

Experimental studies demonstrate increased accuracy and broad utility using LigandMPNN. Sequence recovery at ligand-contact positions reached 63.3% for small molecules, 50.5% for nucleotides, and 77.5% for metal-binding residues, significantly outperforming ProteinMPNN and ROSETTA (∼34–50%). (59) Successful binder redesign of weak or nonfunctional ROSETTA-derived structures improved binder affinity up to 100-fold, with over 100 confirmed complexes, including small molecules and nucleotide binders, metal coordination sites, and ligand-dependent protein switches. (60,61) When combined with generative backbone tools, LigandMPNN serves as a functional specificity filter in earlier stages of the design pipeline. The addition of ligand-aware conditioning without sacrificing speed enables the precise design of active sites and binding pockets tailored to specific chemical environments. Still, the effectiveness depends on accurate ligand placement and sufficient training data on rare chemotypes.
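Sequence recovery, the metric quoted above, is simply the fraction of positions at which a designed sequence reproduces the native residue, optionally restricted to a subset such as ligand-contact positions. The Python sketch below shows one way to compute it; the function name, the zero-based position convention, and the example sequences are assumptions made for illustration and are not taken from the LigandMPNN benchmark code.

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of positions where the designed residue matches the native one.

    positions: optional iterable of zero-based indices (for example,
    ligand-contact residues). If omitted, all positions are scored.
    """
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must have equal length")
    indices = list(range(len(native))) if positions is None else list(positions)
    if not indices:
        raise ValueError("at least one position is required")
    matches = sum(native[i] == designed[i] for i in indices)
    return matches / len(indices)


if __name__ == "__main__":
    native = "ACDEFGHIKL"      # hypothetical native sequence
    designed = "ACDQFGHLKL"    # hypothetical designed sequence
    pocket = [3, 4, 7]         # hypothetical ligand-contact positions
    print(sequence_recovery(native, designed))
    print(sequence_recovery(native, designed, pocket))
```

Restricting the calculation to pocket residues is what makes ligand-aware and backbone-only methods comparable at the positions where ligand context matters most.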

PLACER: Active-Site Geometry as a Filtering Step

PLACER (protein–ligand atomistic conformational ensemble resolver) focuses on the complementary challenge of catalytic preorganization and judging the precision of structures containing ligands and specialized residues. Unlike traditional docking tools, which treat proteins and ligands separately, PLACER represents both protein and ligand atoms as a unified molecular graph. By simultaneously docking and scoring structures instead of treating them as separate operations, PLACER learns spatial relationships directly from atomic coordinates. (62) This approach refines ligand coordinates and surrounding side chains within a geometry-aware neural network. The output includes both a predicted structure and confidence metric, predicted Root-Mean-Square Deviation (pRMSD), which ranks structures based on a geometric score that correlates with structural accuracy. (63)

In enzyme design benchmarks, structures filtered with PLACER achieve 3–5× higher experimental success rates when compared directly to other docking frameworks. (62−65) The ability to predict ligand-binding structures without predocking or binding site mutations for catalytic compatibility, as well as its compatibility with other ML packages such as LigandMPNN, distinguishes PLACER from other platforms. Overall, PLACER allows downstream ligand-specific sequence optimization before experimental testing (Figure 5). PLACER performs best when the starting backbone has the correct geometry, but it depends upon known ligand structures, has reduced performance on large and flexible cofactors, and is sensitive to receptor backbone displacement.

Figure 5. Timeline of major developments in protein structure prediction (black) and design methodologies (red). Following early innovations such as ROSETTA (1998) and PyRosetta (2010), the field saw nearly two decades of incremental progress before the emergence of transformative A.I.-based models such as AlphaFold2 (2020). Since then, breakthroughs in generative frameworks, including ProteinMPNN, RFDiffusion, and LigandMPNN, have rapidly expanded, marking a shift toward integrated prediction-design pipelines.

Protein Large Language Models and Sequence-Space Design

Protein large language models (pLMs) are trained on an expansive protein sequence data set and learn statistical patterns that reflect physical constraints on folding, stability, and function. In contrast to structure-based approaches, pLMs infer relationships directly from sequence data. This enables structure prediction, functional analysis, and de novo sequence generation without the need for predefined structures.

ESM-2 is a protein language model that is trained to predict randomly masked amino acids in a protein sequence. With ESM-2, a multiple sequence alignment (MSA) is not needed to complete structure predictions, and a simpler neural architecture can be used. ESMFold is the single-sequence structure predictor that uses the ESM-2 language model. (66) In comparison to other structural predictors, because MSAs are not required, computational costs are significantly decreased and prediction speed improves substantially. ESMFold showed accuracy comparable to that of AlphaFold and RoseTTAFold, revealing that unsupervised learning can use evolutionarily related sequences to predict the protein structure at high resolution. Operating within the latent space of ESM-2, ProtFlow is a flow-matching-based framework for generating de novo peptide sequences quickly with comprehensive semantic distribution learning. (67) ProtFlow was fine-tuned on antimicrobial peptides and successfully generated functional molecules that target underrepresented bacterial species.
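The masked-residue training objective described for ESM-2 can be illustrated with a short Python sketch that hides a random subset of residues and records what the model would be asked to recover. This is a simplified assumption-laden example, not the ESM-2 data pipeline: the 15% masking fraction, the "<mask>" token string, the function name, and the example sequence are illustrative, and refinements such as replacing some selected positions with random residues are omitted.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder token name, assumed for illustration


def mask_sequence(sequence, mask_fraction=0.15, rng=None):
    """Return (tokens, targets): tokens with some residues masked, and a
    dict mapping each masked position to the residue the model must predict."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    rng = rng or random.Random()
    tokens = list(sequence)
    # Mask at least one residue and never more than the sequence length.
    n_mask = min(len(tokens), max(1, round(mask_fraction * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    targets = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, targets


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence
    tokens, targets = mask_sequence(seq, rng=random.Random(0))
    print(tokens)
    print(targets)
```

During training, the loss would be computed only at the masked positions, so the model can succeed only by learning which residues are compatible with the surrounding sequence context.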

ProtTrans was a foundational project that established how long and diverse pretraining significantly enhances the performance of pLMs. The project found that small-size supervised pLM embedded models performed similarly to methods that use MSAs. (68) ProGen is a language model that specializes in evolutionary sequence diversity and tunability through metrics that relate to primary sequence similarity, secondary structure accuracy, and conformational energy. (69) This is done by conditioning on keyword and taxonomic tags that relate sequences to cellular components, biological processes, and molecular function. Its successor, ProGen2, was scaled to a larger number of parameters and trained on sequences sourced from genomic, metagenomic, and immune repertoire databases. (70) A feature of ProGen2 is its ability to generate new sequences and predict protein fitness without manual fine-tuning. Motivated by ProGen and similar protein autoregressive language models, ProtGPT2 was developed to generate proteins that are both stable and evolutionarily different from natural proteins. (71)

Language models are also being created to analyze sequences at the genome level. Evo is a multiscale model that can accomplish zero-shot prediction across biomolecule classes with comparable performance to domain-specific language models. (72) Evo can codesign protein–DNA and protein–RNA systems and successfully generate functional CRISPR-Cas complexes and transposable systems. Motivated by the success of ProtTrans, Nucleotide Transformer applies masked language modeling to DNA and can accurately predict the context of nucleotide sequences without supervision. (73) Genome modeling has also expanded to other deep-learning forms, as shown with AlphaGenome. (74) Trained on protein-coding genes, AlphaGenome can perform multimodal prediction with long-sequence context at base-pair resolution.

Generative A.I.

Generative A.I. has allowed new structural prediction models to be more informative, accurate, and customizable. Boltz-2 is a program that achieves similar structural prediction accuracy to AlphaFold while simultaneously excelling in binding affinity prediction. (75) Traditionally, free-energy perturbation (FEP) is the benchmark for predicting binding affinity; however, its high accuracy comes with increased computational cost. Boltz-2 trains on a diverse set of dynamic models and achieves binding affinity predictions with comparable accuracy to that of FEP while being over 1000 times faster. While structural predictions remain comparable to other models, Boltz-2’s key contribution is its improved ability to predict binding affinity. Other models, such as Chai-1, are reported to have higher accuracy for predicting protein multimer and protein–ligand structures than existing models. (76) To achieve this, Chai-1 was trained on both protein language model embeddings and multiple sequence alignments. Both Chai-1 and Boltz-2 are customizable models, where users can add constraints from experimentation to increase the prediction accuracy. Chroma is a generative model that extends user programmability in structural prediction even further. (77) By reversal of a correlated noise process, the generated structures follow the same distance-scaling patterns seen in real proteins. Users can apply external constraints, such as complex symmetry, predefined substructures, fixed-backbone arrangements, or even fully specified volumetric shapes, to guide the design process.

Generative modeling programs are also being applied to binder design, offering versatility and increased programmability. De novo antibody design is a unique challenge because complementarity-determining regions (CDRs) must have an extremely precise binding affinity to target molecules. Germinal uses an antibody-specific language model and has demonstrated strong performance with experimental testing. (78) BindCraft is compatible with many classes of protein targets, as shown through its success in generating binders for allergens, multidomain nucleases, and cell-surface receptors. (79) Other all-atom models, such as BoltzGen and ODesign, allow user programmability when designing binders using features like covalent bonds, binding sites, and structural constraints, extending the functionality to binder design containing nucleic acid targets. (80,81)

Workflows for Model Training and Protein Design

A rapidly growing challenge when applying machine learning to biomolecular modeling is the time and resources required to train large-scale models on complex biomolecular data. Computational workflows have emerged to streamline the protein design process, making it automated, efficient, and more easily accessible to novice users. (82) BioNeMo is an open-source software framework that improves the training throughput of A.I. models on GPUs for biomolecular design and drug discovery and has recently been used in algorithmic workflows for both blind docking and API-driven structural prediction. (83) Models such as these encourage individual user contributions to deepen and widen the scope of the current modeling tasks.

ProteinDJ is a specialized workflow for designing proteins on high-performance computing (HPC) systems. (84) ProteinDJ has been shown to scale across eight GPUs with 86.5% efficiency, substantially reducing the computational time. This pipeline includes tasks such as fold generation, sequence design, and design validation along with tunable features. BinderFlow specializes in de novo binder design and has a multifeature dashboard that provides real-time updates in a web interface. (85) Ovo has a data-driven quality control module, support for community plugins, and predicted-structure validation that uses the expansive ColabDesign library. (86) Each of these workflows supports different sets of programs and can provide automated options for protein design studies.
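To put the reported scaling figure in context, the short Python sketch below converts a parallel efficiency into an implied speedup. It assumes the conventional strong-scaling definition, efficiency = T(1) / (N × T(N)); the source does not state which definition ProteinDJ's benchmark used, so both the formula and the function names should be read as assumptions. Under that definition, 86.5% efficiency on eight GPUs corresponds to a speedup of roughly 6.9× over a single GPU.

```python
def parallel_efficiency(t_single, n_devices, t_parallel):
    """Strong-scaling efficiency: single-device time over total device time."""
    return t_single / (n_devices * t_parallel)


def implied_speedup(efficiency, n_devices):
    """Speedup over one device implied by a given parallel efficiency."""
    return efficiency * n_devices


if __name__ == "__main__":
    # 86.5% efficiency on eight GPUs, the figure quoted in the text.
    print(implied_speedup(0.865, 8))
```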

In Silico Evaluation of Designed Proteins

In silico evaluations filter and rank designed proteins before wet-lab testing to reduce the experimental cost. Predictive tools estimate folding stability, binding affinity, catalytic potential, and solubility or aggregation risk to inform further optimization. Generative methods for sequence optimization are still imperfect since active-site models misrepresent chemical descriptors, folded proteins often fail to generate the intended geometry, and structural contexts can still hamper function through conformational instability or steric clashes. (87) Understanding and diagnosing these failures in a cost-effective way is critical to rapidly improving design reliability.

Static Scoring and Foldability Screening

Protein function is dependent on structural stability. Small destabilizations can diminish or eliminate catalytic function due to aggregation, misfolding, or proteolytic degradation. (88) Misfolded structures expose hydrophobic areas that lead to aggregation or prevent access to active-site residues essential for enzymatic function, as seen in loss-of-function mutations and diseases, such as Alzheimer’s disease and cystic fibrosis. (89,90) Energy-based scoring functions serve as the first structural stability filter, determining whether designed proteins adopt stable, physically plausible conformations. (90,91) ROSETTA’s ref2015 energy function combines van der Waals interactions, electrostatics, solvation, hydrogen bonding, and geometric preferences into a pseudoenergy score through a hybrid physics and knowledge-based framework. (91,92) The performance of static scoring functions like ref2015, however, is context-dependent. Reweighting approaches that blend experimental data with ML models, such as SRS2020, suggest that tailoring parameters to specific interfaces or mutations can improve ΔΔG predictions, outperforming the unmodified score functions.

Structure refinement tools, such as ROSETTA’s FastRelax protocol, are applied prior to scoring to relieve steric strain and optimize hydrogen-bond networks by iteratively repacking side chains and minimizing backbone energy. (92,93) Root-mean-square deviation (RMSD) and quantitative energy comparison analyses between wild-type and mutant structures determine whether the designed proteins are structurally correct and energetically more favorable than other conformations. AlphaFold2-derived confidence metrics, such as the predicted Local Distance Difference Test (pLDDT) and the Predicted Aligned Error (PAE) matrix, act as proxies for protein flexibility and structural dynamics, offering a parallel “foldability screen” independent of static energy-function assumptions. (94) High average pLDDT scores above 80 correlate strongly with successful expression and folding, while low per-residue pLDDT often reflects flexibility rather than prediction error. The PAE matrix builds on this by quantifying interresidue positional confidence, pinpointing regions that static energy scores miss. (95,96) Though pLDDT and PAE provide static structural checks, neither predicts catalytic potential or active-site geometry.
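A foldability screen of the kind described above can be sketched as a two-part check: the mean pLDDT of the predicted structure must exceed a cutoff, and the prediction must stay close to the designed backbone by RMSD. The Python example below is a minimal sketch under several assumptions: the coordinates are taken to be already superimposed (no Kabsch alignment is performed), the 2.0 Å RMSD cutoff is an illustrative choice rather than a value from the text, and the function names are hypothetical. Only the pLDDT threshold of 80 follows the figure quoted above.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def passes_foldability_screen(plddt, design_coords, predicted_coords,
                              min_mean_plddt=80.0, max_rmsd=2.0):
    """True if mean pLDDT and design-to-prediction RMSD both pass their cutoffs."""
    mean_plddt = sum(plddt) / len(plddt)
    return (mean_plddt >= min_mean_plddt
            and rmsd(design_coords, predicted_coords) <= max_rmsd)


if __name__ == "__main__":
    # Hypothetical three-residue example with small coordinate deviations.
    design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    predicted = [(0.1, 0.0, 0.0), (3.9, 0.2, 0.0), (7.5, 0.0, 0.1)]
    print(passes_foldability_screen([91.0, 88.5, 84.0], design, predicted))
```

As the surrounding text notes, passing such a screen says nothing about catalytic potential or active-site geometry, so it is best treated as an early filter rather than a final ranking.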

Static energy functions capture enthalpic contributions but underrepresent entropy, long-time scale flexibility, and solvent dynamics. Structures that depend on conformational rearrangement or tight geometric constraints cannot be fully captured by evaluations that provide “snapshots”, limiting the generation of more dynamic designs. AlphaFold3 scores protein–protein interactions during structural refinement using an interface-predicted template-modeling score (ipTM) to obtain confidence scores. (97)

Binding Energetics and Interface Quality

Binding affinity determines how effectively an enzyme can recognize and bind to a substrate, cofactor, or binding target. Poorly optimized interfaces result in weak or transient binding, off-target interactions, and a loss of function. The primary metric used to evaluate binding affinity is ΔΔG, which quantifies the free-energy change caused by a point mutation at a protein interface or active site and reflects the relative binding strength of individual residues (Figure 6). This approach has been used in protein engineering to help stabilize Fe(II)/αKG enzymes with ProteinMPNN to enhance thermostability and evolvability during directed evolution (110) and within alanine-scanning studies that mapped cooperative receptor-binding loops in Cry4Aa toxins, which are used in mosquito-targeting pesticides. (111) Further, a recent study of antimeasles virus antibodies used in silico alanine scanning and molecular dynamics (MD) to identify hotspot residue pairs within complementarity-determining regions (CDRs) that jointly influence both binding affinity and thermal stability, revealing an affinity-stability trade-off governed by relative hydropathy at key interaction sites. (112) Together, these methods link amino acid sequence changes to structural and functional outcomes, offering valuable insights into protein engineering and therapeutic design.

Figure 6. Computational strategies for evaluating amino acid sequence perturbations. (A) Structural stability analysis introduces mutations into a sequence and applies ab initio folding to predict conformational shifts, highlighting favorable and unfavorable perturbations. (B) Binding affinity analysis docks protein constituents, incorporates mutations, and estimates changes in binding free energy (ΔΔG) to evaluate the interaction stability. (C) Interface hotspot probing systematically mutates residues at binding interfaces to pinpoint the positions that are most critical for binding energetics.

ROSETTA ΔΔG calculations correlate reasonably with experimental data sets across large mutation libraries, though accuracy drops when backbone models are poorly defined. (89,98) Interface evaluation, however, extends beyond global ΔΔG. Solvent-accessible surface area (SASA), solvation energy, hydrogen-bond networks, and electrostatic forces all contribute to binding dynamics and are accessible through tools such as ROSETTA InterfaceAnalyzer. (99−101) Energetic hotspots can be identified via alanine scanning, which mutates interface residues to alanine, isolating side-chain contributions without disrupting the backbone geometry. (102,103) Alanine works as a neutral substitution due to its small size and nonpolar nature. Sites that show large energetic drops after alanine mutation stand out as binding hotspots that are important for stability or specificity. Proline and noncanonical amino acids (photo cross-linkers, metal coordinators, electrophilic warheads, etc.) have also been used as chemical probes at binding interfaces, capturing covalent interactions and assessing backbone rigidity, though these fall outside standard scoring workflows and need careful parametrization. (104−107) In efforts to standardize scoring across docking, ranking and protein screens, data sets like PDBbind and the CASF benchmark provide a basis for comparing and calibrating score functions. (108,109) Despite these efforts, static interface scoring continues to struggle with long-range electrostatics, solvation effects, and dynamic rearrangements, particularly in ligand-rich and flexible systems where backbone flexibility is directly linked to binding.
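The alanine-scanning logic described above reduces to a simple calculation once binding free energies are available: ΔΔG = ΔG(mutant) − ΔG(wild type), with large positive values marking hotspots. The Python sketch below assumes that sign convention (more positive ΔΔG means weaker binding), a 1.0 kcal/mol hotspot cutoff chosen for illustration, and hypothetical mutation labels and energies; none of these values come from the cited studies, and in practice the ΔG inputs would come from a scoring tool such as ROSETTA.

```python
def alanine_scan_ddg(dg_wild_type, dg_mutants):
    """Return ΔΔG = ΔG(mutant) − ΔG(wild type) for each alanine mutant (kcal/mol)."""
    return {name: dg - dg_wild_type for name, dg in dg_mutants.items()}


def find_hotspots(ddg_by_mutation, threshold=1.0):
    """Mutations whose loss of binding energy meets or exceeds the threshold."""
    return sorted(name for name, ddg in ddg_by_mutation.items() if ddg >= threshold)


if __name__ == "__main__":
    # Hypothetical binding free energies in kcal/mol.
    wild_type_dg = -10.0
    mutant_dg = {"Y45A": -7.5, "K12A": -9.8, "W88A": -6.9}
    ddg = alanine_scan_ddg(wild_type_dg, mutant_dg)
    print(ddg)
    print(find_hotspots(ddg))
```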

Dynamics, Sampling, and the Accuracy-Efficiency Trade-Off

Static “snapshots” produced by protein design tools inaccurately predict catalytic potential by overlooking dynamic movements, such as loop openings and domain shifts, that are crucial for stabilizing transition states. Molecular dynamics (MD) simulations can model how proteins shift and fluctuate over time by using Newton’s laws to predict atomic motion (Figure 7). Tools like GROMACS help researchers analyze how loop domains and hydrogen-bond networks change in single proteins and large assemblies over nanosecond to microsecond time scales. (113,114) MD has been used in tRNA-synthetase engineering, revealing how anticodon arms, acceptor stems, and binding loops respond to ncAA mutations and codon recognition. (115,116) To catch rare conformational states, metadynamics and replica-exchange MD (REMD) enhance conformational sampling, for example by running parallel simulations under different conditions, yet come with higher computational cost. (117,118) Coarse-grained (CG) models offer a useful alternative when system size or time scale makes all-atom simulation impractical, trading resolution for the ability to simulate larger assemblies over longer time scales by grouping atoms into interaction sites. CG models capture global flexibility and thermodynamic changes but cannot capture the electronic rearrangements that govern bond formation, proton transfer, or charge redistribution. (119,120) In comparison, QM/MM treats the active site quantum-mechanically, while the surrounding protein is handled with classical force fields (FFs) such as AMBER, CHARMM, GROMOS, and OPLS-AA, which represent the potential energy of a system using mathematical functions and empirically derived parameters. (121−124) Though QM/MM methods are the most physically realistic options for estimating activation barriers, these frameworks are generally not practical beyond individual structure validation due to the high computational cost and time scales needed.
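The statement that MD uses Newton’s laws to predict atomic motion can be made concrete with a velocity Verlet integrator, a standard scheme for advancing positions and velocities in time. The Python sketch below is a toy under stated assumptions: it integrates a single particle on a one-dimensional harmonic spring in arbitrary units, whereas production engines such as GROMACS integrate many atoms under full force fields; the function names and parameter values are hypothetical.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion; return a list of (x, v) states."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        f = force(x)                          # force at the new position
        v = v_half + 0.5 * dt * f / mass      # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_force(x, k=1.0):
    """Restoring force of a spring with stiffness k."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the harmonic oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                           mass=1.0, dt=0.01, n_steps=1000)
    x_end, v_end = traj[-1]
    print(total_energy(1.0, 0.0), total_energy(x_end, v_end))
```

The near-constant total energy over the run reflects why this family of integrators is widely used for MD: small time steps keep long trajectories stable, which is also why sampling rare motions becomes expensive.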

Figure 7. Computational evaluation of the biological and functional properties of proteins. (A) Molecular dynamics and catalysis simulate mutated proteins in solvated environments to capture conformational flexibility and catalytic changes through trajectory analyses. Hybrid pipelines that integrate molecular dynamics (MD) with ROSETTA and directed evolution have yielded efficient de novo and redesigned biocatalysts, such as HG3.17 and BH32.14, whose catalytic power emerges from MD-guided active-site reorganization and solvent shielding. (B) Solubility analysis predicts the effects of amino acid sequence variation on protein solubility by comparing mutant distributions to wild-type benchmarks. CamSol-based workflows enable the rational optimization of both solubility and conformational stability, as demonstrated for six antibodies (including two approved therapeutics), enhancing developability without compromising binding. (137) (C) Aggregation propensity assesses structural and sequence features to identify residues or motifs that drive aggregation, distinguishing soluble variants from aggregation-prone variants. Using Aggrescan3D, researchers computationally minimized aggregation hotspots to engineer green fluorescent protein (GFP) mutants with significantly improved solubility and reduced aggregation, resulting in a fast-folding, aggregation-resistant variant. (138) Together, these approaches extend computational evaluation to capture dynamic solubility and aggregation behaviors that critically influence protein performance in physiological and industrial contexts.

Empirical Valence Bond (EVB) methods provide a faster approximate alternative to QM/MM and are better suited for screening larger systems. EVB is used to clarify how the spatial arrangement of charged residues around an active site (electrostatic preorganization) lowers the activation energy and stabilizes reaction intermediates. Increased precision captures favorable synthetase conformations that static models are unable to capture, in which the active site acts as a structural template that is “pre-shaped” for the transition state. (115,116,125) Combining QM/MM with MD can further predict how synthetases discriminate between canonical and noncanonical amino acids based on electrostatics, hydrogen bonding, and active-site geometry. Despite their power, these methods carry trade-offs. Due to increased precision, MD and QM are computationally demanding and require large data sets. MD often needs long trajectories because rare motions only appear with enough sampling, so runs can stretch into hundreds of nanoseconds before the system settles. (120) QM/MM accuracy is limited by computational cost and system size, and small FF errors can accumulate over long MD trajectories. (121−124,126)

Solubility and Aggregation Behaviors

Solubility remains one of the more difficult biophysical properties to predict with precision but is a critical filter for manufacturability and therapeutic viability. A design that folds and binds may still fail during expression or purification. Natural proteins often exhibit poor solubility, limited thermostability, and low expression yields, particularly when reengineered for industrial or therapeutic use. (88,127,128) These constraints motivated the development of computational approaches to optimize solubility and aggregation resistance. Structure- and sequence-based tools such as CamSol evaluate residue-level solubility using physicochemical features (e.g., hydrophobicity, β-sheet distributions) and suggest stabilizing mutations that do not affect global stability. (129) More recent ML-based tools such as PROTSOLM and GATSol use sequence- and structure-related features and long-range interactions to improve solubility prediction accuracy. (130,131) Another ML tool, soluble MPNN (MPNN_sol), is the product of the ProteinMPNN network being retrained on a data set strictly made up of soluble proteins and can be used for the de novo design of proteins with a low fraction of surface hydrophobics. Despite these advances, solubility alone is insufficient for determining developability.

Amyloid-β illustrates how exposed hydrophobic areas and β-sheet-prone regions fold onto themselves to form insoluble complexes, leading to further aggregation. (132,133) To address this, aggregation models use statistical potentials and ML classifiers trained on known amyloidogenic sequences and aggregation motifs. (133) When introduced early in the design process, frameworks that integrate solubility and aggregation risk metrics improve manufacturability and reduce downstream failure. However, data set bias and the inability to capture the energetic contributions of unusual structures (e.g., multidomain proteins, (134,135) proteins with noncanonical residues (106,136)) limit the use of these models for systems that must remain stable across variable buffer conditions, pH ranges, and expression systems.

Limitations of In Silico Evaluation

In silico validation enables ranking of protein designs, but no computational framework can predict experimental success with complete accuracy. ΔΔG calculations underestimate entropic and solvent effects and lose accuracy on poorly refined backbones. Alanine and proline scans miss “non-hotspot” interactions and the effects of neighboring residues. MD and QM/MM provide mechanistic insight but are too computationally costly for high-throughput workflows. Solubility predictions work well for monomeric, well-structured proteins; however, they fall short on multidomain assemblies and intrinsically disordered proteins (IDPs). Aggregation scoring frameworks struggle to differentiate between functional and aggregation-prone β-sheets.

Individually, these tools underrepresent the biophysical properties that determine downstream expression (Table 3). The next gain in prediction accuracy will come from integrating them. More accurate analysis will require combining physics-based scoring (e.g., CamSol), ML-guided confidence metrics (e.g., PROTSOLM, GATSol), MD simulation, and experimental data in one framework. Integrating these ML tools with standardized experimental benchmarks such as PDBbind allows predictions to be recalibrated against observed outcomes, improving analysis across diverse structural classes. Multilayer assessment with experimental feedback is essential for reducing false negatives during downstream processes and broadening model applicability.
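One simple way to picture recalibration against experimental benchmarks is a least-squares linear map from predicted scores to measured values, fitted on a benchmark set and then applied to new predictions. The sketch below shows only that idea. The data points are placeholders invented for illustration (they are not drawn from PDBbind or any cited study), the function names are hypothetical, and a real pipeline would likely use richer models and held-out validation.

```python
def fit_linear(predicted, observed):
    """Ordinary least-squares fit of observed = slope * predicted + intercept."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    slope = sxy / sxx
    return slope, mean_o - slope * mean_p

def recalibrate(scores, slope, intercept):
    """Map raw predicted scores onto the experimental scale."""
    return [slope * s + intercept for s in scores]

# Placeholder benchmark: raw predictions and matching measurements (invented values).
predicted = [-1.0, 0.0, 1.0, 2.0]
observed = [-0.4, 0.1, 0.6, 1.1]

slope, intercept = fit_linear(predicted, observed)
print(recalibrate([4.0], slope, intercept))
```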

Table 3. Summary of In Silico Protein Design Parameters

metric	purpose	example methods	significance	limitations
structural stability	predict foldability	ROSETTA (ref2015), RMSD, AlphaFold, pLDDT (34,35,90,92)	ensures the designed fold is retained postmutation	static models neglect entropy and conformational flexibility
binding affinity	assess interaction strength	flex ddG, InterfaceAnalyzer, alanine/proline scanning, PDBbind (98,102,104,105,108)	guides interface design and ligand-binding optimization	sensitive to backbone quality and local packing residues
interface hotspot probing	localize key residues	alanine/proline scanning, ncAA probe libraries (e.g., PheCN, Bpa) (103,105,139)	identifies energetic “anchors” and enables targeted mutation design	noncanonical probes may bias geometry or introduce steric clashes
molecular dynamics and catalysis	model flexibility and transition states	metadynamics, MD, QM/MM, REMD, EVB (117,121,126,132,140)	reveals loop dynamics and allosteric networks for catalytic preorganization	high computational cost; enhanced-sampling methods require expertise and tuning
solubility	predict aggregation or expression risk	CamSol, PROTSOLM, GATSol (130,131,133,141)	critical for developability, expression, and therapeutic viability	underperforms for IDPs, membrane proteins, or large multichain assemblies
aggregation propensity	identify aggregation-prone regions	Aggrescan3D, β-strand exposure models (132,142−144)	detects amyloid risk, hydrophobic patches	may misclassify functional β-sheets or multimer interfaces

Directed Evolution as a Complement to De Novo Design

Directed evolution (DE) complements computational design and shows that building proteins de novo is not always efficient or necessary. As products of evolution, natural proteins occupy highly optimized regions of sequence space. (17,145) Success in enzyme design usually comes from modifying existing functionality. Iterative rounds of mutation and selection can identify substitutions that restore or improve activity when rational design is insufficient. Experimental variants reveal force-field blind spots in active-site geometry that computational models are unable to catch. (12,146,147)

One major limitation of directed evolution is scale. Experimental libraries typically cover only 10³–10⁶ variants, a vanishing fraction of the astronomically large number of possible sequences, and most mutations are neutral or unfavorable. (145) Computational tools address this limitation by mapping sequence entropy and residue interactions to identify suitable positions for mutagenesis. (16) When designs fail because ideal geometries, electrostatics, or loop dynamics are not properly represented, DE acts as a diagnostic tool that distinguishes incorrect hypotheses from structural errors. Quantitative differences between native and designed proteins demonstrate why this feedback loop is important. Naturally occurring enzymes accelerate reactions by more than 10¹²-fold, while most de novo catalysts achieve modest gains at best. (87) The question then becomes whether computational starting scaffolds can achieve native-like efficiency. Computational design can narrow the search space by targeting mutations on native scaffolds, and directed evolution can then test and refine those variants. Repeating this workflow creates a feedback loop that improves both enzyme performance and computational prediction accuracy (Figure 8).
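The scale mismatch can be made concrete with a short calculation. Assuming 20 canonical amino acids at each of N positions, there are 20^N possible sequences. The sketch below reports what fraction of that space a library of a given size could cover and how many positions could be fully randomized within that library size. The protein length of 100 residues and the helper names are illustrative assumptions, and the calculation ignores codon degeneracy and the oversampling needed in practice.

```python
from fractions import Fraction

def sequence_space(length, alphabet=20):
    """Number of distinct sequences of the given length."""
    return alphabet ** length

def coverage(library_size, length, alphabet=20):
    """Exact fraction of sequence space that a library could sample at most."""
    return Fraction(library_size, sequence_space(length, alphabet))

def saturation_positions(library_size, alphabet=20):
    """Largest number of positions that can be fully randomised within the library size."""
    n = 0
    while alphabet ** (n + 1) <= library_size:
        n += 1
    return n

for size in (10**3, 10**6):
    print(
        f"library of {size}: "
        f"{saturation_positions(size)} fully randomised positions, "
        f"coverage of a 100-residue protein ~ {float(coverage(size, 100)):.1e}"
    )
```

Under these assumptions, even the largest library in the quoted range can exhaustively cover only a handful of positions, which is why the choice of positions matters so much.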

Figure 8. Conceptual framework contrasting traditional and A.I.-assisted directed evolution (DE) workflows. The diagram is divided into two pathways: the upper route represents conventional DE, where (A) natural sequence diversity is explored, (B) mutational libraries are generated, (C) variants are expressed, and (E) high-throughput screening identifies improved candidates through iterative experimental cycles. The lower route introduces A.I./ML-assisted or hybrid methodologies, in which (D) supervised models with uncertainty quantification learn the sequence-fitness landscape and use acquisition functions to propose new variants, balancing exploration (high uncertainty) and exploitation (high predicted fitness). These feedback-driven optimization strategies accelerate variant discovery with a reduced screening effort. Combined approaches, such as active learning-assisted directed evolution (ALDE), have yielded (F) optimized protoglobin-based biocatalysts for nonnative cyclopropanation reactions, enhancing their activity, selectivity, and stability while minimizing experimental costs. (148)
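The exploration-exploitation balance described in the caption is often expressed with an upper-confidence-bound (UCB) acquisition score, the predicted mean plus a multiple of the predictive uncertainty. The sketch below is a generic illustration and not the ALDE implementation. It assumes an ensemble of models has already produced fitness predictions for each candidate and uses the ensemble spread as the uncertainty estimate. The variant names, prediction values, and `beta` settings are hypothetical.

```python
from statistics import mean, pstdev

def ucb_scores(ensemble_preds, beta=1.0):
    """UCB per variant: ensemble mean + beta * ensemble standard deviation."""
    return {
        variant: mean(preds) + beta * pstdev(preds)
        for variant, preds in ensemble_preds.items()
    }

def propose(ensemble_preds, batch_size=2, beta=1.0):
    """Return the batch_size variants with the highest UCB score."""
    scores = ucb_scores(ensemble_preds, beta)
    return sorted(scores, key=scores.get, reverse=True)[:batch_size]

# Hypothetical fitness predictions from a three-member ensemble.
preds = {
    "variant_A": [0.80, 0.82, 0.78],  # high predicted fitness, models agree
    "variant_B": [0.30, 0.90, 0.60],  # moderate fitness, models disagree
    "variant_C": [0.50, 0.52, 0.48],
    "variant_D": [0.10, 0.12, 0.08],
}

greedy = propose(preds, batch_size=1, beta=0.0)       # pure exploitation
exploratory = propose(preds, batch_size=1, beta=2.0)  # rewards uncertainty
print(greedy, exploratory)
```

With `beta` set to zero the ranking follows predicted fitness alone. Raising `beta` shifts the proposal toward candidates on which the models disagree, which is where a new measurement is most informative.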

Experimental Validation of AI-Generated Protein Tools

Interactions that determine fold stability, catalytic efficiency, binding affinity, solubility, and aggregation propensity cannot be captured by in silico measures alone. High-resolution structural assays serve as a quantitative filter between computational predictions and practical utility and are the leading methods for testing design accuracy (Table 4). X-ray crystallography is considered the gold standard for atomic-resolution structure determination and, when paired with prediction software, enables direct comparison between predicted and observed backbone geometry and active-site positioning. Its success depends on proteins crystallizing under precise conditions, which presents a significant barrier: many proteins fail to crystallize or crystallize only after extensive screens, and weeks to months can pass before high-quality crystals form. (149) In addition, crystallography locks proteins into a static lattice, making it difficult to study pH-sensitive states, transient complexes, and post-translational modifications.
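The comparison between predicted and observed backbone geometry is usually summarized as a root-mean-square deviation. The sketch below computes the distance RMSD (dRMSD), which compares internal pairwise distances and therefore needs no prior superposition. It is a stand-in chosen for brevity, since superposition-based RMSD is more commonly reported. The coordinates are toy values in ångströms and are not taken from any structure.

```python
import math
from itertools import combinations

def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally sized lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    if len(coords_a) < 2:
        raise ValueError("at least two atoms are required")
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = sum(
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(squared / len(pairs))

# Toy C-alpha traces (three atoms each).
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
rotated = [(0.0, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0)]    # same shape, rotated
stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (8.6, 0.0, 0.0)]  # last atom displaced

print(drmsd(predicted, rotated))
print(drmsd(predicted, stretched))
```

Because dRMSD depends only on internal distances, a rigid rotation or translation leaves it at zero, while a genuine change in geometry does not.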

Table 4. Experimental Validation Methods for Computational Protein Design

method	measurement	strengths	limitations	example applications
X-ray crystallography	atomic-resolution structural “snapshots”	well-established refinement pipelines	(1) requires crystallization (often challenging/time-consuming)	(1) benchmarking AlphaFold prediction
			(2) static lattice limits dynamic studies	(2) validation of active sites (158,159)
Cryo-EM	structural validation of large assemblies and complexes	(1) no crystallization needed	(1) historically lower resolution for small proteins (<100 kDa)	(1) antibody–antigen complexes
		(2) captures transient or unstable complexes	(2) requires advanced processing software	(2) complement to crystallography
		(3) excels at large proteins, complexes, and membrane proteins		(3) ML refinement of maps (151,155,160)
NMR spectroscopy	conformational ensembles, loop dynamics, chemical environment	(1) probes protein motion in solution	(1) limited to smaller proteins	(1) loop dynamics in catalysis
		(2) reveals catalytic loop mobility and reaction intermediates	(2) requires isotopic labeling; lower spatial resolution than crystallography	(2) conformational changes critical for function (152)
hybrid approaches	integrated models combining experimental and computational restraints	combines ML predictions (AlphaFold/ROSETTA) with sparse restraints (XL-MS, cryo-EM maps, covalent labeling)	requires careful alignment of computational and experimental data sets	refinement of protein–protein interfaces and complexes via XL-MS and AlphaFold/ROSETTA (154,155)

Cryo-electron microscopy (cryo-EM) has expanded validation to systems that do not readily crystallize, capturing large assemblies, membrane proteins, and antibody–antigen complexes. (150) Historically, cryo-EM resolution has been weaker for smaller proteins; however, advances in reconstruction algorithms continue to narrow this gap. (149,151) Nuclear magnetic resonance (NMR) spectroscopy analyzes protein dynamics in solution. Loop movements that determine substrate entry, product release, and catalytic residue positioning are critical for enzymatic turnover. (152) NMR structural ensembles reveal the local and global flexibility, and the intermediates, that govern catalysis. These flexibility measures matter because active-site geometry depends on specific motions and dynamic local environments. Recent deep-learning-assisted assignment of side chains and dynamics has extended the applicability of NMR to larger proteins and functional sites. (153)

Hybrid approaches integrate experimental restraints directly into the modeling. Cross-linking mass spectrometry (XL-MS) and covalent labeling improve predictions at protein–protein interfaces. Iterative rebuilding of AlphaFold or ROSETTA models against cryo-EM density improves prediction accuracy beyond what either approach achieves alone. (154,155) Even limited experimental input can significantly improve structural precision, helping computational platforms generate biologically realistic models. Experimental validation also plays an important diagnostic role. When predicted and observed structures disagree, the discrepancies indicate whether failure results from incorrect backbone generation, flawed scoring metrics, or missing catalytic features. (87) Incorporating experimental data improves subsequent design cycles by updating the energetic parameters. Experimental validation thus functions as the final checkpoint and empirical layer that closes the loop between hypothesis and function. Computational tools map the candidate space, structural and functional assays determine which designs are physically feasible, and experimental feedback improves both the engineered proteins and the predictive frameworks used to develop them (Figure 9).

Figure 9. Experimental–computational pipeline for protein engineering. (A) Protein mutant libraries are generated by introducing sequence variations across the regions of interest. AlphaFold-guided domain-motif design (e.g., FBXO23-STX1B) has revealed novel regulatory interfaces relevant to therapeutic target discovery. (155) (B) Mutants are expressed in *Escherichia coli*, yeast, or mammalian systems to generate protein ensembles for screening; such expression-labeling pipelines support enzyme and biocatalyst development used in pharmaceuticals and green chemistry. (154) (C) Structural characterization via cryo-EM, NMR, and X-ray crystallography (and hybrid methods), refined with predictive models such as AlphaFold2 or ROSETTA, resolves folding and conformational dynamics, as shown in ribosomal complex refinements and NMR-ROSETTA modeling of ubiquitin. (156,157) (D) Functional screening evaluates the activity and binding properties of mutant sets, using methods such as hydroxyl radical footprinting, to identify active-site or interface residues that control activity and stability, as applied to Hsp90-co-chaperone systems and engineered oxidoreductases. (154) (E) Finally, A.I./ML integration combines experimental data with modeling to predict next-generation variants, accelerating industrial enzyme design, antibody optimization, and biosensor development.

Limitations and Future Directions in Computational Protein Design

Computational protein design has made significant advances over the years. In silico measures follow a hierarchy in which each layer adds physical realism at the cost of more computational resources. Because of this, most workflows apply these methods selectively rather than all at once. The process begins with statistics-based scoring for foldability and then moves to ensemble ΔΔG calculations for interface refinement. Frameworks then extend to atomistic and coarse-grained MD for dynamic motion and finally reach QM/MM for precise chemical detail. However, overlapping challenges remain across foundational frameworks and diffusion-based generative models. Static models cannot fully capture active-site geometry and side-chain constraints, which limits enzyme development. (94) Most algorithms focus on fixed protein backbones or on single low-energy conformations and overlook the dynamic rearrangements that proteins undergo in solution over time. In addition, natural enzymes use cofactors, metals, and post-translational modifications that most design workflows cannot accommodate. Generative models produce stable folds and sequences but primarily focus on canonical amino acids and small-molecule ligands, thereby missing key noncanonical chemical descriptors and energy parameters. Even highly parametrized energy functions require custom weights for noncanonical amino acids and rare cofactors to balance the interactions between physics-based terms in nonnative binding pockets. (92) These constraints limit progress in orthogonal translation systems, metalloenzymes, and covalent inhibitors, all of which rely on the precise modeling of chemical interactions.
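The selective, tiered use of in silico methods described at the start of this section can be pictured as a screening funnel in which each stage sees only the survivors of the previous one. The sketch below compares the total cost of such a funnel with the cost of running every method on every design. The relative costs, the retention factors, and the function names are hypothetical placeholders chosen only to show the bookkeeping, not measured values.

```python
def funnel_cost(n_designs, stages):
    """Total cost when each stage only evaluates survivors of the previous stage.

    stages: list of (name, cost_per_design, reduction_factor); after each
    stage roughly 1 in reduction_factor designs is carried forward.
    """
    total, remaining, seen = 0, n_designs, []
    for name, cost, reduction in stages:
        seen.append((name, remaining))
        total += remaining * cost
        remaining = max(1, remaining // reduction)
    return total, seen

def exhaustive_cost(n_designs, stages):
    """Total cost of applying every stage to every design."""
    return n_designs * sum(cost for _, cost, _ in stages)

# Hypothetical relative costs (arbitrary units) and reduction factors.
STAGES = [
    ("statistical scoring", 1, 10),
    ("ensemble ddG", 100, 10),
    ("MD", 10_000, 10),
    ("QM/MM", 1_000_000, 1),
]

tiered, seen = funnel_cost(10_000, STAGES)
full = exhaustive_cost(10_000, STAGES)
for name, count in seen:
    print(f"{name}: {count} designs evaluated")
print(f"tiered cost {tiered}, exhaustive cost {full}")
```

The trade-off is that any design discarded by a cheap early filter never reaches the more realistic methods, so errors in the early stages propagate through the rest of the funnel.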

At the functional level, de novo designed enzymes lag far behind natural catalysts, often showing much lower catalytic activity. In many cases, designed proteins are only as efficient as catalytic antibodies created decades ago. (87) Opportunities for failure exist across the entire pipeline. Incorrect active-site hypotheses in the early stages, incorrect geometry during scaffold development, and missing effects during characterization all lead to imprecise predictions. Force fields cannot accurately parametrize the multiresidue networks seen in natural enzymes because they rely on minimal active-site motifs to discriminate between similar transition states. (94) Downstream, proteins that appear stable in upstream in silico filters can still aggregate, misfold, or provoke immunogenic responses in vivo.

The field still needs improvements in benchmarking and standardization. Inconsistent metrics make direct comparisons across platforms difficult. Reproducible algorithms and standard community-wide data sets would enable high-throughput enzyme development and evaluation of novel methods, and would help separate real improvements from data-set-specific results. Integrating diffusion-based backbone generators with ligand-aware sequence optimizers while considering all-atom parameters presents a promising route to efficient, high-throughput enzyme and binder design. Extending this precision and speed to RNA and DNA constructs, orthogonal tRNA-ncAA-synthetase systems, and covalent inhibitors would allow the creation of full computational-to-experimental pipelines from first principles. Combining generative approaches with hybrid ML methods and closed-loop experimental feedback ultimately yields frameworks that enable the directed evolution of de novo enzymes at speeds and with precision not yet seen or readily adopted in the chemical engineering field.

Author Information

Corresponding Author
- Blaise R. Kimmel - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States; Center for Cancer Engineering, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States; Pelotonia Institute for Immuno-Oncology, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States; https://orcid.org/0000-0002-9582-9887; Email: [email protected]
Authors
- Joseph S. Bailey Jr. - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
- Søren C. Spina - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
- Andrew Hu - College of Medicine, The Ohio State University, 460 W 10th Avenue, Columbus, Ohio 43210, United States
- Nathan Phan - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
- Rachel B. Getman - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States; https://orcid.org/0000-0003-0755-0534
Author Contributions
J.S.B. Jr.: Wrote the original draft of the manuscript, generated all figures and graphics for the manuscript, edited, revised, and approved the final version of the manuscript; S.C.S., A.H., and N.P.: supported the generation of graphics and writing for the manuscript; R.G.: edited, revised, and approved the final version of the manuscript; B.R.K.: wrote the original draft of the manuscript, edited, revised, and approved the final version of the manuscript, and acquired funding to support the work. CRediT: Joseph S. Bailey Jr. conceptualization, data curation, formal analysis, investigation, methodology, writing - original draft, writing - review & editing; Søren Spina visualization, writing - original draft, writing - review & editing; Andrew Hu visualization, writing - original draft, writing - review & editing; Nathan Phan visualization, writing - original draft, writing - review & editing; Rachel B. Getman funding acquisition, writing - review & editing; Blaise R. Kimmel conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, supervision, validation, visualization, writing - original draft, writing - review & editing.
Funding
We gratefully thank the Ohio State University Comprehensive Cancer Center (OSUCCC), OSUCCC Center for Cancer, and the Department of Chemical and Biomolecular Engineering at The Ohio State University for support of this work. B.R.K. acknowledges financial support from the Prostate Cancer Foundation Young Investigator Award.
Notes
The authors declare no competing financial interest.

Acknowledgments

This work was supported in part by The Ohio State University Center for Cancer Engineering─Curing Cancer Through Research in Engineering and Sciences. B.R.K. acknowledges financial support from the Prostate Cancer Foundation Young Investigator Award. We acknowledge the use of PaperPal and Grammarly as AI tools to modify the grammar, phrasing, and sentence structure while writing this review. Each author takes full responsibility for the manuscript’s content.

References

This article references 160 other publications.

1
Yu, Y.; Hu, C.; Xia, L.; Wang, J. Artificial Metalloenzyme Design with Unnatural Amino Acids and Non-Native Cofactors. ACS Catal. 2018, 8, 1851– 1863, DOI: 10.1021/acscatal.7b03754
2
Mirts, E. N.; Bhagi-Damodaran, A.; Lu, Y. Understanding and Modulating Metalloenzymes with Unnatural Amino Acids, Non-Native Metal Ions, and Non-Native Metallocofactors. Acc. Chem. Res. 2019, 52, 935– 944, DOI: 10.1021/acs.accounts.9b00011
3
Mann, S. I.; Nayak, A.; Gassner, G. T.; Therien, M. J.; DeGrado, W. F. De Novo Design, Solution Characterization, and Crystallographic Structure of an Abiological Mn–Porphyrin-Binding Protein Capable of Stabilizing a Mn(V) Species. J. Am. Chem. Soc. 2021, 143, 252– 259, DOI: 10.1021/jacs.0c10136
4
Bergman, M. T.; Xiao, X.; Hall, C. K. In Silico Design and Analysis of Plastic-Binding Peptides. J. Phys. Chem. B 2023, 127, 8370– 8381, DOI: 10.1021/acs.jpcb.3c04319
5
García-Moreno, P. J. Recent advances in the production of emulsifying peptides with the aid of proteomics and bioinformatics. Curr. Opin. Food Sci. 2023, 51, 101039 DOI: 10.1016/j.cofs.2023.101039
6
Ndochinwa, G. O.; Wang, Q. Y.; Okoro, N. O. New advances in protein engineering for industrial applications: Key takeaways. Open Life Sci. 2024, 19, 20220856 DOI: 10.1515/biol-2022-0856
7
Marcos, E.; Silva, D. Essentials of de novo protein design: Methods and applications. WIREs Comput. Mol. Sci. 2018, 8 (6), e1374 DOI: 10.1002/wcms.1374
8
Huang, P.-S.; Boyken, S. E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320– 327, DOI: 10.1038/nature19946
9
Woolfson, D. N. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J. Mol. Biol. 2021, 433, 167160 DOI: 10.1016/j.jmb.2021.167160
10
Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. Protein Folding Problem. Annu. Rev. Biophys. 2008, 37, 289– 316, DOI: 10.1146/annurev.biophys.37.092707.153558
11
Kocher, C. D.; Dill, K. A. Origins of life: The Protein Folding Problem all over again?. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315000121 DOI: 10.1073/pnas.2315000121
12
Chen, S.-J.; Hassan, M.; Jernigan, R. L. Protein folds vs. protein folding: Differing questions, different challenges. Proc. Natl. Acad. Sci. U. S. A. 2023, 120, e2214423119 DOI: 10.1073/pnas.2214423119
13
Kiss, G.; Çelebi-Ölçüm, N.; Moretti, R.; Baker, D.; Houk, K. N. Computational Enzyme Design. Angew. Chem., Int. Ed. 2013, 52, 5700– 5725, DOI: 10.1002/anie.201204077
14
Ille, A. M.; Anas, E.; Mathews, M. B.; Burley, S. K. From sequence to protein structure and conformational dynamics with artificial intelligence/machine learning. Struct. Dyn. 2025, 12, 030902 DOI: 10.1063/4.0000765
15
Anfinsen, C. B. Principles that Govern the Folding of Protein Chains. Science 1973, 181, 223– 230, DOI: 10.1126/science.181.4096.223
16
Voigt, C. A.; Mayo, S. L.; Arnold, F. H.; Wang, Z.-G. Computational method to reduce the search space for directed protein evolution. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 3778– 3783, DOI: 10.1073/pnas.051614498
17
Kuhlman, B.; Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 10383– 10388, DOI: 10.1073/pnas.97.19.10383
18
Sleator, R. D. Solving the protein folding problem···. FEBS Lett. 2024, 598, 2831– 2835, DOI: 10.1002/1873-3468.15043
19
Watson, J. L.; Juergens, D.; Bennett, N. R. De novo design of protein structure and function with RFdiffusion. Nature 2023, 620, 1089– 1100, DOI: 10.1038/s41586-023-06415-8
20
Leveson-Gower, R. B. Designing Enzymatic Reactivity with an Expanded Palette. ChemBioChem 2025, 26, e202500076 DOI: 10.1002/cbic.202500076
21
Hartman, M. C. T. Non-canonical Amino Acid Substrates of Escherichia coli Aminoacyl-tRNA Synthetases. ChemBioChem 2022, 23, e202100299 DOI: 10.1002/cbic.202100299
22
Zhang, G.; Liu, C.; Lu, J.; Zhang, S.; Zhu, L. The Role of AI-Driven De Novo Protein Design in the Exploration of the Protein Functional Universe. Biology 2025, 14, 1268 DOI: 10.3390/biology14091268
23
Rohl, C. A.; Strauss, C. E. M.; Misura, K. M. S.; Baker, D. Protein Structure Prediction Using Rosetta. Methods Enzymol. 2004, 383, 66– 93, DOI: 10.1016/S0076-6879(04)83004-0
24
Simons, K. T.; Kooperberg, C.; Huang, E.; Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. J. Mol. Biol. 1997, 268, 209– 225, DOI: 10.1006/jmbi.1997.0959
25
Leman, J. K.; Weitzner, B. D.; Lewis, S. M. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 2020, 17, 665– 680, DOI: 10.1038/s41592-020-0848-2
26
Das, R.; Baker, D. Macromolecular Modeling with Rosetta. Annu. Rev. Biochem. 2008, 77, 363– 382, DOI: 10.1146/annurev.biochem.77.062906.171838
27
Kaufmann, K. W.; Meiler, J. Using RosettaLigand for Small Molecule Docking into Comparative Models. PLoS One 2012, 7, e50769 DOI: 10.1371/journal.pone.0050769
28
Lemmon, G.; Kaufmann, K.; Meiler, J. Prediction of HIV-1 Protease/Inhibitor Affinity using RosettaLigand. Chem. Biol. Drug Des. 2012, 79, 888– 896, DOI: 10.1111/j.1747-0285.2012.01356.x
29
Chaudhury, S.; Lyskov, S.; Gray, J. J. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 2010, 26, 689– 691, DOI: 10.1093/bioinformatics/btq007
30
Le, K. H.; Adolf-Bryfogle, J.; Klima, J. C. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Biophysicist 2021, 2, 108– 122, DOI: 10.35459/tbp.2019.000147
31
Ford, A. S.; Weitzner, B. D.; Bahl, C. D. Integration of the Rosetta suite with the python software stack via reproducible packaging and core programming interfaces for distributed simulation. Protein Sci. 2020, 29, 43– 51, DOI: 10.1002/pro.3721
32
Van Stappen, C.; Deng, Y.; Liu, Y. Designing Artificial Metalloenzymes by Tuning of the Environment beyond the Primary Coordination Sphere. Chem. Rev. 2022, 122, 11974– 12045, DOI: 10.1021/acs.chemrev.2c00106
33
Tivon, B.; Wiese, J.; Müller, M. P. Computational Design of Lysine Targeting Covalent Binders Using Rosetta. J. Chem. Inf. Model. 2025, 65, 5612– 5622, DOI: 10.1021/acs.jcim.5c00212
34
Jumper, J.; Evans, R.; Pritzel, A. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583– 589, DOI: 10.1038/s41586-021-03819-2
35
Tunyasuvunakool, K.; Adler, J.; Wu, Z. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590– 596, DOI: 10.1038/s41586-021-03828-1
36
Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)─Round XIV. Proteins:Struct., Funct., Bioinf. 2021, 89, 1607– 1617, DOI: 10.1002/prot.26237
37
Kovalevskiy, O.; Mateos-Garcia, J.; Tunyasuvunakool, K. AlphaFold two years on: Validation and impact. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315002121 DOI: 10.1073/pnas.2315002121
38
Schneider, B.; Sweeney, B. A.; Bateman, A. When will RNA get its AlphaFold moment?. Nucleic Acids Res. 2023, 51, 9522– 9532, DOI: 10.1093/nar/gkad726
39
Terwilliger, T. C.; Liebschner, D.; Croll, T. I. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat. Methods 2024, 21, 110– 116, DOI: 10.1038/s41592-023-02087-4
40
Mirdita, M.; Schütze, K.; Moriwaki, Y. ColabFold: making protein folding accessible to all. Nat. Methods 2022, 19, 679– 682, DOI: 10.1038/s41592-022-01488-1
41
Kim, G.; Lee, S.; Levy Karin, E. Easy and accurate protein structure prediction using ColabFold. Nat. Protoc. 2025, 20, 620– 642, DOI: 10.1038/s41596-024-01060-5
42
Kalogeropoulos, K.; Bohn, M. F.; Jenkins, D. E. A comparative study of protein structure prediction tools for challenging targets: Snake venom toxins. Toxicon 2024, 238, 107559 DOI: 10.1016/j.toxicon.2023.107559
43
Baek, M.; DiMaio, F.; Anishchenko, I. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871– 876, DOI: 10.1126/science.abj8754
44
Baek, M.; McHugh, R.; Anishchenko, I. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117– 121, DOI: 10.1038/s41592-023-02086-5
45
Krishna, R.; Wang, J.; Ahern, W. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024, 384, eadl2528 DOI: 10.1126/science.adl2528
46
Liu, S.; Wu, K.; Chen, C. Obtaining protein foldability information from computational models of AlphaFold2 and RoseTTAFold. Comput. Struct. Biotechnol. J. 2022, 20, 4481– 4489, DOI: 10.1016/j.csbj.2022.08.034
47
Wayment-Steele, H. K.; Ojoawo, A.; Otten, R. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024, 625, 832– 839, DOI: 10.1038/s41586-023-06832-9
48
Casadevall, G.; Duran, C.; Osuna, S. AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design. JACS Au 2023, 3, 1554– 1562, DOI: 10.1021/jacsau.3c00188
49
Vallejo, W.; Díaz-Uribe, C.; Fajardo, C. Google Colab and Virtual Simulations: Practical e-Learning Tools to Support the Teaching of Thermodynamics and to Introduce Coding to Students. ACS Omega 2022, 7, 7421– 7429, DOI: 10.1021/acsomega.2c00362
50
Adiyaman, R.; Edmunds, N. S.; Genc, A. G.; Alharbi, S. M. A.; McGuffin, L. J. Improvement of protein tertiary and quaternary structure predictions using the ReFOLD refinement method and the AlphaFold2 recycling process. Bioinforma. Adv. 2023, 3 (1), vbad078 DOI: 10.1093/bioadv/vbad078
51
Ahern, W.; Yim, J.; Tischer, D. Atom level enzyme active site scaffolding using RFdiffusion2. Nat. Methods 2026, 23, 96– 105, DOI: 10.1038/s41592-025-02975-x
52
Wang, W.; Feng, C.; Han, R. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nat. Commun. 2023, 14, 7266 DOI: 10.1038/s41467-023-42528-4
53
Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking, arXiv:2210.01776. arXiv.org e-Print archive. https://arxiv.org/abs/2210.01776. 2023.
54
Alamdari, S. Protein generation with evolutionary diffusion: sequence is all you need. Preprint at https://doi.org/10.1101/2023.09.11.556673. 2023.
55
Chu, A. E.; Kim, J.; Cheng, L. An all-atom protein generative model. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2311500121 DOI: 10.1073/pnas.2311500121
56
Dauparas, J. Robust deep learning based protein sequence design using ProteinMPNN.
57
Sumida, K. H.; Núñez-Franco, R.; Kalvet, I. Improving Protein Expression, Stability, and Function with ProteinMPNN. J. Am. Chem. Soc. 2024, 146, 2054– 2061, DOI: 10.1021/jacs.3c10941
58
De Haas, R. J.; Brunette, N.; Goodson, A. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2314646121 DOI: 10.1073/pnas.2314646121
59
Dauparas, J.; Lee, G. R.; Pecoraro, R. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods 2025, 22, 717– 723, DOI: 10.1038/s41592-025-02626-1
60
Clark-Elsayed, A. Comparing LigandMPNN and Directed Evolution for Altering the Effector-Binding Site in the RamR Transcription Factor.
61
An, L.; Said, M.; Tran, L. Binding and sensing diverse small molecules using shape-complementary pseudocycles. Science 2024, 385, 276– 282, DOI: 10.1126/science.adn3780
62
Agu, P. C.; Afiukwa, C. A.; Orji, O. U. Molecular docking as a tool for the discovery of molecular targets of nutraceuticals in diseases management. Sci. Rep. 2023, 13, 13398 DOI: 10.1038/s41598-023-40160-2
63
Anishchenko, I. Modeling protein-small molecule conformational ensembles with ChemNet. Preprint at https://doi.org/10.1101/2024.09.25.614868. 2024.
64
Lauko, A.; Pellock, S. J.; Sumida, K. H. Computational design of serine hydrolases. Science 2025, 388, eadu2454 DOI: 10.1126/science.adu2454
65
Park, H.; Zhou, G.; Baek, M.; Baker, D.; DiMaio, F. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein–Ligand Docking. J. Chem. Theory Comput. 2021, 17, 2000– 2010, DOI: 10.1021/acs.jctc.0c01184
66
Garcia, M.; Dixit, S. M.; Rocklin, G. J. Evaluating zero-shot prediction of protein design success by AlphaFold, ESMFold, and ProteinMPNN.
67
Kong, Z. ProtFlow: Flow Matching-based Protein Sequence Design with Comprehensive Protein Semantic Distribution Learning and High-quality Generation.
68
Elnaggar, A.; Heinzinger, M.; Dallago, C. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112– 7127, DOI: 10.1109/TPAMI.2021.3095381
69
Madani, A. ProGen: Language Modeling for Protein Generation, arXiv:2004.03497. arXiv.org e-Print archive. https://arxiv.org/abs/2004.03497. 2020.
70
Nijkamp, E.; Ruffolo, J. A.; Weinstein, E. N.; Naik, N.; Madani, A. ProGen2: Exploring the boundaries of protein language models. Cell Syst. 2023, 14, 968– 978.e3, DOI: 10.1016/j.cels.2023.10.002
71
Ferruz, N.; Schmidt, S.; Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 2022, 13, 4348 DOI: 10.1038/s41467-022-32007-7
72
Nguyen, E.; Poli, M.; Durrant, M. G. Sequence modeling and design from molecular to genome scale with Evo. Science 2024, 386, eado9336 DOI: 10.1126/science.ado9336
73
Dalla-Torre, H.; Gonzalez, L.; Mendoza-Revilla, J. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 2025, 22, 287– 297, DOI: 10.1038/s41592-024-02523-z
74
Avsec, Ž.; Latysheva, N.; Cheng, J. Advancing regulatory variant effect prediction with AlphaGenome. Nature 2026, 649, 1206– 1218, DOI: 10.1038/s41586-025-10014-0
75
Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction.
76
Chai Discovery. Chai-1: Decoding the molecular interactions of life. Preprint at https://doi.org/10.1101/2024.10.10.615955. 2024.
77
Ingraham, J. B.; Baranov, M.; Costello, Z. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070– 1078, DOI: 10.1038/s41586-023-06728-8
78
Mille-Fragoso, L. S. Efficient generation of epitope-targeted de novo antibodies with Germinal.
79
Pacesa, M.; Nickel, L.; Schellhaas, C. One-shot design of functional protein binders with BindCraft. Nature 2025, 646, 483– 492, DOI: 10.1038/s41586-025-09429-6
80
BoltzGen: Toward Universal Binder Design.
81
Zhang, O. ODesign: A World Model for Biomolecular Interaction Design, arXiv:2510.22304. arXiv.org e-Print archive. https://arxiv.org/abs/2510.22304. 2025.
82
Parks, M. Blind Virtual Screening at Scale: A Scalable End-to-End Pipeline for Blind Docking and Affinity Prediction.
83
John, P. S. BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery, arXiv:2411.10548. arXiv.org e-Print archive. https://arxiv.org/abs/2411.10548. 2025.
84
Silke, D.; Iskander, J.; Pan, J. ProteinDJ: A high-performance and modular protein design pipeline. Protein Sci. 2026, 35, e70464 DOI: 10.1002/pro.70464
85
González-Rodríguez, N.; Chacón-Sánchez, C.; Llorca, O.; Fernández-Leiro, R. Automated and modular protein binder design with BinderFlow. PLOS Comput. Biol. 2025, 21, e1013747 DOI: 10.1371/journal.pcbi.1013747
86
Danny, B. Ovo, an Open-Source Ecosystem for De Novo Protein Design.
87
Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 2010, 19, 1817– 1819, DOI: 10.1002/pro.481
88
Beadle, B. M.; Shoichet, B. K. Structural Bases of Stability–function Tradeoffs in Enzymes. J. Mol. Biol. 2002, 321, 285– 296, DOI: 10.1016/S0022-2836(02)00599-5
89
Barlow, K. A.; Conchúir, S. Ó.; Thompson, S. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein–Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389– 5399, DOI: 10.1021/acs.jpcb.7b11367
90
Shringari, S. R.; Giannakoulias, S.; Ferrie, J. J.; Petersson, E. J. Rosetta custom score functions accurately predict ΔΔG of mutations at protein–protein interfaces using machine learning. Chem. Commun. 2020, 56, 6774– 6777, DOI: 10.1039/D0CC01959C
91
Smith, S. T.; Meiler, J. Assessing multiple score functions in Rosetta for drug discovery. PLoS One 2020, 15, e0240450 DOI: 10.1371/journal.pone.0240450
92
Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031– 3048, DOI: 10.1021/acs.jctc.7b00125
93
Tyka, M. D.; Keedy, D. A.; André, I. Alternate States of Proteins Revealed by Detailed Energy Landscape Mapping. J. Mol. Biol. 2011, 405, 607– 618, DOI: 10.1016/j.jmb.2010.11.008
94
Planas-Iglesias, J.; Marques, S. M.; Pinto, G. P. Computational design of enzymes for biotechnological applications. Biotechnol. Adv. 2021, 47, 107696 DOI: 10.1016/j.biotechadv.2021.107696
95
Guo, H.-B.; Perminov, A.; Bekele, S. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci. Rep. 2022, 12, 10696 DOI: 10.1038/s41598-022-14382-9
96
Agarwal, V.; McShan, A. C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024, 20, 950– 959, DOI: 10.1038/s41589-024-01638-w
97
Abramson, J.; Adler, J.; Dunger, J. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493– 500, DOI: 10.1038/s41586-024-07487-w
98
Friedland, G. D.; Linares, A. J.; Smith, C. A.; Kortemme, T. A Simple Model of Backbone Flexibility Improves Modeling of Side-chain Conformational Variability. J. Mol. Biol. 2008, 380, 757– 774, DOI: 10.1016/j.jmb.2008.05.006
99
Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins:Struct., Funct., Bioinf. 2011, 79, 830– 838, DOI: 10.1002/prot.22921
100
Durham, E.; Dorr, B.; Woetzel, N.; Staritzbichler, R.; Meiler, J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J. Mol. Model. 2009, 15, 1093– 1108, DOI: 10.1007/s00894-009-0454-9
101
Bertalan, É.; Lešnik, S.; Bren, U.; Bondar, A.-N. Protein-water hydrogen-bond networks of G protein-coupled receptors: Graph-based analyses of static structures and molecular dynamics. J. Struct. Biol. 2020, 212, 107634 DOI: 10.1016/j.jsb.2020.107634
102
Weiss, G. A.; Watanabe, C. K.; Zhong, A.; Goddard, A.; Sidhu, S. S. Rapid mapping of protein functional epitopes by combinatorial alanine scanning. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 8950– 8954, DOI: 10.1073/pnas.160252097
103
Cunningham, B. C.; Wells, J. A. High-Resolution Epitope Mapping of hGH-Receptor Interactions by Alanine-Scanning Mutagenesis. Science 1989, 244, 1081– 1085, DOI: 10.1126/science.2471267
104
Liu, H.; Song, L.; Meng, X. Proline-Mediated Enhancement in Evolvability of Disulfide-Rich Peptides for Discovering Protein Binders. J. Am. Chem. Soc. 2025, 147, 24870– 24883, DOI: 10.1021/jacs.5c07075
105
Holden, J. K.; Pavlovicz, R.; Gobbi, A.; Song, Y.; Cunningham, C. N. Computational Site Saturation Mutagenesis of Canonical and Non-Canonical Amino Acids to Probe Protein-Peptide Interactions. Front. Mol. Biosci. 2022, 9, 848689 DOI: 10.3389/fmolb.2022.848689
106
Spina, S. C.; Bailey, J.; Kimmel, B. Bind, catalyze, and quantify: a modern protein and enzyme engineering toolbox of genetically encoded non-canonical amino acids. Protein Eng. Des. Sel. 2026, gzag007 DOI: 10.1093/protein/gzag007
107
Chen, Y.; Clay, N.; Phan, N. Molecular Matchmakers: Bioconjugation Techniques Enhance Prodrug Potency for Immunotherapy. Mol. Pharmaceutics 2025, 22, 58– 80, DOI: 10.1021/acs.molpharmaceut.4c00867
108
Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind Database: Collection of Binding Affinities for Protein–Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem. 2004, 47, 2977– 2980, DOI: 10.1021/jm030580l
109
Liu, Z.; Su, M.; Han, L. Forging the Basis for Developing Protein–Ligand Interaction Scoring Functions. Acc. Chem. Res. 2017, 50, 302– 309, DOI: 10.1021/acs.accounts.6b00491
110
King, B. R.; Sumida, K. H.; Caruso, J. L.; Baker, D.; Zalatan, J. G. Computational Stabilization of a Non-Heme Iron Enzyme Enables Efficient Evolution of New Function. Angew. Chem., Int. Ed. 2025, 64, e202414705 DOI: 10.1002/anie.202414705
111
Howlader, M. T. H.; Kagawa, Y.; Miyakawa, A. Alanine Scanning Analyses of the Three Major Loops in Domain II of Bacillus thuringiensis Mosquitocidal Toxin Cry4Aa. Appl. Environ. Microbiol. 2010, 76, 860– 865, DOI: 10.1128/AEM.02175-09
112
Paul, R.; Kasahara, K.; Sasaki, J. Unveiling the affinity–stability relationship in anti-measles virus antibodies: a computational approach for hotspots prediction. Front. Mol. Biosci. 2024, 10, 1302737 DOI: 10.3389/fmolb.2023.1302737
113
Romero-Rivera, A.; Garcia-Borràs, M.; Osuna, S. Role of Conformational Dynamics in the Evolution of Retro-Aldolase Activity. ACS Catal. 2017, 7, 8524– 8532, DOI: 10.1021/acscatal.7b02954
114
Lemkul, J. A. Introductory Tutorials for Simulating Protein Dynamics with GROMACS. J. Phys. Chem. B 2024, 128, 9418– 9435, DOI: 10.1021/acs.jpcb.4c04901
115
Sanbonmatsu, K. Y.; Joseph, S.; Tung, C.-S. Simulating movement of tRNA into the ribosome during decoding. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 15854– 15859, DOI: 10.1073/pnas.0503456102
116
Li, R.; Macnamara, L.; Leuchter, J.; Alexander, R.; Cho, S. MD Simulations of tRNA and Aminoacyl-tRNA Synthetases: Dynamics, Folding, Binding, and Allostery. Int. J. Mol. Sci. 2015, 16, 15872– 15902, DOI: 10.3390/ijms160715872
117
Patel, S.; Hosur, R. V. Replica exchange molecular dynamics simulations reveal self-association sites in M-Crystallin caused by mutations provide insights of cataract. Sci. Rep. 2021, 11, 23270 DOI: 10.1038/s41598-021-02728-8
118
Stelzl, L. S.; Hummer, G. Kinetics from Replica Exchange Molecular Dynamics Simulations. J. Chem. Theory Comput. 2017, 13, 3927– 3935, DOI: 10.1021/acs.jctc.7b00372
119
Feig, M.; Nawrocki, G.; Yu, I.; Wang, P.; Sugita, Y. Challenges and opportunities in connecting simulations with experiments via molecular dynamics of cellular environments. J. Phys. Conf. Ser. 2018, 1036, 012010 DOI: 10.1088/1742-6596/1036/1/012010
120
Kumari, I.; Sandhu, P.; Ahmed, M.; Akhter, Y. Molecular Dynamics Simulations, Challenges and Opportunities: A Biologist’s Prospective. Curr. Protein Pept. Sci. 2017, 18, 1163– 1179, DOI: 10.2174/1389203718666170622074741
121
Senn, H. M.; Thiel, W. QM/MM Methods for Biomolecular Systems. Angew. Chem., Int. Ed. 2009, 48, 1198– 1229, DOI: 10.1002/anie.200802019
122
Lopes, P. E. M.; Guvench, O.; MacKerell, A. D. Current Status of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Springer New York: New York, NY, 2015; Vol. 1215, pp 47– 71.
123
McMillin, D. R. Interatomic Repulsion and the Pauli Principle. J. Chem. Educ. 2021, 98, 2912– 2918, DOI: 10.1021/acs.jchemed.1c00326
124
Guvench, O.; MacKerell, A. D. Comparison of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Humana Press: Totowa, NJ, 2008; Vol. 443, pp 63– 88.
125
Warshel, A.; Sharma, P. K.; Kato, M. Electrostatic Basis for Enzyme Catalysis. Chem. Rev. 2006, 106, 3210– 3235, DOI: 10.1021/cr0503106
126
Van Der Kamp, M. W.; Mulholland, A. J. Combined Quantum Mechanics/Molecular Mechanics (QM/MM) Methods in Computational Enzymology. Biochemistry 2013, 52, 2708– 2728, DOI: 10.1021/bi400215w
127
Magliery, T. J. Protein stability: computation, sequence statistics, and new experimental methods. Curr. Opin. Struct. Biol. 2015, 33, 161– 168, DOI: 10.1016/j.sbi.2015.09.002
128
Singh, A.; Upadhyay, V.; Upadhyay, A. K.; Singh, S. M.; Panda, A. K. Protein recovery from inclusion bodies of Escherichia coli using mild solubilization process. Microb. Cell Factories 2015, 14, 41 DOI: 10.1186/s12934-015-0222-8
129
Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478– 490, DOI: 10.1016/j.jmb.2014.09.026
130
Li, B.; Ming, D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinf. 2024, 25, 204 DOI: 10.1186/s12859-024-05820-8
131
Tan, Y.; Zheng, J.; Hong, L.; Zhou, B. ProtSolM: Protein Solubility Prediction with Multi-modal Features, arXiv:2406.19744. arXiv.org e-Print archive. https://arxiv.org/abs/2406.19744. 2024.
132
Ghosh, D.; Biswas, A.; Radhakrishna, M. Advanced computational approaches to understand protein aggregation. Biophys. Rev. 2024, 5, 021302 DOI: 10.1063/5.0180691
133
Oeller, M.; Kang, R.; Bell, R. Sequence-based prediction of pH-dependent protein solubility using CamSol. Brief. Bioinform. 2023, 24, bbad004 DOI: 10.1093/bib/bbad004
134
Kimmel, B. R.; Mrksich, M. Development of an Enzyme-Inhibitor Reaction Using Cellular Retinoic Acid Binding Protein II for One-Pot Megamolecule Assembly. Chem. - Eur. J. 2021, 27, 17843– 17848, DOI: 10.1002/chem.202103059
135
Kimmel, B. R.; Modica, J. A.; Parker, K.; Dravid, V.; Mrksich, M. Solid-Phase Synthesis of Megamolecules. J. Am. Chem. Soc. 2020, 142, 4534– 4538, DOI: 10.1021/jacs.9b12003
136
Adomanis, R.; Phan, N.; Walter, G.; Kimmel, B. R. Modular Nanobody Conjugates with Controlled Topology Using Genetically Encoded Non-canonical Amino Acids. Preprint at https://doi.org/10.1101/2025.11.27.691038. 2025.
137
Rosace, A.; Bennett, A.; Oeller, M. Automated optimization of solubility and conformational stability of antibodies and proteins. Nat. Commun. 2023, 14, 1937 DOI: 10.1038/s41467-023-37668-6
138
Kuriata, A.; Iglesias, V.; Pujols, J. Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res. 2019, 47, W300– W307, DOI: 10.1093/nar/gkz321
139
Hirsch, M.; Desai, R. R.; Annaswamy, S.; Keatinge-Clay, A. T. Mutagenesis Supports AlphaFold Prediction of How Modular Polyketide Synthase Acyl Carrier Proteins Dock With Downstream Ketosynthases. Proteins:Struct., Funct., Bioinf. 2024, 92, 1375– 1384, DOI: 10.1002/prot.26733
140
Araki, M.; Ekimoto, T.; Takemura, K. Molecular Dynamics Unveils Multiple-Site Binding of Inhibitors with Reduced Activity on the Surface of Dihydrofolate Reductase. J. Am. Chem. Soc. 2024, 146, 28685– 28695, DOI: 10.1021/jacs.4c04648
141
Pimtawong, T.; Ren, J.; Lee, J.; Lee, H.-M.; Na, D. A review on computational models for predicting protein solubility. J. Microbiol. 2025, 63, 2408001 DOI: 10.71150/jm.2408001
142
Navarro, S.; Ventura, S. Computational methods to predict protein aggregation. Curr. Opin. Struct. Biol. 2022, 73, 102343 DOI: 10.1016/j.sbi.2022.102343
143
Prediction and Evaluation of Protein Aggregation with Computational Methods. In Methods in Molecular Biology; Springer US: New York, NY, 2025; pp 299– 314, DOI: 10.1007/978-1-0716-4196-5_17
144
Santos, J.; Pujols, J.; Pallarès, I.; Iglesias, V.; Ventura, S. Computational prediction of protein aggregation: Advances in proteomics, conformation-specific algorithms and biotechnological applications. Comput. Struct. Biotechnol. J. 2020, 18, 1403– 1413, DOI: 10.1016/j.csbj.2020.05.026
145
Arnold, F. H. Design by Directed Evolution. Acc. Chem. Res. 1998, 31, 125– 131, DOI: 10.1021/ar960017f
146
Arnold, F. H. Directed evolution: Creating biocatalysts for the future. Chem. Eng. Sci. 1996, 51, 5091– 5102, DOI: 10.1016/S0009-2509(96)00288-6
147
Cobb, R. E.; Chao, R.; Zhao, H. Directed evolution: Past, present, and future. AIChE J. 2013, 59, 1432– 1440, DOI: 10.1002/aic.13995
148
Yang, J.; Lal, R. G.; Bowden, J. C. Active learning-assisted directed evolution. Nat. Commun. 2025, 16, 714 DOI: 10.1038/s41467-025-55987-8
149
Terashi, G.; Wang, X.; Maddhuri Venkata Subramaniya, S. R.; Tesmer, J. J. G.; Kihara, D. Residue-wise local quality estimation for protein models from cryo-EM maps. Nat. Methods 2022, 19, 1116– 1125, DOI: 10.1038/s41592-022-01574-4
150
Graille, M.; Sacquin-Mora, S.; Taly, A. Best Practices of Using AI-Based Models in Crystallography and Their Impact in Structural Biology. J. Chem. Inf. Model. 2023, 63, 3637– 3646, DOI: 10.1021/acs.jcim.3c00381
151
Wang, X.; Zhu, H.; Terashi, G.; Taluja, M.; Kihara, D. DiffModeler: large macromolecular structure modeling for cryo-EM maps using a diffusion model. Nat. Methods 2024, 21, 2307– 2317, DOI: 10.1038/s41592-024-02479-0
152
Serapian, S. A.; Crosby, J.; Crump, M. P.; Van Der Kamp, M. W. Path to Actinorhodin: Regio- and Stereoselective Ketone Reduction by a Type II Polyketide Ketoreductase Revealed in Atomistic Detail. JACS Au 2022, 2, 972– 984, DOI: 10.1021/jacsau.2c00086
153
Shukla, V. K.; Karunanithy, G.; Vallurupalli, P.; Hansen, D. F. A combined NMR and deep neural network approach for enhancing the spectral resolution of aromatic side chains in proteins. Sci. Adv. 2024, 10, eadr2155 DOI: 10.1126/sciadv.adr2155
154
Drake, Z. C.; Fowler, A. G.; Blum, A. A.; Lindert, S. Enhanced Protein Complex Prediction via Rosetta, AlphaFold, and Nondifferential Covalent Labeling Mass Spectrometry. J. Phys. Chem. B 2025, 129, 6489– 6497, DOI: 10.1021/acs.jpcb.5c02872
155
Lee, C. Y.; Hubrich, D.; Varga, J. K. Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation. Mol. Syst. Biol. 2024, 20, 75– 97, DOI: 10.1038/s44320-023-00005-6
156
Koehler Leman, J.; Künze, G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int. J. Mol. Sci. 2023, 24, 7835, DOI: 10.3390/ijms24097835
157
Alshammari, M.; Wriggers, W.; Sun, J.; He, J. Refinement of AlphaFold2 models against experimental and hybrid cryo-EM density maps. QRB Discovery 2022, 3, e16 DOI: 10.1017/qrd.2022.13
158
Humphreys, I. R.; Pei, J.; Baek, M. Computed structures of core eukaryotic protein complexes. Science 2021, 374, eabm4805 DOI: 10.1126/science.abm4805
159
Bordin, N.; Sillitoe, I.; Nallapareddy, V. AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun. Biol. 2023, 6, 160 DOI: 10.1038/s42003-023-04488-9
160
Wang, H.; Wang, J. How cryo-electron microscopy and X-ray crystallography complement each other. Protein Sci. 2017, 26, 32– 39, DOI: 10.1002/pro.3022

Figure 1. From sequence to protein structure and conformational behavior. (A) Biological information transfer follows a deterministic pathway from DNA to RNA to protein, linking the encoded sequence information to the emergent molecular function and dynamics. (B) Input amino acid sequences serve as the basis for predictive modeling frameworks. (C) Sequence-informed A.I./ML frameworks trained on sequence and structural ensemble data learn the mapping between linear sequences and conformation space. (D) The resulting structural ensemble offers a data-driven view of protein flexibility and structural diversity derived directly from the amino acid sequence. Reprinted or adapted with permission under a CC-BY 3.0 License from Ille et al. (14) Copyright 2025 AIP Publishing.
Figure 2. Conceptual view of the protein functional universe. The diagram maps the relationships among sequence, structure, and function spaces. Each circle represents an individual protein defined by its amino acid sequence, 3D folds, and biological activity. The blue circles correspond to proteins accessible through natural evolution or traditional protein engineering, primarily clustered within well-explored regions (yellow). Gray circles indicate proteins that remain uncharacterized and lie within the unexplored sequence–structure–function space. The red circles represent proteins accessible through ML-driven de novo design, which extends exploration beyond natural boundaries into previously inaccessible regions. In this framework, sequence space (top layer) is linked to structure space (middle layer) and ultimately to function space (bottom layer), with A.I. methods systematically probing across all three layers. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.
Figure 3. Overview of the current protein design dogma. Traditional protein science is often described as a one-way flow in which (A) amino acid sequences give rise to (B) folded structures, which in turn underpin (C) biological function. Modern de novo design inverts this logic: researchers now begin with the desired function and work backward to identify compatible folds and sequences. Current computational frameworks align with three broad strategies: (1) two-stage design, in which structural generators such as ROSETTA, RoseTTAFold, or PyRosetta first propose candidate protein backbones that are then optimized by sequence design engines; (2) sequence-driven methods, exemplified by AlphaFold2 and ColabFold, which predict protein structures directly from amino acid sequence information and are widely used to validate or filter design candidates; and (3) coguided approaches, including multitrack RoseTTAFold variants (RF, RFNA, RFAA) and diffusion-based models (RFDiffusion), which integrate amino acid sequence and protein structure generation simultaneously. These complementary strategies extend protein design beyond natural sequence–structure relationships, enabling a function-first exploration of protein space. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.
Figure 4. Overview of the A.I.-driven protein design toolbox. According to their functional roles in A.I.-driven generative protein design, the tools in the protein design toolbox can be divided into five categories: (A) structure prediction frameworks (e.g., AlphaFold2, RoseTTAFold) that validate fold accuracy; (B) de novo backbone generators (RFDiffusion, RFDiffusionAA) that embed motifs or active sites into novel folds; (C) fixed-backbone sequence designers (LigandMPNN) that optimize sequences against a defined structural context; (D) sequence generation models (ProteinMPNN), which not only perform fixed-backbone optimization but also function as generative samplers of amino acid sequences; and (E) sequence–structure cogeneration and refinement frameworks (PLACER), which jointly optimize side chains, ligands, and active-site geometry. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.
Figure 5. Timeline of major developments in protein structure prediction (black) and design methodologies (red). Following early innovations such as ROSETTA (1998) and PyRosetta (2010), the field saw nearly two decades of incremental progress before the emergence of transformative A.I.-based models such as AlphaFold2 (2020). Since then, generative frameworks, including ProteinMPNN, RFDiffusion, and LigandMPNN, have proliferated rapidly, marking a shift toward integrated prediction-design pipelines.
Figure 6. Computational strategies for evaluating amino acid sequence perturbations. (A) Structural stability analysis introduces mutations into a sequence and applies ab initio folding to predict conformational shifts, highlighting favorable and unfavorable perturbations. (B) Binding affinity analysis docks protein constituents, incorporates mutations, and estimates changes in binding free energy (ΔΔG) to evaluate the interaction stability. (C) Interface hotspot probing systematically mutates residues at binding interfaces to pinpoint the positions that are most critical for binding energetics.
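The ΔΔG quantity in panel B can be written out explicitly. The following is a minimal sketch of the usual thermodynamic bookkeeping, not a formula taken from the figure: the symbols G_complex, G_A, and G_B (free energies of the bound complex and of the two unbound partners) are introduced here for illustration, and the sign convention is an assumption, since some tools report the opposite sign.

```latex
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{A}} + G_{\mathrm{B}} \right)

\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{WT}}
```

Under this convention, a positive ΔΔG means the mutation weakens binding relative to the wild type. In the hotspot probing of panel C, residues whose substitution gives a large positive ΔΔG are the candidates for positions critical to binding energetics.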
Figure 7. Computational evaluation of the biological and functional properties of proteins. (A) Molecular dynamics and catalysis simulate mutated proteins in solvated environments to capture conformational flexibility and catalytic changes through trajectory analyses. Hybrid pipelines that integrate molecular dynamics (MD) with ROSETTA and directed evolution have yielded efficient de novo and redesigned biocatalysts, such as HG3.17 and BH32.14, whose catalytic power emerges from MD-guided active-site reorganization and solvent shielding. (B) Solubility analysis predicts the effects of amino acid sequence variation on protein solubility by comparing mutant distributions to wild-type benchmarks. CamSol-based workflows enable the rational optimization of both solubility and conformational stability, as demonstrated for six antibodies (including two approved therapeutics), enhancing developability without compromising binding. (137) (C) Aggregation propensity assesses structural and sequence features to identify residues or motifs that drive aggregation, distinguishing soluble variants from aggregation-prone variants. Using Aggrescan3D, researchers computationally minimized aggregation hotspots to engineer green fluorescent protein (GFP) mutants with significantly improved solubility and reduced aggregation, resulting in a fast-folding, aggregation-resistant variant. (138) Together, these approaches extend computational evaluation to capture the dynamic, solubility, and aggregation behaviors that critically influence protein performance in physiological and industrial contexts.
Figure 8. Conceptual framework contrasting traditional and A.I.-assisted directed evolution (DE) workflows. The diagram is divided into two pathways: the upper route represents conventional DE, where (A) natural sequence diversity is explored, (B) mutational libraries are generated, (C) variants are expressed, and (E) high-throughput screening identifies improved candidates through iterative experimental cycles. The lower route introduces A.I./ML-assisted or hybrid methodologies, in which (D) supervised models with uncertainty quantification learn the sequence-fitness landscape and use acquisition functions to propose new variants, balancing exploration (high uncertainty) and exploitation (high predicted fitness). These feedback-driven optimization strategies accelerate variant discovery with a reduced screening effort. Combined approaches, such as active learning-assisted directed evolution (ALDE), have yielded (F) optimized protoglobin-based biocatalysts for nonnative cyclopropanation reactions, enhancing their activity, selectivity, and stability while minimizing experimental costs. (148)
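The exploration versus exploitation balance described for panel D can be illustrated with an upper-confidence-bound (UCB) acquisition function. The sketch below is a hedged toy example, not the ALDE implementation from ref 148: the variant names and fitness values are hypothetical, and using the spread of an ensemble's predictions as the uncertainty estimate is an assumption made for illustration.

```python
from statistics import mean, pstdev


def ucb_scores(ensemble_predictions, beta=1.0):
    """Score each variant as (mean predicted fitness) + beta * (ensemble spread).

    ensemble_predictions maps a variant name to the list of fitness values
    predicted by each member of a model ensemble (hypothetical inputs).
    """
    scores = {}
    for variant, preds in ensemble_predictions.items():
        mu = mean(preds)       # exploitation term: predicted fitness
        sigma = pstdev(preds)  # exploration term: disagreement between models
        scores[variant] = mu + beta * sigma
    return scores


def propose_batch(ensemble_predictions, batch_size=2, beta=1.0):
    """Return the batch_size variants with the highest UCB score."""
    scores = ucb_scores(ensemble_predictions, beta)
    return sorted(scores, key=scores.get, reverse=True)[:batch_size]


# Hypothetical ensemble predictions for three single-point variants.
predictions = {
    "A24G": [0.80, 0.82, 0.81],  # high predicted fitness, models agree
    "L59V": [0.30, 0.90, 0.60],  # moderate fitness, models disagree
    "W62F": [0.20, 0.22, 0.21],  # low predicted fitness, models agree
}

greedy_pick = propose_batch(predictions, batch_size=1, beta=0.0)
exploratory_pick = propose_batch(predictions, batch_size=1, beta=2.0)
```

With `beta=0.0` the proposal is purely exploitative and favors the variant with the highest mean prediction; raising `beta` shifts the choice toward variants the models disagree on. In an actual campaign the proposed batch would be screened experimentally and the measurements fed back to retrain the models before the next round.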
Figure 9. Experimental–computational pipeline for protein engineering. (A) Protein mutant libraries are generated by introducing sequence variations across the regions of interest. AlphaFold-guided domain-motif design (e.g., FBXO23-STX1B) has revealed novel regulatory interfaces relevant to therapeutic target discovery. (155) (B) Mutants are expressed in Escherichia coli, yeast, or mammalian systems to generate protein ensembles for screening; such expression-labeling pipelines support enzyme and biocatalyst development used in pharmaceuticals and green chemistry. (154) (C) Structural characterization via cryo-EM, NMR, and X-ray crystallography (and hybrid methods), refined with predictive models such as AlphaFold2 or ROSETTA, resolves folding and conformational dynamics, as shown in ribosomal complex refinements and NMR-ROSETTA modeling of ubiquitin. (156,157) (D) Functional screening evaluates the activity and binding properties of mutant sets, using methods such as hydroxyl radical footprinting to identify active-site or interface residues that control activity and stability, as applied to Hsp90-co-chaperone systems and engineered oxidoreductases. (154) (E) Finally, A.I./ML integration combines experimental data with modeling to predict next-generation variants, accelerating industrial enzyme design, antibody optimization, and biosensor development.
References
This article references 160 other publications.
1.
  Yu, Y.; Hu, C.; Xia, L.; Wang, J. Artificial Metalloenzyme Design with Unnatural Amino Acids and Non-Native Cofactors. ACS Catal. 2018, 8, 1851– 1863, DOI: 10.1021/acscatal.7b03754
2.
  Mirts, E. N.; Bhagi-Damodaran, A.; Lu, Y. Understanding and Modulating Metalloenzymes with Unnatural Amino Acids, Non-Native Metal Ions, and Non-Native Metallocofactors. Acc. Chem. Res. 2019, 52, 935– 944, DOI: 10.1021/acs.accounts.9b00011
3.
  Mann, S. I.; Nayak, A.; Gassner, G. T.; Therien, M. J.; DeGrado, W. F. De Novo Design, Solution Characterization, and Crystallographic Structure of an Abiological Mn–Porphyrin-Binding Protein Capable of Stabilizing a Mn(V) Species. J. Am. Chem. Soc. 2021, 143, 252– 259, DOI: 10.1021/jacs.0c10136
4.
  Bergman, M. T.; Xiao, X.; Hall, C. K. In Silico Design and Analysis of Plastic-Binding Peptides. J. Phys. Chem. B 2023, 127, 8370– 8381, DOI: 10.1021/acs.jpcb.3c04319
5.
  García-Moreno, P. J. Recent advances in the production of emulsifying peptides with the aid of proteomics and bioinformatics. Curr. Opin. Food Sci. 2023, 51, 101039 DOI: 10.1016/j.cofs.2023.101039
6.
  Ndochinwa, G. O.; Wang, Q. Y.; Okoro, N. O. New advances in protein engineering for industrial applications: Key takeaways. Open Life Sci. 2024, 19, 20220856 DOI: 10.1515/biol-2022-0856
7.
  Marcos, E.; Silva, D. Essentials of de novo protein design: Methods and applications. WIREs Comput. Mol. Sci. 2018, 8 (6), e1374 DOI: 10.1002/wcms.1374
8.
  Huang, P.-S.; Boyken, S. E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320– 327, DOI: 10.1038/nature19946
9.
  Woolfson, D. N. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J. Mol. Biol. 2021, 433, 167160 DOI: 10.1016/j.jmb.2021.167160
10.
  Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. Protein Folding Problem. Annu. Rev. Biophys. 2008, 37, 289– 316, DOI: 10.1146/annurev.biophys.37.092707.153558
11.
  Kocher, C. D.; Dill, K. A. Origins of life: The Protein Folding Problem all over again?. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315000121 DOI: 10.1073/pnas.2315000121
12.
  Chen, S.-J.; Hassan, M.; Jernigan, R. L. Protein folds vs. protein folding: Differing questions, different challenges. Proc. Natl. Acad. Sci. U. S. A. 2023, 120, e2214423119 DOI: 10.1073/pnas.2214423119
13.
  Kiss, G.; Çelebi-Ölçüm, N.; Moretti, R.; Baker, D.; Houk, K. N. Computational Enzyme Design. Angew. Chem., Int. Ed. 2013, 52, 5700– 5725, DOI: 10.1002/anie.201204077
14.
  Ille, A. M.; Anas, E.; Mathews, M. B.; Burley, S. K. From sequence to protein structure and conformational dynamics with artificial intelligence/machine learning. Struct. Dyn. 2025, 12, 030902 DOI: 10.1063/4.0000765
15.
  Anfinsen, C. B. Principles that Govern the Folding of Protein Chains. Science 1973, 181, 223– 230, DOI: 10.1126/science.181.4096.223
16.
  Voigt, C. A.; Mayo, S. L.; Arnold, F. H.; Wang, Z.-G. Computational method to reduce the search space for directed protein evolution. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 3778– 3783, DOI: 10.1073/pnas.051614498
17.
  Kuhlman, B.; Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 10383– 10388, DOI: 10.1073/pnas.97.19.10383
18.
  Sleator, R. D. Solving the protein folding problem···. FEBS Lett. 2024, 598, 2831– 2835, DOI: 10.1002/1873-3468.15043
19.
  Watson, J. L.; Juergens, D.; Bennett, N. R. De novo design of protein structure and function with RFdiffusion. Nature 2023, 620, 1089– 1100, DOI: 10.1038/s41586-023-06415-8
20.
  Leveson-Gower, R. B. Designing Enzymatic Reactivity with an Expanded Palette. ChemBioChem 2025, 26, e202500076 DOI: 10.1002/cbic.202500076
21.
  Hartman, M. C. T. Non-canonical Amino Acid Substrates of Escherichia coli Aminoacyl-tRNA Synthetases. ChemBioChem 2022, 23, e202100299 DOI: 10.1002/cbic.202100299
22.
  Zhang, G.; Liu, C.; Lu, J.; Zhang, S.; Zhu, L. The Role of AI-Driven De Novo Protein Design in the Exploration of the Protein Functional Universe. Biology 2025, 14, 1268 DOI: 10.3390/biology14091268
23.
  Rohl, C. A.; Strauss, C. E. M.; Misura, K. M. S.; Baker, D. Protein Structure Prediction Using Rosetta. Methods Enzymol. 2004, 383, 66– 93, DOI: 10.1016/S0076-6879(04)83004-0
24.
  Simons, K. T.; Kooperberg, C.; Huang, E.; Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. J. Mol. Biol. 1997, 268, 209– 225, DOI: 10.1006/jmbi.1997.0959
25.
  Leman, J. K.; Weitzner, B. D.; Lewis, S. M. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 2020, 17, 665– 680, DOI: 10.1038/s41592-020-0848-2
26.
  Das, R.; Baker, D. Macromolecular Modeling with Rosetta. Annu. Rev. Biochem. 2008, 77, 363– 382, DOI: 10.1146/annurev.biochem.77.062906.171838
27.
  Kaufmann, K. W.; Meiler, J. Using RosettaLigand for Small Molecule Docking into Comparative Models. PLoS One 2012, 7, e50769 DOI: 10.1371/journal.pone.0050769
28.
  Lemmon, G.; Kaufmann, K.; Meiler, J. Prediction of HIV-1 Protease/Inhibitor Affinity using RosettaLigand. Chem. Biol. Drug Des. 2012, 79, 888– 896, DOI: 10.1111/j.1747-0285.2012.01356.x
29.
  Chaudhury, S.; Lyskov, S.; Gray, J. J. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 2010, 26, 689– 691, DOI: 10.1093/bioinformatics/btq007
30.
  Le, K. H.; Adolf-Bryfogle, J.; Klima, J. C. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Biophysicist 2021, 2, 108– 122, DOI: 10.35459/tbp.2019.000147
31.
  Ford, A. S.; Weitzner, B. D.; Bahl, C. D. Integration of the Rosetta suite with the python software stack via reproducible packaging and core programming interfaces for distributed simulation. Protein Sci. 2020, 29, 43– 51, DOI: 10.1002/pro.3721
32.
  Van Stappen, C.; Deng, Y.; Liu, Y. Designing Artificial Metalloenzymes by Tuning of the Environment beyond the Primary Coordination Sphere. Chem. Rev. 2022, 122, 11974– 12045, DOI: 10.1021/acs.chemrev.2c00106
33.
  Tivon, B.; Wiese, J.; Müller, M. P. Computational Design of Lysine Targeting Covalent Binders Using Rosetta. J. Chem. Inf. Model. 2025, 65, 5612– 5622, DOI: 10.1021/acs.jcim.5c00212
34.
  Jumper, J.; Evans, R.; Pritzel, A. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583– 589, DOI: 10.1038/s41586-021-03819-2
35.
  Tunyasuvunakool, K.; Adler, J.; Wu, Z. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590– 596, DOI: 10.1038/s41586-021-03828-1
36.
  Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)─Round XIV. Proteins:Struct., Funct., Bioinf. 2021, 89, 1607– 1617, DOI: 10.1002/prot.26237
37.
  Kovalevskiy, O.; Mateos-Garcia, J.; Tunyasuvunakool, K. AlphaFold two years on: Validation and impact. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315002121 DOI: 10.1073/pnas.2315002121
38.
  Schneider, B.; Sweeney, B. A.; Bateman, A. When will RNA get its AlphaFold moment?. Nucleic Acids Res. 2023, 51, 9522– 9532, DOI: 10.1093/nar/gkad726
39.
  Terwilliger, T. C.; Liebschner, D.; Croll, T. I. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat. Methods 2024, 21, 110– 116, DOI: 10.1038/s41592-023-02087-4
40.
  Mirdita, M.; Schütze, K.; Moriwaki, Y. ColabFold: making protein folding accessible to all. Nat. Methods 2022, 19, 679– 682, DOI: 10.1038/s41592-022-01488-1
41.
  Kim, G.; Lee, S.; Levy Karin, E. Easy and accurate protein structure prediction using ColabFold. Nat. Protoc. 2025, 20, 620– 642, DOI: 10.1038/s41596-024-01060-5
42.
  Kalogeropoulos, K.; Bohn, M. F.; Jenkins, D. E. A comparative study of protein structure prediction tools for challenging targets: Snake venom toxins. Toxicon 2024, 238, 107559 DOI: 10.1016/j.toxicon.2023.107559
43.
  Baek, M.; DiMaio, F.; Anishchenko, I. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871– 876, DOI: 10.1126/science.abj8754
44.
  Baek, M.; McHugh, R.; Anishchenko, I. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117– 121, DOI: 10.1038/s41592-023-02086-5
45.
  Krishna, R.; Wang, J.; Ahern, W. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024, 384, eadl2528 DOI: 10.1126/science.adl2528
46.
  Liu, S.; Wu, K.; Chen, C. Obtaining protein foldability information from computational models of AlphaFold2 and RoseTTAFold. Comput. Struct. Biotechnol. J. 2022, 20, 4481– 4489, DOI: 10.1016/j.csbj.2022.08.034
47.
  Wayment-Steele, H. K.; Ojoawo, A.; Otten, R. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024, 625, 832– 839, DOI: 10.1038/s41586-023-06832-9
48.
  Casadevall, G.; Duran, C.; Osuna, S. AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design. JACS Au 2023, 3, 1554– 1562, DOI: 10.1021/jacsau.3c00188
49.
  Vallejo, W.; Díaz-Uribe, C.; Fajardo, C. Google Colab and Virtual Simulations: Practical e-Learning Tools to Support the Teaching of Thermodynamics and to Introduce Coding to Students. ACS Omega 2022, 7, 7421– 7429, DOI: 10.1021/acsomega.2c00362
50.
  Adiyaman, R.; Edmunds, N. S.; Genc, A. G.; Alharbi, S. M. A.; McGuffin, L. J. Improvement of protein tertiary and quaternary structure predictions using the ReFOLD refinement method and the AlphaFold2 recycling process. Bioinforma. Adv. 2023, 3 (1), vbad078 DOI: 10.1093/bioadv/vbad078
51.
  Ahern, W.; Yim, J.; Tischer, D. Atom level enzyme active site scaffolding using RFdiffusion2. Nat. Methods 2026, 23, 96– 105, DOI: 10.1038/s41592-025-02975-x
52.
  Wang, W.; Feng, C.; Han, R. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nat. Commun. 2023, 14, 7266 DOI: 10.1038/s41467-023-42528-4
53.
  Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking, arXiv:2210.01776. arXiv.org e-Print archive. https://arxiv.org/abs/2210.01776. 2023.
54.
  Alamdari, S. Protein generation with evolutionary diffusion: sequence is all you need. Preprint at https://doi.org/10.1101/2023.09.11.556673. 2023.
55.
  Chu, A. E.; Kim, J.; Cheng, L. An all-atom protein generative model. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2311500121 DOI: 10.1073/pnas.2311500121
56.
  Dauparas, J. Robust deep learning based protein sequence design using ProteinMPNN.
57.
  Sumida, K. H.; Núñez-Franco, R.; Kalvet, I. Improving Protein Expression, Stability, and Function with ProteinMPNN. J. Am. Chem. Soc. 2024, 146, 2054– 2061, DOI: 10.1021/jacs.3c10941
58.
  De Haas, R. J.; Brunette, N.; Goodson, A. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2314646121 DOI: 10.1073/pnas.2314646121
59.
  Dauparas, J.; Lee, G. R.; Pecoraro, R. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods 2025, 22, 717– 723, DOI: 10.1038/s41592-025-02626-1
60.
  Clark-Elsayed, A. Comparing LigandMPNN and Directed Evolution for Altering the Effector-Binding Site in the RamR Transcription Factor.
61.
  An, L.; Said, M.; Tran, L. Binding and sensing diverse small molecules using shape-complementary pseudocycles. Science 2024, 385, 276– 282, DOI: 10.1126/science.adn3780
62.
  Agu, P. C.; Afiukwa, C. A.; Orji, O. U. Molecular docking as a tool for the discovery of molecular targets of nutraceuticals in diseases management. Sci. Rep. 2023, 13, 13398 DOI: 10.1038/s41598-023-40160-2
63.
  Anishchenko, I. Modeling protein-small molecule conformational ensembles with ChemNet. Preprint at https://doi.org/10.1101/2024.09.25.614868. 2024.
64.
  Lauko, A.; Pellock, S. J.; Sumida, K. H. Computational design of serine hydrolases. Science 2025, 388, eadu2454 DOI: 10.1126/science.adu2454
65.
  Park, H.; Zhou, G.; Baek, M.; Baker, D.; DiMaio, F. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein–Ligand Docking. J. Chem. Theory Comput. 2021, 17, 2000– 2010, DOI: 10.1021/acs.jctc.0c01184
66.
  Garcia, M.; Dixit, S. M.; Rocklin, G. J. Evaluating zero-shot prediction of protein design success by AlphaFold, ESMFold, and ProteinMPNN.
67.
  Kong, Z. ProtFlow: Flow Matching-based Protein Sequence Design with Comprehensive Protein Semantic Distribution Learning and High-quality Generation.
68.
  Elnaggar, A.; Heinzinger, M.; Dallago, C. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112– 7127, DOI: 10.1109/TPAMI.2021.3095381
69.
  Madani, A. ProGen: Language Modeling for Protein Generation, arXiv:2004.03497. arXiv.org e-Print archive. https://arxiv.org/abs/2004.03497. 2020.
70.
  Nijkamp, E.; Ruffolo, J. A.; Weinstein, E. N.; Naik, N.; Madani, A. ProGen2: Exploring the boundaries of protein language models. Cell Syst. 2023, 14, 968– 978.e3, DOI: 10.1016/j.cels.2023.10.002
71.
  Ferruz, N.; Schmidt, S.; Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 2022, 13, 4348 DOI: 10.1038/s41467-022-32007-7
72.
  Nguyen, E.; Poli, M.; Durrant, M. G. Sequence modeling and design from molecular to genome scale with Evo. Science 2024, 386, eado9336 DOI: 10.1126/science.ado9336
73.
  Dalla-Torre, H.; Gonzalez, L.; Mendoza-Revilla, J. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 2025, 22, 287– 297, DOI: 10.1038/s41592-024-02523-z
74.
  Avsec, Ž.; Latysheva, N.; Cheng, J. Advancing regulatory variant effect prediction with AlphaGenome. Nature 2026, 649, 1206– 1218, DOI: 10.1038/s41586-025-10014-0
75.
  Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction.
76.
  Chai Discovery. Chai-1: Decoding the molecular interactions of life. Preprint at https://doi.org/10.1101/2024.10.10.615955. 2024.
77.
  Ingraham, J. B.; Baranov, M.; Costello, Z. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070– 1078, DOI: 10.1038/s41586-023-06728-8
78.
  Mille-Fragoso, L. S. Efficient generation of epitope-targeted de novo antibodies with Germinal.
79.
  Pacesa, M.; Nickel, L.; Schellhaas, C. One-shot design of functional protein binders with BindCraft. Nature 2025, 646, 483– 492, DOI: 10.1038/s41586-025-09429-6
80.
  BoltzGen: Toward Universal Binder Design.
81.
  Zhang, O. ODesign: A World Model for Biomolecular Interaction Design, arXiv:2510.22304. arXiv.org e-Print archive. https://arxiv.org/abs/2510.22304. 2025.
82.
  Parks, M. Blind Virtual Screening at Scale: A Scalable End-to-End Pipeline for Blind Docking and Affinity Prediction.
83.
  John, P. S. BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery, arXiv:2411.10548. arXiv.org e-Print archive. https://arxiv.org/abs/2411.10548. 2025.
84.
  Silke, D.; Iskander, J.; Pan, J. ProteinDJ: A high-performance and modular protein design pipeline. Protein Sci. 2026, 35, e70464 DOI: 10.1002/pro.70464
85.
  González-Rodríguez, N.; Chacón-Sánchez, C.; Llorca, O.; Fernández-Leiro, R. Automated and modular protein binder design with BinderFlow. PLOS Comput. Biol. 2025, 21, e1013747 DOI: 10.1371/journal.pcbi.1013747
86.
  Danny, B. Ovo, an Open-Source Ecosystem for De Novo Protein Design.
87.
  Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 2010, 19, 1817– 1819, DOI: 10.1002/pro.481
88.
  Beadle, B. M.; Shoichet, B. K. Structural Bases of Stability–function Tradeoffs in Enzymes. J. Mol. Biol. 2002, 321, 285– 296, DOI: 10.1016/S0022-2836(02)00599-5
89.
  Barlow, K. A.; Conchúir, S. Ó.; Thompson, S. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein–Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389– 5399, DOI: 10.1021/acs.jpcb.7b11367
90.
  Shringari, S. R.; Giannakoulias, S.; Ferrie, J. J.; Petersson, E. J. Rosetta custom score functions accurately predict ΔΔG of mutations at protein–protein interfaces using machine learning. Chem. Commun. 2020, 56, 6774– 6777, DOI: 10.1039/D0CC01959C
91.
  Smith, S. T.; Meiler, J. Assessing multiple score functions in Rosetta for drug discovery. PLoS One 2020, 15, e0240450 DOI: 10.1371/journal.pone.0240450
92.
  Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031– 3048, DOI: 10.1021/acs.jctc.7b00125
93.
  Tyka, M. D.; Keedy, D. A.; André, I. Alternate States of Proteins Revealed by Detailed Energy Landscape Mapping. J. Mol. Biol. 2011, 405, 607– 618, DOI: 10.1016/j.jmb.2010.11.008
94.
  Planas-Iglesias, J.; Marques, S. M.; Pinto, G. P. Computational design of enzymes for biotechnological applications. Biotechnol. Adv. 2021, 47, 107696 DOI: 10.1016/j.biotechadv.2021.107696
95.
  Guo, H.-B.; Perminov, A.; Bekele, S. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci. Rep. 2022, 12, 10696 DOI: 10.1038/s41598-022-14382-9
96.
  Agarwal, V.; McShan, A. C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024, 20, 950– 959, DOI: 10.1038/s41589-024-01638-w
  There is no corresponding record for this reference.
97. 97
  Abramson, J.; Adler, J.; Dunger, J. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493– 500, DOI: 10.1038/s41586-024-07487-w
  There is no corresponding record for this reference.
98.
  Friedland, G. D.; Linares, A. J.; Smith, C. A.; Kortemme, T. A Simple Model of Backbone Flexibility Improves Modeling of Side-chain Conformational Variability. J. Mol. Biol. 2008, 380, 757–774, DOI: 10.1016/j.jmb.2008.05.006
99.
  Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins: Struct., Funct., Bioinf. 2011, 79, 830–838, DOI: 10.1002/prot.22921
100.
  Durham, E.; Dorr, B.; Woetzel, N.; Staritzbichler, R.; Meiler, J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J. Mol. Model. 2009, 15, 1093–1108, DOI: 10.1007/s00894-009-0454-9
101.
  Bertalan, É.; Lešnik, S.; Bren, U.; Bondar, A.-N. Protein-water hydrogen-bond networks of G protein-coupled receptors: Graph-based analyses of static structures and molecular dynamics. J. Struct. Biol. 2020, 212, 107634, DOI: 10.1016/j.jsb.2020.107634
102.
  Weiss, G. A.; Watanabe, C. K.; Zhong, A.; Goddard, A.; Sidhu, S. S. Rapid mapping of protein functional epitopes by combinatorial alanine scanning. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 8950–8954, DOI: 10.1073/pnas.160252097
103.
  Cunningham, B. C.; Wells, J. A. High-Resolution Epitope Mapping of hGH-Receptor Interactions by Alanine-Scanning Mutagenesis. Science 1989, 244, 1081–1085, DOI: 10.1126/science.2471267
104.
  Liu, H.; Song, L.; Meng, X. Proline-Mediated Enhancement in Evolvability of Disulfide-Rich Peptides for Discovering Protein Binders. J. Am. Chem. Soc. 2025, 147, 24870–24883, DOI: 10.1021/jacs.5c07075
105.
  Holden, J. K.; Pavlovicz, R.; Gobbi, A.; Song, Y.; Cunningham, C. N. Computational Site Saturation Mutagenesis of Canonical and Non-Canonical Amino Acids to Probe Protein-Peptide Interactions. Front. Mol. Biosci. 2022, 9, 848689, DOI: 10.3389/fmolb.2022.848689
106.
  Spina, S. C.; Bailey, J.; Kimmel, B. Bind, catalyze, and quantify: a modern protein and enzyme engineering toolbox of genetically encoded non-canonical amino acids. Protein Eng. Des. Sel. 2026, gzag007, DOI: 10.1093/protein/gzag007
107.
  Chen, Y.; Clay, N.; Phan, N. Molecular Matchmakers: Bioconjugation Techniques Enhance Prodrug Potency for Immunotherapy. Mol. Pharmaceutics 2025, 22, 58–80, DOI: 10.1021/acs.molpharmaceut.4c00867
108.
  Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind Database: Collection of Binding Affinities for Protein–Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem. 2004, 47, 2977–2980, DOI: 10.1021/jm030580l
109.
  Liu, Z.; Su, M.; Han, L. Forging the Basis for Developing Protein–Ligand Interaction Scoring Functions. Acc. Chem. Res. 2017, 50, 302–309, DOI: 10.1021/acs.accounts.6b00491
110.
  King, B. R.; Sumida, K. H.; Caruso, J. L.; Baker, D.; Zalatan, J. G. Computational Stabilization of a Non-Heme Iron Enzyme Enables Efficient Evolution of New Function. Angew. Chem., Int. Ed. 2025, 64, e202414705, DOI: 10.1002/anie.202414705
111.
  Howlader, M. T. H.; Kagawa, Y.; Miyakawa, A. Alanine Scanning Analyses of the Three Major Loops in Domain II of Bacillus thuringiensis Mosquitocidal Toxin Cry4Aa. Appl. Environ. Microbiol. 2010, 76, 860–865, DOI: 10.1128/AEM.02175-09
112.
  Paul, R.; Kasahara, K.; Sasaki, J. Unveiling the affinity–stability relationship in anti-measles virus antibodies: a computational approach for hotspots prediction. Front. Mol. Biosci. 2024, 10, 1302737, DOI: 10.3389/fmolb.2023.1302737
113.
  Romero-Rivera, A.; Garcia-Borràs, M.; Osuna, S. Role of Conformational Dynamics in the Evolution of Retro-Aldolase Activity. ACS Catal. 2017, 7, 8524–8532, DOI: 10.1021/acscatal.7b02954
114.
  Lemkul, J. A. Introductory Tutorials for Simulating Protein Dynamics with GROMACS. J. Phys. Chem. B 2024, 128, 9418–9435, DOI: 10.1021/acs.jpcb.4c04901
115.
  Sanbonmatsu, K. Y.; Joseph, S.; Tung, C.-S. Simulating movement of tRNA into the ribosome during decoding. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 15854–15859, DOI: 10.1073/pnas.0503456102
116.
  Li, R.; Macnamara, L.; Leuchter, J.; Alexander, R.; Cho, S. MD Simulations of tRNA and Aminoacyl-tRNA Synthetases: Dynamics, Folding, Binding, and Allostery. Int. J. Mol. Sci. 2015, 16, 15872–15902, DOI: 10.3390/ijms160715872
117.
  Patel, S.; Hosur, R. V. Replica exchange molecular dynamics simulations reveal self-association sites in M-Crystallin caused by mutations provide insights of cataract. Sci. Rep. 2021, 11, 23270, DOI: 10.1038/s41598-021-02728-8
118.
  Stelzl, L. S.; Hummer, G. Kinetics from Replica Exchange Molecular Dynamics Simulations. J. Chem. Theory Comput. 2017, 13, 3927–3935, DOI: 10.1021/acs.jctc.7b00372
119.
  Feig, M.; Nawrocki, G.; Yu, I.; Wang, P.; Sugita, Y. Challenges and opportunities in connecting simulations with experiments via molecular dynamics of cellular environments. J. Phys. Conf. Ser. 2018, 1036, 012010, DOI: 10.1088/1742-6596/1036/1/012010
120.
  Kumari, I.; Sandhu, P.; Ahmed, M.; Akhter, Y. Molecular Dynamics Simulations, Challenges and Opportunities: A Biologist’s Prospective. Curr. Protein Pept. Sci. 2017, 18, 1163–1179, DOI: 10.2174/1389203718666170622074741
121.
  Senn, H. M.; Thiel, W. QM/MM Methods for Biomolecular Systems. Angew. Chem., Int. Ed. 2009, 48, 1198–1229, DOI: 10.1002/anie.200802019
122.
  Lopes, P. E. M.; Guvench, O.; MacKerell, A. D. Current Status of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Springer: New York, NY, 2015; Vol. 1215, pp 47–71.
123.
  McMillin, D. R. Interatomic Repulsion and the Pauli Principle. J. Chem. Educ. 2021, 98, 2912–2918, DOI: 10.1021/acs.jchemed.1c00326
124.
  Guvench, O.; MacKerell, A. D. Comparison of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Humana Press: Totowa, NJ, 2008; Vol. 443, pp 63–88.
125.
  Warshel, A.; Sharma, P. K.; Kato, M. Electrostatic Basis for Enzyme Catalysis. Chem. Rev. 2006, 106, 3210–3235, DOI: 10.1021/cr0503106
126.
  Van Der Kamp, M. W.; Mulholland, A. J. Combined Quantum Mechanics/Molecular Mechanics (QM/MM) Methods in Computational Enzymology. Biochemistry 2013, 52, 2708–2728, DOI: 10.1021/bi400215w
127.
  Magliery, T. J. Protein stability: computation, sequence statistics, and new experimental methods. Curr. Opin. Struct. Biol. 2015, 33, 161–168, DOI: 10.1016/j.sbi.2015.09.002
128.
  Singh, A.; Upadhyay, V.; Upadhyay, A. K.; Singh, S. M.; Panda, A. K. Protein recovery from inclusion bodies of Escherichia coli using mild solubilization process. Microb. Cell Factories 2015, 14, 41, DOI: 10.1186/s12934-015-0222-8
129.
  Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478–490, DOI: 10.1016/j.jmb.2014.09.026
130.
  Li, B.; Ming, D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinf. 2024, 25, 204, DOI: 10.1186/s12859-024-05820-8
131.
  Tan, Y.; Zheng, J.; Hong, L.; Zhou, B. ProtSolM: Protein Solubility Prediction with Multi-modal Features, arXiv:2406.19744. arXiv.org e-Print archive. https://arxiv.org/abs/2406.19744. 2024.
132.
  Ghosh, D.; Biswas, A.; Radhakrishna, M. Advanced computational approaches to understand protein aggregation. Biophys. Rev. 2024, 5, 021302, DOI: 10.1063/5.0180691
133.
  Oeller, M.; Kang, R.; Bell, R. Sequence-based prediction of pH-dependent protein solubility using CamSol. Brief. Bioinform. 2023, 24, bbad004, DOI: 10.1093/bib/bbad004
134.
  Kimmel, B. R.; Mrksich, M. Development of an Enzyme-Inhibitor Reaction Using Cellular Retinoic Acid Binding Protein II for One-Pot Megamolecule Assembly. Chem. - Eur. J. 2021, 27, 17843–17848, DOI: 10.1002/chem.202103059
135.
  Kimmel, B. R.; Modica, J. A.; Parker, K.; Dravid, V.; Mrksich, M. Solid-Phase Synthesis of Megamolecules. J. Am. Chem. Soc. 2020, 142, 4534–4538, DOI: 10.1021/jacs.9b12003
136.
  Adomanis, R.; Phan, N.; Walter, G.; Kimmel, B. R. Modular Nanobody Conjugates with Controlled Topology Using Genetically Encoded Non-canonical Amino Acids. Preprint at https://doi.org/10.1101/2025.11.27.691038. 2025.
137.
  Rosace, A.; Bennett, A.; Oeller, M. Automated optimization of solubility and conformational stability of antibodies and proteins. Nat. Commun. 2023, 14, 1937, DOI: 10.1038/s41467-023-37668-6
138.
  Kuriata, A.; Iglesias, V.; Pujols, J. Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res. 2019, 47, W300–W307, DOI: 10.1093/nar/gkz321
139.
  Hirsch, M.; Desai, R. R.; Annaswamy, S.; Keatinge-Clay, A. T. Mutagenesis Supports AlphaFold Prediction of How Modular Polyketide Synthase Acyl Carrier Proteins Dock With Downstream Ketosynthases. Proteins: Struct., Funct., Bioinf. 2024, 92, 1375–1384, DOI: 10.1002/prot.26733
140.
  Araki, M.; Ekimoto, T.; Takemura, K. Molecular Dynamics Unveils Multiple-Site Binding of Inhibitors with Reduced Activity on the Surface of Dihydrofolate Reductase. J. Am. Chem. Soc. 2024, 146, 28685–28695, DOI: 10.1021/jacs.4c04648
141.
  Pimtawong, T.; Ren, J.; Lee, J.; Lee, H.-M.; Na, D. A review on computational models for predicting protein solubility. J. Microbiol. 2025, 63, 2408001, DOI: 10.71150/jm.2408001
142.
  Navarro, S.; Ventura, S. Computational methods to predict protein aggregation. Curr. Opin. Struct. Biol. 2022, 73, 102343, DOI: 10.1016/j.sbi.2022.102343
143.
  Prediction and Evaluation of Protein Aggregation with Computational Methods. In Methods in Molecular Biology; Springer US: New York, NY, 2025; pp 299–314, DOI: 10.1007/978-1-0716-4196-5_17
144.
  Santos, J.; Pujols, J.; Pallarès, I.; Iglesias, V.; Ventura, S. Computational prediction of protein aggregation: Advances in proteomics, conformation-specific algorithms and biotechnological applications. Comput. Struct. Biotechnol. J. 2020, 18, 1403–1413, DOI: 10.1016/j.csbj.2020.05.026
145.
  Arnold, F. H. Design by Directed Evolution. Acc. Chem. Res. 1998, 31, 125–131, DOI: 10.1021/ar960017f
146.
  Arnold, F. H. Directed evolution: Creating biocatalysts for the future. Chem. Eng. Sci. 1996, 51, 5091–5102, DOI: 10.1016/S0009-2509(96)00288-6
147.
  Cobb, R. E.; Chao, R.; Zhao, H. Directed evolution: Past, present, and future. AIChE J. 2013, 59, 1432–1440, DOI: 10.1002/aic.13995
148.
  Yang, J.; Lal, R. G.; Bowden, J. C. Active learning-assisted directed evolution. Nat. Commun. 2025, 16, 714, DOI: 10.1038/s41467-025-55987-8
149.
  Terashi, G.; Wang, X.; Maddhuri Venkata Subramaniya, S. R.; Tesmer, J. J. G.; Kihara, D. Residue-wise local quality estimation for protein models from cryo-EM maps. Nat. Methods 2022, 19, 1116–1125, DOI: 10.1038/s41592-022-01574-4
150.
  Graille, M.; Sacquin-Mora, S.; Taly, A. Best Practices of Using AI-Based Models in Crystallography and Their Impact in Structural Biology. J. Chem. Inf. Model. 2023, 63, 3637–3646, DOI: 10.1021/acs.jcim.3c00381
151.
  Wang, X.; Zhu, H.; Terashi, G.; Taluja, M.; Kihara, D. DiffModeler: large macromolecular structure modeling for cryo-EM maps using a diffusion model. Nat. Methods 2024, 21, 2307–2317, DOI: 10.1038/s41592-024-02479-0
152.
  Serapian, S. A.; Crosby, J.; Crump, M. P.; Van Der Kamp, M. W. Path to Actinorhodin: Regio- and Stereoselective Ketone Reduction by a Type II Polyketide Ketoreductase Revealed in Atomistic Detail. JACS Au 2022, 2, 972–984, DOI: 10.1021/jacsau.2c00086
153.
  Shukla, V. K.; Karunanithy, G.; Vallurupalli, P.; Hansen, D. F. A combined NMR and deep neural network approach for enhancing the spectral resolution of aromatic side chains in proteins. Sci. Adv. 2024, 10, eadr2155, DOI: 10.1126/sciadv.adr2155
154.
  Drake, Z. C.; Fowler, A. G.; Blum, A. A.; Lindert, S. Enhanced Protein Complex Prediction via Rosetta, AlphaFold, and Nondifferential Covalent Labeling Mass Spectrometry. J. Phys. Chem. B 2025, 129, 6489–6497, DOI: 10.1021/acs.jpcb.5c02872
155.
  Lee, C. Y.; Hubrich, D.; Varga, J. K. Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation. Mol. Syst. Biol. 2024, 20, 75–97, DOI: 10.1038/s44320-023-00005-6
156.
  Koehler Leman, J.; Künze, G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int. J. Mol. Sci. 2023, 24, 7835, DOI: 10.3390/ijms24097835
157.
  Alshammari, M.; Wriggers, W.; Sun, J.; He, J. Refinement of AlphaFold2 models against experimental and hybrid cryo-EM density maps. QRB Discovery 2022, 3, e16, DOI: 10.1017/qrd.2022.13
158.
  Humphreys, I. R.; Pei, J.; Baek, M. Computed structures of core eukaryotic protein complexes. Science 2021, 374, eabm4805, DOI: 10.1126/science.abm4805
159.
  Bordin, N.; Sillitoe, I.; Nallapareddy, V. AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun. Biol. 2023, 6, 160, DOI: 10.1038/s42003-023-04488-9
160.
  Wang, H.; Wang, J. How cryo-electron microscopy and X-ray crystallography complement each other. Protein Sci. 2017, 26, 32–39, DOI: 10.1002/pro.3022
