• Open Access
Review

Artificial Intelligence in Chemical Engineering: Protein Design from First Principles to Structural Prediction

  • Joseph S. Bailey, Jr.
    Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
  • Søren C. Spina
    Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
  • Andrew Hu
    College of Medicine, The Ohio State University, 460 W 10th Avenue, Columbus, Ohio 43210, United States
  • Nathan Phan
    Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
  • Rachel B. Getman
    Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
  • Blaise R. Kimmel*
    Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
    Center for Cancer Engineering, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States
    Pelotonia Institute for Immuno-Oncology, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States
    *Email: [email protected]

ACS Engineering Au

Cite this: ACS Eng. Au 2026, XXXX, XXX, XXX-XXX
https://doi.org/10.1021/acsengineeringau.5c00099
Published March 18, 2026

© 2026 The Authors. Published by American Chemical Society. This publication is licensed under CC-BY-NC-ND 4.0.

Abstract

Machine learning and artificial intelligence are improving the speed and accuracy of every step of the protein design process. Early computational strategies relied on physics-based modeling and energy functions to identify amino acid sequences and desired folds. Recent advances in deep-learning structure prediction, diffusion-based backbone generation, and graph-based sequence design now allow researchers to explore protein sequence and structure space more efficiently. These developments allow proteins to be used as fundamental systems whose components can be engineered with high precision. Computational predictions still struggle to properly account for conformational dynamics, catalytic environments, external interactions, and the broader chemical diversity present in natural enzymes. This review covers the progression from physics-based methods to deep-learning and generative methods and surveys current strategies for evaluating stability and function in silico and experimentally.

Special Issue

Published as part of ACS Engineering Au special issue “AI and Machine Learning in Chemical Engineering: Breakthroughs and Applications”.

Introduction

Enzymes can be understood as molecular-scale bioreactors that catalyze reactions through fundamental chemical engineering principles such as thermodynamics, kinetics, transport phenomena, and mass and energy balances. Engineered metalloenzymes now perform unique transformations under conditions that once required high temperatures, (1−3) a trend mirrored by machine-learning (ML) engineered hydrolases and proteomics platforms that streamline plastic depolymerization at ambient temperatures (3,4) and peptide discovery with minimal costly trial-and-error screens. (5) Industrial protein engineering pipelines extend this trajectory to the production of vitamins, biofuels, and specialty chemicals, using rational and de novo computational protein design to improve stability and selectivity. (6) These advances demonstrate the application of chemical engineering principles to programmable biomaterials.
Protein engineering involves systematically modifying natural proteins to investigate, alter, or repurpose their inherent biological functions and to design novel proteins for specific applications. Over the past four decades, the field has moved from making incremental changes to naturally occurring proteins toward de novo protein design, the creation of proteins with defined structures and precise functionalities from scratch, (7) now operating at a scale and level of precision that was previously unrealistic. Advances in computational modeling, structural biology, and directed evolution have reshaped protein design, making it possible to build enzymes with atomic-level structural accuracy, including folds and functions not yet found in nature. (8,9)
For much of its history, structural uncertainty has limited the extent to which proteins can be used as tunable bioreactors due to a fundamental constraint known as the protein folding problem. This problem explores how a linear amino acid sequence defines a specified three-dimensional (3D) structure and thus determines the protein function (Figure 1). Protein folding is thermodynamically driven toward low free-energy states, but the expansive conformational search space described by Levinthal’s paradox, which says that if a protein sampled all possible conformations randomly it would take longer than the age of the universe to fold, makes uninformed structural prediction impossible. (10,11) For decades, the gap between physical theory and prediction constrained rational protein engineering, defining one of the greatest unsolved questions in biology. (12) Structural insight in the 1980s and 1990s was limited and in many cases simply unavailable, forcing researchers to rely on mechanistic intuition via “minimal” design, where simplified protein-like structures that capture only the most essential features of folding are used for “rational” design. This intuitive approach uses iterative trial-and-error mutation rounds to slowly progress toward functional improvement. (13) Directed evolution provided a powerful alternative but is limited in widespread use due to the need for advanced experimental screens, which are costly, labor-intensive, and time-consuming. (7,9)
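The scale of Levinthal's paradox can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the three conformations per residue, the 100-residue chain, and the sampling rate of 10^13 conformations per second are textbook-style assumptions and are not values taken from this review.

```python
# Back-of-the-envelope Levinthal estimate.
# All numerical inputs are illustrative assumptions, not measured quantities.

AGE_OF_UNIVERSE_S = 4.35e17  # roughly 13.8 billion years, in seconds


def exhaustive_search_time_s(n_residues, states_per_residue=3, samples_per_s=1e13):
    """Seconds needed to visit every backbone conformation exactly once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_s


search_time = exhaustive_search_time_s(100)
universe_ages = search_time / AGE_OF_UNIVERSE_S
print(f"search time: {search_time:.2e} s ({universe_ages:.2e} universe ages)")
```

Even under these generous assumptions the search time exceeds the age of the universe by many orders of magnitude, which is why folding cannot proceed by unguided enumeration.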

Figure 1. From sequence to protein structure and conformational behavior. (A) Biological information transfer follows a deterministic pathway from DNA to RNA to protein, linking the encoded sequence information to the emergent molecular function and dynamics. (B) Input amino acid sequences serve as the basis for predictive modeling frameworks. (C) Sequence-informed A.I./ML frameworks trained on sequence and structural ensemble data learn the mapping between linear sequences and conformation space. (D) The resulting structural ensemble offers a data-driven view of protein flexibility and structural diversity derived directly from the amino acid sequence. Reprinted or adapted with permission under a CC-BY 3.0 License from Ille et al. (14) Copyright 2025 AIP Publishing.

Identifying sequences that reliably define a desired structure led to the inverse folding problem. (12,15) Early computational approaches neglected flexibility, solvation, and entropic effects, often leading to unstable or misfolded structures despite the use of fixed-backbone design and rotamer libraries intended to make protein folding more manageable. Although advances in algorithm development and statistical mechanics in the early 2000s enabled credible de novo designs, structural prediction continued to constrain the shift from conceptual feasibility to routine engineering. (13,17)
The shift toward functional design began when physics-based platforms, such as ROSETTA and PyRosetta, used explicit energy functions to improve conformational sampling. Deep-learning approaches, including AlphaFold and RoseTTAFold, subsequently achieved near-experimental accuracy. (18,19) In parallel, large-scale protein language models (LLMs) such as ESM and ProGen are trained directly on sequence data. These transformer-based models show that multiple sequence alignments (MSAs) are not always necessary and that structural and functional information can be inferred solely from sequence space. Diffusion-based models further expanded generative design in both structure and sequence space by reframing folding, docking, and binder generation as probabilistic sampling problems. Together, developments that allow sequence, structure, and function to be optimized in tandem have produced proteins with diverse folds and improved functionality (7,8,13) (Figure 2).

Figure 2. Conceptual view of the protein functional universe. The diagram maps the relationships among sequence, structure, and function spaces. Each circle represents an individual protein defined by its amino acid sequence, 3D folds, and biological activity. The blue circles correspond to proteins accessible through natural evolution or traditional protein engineering, primarily clustered within well-explored regions (yellow). Gray circles indicate proteins that remain uncharacterized and lie within the unexplored sequence–structure–function space. The red circles represent proteins accessible through ML-driven de novo design, which extends exploration beyond natural boundaries into previously inaccessible regions. In this framework, sequence space (top layer) is linked to structure space (middle layer) and ultimately to function space (bottom layer), with A.I. methods systematically probing across all three layers. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

The integration of advances in structural prediction with engineering principles has increased protein tunability, and synthetic proteins are emerging as realistic alternatives to natural proteins in medicine, biotechnology, and sustainability. Active sites, binding interfaces, and stability profiles can now be co-optimized within defined thermodynamic and kinetic limits, resulting in designed enzymes that are being applied to green chemistry, protein binders for clinical pipelines, and orthogonal tRNA-synthetase pairs for the expansion of the genetic code and the incorporation of noncanonical amino acids (ncAAs). (13,20,21) Modern design tools support binder design, docking analysis, interaction scoring, and interface confidence measures for predicted complexes. Working alongside first-principles approaches, ML extends classical chemical engineering into synthetic biology by enabling exploration of sequence space at previously inaccessible scales. Applications to defined biological targets, from generating programmable binders to repurposing existing scaffolds, show that generative modeling is a rapidly expanding field in which biological structures can be designed and predicted for highly specialized applications. These tools are the “engines” driving a new era of biochemical innovation: they open the door to expanded genetic codes, new biocatalysts, and programmable protein machines by connecting molecular design to industrial applications in medicine, sustainability, and synthetic biology.

Foundational Structure Prediction and Design Frameworks

ROSETTA and PyRosetta

Introduced in the late 1990s by the Baker laboratory, ROSETTA remains one of the most influential physics-based platforms in computational protein design. It laid the foundation for modern docking and design protocols using a protein fragment assembly approach, in which short peptide fragments from known protein structures are recombined to approximate unknown folds (Table 1). (23,24) ROSETTA uses Monte Carlo sampling together with hybrid energy functions that integrate physics-based terms such as van der Waals interactions, hydrogen bonding, electrostatics, and solvation with knowledge-based statistical potentials derived from structural data sets to predict unknown folds without requiring an exhaustive search over all conformations. (25) This approach made ab initio folding computationally practical. (26) Over time, the suite has expanded well beyond its original capabilities and now includes protein–protein docking, flexible small-molecule docking with RosettaLigand, (27) antibody modeling, enzyme active-site design, RosettaMatch-based metal coordination tuning, and covalent docking extensions. (25,27,28) Application of the ROSETTA toolkit has led to analyses of resistance mutations in HIV-1 protease inhibitors, backbone redesign to create more thermostable metalloenzymes, and the creation of biocatalysts for plastic degradation and green chemical synthesis. (25,26,28)
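The Monte Carlo sampling strategy described above can be illustrated with a toy Metropolis search. This is a minimal sketch under stated assumptions: `toy_energy` is a hypothetical stand-in for a score function, the single-angle perturbation replaces ROSETTA's fragment insertion moves, and none of the names below correspond to the ROSETTA API.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical score (lower is better) with its minimum at -60 degrees per angle."""
    return sum((a + 60.0) ** 2 for a in angles) / 1000.0


def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always; accept uphill moves with probability exp(-dE/kT)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)


def monte_carlo(angles, n_steps=5000, kT=1.0, max_step=30.0, seed=0):
    """Perturb one torsion angle per step and keep the lowest-energy state seen."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = toy_energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-max_step, max_step)  # simplified move, not a fragment insertion
        e_trial = toy_energy(trial)
        if metropolis_accept(e_trial - e_current, kT, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best
```

Because uphill moves are occasionally accepted, the search can escape local minima without enumerating every conformation, which is the property that makes this style of sampling practical for folding and docking.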
Table 1. Summary of the Foundational Prediction and Validation Framework

  • ROSETTA (1998)
    Primary role: physics-based modeling, de novo folding, docking, enzyme/binder design
    Integration: (1) modular protocols (RosettaScripts); (2) integrates with PyRosetta and experimental pipelines
    Reported accuracy/benchmarks: (1) ab initio folding within 2–4 Å RMSD for small proteins; (2) accurate ligand docking (≤2 Å in RosettaLigand); successful in antibody modeling and enzyme design (24−26)
    Key architecture: (1) Monte Carlo sampling with fragment assembly; (2) hybrid energy function (physics and knowledge-based potentials)
    Limitations: (1) computationally expensive; (2) limited backbone flexibility in fixed-backbone design; (3) underrepresents entropy and solvent dynamics; (4) requires large sampling for success
  • PyRosetta (2010)
    Primary role: scriptable interface for custom design workflows
    Integration: (1) Python API to ROSETTA core; (2) integrates with NumPy/pandas/ML tools
    Reported accuracy/benchmarks: (1) comparable accuracy to ROSETTA protocols; (2) flexible pipelines for alanine scanning, ΔΔG, interface mapping (29,30)
    Key architecture: exposes ROSETTA “Pose” object, scoring functions, and movers to Python
    Limitations: (1) requires user scripting; limited scalability without HPC; (2) inherits ROSETTA’s scoring-function biases; (3) not inherently generative
  • AlphaFold2 (2020)
    Primary role: high-accuracy structure prediction from sequence
    Integration: used in nearly all modern pipelines as a validation filter
    Reported accuracy/benchmarks: (1) CASP14: median GDT_TS > 90; (2) subangstrom accuracy for many folds; (3) proteome-scale modeling (34,37,47,48)
    Key architecture: deep attention networks (Evoformer and structure module) with iterative refinement
    Limitations: (1) deterministic outputs; (2) limited conformational diversity; (3) no motif conditioning; (4) no explicit ligand/cofactor modeling
  • ColabFold (2021)
    Primary role: accessible high-throughput structure prediction
    Integration: (1) integrates AlphaFold2/RF models; (2) uses MMseqs2 for fast MSA generation; (3) used on Google Colab/local
    Reported accuracy/benchmarks: (1) CASP14 free modeling accuracy close to AlphaFold2; (2) ≥40× faster MSA generation; robust on toxin families and multimer predictions (40−42,49)
    Key architecture: (1) AlphaFold2/RF backbone adapted to Colab notebook; (2) MMseqs2 for sequence search
    Limitations: (1) dependent on MSA quality; (2) reduced precision vs AlphaFold2; (3) deterministic; (4) limited support for rare folds or novel chemistries
  • RoseTTAFold (2021–2023)
    Primary role: multitrack prediction (RF), nucleic acid complexes (RFNA), all-atom assemblies (RFAA)
    Integration: (1) extends to motif scaffolding, protein–ligand/nucleic acid complexes; (2) paired with ProteinMPNN/LigandMPNN
    Reported accuracy/benchmarks: (1) RF: three-track models within 2–3 Å; (2) RFAA: subangstrom ligand placement; (3) RFNA: improved protein–DNA/RNA accuracy (44−46)
    Key architecture: (1) three-track neural network (RF); (2) graph-based all-atom encoding (RFAA); (3) sequence and structure alphabet expansion (RFNA)
    Limitations: (1) deterministic (RF/RFAA/RFNA); (2) limited dynamics; (3) incomplete coverage of novel chemistries
First introduced in 2010, PyRosetta extended ROSETTA capabilities to the Python interface. (29) This enabled rapid prototyping of custom pipelines without modifying the C++ core by granting direct access to the internal pose object (Protein Data Bank file), scoring functions, and movers. PyRosetta functions as an interface layer that accelerates hypothesis-driven design and has been widely adopted as a platform for research and education, highlighted by Jupyter notebook-based teaching modules that guide users through tasks such as protein folding, protein–protein and protein–ligand docking, and antibody design. (30) In industrial settings, PyRosetta has been used to optimize enzyme stability, where ΔΔG scanning evaluates the effects of hundreds of mutations in silico before experimental screening. (29) In immunoengineering, alanine-scanning and ΔΔG protocols have identified critical hotspot residues at the antibody–antigen interface. (30) PyRosetta has also been used for mutagenesis of synthetase residues, freezing of tRNA backbones, and binding energy calculations for ncAAs, providing a computational platform for genetic code expansion. (31) This flexibility allows integration with statistical analysis and ML libraries, custom energy functions, metalloenzyme development, and covalent hotspot evaluation. (29−33) Despite its broad capabilities and flexibility as a “general-purpose” modeling suite, ROSETTA is still computationally demanding and heavily reliant on sampling. Fixed-backbone models tend to miss contributions from entropy, solvent interactions, and conformational flexibility. Failure to capture the full dynamics of the system directly affects how thoroughly sequence and conformational space are explored and thus has a direct effect on model success.
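The logic of an alanine scan with ΔΔG ranking can be sketched independently of any modeling package. In the example below, `toy_score` is a hypothetical placeholder for an energy function (it is not a ROSETTA score term), and ΔΔG is taken as the mutant score minus the wild-type score, so that large positive values flag candidate hotspot residues. In a real workflow the scoring callable would wrap a structure-based energy evaluation.

```python
def toy_score(sequence):
    """Hypothetical stability score (lower = more stable); illustrative weights only."""
    weights = {"W": -3.0, "F": -2.5, "L": -2.0, "I": -2.0, "V": -1.5, "A": -0.5}
    return sum(weights.get(aa, 0.0) for aa in sequence)


def alanine_scan(sequence, score_fn):
    """Mutate each non-Ala, non-Gly position to Ala and rank positions by ddG.

    Returns (1-based position, wild-type residue, ddG) tuples, most
    destabilizing substitution first.
    """
    wt_score = score_fn(sequence)
    results = []
    for i, aa in enumerate(sequence):
        if aa in ("A", "G"):
            continue  # Ala is the reference; Gly is conventionally skipped
        mutant = sequence[:i] + "A" + sequence[i + 1:]
        ddg = score_fn(mutant) - wt_score
        results.append((i + 1, aa, ddg))
    return sorted(results, key=lambda r: r[2], reverse=True)


hits = alanine_scan("GWKLA", toy_score)
```

Passing the scoring function as an argument keeps the scan loop unchanged when the toy score is swapped for a physics-based or learned predictor.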

AlphaFold: Structure Prediction at Scale

Upon its release in 2020, AlphaFold2 (AF2) set new field-wide standards in CASP14 (Critical Assessment of Structure Prediction) by achieving near-experimental accuracy for most targets. In this blind benchmark, structures must be predicted before experimental coordinates are released. (34−36) AlphaFold’s major impact has been the reduction of structural prediction uncertainty. Its precision stems from the integration of evolutionary information with an attention-based deep-learning architecture, the Evoformer, coupled to an end-to-end coordinate generation module that enforces three-dimensional (3D) spatial constraints. Using this framework, AF2 outperformed competing methods and achieved Global Distance Test Total Score (GDT_TS) values above 90 for most targets; GDT_TS measures how closely a predicted structure matches the overall fold and backbone geometry of the experimental model. Having resolved many long-standing challenges in the “protein folding problem”, AlphaFold is now routinely used as a structural filter in protein engineering workflows. Sequences are first screened in silico, and only those predicted to refold into high-confidence conformations move forward to experimental characterization. (37) AlphaFold has also contributed to cryo-EM map interpretation, molecular replacement strategies, protein complex structure prediction, and validation of de novo designs. (34,35,37) AlphaFold2 has also been applied at the proteome scale, providing structural coverage of nearly the entire human proteome and thousands of proteins across diverse organisms. (35) Despite its broad utility, AlphaFold predictions are largely deterministic and offer limited conformational diversity. There are still limitations in motif conditioning, ligand placement, and explicit modeling of ncAAs or cofactors. (38,39) Confidence metrics like pLDDT speak to structural reliability, but they do not predict whether a design will express, remain soluble, or function catalytically.
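The GDT_TS metric discussed above averages the percentage of residues whose Cα atoms fall within 1, 2, 4, and 8 Å of their experimental positions. The sketch below is a simplification: it assumes the per-residue distances come from a single precomputed superposition, whereas the full procedure searches over superpositions to maximize the count at each cutoff.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) from per-residue CA-CA distances in angstroms.

    Assumes one fixed superposition; the official metric optimizes the
    superposition separately for each distance cutoff.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


example = gdt_ts([0.5, 1.5, 3.0, 9.0])
```

Because the score averages over several cutoffs, a model with a correct overall fold but a few badly placed loops is penalized less severely than it would be by a global RMSD.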

ColabFold: Standardized Prediction

ColabFold adapted AlphaFold for rapid, user-friendly execution by pairing fast MSA generation with the Google Colaboratory (Colab) environment. (40) Colab is a free, cloud-based platform hosted by Google that runs Jupyter Notebooks in a web browser, providing users with access to GPUs without the need for local installation. By taking advantage of cloud GPUs, ColabFold makes cutting-edge structure prediction available to laboratories and classrooms worldwide, delivering results significantly faster while achieving comparable accuracy. (41) The notebook interface allows beginners to run protein predictions with minimal setup, while advanced users can use command line tools for batch processing and parameter tuning, contributing to the routine use of the platform for teaching, prototyping, and large-scale protein modeling. (40) Although ColabFold is slightly less precise than full AF2 or RoseTTAFold pipelines, its greatly increased throughput makes it well-suited for exploratory and comparative studies in which researchers must screen hundreds of candidate scaffolds or assess entire protein families, cases that would be impractical with AlphaFold or RoseTTAFold alone. (42) ColabFold, like AlphaFold, is limited by deterministic outputs, dependence on sequence alignment quality and database size, and limited conditioning flexibility; as a result, it is also best viewed as a scalable front-end filter. Despite these shared constraints, its throughput and accessibility make ColabFold an entry point for the broad application of A.I.-based protein structure modeling across research, education, and design applications. (41)
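The front-end filtering role described above can be sketched as a confidence screen over predicted models. AlphaFold-style PDB outputs store per-residue pLDDT in the B-factor field, which the example reads from fixed PDB columns. The function names and the threshold of 70 are assumptions chosen for illustration; an appropriate cutoff depends on the design campaign.

```python
def mean_plddt(pdb_text):
    """Mean pLDDT over CA atoms, read from the B-factor columns (61-66) of ATOM records."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms found in ATOM records")
    return sum(values) / len(values)


def passes_filter(pdb_text, threshold=70.0):
    """True if the model's mean pLDDT meets the (assumed) confidence threshold."""
    return mean_plddt(pdb_text) >= threshold


def screen(models, threshold=70.0):
    """Return the names of models that pass, given a {name: pdb_text} mapping."""
    return [name for name, text in models.items() if passes_filter(text, threshold)]
```

A screen of this kind only ranks structural confidence; as noted for AlphaFold, it says nothing about expression, solubility, or catalytic function.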

RoseTTAFold: Expansion to All-Atom Modeling

RoseTTAFold (RF) introduced a three-track architecture that combines sequence, pairwise geometry, and 3D coordinates. (43) For monomeric targets, RF is comparable in accuracy to AlphaFold but also provides structural flexibility for motif scaffolding and downstream adaptations. Early adaptations highlighted the potential of the RF framework for motif scaffolding, in which functional sites or short structural motifs can be embedded in novel backbones in a single step. Subsequent extensions such as RoseTTAFoldNA (RFNA) and RoseTTAFold All-Atom (RFAA) broaden the scope through the incorporation of nucleic acids for protein–DNA/RNA modeling and expanded descriptions of ligands, metals, and covalent modifications using graph-based atomic encodings. (44,45) The combination of sequence-based representation of proteins with graph-based atomic representation of ligands enabled the modeling of metalloenzymes, glycosylated antibodies, and small-molecule complexes within a unified framework. Cross-model consistency between AlphaFold2 and RoseTTAFold correlates with experimental foldability and solubility, providing a secondary in silico filter for the predicted de novo structure. (46) The generative extension RFDiffusion reframed protein design as a denoising process in joint sequence–structure space, enabling conditioning on motifs, symmetry, and functional constraints (Figure 3). This transition from accurate monomer prediction to all-atom biomolecular assemblies via controlled sampling represents a conceptual shift toward probabilistic generative design.
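One simple way to quantify the cross-model consistency mentioned above is a distance-matrix RMSD (dRMSD) between the Cα coordinates of two predictions for the same sequence. This metric is an assumption of the sketch, chosen because it needs no superposition step; it is not stated to be the measure used in the cited work, and unlike a superposition-based RMSD it cannot distinguish mirror images.

```python
import math


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equal-length lists of (x, y, z) CA coordinates.

    Compares all intramolecular pairwise distances, so the result is
    invariant to rigid-body rotation and translation of either model.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two coordinate lists of equal length >= 2")
    sq_sum, n_pairs = 0.0, 0
    for i in range(len(coords_a)):
        for j in range(i + 1, len(coords_a)):
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            sq_sum += (d_a - d_b) ** 2
            n_pairs += 1
    return math.sqrt(sq_sum / n_pairs)
```

A low dRMSD between two independently trained predictors suggests that both converge on the same fold, which is the agreement signal used as a secondary filter.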

Figure 3. Overview of the current protein design dogma. Traditional protein science is often described as a one-way flow in which (A) amino acid sequences give rise to (B) folded structures, which in turn underpin (C) biological function. Modern de novo design inverts this logic: researchers now begin with the desired function and work backward to identify compatible folds and sequences. Current computational frameworks align with three broad strategies: (1) two-stage design, in which structural generators such as ROSETTA, RoseTTAFold, or PyRosetta first propose candidate protein backbones that are then optimized by sequence design engines; (2) sequence-driven methods, exemplified by AlphaFold2 and ColabFold, which predict protein structures directly from amino acid sequence information and are widely used to validate or filter design candidates; and (3) coguided approaches, including multitrack RoseTTAFold variants (RF, RFNA, RFAA) and diffusion-based models (RFDiffusion), which integrate amino acid sequence and protein structure generation simultaneously. These complementary strategies extend the protein design beyond natural sequence–structure relationships, enabling a function-first exploration of protein space. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

Shared Constraints of Foundational Frameworks

Limitations of de novo protein design, such as deterministic results, are evident across the foundational frameworks. These methods typically generate a single or narrow set of static structural “snapshots” rather than dynamic ensembles, emphasizing backbone geometry over solvent dynamics, entropy, and conformational transitions. Though useful for validation, static predictions limit exploration of alternative conformational changes or functional states, which are often critical in systems where flexibility and allostery are central to successful design. In addition, generalization to ncAAs, metals, and post-translational modifications (PTMs) remains uneven, and computational accuracy does not guarantee experimental foldability or function. Even with advances such as RFAA, predictive reliability for small molecules, metal centers, or covalent modifications is not yet comparable to that of physics-based docking or QM/MM refinement.
There is a trade-off between computational resources and predictive depth. Highly accurate methods often require substantial time and GPU power, whereas faster, more accessible approaches, such as ColabFold, are less precise and less flexible. These overlapping limitations indicate that current tools excel at answering whether a sequence will fold but are less well-equipped to determine which sequence should be engineered to achieve a desired function or binding outcome. This distinction drove the development of both generative methods that operate directly in sequence space and diffusion-based approaches that treat folding, docking, and binder generation as probabilistic sampling problems, which together represent the next step in protein design.

Generative Backbone and Sequence Design

Energy minimization has dominated protein design for decades. Given a fixed backbone, the residue and rotamer spaces are explored in search of variants that lower the overall energy score. High computational costs and an expansive conformational landscape limit how efficiently the available search space can be sampled. To address this limitation, newer ML models use probabilistic methods, graph-based neural networks, and other machine-learning architectures to generate proteins with structural and functional diversity. This approach relies on a strong training data set rather than on heuristics and physics, building designs from structure and sequence information alone. By combining symmetry and three-dimensional constraints, novel methods harness sequence–structure relationships to create stable scaffolds that support catalytic motifs, bind small molecules, and form higher-order assemblies with increased precision (Figure 4).

Figure 4. Overview of the A.I.-driven protein design toolbox. According to their functional roles in A.I.-driven generative protein design, the protein design toolbox can be divided into five categories: (A) structure prediction frameworks (e.g., AlphaFold2, RoseTTAFold) that validate fold accuracy; (B) de novo backbone generators (RFDiffusion, RFDiffusionAA) that embed motifs or active sites into novel folds; (C) fixed-backbone sequence designers (LigandMPNN) that optimize sequences against a defined structural context; (D) sequence generation models (ProteinMPNN), which not only perform fixed-backbone optimization but also function as a generative sampler of amino acid sequences; and (E) sequence–structure cogeneration and refinement frameworks (PLACER), which jointly optimize side chains, ligands, and active-site geometry. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

Diffusion: Backbone Construction and Modeling as a Generative Process

RoseTTAFold Diffusion (RFDiffusion) is a generative deep-learning model that uses denoising diffusion probabilistic models (DDPMs), or “diffusion models”, for protein backbone generation. (19) DDPMs reframe backbone design from an optimization problem into a generative sampling problem. Operating in a residue-centered frame representation, RFDiffusion refines noisy residue “frames” into geometrically consistent protein structures through learned denoising steps. By treating each residue as a separate rigid frame, local backbone geometry is preserved while the protein retains global flexibility, resulting in realistic protein backbones and diverse protein structures under minimal structural constraints. (50) This approach maintains a balance between global flexibility and backbone geometry, which is important for generating symmetric assemblies, multimeric complexes, and scaffolds built on functional motifs. RFDiffusion All-Atom (RFDiffusionAA) builds upon the original framework by adding explicit all-atom parameters (Table 2). This allows binding pockets to be reshaped around small molecules and enables ligand-aware conditioning for uses such as designing catalytic centers with an expansive molecular library. (45,51)
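The forward (noising) half of a DDPM can be written in a few lines, which helps clarify what the learned denoising steps must undo. The sketch below is a generic scalar-coordinate toy and not the RFDiffusion implementation: RFDiffusion diffuses residue frames (translations and rotations) and uses a trained network to predict the denoised structure. The linear noise schedule and its endpoint values are illustrative assumptions.

```python
import math
import random


def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def cumulative_alpha(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale, sigma = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [scale * x + sigma * e for x, e in zip(x0, eps)], eps


def recover_x0(xt, eps, alpha_bar_t):
    """Invert the forward step given the noise; a trained model would predict eps."""
    scale, sigma = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - sigma * e) / scale for x, e in zip(xt, eps)]
```

In a generative model the true noise is unknown at sampling time, so `recover_x0` is applied with a network's noise estimate over many small steps, starting from pure noise.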
Table 2. Summary of Generative ML Frameworks for Protein Design

  • RFDiffusion (2023)
    Primary role: de novo protein backbone and functional motif design
    Integration: generates protein scaffolds for motif/catalyst embedding
    Reported accuracy/benchmarks: (1) diverse novel folds (up to 600 residues); (2) RMSD ∼ 2 Å for motif placement; 42–54% success for TIM barrels; (3) 19% hit rate for binders; (4) 23/25 success rate in motif scaffolding; (5) improved interface and side-chain quality in RFDiffusionAA (19)
    Key architecture: (1) 3D frame-based denoising diffusion model using RoseTTAFold; (2) supports symmetric design, self-conditioning, and partial motif constraints
    Limitations: (1) high GPU cost; (2) sampling variance; (3) limited explicit ligand handling (addressed in RFDiffusionAA); (4) sensitive to motif constraints; (5) challenges with polar interfaces; (6) stochastic outputs may vary
  • RFDiffusionAA (2024)
    Primary role: active-site-aware protein backbone generation and binder design
    Integration: used for enzyme pocket design, synthetase-ligand scaffolding, and interface tuning
    Reported accuracy/benchmarks: (1) >20% increase in ΔΔG success; (2) supports joint active-site and motif design; (3) improved hallucination accuracy (19,45)
    Key architecture: (1) RFDiffusion fine-tuned on active-site data; (2) supports per-residue conditioning, side-chain aware diffusion, and flexible residue input
    Limitations: (1) requires detailed active-site input; (2) no end-to-end sequence optimization (must be coupled with ProteinMPNN and LigandMPNN)
  • ProteinMPNN (2022)
    Primary role: amino acid sequence design for fixed backbones
    Integration: follows RF/RFDiffusion scaffold generation
    Reported accuracy/benchmarks: (1) ∼50–55% native sequence recovery overall; (2) ∼90–95% for buried residues; 200× faster than ROSETTA (57,58)
    Key architecture: message passing graph neural network on protein backbone context
    Limitations: (1) fixed backbone; (2) no ligand/cofactor support; (3) no noncanonical AA modeling; (4) lacks backbone flexibility
  • LigandMPNN (2025)
    Primary role: sequence optimization in the presence of ligands
    Integration: pocket-specific redesign postdocking or PLACER-generated poses
    Reported accuracy/benchmarks: (1) 63.3% sequence recovery (small molecules), 50.5% nucleotides, 77.5% metals; (2) Chi1 recovery ∼86% (59)
    Key architecture: (1) dual-graph neural network linking ligand atoms and protein residues; (2) ligand-aware autoregressive design and side-chain packing
    Limitations: (1) requires accurate initial ligand pose and placement; (2) sparse data for rare chemotypes
  • PLACER (2025)
    Primary role: active-site evaluation and pose refinement
    Integration: filters/optimizes RFDiffusion and LigandMPNN output
    Reported accuracy/benchmarks: (1) RMSD ≈ 1.1 Å for ligand active-site alignment; (2) improves functional design success by 3–5× in catalytic benchmarks (63,64)
    Key architecture: SE(3)-equivariant graph transformer and denoising-based side-chain and ligand optimization
    Limitations: (1) requires known ligand pose or transition-state geometry; (2) limited support for de novo ligand generation; (3) sensitive to backbone geometry errors
Protein backbones are constructed around functional residues given user-input features, such as symmetry, catalytic information, and 3D constraints. Iterative backbone creation allows for the exploration of backbone diversity while retaining global flexibility. In recent benchmarks, RFDiffusion has shown successful monomer generation, cyclic and polyhedral assemblies, and motif scaffolding without requiring symmetry templates. Experimental validation confirmed correct folding and oligomerization. (19) However, despite these improvements, the performance varies across systems. Modeling of polar interfaces, noncanonical residues and RNA-associated systems remains challenging. Ongoing developments aim to address these constraints to enable enhanced control over the active-site design of proteins in future releases. (51,52)
Diffusion methods are a recently developed and successful tool for improving molecular docking and design (Table 2). DiffDock integrates diffusion and molecular docking by denoising ligand translations, rotations, and torsions and then ranking the resulting poses to obtain a final prediction. (53) DiffDock showed substantial improvement over previous methods for both traditional docking and docking with de novo structures. Similarly, sequence-based diffusion models such as EvoDiff use evolutionary sequence data to design proteins relative to natural sequence and functional space. (54) An advantage of diffusion methods is user control: models can be conditioned on specific inputs and outputs to generate a variety of biologically relevant proteins. Building upon these sequence-only methods, Protpardelle is a diffusion model that codesigns sequence and structure by focusing on the side-chain positions at multiple states before collapsing into a single state. (55) By denoising side chains and backbones together, all-atom frameworks such as Protpardelle can be conditioned strictly on side-chain functional groups. Use of both diffusion and evolutionary-scale data has led to substantial improvements over previous frameworks, yielding functionally diverse and biologically relevant natural sequences.
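The denoising idea shared by these models can be illustrated with a generic DDPM-style noising step applied to 3D coordinates. The sketch below is a toy under stated assumptions and is not the RFDiffusion, DiffDock, EvoDiff, or Protpardelle implementation: the linear noise schedule, the function names, and the "oracle" denoiser that is handed the true noise are hypothetical choices made for illustration. In a trained model, a neural network would predict the noise from the corrupted coordinates instead.

```python
import math
import random

def alpha_bar(t, n_steps, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal fraction after t steps of a linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (n_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(coords, t, n_steps, rng):
    """Forward process: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, with eps ~ N(0, 1)."""
    a = alpha_bar(t, n_steps)
    eps = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in coords]
    noisy = [[math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(atom, e_atom)]
             for atom, e_atom in zip(coords, eps)]
    return noisy, eps

def estimate_x0(noisy, eps_pred, t, n_steps):
    """Invert the forward step given a noise estimate (the network's job in practice)."""
    a = alpha_bar(t, n_steps)
    return [[(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(atom, e_atom)]
            for atom, e_atom in zip(noisy, eps_pred)]

rng = random.Random(0)
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # toy C-alpha trace
noisy, true_eps = add_noise(backbone, t=500, n_steps=1000, rng=rng)
recovered = estimate_x0(noisy, true_eps, t=500, n_steps=1000)  # oracle: uses the true noise
```

Conditioning, as described above, would enter through the noise predictor, which could take a motif, a ligand, or a symmetry specification as an additional input.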

ProteinMPNN: Sequence Design as Geometric Prediction

Once a backbone is defined, sequence assignment becomes the constraint. ProteinMPNN uses a “message passing” neural network that replaces a combinatorial search with conditional probabilities, reframing residue assignment as a geometric prediction problem. Trained on approximately 20,000 high-resolution protein structures, ProteinMPNN revealed that local protein backbone context is a major determinant of amino acid residue identity. In MPNN frameworks, each residue is treated as a node and exchanges information with neighboring residues. Much like friends in a social network, residues update each other about their surroundings. Sequences are predicted from N-terminus to C-terminus and conditioned on geometric features (such as α-carbon distances and side-chain orientations), resulting in context-aware predictions for amino acid residue identity. (56) Using this graph-based architecture to customize the decoding order and identify constraints across chains, backbone contexts are inferred and computational costs are reduced, resulting in native sequence recovery increases of up to nearly 10% relative to ROSETTA fixed-backbone energy minimization methods.
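The node-and-neighbor picture described above can be sketched in a few lines. This is a minimal illustration of message passing on a k-nearest-neighbor residue graph, assuming random (untrained) node features and a simple distance-weighted average as the update rule; the function names are hypothetical. ProteinMPNN itself uses learned encoder and decoder layers with edge features and autoregressive decoding, none of which is reproduced here.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def knn_graph(coords, k):
    """For each residue, list the indices of its k nearest neighbours by C-alpha distance."""
    neighbours = []
    for i, ci in enumerate(coords):
        dists = sorted((math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def message_passing_step(hidden, coords, neighbours):
    """One round: mix each node state with a distance-weighted mean of its neighbours."""
    updated = []
    for i, h_i in enumerate(hidden):
        agg = [0.0] * len(h_i)
        total = 0.0
        for j in neighbours[i]:
            w = math.exp(-math.dist(coords[i], coords[j]) / 10.0)  # closer => larger weight
            total += w
            for d, v in enumerate(hidden[j]):
                agg[d] += w * v
        updated.append([0.5 * own + 0.5 * (msg / total) for own, msg in zip(h_i, agg)])
    return updated

def softmax(logits):
    """Convert a vector of scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

rng = random.Random(0)
coords = [(3.8 * i, rng.uniform(-1, 1), rng.uniform(-1, 1)) for i in range(8)]
hidden = [[rng.gauss(0, 1) for _ in AMINO_ACIDS] for _ in coords]  # untrained features
neighbours = knn_graph(coords, k=3)
for _ in range(3):  # three rounds let information travel beyond direct neighbours
    hidden = message_passing_step(hidden, coords, neighbours)
probabilities = [softmax(h) for h in hidden]  # one 20-way distribution per residue
```

Because the features are random, the resulting distributions carry no biological meaning; the point is only that each residue's prediction depends on its geometric neighborhood.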
ProteinMPNN sequences have been shown to frequently refold successfully under AlphaFold validation, particularly in the absence of MSAs. (57) It accomplishes this by using noise augmentation during training, which increases tolerance for imperfect backbones. This allows for better accommodation of symmetry-aware and multichain design. (58) The primary limitation of this framework is the assumption of structural rigidity. Backbone flexibility, induced fit, and explicit ligand interactions are outside the model’s core assumptions. Despite these pitfalls, ProteinMPNN has become an effective second-stage filter in modern design pipelines.

LigandMPNN: Incorporating Chemical Context into Sequence Design

LigandMPNN extends sequence design to backbones that contain small molecules, nucleotides, and metals, which are critical for enzyme and binding site engineering. (59) The key innovation is a dual-graph architecture: one graph encodes protein residues and the other encodes ligand atoms, allowing for information transfer between the ligand and protein. With this, residue identities and side-chain orientations are refined based on the ligand chemistry and the overall local environment. (59) In essence, this process is similar to planning a dinner table for a multicourse meal, where guests (protein residues) choose their seats based not only on nearby friends (local protein backbone context) but also on the range of courses offered (the ligand and its atoms) and the order in which the dishes on the menu are served. This approach improves sequence recovery and packing accuracy at binding interfaces relative to backbone-only approaches.
Experimental studies demonstrate increased accuracy and broad utility using LigandMPNN. Sequence recovery at ligand-contact positions reached 63.3% for small molecules, 50.5% for nucleotides, and 77.5% for metal-binding residues, significantly outperforming ProteinMPNN and ROSETTA (∼34–50%). (59) Successful redesign of weak or nonfunctional ROSETTA-derived binders improved binding affinity up to 100-fold, with over 100 confirmed complexes, including small molecules and nucleotide binders, metal coordination sites, and ligand-dependent protein switches. (60,61) When combined with generative backbone tools, LigandMPNN serves as a functional specificity filter in earlier stages of the design pipeline. The addition of ligand-aware conditioning without sacrificing speed enables the precise design of active sites and binding pockets tailored to specific chemical environments. Still, the effectiveness depends on accurate ligand placement and sufficient training data for rare chemotypes.
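Sequence recovery, the metric quoted above, is the fraction of positions at which a designed sequence reproduces the native residue, optionally restricted to a subset such as ligand-contact positions. The following sketch shows the calculation; the function name, the toy sequences, and the choice of contact positions are illustrative assumptions rather than data from the cited studies.

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of compared positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    indices = list(range(len(native))) if positions is None else list(positions)
    if not indices:
        raise ValueError("no positions to compare")
    matches = sum(native[i] == designed[i] for i in indices)
    return matches / len(indices)

native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
contact_positions = [2, 5, 7]  # hypothetical ligand-contact residues (0-based)

overall = sequence_recovery(native, designed)
at_contacts = sequence_recovery(native, designed, contact_positions)
print(f"overall recovery: {overall:.2f}, contact recovery: {at_contacts:.2f}")
```

Reporting the contact-restricted value separately matters because overall recovery can be dominated by buried or surface positions that are far from the ligand.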

PLACER: Active-Site Geometry as a Filtering Step

PLACER (protein–ligand atomistic conformational ensemble resolver) focuses on the complementary challenge of catalytic preorganization and judging the precision of structures containing ligands and specialized residues. Unlike traditional docking tools, which treat proteins and ligands separately, PLACER represents both protein and ligand atoms as a unified molecular graph. By simultaneously docking and scoring structures instead of treating them as separate operations, PLACER learns spatial relationships directly from atomic coordinates. (62) This approach refines ligand coordinates and surrounding side chains within a geometry-aware neural network. The output includes both a predicted structure and confidence metric, predicted Root-Mean-Square Deviation (pRMSD), which ranks structures based on a geometric score that correlates with structural accuracy. (63)
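To make the ranking step concrete, the sketch below computes an RMSD between two matched coordinate sets and sorts candidate designs by a predicted RMSD. It assumes the structures are already superimposed and that a pRMSD value is available for each design; the design names and pRMSD values are invented for illustration, and PLACER's own scoring is not reproduced here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched atoms (no superposition performed)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def rank_by_prmsd(prmsd_by_design):
    """Order design names from lowest (most confident) to highest predicted RMSD."""
    return sorted(prmsd_by_design, key=prmsd_by_design.get)

reference_ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted_ligand = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]  # shifted by 1 Angstrom along z
print(f"ligand RMSD: {rmsd(reference_ligand, predicted_ligand):.2f} A")

hypothetical_prmsd = {"design_A": 2.4, "design_B": 0.9, "design_C": 1.6}
print(rank_by_prmsd(hypothetical_prmsd))
```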
In enzyme design benchmarks, structures filtered with PLACER achieve 3–5× higher experimental success rates than those selected with other docking frameworks. (62−65) PLACER is distinguished from other platforms by its ability to predict ligand-binding structures without predocking or binding site mutations for catalytic compatibility, as well as by its compatibility with other ML packages such as LigandMPNN. Overall, PLACER allows downstream ligand-specific sequence optimization before experimental testing (Figure 5). PLACER performs best when the starting backbone has the correct geometry; it depends on known ligand structures, has reduced performance on large and flexible cofactors, and is sensitive to receptor backbone displacement.

Figure 5. Timeline of major developments in protein structure prediction (black) and design methodologies (red). Following early innovations such as ROSETTA (1998) and PyRosetta (2010), the field saw nearly two decades of incremental progress before the emergence of transformative A.I.-based models such as AlphaFold2 (2020). Since then, breakthroughs in generative frameworks, including ProteinMPNN, RFDiffusion, and LigandMPNN, have rapidly expanded, marking a shift toward integrated prediction-design pipelines.

Protein Large Language Models and Sequence-Space Design

Protein large language models (pLMs) are trained on an expansive protein sequence data set and learn statistical patterns that reflect physical constraints on folding, stability, and function. In contrast to structure-based approaches, pLMs infer relationships directly from sequence data. This enables structure prediction, functional analysis, and de novo sequence generation without the need for predefined structures.
ESM-2 is a protein language model that is trained by attempting to identify randomly masked amino acids in a protein sequence. By using ESM-2, a multiple sequence alignment (MSA) is not needed to complete structure predictions and a simpler neural architecture can be used. ESMFold is the single-sequence structure predictor that uses the ESM-2 language model. (66) In comparison to other structural predictors, because MSAs are not required, computational costs are significantly decreased and prediction speed improves substantially. ESM-2 showed comparable accuracy to AlphaFold and RoseTTAFold, revealing that unsupervised learning can use evolutionarily related sequences to predict the protein structure at high resolution. Operating within the latent space of ESM-2, ProtFlow is a flow-matching-based framework for generating de novo peptide sequences quickly with comprehensive semantic distribution learning. (67) ProtFlow was fine-tuned on antimicrobial peptides and successfully generated functional molecules that target underrepresented bacterial species.
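The masked-residue training objective can be illustrated by the data-corruption step alone. In the sketch below, a fraction of positions is replaced by a mask token and the hidden residues are kept as prediction targets. The 15% masking fraction, the token string, and the function name are assumptions for illustration; the model that predicts the masked residues is not shown.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder token string (assumed)

def mask_sequence(sequence, mask_fraction=0.15, rng=None):
    """Hide a random subset of residues; return the corrupted tokens and the targets."""
    rng = rng or random.Random()
    n_mask = max(1, round(len(sequence) * mask_fraction))
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    tokens = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # the residue the model must recover
        tokens[pos] = MASK_TOKEN
    return tokens, targets

sequence = "MKTAYIAKQRQISFVKSHFS"
tokens, targets = mask_sequence(sequence, rng=random.Random(0))
print(" ".join(tokens))
print(targets)
```

During training, the loss would be computed only at the masked positions, so the model must infer each hidden residue from its sequence context.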
ProtTrans was a foundational project that established how long and diverse pretraining significantly enhances the performance of pLMs. The project found that small supervised models built on pLM embeddings performed similarly to methods that use MSAs. (68) ProGen is a language model that specializes in evolutionary sequence diversity and tunability through metrics that relate to primary sequence similarity, secondary structure accuracy, and conformational energy. (69) This is done by conditioning on keyword and taxonomic tags that relate sequences to cellular components, biological processes, and molecular function. Its successor, ProGen2, scaled to a larger number of parameters and was trained on sequences sourced from genomic, metagenomic, and immune repertoire databases. (70) A feature of ProGen2 is its ability to generate new sequences and predict protein fitness without manual fine-tuning. Motivated by ProGen and similar protein autoregressive language models, ProtGPT2 was developed to generate proteins that are both stable and evolutionarily different from natural proteins. (71)
Language models are also being created to analyze sequences at the genome level. Evo is a multiscale model that can accomplish zero-shot prediction across biomolecule classes with comparable performance to domain-specific language models. (72) Evo can codesign protein–DNA and protein–RNA systems and successfully generate functional CRISPR-Cas complexes and transposable systems. Motivated by the success of ProtTrans, Nucleotide Transformer applies masked language modeling to nucleotide sequences and can accurately predict their context without supervision. (73) Genome modeling has also expanded to other deep-learning forms, as shown with AlphaGenome. (74) Trained on protein-coding genes, AlphaGenome combines multimodal prediction, long-sequence context, and base-pair resolution.

Generative A.I.

Generative A.I. has allowed new structural prediction models to be more informative, accurate, and customizable. Boltz-2 is a program that achieves similar structural prediction accuracy to AlphaFold while simultaneously excelling in binding affinity prediction. (75) Traditionally, free-energy perturbation (FEP) is the benchmark for predicting binding affinity; however, its high accuracy comes with increased computational cost. Boltz-2 trains on a diverse set of dynamic models and achieves binding affinity predictions with comparable accuracy to that of FEP while being over 1000 times faster. While structural predictions remain comparable to other models, Boltz-2’s key contribution is its improved ability to predict binding affinity. Other models, such as Chai-1, are reported to have higher accuracy for predicting protein multimer and protein–ligand structures than existing models. (76) To achieve this, Chai-1 was trained on both protein language model embeddings and multiple sequence alignments. Both Chai-1 and Boltz-2 are customizable models, where users can add constraints from experimentation to increase the prediction accuracy. Chroma is a generative model that extends user programmability in structural prediction even further. (77) By reversal of a correlated noise process, the generated structures follow the same distance-scaling patterns seen in real proteins. Users can apply external constraints, such as complex symmetry, predefined substructures, fixed-backbone arrangements, or even fully specified volumetric shapes, to guide the design process.
Generative modeling programs are also being applied to binder design, offering versatility and increased programmability. De novo antibody design is a unique challenge because complementarity-determining regions (CDRs) must bind target molecules with extreme precision. Germinal uses an antibody-specific language model and has demonstrated strong performance with experimental testing. (78) BindCraft is compatible with many classes of protein targets, as shown through its success in generating binders for allergens, multidomain nucleases, and cell-surface receptors. (79) Other all-atom models, such as BoltzGen and ODesign, allow user programmability when designing binders using features like covalent bonds, binding sites, and structural constraints, extending the functionality to binder design for nucleic acid targets. (80,81)

Workflows for Model Training and Protein Design

A rapidly growing challenge when applying machine learning to biomolecular modeling is the time and resources required to train large-scale models on complex biomolecular data. Computational workflows have emerged to make the protein design process automated, efficient, and more easily accessible to novice users. (82) BioNeMo is an open-source software framework that improves the training throughput of A.I. models on GPUs for biomolecular design and drug discovery and has recently been used in algorithmic workflows for both blind docking and API-driven structural prediction. (83) Frameworks such as these encourage individual user contributions to deepen and widen the scope of the current modeling tasks.
ProteinDJ is a specialized workflow for designing proteins on high-performance computing (HPC) systems. (84) ProteinDJ has demonstrated that it can scale across eight GPUs with 86.5% efficiency, substantially reducing the computational time. This pipeline includes tasks such as fold generation, sequence design, and design validation along with tunable features. BinderFlow specializes in de novo binder design and has a multifeature dashboard that provides real-time updates in a web interface. (85) Ovo has a data-driven quality control module, support for community plugins, and predicted-structure validation that uses the expansive ColabDesign library. (86) Each of these workflows supports different sets of programs and can provide automated options for protein design studies.
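The scaling figure quoted for ProteinDJ can be translated into an approximate speedup. The short calculation below assumes the standard definition of parallel efficiency (speedup divided by the number of devices), which may differ from the exact metric used in the cited work; the 24-hour single-GPU runtime is a made-up example.

```python
def implied_speedup(efficiency, n_devices):
    """Speedup implied by a parallel efficiency, assuming efficiency = speedup / n_devices."""
    return efficiency * n_devices

def wall_time_hours(single_device_hours, efficiency, n_devices):
    """Estimated wall-clock time when the same work is spread over n_devices."""
    return single_device_hours / implied_speedup(efficiency, n_devices)

speedup = implied_speedup(0.865, 8)
hours = wall_time_hours(24.0, 0.865, 8)  # hypothetical 24 h single-GPU campaign
print(f"implied speedup: {speedup:.2f}x, estimated wall time: {hours:.2f} h")
```

Under this assumption, eight GPUs at 86.5% efficiency behave like roughly seven ideal GPUs, so a day-long campaign would finish in about three and a half hours.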

In Silico Evaluation of Designed Proteins

In silico evaluations filter and rank designed proteins before wet-lab testing to reduce the experimental cost. Predictive tools estimate folding stability, binding affinity, catalytic potential, and solubility or aggregation risk to inform further optimization. Generative methods for sequence optimization are still imperfect since active-site models misrepresent chemical descriptors, folded proteins often fail to generate the intended geometry, and structural contexts can still hamper function through conformational instability or steric clashes. (87) Understanding and diagnosing these failures in a cost-effective way is critical to rapidly improving design reliability.

Static Scoring and Foldability Screening

Protein function is dependent on structural stability. Small destabilizations can diminish or eliminate catalytic function due to aggregation, misfolding, or proteolytic degradation. (88) Misfolded structures expose hydrophobic areas that lead to aggregation or prevent access to active-site residues essential for enzymatic function, as seen in loss-of-function mutations and diseases, such as Alzheimer’s disease and cystic fibrosis. (89,90) Energy-based scoring functions serve as the first structural stability filter, determining whether designed proteins adopt stable, physically plausible conformations. (90,91) ROSETTA’s ref2015 energy function combines van der Waals interactions, electrostatics, solvation, hydrogen bonding, and geometric preferences into a pseudoenergy score through a hybrid physics and knowledge-based framework. (91,92) The performance of static scoring functions like ref2015, however, is context-dependent. Reweighting approaches that blend experimental data with ML models, such as SRS2020, suggest that tailoring parameters to specific interfaces or mutations can improve ΔΔG predictions, outperforming the unmodified score functions.
Structure refinement tools, such as ROSETTA’s FastRelax protocol, are applied prior to scoring to relieve steric strain and optimize hydrogen-bond networks by iteratively repacking side chains and minimizing backbone energy. (92,93) Root-mean-square deviation (RMSD) and quantitative energy comparison analyses between wild-type and mutant structures determine whether the designed proteins are structurally correct and energetically more favorable than other conformations. AlphaFold2-derived confidence metrics, such as the predicted Local Distance Difference Test (pLDDT) and the Predicted Aligned Error (PAE) matrix, act as proxies for protein flexibility and structural dynamics, offering a parallel “foldability screen” independent of static energy-function assumptions. (94) Average pLDDT scores above 80 correlate strongly with successful expression and folding, while low per-residue pLDDT often reflects flexibility rather than prediction error. The PAE matrix builds on this by quantifying interresidue positional confidence, pinpointing regions that static energy scores miss. (95,96) Though pLDDT and PAE provide static structural checks, neither predicts catalytic potential or active-site geometry.
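A pLDDT-based foldability screen of the kind described above can be sketched as follows. The mean cutoff of 80 is taken from the text; the per-residue cutoff of 50 and the minimum segment length used to flag likely flexible regions are assumptions, as are the function names and the example scores.

```python
def mean_plddt(plddt):
    """Average per-residue pLDDT over the whole chain."""
    return sum(plddt) / len(plddt)

def passes_foldability_screen(plddt, mean_cutoff=80.0):
    """Keep designs whose average pLDDT exceeds the cutoff."""
    return mean_plddt(plddt) > mean_cutoff

def low_confidence_segments(plddt, cutoff=50.0, min_length=3):
    """Return (start, end) index pairs of contiguous residues below the cutoff."""
    segments, start = [], None
    for i, value in enumerate(plddt):
        if value < cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(plddt) - start >= min_length:
        segments.append((start, len(plddt) - 1))
    return segments

scores = [95, 96, 94, 93, 45, 40, 42, 92, 95, 96, 97, 95]  # hypothetical per-residue pLDDT
print(passes_foldability_screen(scores), low_confidence_segments(scores))
```

In this example the design passes the global screen while a three-residue segment is flagged, which would be consistent with a flexible loop rather than a failed prediction.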
Static energy functions capture enthalpic contributions but underrepresent entropy, long-time scale flexibility, and solvent dynamics. Structures that depend on conformational rearrangement or tight geometric constraints cannot be fully captured by evaluations that provide “snapshots”, limiting the generation of more dynamic designs. AlphaFold3 scores protein–protein interactions during structural refinement using an interface-predicted template-modeling score (ipTM) to obtain confidence scores. (97)

Binding Energetics and Interface Quality

Binding affinity determines how effectively an enzyme can recognize and bind to a substrate, cofactor, or binding target. Poorly optimized interfaces result in weak or transient binding, off-target interactions, and a loss of function. The primary metric used to evaluate binding affinity is ΔΔG, which quantifies the free-energy change caused by a point mutation at a protein interface or active site and reflects the relative binding strength of individual residues (Figure 6). This approach has been used in protein engineering to help stabilize Fe(II)/αKG enzymes with ProteinMPNN to enhance thermostability and evolvability during directed evolution (110) and within alanine-scanning studies that mapped cooperative receptor-binding loops in Cry4Aa toxins, which are used in mosquito-targeting pesticides. (111) Further, a recent study of antimeasles virus antibodies used in silico alanine scanning and molecular dynamics (MD) to identify hotspot residue pairs within complementarity-determining regions (CDRs) that jointly influence both binding affinity and thermal stability, revealing an affinity-stability trade-off governed by relative hydropathy at key interaction sites. (112) Together, these methods link amino acid sequence changes to structural and functional outcomes, offering valuable insights into protein engineering and therapeutic design.
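For reference, the quantities discussed above can be written out explicitly. The expressions below use one common sign convention, in which a positive ΔΔG indicates that the mutation weakens binding; other sources define the difference in the opposite direction, so the convention should be checked for each tool. The symbol names are chosen here for illustration.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{partner}} \right) \\
\Delta\Delta G &= \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}} \\
\Delta G_{\mathrm{bind}}^{\circ} &= RT \ln K_{d}
\quad\Rightarrow\quad
\Delta\Delta G = RT \ln \frac{K_{d}^{\mathrm{mut}}}{K_{d}^{\mathrm{wt}}}
\end{align}
```

Here K_d is expressed relative to a 1 M standard state. At 298 K, RT ln 10 is approximately 1.36 kcal/mol, so a 10-fold loss in affinity corresponds to a ΔΔG of roughly +1.4 kcal/mol under this convention.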

Figure 6. Computational strategies for evaluating amino acid sequence perturbations. (A) Structural stability analysis introduces mutations into a sequence and applies ab initio folding to predict conformational shifts, highlighting favorable and unfavorable perturbations. (B) Binding affinity analysis docks protein constituents, incorporates mutations, and estimates changes in binding free energy (ΔΔG) to evaluate the interaction stability. (C) Interface hotspot probing systematically mutates residues at binding interfaces to pinpoint the positions that are most critical for binding energetics.

ROSETTA ΔΔG calculations correlate reasonably with experimental data sets across large mutation libraries, though accuracy drops when backbone models are poorly defined. (89,98) Interface evaluation, however, extends beyond global ΔΔG. Solvent-accessible surface area (SASA), solvation energy, hydrogen-bond networks, and electrostatic forces all contribute to binding dynamics and are accessible through tools such as ROSETTA InterfaceAnalyzer. (99−101) Energetic hotspots can be identified via alanine scanning, which mutates interface residues to alanine, isolating side-chain contributions without disrupting the backbone geometry. (102,103) Alanine works as a neutral substitution due to its small size and nonpolar nature. Sites that show large energetic drops after alanine mutation stand out as binding hotspots that are important for stability or specificity. Proline and noncanonical amino acids (photo cross-linkers, metal coordinators, electrophilic warheads, etc.) have also been used as chemical probes at binding interfaces, capturing covalent interactions and assessing backbone rigidity, though these fall outside standard scoring workflows and need careful parametrization. (104−107) In efforts to standardize scoring across docking, ranking and protein screens, data sets like PDBbind and the CASF benchmark provide a basis for comparing and calibrating score functions. (108,109) Despite these efforts, static interface scoring continues to struggle with long-range electrostatics, solvation effects, and dynamic rearrangements, particularly in ligand-rich and flexible systems where backbone flexibility is directly linked to binding.
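The alanine-scanning logic described above can be sketched as a loop over interface positions. In this illustration the scoring function is a made-up lookup table of per-residue contributions, standing in for a real energy function such as a ROSETTA interface score; the 1.0 kcal/mol hotspot cutoff, the function names, and the example sequence are likewise assumptions.

```python
# Hypothetical per-residue interface contributions (kcal/mol, lower = stronger binding).
TOY_CONTRIBUTION = {"W": -3.0, "Y": -2.0, "R": -1.5, "A": -0.2}
DEFAULT_CONTRIBUTION = -0.5

def toy_interface_score(sequence):
    """Stand-in for a real interface energy function."""
    return sum(TOY_CONTRIBUTION.get(aa, DEFAULT_CONTRIBUTION) for aa in sequence)

def alanine_scan(sequence, interface_positions, score_fn, hotspot_cutoff=1.0):
    """Mutate each interface residue to Ala and report the change in score (ddG)."""
    wild_type = score_fn(sequence)
    ddg = {}
    for pos in interface_positions:
        if sequence[pos] in "AG":  # Ala is unchanged; Gly has no side chain to truncate
            continue
        mutant = sequence[:pos] + "A" + sequence[pos + 1:]
        ddg[pos] = score_fn(mutant) - wild_type
    hotspots = sorted(pos for pos, value in ddg.items() if value >= hotspot_cutoff)
    return ddg, hotspots

ddg, hotspots = alanine_scan("GSWYRAKL", [2, 3, 4, 5, 6], toy_interface_score)
print({pos: round(value, 2) for pos, value in ddg.items()}, hotspots)
```

Swapping `toy_interface_score` for a physics-based or learned scoring function would leave the scanning loop unchanged, which is why alanine scanning is easy to layer on top of existing score functions.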

Dynamics, Sampling, and the Accuracy-Efficiency Trade-Off

Static “snapshots” produced by protein design tools can mispredict catalytic potential because they overlook dynamic movements, such as loop openings and domain shifts, that are crucial for stabilizing transition states. Molecular dynamics (MD) simulations can model how proteins shift and fluctuate over time by using Newton’s laws to predict atomic motion (Figure 7). Tools like GROMACS help researchers analyze how loop domains and hydrogen-bond networks change in single proteins and large assemblies over nanosecond to microsecond time scales. (113,114) MD has been used in tRNA-synthetase engineering, revealing how anticodon arms, acceptor stems, and binding loops respond to ncAA mutations and codon recognition. (115,116) To catch rare conformational states, metadynamics and replica-exchange MD (REMD) enhance sampling by adding bias potentials or by running parallel simulations under different conditions, yet come with higher computational cost. (117,118) Coarse-grained (CG) models offer a useful alternative when system size or time scale makes all-atom simulation impractical, trading resolution for the ability to simulate larger assemblies over longer intervals by reducing the atomic detail into grouped sites. CG models capture global flexibility and thermodynamic changes but cannot describe the electronic rearrangements that determine bond formation, proton transfer, or charge redistribution. (119,120) In comparison, QM/MM treats the active site quantum-mechanically, while the surrounding protein is handled with classical force fields (FFs) such as AMBER, CHARMM, GROMOS, and OPLS-AA, which represent the potential energy of a system using mathematical functions and empirically derived parameters. (121−124) Though QM/MM methods are the most physically realistic options for estimating activation barriers, these frameworks are generally not practical beyond individual structure validation due to the high computational cost and time scales needed.
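The statement that MD uses Newton's laws to predict atomic motion can be made concrete with a single-particle example. The sketch below integrates a one-dimensional harmonic oscillator with the velocity Verlet scheme, a standard MD integrator; the reduced units, spring constant, time step, and function names are assumptions for illustration, and production MD engines add force fields, thermostats, constraints, and neighbor lists on top of this core loop.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position and velocity with the velocity Verlet integrator."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory

SPRING_K = 1.0  # toy "bond" stiffness in reduced units
MASS = 1.0

def harmonic_force(x):
    """F = -dU/dx for U = 0.5 * k * x^2."""
    return -SPRING_K * x

def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING_K * x * x

trajectory = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=MASS, dt=0.01, n_steps=1000)
x_end, v_end = trajectory[-1]
print(f"x(t=10) = {x_end:.4f}, energy drift = {total_energy(x_end, v_end) - total_energy(1.0, 0.0):.2e}")
```

The near-constant total energy over the run illustrates why symplectic integrators of this type are preferred for long trajectories, while the small fixed time step illustrates why reaching microsecond time scales is expensive.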

Figure 7. Computational evaluation of the biological and functional properties of proteins. (A) Molecular dynamics and catalysis simulate mutated proteins in solvated environments to capture conformational flexibility and catalytic changes through trajectory analyses. Hybrid pipelines that integrate molecular dynamics (MD) with ROSETTA and directed evolution have yielded efficient de novo and redesigned biocatalysts, such as HG3.17 and BH32.14, whose catalytic power emerges from MD-guided active-site reorganization and solvent shielding. (B) Solubility analysis predicts the effects of amino acid sequence variation on protein solubility by comparing mutant distributions to wild-type benchmarks. CamSol-based workflows enable the rational optimization of both solubility and conformational stability, as demonstrated for six antibodies (including two approved therapeutics), enhancing developability without compromising binding. (137) (C) Aggregation propensity assesses structural and sequence features to identify residues or motifs that drive aggregation, distinguishing soluble variants from aggregation-prone variants. Using Aggrescan3D, researchers computationally minimized aggregation hotspots to engineer green fluorescent protein (GFP) mutants with significantly improved solubility and reduced aggregation, resulting in a fast-folding, aggregation-resistant variant. (138) Together, these approaches extend computational evaluation to capture dynamic solubility and aggregation behaviors that critically influence protein performance in physiological and industrial contexts.

Empirical Valence Bond (EVB) methods provide a faster approximate alternative to QM/MM and are better suited for screening larger systems. EVB is used to clarify how the spatial arrangement of charged residues around an active site (electrostatic preorganization) lowers the activation energy and stabilizes reaction intermediates. This added precision captures favorable conformations that static models miss, in which the active site acts as a structural template that is “pre-shaped” for the transition state. (115,116,125) Combining QM/MM with MD can further predict how synthetases discriminate between canonical and noncanonical amino acids based on electrostatics, hydrogen bonding, and active-site geometry. Despite their power, these methods carry trade-offs. Because of their increased precision, MD and QM are computationally demanding and require large data sets. MD often needs long trajectories because rare motions only appear with enough sampling, so runs can stretch into hundreds of nanoseconds before the system settles. (120) QM/MM accuracy is limited by computational cost and system size, and small FF errors can accumulate over long MD trajectories. (121−124,126)
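In its simplest two-state form, the EVB description can be summarized by a 2 × 2 Hamiltonian whose lower eigenvalue gives the ground-state reaction surface. The expressions below are the textbook two-state case, shown as a sketch; the symbols are chosen here for illustration, and practical EVB studies may use more states and system-specific calibration.

```latex
\begin{align}
\mathbf{H}_{\mathrm{EVB}} &=
\begin{pmatrix}
\varepsilon_1 & H_{12} \\
H_{12} & \varepsilon_2
\end{pmatrix} \\
E_g &= \tfrac{1}{2}\left( \varepsilon_1 + \varepsilon_2 \right)
 - \tfrac{1}{2}\sqrt{\left( \varepsilon_1 - \varepsilon_2 \right)^2 + 4 H_{12}^2}
\end{align}
```

Here ε₁ and ε₂ are the force-field energies of the reactant-like and product-like diabatic states, and H₁₂ is the coupling between them. Because ε₁ and ε₂ include the electrostatic interactions with the surrounding protein, a preorganized active site that stabilizes the product-like charge distribution lowers the crossing region of E_g, which is how the method connects residue placement to the activation barrier.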

Solubility and Aggregation Behaviors

Solubility remains one of the more difficult biophysical properties to predict with precision but is a critical filter for manufacturability and therapeutic viability. A design that folds and binds may still fail during expression or purification. Natural proteins often exhibit poor solubility, limited thermostability, and low expression yields, particularly when reengineered for industrial or therapeutic use. (88,127,128) These constraints motivated the development of computational approaches to optimize solubility and aggregation resistance. Structure- and sequence-based tools such as CamSol evaluate residue-level solubility using physicochemical features (e.g., hydrophobicity, β-sheet distributions) and suggest stabilizing mutations that do not affect global stability. (129) More recent ML-based tools such as PROTSOLM and GATSol use sequence- and structure-related features and long-range interactions to improve solubility prediction accuracy. (130,131) Another ML tool, soluble MPNN (MPNNsol), is the product of the ProteinMPNN network being retrained on a data set strictly made up of soluble proteins and can be used for the de novo design of proteins with a low fraction of surface hydrophobics. Despite these advances, solubility alone is insufficient for determining developability.
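Residue-level screening for hydrophobic stretches can be illustrated with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale as a simple stand-in; it is not the CamSol, PROTSOLM, or GATSol method, and the window length, cutoff, and function names are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=5):
    """Mean hydropathy of every contiguous window along the sequence."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]

def hydrophobic_windows(sequence, window=5, cutoff=1.6):
    """Start indices of windows whose mean hydropathy meets the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score >= cutoff]

toy_sequence = "DDDDDIIIIIDDDDD"  # charged flanks around a hydrophobic core
print(hydrophobic_windows(toy_sequence))
```

A sequence-only profile of this kind ignores burial, so a flagged window is only a candidate for redesign if the corresponding residues are solvent-exposed in the folded structure.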
Amyloid-β shows how exposed hydrophobic areas and β-sheet-prone regions fold on themselves into insoluble complexes, leading to further aggregation. (132,133) To address this, aggregation models use statistical potentials and ML classifiers trained on known amyloidogenic sequences and known aggregation motifs. (133) When introduced early in the design process, frameworks that integrate solubility and aggregation risk metrics improve manufacturability and reduce downstream failure. Data set bias and the inability to capture the energetic contributions of unique structures (e.g., multidomain proteins, (134,135) proteins with noncanonical residues (106,136)) limit the use of these models for systems that must remain stable across variable buffer conditions, pH ranges, and expression systems.

Limitations of In Silico Evaluation

In silico validation enables ranking of protein designs, but no computational framework can predict experimental success with complete accuracy. ΔΔG underestimates entropy and solvent effects, losing accuracy with poorly refined backbones. Alanine and proline scans miss “non-hotspot” interactions and neighborhood residue effects. MD and QM/MM provide mechanistic insight but are computationally costly for high-throughput workflows. For monomeric, well-structured proteins, solubility predictions work well; however, they fall short on multidomain assemblies and intrinsically disordered proteins (IDPs). Aggregation scoring frameworks struggle to differentiate between functional and aggregation-prone β-sheets.
Individually, these tools underrepresent the biophysical properties that determine downstream expression (Table 3). The next step in prediction accuracy is the integration of these tools. More accurate analysis will require the combination of physics-based scoring (e.g., CamSol), ML-guided confidence metrics (e.g., PROTSOLM, GATSol), MD simulation, and experimental data in one framework. Integration of these ML tools with standardized experimental benchmarks such as PDBbind can be used to recalibrate predictions against observed outcomes for better analysis across diverse structural classes. Multilayer assessment with experimental feedback is essential for reducing false negatives during downstream processes and broadening model applicability.
Table 3. Summary of In Silico Protein Design Parameters

| metric | purpose | example methods | significance | limitations |
|---|---|---|---|---|
| structural stability | predict foldability | ROSETTA (ref2015), RMSD, AlphaFold, pLDDT (34,35,90,92) | ensures the designed fold is retained postmutation | static models neglect entropy and conformational flexibility |
| binding affinity | assess interaction strength | flex ddG, InterfaceAnalyzer, alanine/proline scanning, PDBbind (98,102,104,105,108) | guides interface design and ligand-binding optimization | sensitive to backbone quality and local packing residues |
| interface hotspot probing | localize key residues | alanine/proline scanning, ncAA probe libraries (e.g., PheCN, Bpa) (103,105,139) | identifies energetic “anchors” and enables targeted mutation design | noncanonical probes may bias geometry or introduce steric clashes |
| molecular dynamics and catalysis | model flexibility and transition states | MetaDynamics, MD, QM/MM, REMD, EVB (117,121,126,132,140) | reveals loop dynamics and allosteric networks for catalytic preorganization | high computational cost; enhanced methods require expertise and tuning |
| solubility | predict aggregation or expression risk | CamSol, PROTSOLM, GATSol (130,131,133,141) | critical for developability, expression, and therapeutic viability | underperforms for IDPs, membrane proteins, or large multichain assemblies |
| aggregation propensity | identify aggregation-prone regions | Aggrescan3D, β-strand exposure models (132,142−144) | detects amyloid risk, hydrophobic patches | may misclassify functional β-sheets or multimer interfaces |
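
As a rough illustration of how the metrics in Table 3 might be combined into one ranking, the sketch below z-scores each metric across a set of candidate designs and sums the results with user-chosen weights. The metric names, weights, and values are hypothetical placeholders and are not outputs of ROSETTA, CamSol, or any other tool named above; a real workflow would also recalibrate the weights against experimental outcomes.

```python
import numpy as np

def composite_rank(scores, weights, higher_is_better):
    """Rank designs by a weighted sum of per-metric z-scores.

    scores: dict of metric name -> sequence of values (one per design)
    weights: dict of metric name -> non-negative weight (assumed, user-chosen)
    higher_is_better: dict of metric name -> bool
    Returns design indices ordered from best to worst.
    """
    n = len(next(iter(scores.values())))
    total = np.zeros(n)
    for name, values in scores.items():
        v = np.asarray(values, dtype=float)
        std = v.std()
        # A constant metric carries no ranking information.
        z = (v - v.mean()) / std if std > 0 else np.zeros(n)
        if not higher_is_better[name]:
            z = -z  # flip so that larger always means better
        total += weights[name] * z
    return np.argsort(-total, kind="stable")

# Hypothetical values for four candidate designs.
scores = {
    "plddt": [92.0, 71.0, 88.0, 65.0],
    "ddg_kcal_mol": [-1.5, 0.8, -2.0, 2.5],
    "solubility": [0.9, 0.2, 0.6, -0.4],
}
weights = {"plddt": 1.0, "ddg_kcal_mol": 1.0, "solubility": 1.0}
higher_is_better = {"plddt": True, "ddg_kcal_mol": False, "solubility": True}

order = composite_rank(scores, weights, higher_is_better)
print("designs from best to worst:", order.tolist())
```

Equal weights are used here only for simplicity; the point of the sketch is that no single column of Table 3 decides the ranking on its own.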

Directed Evolution as a Complement to De Novo Design

Directed evolution (DE) complements computational design and shows that building proteins de novo is not always efficient or necessary. Because of evolution, natural proteins occupy highly optimized regions of sequence space. (17,145) Success in enzyme design usually comes from modifying existing functionality. Iterative rounds of mutation and selection can identify substitutions that restore or improve activity when rational design is insufficient. Experimental variants reveal force field blind spots in active-site geometry that computational models are unable to catch. (12,146,147)
One major limitation of directed evolution is scale. Experimental libraries typically cover only 10³–10⁶ variants, compared with the nearly infinite number of possible sequences in sequence space, and most mutations are neutral or unfavorable. (145) Computational tools address this limitation by mapping sequence entropy and residue interactions to identify suitable positions for mutagenesis. (16) When designs fail because ideal geometries, electrostatics, or loop dynamics are not properly represented, DE acts as a diagnostic tool that distinguishes incorrect hypotheses from structural errors. Quantitative differences between native and designed proteins demonstrate why this feedback loop is important. Naturally occurring enzymes accelerate reactions by more than 10¹²-fold, while most de novo catalysts achieve modest gains at best. (87) The question then becomes whether computational starting scaffolds can achieve native-like efficiency. Computational design can narrow the search space by targeting mutations on native scaffolds, and directed evolution can then test and refine those variants. Repeating this workflow creates a feedback loop that improves both enzyme performance and computational prediction accuracy (Figure 8).
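
To make the scale gap concrete, the short calculation below compares a library of one million variants with the sequence space of a protein of assumed length 100 (the length is an illustrative choice, not a value from the text). It also counts how many variants carry exactly one, two, or three substitutions.

```python
import math

def log10_sequence_space(length, alphabet=20):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)

def n_variants(length, n_mut, alphabet=20):
    """Number of variants carrying exactly n_mut substitutions."""
    return math.comb(length, n_mut) * (alphabet - 1) ** n_mut

length = 100        # assumed protein length (illustrative)
library = 10**6     # upper end of the library sizes quoted above

space = log10_sequence_space(length)
print(f"sequence space: about 10^{space:.0f} sequences")
print(f"library fraction: about 10^{math.log10(library) - space:.0f}")
for k in (1, 2, 3):
    print(f"{k} substitution(s): {n_variants(length, k)} variants")
```

Under these assumptions, a library of one million variants cannot exhaustively sample even the roughly 1.8 million double mutants of a single 100-residue parent, which is why computational selection of positions matters.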

Figure 8. Conceptual framework contrasting traditional and A.I.-assisted directed evolution (DE) workflows. The diagram is divided into two pathways: the upper route represents conventional DE, where (A) natural sequence diversity is explored, (B) mutational libraries are generated, (C) variants are expressed, and (E) high-throughput screening identifies improved candidates through iterative experimental cycles. The lower route introduces A.I./ML-assisted or hybrid methodologies, in which (D) supervised models with uncertainty quantification learn the sequence-fitness landscape and use acquisition functions to propose new variants, balancing exploration (high uncertainty) and exploitation (high predicted fitness). These feedback-driven optimization strategies accelerate variant discovery with a reduced screening effort. Combined approaches, such as active learning-assisted directed evolution (ALDE), have yielded (F) optimized protoglobin-based biocatalysts for nonnative cyclopropanation reactions, enhancing their activity, selectivity, and stability while minimizing experimental costs. (148)
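
The acquisition step described in the caption can be sketched in a few lines. The example below is a minimal illustration, not the ALDE implementation: it assumes a bootstrap ensemble of ridge regressors on one-hot encoded sequences as the uncertainty-aware model and an upper confidence bound (mean plus `beta` times the ensemble spread) as the acquisition function. The toy fitness function, the four-residue alphabet of the pool, and all parameter values are hypothetical.

```python
import itertools
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Flattened one-hot encoding of an amino acid sequence."""
    x = np.zeros((len(seq), len(AA)))
    for i, a in enumerate(seq):
        x[i, AA.index(a)] = 1.0
    return x.ravel()

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression weights (no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ensemble_predict(X_train, y_train, X_pool, n_models=20, seed=0):
    """Mean and spread of predictions from a bootstrap ensemble."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y_train), size=len(y_train))  # resample
        w = fit_ridge(X_train[idx], y_train[idx])
        preds.append(X_pool @ w)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

def ucb_select(mean, std, k, beta=1.0):
    """Indices of the k candidates with the highest mean + beta * std."""
    return np.argsort(-(mean + beta * std))[:k]

# Toy pool: all 3-residue variants over four residues (hypothetical).
pool = ["".join(p) for p in itertools.product("AVLK", repeat=3)]
X_pool = np.array([one_hot(s) for s in pool])

def toy_fitness(seq):
    # Made-up landscape standing in for an experimental assay.
    return seq.count("L") + 0.5 * (seq[0] == "K")

rng = np.random.default_rng(1)
measured = rng.choice(len(pool), size=10, replace=False)
y_train = np.array([toy_fitness(pool[i]) for i in measured])

mean, std = ensemble_predict(X_pool[measured], y_train, X_pool)
picked = ucb_select(mean, std, k=5, beta=1.0)
print("next variants to screen:", [pool[i] for i in picked])
```

Setting `beta` to zero gives pure exploitation, while larger values favor variants where the ensemble disagrees. A practical loop would also exclude variants that have already been measured before selecting the next batch.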

Experimental Validation of AI-Generated Protein Tools

The interactions that determine fold stability, catalytic efficiency, binding affinity, solubility, and aggregation propensity cannot be captured by in silico measures alone. High-resolution structural assays serve as a quantitative filter between computational predictions and practical utility and are the leading methods for testing design accuracy (Table 4). X-ray crystallography is considered the gold standard for atomic-resolution structure determination and, when paired with prediction software, enables direct comparison between predicted and observed backbone geometry and active-site positioning. Its main barrier is crystallization: proteins must crystallize under precise conditions, and many fail to crystallize or do so only after extensive screens that take weeks to months to yield high-quality crystals. (149) In addition, crystallography locks proteins into a static lattice, making it difficult to study pH-sensitive states, transient complexes, and post-translational modifications.
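
One common way to quantify agreement between predicted and observed backbone geometry is the root-mean-square deviation (RMSD) after optimal superposition. The sketch below uses the Kabsch algorithm with NumPy and assumes the two inputs are matched coordinate arrays (for example, Cα atoms of the same residues in a predicted model and a crystal structure). The coordinates here are random stand-ins, not real structures.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)          # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                    # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T                 # rotate P onto Q
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))

# Stand-in coordinates: a rotated and shifted copy should give RMSD near zero.
rng = np.random.default_rng(0)
P = rng.normal(size=(30, 3))
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])

print("RMSD of rigid copy:", kabsch_rmsd(P, Q))
print("RMSD with noise:", kabsch_rmsd(P, Q + rng.normal(scale=0.5, size=Q.shape)))
```

A global RMSD can hide local errors, so active-site positioning is usually compared on a separately chosen subset of atoms.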
Table 4. Experimental Validation Methods for Computational Protein Design

| method | measurement | strengths | limitations | example applications |
|---|---|---|---|---|
| X-ray crystallography | atomic-resolution structural “snapshots” | well-established refinement pipelines | (1) requires crystallization (often challenging/time-consuming); (2) static lattice limits dynamic studies | (1) benchmarking AlphaFold prediction; (2) validation of active sites (158,159) |
| Cryo-EM | structural validation of large assemblies and complexes | (1) no crystallization needed; (2) captures transient or unstable complexes; (3) excels at large proteins, complexes, and membrane proteins | (1) historically lower resolution for small proteins (<100 kDa); (2) requires advanced processing software | (1) antibody–antigen complexes; (2) complement to crystallography; (3) ML refinement of maps (151,155,160) |
| NMR spectroscopy | conformational ensembles, loop dynamics, chemical environment | (1) probes protein motion in solution; (2) reveals catalytic loop mobility and reaction intermediates | (1) limited to smaller proteins; (2) requires isotopic labeling; lower spatial resolution than crystals | (1) loop dynamics in catalysis; (2) conformational changes critical for function (152) |
| hybrid approaches | integrated models combining experimental and computational restraints | combines ML predictions (AlphaFold/ROSETTA) with sparse restraints (XL-MS, cryo-EM maps, covalent labeling) | requires careful alignment of computational and experimental data sets | refinement of protein–protein interfaces and complexes via XL-MS and AlphaFold/ROSETTA (154,155) |

Cryo-electron microscopy (cryo-EM) has expanded validation to systems that do not readily crystallize, accurately capturing large assemblies, membrane proteins, and antibody–antigen complexes. (150) Historically, cryo-EM has achieved lower resolution for smaller proteins; however, advances in reconstruction algorithms continue to narrow this gap. (149,151) Nuclear magnetic resonance (NMR) spectroscopy probes protein dynamics in solution. Loop movements that determine substrate entry, product release, and catalytic residue positioning are critical for enzymatic turnover. (152) NMR structural ensembles reveal the local and global flexibility and the intermediates that govern catalysis. Flexibility measures are important because active-site geometry depends on specific motions and dynamic local environments. Recent deep-learning-assisted assignment of side chains and dynamics has extended the applicability of NMR to larger proteins and functional sites. (153)
Hybrid approaches integrate experimental restraints directly into the modeling. Cross-linking mass spectrometry (XL-MS) and covalent labeling improve predictions at protein–protein interfaces. Iterative rebuilding of AlphaFold or ROSETTA models against cryo-EM density improves prediction accuracy beyond what either approach achieves alone. (154,155) Even limited experimental input can significantly improve structural precision, showing that minimal experimental data can help computational platforms generate biologically realistic models. Experimental validation also plays an important diagnostic role. When predicted and observed structures disagree, the discrepancies indicate whether the failure results from incorrect backbone generation, flawed scoring metrics, or missing catalytic features. (87) Incorporating experimental data improves subsequent design cycles by updating the energetic parameters. Experimental validation thus functions as the final checkpoint and empirical layer that closes the loop between hypothesis and function. Computational tools map the candidate space, structural and functional assays determine which designs are physically feasible, and experimental feedback improves both the engineered proteins and the predictive frameworks used to develop them (Figure 9).
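
As a simple illustration of how sparse restraints can discriminate between candidate models, the sketch below scores a model by the fraction of cross-links whose Cα–Cα distance falls under a cutoff. The 30 Å cutoff, the residue pairs, and the coordinates are all assumptions chosen for illustration; an appropriate cutoff depends on the cross-linker used, and production pipelines apply restraints during model building rather than only as a post hoc filter.

```python
import numpy as np

def crosslink_satisfaction(ca_coords, crosslinks, max_dist=30.0):
    """Fraction of cross-links satisfied by a model.

    ca_coords: (N, 3) array of C-alpha coordinates in angstroms
    crosslinks: list of (i, j) residue index pairs observed by XL-MS
    max_dist: assumed upper bound on the C-alpha to C-alpha distance
    Returns (fraction satisfied, array of distances).
    """
    coords = np.asarray(ca_coords, dtype=float)
    dists = np.array([np.linalg.norm(coords[i] - coords[j])
                      for i, j in crosslinks])
    return float(np.mean(dists <= max_dist)), dists

# Two hypothetical models of the same four residues, plus three cross-links.
model_a = [[0, 0, 0], [10, 0, 0], [20, 0, 0], [50, 0, 0]]
model_b = [[0, 0, 0], [10, 0, 0], [20, 0, 0], [35, 0, 0]]
crosslinks = [(0, 1), (0, 2), (1, 3)]

frac_a, dists_a = crosslink_satisfaction(model_a, crosslinks)
frac_b, dists_b = crosslink_satisfaction(model_b, crosslinks)
print("model A satisfies", round(frac_a, 2), "of cross-links")
print("model B satisfies", round(frac_b, 2), "of cross-links")
```

In this toy case the third cross-link separates the two models, which mirrors the point above that even a few experimental restraints can change which model is preferred.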

Figure 9. Experimental–computational pipeline for protein engineering. (A) Protein mutant libraries are generated by introducing sequence variations across the regions of interest. AlphaFold-guided domain-motif design (e.g., FBXO23-STX1B) has revealed novel regulatory interfaces relevant to therapeutic target discovery. (155) (B) Mutants are expressed in Escherichia coli, yeast, or mammalian systems to generate protein ensembles for screening; such expression-labeling pipelines support enzyme and biocatalyst development used in pharmaceuticals and green chemistry. (154) (C) Structural characterization via cryo-EM, NMR, and X-ray crystallography (and hybrid methods), refined with predictive models such as AlphaFold2 or ROSETTA, resolves folding and conformational dynamics, as shown in ribosomal complex refinements and in NMR-ROSETTA modeling of ubiquitin. (156,157) (D) Functional screening, for example by hydroxyl radical footprinting, evaluates the activity and binding properties of mutant sets to identify active-site or interface residues that control activity and stability, as applied to Hsp90-co-chaperone systems and engineered oxidoreductases. (154) (E) Finally, A.I./ML integration combines experimental data with modeling to predict next-generation variants, accelerating industrial enzyme design, antibody optimization, and biosensor development.

Limitations and Future Directions in Computational Protein Design

Computational protein design has made significant advances over the years. In silico measures follow a hierarchy that adds physical realism at each layer at the cost of more computational resources. Because of this, most workflows use these methods selectively instead of applying them all. The process begins with statistics-based scoring for foldability and then moves to ensemble ΔΔG calculations for interface refinement. Frameworks then extend to atomistic and coarse-grained MD for dynamic motion and finally reach QM/MM for precise chemical detail. However, overlapping challenges remain across foundational frameworks and diffusion-based generative models. Static models cannot fully capture active-site geometry and side-chain constraints, which limits enzyme development. (94) Most algorithms focus on fixed protein backbones or on single low-energy conformations and overlook the dynamic rearrangements that proteins undergo in solution over time. In addition, natural enzymes use cofactors, metals, and post-translational modifications that most design workflows cannot accommodate. Generative models produce stable folds and sequences but primarily focus on canonical amino acids and small-molecule ligands, thereby missing key noncanonical chemical descriptors and energy parameters. Even highly parametrized energy functions require custom weights for noncanonical amino acids and rare cofactors that balance the interactions between physics-based terms in nonnative binding pockets. (92) These constraints limit progress in orthogonal translation systems, metalloenzymes, and covalent inhibitors, all of which rely on the precise modeling of chemical interactions.
At the functional level, designed enzymes lag far behind natural catalysts: de novo designs often show catalytic activity far lower than that of natural enzymes. In many cases, designed proteins are only as efficient as catalytic antibodies created decades ago. (87) There are many opportunities for failure across the pipeline. Incorrect active-site hypotheses in the early stages, incorrect geometry during scaffold development, and missing effects during characterization all lead to imprecise predictions. Force fields cannot accurately parametrize the multiresidue networks seen in natural enzymes because of a reliance on minimal active-site motifs to discriminate between similar transition states. (94) In downstream processes, proteins that appear stable in upstream in silico filters can still aggregate, misfold, or cause immunogenic responses in vivo.
Improvements in benchmarking and standardization are still needed in the field. Inconsistent metrics make direct comparisons across platforms difficult. Reproducible algorithms and standard community-wide data sets would enable high-throughput enzyme development and evaluation of novel methods. Evaluating methods in parallel on shared benchmarks helps separate real improvements from data-set-specific results. Integrating diffusion-based backbone generators with ligand-aware sequence optimizers while considering all-atom parameters presents a promising route to efficient, high-throughput enzyme and binder design. Extending this precision and speed to RNA and DNA constructs, orthogonal tRNA-ncAA-synthetase systems, and covalent inhibitors would allow the creation of full computational-to-experimental pipelines from first principles. Combining generative approaches with hybrid ML methods and closed-loop experimental feedback could ultimately yield frameworks for the directed evolution of de novo enzymes at speeds and with precision not yet seen or readily adopted in the chemical engineering field.

Author Information

  • Corresponding Author
    • Blaise R. Kimmel - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States; Center for Cancer Engineering, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States; Pelotonia Institute for Immuno-Oncology, Ohio State University Comprehensive Cancer Center, The Ohio State University, 2255 Kenny Road, Columbus, Ohio 43210, United States; ORCID: https://orcid.org/0000-0002-9582-9887; Email: [email protected]
  • Authors
    • Joseph S. Bailey Jr. - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
    • Søren C. Spina - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
    • Andrew Hu - College of Medicine, The Ohio State University, 460 W 10th Avenue, Columbus, Ohio 43210, United States
    • Nathan Phan - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States
    • Rachel B. Getman - Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W Woodruff Avenue, Columbus, Ohio 43210, United States; ORCID: https://orcid.org/0000-0003-0755-0534
  • Author Contributions

    J.S.B. Jr.: Wrote the original draft of the manuscript, generated all figures and graphics for the manuscript, edited, revised, and approved the final version of the manuscript; S.C.S., A.H., and N.P.: supported the generation of graphics and writing for the manuscript; R.G.: edited, revised, and approved the final version of the manuscript; B.R.K.: wrote the original draft of the manuscript, edited, revised, and approved the final version of the manuscript, and acquired funding to support the work. CRediT: Joseph S. Bailey Jr. conceptualization, data curation, formal analysis, investigation, methodology, writing - original draft, writing - review & editing; Søren Spina visualization, writing - original draft, writing - review & editing; Andrew Hu visualization, writing - original draft, writing - review & editing; Nathan Phan visualization, writing - original draft, writing - review & editing; Rachel B. Getman funding acquisition, writing - review & editing; Blaise R. Kimmel conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, supervision, validation, visualization, writing - original draft, writing - review & editing.

  • Funding

    We gratefully thank the Ohio State University Comprehensive Cancer Center (OSUCCC), OSUCCC Center for Cancer, and the Department of Chemical and Biomolecular Engineering at The Ohio State University for support of this work. B.R.K. acknowledges financial support from the Prostate Cancer Foundation Young Investigator Award.

  • Notes
    The authors declare no competing financial interest.

Acknowledgments

This work was supported in part by The Ohio State University Center for Cancer Engineering─Curing Cancer Through Research in Engineering and Sciences. B.R.K. acknowledges financial support from the Prostate Cancer Foundation Young Investigator Award. We acknowledge the use of PaperPal and Grammarly as AI tools to modify the grammar, phrasing, and sentence structure while writing this review. Each author takes full responsibility for the manuscript’s content.

References

This article references 160 other publications.

  1. Yu, Y.; Hu, C.; Xia, L.; Wang, J. Artificial Metalloenzyme Design with Unnatural Amino Acids and Non-Native Cofactors. ACS Catal. 2018, 8, 1851–1863, DOI: 10.1021/acscatal.7b03754
  2. Mirts, E. N.; Bhagi-Damodaran, A.; Lu, Y. Understanding and Modulating Metalloenzymes with Unnatural Amino Acids, Non-Native Metal Ions, and Non-Native Metallocofactors. Acc. Chem. Res. 2019, 52, 935–944, DOI: 10.1021/acs.accounts.9b00011
  3. Mann, S. I.; Nayak, A.; Gassner, G. T.; Therien, M. J.; DeGrado, W. F. De Novo Design, Solution Characterization, and Crystallographic Structure of an Abiological Mn–Porphyrin-Binding Protein Capable of Stabilizing a Mn(V) Species. J. Am. Chem. Soc. 2021, 143, 252–259, DOI: 10.1021/jacs.0c10136
  4. Bergman, M. T.; Xiao, X.; Hall, C. K. In Silico Design and Analysis of Plastic-Binding Peptides. J. Phys. Chem. B 2023, 127, 8370–8381, DOI: 10.1021/acs.jpcb.3c04319
  5. García-Moreno, P. J. Recent advances in the production of emulsifying peptides with the aid of proteomics and bioinformatics. Curr. Opin. Food Sci. 2023, 51, 101039, DOI: 10.1016/j.cofs.2023.101039
  6. Ndochinwa, G. O.; Wang, Q. Y.; Okoro, N. O. New advances in protein engineering for industrial applications: Key takeaways. Open Life Sci. 2024, 19, 20220856, DOI: 10.1515/biol-2022-0856
  7. Marcos, E.; Silva, D. Essentials of de novo protein design: Methods and applications. WIREs Comput. Mol. Sci. 2018, 8 (6), e1374, DOI: 10.1002/wcms.1374
  8. Huang, P.-S.; Boyken, S. E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320–327, DOI: 10.1038/nature19946
  9. Woolfson, D. N. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J. Mol. Biol. 2021, 433, 167160, DOI: 10.1016/j.jmb.2021.167160
  10. Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. Protein Folding Problem. Annu. Rev. Biophys. 2008, 37, 289–316, DOI: 10.1146/annurev.biophys.37.092707.153558
  11. Kocher, C. D.; Dill, K. A. Origins of life: The Protein Folding Problem all over again?. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315000121, DOI: 10.1073/pnas.2315000121
  12. Chen, S.-J.; Hassan, M.; Jernigan, R. L. Protein folds vs. protein folding: Differing questions, different challenges. Proc. Natl. Acad. Sci. U. S. A. 2023, 120, e2214423119, DOI: 10.1073/pnas.2214423119
  13. Kiss, G.; Çelebi-Ölçüm, N.; Moretti, R.; Baker, D.; Houk, K. N. Computational Enzyme Design. Angew. Chem., Int. Ed. 2013, 52, 5700–5725, DOI: 10.1002/anie.201204077
  14. Ille, A. M.; Anas, E.; Mathews, M. B.; Burley, S. K. From sequence to protein structure and conformational dynamics with artificial intelligence/machine learning. Struct. Dyn. 2025, 12, 030902, DOI: 10.1063/4.0000765
  15. Anfinsen, C. B. Principles that Govern the Folding of Protein Chains. Science 1973, 181, 223–230, DOI: 10.1126/science.181.4096.223
  16. Voigt, C. A.; Mayo, S. L.; Arnold, F. H.; Wang, Z.-G. Computational method to reduce the search space for directed protein evolution. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 3778–3783, DOI: 10.1073/pnas.051614498
  17. Kuhlman, B.; Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 10383–10388, DOI: 10.1073/pnas.97.19.10383
  18. Sleator, R. D. Solving the protein folding problem···. FEBS Lett. 2024, 598, 2831–2835, DOI: 10.1002/1873-3468.15043
  19. Watson, J. L.; Juergens, D.; Bennett, N. R. De novo design of protein structure and function with RFdiffusion. Nature 2023, 620, 1089–1100, DOI: 10.1038/s41586-023-06415-8
  20. Leveson-Gower, R. B. Designing Enzymatic Reactivity with an Expanded Palette. ChemBioChem 2025, 26, e202500076, DOI: 10.1002/cbic.202500076
  21. Hartman, M. C. T. Non-canonical Amino Acid Substrates of Escherichia coli Aminoacyl-tRNA Synthetases. ChemBioChem 2022, 23, e202100299, DOI: 10.1002/cbic.202100299
  22. Zhang, G.; Liu, C.; Lu, J.; Zhang, S.; Zhu, L. The Role of AI-Driven De Novo Protein Design in the Exploration of the Protein Functional Universe. Biology 2025, 14, 1268, DOI: 10.3390/biology14091268
  23. Rohl, C. A.; Strauss, C. E. M.; Misura, K. M. S.; Baker, D. Protein Structure Prediction Using Rosetta. Methods Enzymol. 2004, 383, 66–93, DOI: 10.1016/S0076-6879(04)83004-0
  24. Simons, K. T.; Kooperberg, C.; Huang, E.; Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. J. Mol. Biol. 1997, 268, 209–225, DOI: 10.1006/jmbi.1997.0959
  25. Leman, J. K.; Weitzner, B. D.; Lewis, S. M. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 2020, 17, 665–680, DOI: 10.1038/s41592-020-0848-2
  26. Das, R.; Baker, D. Macromolecular Modeling with Rosetta. Annu. Rev. Biochem. 2008, 77, 363–382, DOI: 10.1146/annurev.biochem.77.062906.171838
  27. Kaufmann, K. W.; Meiler, J. Using RosettaLigand for Small Molecule Docking into Comparative Models. PLoS One 2012, 7, e50769, DOI: 10.1371/journal.pone.0050769
  28. Lemmon, G.; Kaufmann, K.; Meiler, J. Prediction of HIV-1 Protease/Inhibitor Affinity using RosettaLigand. Chem. Biol. Drug Des. 2012, 79, 888–896, DOI: 10.1111/j.1747-0285.2012.01356.x
  29. Chaudhury, S.; Lyskov, S.; Gray, J. J. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 2010, 26, 689–691, DOI: 10.1093/bioinformatics/btq007
  30. Le, K. H.; Adolf-Bryfogle, J.; Klima, J. C. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Biophysicist 2021, 2, 108–122, DOI: 10.35459/tbp.2019.000147
  31. Ford, A. S.; Weitzner, B. D.; Bahl, C. D. Integration of the Rosetta suite with the python software stack via reproducible packaging and core programming interfaces for distributed simulation. Protein Sci. 2020, 29, 43–51, DOI: 10.1002/pro.3721
  32. Van Stappen, C.; Deng, Y.; Liu, Y. Designing Artificial Metalloenzymes by Tuning of the Environment beyond the Primary Coordination Sphere. Chem. Rev. 2022, 122, 11974–12045, DOI: 10.1021/acs.chemrev.2c00106
  33. Tivon, B.; Wiese, J.; Müller, M. P. Computational Design of Lysine Targeting Covalent Binders Using Rosetta. J. Chem. Inf. Model. 2025, 65, 5612–5622, DOI: 10.1021/acs.jcim.5c00212
  34. Jumper, J.; Evans, R.; Pritzel, A. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589, DOI: 10.1038/s41586-021-03819-2
  35. Tunyasuvunakool, K.; Adler, J.; Wu, Z. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590–596, DOI: 10.1038/s41586-021-03828-1
  36. Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)─Round XIV. Proteins:Struct., Funct., Bioinf. 2021, 89, 1607–1617, DOI: 10.1002/prot.26237
  37. Kovalevskiy, O.; Mateos-Garcia, J.; Tunyasuvunakool, K. AlphaFold two years on: Validation and impact. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315002121, DOI: 10.1073/pnas.2315002121
  38. Schneider, B.; Sweeney, B. A.; Bateman, A. When will RNA get its AlphaFold moment?. Nucleic Acids Res. 2023, 51, 9522–9532, DOI: 10.1093/nar/gkad726
  39. Terwilliger, T. C.; Liebschner, D.; Croll, T. I. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat. Methods 2024, 21, 110–116, DOI: 10.1038/s41592-023-02087-4
  40. Mirdita, M.; Schütze, K.; Moriwaki, Y. ColabFold: making protein folding accessible to all. Nat. Methods 2022, 19, 679–682, DOI: 10.1038/s41592-022-01488-1
  41. Kim, G.; Lee, S.; Levy Karin, E. Easy and accurate protein structure prediction using ColabFold. Nat. Protoc. 2025, 20, 620–642, DOI: 10.1038/s41596-024-01060-5
  42. Kalogeropoulos, K.; Bohn, M. F.; Jenkins, D. E. A comparative study of protein structure prediction tools for challenging targets: Snake venom toxins. Toxicon 2024, 238, 107559, DOI: 10.1016/j.toxicon.2023.107559
  43. Baek, M.; DiMaio, F.; Anishchenko, I. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876, DOI: 10.1126/science.abj8754
  44. Baek, M.; McHugh, R.; Anishchenko, I. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117–121, DOI: 10.1038/s41592-023-02086-5
  45. Krishna, R.; Wang, J.; Ahern, W. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024, 384, eadl2528, DOI: 10.1126/science.adl2528
  46. Liu, S.; Wu, K.; Chen, C. Obtaining protein foldability information from computational models of AlphaFold2 and RoseTTAFold. Comput. Struct. Biotechnol. J. 2022, 20, 4481–4489, DOI: 10.1016/j.csbj.2022.08.034
  47. Wayment-Steele, H. K.; Ojoawo, A.; Otten, R. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024, 625, 832–839, DOI: 10.1038/s41586-023-06832-9
  48. Casadevall, G.; Duran, C.; Osuna, S. AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design. JACS Au 2023, 3, 1554–1562, DOI: 10.1021/jacsau.3c00188
  49. Vallejo, W.; Díaz-Uribe, C.; Fajardo, C. Google Colab and Virtual Simulations: Practical e-Learning Tools to Support the Teaching of Thermodynamics and to Introduce Coding to Students. ACS Omega 2022, 7, 7421–7429, DOI: 10.1021/acsomega.2c00362
  50. Adiyaman, R.; Edmunds, N. S.; Genc, A. G.; Alharbi, S. M. A.; McGuffin, L. J. Improvement of protein tertiary and quaternary structure predictions using the ReFOLD refinement method and the AlphaFold2 recycling process. Bioinforma. Adv. 2023, 3 (1), vbad078, DOI: 10.1093/bioadv/vbad078
  51. Ahern, W.; Yim, J.; Tischer, D. Atom level enzyme active site scaffolding using RFdiffusion2. Nat. Methods 2026, 23, 96–105, DOI: 10.1038/s41592-025-02975-x
  52. Wang, W.; Feng, C.; Han, R. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nat. Commun. 2023, 14, 7266, DOI: 10.1038/s41467-023-42528-4
  53. Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking, arXiv:2210.01776. arXiv.org e-Print archive. https://arxiv.org/abs/2210.01776. 2023.
  54. Alamdari, S. Protein generation with evolutionary diffusion: sequence is all you need. Preprint at https://doi.org/10.1101/2023.09.11.556673. 2023.
  55. Chu, A. E.; Kim, J.; Cheng, L. An all-atom protein generative model. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2311500121, DOI: 10.1073/pnas.2311500121
  56. Dauparas, J. Robust deep learning based protein sequence design using ProteinMPNN.
  57. Sumida, K. H.; Núñez-Franco, R.; Kalvet, I. Improving Protein Expression, Stability, and Function with ProteinMPNN. J. Am. Chem. Soc. 2024, 146, 2054–2061, DOI: 10.1021/jacs.3c10941
  58. De Haas, R. J.; Brunette, N.; Goodson, A. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2314646121, DOI: 10.1073/pnas.2314646121
  59. Dauparas, J.; Lee, G. R.; Pecoraro, R. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods 2025, 22, 717–723, DOI: 10.1038/s41592-025-02626-1
  60. Clark-Elsayed, A. Comparing LigandMPNN and Directed Evolution for Altering the Effector-Binding Site in the RamR Transcription Factor.
  61. 61
    An, L.; Said, M.; Tran, L. Binding and sensing diverse small molecules using shape-complementary pseudocycles. Science 2024, 385, 276282,  DOI: 10.1126/science.adn3780
  62. 62
    Agu, P. C.; Afiukwa, C. A.; Orji, O. U. Molecular docking as a tool for the discovery of molecular targets of nutraceuticals in diseases management. Sci. Rep. 2023, 13, 13398  DOI: 10.1038/s41598-023-40160-2
  63. 63
    Anishchenko, I. Modeling protein-small molecule conformational ensembles with ChemNet. Preprint at https://doi.org/10.1101/2024.09.25.614868. 2024.
  64. 64
    Lauko, A.; Pellock, S. J.; Sumida, K. H. Computational design of serine hydrolases. Science 2025, 388, eadu2454  DOI: 10.1126/science.adu2454
  65. 65
    Park, H.; Zhou, G.; Baek, M.; Baker, D.; DiMaio, F. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein–Ligand Docking. J. Chem. Theory Comput. 2021, 17, 20002010,  DOI: 10.1021/acs.jctc.0c01184
  66. 66
    Garcia, M.; Dixit, S. M.; Rocklin, G. J. Evaluating zero-shot prediction of protein design success by AlphaFold, ESMFold, and ProteinMPNN.
  67. 67
    Kong, Z. ProtFlow: Flow Matching-based Protein Sequence Design with Comprehensive Protein Semantic Distribution Learning and High-quality Generation.
  68. 68
    Elnaggar, A.; Heinzinger, M.; Dallago, C. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 71127127,  DOI: 10.1109/TPAMI.2021.3095381
  69. 69
    Madani, A. ProGen: Language Modeling for Protein Generation, arXiv:2004.03497. arXiv.org e-Print archive. https://arxiv.org/abs/2004.03497. 2020.
  70. 70
    Nijkamp, E.; Ruffolo, J. A.; Weinstein, E. N.; Naik, N.; Madani, A. ProGen2: Exploring the boundaries of protein language models. Cell Syst. 2023, 14, 968978.e3,  DOI: 10.1016/j.cels.2023.10.002
  71. 71
    Ferruz, N.; Schmidt, S.; Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 2022, 13, 4348  DOI: 10.1038/s41467-022-32007-7
  72. 72
    Nguyen, E.; Poli, M.; Durrant, M. G. Sequence modeling and design from molecular to genome scale with Evo. Science 2024, 386, eado9336  DOI: 10.1126/science.ado9336
  73. 73
    Dalla-Torre, H.; Gonzalez, L.; Mendoza-Revilla, J. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 2025, 22, 287297,  DOI: 10.1038/s41592-024-02523-z
  74.
    Avsec, Ž.; Latysheva, N.; Cheng, J. Advancing regulatory variant effect prediction with AlphaGenome. Nature 2026, 649, 1206–1218. DOI: 10.1038/s41586-025-10014-0
  75.
    Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction.
  76.
    Chai Discovery. Chai-1: Decoding the molecular interactions of life. Preprint at https://doi.org/10.1101/2024.10.10.615955. 2024.
  77.
    Ingraham, J. B.; Baranov, M.; Costello, Z. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070–1078. DOI: 10.1038/s41586-023-06728-8
  78.
    Mille-Fragoso, L. S. Efficient generation of epitope-targeted de novo antibodies with Germinal.
  79.
    Pacesa, M.; Nickel, L.; Schellhaas, C. One-shot design of functional protein binders with BindCraft. Nature 2025, 646, 483–492. DOI: 10.1038/s41586-025-09429-6
  80.
    BoltzGen: Toward Universal Binder Design.
  81.
    Zhang, O. ODesign: A World Model for Biomolecular Interaction Design, arXiv:2510.22304. arXiv.org e-Print archive. https://arxiv.org/abs/2510.22304. 2025.
  82.
    Parks, M. Blind Virtual Screening at Scale: A Scalable End-to-End Pipeline for Blind Docking and Affinity Prediction.
  83.
    John, P. S. BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery, arXiv:2411.10548. arXiv.org e-Print archive. https://arxiv.org/abs/2411.10548. 2025.
  84.
    Silke, D.; Iskander, J.; Pan, J. ProteinDJ: A high-performance and modular protein design pipeline. Protein Sci. 2026, 35, e70464. DOI: 10.1002/pro.70464
  85.
    González-Rodríguez, N.; Chacón-Sánchez, C.; Llorca, O.; Fernández-Leiro, R. Automated and modular protein binder design with BinderFlow. PLOS Comput. Biol. 2025, 21, e1013747. DOI: 10.1371/journal.pcbi.1013747
  86.
    Danny, B. Ovo, an Open-Source Ecosystem for De Novo Protein Design.
  87.
    Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 2010, 19, 1817–1819. DOI: 10.1002/pro.481
  88.
    Beadle, B. M.; Shoichet, B. K. Structural Bases of Stability–function Tradeoffs in Enzymes. J. Mol. Biol. 2002, 321, 285–296. DOI: 10.1016/S0022-2836(02)00599-5
  89.
    Barlow, K. A.; Conchúir, S. Ó.; Thompson, S. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein–Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389–5399. DOI: 10.1021/acs.jpcb.7b11367
  90.
    Shringari, S. R.; Giannakoulias, S.; Ferrie, J. J.; Petersson, E. J. Rosetta custom score functions accurately predict ΔΔG of mutations at protein–protein interfaces using machine learning. Chem. Commun. 2020, 56, 6774–6777. DOI: 10.1039/D0CC01959C
  91.
    Smith, S. T.; Meiler, J. Assessing multiple score functions in Rosetta for drug discovery. PLoS One 2020, 15, e0240450. DOI: 10.1371/journal.pone.0240450
  92.
    Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031–3048. DOI: 10.1021/acs.jctc.7b00125
  93.
    Tyka, M. D.; Keedy, D. A.; André, I. Alternate States of Proteins Revealed by Detailed Energy Landscape Mapping. J. Mol. Biol. 2011, 405, 607–618. DOI: 10.1016/j.jmb.2010.11.008
  94.
    Planas-Iglesias, J.; Marques, S. M.; Pinto, G. P. Computational design of enzymes for biotechnological applications. Biotechnol. Adv. 2021, 47, 107696. DOI: 10.1016/j.biotechadv.2021.107696
  95.
    Guo, H.-B.; Perminov, A.; Bekele, S. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci. Rep. 2022, 12, 10696. DOI: 10.1038/s41598-022-14382-9
  96.
    Agarwal, V.; McShan, A. C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024, 20, 950–959. DOI: 10.1038/s41589-024-01638-w
  97.
    Abramson, J.; Adler, J.; Dunger, J. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500. DOI: 10.1038/s41586-024-07487-w
  98.
    Friedland, G. D.; Linares, A. J.; Smith, C. A.; Kortemme, T. A Simple Model of Backbone Flexibility Improves Modeling of Side-chain Conformational Variability. J. Mol. Biol. 2008, 380, 757–774. DOI: 10.1016/j.jmb.2008.05.006
  99.
    Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins: Struct., Funct., Bioinf. 2011, 79, 830–838. DOI: 10.1002/prot.22921
  100.
    Durham, E.; Dorr, B.; Woetzel, N.; Staritzbichler, R.; Meiler, J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J. Mol. Model. 2009, 15, 1093–1108. DOI: 10.1007/s00894-009-0454-9
  101.
    Bertalan, É.; Lešnik, S.; Bren, U.; Bondar, A.-N. Protein-water hydrogen-bond networks of G protein-coupled receptors: Graph-based analyses of static structures and molecular dynamics. J. Struct. Biol. 2020, 212, 107634. DOI: 10.1016/j.jsb.2020.107634
  102.
    Weiss, G. A.; Watanabe, C. K.; Zhong, A.; Goddard, A.; Sidhu, S. S. Rapid mapping of protein functional epitopes by combinatorial alanine scanning. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 8950–8954. DOI: 10.1073/pnas.160252097
  103.
    Cunningham, B. C.; Wells, J. A. High-Resolution Epitope Mapping of hGH-Receptor Interactions by Alanine-Scanning Mutagenesis. Science 1989, 244, 1081–1085. DOI: 10.1126/science.2471267
  104.
    Liu, H.; Song, L.; Meng, X. Proline-Mediated Enhancement in Evolvability of Disulfide-Rich Peptides for Discovering Protein Binders. J. Am. Chem. Soc. 2025, 147, 24870–24883. DOI: 10.1021/jacs.5c07075
  105.
    Holden, J. K.; Pavlovicz, R.; Gobbi, A.; Song, Y.; Cunningham, C. N. Computational Site Saturation Mutagenesis of Canonical and Non-Canonical Amino Acids to Probe Protein-Peptide Interactions. Front. Mol. Biosci. 2022, 9, 848689. DOI: 10.3389/fmolb.2022.848689
  106.
    Spina, S. C.; Bailey, J.; Kimmel, B. Bind, catalyze, and quantify: a modern protein and enzyme engineering toolbox of genetically encoded non-canonical amino acids. Protein Eng. Des. Sel. 2026, gzag007. DOI: 10.1093/protein/gzag007
  107.
    Chen, Y.; Clay, N.; Phan, N. Molecular Matchmakers: Bioconjugation Techniques Enhance Prodrug Potency for Immunotherapy. Mol. Pharmaceutics 2025, 22, 58–80. DOI: 10.1021/acs.molpharmaceut.4c00867
  108.
    Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind Database: Collection of Binding Affinities for Protein–Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem. 2004, 47, 2977–2980. DOI: 10.1021/jm030580l
  109.
    Liu, Z.; Su, M.; Han, L. Forging the Basis for Developing Protein–Ligand Interaction Scoring Functions. Acc. Chem. Res. 2017, 50, 302–309. DOI: 10.1021/acs.accounts.6b00491
  110.
    King, B. R.; Sumida, K. H.; Caruso, J. L.; Baker, D.; Zalatan, J. G. Computational Stabilization of a Non-Heme Iron Enzyme Enables Efficient Evolution of New Function. Angew. Chem., Int. Ed. 2025, 64, e202414705. DOI: 10.1002/anie.202414705
  111.
    Howlader, M. T. H.; Kagawa, Y.; Miyakawa, A. Alanine Scanning Analyses of the Three Major Loops in Domain II of Bacillus thuringiensis Mosquitocidal Toxin Cry4Aa. Appl. Environ. Microbiol. 2010, 76, 860–865. DOI: 10.1128/AEM.02175-09
  112.
    Paul, R.; Kasahara, K.; Sasaki, J. Unveiling the affinity–stability relationship in anti-measles virus antibodies: a computational approach for hotspots prediction. Front. Mol. Biosci. 2024, 10, 1302737. DOI: 10.3389/fmolb.2023.1302737
  113.
    Romero-Rivera, A.; Garcia-Borràs, M.; Osuna, S. Role of Conformational Dynamics in the Evolution of Retro-Aldolase Activity. ACS Catal. 2017, 7, 8524–8532. DOI: 10.1021/acscatal.7b02954
  114.
    Lemkul, J. A. Introductory Tutorials for Simulating Protein Dynamics with GROMACS. J. Phys. Chem. B 2024, 128, 9418–9435. DOI: 10.1021/acs.jpcb.4c04901
  115.
    Sanbonmatsu, K. Y.; Joseph, S.; Tung, C.-S. Simulating movement of tRNA into the ribosome during decoding. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 15854–15859. DOI: 10.1073/pnas.0503456102
  116.
    Li, R.; Macnamara, L.; Leuchter, J.; Alexander, R.; Cho, S. MD Simulations of tRNA and Aminoacyl-tRNA Synthetases: Dynamics, Folding, Binding, and Allostery. Int. J. Mol. Sci. 2015, 16, 15872–15902. DOI: 10.3390/ijms160715872
  117.
    Patel, S.; Hosur, R. V. Replica exchange molecular dynamics simulations reveal self-association sites in M-Crystallin caused by mutations provide insights of cataract. Sci. Rep. 2021, 11, 23270. DOI: 10.1038/s41598-021-02728-8
  118.
    Stelzl, L. S.; Hummer, G. Kinetics from Replica Exchange Molecular Dynamics Simulations. J. Chem. Theory Comput. 2017, 13, 3927–3935. DOI: 10.1021/acs.jctc.7b00372
  119.
    Feig, M.; Nawrocki, G.; Yu, I.; Wang, P.; Sugita, Y. Challenges and opportunities in connecting simulations with experiments via molecular dynamics of cellular environments. J. Phys. Conf. Ser. 2018, 1036, 012010. DOI: 10.1088/1742-6596/1036/1/012010
  120.
    Kumari, I.; Sandhu, P.; Ahmed, M.; Akhter, Y. Molecular Dynamics Simulations, Challenges and Opportunities: A Biologist’s Prospective. Curr. Protein Pept. Sci. 2017, 18, 1163–1179. DOI: 10.2174/1389203718666170622074741
  121.
    Senn, H. M.; Thiel, W. QM/MM Methods for Biomolecular Systems. Angew. Chem., Int. Ed. 2009, 48, 1198–1229. DOI: 10.1002/anie.200802019
  122.
    Lopes, P. E. M.; Guvench, O.; MacKerell, A. D. Current Status of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Springer: New York, NY, 2015; Vol. 1215, pp 47–71.
  123.
    McMillin, D. R. Interatomic Repulsion and the Pauli Principle. J. Chem. Educ. 2021, 98, 2912–2918. DOI: 10.1021/acs.jchemed.1c00326
  124.
    Guvench, O.; MacKerell, A. D. Comparison of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Humana Press: Totowa, NJ, 2008; Vol. 443, pp 63–88.
  125.
    Warshel, A.; Sharma, P. K.; Kato, M. Electrostatic Basis for Enzyme Catalysis. Chem. Rev. 2006, 106, 3210–3235. DOI: 10.1021/cr0503106
  126.
    Van Der Kamp, M. W.; Mulholland, A. J. Combined Quantum Mechanics/Molecular Mechanics (QM/MM) Methods in Computational Enzymology. Biochemistry 2013, 52, 2708–2728. DOI: 10.1021/bi400215w
  127.
    Magliery, T. J. Protein stability: computation, sequence statistics, and new experimental methods. Curr. Opin. Struct. Biol. 2015, 33, 161–168. DOI: 10.1016/j.sbi.2015.09.002
  128.
    Singh, A.; Upadhyay, V.; Upadhyay, A. K.; Singh, S. M.; Panda, A. K. Protein recovery from inclusion bodies of Escherichia coli using mild solubilization process. Microb. Cell Factories 2015, 14, 41. DOI: 10.1186/s12934-015-0222-8
  129.
    Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478–490. DOI: 10.1016/j.jmb.2014.09.026
  130.
    Li, B.; Ming, D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinf. 2024, 25, 204. DOI: 10.1186/s12859-024-05820-8
  131.
    Tan, Y.; Zheng, J.; Hong, L.; Zhou, B. ProtSolM: Protein Solubility Prediction with Multi-modal Features, arXiv:2406.19744. arXiv.org e-Print archive. https://arxiv.org/abs/2406.19744. 2024.
  132.
    Ghosh, D.; Biswas, A.; Radhakrishna, M. Advanced computational approaches to understand protein aggregation. Biophys. Rev. 2024, 5, 021302. DOI: 10.1063/5.0180691
  133.
    Oeller, M.; Kang, R.; Bell, R. Sequence-based prediction of pH-dependent protein solubility using CamSol. Brief. Bioinform. 2023, 24, bbad004. DOI: 10.1093/bib/bbad004
  134.
    Kimmel, B. R.; Mrksich, M. Development of an Enzyme-Inhibitor Reaction Using Cellular Retinoic Acid Binding Protein II for One-Pot Megamolecule Assembly. Chem. - Eur. J. 2021, 27, 17843–17848. DOI: 10.1002/chem.202103059
  135.
    Kimmel, B. R.; Modica, J. A.; Parker, K.; Dravid, V.; Mrksich, M. Solid-Phase Synthesis of Megamolecules. J. Am. Chem. Soc. 2020, 142, 4534–4538. DOI: 10.1021/jacs.9b12003
  136.
    Adomanis, R.; Phan, N.; Walter, G.; Kimmel, B. R. Modular Nanobody Conjugates with Controlled Topology Using Genetically Encoded Non-canonical Amino Acids. Preprint at https://doi.org/10.1101/2025.11.27.691038. 2025.
  137.
    Rosace, A.; Bennett, A.; Oeller, M. Automated optimization of solubility and conformational stability of antibodies and proteins. Nat. Commun. 2023, 14, 1937. DOI: 10.1038/s41467-023-37668-6
  138.
    Kuriata, A.; Iglesias, V.; Pujols, J. Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res. 2019, 47, W300–W307. DOI: 10.1093/nar/gkz321
  139.
    Hirsch, M.; Desai, R. R.; Annaswamy, S.; Keatinge-Clay, A. T. Mutagenesis Supports AlphaFold Prediction of How Modular Polyketide Synthase Acyl Carrier Proteins Dock With Downstream Ketosynthases. Proteins: Struct., Funct., Bioinf. 2024, 92, 1375–1384. DOI: 10.1002/prot.26733
  140.
    Araki, M.; Ekimoto, T.; Takemura, K. Molecular Dynamics Unveils Multiple-Site Binding of Inhibitors with Reduced Activity on the Surface of Dihydrofolate Reductase. J. Am. Chem. Soc. 2024, 146, 28685–28695. DOI: 10.1021/jacs.4c04648
  141.
    Pimtawong, T.; Ren, J.; Lee, J.; Lee, H.-M.; Na, D. A review on computational models for predicting protein solubility. J. Microbiol. 2025, 63, 2408001. DOI: 10.71150/jm.2408001
  142.
    Navarro, S.; Ventura, S. Computational methods to predict protein aggregation. Curr. Opin. Struct. Biol. 2022, 73, 102343. DOI: 10.1016/j.sbi.2022.102343
  143.
    Prediction and Evaluation of Protein Aggregation with Computational Methods. In Methods in Molecular Biology; Springer US: New York, NY, 2025; pp 299–314. DOI: 10.1007/978-1-0716-4196-5_17
  144.
    Santos, J.; Pujols, J.; Pallarès, I.; Iglesias, V.; Ventura, S. Computational prediction of protein aggregation: Advances in proteomics, conformation-specific algorithms and biotechnological applications. Comput. Struct. Biotechnol. J. 2020, 18, 1403–1413. DOI: 10.1016/j.csbj.2020.05.026
  145.
    Arnold, F. H. Design by Directed Evolution. Acc. Chem. Res. 1998, 31, 125–131. DOI: 10.1021/ar960017f
  146.
    Arnold, F. H. Directed evolution: Creating biocatalysts for the future. Chem. Eng. Sci. 1996, 51, 5091–5102. DOI: 10.1016/S0009-2509(96)00288-6
  147.
    Cobb, R. E.; Chao, R.; Zhao, H. Directed evolution: Past, present, and future. AIChE J. 2013, 59, 1432–1440. DOI: 10.1002/aic.13995
  148.
    Yang, J.; Lal, R. G.; Bowden, J. C. Active learning-assisted directed evolution. Nat. Commun. 2025, 16, 714. DOI: 10.1038/s41467-025-55987-8
  149.
    Terashi, G.; Wang, X.; Maddhuri Venkata Subramaniya, S. R.; Tesmer, J. J. G.; Kihara, D. Residue-wise local quality estimation for protein models from cryo-EM maps. Nat. Methods 2022, 19, 1116–1125. DOI: 10.1038/s41592-022-01574-4
  150.
    Graille, M.; Sacquin-Mora, S.; Taly, A. Best Practices of Using AI-Based Models in Crystallography and Their Impact in Structural Biology. J. Chem. Inf. Model. 2023, 63, 3637–3646. DOI: 10.1021/acs.jcim.3c00381
  151.
    Wang, X.; Zhu, H.; Terashi, G.; Taluja, M.; Kihara, D. DiffModeler: large macromolecular structure modeling for cryo-EM maps using a diffusion model. Nat. Methods 2024, 21, 2307–2317. DOI: 10.1038/s41592-024-02479-0
  152.
    Serapian, S. A.; Crosby, J.; Crump, M. P.; Van Der Kamp, M. W. Path to Actinorhodin: Regio- and Stereoselective Ketone Reduction by a Type II Polyketide Ketoreductase Revealed in Atomistic Detail. JACS Au 2022, 2, 972–984. DOI: 10.1021/jacsau.2c00086
  153.
    Shukla, V. K.; Karunanithy, G.; Vallurupalli, P.; Hansen, D. F. A combined NMR and deep neural network approach for enhancing the spectral resolution of aromatic side chains in proteins. Sci. Adv. 2024, 10, eadr2155. DOI: 10.1126/sciadv.adr2155
  154.
    Drake, Z. C.; Fowler, A. G.; Blum, A. A.; Lindert, S. Enhanced Protein Complex Prediction via Rosetta, AlphaFold, and Nondifferential Covalent Labeling Mass Spectrometry. J. Phys. Chem. B 2025, 129, 6489–6497. DOI: 10.1021/acs.jpcb.5c02872
  155.
    Lee, C. Y.; Hubrich, D.; Varga, J. K. Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation. Mol. Syst. Biol. 2024, 20, 75–97. DOI: 10.1038/s44320-023-00005-6
  156.
    Koehler Leman, J.; Künze, G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int. J. Mol. Sci. 2023, 24, 7835. DOI: 10.3390/ijms24097835
  157.
    Alshammari, M.; Wriggers, W.; Sun, J.; He, J. Refinement of AlphaFold2 models against experimental and hybrid cryo-EM density maps. QRB Discovery 2022, 3, e16. DOI: 10.1017/qrd.2022.13
  158.
    Humphreys, I. R.; Pei, J.; Baek, M. Computed structures of core eukaryotic protein complexes. Science 2021, 374, eabm4805. DOI: 10.1126/science.abm4805
  159.
    Bordin, N.; Sillitoe, I.; Nallapareddy, V. AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun. Biol. 2023, 6, 160. DOI: 10.1038/s42003-023-04488-9
  160.
    Wang, H.; Wang, J. How cryo-electron microscopy and X-ray crystallography complement each other. Protein Sci. 2017, 26, 32–39. DOI: 10.1002/pro.3022


  • Abstract

    Figure 1

    Figure 1. From sequence to protein structure and conformational behavior. (A) Biological information transfer follows a deterministic pathway from DNA to RNA to protein, linking the encoded sequence information to the emergent molecular function and dynamics. (B) Input amino acid sequences serve as the basis for predictive modeling frameworks. (C) Sequence-informed A.I./ML frameworks trained on sequence and structural ensemble data learn the mapping between linear sequences and conformation space. (D) The resulting structural ensemble offers a data-driven view of protein flexibility and structural diversity derived directly from the amino acid sequence. Reprinted or adapted with permission under a CC-BY 3.0 License from Ille et al. (14) Copyright 2025 AIP Publishing.

    Figure 2

    Figure 2. Conceptual view of the protein functional universe. The diagram maps the relationships among sequence, structure, and function spaces. Each circle represents an individual protein defined by its amino acid sequence, 3D folds, and biological activity. The blue circles correspond to proteins accessible through natural evolution or traditional protein engineering, primarily clustered within well-explored regions (yellow). Gray circles indicate proteins that remain uncharacterized and lie within the unexplored sequence–structure–function space. The red circles represent proteins accessible through ML-driven de novo design, which extends exploration beyond natural boundaries into previously inaccessible regions. In this framework, sequence space (top layer) is linked to structure space (middle layer) and ultimately to function space (bottom layer), with A.I. methods systematically probing across all three layers. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

    Figure 3

    Figure 3. Overview of the current protein design dogma. Traditional protein science is often described as a one-way flow in which (A) amino acid sequences give rise to (B) folded structures, which in turn underpin (C) biological function. Modern de novo design inverts this logic: researchers now begin with the desired function and work backward to identify compatible folds and sequences. Current computational frameworks align with three broad strategies: (1) two-stage design, in which structural generators such as ROSETTA, RoseTTAFold, or PyRosetta first propose candidate protein backbones that are then optimized by sequence design engines; (2) sequence-driven methods, exemplified by AlphaFold2 and ColabFold, which predict protein structures directly from amino acid sequence information and are widely used to validate or filter design candidates; and (3) coguided approaches, including multitrack RoseTTAFold variants (RF, RFNA, RFAA) and diffusion-based models (RFDiffusion), which integrate amino acid sequence and protein structure generation simultaneously. These complementary strategies extend protein design beyond natural sequence–structure relationships, enabling a function-first exploration of protein space. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

    Figure 4

    Figure 4. Overview of the A.I.-driven protein design toolbox. According to their functional roles in A.I.-driven generative protein design, the tools in the toolbox can be divided into five categories: (A) structure prediction frameworks (e.g., AlphaFold2, RoseTTAFold) that validate fold accuracy; (B) de novo backbone generators (RFDiffusion, RFDiffusionAA) that embed motifs or active sites into novel folds; (C) fixed-backbone sequence designers (LigandMPNN) that optimize sequences against a defined structural context; (D) sequence generation models (ProteinMPNN), which not only perform fixed-backbone optimization but also function as generative samplers of amino acid sequences; and (E) sequence–structure cogeneration and refinement frameworks (PLACER), which jointly optimize side chains, ligands, and active-site geometry. Reprinted or adapted with permission under a CC-BY 4.0 License from Zhang et al. (22) Copyright 2025 MDPI.

    Figure 5

    Figure 5. Timeline of major developments in protein structure prediction (black) and design methodologies (red). Following early innovations such as ROSETTA (1998) and PyRosetta (2010), the field saw nearly two decades of incremental progress before the emergence of transformative A.I.-based models such as AlphaFold2 (2020). Since then, generative frameworks, including ProteinMPNN, RFDiffusion, and LigandMPNN, have proliferated rapidly, marking a shift toward integrated prediction-design pipelines.

    Figure 6

    Figure 6. Computational strategies for evaluating amino acid sequence perturbations. (A) Structural stability analysis introduces mutations into a sequence and applies ab initio folding to predict conformational shifts, highlighting favorable and unfavorable perturbations. (B) Binding affinity analysis docks protein constituents, incorporates mutations, and estimates changes in binding free energy (ΔΔG) to evaluate the interaction stability. (C) Interface hotspot probing systematically mutates residues at binding interfaces to pinpoint the positions that are most critical for binding energetics.

    Figure 7

    Figure 7. Computational evaluation of the biological and functional properties of proteins. (A) Molecular dynamics and catalysis simulate mutated proteins in solvated environments to capture conformational flexibility and catalytic changes through trajectory analyses. Hybrid pipelines that integrate molecular dynamics (MD) with ROSETTA and directed evolution have yielded efficient de novo and redesigned biocatalysts, such as HG3.17 and BH32.14, whose catalytic power emerges from MD-guided active-site reorganization and solvent shielding. (B) Solubility analysis predicts the effects of amino acid sequence variation on protein solubility by comparing mutant distributions to wild-type benchmarks. CamSol-based workflows enable the rational optimization of both solubility and conformational stability, as demonstrated for six antibodies (including two approved therapeutics), enhancing developability without compromising binding. (137) (C) Aggregation propensity assesses structural and sequence features to identify residues or motifs that drive aggregation, distinguishing soluble variants from aggregation-prone variants. Using Aggrescan3D, researchers computationally minimized aggregation hotspots to engineer green fluorescent protein (GFP) mutants with significantly improved solubility and reduced aggregation, resulting in a fast-folding, aggregation-resistant variant. (138) Together, these approaches extend computational evaluation to capture dynamic solubility and aggregation behaviors that critically influence protein performance in physiological and industrial contexts.

    Figure 8

    Figure 8. Conceptual framework contrasting traditional and A.I.-assisted directed evolution (DE) workflows. The diagram is divided into two pathways: the upper route represents conventional DE, where (A) natural sequence diversity is explored, (B) mutational libraries are generated, (C) variants are expressed, and (E) high-throughput screening identifies improved candidates through iterative experimental cycles. The lower route introduces A.I./ML-assisted or hybrid methodologies, in which (D) supervised models with uncertainty quantification learn the sequence-fitness landscape and use acquisition functions to propose new variants, balancing exploration (high uncertainty) and exploitation (high predicted fitness). These feedback-driven optimization strategies accelerate variant discovery with a reduced screening effort. Combined approaches, such as active learning-assisted directed evolution (ALDE), have yielded (F) optimized protoglobin-based biocatalysts for nonnative cyclopropanation reactions, enhancing their activity, selectivity, and stability while minimizing experimental costs. (148)

    Figure 9

    Figure 9. Experimental–computational pipeline for protein engineering. (A) Protein mutant libraries are generated by introducing sequence variations across the regions of interest. AlphaFold-guided domain-motif design (e.g., FBXO23-STX1B) has revealed novel regulatory interfaces relevant to therapeutic target discovery. (155) (B) Mutants are expressed in Escherichia coli, yeast, or mammalian systems to generate protein ensembles for screening; such expression-labeling pipelines support enzyme and biocatalyst development used in pharmaceuticals and green chemistry. (154) (C) Structural characterization via cryo-EM, NMR, and X-ray crystallography (and hybrid methods) refined with predictive models such as AlphaFold2 or ROSETTA resolves folding and conformational dynamics, as shown in ribosomal complex refinements and NMR-ROSETTA modeling of ubiquitin. (156,157) (D) Functional screening evaluates the activity and binding properties of mutant sets, using methods such as hydroxyl radical footprinting to identify active-site or interface residues that control activity and stability, as applied to Hsp90-co-chaperone systems and engineered oxidoreductase. (154) (E) Finally, A.I./ML integration combines experimental data with modeling to predict next-generation variants, accelerating industrial enzyme design, antibody optimization, and biosensor development.
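
The acquisition step described in the Figure 8 caption, where a model with uncertainty estimates proposes new variants by trading off exploitation (high predicted fitness) against exploration (high uncertainty), can be illustrated with a minimal sketch. It assumes an upper-confidence-bound (UCB) acquisition function and an ensemble whose spread stands in for uncertainty; UCB is one common choice and is not claimed to be the procedure used in ref 148. The function names, the `beta` weight, and the toy numbers are all hypothetical.

```python
import numpy as np


def ucb_scores(ensemble_preds: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Score each candidate as mean + beta * std across an ensemble.

    ensemble_preds has shape (n_models, n_candidates); each row holds one
    model's predicted fitness for every candidate variant.
    """
    mean = ensemble_preds.mean(axis=0)  # exploitation term: predicted fitness
    std = ensemble_preds.std(axis=0)    # exploration term: model disagreement
    return mean + beta * std


def propose_batch(ensemble_preds: np.ndarray, batch_size: int, beta: float = 1.0) -> np.ndarray:
    """Return indices of the top-scoring candidates, best first."""
    scores = ucb_scores(ensemble_preds, beta)
    return np.argsort(scores)[::-1][:batch_size]


# Toy predictions (hypothetical numbers): 3 models x 4 candidate variants.
preds = np.array([
    [0.9, 0.2, 0.5, 0.1],
    [0.9, 0.2, 1.5, 0.1],
    [0.9, 0.2, 0.1, 0.1],
])

if __name__ == "__main__":
    print(propose_batch(preds, batch_size=2, beta=0.0))  # pure exploitation
    print(propose_batch(preds, batch_size=2, beta=1.0))  # rewards uncertainty
```

With `beta = 0` the ranking follows predicted fitness alone; raising `beta` promotes the candidate on which the ensemble disagrees most, which is the exploration behavior the caption refers to. In a real campaign the selected variants would be screened experimentally and the model retrained on the new measurements before the next round.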

  • References


    This article references 160 other publications.

    1.
      Yu, Y.; Hu, C.; Xia, L.; Wang, J. Artificial Metalloenzyme Design with Unnatural Amino Acids and Non-Native Cofactors. ACS Catal. 2018, 8, 1851–1863. DOI: 10.1021/acscatal.7b03754
    2.
      Mirts, E. N.; Bhagi-Damodaran, A.; Lu, Y. Understanding and Modulating Metalloenzymes with Unnatural Amino Acids, Non-Native Metal Ions, and Non-Native Metallocofactors. Acc. Chem. Res. 2019, 52, 935–944. DOI: 10.1021/acs.accounts.9b00011
    3.
      Mann, S. I.; Nayak, A.; Gassner, G. T.; Therien, M. J.; DeGrado, W. F. De Novo Design, Solution Characterization, and Crystallographic Structure of an Abiological Mn–Porphyrin-Binding Protein Capable of Stabilizing a Mn(V) Species. J. Am. Chem. Soc. 2021, 143, 252–259. DOI: 10.1021/jacs.0c10136
    4.
      Bergman, M. T.; Xiao, X.; Hall, C. K. In Silico Design and Analysis of Plastic-Binding Peptides. J. Phys. Chem. B 2023, 127, 8370–8381. DOI: 10.1021/acs.jpcb.3c04319
    5.
      García-Moreno, P. J. Recent advances in the production of emulsifying peptides with the aid of proteomics and bioinformatics. Curr. Opin. Food Sci. 2023, 51, 101039. DOI: 10.1016/j.cofs.2023.101039
    6.
      Ndochinwa, G. O.; Wang, Q. Y.; Okoro, N. O. New advances in protein engineering for industrial applications: Key takeaways. Open Life Sci. 2024, 19, 20220856. DOI: 10.1515/biol-2022-0856
    7.
      Marcos, E.; Silva, D. Essentials of de novo protein design: Methods and applications. WIREs Comput. Mol. Sci. 2018, 8 (6), e1374. DOI: 10.1002/wcms.1374
    8.
      Huang, P.-S.; Boyken, S. E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320–327. DOI: 10.1038/nature19946
    9.
      Woolfson, D. N. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J. Mol. Biol. 2021, 433, 167160. DOI: 10.1016/j.jmb.2021.167160
    10.
      Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. Protein Folding Problem. Annu. Rev. Biophys. 2008, 37, 289–316. DOI: 10.1146/annurev.biophys.37.092707.153558
    11.
      Kocher, C. D.; Dill, K. A. Origins of life: The Protein Folding Problem all over again? Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315000121. DOI: 10.1073/pnas.2315000121
    12.
      Chen, S.-J.; Hassan, M.; Jernigan, R. L. Protein folds vs. protein folding: Differing questions, different challenges. Proc. Natl. Acad. Sci. U. S. A. 2023, 120, e2214423119. DOI: 10.1073/pnas.2214423119
    13.
      Kiss, G.; Çelebi-Ölçüm, N.; Moretti, R.; Baker, D.; Houk, K. N. Computational Enzyme Design. Angew. Chem., Int. Ed. 2013, 52, 5700–5725. DOI: 10.1002/anie.201204077
    14.
      Ille, A. M.; Anas, E.; Mathews, M. B.; Burley, S. K. From sequence to protein structure and conformational dynamics with artificial intelligence/machine learning. Struct. Dyn. 2025, 12, 030902. DOI: 10.1063/4.0000765
    15.
      Anfinsen, C. B. Principles that Govern the Folding of Protein Chains. Science 1973, 181, 223–230. DOI: 10.1126/science.181.4096.223
    16.
      Voigt, C. A.; Mayo, S. L.; Arnold, F. H.; Wang, Z.-G. Computational method to reduce the search space for directed protein evolution. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 3778–3783. DOI: 10.1073/pnas.051614498
    17.
      Kuhlman, B.; Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 10383–10388. DOI: 10.1073/pnas.97.19.10383
    18.
      Sleator, R. D. Solving the protein folding problem···. FEBS Lett. 2024, 598, 2831–2835. DOI: 10.1002/1873-3468.15043
    19.
      Watson, J. L.; Juergens, D.; Bennett, N. R. De novo design of protein structure and function with RFdiffusion. Nature 2023, 620, 1089–1100. DOI: 10.1038/s41586-023-06415-8
    20.
      Leveson-Gower, R. B. Designing Enzymatic Reactivity with an Expanded Palette. ChemBioChem 2025, 26, e202500076. DOI: 10.1002/cbic.202500076
    21. 21
      Hartman, M. C. T. Non-canonical Amino Acid Substrates of Escherichia coli Aminoacyl-tRNA Synthetases. ChemBioChem 2022, 23, e202100299  DOI: 10.1002/cbic.202100299
    22. 22
      Zhang, G.; Liu, C.; Lu, J.; Zhang, S.; Zhu, L. The Role of AI-Driven De Novo Protein Design in the Exploration of the Protein Functional Universe. Biology 2025, 14, 1268  DOI: 10.3390/biology14091268
    23. 23
      Rohl, C. A.; Strauss, C. E. M.; Misura, K. M. S.; Baker, D. Protein Structure Prediction Using Rosetta. Methods Enzymol. 2004, 383, 66–93,  DOI: 10.1016/S0076-6879(04)83004-0
    24. 24
      Simons, K. T.; Kooperberg, C.; Huang, E.; Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions. J. Mol. Biol. 1997, 268, 209–225,  DOI: 10.1006/jmbi.1997.0959
    25. 25
      Leman, J. K.; Weitzner, B. D.; Lewis, S. M. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 2020, 17, 665–680,  DOI: 10.1038/s41592-020-0848-2
    26. 26
      Das, R.; Baker, D. Macromolecular Modeling with Rosetta. Annu. Rev. Biochem. 2008, 77, 363–382,  DOI: 10.1146/annurev.biochem.77.062906.171838
    27. 27
      Kaufmann, K. W.; Meiler, J. Using RosettaLigand for Small Molecule Docking into Comparative Models. PLoS One 2012, 7, e50769  DOI: 10.1371/journal.pone.0050769
    28. 28
      Lemmon, G.; Kaufmann, K.; Meiler, J. Prediction of HIV-1 Protease/Inhibitor Affinity using RosettaLigand. Chem. Biol. Drug Des. 2012, 79, 888–896,  DOI: 10.1111/j.1747-0285.2012.01356.x
    29. 29
      Chaudhury, S.; Lyskov, S.; Gray, J. J. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 2010, 26, 689–691,  DOI: 10.1093/bioinformatics/btq007
    30. 30
      Le, K. H.; Adolf-Bryfogle, J.; Klima, J. C. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Biophysicist 2021, 2, 108–122,  DOI: 10.35459/tbp.2019.000147
    31. 31
      Ford, A. S.; Weitzner, B. D.; Bahl, C. D. Integration of the Rosetta suite with the python software stack via reproducible packaging and core programming interfaces for distributed simulation. Protein Sci. 2020, 29, 43–51,  DOI: 10.1002/pro.3721
    32. 32
      Van Stappen, C.; Deng, Y.; Liu, Y. Designing Artificial Metalloenzymes by Tuning of the Environment beyond the Primary Coordination Sphere. Chem. Rev. 2022, 122, 11974–12045,  DOI: 10.1021/acs.chemrev.2c00106
    33. 33
      Tivon, B.; Wiese, J.; Müller, M. P. Computational Design of Lysine Targeting Covalent Binders Using Rosetta. J. Chem. Inf. Model. 2025, 65, 5612–5622,  DOI: 10.1021/acs.jcim.5c00212
    34. 34
      Jumper, J.; Evans, R.; Pritzel, A. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589,  DOI: 10.1038/s41586-021-03819-2
    35. 35
      Tunyasuvunakool, K.; Adler, J.; Wu, Z. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590–596,  DOI: 10.1038/s41586-021-03828-1
    36. 36
      Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)─Round XIV. Proteins: Struct., Funct., Bioinf. 2021, 89, 1607–1617,  DOI: 10.1002/prot.26237
    37. 37
      Kovalevskiy, O.; Mateos-Garcia, J.; Tunyasuvunakool, K. AlphaFold two years on: Validation and impact. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2315002121  DOI: 10.1073/pnas.2315002121
    38. 38
      Schneider, B.; Sweeney, B. A.; Bateman, A. When will RNA get its AlphaFold moment?. Nucleic Acids Res. 2023, 51, 9522–9532,  DOI: 10.1093/nar/gkad726
    39. 39
      Terwilliger, T. C.; Liebschner, D.; Croll, T. I. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat. Methods 2024, 21, 110–116,  DOI: 10.1038/s41592-023-02087-4
    40. 40
      Mirdita, M.; Schütze, K.; Moriwaki, Y. ColabFold: making protein folding accessible to all. Nat. Methods 2022, 19, 679–682,  DOI: 10.1038/s41592-022-01488-1
    41. 41
      Kim, G.; Lee, S.; Levy Karin, E. Easy and accurate protein structure prediction using ColabFold. Nat. Protoc. 2025, 20, 620–642,  DOI: 10.1038/s41596-024-01060-5
    42. 42
      Kalogeropoulos, K.; Bohn, M. F.; Jenkins, D. E. A comparative study of protein structure prediction tools for challenging targets: Snake venom toxins. Toxicon 2024, 238, 107559  DOI: 10.1016/j.toxicon.2023.107559
    43. 43
      Baek, M.; DiMaio, F.; Anishchenko, I. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876,  DOI: 10.1126/science.abj8754
    44. 44
      Baek, M.; McHugh, R.; Anishchenko, I. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117–121,  DOI: 10.1038/s41592-023-02086-5
    45. 45
      Krishna, R.; Wang, J.; Ahern, W. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024, 384, eadl2528  DOI: 10.1126/science.adl2528
    46. 46
      Liu, S.; Wu, K.; Chen, C. Obtaining protein foldability information from computational models of AlphaFold2 and RoseTTAFold. Comput. Struct. Biotechnol. J. 2022, 20, 4481–4489,  DOI: 10.1016/j.csbj.2022.08.034
    47. 47
      Wayment-Steele, H. K.; Ojoawo, A.; Otten, R. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024, 625, 832–839,  DOI: 10.1038/s41586-023-06832-9
    48. 48
      Casadevall, G.; Duran, C.; Osuna, S. AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design. JACS Au 2023, 3, 1554–1562,  DOI: 10.1021/jacsau.3c00188
    49. 49
      Vallejo, W.; Díaz-Uribe, C.; Fajardo, C. Google Colab and Virtual Simulations: Practical e-Learning Tools to Support the Teaching of Thermodynamics and to Introduce Coding to Students. ACS Omega 2022, 7, 7421–7429,  DOI: 10.1021/acsomega.2c00362
    50. 50
      Adiyaman, R.; Edmunds, N. S.; Genc, A. G.; Alharbi, S. M. A.; McGuffin, L. J. Improvement of protein tertiary and quaternary structure predictions using the ReFOLD refinement method and the AlphaFold2 recycling process. Bioinforma. Adv. 2023, 3 (1), vbad078  DOI: 10.1093/bioadv/vbad078
    51. 51
      Ahern, W.; Yim, J.; Tischer, D. Atom level enzyme active site scaffolding using RFdiffusion2. Nat. Methods 2026, 23, 96–105,  DOI: 10.1038/s41592-025-02975-x
    52. 52
      Wang, W.; Feng, C.; Han, R. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nat. Commun. 2023, 14, 7266  DOI: 10.1038/s41467-023-42528-4
    53. 53
      Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking, arXiv:2210.01776. arXiv.org e-Print archive. https://arxiv.org/abs/2210.01776. 2023.
    54. 54
      Alamdari, S. Protein generation with evolutionary diffusion: sequence is all you need. Preprint at https://doi.org/10.1101/2023.09.11.556673. 2023.
    55. 55
      Chu, A. E.; Kim, J.; Cheng, L. An all-atom protein generative model. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2311500121  DOI: 10.1073/pnas.2311500121
    56. 56
      Dauparas, J. Robust deep learning based protein sequence design using ProteinMPNN.
    57. 57
      Sumida, K. H.; Núñez-Franco, R.; Kalvet, I. Improving Protein Expression, Stability, and Function with ProteinMPNN. J. Am. Chem. Soc. 2024, 146, 2054–2061,  DOI: 10.1021/jacs.3c10941
    58. 58
      De Haas, R. J.; Brunette, N.; Goodson, A. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN. Proc. Natl. Acad. Sci. U. S. A. 2024, 121, e2314646121  DOI: 10.1073/pnas.2314646121
    59. 59
      Dauparas, J.; Lee, G. R.; Pecoraro, R. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods 2025, 22, 717–723,  DOI: 10.1038/s41592-025-02626-1
    60. 60
      Clark-Elsayed, A. Comparing LigandMPNN and Directed Evolution for Altering the Effector-Binding Site in the RamR Transcription Factor.
    61. 61
      An, L.; Said, M.; Tran, L. Binding and sensing diverse small molecules using shape-complementary pseudocycles. Science 2024, 385, 276–282,  DOI: 10.1126/science.adn3780
    62. 62
      Agu, P. C.; Afiukwa, C. A.; Orji, O. U. Molecular docking as a tool for the discovery of molecular targets of nutraceuticals in diseases management. Sci. Rep. 2023, 13, 13398  DOI: 10.1038/s41598-023-40160-2
    63. 63
      Anishchenko, I. Modeling protein-small molecule conformational ensembles with ChemNet. Preprint at https://doi.org/10.1101/2024.09.25.614868. 2024.
    64. 64
      Lauko, A.; Pellock, S. J.; Sumida, K. H. Computational design of serine hydrolases. Science 2025, 388, eadu2454  DOI: 10.1126/science.adu2454
    65. 65
      Park, H.; Zhou, G.; Baek, M.; Baker, D.; DiMaio, F. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein–Ligand Docking. J. Chem. Theory Comput. 2021, 17, 2000–2010,  DOI: 10.1021/acs.jctc.0c01184
    66. 66
      Garcia, M.; Dixit, S. M.; Rocklin, G. J. Evaluating zero-shot prediction of protein design success by AlphaFold, ESMFold, and ProteinMPNN.
    67. 67
      Kong, Z. ProtFlow: Flow Matching-based Protein Sequence Design with Comprehensive Protein Semantic Distribution Learning and High-quality Generation.
    68. 68
      Elnaggar, A.; Heinzinger, M.; Dallago, C. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112–7127,  DOI: 10.1109/TPAMI.2021.3095381
    69. 69
      Madani, A. ProGen: Language Modeling for Protein Generation, arXiv:2004.03497. arXiv.org e-Print archive. https://arxiv.org/abs/2004.03497. 2020.
    70. 70
      Nijkamp, E.; Ruffolo, J. A.; Weinstein, E. N.; Naik, N.; Madani, A. ProGen2: Exploring the boundaries of protein language models. Cell Syst. 2023, 14, 968–978.e3,  DOI: 10.1016/j.cels.2023.10.002
    71. 71
      Ferruz, N.; Schmidt, S.; Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 2022, 13, 4348  DOI: 10.1038/s41467-022-32007-7
    72. 72
      Nguyen, E.; Poli, M.; Durrant, M. G. Sequence modeling and design from molecular to genome scale with Evo. Science 2024, 386, eado9336  DOI: 10.1126/science.ado9336
    73. 73
      Dalla-Torre, H.; Gonzalez, L.; Mendoza-Revilla, J. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 2025, 22, 287–297,  DOI: 10.1038/s41592-024-02523-z
    74. 74
      Avsec, Ž.; Latysheva, N.; Cheng, J. Advancing regulatory variant effect prediction with AlphaGenome. Nature 2026, 649, 1206–1218,  DOI: 10.1038/s41586-025-10014-0
    75. 75
      Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction.
    76. 76
      Chai Discovery. Chai-1: Decoding the molecular interactions of life. Preprint at https://doi.org/10.1101/2024.10.10.615955. 2024.
    77. 77
      Ingraham, J. B.; Baranov, M.; Costello, Z. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070–1078,  DOI: 10.1038/s41586-023-06728-8
    78. 78
      Mille-Fragoso, L. S. Efficient generation of epitope-targeted de novo antibodies with Germinal.
    79. 79
      Pacesa, M.; Nickel, L.; Schellhaas, C. One-shot design of functional protein binders with BindCraft. Nature 2025, 646, 483–492,  DOI: 10.1038/s41586-025-09429-6
    80. 80
      BoltzGen: Toward Universal Binder Design.
    81. 81
      Zhang, O. ODesign: A World Model for Biomolecular Interaction Design, arXiv:2510.22304. arXiv.org e-Print archive. https://arxiv.org/abs/2510.22304. 2025.
    82. 82
      Parks, M. Blind Virtual Screening at Scale: A Scalable End-to-End Pipeline for Blind Docking and Affinity Prediction.
    83. 83
      John, P. S. BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery, arXiv:2411.10548. arXiv.org e-Print archive. https://arxiv.org/abs/2411.10548. 2025.
    84. 84
      Silke, D.; Iskander, J.; Pan, J. ProteinDJ: A high-performance and modular protein design pipeline. Protein Sci. 2026, 35, e70464  DOI: 10.1002/pro.70464
    85. 85
      González-Rodríguez, N.; Chacón-Sánchez, C.; Llorca, O.; Fernández-Leiro, R. Automated and modular protein binder design with BinderFlow. PLOS Comput. Biol. 2025, 21, e1013747  DOI: 10.1371/journal.pcbi.1013747
    86. 86
      Danny, B. Ovo, an Open-Source Ecosystem for De Novo Protein Design.
    87. 87
      Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 2010, 19, 1817–1819,  DOI: 10.1002/pro.481
    88. 88
      Beadle, B. M.; Shoichet, B. K. Structural Bases of Stability–function Tradeoffs in Enzymes. J. Mol. Biol. 2002, 321, 285–296,  DOI: 10.1016/S0022-2836(02)00599-5
    89. 89
      Barlow, K. A.; Conchúir, S. Ó.; Thompson, S. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein–Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389–5399,  DOI: 10.1021/acs.jpcb.7b11367
    90. 90
      Shringari, S. R.; Giannakoulias, S.; Ferrie, J. J.; Petersson, E. J. Rosetta custom score functions accurately predict ΔΔG of mutations at protein–protein interfaces using machine learning. Chem. Commun. 2020, 56, 6774–6777,  DOI: 10.1039/D0CC01959C
    91. 91
      Smith, S. T.; Meiler, J. Assessing multiple score functions in Rosetta for drug discovery. PLoS One 2020, 15, e0240450  DOI: 10.1371/journal.pone.0240450
    92. 92
      Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031–3048,  DOI: 10.1021/acs.jctc.7b00125
    93. 93
      Tyka, M. D.; Keedy, D. A.; André, I. Alternate States of Proteins Revealed by Detailed Energy Landscape Mapping. J. Mol. Biol. 2011, 405, 607–618,  DOI: 10.1016/j.jmb.2010.11.008
    94. 94
      Planas-Iglesias, J.; Marques, S. M.; Pinto, G. P. Computational design of enzymes for biotechnological applications. Biotechnol. Adv. 2021, 47, 107696  DOI: 10.1016/j.biotechadv.2021.107696
    95. 95
      Guo, H.-B.; Perminov, A.; Bekele, S. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci. Rep. 2022, 12, 10696  DOI: 10.1038/s41598-022-14382-9
    96. 96
      Agarwal, V.; McShan, A. C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024, 20, 950–959,  DOI: 10.1038/s41589-024-01638-w
    97. 97
      Abramson, J.; Adler, J.; Dunger, J. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500,  DOI: 10.1038/s41586-024-07487-w
    98. 98
      Friedland, G. D.; Linares, A. J.; Smith, C. A.; Kortemme, T. A Simple Model of Backbone Flexibility Improves Modeling of Side-chain Conformational Variability. J. Mol. Biol. 2008, 380, 757–774,  DOI: 10.1016/j.jmb.2008.05.006
    99. 99
      Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins: Struct., Funct., Bioinf. 2011, 79, 830–838,  DOI: 10.1002/prot.22921
    100. 100
      Durham, E.; Dorr, B.; Woetzel, N.; Staritzbichler, R.; Meiler, J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J. Mol. Model. 2009, 15, 1093–1108,  DOI: 10.1007/s00894-009-0454-9
    101. 101
      Bertalan, É.; Lešnik, S.; Bren, U.; Bondar, A.-N. Protein-water hydrogen-bond networks of G protein-coupled receptors: Graph-based analyses of static structures and molecular dynamics. J. Struct. Biol. 2020, 212, 107634  DOI: 10.1016/j.jsb.2020.107634
    102. 102
      Weiss, G. A.; Watanabe, C. K.; Zhong, A.; Goddard, A.; Sidhu, S. S. Rapid mapping of protein functional epitopes by combinatorial alanine scanning. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 8950–8954,  DOI: 10.1073/pnas.160252097
    103. 103
      Cunningham, B. C.; Wells, J. A. High-Resolution Epitope Mapping of hGH-Receptor Interactions by Alanine-Scanning Mutagenesis. Science 1989, 244, 1081–1085,  DOI: 10.1126/science.2471267
    104. 104
      Liu, H.; Song, L.; Meng, X. Proline-Mediated Enhancement in Evolvability of Disulfide-Rich Peptides for Discovering Protein Binders. J. Am. Chem. Soc. 2025, 147, 24870–24883,  DOI: 10.1021/jacs.5c07075
    105. 105
      Holden, J. K.; Pavlovicz, R.; Gobbi, A.; Song, Y.; Cunningham, C. N. Computational Site Saturation Mutagenesis of Canonical and Non-Canonical Amino Acids to Probe Protein-Peptide Interactions. Front. Mol. Biosci. 2022, 9, 848689  DOI: 10.3389/fmolb.2022.848689
    106. 106
      Spina, S. C.; Bailey, J.; Kimmel, B. Bind, catalyze, and quantify: a modern protein and enzyme engineering toolbox of genetically encoded non-canonical amino acids. Protein Eng. Des. Sel. 2026, gzag007  DOI: 10.1093/protein/gzag007
    107. 107
      Chen, Y.; Clay, N.; Phan, N. Molecular Matchmakers: Bioconjugation Techniques Enhance Prodrug Potency for Immunotherapy. Mol. Pharmaceutics 2025, 22, 58–80,  DOI: 10.1021/acs.molpharmaceut.4c00867
    108. 108
      Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind Database: Collection of Binding Affinities for Protein–Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem. 2004, 47, 2977–2980,  DOI: 10.1021/jm030580l
    109. 109
      Liu, Z.; Su, M.; Han, L. Forging the Basis for Developing Protein–Ligand Interaction Scoring Functions. Acc. Chem. Res. 2017, 50, 302–309,  DOI: 10.1021/acs.accounts.6b00491
    110. 110
      King, B. R.; Sumida, K. H.; Caruso, J. L.; Baker, D.; Zalatan, J. G. Computational Stabilization of a Non-Heme Iron Enzyme Enables Efficient Evolution of New Function. Angew. Chem., Int. Ed. 2025, 64, e202414705  DOI: 10.1002/anie.202414705
    111. 111
      Howlader, M. T. H.; Kagawa, Y.; Miyakawa, A. Alanine Scanning Analyses of the Three Major Loops in Domain II of Bacillus thuringiensis Mosquitocidal Toxin Cry4Aa. Appl. Environ. Microbiol. 2010, 76, 860–865,  DOI: 10.1128/AEM.02175-09
    112. 112
      Paul, R.; Kasahara, K.; Sasaki, J. Unveiling the affinity–stability relationship in anti-measles virus antibodies: a computational approach for hotspots prediction. Front. Mol. Biosci. 2024, 10, 1302737  DOI: 10.3389/fmolb.2023.1302737
    113. 113
      Romero-Rivera, A.; Garcia-Borràs, M.; Osuna, S. Role of Conformational Dynamics in the Evolution of Retro-Aldolase Activity. ACS Catal. 2017, 7, 8524–8532,  DOI: 10.1021/acscatal.7b02954
    114. 114
      Lemkul, J. A. Introductory Tutorials for Simulating Protein Dynamics with GROMACS. J. Phys. Chem. B 2024, 128, 9418–9435,  DOI: 10.1021/acs.jpcb.4c04901
    115. 115
      Sanbonmatsu, K. Y.; Joseph, S.; Tung, C.-S. Simulating movement of tRNA into the ribosome during decoding. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 15854–15859,  DOI: 10.1073/pnas.0503456102
    116. 116
      Li, R.; Macnamara, L.; Leuchter, J.; Alexander, R.; Cho, S. MD Simulations of tRNA and Aminoacyl-tRNA Synthetases: Dynamics, Folding, Binding, and Allostery. Int. J. Mol. Sci. 2015, 16, 15872–15902,  DOI: 10.3390/ijms160715872
    117. 117
      Patel, S.; Hosur, R. V. Replica exchange molecular dynamics simulations reveal self-association sites in M-Crystallin caused by mutations provide insights of cataract. Sci. Rep. 2021, 11, 23270  DOI: 10.1038/s41598-021-02728-8
    118. 118
      Stelzl, L. S.; Hummer, G. Kinetics from Replica Exchange Molecular Dynamics Simulations. J. Chem. Theory Comput. 2017, 13, 3927–3935,  DOI: 10.1021/acs.jctc.7b00372
    119. 119
      Feig, M.; Nawrocki, G.; Yu, I.; Wang, P.; Sugita, Y. Challenges and opportunities in connecting simulations with experiments via molecular dynamics of cellular environments. J. Phys. Conf. Ser. 2018, 1036, 012010  DOI: 10.1088/1742-6596/1036/1/012010
    120. 120
      Kumari, I.; Sandhu, P.; Ahmed, M.; Akhter, Y. Molecular Dynamics Simulations, Challenges and Opportunities: A Biologist’s Prospective. Curr. Protein Pept. Sci. 2017, 18, 1163–1179,  DOI: 10.2174/1389203718666170622074741
    121. 121
      Senn, H. M.; Thiel, W. QM/MM Methods for Biomolecular Systems. Angew. Chem., Int. Ed. 2009, 48, 1198–1229,  DOI: 10.1002/anie.200802019
    122. 122
      Lopes, P. E. M.; Guvench, O.; MacKerell, A. D. Current Status of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Springer: New York, NY, 2015; Vol. 1215, pp 47–71.
    123. 123
      McMillin, D. R. Interatomic Repulsion and the Pauli Principle. J. Chem. Educ. 2021, 98, 2912–2918,  DOI: 10.1021/acs.jchemed.1c00326
    124. 124
      Guvench, O.; MacKerell, A. D. Comparison of Protein Force Fields for Molecular Dynamics Simulations. In Molecular Modeling of Proteins; Kukol, A., Ed.; Humana Press: Totowa, NJ, 2008; Vol. 443, pp 63–88.
    125. 125
      Warshel, A.; Sharma, P. K.; Kato, M. Electrostatic Basis for Enzyme Catalysis. Chem. Rev. 2006, 106, 3210–3235,  DOI: 10.1021/cr0503106
    126. 126
      Van Der Kamp, M. W.; Mulholland, A. J. Combined Quantum Mechanics/Molecular Mechanics (QM/MM) Methods in Computational Enzymology. Biochemistry 2013, 52, 2708–2728,  DOI: 10.1021/bi400215w
    127. 127
      Magliery, T. J. Protein stability: computation, sequence statistics, and new experimental methods. Curr. Opin. Struct. Biol. 2015, 33, 161–168,  DOI: 10.1016/j.sbi.2015.09.002
    128. 128
      Singh, A.; Upadhyay, V.; Upadhyay, A. K.; Singh, S. M.; Panda, A. K. Protein recovery from inclusion bodies of Escherichia coli using mild solubilization process. Microb. Cell Factories 2015, 14, 41  DOI: 10.1186/s12934-015-0222-8
    129. 129
      Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478–490,  DOI: 10.1016/j.jmb.2014.09.026
    130. 130
      Li, B.; Ming, D. GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling. BMC Bioinf. 2024, 25, 204  DOI: 10.1186/s12859-024-05820-8
    131. 131
      Tan, Y.; Zheng, J.; Hong, L.; Zhou, B. ProtSolM: Protein Solubility Prediction with Multi-modal Features, arXiv:2406.19744. arXiv.org e-Print archive. https://arxiv.org/abs/2406.19744. 2024.
    132. 132
      Ghosh, D.; Biswas, A.; Radhakrishna, M. Advanced computational approaches to understand protein aggregation. Biophys. Rev. 2024, 5, 021302  DOI: 10.1063/5.0180691
    133. 133
      Oeller, M.; Kang, R.; Bell, R. Sequence-based prediction of pH-dependent protein solubility using CamSol. Brief. Bioinform. 2023, 24, bbad004  DOI: 10.1093/bib/bbad004
    134. 134
      Kimmel, B. R.; Mrksich, M. Development of an Enzyme-Inhibitor Reaction Using Cellular Retinoic Acid Binding Protein II for One-Pot Megamolecule Assembly. Chem. - Eur. J. 2021, 27, 17843–17848,  DOI: 10.1002/chem.202103059
    135. 135
      Kimmel, B. R.; Modica, J. A.; Parker, K.; Dravid, V.; Mrksich, M. Solid-Phase Synthesis of Megamolecules. J. Am. Chem. Soc. 2020, 142, 4534–4538,  DOI: 10.1021/jacs.9b12003
    136. 136
      Adomanis, R.; Phan, N.; Walter, G.; Kimmel, B. R. Modular Nanobody Conjugates with Controlled Topology Using Genetically Encoded Non-canonical Amino Acids. Preprint at https://doi.org/10.1101/2025.11.27.691038. 2025.
    137. 137
      Rosace, A.; Bennett, A.; Oeller, M. Automated optimization of solubility and conformational stability of antibodies and proteins. Nat. Commun. 2023, 14, 1937  DOI: 10.1038/s41467-023-37668-6
    138. 138
      Kuriata, A.; Iglesias, V.; Pujols, J. Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res. 2019, 47, W300–W307,  DOI: 10.1093/nar/gkz321
    139. 139
      Hirsch, M.; Desai, R. R.; Annaswamy, S.; Keatinge-Clay, A. T. Mutagenesis Supports AlphaFold Prediction of How Modular Polyketide Synthase Acyl Carrier Proteins Dock With Downstream Ketosynthases. Proteins: Struct., Funct., Bioinf. 2024, 92, 1375–1384,  DOI: 10.1002/prot.26733
    140. 140
      Araki, M.; Ekimoto, T.; Takemura, K. Molecular Dynamics Unveils Multiple-Site Binding of Inhibitors with Reduced Activity on the Surface of Dihydrofolate Reductase. J. Am. Chem. Soc. 2024, 146, 28685–28695,  DOI: 10.1021/jacs.4c04648
    141. 141
      Pimtawong, T.; Ren, J.; Lee, J.; Lee, H.-M.; Na, D. A review on computational models for predicting protein solubility. J. Microbiol. 2025, 63, 2408001  DOI: 10.71150/jm.2408001
    142. 142
      Navarro, S.; Ventura, S. Computational methods to predict protein aggregation. Curr. Opin. Struct. Biol. 2022, 73, 102343  DOI: 10.1016/j.sbi.2022.102343
    143. 143
      Prediction and Evaluation of Protein Aggregation with Computational Methods. In Methods in Molecular Biology; Springer US: New York, NY, 2025; pp 299–314  DOI: 10.1007/978-1-0716-4196-5_17
    144. 144
      Santos, J.; Pujols, J.; Pallarès, I.; Iglesias, V.; Ventura, S. Computational prediction of protein aggregation: Advances in proteomics, conformation-specific algorithms and biotechnological applications. Comput. Struct. Biotechnol. J. 2020, 18, 1403–1413,  DOI: 10.1016/j.csbj.2020.05.026
    145. 145
      Arnold, F. H. Design by Directed Evolution. Acc. Chem. Res. 1998, 31, 125–131,  DOI: 10.1021/ar960017f
    146. 146
      Arnold, F. H. Directed evolution: Creating biocatalysts for the future. Chem. Eng. Sci. 1996, 51, 5091–5102,  DOI: 10.1016/S0009-2509(96)00288-6
    147. 147
      Cobb, R. E.; Chao, R.; Zhao, H. Directed evolution: Past, present, and future. AIChE J. 2013, 59, 1432–1440,  DOI: 10.1002/aic.13995
    148. 148
      Yang, J.; Lal, R. G.; Bowden, J. C. Active learning-assisted directed evolution. Nat. Commun. 2025, 16, 714  DOI: 10.1038/s41467-025-55987-8
    149. 149
      Terashi, G.; Wang, X.; Maddhuri Venkata Subramaniya, S. R.; Tesmer, J. J. G.; Kihara, D. Residue-wise local quality estimation for protein models from cryo-EM maps. Nat. Methods 2022, 19, 1116–1125,  DOI: 10.1038/s41592-022-01574-4
    150. 150
      Graille, M.; Sacquin-Mora, S.; Taly, A. Best Practices of Using AI-Based Models in Crystallography and Their Impact in Structural Biology. J. Chem. Inf. Model. 2023, 63, 3637–3646,  DOI: 10.1021/acs.jcim.3c00381
    151. 151
      Wang, X.; Zhu, H.; Terashi, G.; Taluja, M.; Kihara, D. DiffModeler: large macromolecular structure modeling for cryo-EM maps using a diffusion model. Nat. Methods 2024, 21, 2307–2317,  DOI: 10.1038/s41592-024-02479-0
    152. 152
      Serapian, S. A.; Crosby, J.; Crump, M. P.; Van Der Kamp, M. W. Path to Actinorhodin: Regio- and Stereoselective Ketone Reduction by a Type II Polyketide Ketoreductase Revealed in Atomistic Detail. JACS Au 2022, 2, 972–984,  DOI: 10.1021/jacsau.2c00086
    153. 153
      Shukla, V. K.; Karunanithy, G.; Vallurupalli, P.; Hansen, D. F. A combined NMR and deep neural network approach for enhancing the spectral resolution of aromatic side chains in proteins. Sci. Adv. 2024, 10, eadr2155  DOI: 10.1126/sciadv.adr2155
    154. 154
      Drake, Z. C.; Fowler, A. G.; Blum, A. A.; Lindert, S. Enhanced Protein Complex Prediction via Rosetta, AlphaFold, and Nondifferential Covalent Labeling Mass Spectrometry. J. Phys. Chem. B 2025, 129, 6489–6497,  DOI: 10.1021/acs.jpcb.5c02872
    155. 155
      Lee, C. Y.; Hubrich, D.; Varga, J. K. Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation. Mol. Syst. Biol. 2024, 20, 75–97,  DOI: 10.1038/s44320-023-00005-6
    156. 156
      Koehler Leman, J.; Künze, G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int. J. Mol. Sci. 2023, 24, 7835  DOI: 10.3390/ijms24097835
    157. 157
      Alshammari, M.; Wriggers, W.; Sun, J.; He, J. Refinement of AlphaFold2 models against experimental and hybrid cryo-EM density maps. QRB Discovery 2022, 3, e16  DOI: 10.1017/qrd.2022.13
    158. 158
      Humphreys, I. R.; Pei, J.; Baek, M. Computed structures of core eukaryotic protein complexes. Science 2021, 374, eabm4805  DOI: 10.1126/science.abm4805
    159. 159
      Bordin, N.; Sillitoe, I.; Nallapareddy, V. AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun. Biol. 2023, 6, 160  DOI: 10.1038/s42003-023-04488-9
    160. 160
      Wang, H.; Wang, J. How cryo-electron microscopy and X-ray crystallography complement each other. Protein Sci. 2017, 26, 32–39,  DOI: 10.1002/pro.3022