Editors' Choice
Perspective

Practically Significant Method Comparison Protocols for Machine Learning in Small Molecule Drug Discovery

Journal of Chemical Information and Modeling

Cite this: J. Chem. Inf. Model. 2025, 65, 18, 9398–9411
https://doi.org/10.1021/acs.jcim.5c01609
Published September 11, 2025
Copyright © 2025 American Chemical Society

Abstract

Machine Learning (ML) methods that relate molecular structure to properties are frequently proposed as in silico surrogates for expensive or time-consuming experiments. In small molecule drug discovery, such methods inform high-stakes decisions like compound synthesis and in vivo studies. This application lies at the intersection of multiple scientific disciplines. When comparing new ML methods to baseline or state-of-the-art approaches, statistically rigorous method comparison protocols and domain-appropriate performance metrics are essential to ensure replicability and ultimately the adoption of ML in small molecule drug discovery. This paper proposes a set of guidelines to incentivize rigorous and domain-appropriate techniques for method comparison tailored to small molecule property modeling. These guidelines, accompanied by annotated examples using open-source software tools, lay a foundation for robust ML benchmarking and thus the development of more impactful methods.

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c01609.

  • Additional best practices for exceptional cases, a detailed description of our cross-validation experiment, background information on statistical testing and performance metrics, and details on a supporting visualization (PDF)

Cited By

This article is cited by 23 publications.
