Data-Driven Learning Systems for Chemical Reaction Prediction: An Analysis of Recent Approaches
- Philippe Schwaller (E-mail: [email protected]; IBM Research – Zurich, Rueschlikon 8803, Switzerland; Department of Chemistry and Biochemistry, University of Berne, Berne 3012, Switzerland)
- Teodoro Laino
Abstract
One of the critical challenges in efficient synthesis route design is the accurate prediction of chemical reactivity. Solving it could significantly facilitate chemical synthesis and hence accelerate the discovery of novel molecules and materials. With the current rise of artificial intelligence (AI) algorithms, access to cheap computing power, and the wide availability of chemical data, it has become possible to develop entirely data-driven mathematical models able to predict chemical reactivity. Much as a human chemist learns chemical reactions, these models learn the underlying patterns in the data by repeatedly looking at examples. In this chapter, we compare the state-of-the-art data-driven learning systems for forward chemical reaction prediction, analyzing the reaction representations, the data, and the model architectures. We discuss the advantages and limitations of the different AI model strategies and compare them on standard open-source benchmark datasets. The intention is to provide a critical assessment of the recently developed data-driven approaches, not only for the cheminformatics community but also for the end users of the AI models, the organic chemists, and to support early adoption of such technologies.


