Preface

  • Edward O. Pyzer-Knapp
    IBM Research—UK Daresbury, UK
  • Teodoro Laino
    IBM Research—Zurich Rueschlikon, Switzerland
DOI: 10.1021/bk-2019-1326.pr001
Publication Date (Web): November 20, 2019
Copyright © 2019 American Chemical Society. This publication is available under these Terms of Use.
Machine Learning in Chemistry: Data-Driven Algorithms, Learning Systems, and Predictions
pp ix-x
ACS Symposium Series Vol. 1326
ISBN13: 9780841235052
eISBN: 9780841235045

Recent prominent successes in areas such as natural language processing and voice and image analysis have revived the excitement around machine learning. These successes were enabled by growth in accessible computational power, driven by accelerators such as the GPU, and by access to larger and more complex datasets.

It is inevitable that these developments have affected disciplines outside of computer science, and one of the front-runners has been the field of chemistry and chemical discovery. Following several successful applications, machine learning, and more generally the field of artificial intelligence (AI), is increasingly considered the answer for accelerating the development of new molecules, materials, formulations, and processes. By reasoning over large volumes of data, practitioners can gain insights that would otherwise remain hidden, reducing the number of trial-and-error experiments and thereby saving time and money.

Machine learning and artificial intelligence offer the possibility of training computers, using the properties of materials that we already know, to describe and reason about complex physical systems without the need for an analytical representation. Recently, we have seen an explosion in the range of applications of machine learning in chemistry, including areas such as quantitative structure-activity relationship (QSAR) modeling, chemical reaction prediction, protein structure prediction, quantum chemistry, and inverse materials design. Given the great amount of information contained within chemical databases arising from research and industry, machine learning can help ensure that useful, but often hidden, information in the data is interpreted effectively and utilized to its fullest potential. When carefully applied, this has the potential to drive a paradigm shift in research by finding trends that a human researcher may miss due to bias towards a given interpretation.

If machine learning and AI are the vehicles for such a transformation, then data is the fuel. Key drivers of the application of these approaches in chemistry are the growth of open databases and the quality of the data now being recorded, enabled by the introduction of automated data collection systems in the lab. To labour the metaphor, all the fuel in the world is useless without a good engine to convert it into useful energy. In machine learning, this role is played by the data representation (also known as descriptors). Recently, we have seen the development of a large number of molecular descriptors, some of which are learned directly from the data, that combine chemical knowledge and domain expertise to represent the complexity of chemistry in a way that is understandable to an AI system.
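
To make the idea of a descriptor concrete, here is a minimal sketch of perhaps the simplest possible one: a fixed-length vector of element counts built from a molecular formula. The function name `composition_descriptor` and the four-element vocabulary are assumptions made purely for illustration; this is not one of the descriptors discussed in this book.

```python
import re

# Assumed, illustrative vocabulary; a real descriptor would cover far more elements.
ELEMENTS = ["H", "C", "N", "O"]

def composition_descriptor(formula, elements=ELEMENTS):
    """Map a simple formula such as 'C2H6O' to a vector of element counts."""
    counts = dict.fromkeys(elements, 0)
    # Each match is (element symbol, optional count), e.g. ('C', '2') or ('O', '').
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in counts:
            raise ValueError(f"element {symbol!r} is not in the vocabulary")
        counts[symbol] += int(number) if number else 1
    # The ordering follows `elements`, so every molecule maps to the same layout.
    return [counts[e] for e in elements]

water = composition_descriptor("H2O")
ethanol = composition_descriptor("C2H6O")
```

Note that ethanol and dimethyl ether share the formula C2H6O and therefore receive identical vectors under this scheme, which is one reason richer, structure-aware descriptors are needed to capture the complexity of chemistry.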

We are experiencing a paradigm change in how chemists will do research in the future: the use of AI and machine learning in chemistry is evolving from a somewhat isolated area of research into an integral part of the scientific method. Intelligent or cognitive modeling will enable the creation of tools that can be easily integrated into laboratory equipment. Because they can be embedded in optimized software that carries out many tasks, these algorithms can be coupled with data-generating systems and directly provide outputs for many purposes in the chemical fields. In addition, the intrinsic nature of computational artificial intelligence allows models to be updated and refreshed as new data is generated, leading to more robust tools that cover larger windows of operation and eliminate negative confounding factors.

Here, in six chapters, we review cutting-edge applications of AI in chemistry and materials science, without professing completeness. In these pages we try to capture the excitement of a few selected contributions that show different domains of applicability of AI across the spectrum of chemical research.

The chapters are organized to progress from methodological contributions connected to data representation to high-level overviews highlighting industrial impact. In Chapter 1, we present work demonstrating the importance of incorporating three-dimensional symmetries in Gaussian process regression models (statistical learning models) geared towards the interpolation of the tensorial properties of atomic-scale structures. As an example of the use of machine, or statistical, learning approaches to predict material properties, we present in Chapter 2 work focusing on the inference of hardness in naturally occurring ceramic materials, which integrates atomic and electronic features taken directly from composition across a wide variety of mineral compositions and crystal systems.

Machine learning is not confined to providing high-quality surrogate models, however. In Chapter 3, we review high-dimensional neural network potentials (HDNNPs) for materials simulations. These neural networks provide a general-purpose reactive potential method that can be used for simulations of an arbitrary number of atoms and can describe all types of chemical interactions (e.g., covalent, metallic, and dispersion), including the breaking and forming of chemical bonds.
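
The claim that such potentials handle an arbitrary number of atoms can be illustrated with a toy sketch. It assumes the commonly described construction in which the total energy is a sum of per-atom contributions, each computed from a descriptor of that atom's local environment. Everything below (the function names, the single radial symmetry function, the cutoff value, and the hard-coded weights standing in for a trained atomic network) is hypothetical and for illustration only; it is not the method reviewed in Chapter 3.

```python
import math

CUTOFF = 6.0  # assumed cutoff radius, arbitrary length units

def cutoff_fn(r, r_c=CUTOFF):
    """Smoothly switch off the contribution of neighbours beyond r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_descriptor(i, positions, eta=0.5, r_s=0.0):
    """Radial symmetry function for atom i: a sum over all of its neighbours."""
    total = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos_j)
        total += math.exp(-eta * (r - r_s) ** 2) * cutoff_fn(r)
    return total

def atomic_energy(g, w1=0.8, b1=-0.1, w2=-1.5, b2=0.2):
    """Toy 'atomic network': one hidden tanh unit with made-up weights."""
    return w2 * math.tanh(w1 * g + b1) + b2

def total_energy(positions):
    """Total energy as a sum of atomic contributions, for any number of atoms."""
    return sum(
        atomic_energy(radial_descriptor(i, positions))
        for i in range(len(positions))
    )

# Hypothetical three-atom geometry; the same function accepts any atom count.
three_atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
e_three = total_energy(three_atoms)
e_four = total_energy(three_atoms + [(3.0, 3.0, 3.0)])
```

Because the energy is a sum over atoms and each descriptor is a sum over neighbours, the result does not depend on the order in which atoms are listed, and adding atoms requires no change to the model.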

An account of the impact of AI in chemistry would not be complete without looking into applications of neural networks and machine learning in chemoinformatics. In Chapter 4, we present a review of the state of the art in data-driven learning systems for forward chemical reaction prediction, analyzing the reaction representations, the data, and the model architectures. The chapter discusses the advantages and limitations of the different AI modeling strategies and compares them on standard open-source benchmark datasets. Chapter 5 shows how AI and machine learning are impacting the pharmaceutical industry, one of the first chemical industries to embrace these techniques. Here, the author provides an overview of how methods and models are conceived, built, and validated, and how their benefits are quantified.

Finally, in Chapter 6, we present a new way of doing materials discovery that integrates natural language processing, knowledge representation, and automated reasoning. The authors describe how this revolution will take chemical R&D as a whole from the current “4th paradigm” of discovery, driven by data science and machine learning, to a “5th paradigm” era in which cognitive systems seamlessly integrate information from human experts, experimental data, physics-based models, and data-driven models to speed discovery.

We hope that this book will benefit graduate students and researchers in chemistry, computer scientists interested in applications of AI and machine learning to chemistry, and scientists who want to understand what AI and machine learning can offer chemistry in settings ranging from universities to industrial companies.
