
Machine Learning in Chemistry: Data-Driven Algorithms, Learning Systems, and Predictions
New Possibilities for Artificial Intelligence in Chemical Research. Artificial intelligence, and especially its application to chemistry, is an exciting and rapidly expanding area of research. This volume presents groundbreaking work in the field, both to facilitate researcher engagement and to serve as a solid base from which new researchers can enter it. This interdisciplinary volume will be a valuable tool for those working in cheminformatics, physical chemistry, and computational chemistry.
Title, Copyright, Foreword
This publication is free to access through this site. Learn More
Preface
Edward O. Pyzer-Knapp and Teodoro Laino
Atomic-Scale Representation and Statistical Learning of Tensorial Properties
Andrea Grisafi, David M. Wilkins, Michael J. Willatt, and Michele Ceriotti
This chapter discusses the importance of incorporating three-dimensional symmetries in the context of statistical learning models geared towards the interpolation of the tensorial properties of atomic-scale structures. We focus on Gaussian process regression, and in particular on the construction of structural representations, and the associated kernel functions, that are endowed with the geometric covariance properties compatible with those of the learning targets. We summarize the general formulation of such a symmetry-adapted Gaussian process regression model, and how it can be implemented based on a scheme that generalizes the popular smooth overlap of atomic positions representation. We give examples of the performance of this framework when learning the polarizability, the hyperpolarizability, and the ground-state electron density of a molecule.
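The geometric covariance requirement described above can be illustrated with a small sketch. This is not the chapter's symmetry-adapted Gaussian process regression; it is a hypothetical toy model (`toy_tensor`, with an assumed Gaussian pair weight and made-up coordinates) that is covariant by construction. It only checks the property a tensorial model is expected to satisfy: rotating the structure by R should transform a rank-2 prediction T into R T Rᵀ.

```python
import math

def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def toy_tensor(positions):
    """Hypothetical covariant model: sum over atom pairs of w(r) * (d outer d).

    d is the interatomic vector and w(r) = exp(-r^2) is an assumed weight
    that depends only on the distance, so the result rotates with the atoms.
    """
    tensor = [[0.0] * 3 for _ in range(3)]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            weight = math.exp(-sum(x * x for x in d))
            for a in range(3):
                for b in range(3):
                    tensor[a][b] += weight * d[a] * d[b]
    return tensor

# Made-up three-atom geometry, for illustration only.
positions = [[0.0, 0.0, 0.1], [0.96, 0.0, -0.2], [-0.24, 0.93, 0.3]]
R = rotation_z(0.7)

# Route 1: rotate the structure, then predict.
tensor_of_rotated = toy_tensor([matvec(R, p) for p in positions])
# Route 2: predict, then rotate the tensor as R T R^T.
rotated_tensor = matmul(matmul(R, toy_tensor(positions)), transpose(R))

max_deviation = max(abs(tensor_of_rotated[a][b] - rotated_tensor[a][b])
                    for a in range(3) for b in range(3))
print(f"max deviation between the two routes: {max_deviation:.2e}")
```

Both routes agree to floating-point precision because the weight depends only on interatomic distances. A learned model needs a representation and kernel with the same property to reproduce this behavior.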
Prediction of Mohs Hardness with Machine Learning Methods Using Compositional Features
Joy C. Garnett
Hardness, the quantitative measure of resistance to permanent or plastic deformation, plays a crucial role in materials design for many applications, such as ceramic coatings and abrasives. Hardness testing is an especially useful method because it is nondestructive, simple to implement, and gauges the plastic properties of a material. In this study, I proposed a machine, or statistical, learning approach to predict hardness in naturally occurring ceramic materials, which integrates atomic and electronic features from composition directly across a wide variety of mineral compositions and crystal systems. First, atomic and electronic features, such as van der Waals and covalent radii and the number of valence electrons, were extracted from composition. The results showed that this proposed method is very promising for predicting Mohs hardness, with F1-scores >0.85. The modeling in this study spanned a larger set of materials and hardness values, which have never been predicted in previous studies. Next, feature importances were used to identify the strongest contributions of these compositional features across multiple regimes of hardness. Finally, the models that were trained on naturally occurring ceramic minerals were applied to synthetic, artificially grown single crystal ceramics.
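For readers unfamiliar with the F1-score quoted above, the following minimal sketch shows how it is computed as the harmonic mean of precision and recall. The binary "hard" versus "soft" labelling and the label lists are assumptions made up for illustration; they are not data or class definitions from the chapter.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = "hard" mineral, 0 = "soft" mineral.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 balances false positives against false negatives, it is less easily inflated by an imbalanced dataset than plain accuracy.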
High-Dimensional Neural Network Potentials for Atomistic Simulations
Matti Hellström and Jörg Behler
Machine-learning methods have become increasingly popular for describing potential energy surfaces for molecular and materials simulations, and they are even beginning to challenge the present-day dominance of force fields for this task. This chapter reviews high-dimensional neural network potentials (HDNNPs), a general-purpose reactive potential method that can be used for simulations of an arbitrary number of atoms, can describe all types of chemical interactions (e.g., covalent, metallic, and dispersion), and includes the breaking and forming of chemical bonds. Before an HDNNP can be applied, it must be parameterized using electronic structure data, and great care must be taken at the parameterization stage to ensure that all pertinent parts of the potential energy surface are adequately covered. Typically, this is done iteratively through the addition of more training data and refitting of parameters. This chapter illustrates these points through the use of two case studies from our recent work for aqueous NaOH solutions and the ZnO/water interface.
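As a rough illustration of why such potentials extend to an arbitrary number of atoms, the sketch below writes the total energy as a sum of per-atom contributions, each computed from a radial descriptor of the atom's local environment. It is a toy under stated assumptions, not the chapter's implementation: there is a single element, one radial symmetry function with a cosine cutoff, and a one-neuron "network" whose weights (`w1`, `b1`, `w2`, `b2`) are arbitrary untrained placeholders.

```python
import math

def cutoff(r, r_c):
    """Cosine cutoff: decays smoothly to zero at the cutoff radius r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0

def radial_descriptor(i, positions, eta=1.0, r_s=0.0, r_c=6.0):
    """Radial symmetry function of atom i: a sum over all of its neighbours."""
    g = 0.0
    for j, neighbour in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], neighbour)
        g += math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)
    return g

def atomic_energy(g, w1=0.8, b1=-0.1, w2=-1.5, b2=0.2):
    """Placeholder one-neuron network; the weights are untrained assumptions."""
    return w2 * math.tanh(w1 * g + b1) + b2

def total_energy(positions):
    """Total energy as a sum of atomic contributions, for any number of atoms."""
    return sum(atomic_energy(radial_descriptor(i, positions))
               for i in range(len(positions)))

# Made-up coordinates, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.2), (0.3, 1.4, -0.5), (2.0, 1.0, 1.0)]
reordered = [atoms[2], atoms[0], atoms[3], atoms[1]]
shifted = [(x + 5.0, y - 3.0, z + 1.5) for x, y, z in atoms]

e_ref = total_energy(atoms)
e_reordered = total_energy(reordered)
e_shifted = total_energy(shifted)
print(e_ref, e_reordered, e_shifted)
```

Because the descriptor depends only on interatomic distances and sums over neighbours, the energy is unchanged when atoms are reordered or the whole structure is translated. In a real parameterization, the placeholder weights would be fitted to electronic structure data.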
Data-Driven Learning Systems for Chemical Reaction Prediction: An Analysis of Recent Approaches
Philippe Schwaller and Teodoro Laino
One of the critical challenges in efficient synthesis route design is the accurate prediction of chemical reactivity. Unlocking it could significantly facilitate chemical synthesis and hence accelerate the discovery of novel molecules and materials. With the current rise of artificial intelligence (AI) algorithms, access to cheap computing power, and the wide availability of chemical data, it has become possible to develop entirely data-driven mathematical models able to predict chemical reactivity. Similar to how a human chemist would learn chemical reactions, these models learn the underlying patterns in the data by repeatedly looking at examples. In this chapter, we compare the state-of-the-art data-driven learning systems for forward chemical reaction prediction, analyzing the reaction representations, the data, and the model architectures. We discuss the advantages and limitations of the different AI model strategies and make comparisons on standard open-source benchmark datasets. The intention is to provide a critical assessment of the different data-driven approaches recently developed, not only for the cheminformatics community but also for the end users of the AI models, the organic chemists, and to support early adoption of such technologies.
Using Machine Learning To Inform Decisions in Drug Discovery: An Industry Perspective
Darren V. S. Green
Modern machine-learning techniques have powered a wave of creative approaches that aim to solve or improve long-standing productivity and attrition problems in drug discovery. While industrial practitioners are keen to embrace new technology, it is important for the community to understand the need to produce actionable decisions for scientists in the field and the implications for how methods and models are conceived, built, and validated, and how their benefits are quantified.
Cognitive Materials Discovery and Onset of the 5th Discovery Paradigm
Dmitry Y. Zubarev and Jed W. Pitera
The discovery of novel materials can generate immense technological, economic, and social benefits. However, discovery efforts are slow, challenging, and expert-intensive. Our thesis is that new capabilities of cognitive computing (particularly natural language processing, knowledge representation, and automated reasoning) are poised to transform the process of materials discovery and take us from our current “4th paradigm” of discovery driven by data science and machine learning to a “5th paradigm” era where cognitive systems seamlessly integrate information from human experts, experimental data, physics-based models, and data-driven models to speed discovery. We discuss the key bottlenecks to discovery that need to be removed to enable this new approach and illustrate progress towards this cognitive future with examples from IBM research efforts as well as the broader literature.
Editors’ Biographies
Subject Index
