Molecular Representations for Machine Learning

Author(s):
Publication Date:
May 19, 2023
Copyright © 2023 American Chemical Society
eISBN:
9780841299788
DOI:
10.1021/acsinfocus.7e7006
Read Time:
five to six hours
Collection:
2
Publisher:
American Chemical Society

This primer introduces the basic categories of molecular representations and provides computational tools for generating molecular descriptors in each category. After reading it, you will be able to use a range of methods to generate machine- and/or human-interpretable representations of molecular systems, either as inputs to machine learning models or for general chemical data science applications.

Detailed Table of Contents
About the Series
Preface
Chapter 1: Introduction to Molecular Representations
1.1 Preface
1.2 Historical Background
1.2.1 The Atom Theory
1.2.2 How to Represent a Molecule?
1.3 Motivation
1.4 Properties of Molecular Representations
1.5 Conclusions
1.6 That’s a Wrap
1.7 Read These Next
1.8 Insider Q&A: Guowei Wei
Chapter 2: Graph-Based Representations
2.1 Introduction
2.2 What Is a Molecular Graph
2.3 Graphs and Matrices
2.3.1 Adjacency Matrix
2.3.2 Distance Matrix
2.3.3 Weighted Graphs
2.4 Topological Indices
2.4.1 The Wiener Index and the Hyper-Wiener Index
2.4.2 The Randić Index
2.4.3 Zagreb Indices
2.4.4 Other Common Topological Indices
2.5 Autocorrelation Functions
2.6 Structural Keys
2.6.1 Circular Fingerprints
2.6.2 Molecular ACCess Systems Fingerprint (MACCS)
2.7 SMILES Notation and Its Variants
2.7.1 SMILES
2.7.2 Popular Variants of SMILES
2.8 International Chemical Identifier (InChI)
2.9 Conclusions
2.10 That’s a Wrap
2.11 Read These Next
2.12 Insider Q&A: Guowei Wei
Chapter 3: Topology-Based Representations
3.1 Introduction
3.2 Simplicial Complexes
3.3 Persistent Homology
3.4 Capturing Persistent Homology
3.5 Comparing Persistent Homology
3.6 Persistent Homology and Machine Learning
3.6.1 Persistence Images
3.6.2 Chemically Driven Persistence Images
3.7 Conclusions
3.8 That’s a Wrap
3.9 Read These Next
3.10 Insider Q&A: Henry Adams
3.11 Insider Q&A: Gunnar Carlsson
3.12 Insider Q&A: Guowei Wei
Chapter 4: Physics-Based Representations
4.1 Introduction
4.2 Coulomb Matrices (CMs) and Derivatives
4.2.1 Coulomb Matrices (CMs)
4.2.2 Bag of Bonds (BoBs)
4.2.3 Many-Body Tensor Representation (MBTR)
4.3 Atom-Centered Symmetry Functions
4.3.1 Behler–Parrinello Atom-Centered Symmetry Functions (ACSFs)
4.3.2 Faber–Christensen–Huang–Lilienfeld (FCHL)
4.3.3 Smooth Overlap of Atomic Positions (SOAPs)
4.3.4 Permutationally Invariant Potentials (PIPs)
4.3.5 SchNet
4.4 Ab Initio Representations
4.4.1 Molecular Orbital Based-ML (MOB-ML)
4.4.2 Representations for Data-Driven Quantum Chemistry (DDQC)
4.5 Conclusions
4.6 That’s a Wrap
4.7 Read These Next
Bibliography
Footnotes
Glossary
Index
Reviewer quotes
Laura Weiler, Graduate Student, Stanford University
This text provides useful guidance for navigating the large space of possible molecular representations.
Author Info
Grier M. Jones
Grier M. Jones is a Ph.D. graduate student in the Department of Chemistry at the University of Tennessee, Knoxville. In 2018, he received his B.S. in chemistry from the College of Charleston. His current work is at the intersection of chemistry and data science, with projects related to data-driven quantum chemistry schemes and the application of topological data science in chemistry.
Brittany Story
Brittany Story is a postdoctoral research associate in the National Institute for Mathematical and Biological Synthesis (NIMBioS) at the University of Tennessee, Knoxville. She obtained a Bachelor of Science in Mathematics and a Bachelor of Science Education in Mathematics from Northern Arizona University in 2017. She earned her M.S. in mathematics in 2019 and her Ph.D. in mathematics in 2022 from Colorado State University. Dr. Story’s current work applies tools from topological data analysis across a range of areas, including chemistry, human-machine teaming, and machine learning.
Vasileios Maroulas
Vasileios Maroulas is a Professor of Mathematics at the University of Tennessee, Knoxville. He also holds adjunct appointments in Business Analytics and Statistics at the Haslam College of Business and in the Bredesen Center’s Data Science Engineering. He is an Elected Member of the International Statistical Institute, an Editor-in-Chief of AIMS Foundations of Data Science, and an Editor of Springer Nature Statistics and Computing. He served as a Senior Research Fellow at the US Army Research Lab from 2019 to 2021 and as a Visiting Leverhulme Trust Fellow at the University of Bath in the UK from 2013 to 2014. After earning his Ph.D. from the Statistics Department at the University of North Carolina at Chapel Hill in 2008, he spent two years as a Lockheed Martin Postdoctoral Fellow at the Institute for Mathematics and its Applications (IMA) at the University of Minnesota before joining UTK as an Assistant Professor in 2010.
Konstantinos D. Vogiatzis
Konstantinos D. Vogiatzis is an Associate Professor in the Department of Chemistry at the University of Tennessee. He received his bachelor’s degree in chemistry from the University of Athens, Greece, in 2006. In 2008, he obtained his M.Sc. in Applied Molecular Spectroscopy from the University of Crete, Greece, and he received his Ph.D. in 2012 from the Karlsruhe Institute of Technology, Germany. He held postdoctoral appointments at the Institute of Nanotechnology of the Karlsruhe Institute of Technology and at the University of Minnesota. In 2016, Dr. Vogiatzis joined the University of Tennessee, Knoxville, as an Assistant Professor of theoretical and computational chemistry, and in 2021 he received early tenure and was promoted to Associate Professor. Dr. Vogiatzis is the recipient of the ACS OpenEye Outstanding Junior Faculty Award for Spring 2021, the 2022 NSF CAREER award, and the 2022 Ffrancon Williams Endowed Faculty Award in Chemistry.