Atomic-Scale Representation and Statistical Learning of Tensorial Properties

  • Andrea Grisafi
    Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  • David M. Wilkins
    Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  • Michael J. Willatt
    Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  • Michele Ceriotti *
    Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
    *E-mail: [email protected]
DOI: 10.1021/bk-2019-1326.ch001
Publication Date (Web): November 20, 2019
Copyright © 2019 American Chemical Society.
Machine Learning in Chemistry: Data-Driven Algorithms, Learning Systems, and Predictions
Chapter 1, pp 1-21
ACS Symposium Series, Vol. 1326
ISBN13: 9780841235052 | eISBN: 9780841235045


Abstract

This chapter discusses the importance of incorporating three-dimensional symmetries in the context of statistical learning models geared towards the interpolation of the tensorial properties of atomic-scale structures. We focus on Gaussian process regression, and in particular on the construction of structural representations, and the associated kernel functions, that are endowed with the geometric covariance properties compatible with those of the learning targets. We summarize the general formulation of such a symmetry-adapted Gaussian process regression model, and how it can be implemented based on a scheme that generalizes the popular smooth overlap of atomic positions representation. We give examples of the performance of this framework when learning the polarizability, the hyperpolarizability, and the ground-state electron density of a molecule.


Introduction

The purpose of a statistical learning model is the prediction of regression targets by means of simple and easily accessible input parameters (1). In chemistry, physics and materials science, regression targets are usually scalars or tensors, including electronic energies (2, 3, 4, 5), quantum-mechanical forces (6, 7, 8), electronic multipoles (9, 10, 11), response functions and scalar fields like the electron density (12, 13, 14, 15, 16, 17, 18). For ground-state properties, the regression input usually consists of all the information connected with the atomic structure at a given point of the Born-Oppenheimer surface (e.g., nuclear charges and atomic positions). A more or less complex manipulation of these primitive inputs leads to what is usually called a structural descriptor, or representation (Figure 1).

Figure 1. Structural descriptors should identify unequivocally and concisely the geometry and composition of a molecule or condensed phase.
It is widely recognized that an essential ingredient for maximizing the efficiency of machine learning models is to use representations that mirror the properties one wants to predict. Here we discuss an effective approach to build linear regression models for tensors. The notion that the representation should mirror the property means that, when a symmetry operation is applied to an atomic structure, the associated representation should transform in a way that mimics the transformation of the properties of the structure. It should be stressed that it is entirely possible to build a ML model that does not incorporate such transformation properties. The universal symmetries of the property must then be learned by the model through exposure to the data in the training set, making the training process less efficient. A crucial focus of this chapter is therefore the creation of symmetry-adapted representations. Once one has a symmetry-adapted representation at hand, the linear regression model is bound to fulfill the symmetry requirements imposed by the property (19, 20, 21, 22, 23). There is, however, another important consideration when building a model for tensors expressed in terms of a Cartesian reference system. It is well known that any tensor can be decomposed into a set of spherical components that transform independently under rotations (24, 25). Particularly for high-order tensors, this irreducible spherical decomposition simplifies the learning task greatly compared to the Cartesian representation, as we will discuss later on.
The process of symmetry-adapting a representation is general but rather abstract, and for it to be practical one must choose the initial representation with care. For this purpose we use the smooth overlap of atomic positions (SOAP) framework, which is based on the representation of atom-centered environments constructed from a smooth atom density, built up using Gaussians centered on each neighbor of the central atom. This density-based representation can be adapted to incorporate correlations between atoms to any order. It has been applied successfully to a vast number of ML investigations of the physical properties of atomic structures (26, 27, 28). After summarizing the derivation and efficient implementation of an extension to SOAP, called λ-SOAP, which is particularly well-suited to the learning of tensorial properties, we present a few examples to demonstrate its effectiveness for this task.

Linear Regression

Suppose one wanted to build a linear regression model to predict a scalar property y(X) for an input X,

y(X) = 〈w|X〉    (1)

In this equation |w〉 represents the weight vector we wish to learn and |X〉 is a representation of the input. The usual approach for learning the weight vector is to suppose the properties are independently and normally distributed, that is,

P(yn|w) ∝ exp[−(yn − 〈w|Xn〉)² / 2σ²]    (2)
One then maximizes the log likelihood of a set of N observations {yn} with respect to the weight vector. The log likelihood (loss or cost function) is given by

L(w) = −(1/2σ²) Σn (yn − 〈w|Xn〉)² − (α²/2) 〈w|w〉    (3)

where the regularizer α²〈w|w〉 appears if one introduces a Gaussian prior on w with variance α⁻². L(w) attains its maximum at

|w〉 = (Ĉ + η²𝟙)⁻¹ Σn yn |Xn〉    (4)

where the covariance Ĉ is

Ĉ = Σn |Xn〉〈Xn|    (5)

and η = ασ.
The preceding linear regression scheme, in which one handles the representation |X〉 explicitly, is often called the primal formulation. There is in fact another, complementary formulation called the dual (kernel ridge regression [KRR] or Gaussian process regression [GPR]) in which the equations take a slightly different form. In the dual, one does not handle the representation explicitly but rather introduces a kernel function which, roughly speaking, measures the similarity between two inputs. The link between the primal and dual lies in the observation that a positive-definite kernel k(X, X′) can always be written as an inner product (1),

k(X, X′) = 〈X|X′〉    (6)

This means that given a kernel one can always construct a representation and vice versa. From the perspective of GPR, the kernel is interpreted as the covariance between the properties of its two arguments,

k(X, X′) = 〈y(X) y(X′)〉    (7)
The properties are assumed to be normally distributed, which means one can straightforwardly find the conditional distribution of the property y(X) given a set of observations in a training set {yn}. The mean of this distribution is given by

y(X) = k(X)ᵀ (K + η²𝟙)⁻¹ y    (8)

where the jth component of k(X) is k(X, Xj), Kjk = k(Xj, Xk) and y is a vector formed from {yn}.
When the feature space associated with a kernel is known explicitly, and finite-dimensional, the primal and dual formulations are formally equivalent, and the choice of which to use is an important but purely practical question. Constructing a primal model requires inversion of the covariance matrix Ĉ, while the dual requires inversion of the kernel matrix K. If the feature space (i.e., the space occupied by the representation) is larger than the training set, then the GPR approach is more convenient. Of course, the real utility of the kernel trick becomes apparent when the kernel is a complex, non-linear function for which the feature space is unknown and/or infinite-dimensional. In these circumstances, working in the dual makes it possible to formulate regression as a linear problem, where the reference configurations (or a sparse set of representative states) are used to define a basis for the target, as in the right-hand side of eq (8). As such, all the complexity of the input space representation is contained in the definition of the kernel function.
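As a numerical sketch of this equivalence (an illustration, not code from the chapter), one can check that for a linear kernel k(X, X′) = 〈X|X′〉 with a finite feature space, the primal ridge solution and the dual (KRR/GPR) prediction of eq (8) give the same answer:

```python
import numpy as np

# A numerical sketch (not code from the chapter) of the primal/dual
# equivalence: for a linear kernel with a finite feature space, the
# primal ridge solution and the dual prediction coincide.
rng = np.random.default_rng(0)
N, D = 20, 5                       # N training inputs, D features
Phi = rng.normal(size=(N, D))      # rows are feature vectors |X_n>
y = rng.normal(size=N)             # training targets {y_n}
eta2 = 1e-2                        # regularization eta^2

# Primal: invert the D x D covariance matrix C = Phi^T Phi
w = np.linalg.solve(Phi.T @ Phi + eta2 * np.eye(D), Phi.T @ y)

# Dual: invert the N x N kernel matrix K = Phi Phi^T
c = np.linalg.solve(Phi @ Phi.T + eta2 * np.eye(N), y)

X_new = rng.normal(size=D)         # feature vector of a new input
y_primal = X_new @ w
y_dual = (Phi @ X_new) @ c         # k(X) dotted with the dual weights
assert np.isclose(y_primal, y_dual)
```

When D < N the primal solve is cheaper; when the feature space outgrows the training set, the N × N dual solve becomes the convenient one, as noted above.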

Tensors, Symmetries and Correlations

The previous discussion defines the general architecture of regression models which can be used to predict any scalar quantity associated with the molecular geometry. We now discuss the implications of learning tensors, or, similarly, any quantity that is not invariant under a rigid rotation or reflection of the atomic structure. In so doing, we will introduce a formalism which is general enough to encompass both proper Cartesian tensors, such as molecular polarizabilities, and three-dimensional scalar fields that can be conveniently decomposed in atom-centered contributions, such as the ground-state charge density of a molecule.
Let us start by considering the prototypical case of a Cartesian tensor yαβ... of rank r, with the combination of indices {αβ...} running over a number of Cartesian components equal to 3^r. Given an arbitrary distorted atomic structure with no particular internal symmetry, we are interested in characterizing the transformations of the tensor under three families of symmetry operations (viz., translations, rotations and reflections). Since these symmetry operations do not affect the internal geometry of an atomic structure, we can think equivalently in terms of active transformations, in which the system undergoes the symmetry operation and the reference frame remains fixed, or in terms of passive transformations, in which the reference frame undergoes the symmetry operation and the system remains fixed. In the following, we summarize the symmetry operations by adopting the active picture, and assume that the system is not subjected to an external field.

Translations

Any physical property of an atomic structure X remains unchanged under a rigid translation T̂ of the atomic positions, that is,

y(T̂X) = y(X)    (9)

Rotations

Under the application of a rigid rotation R̂ to an atomic structure X, we assume that each Cartesian component of the tensor undergoes a covariant linear transformation. Using Einstein notation for convenience, and representing by R the rotation matrix corresponding to R̂, the rotated tensor is

yαβ...(R̂X) = Rαα′ Rββ′ ⋯ yα′β′...(X)    (10)
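For a rank-2 tensor this covariant transformation rule reads y′ = RyRᵀ in matrix notation, which is easy to check numerically (a toy illustration, not from the chapter):

```python
import numpy as np

# Toy numerical check of the covariant transformation of a rank-2
# tensor: y'_{AB} = R_{Aa} R_{Bb} y_{ab}, i.e. y' = R y R^T.
theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
y = np.arange(9.0).reshape(3, 3)        # an arbitrary rank-2 tensor

y_rot = np.einsum('Aa,Bb,ab->AB', R, R, y)
assert np.allclose(y_rot, R @ y @ R.T)
# The trace (the scalar, lambda = 0 spherical component) is invariant
assert np.isclose(np.trace(y_rot), np.trace(y))
```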

Reflections

Applying a reflection operator σ̂ to an atomic structure X through any mirror plane leads to the following reflected tensor,

yαβ...(σ̂X) = σαα′ σββ′ ⋯ yα′β′...(X)    (11)

where σ is the 3 × 3 matrix representing the mirror operation.

Covariant Descriptors

In general terms, a primitive representation that mirrors a tensor of a given rank r could formally be built by considering

|Xαβ...〉 = |X〉 ⊗ |α〉 ⊗ |β〉 ⊗ ⋯    (12)

where |X〉 is an arbitrary description of the system, while |α〉 represents a set of Cartesian axes which is rigidly attached to the system. When using this primitive representation in a linear regression model, the tensor component corresponding to αβ... would be

yαβ...(X) = 〈w|X〉    (13)

or

yαβ...(X) = 〈wαβ...|X〉    (14)
After maximizing the log likelihood, the former possibility leads to a model that predicts every component to be the same, while the latter ignores the known correlations between the components and is therefore likely to overfit. For example, consider a training set in which only one of the tensor components is non-zero. All but one of the regression weights {|wαβ...〉} would be driven towards zero to maximize the log likelihood, so the trained model would only predict a finite value for the component it had been explicitly exposed to in the training set. The model would therefore incorrectly predict the tensor components for a structure differing only by a rigid rotation from one in the training set.
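The failure mode described above can be reproduced with a toy numerical experiment (synthetic data, not from the chapter): a covariant vector target fitted component-by-component on structures that all share one orientation does not transfer to a rotated structure.

```python
import numpy as np

# Toy experiment: "structures" are bond vectors along z, and the target
# is a covariant vector (e.g. a dipole) equal to the bond itself.
t = np.linspace(1.0, 2.0, 20)
X = np.column_stack([np.zeros(20), np.zeros(20), t])   # bonds along z
y = X.copy()                # covariant vector target, seen only along z

# Independent ridge fit for each Cartesian component of the target
w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(3), X.T @ y)

b_rot = np.array([1.5, 0.0, 0.0])   # the same kind of bond, rotated onto x
mu_pred = b_rot @ w
mu_true = b_rot                      # what covariance would demand
# The model only reproduces the component it was exposed to in training
assert np.allclose(mu_pred, [0.0, 0.0, 0.0], atol=1e-3)
assert not np.allclose(mu_pred, mu_true, atol=1e-3)
```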
To address these problems, one should adapt the primitive descriptor so that it fulfills each of the symmetries detailed in eqs (9)-(11). Since the Cartesian basis vectors are invariant under translations, eq (9) implies that the core representation should itself be invariant under translations. Using Haar integration, one can construct such a core representation by integrating an arbitrary representation over the translation group (29). One can then proceed to consider covariance under SO(3) group operations. Eq (10) implies that a covariant representation |Xαβ...〉 should satisfy the covariance relationship

|(R̂X)αβ...〉 = Rαα′ Rββ′ ⋯ |Xα′β′...〉    (15)

for any rotation R̂. Starting from the primitive definition of eq (12), there are a variety of ways to enforce this relationship. One possibility is to use
where the alignment operator ÔX is defined to rotate X into a specified orientation which is common to all the molecules of the dataset (Figure 2).

Figure 2. Provided that one can define a local reference system, it is possible to learn tensorial properties by aligning each molecule (or environment) into a fixed reference frame.
This works under the assumption that it is always possible to define a unique (and therefore unambiguous) internal reference frame to rotate X into a specified orientation, which might be the case when the system involved has a particularly rigid internal structure. A more general strategy, which does not require any assumption on the molecular geometry, consists in considering the covariant integration over the rotation operator R̂ (Haar integration),
On top of this definition, the requirement that a representation be covariant in O(3), including the reflection symmetry of the tensor as in eq (11), means that improper rotations must be included (i.e., O(3) = SO(3) × {Î, σ̂}), with σ̂ representing a reflection operator. This is done by a simple linear combination of the SO(3) descriptor with its reflected counterpart with respect to any arbitrary mirror plane of the system; that is,
Any other reflection operation is automatically included by having made the descriptor covariant under rotations.

Covariant Regression

Having shown how to build a symmetry-adapted representation of the system, let us see the implications of this procedure for linear regression. Using a symmetry-adapted representation in a linear regression model leads to the following solution for the regression weight,
where the covariance is
Note that the solution for the linear regression weight does not change when the training structures and corresponding tensors simultaneously undergo a symmetry operation that the representation has been adapted to. In other words, the same model results regardless of the arbitrary orientation of structures in the training set.
When moving to the dual, we find the kernel to be
This result corresponds to
As stressed earlier, performing the linear regression in the dual using this kernel leads to a model that is formally equivalent to the primal formulation described above, yet this kernel appears to be more complicated than a symmetry-adapted descriptor, since it involves two integrations over rotations. If, however, we assume that the core representation |X〉 undergoes a unitary transformation when the system is rotated,

|R̂X〉 = Û(R̂)|X〉, with Û(R̂)†Û(R̂) = 𝟙    (23)

the kernel reduces to

kαβ...,α′β′...(X, X′) = ∫dR̂ Rαα′ Rββ′ ⋯ k(X, R̂X′)    (24)

where k(X, X′) = 〈X|X′〉 is the kernel corresponding to the core representation. The requirement that the core representation should undergo a unitary transformation when the system is rotated is reasonable since, if it were not true, the autocorrelation k(X, X) would depend on the absolute orientation of X, which is unphysical given our assumption of the absence of external fields. Note that, upon defining a collective tensorial index {αβ...}, a kernel matrix of size 3^rN × 3^rN can be constructed by stacking together each of the 3^r × 3^r vector-valued correlation functions. A covariant tensorial prediction of the property of interest can then be carried out according to the GPR prescription of eq (8). It should be noted that the symmetry-adapted kernel of eq (24) is a generalization of the covariant kernels that were introduced in Glielmo et al. (7) to learn forces. Taking scalar products of symmetry-adapted representations provides a route to design easy-to-compute covariant kernels for tensors of arbitrary order.
It is instructive to compare the symmetry-adapted kernel definition of eq (24) to the kernel that one obtains from the aligned descriptors of eq (16). In this case, building a kernel function on top of the aligned descriptor effectively means carrying out the structural comparison in a common reference frame where the two molecules are mutually aligned. One can then conveniently learn the tensor of interest component-by-component through a much simpler scalar regression framework. For the simple case of rank-1 tensors, for instance, we would get,
where the best-alignment operator rotates each structure into the common reference frame. This strategy has been successfully used in the learning of electronic multipoles of organic molecules, as well as for predicting optical response functions of water molecules in their liquid environments (10, 12). For the latter example, a representation of the best-alignment structural comparison is reported in Figure 3.
This method for tensor learning has the clear drawback of relying on the definition of a rigid molecular geometry, for which an internal reference frame can be effectively used to perform the best alignment. In contrast, the availability of a covariant kernel function allows us to carry out both the structural comparison and the geometric alignment of two molecules implicitly and simultaneously, without any prior consideration of the internal structure of the molecule at hand.

Figure 3. Representation of the reciprocal alignment between water environments.

Spherical Representation

The family of Cartesian symmetry-adapted descriptors previously introduced can in principle be used to predict any Cartesian tensor of arbitrary order. However, we should notice that having a tensor product for each additional Cartesian axis makes the cost of the regression scale unfavorably with the tensor order, producing a global kernel matrix whose size grows as (3^r)². In fact, it is well established that a more natural representation of Cartesian tensors is given by their irreducible spherical components (ISC) (25). As described in Stone (25), the transformation matrix from Cartesian to spherical tensors can be found recursively, starting from the known transformation for rank-2 tensors.
Upon trivial manipulations, which might account for the non-symmetric nature of the tensor, each ISC transforms separately as a spherical harmonic Yλµ. Spherical harmonics form a complete basis for the irreducible representations of the SO(3) group. In particular, each λ-component of the tensor spans an orthogonal subspace of dimension 2λ + 1. For instance, the 9 components of a rank-2 tensor separate out into a term (proportional to the trace) that transforms like a scalar, three terms that transform like the λ = 1 spherical harmonics, and five terms that transform like the λ = 2 spherical harmonics. When using a spherical representation, the kernel matrix is block diagonal, which greatly reduces the number of non-zero entries, and makes it possible to learn the different components separately. An additional advantage is that the possible symmetry of the tensor can be naturally incorporated by retaining only the spherical components λ that have the same parity as the tensor rank r. For instance, the λ = 1 component of a symmetric rank-2 tensor vanishes identically, meaning that only the 6 surviving elements of the tensor need to be considered when doing the regression. Especially for high-rank tensors, this property means that the number of components can be cut down significantly.
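The rank-2 decomposition quoted above is easy to verify numerically (an illustrative sketch, not code from the chapter):

```python
import numpy as np

# Numerical check of the rank-2 decomposition: 9 components =
# 1 (trace, lambda = 0) + 3 (antisymmetric, lambda = 1)
# + 5 (symmetric traceless, lambda = 2).
rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3))

T0 = np.trace(T) / 3.0 * np.eye(3)      # lambda = 0 (scalar) part
T1 = 0.5 * (T - T.T)                    # lambda = 1 (antisymmetric) part
T2 = 0.5 * (T + T.T) - T0               # lambda = 2 (symmetric traceless) part

assert np.allclose(T, T0 + T1 + T2)
assert abs(np.trace(T2)) < 1e-12
# For a symmetric tensor the lambda = 1 component vanishes identically
S = 0.5 * (T + T.T)
assert np.allclose(0.5 * (S - S.T), 0.0)
```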
In light of the discussion carried out for Cartesian tensors, it is straightforward to realize how a symmetry-adapted descriptor that transforms covariantly with spherical harmonics of order λ should look. Since each ISC is effectively a vector of dimension 2λ + 1, we can first write a primitive spherical harmonic representation as

|X; λµ〉 = |X〉 ⊗ |λµ〉    (26)

where |λµ〉 is an angular momentum state of order λ, which transforms under rotations according to the Wigner-D matrix of order λ. Its symmetry-adapted counterpart, which is covariant in SO(3), is
Finally, since the parity of |λµ〉 with respect to the inversion operator î is determined by λ, a spherical tensor descriptor that is covariant in O(3) can be obtained by considering
Note that a tensorial kernel function built on top of this descriptor would transform under rotations as the Wigner-D matrix of order λ, Dλ(R̂):
In addition to being the most natural strategy to perform the regression of Cartesian tensors, using a representation like that of eq (28) comes in handy when building regression models for the many physical properties that can be decomposed in a basis of atom-centered spherical harmonics. In the following sections, we will give an example of this kind by predicting the ground-state electronic charge density of molecular systems.

SOAP Representation

We now proceed to characterize the exact functional form of a symmetry-adapted representation of order λ which can be used to carry out a covariant prediction of any property that transforms as a spherical harmonic. In the section above, it was pointed out that, within a framework of linear regression, both the primal and the dual formulation can be adopted to actually implement the interpolation of a given tensorial property. In what follows, however, we will focus our attention on the dual formulation, discussing in parallel the feature vector associated with the λ-SOAP representation and the corresponding kernel function. This choice is justified by the greater flexibility of the kernel formulation, allowing a non-linear extension of the framework as discussed below.
An atom-centered environment Xj describes the set of atoms that are included within a spherical cutoff rcut around the central atom j. We will label as |Xj〉 the abstract vector which describes the local structure. A convenient definition of |Xj〉 in real space can be obtained by writing a smooth probability amplitude, for each atomic species α, as a superposition of Gaussians with spread σ that are centered on the positions {ri} of the atoms that surround the central atom j:

〈r|Xj; α〉 ∝ Σi∈α exp(−|r − ri|²/2σ²)    (30)

This definition descends naturally from the requirement of translational invariance of a representation of the entire structure and corresponds to the construction that is used in Bartok et al. (21) to define the SOAP kernel (29). Formally, one can then write

|Xj〉 = Σα |Xj; α〉 ⊗ |α〉    (31)

with the ket |α〉 tagging the identity of each species. Even though it might be convenient to use a lower-dimensional chemical space (30), particularly when building models for datasets containing many elements, in what follows we will assume that each element is associated with an orthogonal subspace (i.e., 〈α|β〉 = δαβ). This implies that, when using this representation to define a scalar-product kernel, only the density distributions of the same atomic type are overlapped,

k(Xj, Xj′) = 〈Xj|Xj′〉 = Σα ∫dr 〈Xj; α|r〉〈r|Xj′; α〉    (32)
With this choice, the two adjustable parameters rcut and σ determine respectively the range and the resolution of the representation. To simplify the notation, we will omit the α labels, assuming that a single element is present. The extension to the case with multiple chemical species follows straightforwardly.
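As an illustrative sketch (with hypothetical coordinates, and before any symmetrization over rotations), the scalar product of two Gaussian neighbor densities reduces to a sum of Gaussians in the interatomic distances, since the overlap of two Gaussians of width σ is proportional to exp(−|ri − rj|²/4σ²):

```python
import numpy as np

# Illustrative density-overlap kernel between two neighbor lists
# (hypothetical coordinates; no rotational averaging is performed here).
sigma = 0.3

def overlap_kernel(env_a, env_b, sigma):
    """Un-normalized overlap of two Gaussian neighbor densities."""
    d2 = np.sum((env_a[:, None, :] - env_b[None, :, :])**2, axis=-1)
    return np.sum(np.exp(-d2 / (4.0 * sigma**2)))

env_a = np.array([[0.9, 0.0, 0.0], [0.0, 1.1, 0.0]])   # neighbor positions
env_b = env_a + np.array([0.2, 0.0, 0.0])              # a distorted copy

k_ab = overlap_kernel(env_a, env_b, sigma)
# The kernel is a symmetric inner product between densities ...
assert np.isclose(k_ab, overlap_kernel(env_b, env_a, sigma))
# ... so it obeys the Cauchy-Schwarz inequality
k_aa = overlap_kernel(env_a, env_a, sigma)
k_bb = overlap_kernel(env_b, env_b, sigma)
assert k_ab <= np.sqrt(k_aa * k_bb) + 1e-12
```

Here σ plays exactly the resolution role discussed above: larger widths blur the distinction between the two environments, while smaller widths sharpen it.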

λ-SOAP(1) Representation

To first order in the structural correlations, the definition of a local symmetry-adapted descriptor of order λ built from the environmental state |Xj〉 reads
The real-space representation of this ket can be understood as a rotational average of the environmental density which is rigidly attached to a spherical harmonic of order λ,
A more concise, and easily-computed version of this representation results from projecting on a basis of spherical harmonics, in which the integral over rotations can be performed analytically,
It is clear that many of the indices in this representation are redundant, and would have no effect when taking an inner product between two such representations. The most concise form that produces the same scalar product kernel as eq (34) is
where we introduced the spherical density component
This ket corresponds to the kernel
which is straightforward to calculate using a quadrature in r or an expansion on a radial basis.
It is insightful to consider the explicit expression for eq (36) in terms of the atom density. Taking for instance the case of λ = 1, µ = 0, for which the spherical harmonic Y10(r̂) ∝ cos θ:
One sees that the 2-body λ-SOAP representation corresponds to moments of the smooth atom density, resolved over different shells around the central atom. A linear model built on these features can respond to changes in the atomic density at different distances, simultaneously adapting the magnitude and geometric orientation of the target property.
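A toy check of this moment interpretation (hypothetical neighbor positions, not from the chapter): for a density built from normalized Gaussians on the neighbors, the first (λ = 1) moment ∫ r ρ(r) dr is just the sum of the neighbor vectors, and it co-rotates with the environment:

```python
import numpy as np

# First moment of a sum of normalized Gaussians centered on the
# neighbors: each Gaussian contributes its own center, so the moment
# is the sum of the neighbor vectors, and transforms covariantly.
env = np.array([[1.0, 0.2, -0.3], [-0.5, 0.8, 0.1]])   # hypothetical neighbors
moment = env.sum(axis=0)          # ∫ r rho(r) dr for unit-norm Gaussians

theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
moment_rot = (env @ R.T).sum(axis=0)   # moment of the rotated environment
assert np.allclose(moment_rot, R @ moment)
```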

λ-SOAP(2) Representation

Describing an atomic environment in a way that goes beyond the two-body structural correlations (ν > 1) is of fundamental importance, because information on distances alone is not sufficient to uniquely determine an atomic structure. Building on the definition of eq (33), and on the symmetrized-atom-density framework of Willatt et al. ( 29), this can be achieved by introducing an additional tensor product in the environmental state |Xj〉 within the rotational average,
By projecting on a real-space basis, the representation becomes
Similarly to the ν = 1 case, one can compute the ket without an explicit rotational average by projecting on a basis of spherical harmonics,
where the parentheses denote a Wigner 3j symbol. Just as for the λ-SOAP(1) case considered earlier, it is clear that many of the indices in this expression are redundant. When taking an inner product between two such representations, one can use orthogonality of Wigner 3j symbols to simplify to an inner product between two objects with the following form,
The Clebsch-Gordan coefficient 〈l k; l′ k′|λ µ〉 has the role of combining two angular momentum components of the atomic environment Xj to be compatible with the spherical tensor order λ. This contains all the essential information of the abstract representation that is needed for λ-SOAP(2) linear regression. Note that 〈l k; l′ k′|λ µ〉 is zero unless k + k′ = µ, that the indices l, l′ and λ must satisfy the triangle inequality |l − l′| ≤ λ ≤ l + l′, and that the representation is invariant under transposition of r and r′.
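These selection rules can be checked with SymPy's Wigner 3j symbols, which are proportional to the Clebsch-Gordan coefficients (a quick sanity check, not code from the chapter):

```python
from sympy.physics.wigner import wigner_3j

# Selection rules for the angular momentum coupling quoted above,
# verified with the Wigner 3j symbol (l l' lambda; k k' -mu).
# Triangle inequality: lambda outside [|l - l'|, l + l'] gives zero
assert wigner_3j(2, 1, 4, 0, 0, 0) == 0
# Projection rule: the symbol vanishes unless the m-indices sum to
# zero, i.e. unless k + k' = mu
assert wigner_3j(2, 1, 2, 1, 1, -1) == 0
# An allowed combination (k + k' = mu, triangle satisfied) is non-zero
assert wigner_3j(2, 1, 2, 1, 0, -1) != 0
```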
Let us see how the representation changes under inversion. Given the parity of the spherical harmonics,

Ylm(−r̂) = (−1)^l Ylm(r̂)
it follows that
This condition implies that a representation that is covariant in O(3), eq (28), can be easily obtained by retaining only the components of the feature vectors for which l + l′ +λ is even.
In practice it is often more convenient to use real spherical harmonics instead of |λµ〉 in the representation. Using real spherical harmonics ensures that the kernel is purely real, but the components of the representation need not be, because their phases are unimportant. In fact, what one finds upon replacing |λµ〉 with a real spherical harmonic is that the components are either purely real or purely imaginary, depending on whether l + l′ + λ is even or odd. For example, the representation for µ > 0 becomes
which satisfies
(the same relation also holds for the other real spherical harmonics, with µ < 0, and for |λ0〉 with µ = 0). One can therefore discard all imaginary components of the representation to enforce inversion invariance.
Generalization of this procedure to higher orders of λ-SOAP is tedious but straightforward using well-known formulae for integrals of products of Wigner-D matrices over rotations.

Non-linearity

As already mentioned in the introduction, a crucial aspect to improve regression performance is to incorporate non-linearities in the construction of the representation. For instance, tensor products of the scalar representation introduce higher body order correlations, in a way that can be easily implemented in a kernel framework by raising the kernel to an integer power ( 29). When working with tensorial representations, however, one has to be careful to avoid breaking the covariant transformation properties of the feature vector. Taking products of kets would require re-projecting the product onto the irreducible representations of the group, which would be as cumbersome as increasing the body order exponent ν. One obvious solution to this problem is to multiply the spherical kernel of order λ by its scalar and rotationally invariant counterpart, which can then be raised to an integer power ζ without breaking the tensorial nature of the kernel. For any generic order ν and ν′ in structural correlations, this procedure consists in considering the tensor product
which leads to the kernel definition

kλµµ′(X, X′; ζ) = kλµµ′(X, X′) [k⁰(X, X′)]^(ζ−1)

where k⁰ is the scalar, rotationally invariant kernel.
For ζ = 1, one recovers the original tensorial kernel, while a non-linear behavior is introduced for ζ > 1. A considerable improvement of the learning power is usually obtained when using ζ = 2, while negligible further improvement is observed for ζ > 2.
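A small numerical sketch of why this construction is safe (illustrative matrices, not from the chapter): multiplying one block of a tensorial kernel elementwise by a scalar kernel raised to ζ − 1 preserves positive semi-definiteness, by the Schur product theorem:

```python
import numpy as np

# Schur product theorem in action: the elementwise product of two
# positive semi-definite kernel matrices is positive semi-definite,
# so the zeta-nonlinearity does not spoil the regression problem.
rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))
k_scalar = A @ A.T                 # a valid PSD scalar kernel matrix
B = rng.normal(size=(6, 4))
k_tensor = B @ B.T                 # toy stand-in for one tensorial block
zeta = 2
k_nl = k_tensor * k_scalar**(zeta - 1)   # elementwise (Schur) product
assert np.all(np.linalg.eigvalsh(k_nl) > -1e-10)
```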
These considerations also apply to the use of fully non-linear ML models like a neural network. To guarantee that the prediction of the model is consistent with the group covariances, the tensorial λ-SOAP features must enter the network at the last layer, and all the previous non-linear layers can only contribute to different linear combinations of the tensorial features, for example,
where each of the frr′ll′ can be an arbitrary non-linear combination of the scalar SOAP features. Similar ideas have already been implemented in the context of generalizing the construction of spherical convolutional neural networks (31).

Implementation

In the previous discussion it was pointed out that, beyond the formal definition of the structural descriptor in real space, the kernel evaluation eventually requires the computation of the SOAP density power spectrum. In turn, computing this quantity requires the evaluation of the density expansion coefficients 〈rlm|Xj〉. In practice, the continuous variable r can be replaced by an expansion over a discrete set of orthogonal radial functions Rn(r) that are defined within the spherical cutoff rcut. For this reason, we will refer, from now on, to the density expansion coefficients as 〈nlm|Xj〉.
Having represented the environmental density distribution as a superposition of Gaussian functions centered on each atom, the spherical harmonics projection can be carried out analytically (32), leading to:

〈r lm|Xj〉 ∝ Σi exp[−(r² + ri²)/2σ²] il(r ri/σ²) Ylm*(r̂i)

where the sum over i runs over the neighboring atoms of a given chemical element, and il represents a modified spherical Bessel function of the first kind. Under suitable choices of the functions Rn(r), the radial integration can also be carried out analytically.
One possibility is to start with non-orthogonal Gaussian type functions, k(r), reminiscent of Gaussian-type orbitals commonly used in quantum chemistry:
where Nk is a normalization factor, such that . The set of Gaussian widths {σk} can be chosen to effectively span the radial interval involved in the environment definition. For instance, one can take , obtaining functions that have equally-spaced peaks between 0 and rcut. The explicit functional form of the primitive radial integrals is
where Γ is the Gamma function and 1F1 is the confluent hypergeometric function of the first kind. These primitive integrals can finally be orthogonalized by applying the orthogonalization matrix S−1/2, with S the overlap matrix between the primitive functions,
for which well-known analytical expressions exist ( 33).
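The orthogonalization step can be sketched numerically. The width choice σk = max(√k, 1) rcut/nmax, the unnormalized exponent convention, and the extension of the radial integrals over the whole half-line (their Gaussian decay keeps them confined well within rcut) are assumptions made to keep the example self-contained:

```python
import numpy as np
from scipy.special import gamma

n_max, r_cut = 6, 4.0  # hypothetical basis size and environment cutoff

# Assumed widths: the peak of r^k exp(-r^2 / 2 sigma_k^2) sits at
# sigma_k sqrt(k), so sigma_k ~ sqrt(k) r_cut / n_max spaces peaks evenly
k = np.arange(n_max)
sigma = np.sqrt(np.maximum(k, 1)) * r_cut / n_max

# Normalization so that each primitive obeys int_0^inf R_k(r)^2 r^2 dr = 1
norm = 1.0 / np.sqrt(0.5 * gamma(k + 1.5) * sigma**(2 * k + 3))

# Analytic overlap S_kl = int_0^inf R_k(r) R_l(r) r^2 dr
a = 0.5 / sigma[:, None]**2 + 0.5 / sigma[None, :]**2
p = k[:, None] + k[None, :] + 3
S = norm[:, None] * norm[None, :] * 0.5 * gamma(p / 2) / a**(p / 2)

# Loewdin orthogonalization: apply S^{-1/2}, built by eigendecomposition
eigval, U = np.linalg.eigh(S)
S_inv_sqrt = U @ np.diag(eigval**-0.5) @ U.T

print(np.allclose(S_inv_sqrt @ S @ S_inv_sqrt, np.eye(n_max)))  # True
```

Applying S−1/2 to the vector of primitive radial integrals then yields the integrals over the orthogonalized radial functions.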

Examples

In this section we demonstrate the effectiveness of a KRR model that is adapted to the fundamental physical symmetries of the target, considering two very different quantities as examples. First, we consider the prediction of the dielectric response series of the Zundel cation, training a λ-SOAP(2) regression model on the ISC of the tensors at hand. Second, we show how to predict the charge density ρ(r) of a small yet flexible hydrocarbon molecule, butane, by decomposing ρ(r) into atom-centered spherical harmonics. In both cases, we compare the prediction performance of λ-SOAP(2) descriptors that are covariant in SO(3), as used in previous work, with descriptors that have been made fully O(3)-compliant by symmetrization over î.

Dielectric Response Series

Consider the dielectric response series of a molecule, including the dipole µ, the polarizability α, and the hyperpolarizability β. The latter, for instance, is a rank-3 tensor describing the third-order response of the molecular energy U to an applied electric field E, with components βabc = −∂³U/∂Ea∂Eb∂Ec. By construction this tensor is symmetric, meaning that it can be decomposed into two spherical components, 3 of λ = 1 symmetry and 7 of λ = 3 symmetry. The total number of components to be learned is thus 10, consistent with the number of non-equivalent components of the Cartesian tensor. The dataset consists of 1000 configurations, of which 800 are randomly selected to train the regression model, while the remaining 200 are used to test the prediction performance. λ-SOAP(2) kernels adapted to SO(3) and O(3) group symmetry were constructed using a Gaussian smearing of σ = 0.3 Å and an environment cutoff of rcut = 4.0 Å.
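The counting above can be verified directly: a fully symmetric rank-3 tensor splits into a vector (λ = 1) part built from its single independent contraction, plus a symmetric traceless (λ = 3) remainder. A minimal numpy sketch, with a random symmetric tensor standing in for β:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)

# A fully symmetric rank-3 tensor (a stand-in for a hyperpolarizability)
t = rng.normal(size=(3, 3, 3))
beta = sum(np.transpose(t, p) for p in permutations(range(3))) / 6

# lambda = 1 part: the vector obtained by contracting any two indices
v = np.einsum('ijj->i', beta)                      # 3 components
d = np.eye(3)
beta1 = (np.einsum('ij,k->ijk', d, v)
         + np.einsum('jk,i->ijk', d, v)
         + np.einsum('ik,j->ijk', d, v)) / 5.0

# lambda = 3 part: the symmetric, fully traceless remainder (7 components)
beta3 = beta - beta1
print(np.allclose(np.einsum('ijj->i', beta3), 0))  # True: traceless
print(np.allclose(beta1 + beta3, beta))            # True: exact recomposition
```

The factor 1/5 ensures that contracting the λ = 1 embedding reproduces v, so the remainder carries no trace at all.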
The performance of each independent learning exercise (λ = 1, 2, 3) is reported in Figure 4. For all the spherical components we observe a systematic improvement of the regression when endowing the kernel with the inversion symmetry about the atomic centers.

Figure 4

Figure 4. Learning curves of the Zundel cation dielectric response series µ, α, and β, decomposed into their anisotropic (λ > 0) spherical tensor components. Full and dashed lines refer to predictions carried out with λ-SOAP kernel functions that are covariant in SO(3) and O(3), respectively.
This improvement is particularly pronounced for small numbers of training points, and becomes less relevant for larger training set sizes, where the symmetry under inversion is eventually learned by the SO(3) kernel as well.
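A learning curve of this kind follows a simple protocol: train on increasingly large random subsets and evaluate the error on a fixed hold-out set. A generic sketch using kernel ridge regression on synthetic data (the Gaussian kernel and the data are placeholders, not the λ-SOAP model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: a smooth scalar target of a 5-dimensional input
X = rng.normal(size=(1000, 5))
y = np.sin(X).sum(axis=1)
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]

def rbf_kernel(A, B, gamma=0.1):
    # Gaussian kernel between two sets of feature vectors
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-gamma * d2)

def krr_rmse(n, reg=1e-8):
    # Train kernel ridge regression on the first n points, test on hold-out
    K = rbf_kernel(X_train[:n], X_train[:n])
    w = np.linalg.solve(K + reg * np.eye(n), y_train[:n])
    pred = rbf_kernel(X_test, X_train[:n]) @ w
    return np.sqrt(np.mean((pred - y_test)**2))

# Error on the fixed test set shrinks as the training set grows
errors = [krr_rmse(n) for n in (25, 50, 100, 200, 400, 800)]
```

Plotting `errors` against the training set sizes on a log-log scale gives the familiar, roughly power-law learning curve.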

Electronic Charge Densities

Another learning task that can benefit from a symmetry-adapted regression scheme is the learning of scalar fields such as the electron charge density. ML models for the charge density have been proposed based on the coefficients in a plane-wave basis (convenient because of orthogonality, but poorly transferable when considering flexible molecules or when learning across different molecular species) or based on direct prediction of the density on a real-space grid (16, 17, 34). By expanding the density on an atom-centered basis set, composed of radial functions multiplied by spherical harmonics,
one obtains a model that is localized and transferable, concise, and easily integrated with the many electronic structure codes that are based on atom-centered basis functions. The coefficients of the expansion transform under rotations like spherical harmonics, and can therefore be learned efficiently using a symmetry-adapted GPR model,
where the sum runs over a set of reference environments Zi centered around atoms of the same kind as i, and the weights are computed by a regression procedure that is complicated by the fact that the basis set is not orthogonal ( 18).
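Once the coefficients have been predicted, ρ(r) can be evaluated at arbitrary points by summing the atom-centered contributions. The sketch below uses unnormalized Gaussian radial functions and builds real spherical harmonics from scipy's complex ones; the function names and the coefficient layout are hypothetical conveniences, not the basis of Grisafi et al. (18):

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(l, m, theta, phi):
    """Real spherical harmonics from scipy's complex Y_l^m.
    Note scipy's argument order: sph_harm(m, l, azimuth, polar)."""
    if m > 0:
        return np.sqrt(2) * (-1)**m * sph_harm(m, l, phi, theta).real
    if m < 0:
        return np.sqrt(2) * (-1)**m * sph_harm(-m, l, phi, theta).imag
    return sph_harm(0, l, phi, theta).real

def gaussian_radial(n, r, sigma=1.0):
    # hypothetical radial basis: r^n exp(-r^2 / 2 sigma^2), unnormalized
    return r**n * np.exp(-r**2 / (2 * sigma**2))

def density_at(r_vec, centers, coeffs, l_max=2, n_max=3):
    """Sum atom-centered contributions c[n, l, l+m] R_n(|r-r_i|) Y_lm."""
    rho = 0.0
    for pos, c in zip(centers, coeffs):
        d = r_vec - pos
        r = np.linalg.norm(d)
        theta = np.arccos(np.clip(d[2] / max(r, 1e-12), -1, 1))  # polar
        phi = np.arctan2(d[1], d[0])                             # azimuth
        for n in range(n_max):
            for l in range(l_max + 1):
                for m in range(-l, l + 1):
                    rho += (c[n, l, l + m]
                            * gaussian_radial(n, r)
                            * real_sph_harm(l, m, theta, phi))
    return rho

centers = np.array([[0.0, 0.0, 0.0]])
coeffs = np.zeros((1, 3, 3, 5))      # one atom, c[n, l, l+m] layout
coeffs[0, 0, 0, 0] = 1.0             # purely l = 0: an isotropic density
print(np.isclose(density_at(np.array([1.0, 0.0, 0.0]), centers, coeffs),
                 density_at(np.array([0.0, 0.0, 1.0]), centers, coeffs)))
# True: an l = 0 expansion is the same in every direction
```

A production implementation would vectorize the triple loop and use the same radial functions and normalization as the regression basis.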

Figure 5

Figure 5. Learning curves of the predicted charge density of 200 randomly selected butane molecules, when considering up to 800 reference molecules to train the model. The molecular geometries and computational details are the same as in Grisafi et al. (18). The black full line refers to the prediction error as reported in Grisafi et al. (18). Blue lines refer to the result obtained with the RI-cc-pV5Z basis, both with a λ-SOAP(2) descriptor covariant in SO(3) (full) and O(3) (dashed). Dotted lines refer to the basis set error. In both cases, 100 reference atomic environments have been used to define the problem dimensionality.
In Figure 5 we report the results obtained for a dataset of butane molecules (C4H10), for which 1000 reference pseudo-valence densities were computed at the DFT/PBE level. The dimensionality of the regression problem is set by selecting the 100 most diverse atomic environments, out of a total of 14,000, by farthest point sampling based on the 0-SOAP(2) distance metric (35). Given that in our previous work the learning performance was essentially limited by the error of the basis-set expansion of the density, we compare the optimized basis set used in Grisafi et al. (18) with a resolution-of-the-identity (RI) basis set of the kind usually adopted to avoid the computation of the four-center Hartree integrals in electronic structure theory (36). When considering in particular the RI-cc-pV5Z basis, which includes basis functions with angular momentum up to l = 4, we find that the basis-set decomposition error is almost halved (~0.6%) with respect to Grisafi et al. (18), as shown by the asymptotic convergence in Figure 5. The figure also compares, for the RI basis, the learning performance of λ-SOAP(2) descriptors made covariant in SO(3) and in O(3), respectively. As seen for the polarizability, the O(3) features improve the prediction accuracy, although only slightly. The improvement is most substantial at the smallest training set sizes, where the incorporation of prior knowledge of the symmetries of the system can make up for the scarcity of data.
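Farthest point sampling itself takes only a few lines; the sketch below operates on a placeholder matrix of feature vectors standing in for the 0-SOAP(2) features:

```python
import numpy as np

def farthest_point_sampling(X, n_select, rng=None):
    """Greedy FPS: iteratively pick the point farthest (in Euclidean
    feature-space distance) from those already selected."""
    rng = np.random.default_rng(rng)
    idx = [int(rng.integers(len(X)))]                # random starting point
    dmin = np.linalg.norm(X - X[idx[0]], axis=1)     # distance to selection
    for _ in range(n_select - 1):
        nxt = int(np.argmax(dmin))                   # most distant point
        idx.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(X - X[nxt], axis=1))
    return idx

# e.g. select 100 reference environments out of a pool of feature vectors
X = np.random.default_rng(2).normal(size=(1000, 16))
refs = farthest_point_sampling(X, 100, rng=2)
```

Because each iteration only updates the running minimum distance, the cost is O(N × n_select) distance evaluations, which is what makes FPS practical for pools of tens of thousands of environments.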

Conclusions

The previous examples show how statistical learning of a tensorial quantity across the configurational space of atomic coordinates and composition is a challenging methodological task that requires considerable modifications to the architecture of more familiar scalar learning models. The efficiency of a regression model benefits greatly from the incorporation of symmetry, as it effectively reduces the dimensionality of the space in which the algorithm is asked to interpolate the values of the target property. The symmetry of tensorial quantities should be included in two distinct ways. First, one should decompose the tensor into its ISC, so as to minimize the amount of information needed to account for geometric covariance. Particularly for high-rank Cartesian tensors, the matrix of correlations between tensor elements can be made block-diagonal, which reduces the size and complexity of the associated kernel matrices. Second, by constructing representations of the molecular structure that are isomorphic with the tensor of interest, one obtains a linear basis that satisfies the expected covariant transformations. An important aspect to consider is that, in order to preserve the properties of the symmetry-matched basis, non-linearities have to be treated with care. We discuss how it is possible to do so in the context of KRR models, and how one should proceed to design a covariant neural network that can efficiently accomplish a symmetry-adapted regression task.
We discuss a practical implementation of these ideas within the framework of SOAP representations, which use a spherical-harmonics expansion of the atom density and are therefore particularly well suited to incorporate SO(3) covariance. We discuss an extension, which we refer to as λ-SOAP, that provides a natural linear basis to regress quantities that transform like spherical harmonics, and that can be made to represent arbitrarily high body-order correlations between atomic coordinates. As an original result of this work, we also discuss how to satisfy the inversion symmetry of the tensor, showing that representations that incorporate the full O(3) covariances improve the performance of the ML model, particularly in the limit of a small training set. We also show an example of the use of λ-SOAP representations to learn a scalar field in three dimensions as a sum of atom-centered contributions, choosing the electron density as a physically relevant example. We believe that this strategy, although more complex than alternatives that use orthogonal basis functions or a real-space grid, holds the best promise of being transferable across different systems and of being combined with standard electronic structure packages.

Acknowledgments

The authors acknowledge support from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 677013-HBMAP).

References

This chapter references 36 other publications.

  1. 1
    Williams, C. K. I.; Rasmussen, C. E. Gaussian Processes for Machine Learning; MIT Press, 2006.
  2. 2
    Bartók A. P. Payne M. C. Kondor R. Csányi G. Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons Phys. Rev. Lett. 2010 104 136403
  3. 3
    Jain A. Ong S. P. Hautier G. Chen W. Richards W. D. Dacek S. Cholia S. Gunter D. Skinner D. Ceder G. Persson K. A. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation APL Mater. 2013 1 011002
  4. 4
    Calderon C. E. Plata J. J. Toher C. Oses C. Levy O. Fornari M. Natan A. Mehl M. J. Hart G. Nardelli M. B. Curtarolo S. The AFLOW standard for high-throughput materials science calculations Comput. Mater. Sci. 2015 108 233 238
  5. 5
    Ward L. Wolverton C. Atomistic calculations and materials informatics: A review Curr. Opin. Solid State Mater. Sci. 2017 21 167 176
  6. 6
    Li Z. Kermode J. R. De Vita A. Molecular Dynamics with On-the-Fly Machine Learning of Quantum-Mechanical Forces Phys. Rev. Lett. 2015 114 096405
  7. 7
    Glielmo A. Sollich P. De Vita A. Accurate interatomic force fields via machine learning with covariant kernels Phys. Rev. B 2017 95 214302
  8. 8
    Glielmo A. Zeni C. De Vita A. Efficient nonparametric n-body force fields from machine learning Phys. Rev. B 2018 97 184307
  9. 9
    Yuan Y. Mills M. J. Popelier P. L. Multipolar electrostatics based on the Kriging machine learning method: an application to serine J. Mol. Model. 2014 20 2172
  10. 10
    Bereau T. Andrienko D. von Lilienfeld O. A. Transferable Atomic Multipole Machine Learning Models for Small Organic Molecules J. Chem. Theory Comput. 2015 11 3225 3233
  11. 11
    Bereau T. DiStasio R. A. Tkatchenko A. von Lilienfeld O. A. Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning J. Chem. Phys. 2018 148 241706
  12. 12
    Liang C. Tocci G. Wilkins D. M. Grisafi A. Roke S. Ceriotti M. Solvent fluctuations and nuclear quantum effects modulate the molecular hyperpolarizability of water Phys. Rev. B 2017 96 041407
  13. 13
    Grisafi A. Wilkins D. M. Csányi G. Ceriotti M. Symmetry-Adapted Machine Learning for Tensorial Properties of Atomistic Systems Phys. Rev. Lett. 2018 120 036002
  14. 14
    Wilkins D. M. Grisafi A. Yang Y. Lao K. U. DiStasio R. A. Ceriotti M. Accurate molecular polarizabilities with coupled cluster theory and machine learning Proc. Natl. Acad. Sci. 2019 116 3401 3406
  15. 15
    Christensen A. S. Faber F. A. von Lilienfeld O. A. Operators in quantum machine learning: Response properties in chemical space J. Chem. Phys. 2019 150 064105
  16. 16
Brockherde F. Vogt L. Li L. Tuckerman M. E. Burke K. Müller K.-R. Bypassing the Kohn-Sham equations with machine learning Nat. Commun. 2017 8 872
  17. 17
    Alred J. M. Bets K. V. Xie Y. Yakobson B. I. Machine learning electron density in sulfur crosslinked carbon nanotubes Compos. Sci. Technol. 2018 166 3 9
  18. 18
    Grisafi A. Fabrizio A. Meyer B. Wilkins D. M. Corminboeuf C. Ceriotti M. Transferable Machine-Learning Model of the Electron Density ACS Cent. Sci. 2019 5 57 64
  19. 19
    Braams B. J. Bowman J. M. Permutationally invariant potential energy surfaces in high dimensionality Int. Rev. Phys. Chem. 2009 28 577 606
  20. 20
    Behler J. Parrinello M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces Phys. Rev. Lett. 2007 98 146401
  21. 21
    Bartók A. P. Kondor R. Csányi G. On representing chemical environments Phys. Rev. B 2013 87 184115
  22. 22
    Shapeev A. Moment Tensor Potentials: A Class of Systematically Improvable Interatomic Potentials Multiscale Model. Sim. 2016 14 1153 1173
  23. 23
    Zhang L. Han J. Wang H. Car R. E W. Deep Potential Molecular Dynamics: A Scalable Model with the Accuracy of Quantum Mechanics Phys. Rev. Lett. 2018 120 143001
  24. 24
    Weinert U. Spherical tensor representation Arch. Ration. Mech. Anal. 1980 74 165 196
  25. 25
    Stone A. J. Transformation between cartesian and spherical tensors Mol. Phys. 1975 29 1461 1471
  26. 26
    De S. Bartók A. P. Csányi G. Ceriotti M. Comparing molecules and solids across structural and alchemical space Phys. Chem. Chem. Phys. 2016 18 13754 13769
  27. 27
    Musil F. De S. Yang J. Campbell J. E. J. Day G. G. M. Ceriotti M. Machine learning for the structure-energy-property landscapes of molecular crystals Chem. Sci. 2018 9 1289 1300
  28. 28
    Bartók A. P. De S. Poelking C. Bernstein N. Kermode J. R. Csányi G. Ceriotti M. Machine learning unifies the modeling of materials and molecules Sci. Adv. 2017 3
  29. 29
    Willatt M. J. Musil F. Ceriotti M. Atom-density representations for machine learning J. Chem. Phys. 2019 150 154110
  30. 30
    Willatt M. J. Musil F. Ceriotti M. Feature Optimization for Atomistic Machine Learning Yields a Data-Driven Construction of the Periodic Table of the Elements Phys. Chem. Chem. Phys. 2018 20 29661 29668
  31. 31
    Kondor R. Zhen L. Trivedi S. Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network arXiv:1806.09231 2018
  32. 32
Kaufmann K. Baumeister W. Single-centre expansion of Gaussian basis functions and the angular decomposition of their overlap integrals J. Phys. B: At. Mol. Opt. Phys. 1989 22 1
  33. 33
    Gradshteyn, I. S.; Ryzhik, I. M. Table of integrals, series, and products, 7th ed.; Elsevier/Academic Press, Amsterdam, 2007; pp xlviii+1171, Translated from the Russian, Translation edited and with a preface by Alan Jeffrey and Daniel Zwillinger, With one CD-ROM (Windows, Macintosh and UNIX).
  34. 34
    Chandrasekaran A. Kamal D. Batra R. Kim C. Chen L. Ramprasad R. Solving the electronic structure problem with machine learning Npj Comput. Mater. 2019 5 22
  35. 35
    Ceriotti M. Tribello G. A. Parrinello M. Demonstrating the Transferability and the Descriptive Power of Sketch-Map J. Chem. Theory Comput. 2013 9 1521 1532
  36. 36
Hättig C. Optimization of auxiliary basis sets for RI-MP2 and RI-CC2 calculations: Core-valence and quintuple-ζ basis sets for H to Ar and QZVPP basis sets for Li to Kr Phys. Chem. Chem. Phys. 2005 7 59 66