Machine Learning Framework for CIDER Functionals

This page covers the machine learning framework used to train exchange-correlation functionals in the CIDER formalism, which is built on Gaussian process regression (GPR).

Gaussian Process Regression

Gaussian process regression is a nonparametric, Bayesian statistical learning model. The detailed theory of Gaussian processes can be found in the excellent textbook by Rasmussen and Williams. [1] This section covers the application of Gaussian process regression to functional design. More details on the formalism can be found in Bystrom and Kozinsky[2] and Bystrom et al.[3]

Consider the problem of learning some function \(f(\mathbf{x})\), with \(\mathbf{x}\) being a feature vector of independent variables. Let \(\mathbf{X}\) be a feature matrix, where each row \(\mathbf{x}_i\) is the feature vector for training point \(i\), and let \(\mathbf{y}\) be a target vector, where each element \(y_i\) is the observed value of the target function for training point \(i\). Then, there are two key components necessary to construct a Gaussian process regression for \(f(\mathbf{x})\). The first is the covariance kernel \(k(\mathbf{x}, \mathbf{x}')\), which defines the covariance between the values of the predictive function for two points, i.e. \(k(\mathbf{x}, \mathbf{x}')=\text{Cov}(f(\mathbf{x}), f(\mathbf{x}'))\). The matrix of covariances between training points is denoted \(\mathbf{K}\), with matrix elements \(K_{ij}=k(\mathbf{x}_i, \mathbf{x}_j)\). The second is the set of noise labels \(\sigma_i\), indicating the estimated prior uncertainty on each training observation \(i\). We denote the diagonal matrix whose entries are the noise covariances \(\sigma_i^2\) as \(\boldsymbol{\Sigma}_\text{noise}\). Using these components, the predictive function for a test point \(\mathbf{x}_*\) takes the form [1]

\[f(\mathbf{x}_*) = \sum_i k(\mathbf{x}_*, \mathbf{x}_i) \alpha_i\]

with the following definition for the weight vector \(\boldsymbol{\alpha}\):

\[\boldsymbol{\alpha} = \left(\mathbf{K} + \boldsymbol{\Sigma}_\text{noise}\right)^{-1} \mathbf{y}\]
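
As a concrete illustration, the prediction and weight equations above can be written in a few lines of NumPy. This is only a sketch: the squared-exponential (RBF) kernel and its hyperparameters are placeholder assumptions for the example and are not the covariance kernels used by CiderPress.

    import numpy as np

    def rbf_kernel(XA, XB, length_scale=1.0):
        # Placeholder covariance kernel k(x, x') = exp(-|x - x'|^2 / (2 l^2))
        d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    def fit_weights(X_train, y, sigma_noise):
        # alpha = (K + Sigma_noise)^(-1) y
        K = rbf_kernel(X_train, X_train)
        return np.linalg.solve(K + np.diag(sigma_noise**2), y)

    def predict(X_test, X_train, alpha):
        # f(x_*) = sum_i k(x_*, x_i) alpha_i
        return rbf_kernel(X_test, X_train) @ alpha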

When fitting the energies of molecules and solids, we need to fit an extensive quantity whose training labels are integrals of the predictive function \(f(\mathbf{x})\) over real space. The next section covers the adjustments to the Gaussian process model needed to perform this task.

Fitting Total Energy Data

Consider an extensive quantity \(F\), which is a contribution to the total electronic energy of a chemical system. In the case of CiderPress, this quantity is the exchange, correlation, or exchange-correlation energy. We can fit \(F\) by learning a function that is integrated over real space to yield \(F\) for a given system:

\[F = \int \text{d}^3\mathbf{r}\,f\left(\mathbf{x}(\mathbf{r})\right) \label{eq:F_int}\]

In the above equation, \(f\left(\mathbf{x}(\mathbf{r})\right)\) is the predictive function for the energy density to be learned by the Gaussian process. In practice, the integral must be performed numerically. For a given chemical system indexed by \(m\), we write

\[F^m = \sum_{g\in m} w_g^m f\left(\mathbf{x}_g^m\right) \label{eq:extensive_functional}\]

where \(g\) indexes quadrature points and \(w_g^m\) are the respective quadrature weights. The covariances between the numerical integrals \(F^m\) and \(F^n\) can be written as

\[\text{Cov}(F^m, F^n) = \sum_{g \in m} \sum_{h \in n} w_g^m w_h^n k(\mathbf{x}_g^m, \mathbf{x}_h^n)\]

where \(k(\mathbf{x}, \mathbf{x}')\) is the covariance kernel for \(f(\mathbf{x})\). Computing the above double numerical integral directly would be expensive. To overcome this issue, we define a small set of “control points” \(\tilde{\mathbf{x}}_a\) and approximate \(\text{Cov}(F^m, F^n)\) using a resolution-of-the-identity approximation:

\[\begin{split}\text{Cov}(F^m, F^n) = K_{mn} &\approx \tilde{\mathbf{k}}_m \tilde{\mathbf{K}}^{-1} \tilde{\mathbf{k}}_n \label{eq:roi_cov} \\ \left(\tilde{\mathbf{K}}\right)_{ab} &= k(\tilde{\mathbf{x}}_a, \tilde{\mathbf{x}}_b) \\ \left(\tilde{\mathbf{k}}_m\right)_a &= \sum_{g\in m} w_g^m k(\mathbf{x}_g^m, \tilde{\mathbf{x}}_a)\end{split}\]
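
These pieces translate directly into code. The sketch below reuses the placeholder rbf_kernel from the earlier sketch and computes the integrated kernel vectors \(\tilde{\mathbf{k}}_m\) and the resolution-of-the-identity approximation to \(K_{mn}\); how the control points X_ctrl are chosen is left open here, and the variable names are illustrative.

    import numpy as np

    def integrated_kernel_vector(weights_m, X_m, X_ctrl, kernel):
        # (k~_m)_a = sum_g w_g^m k(x_g^m, x~_a): quadrature-weighted kernel integrals
        return weights_m @ kernel(X_m, X_ctrl)

    def ri_covariance(k_m, k_n, K_ctrl):
        # Cov(F^m, F^n) ~= k~_m K~^(-1) k~_n, with (K~)_ab = k(x~_a, x~_b)
        return k_m @ np.linalg.solve(K_ctrl, k_n)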

Using this definition of the covariance kernel, the predictive function can be expressed as

\[\begin{split}f(\mathbf{x}_*) &= \sum_a k(\mathbf{x}_*, \mathbf{\tilde{x}}_a) \alpha_a \label{eq:gp_sum_formula} \\ \boldsymbol{\alpha} &= \sum_m \mathbf{\tilde{k}}_m \left\{\left[\mathbf{K} + \boldsymbol{\Sigma}_\text{noise}\right]^{-1} \mathbf{y}\right\}_m \label{eq:gp_predictive_cider3}\end{split}\]

with \(\mathbf{y}\) being the vector of training labels \(F^m\).
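
A sketch of how these system-level quantities could be assembled into the weight vector \(\boldsymbol{\alpha}\) and then used for prediction, following the two equations above (the array layouts and variable names are assumptions for the example):

    import numpy as np

    def fit_alpha(k_tilde, K_sys, sigma_noise, y):
        # k_tilde: (n_systems, n_control) array whose rows are k~_m
        # K_sys:   (n_systems, n_systems) covariance matrix with entries K_mn
        beta = np.linalg.solve(K_sys + np.diag(sigma_noise**2), y)
        # alpha_a = sum_m (k~_m)_a {[K + Sigma_noise]^(-1) y}_m
        return k_tilde.T @ beta

    def predict_energy_density(X_test, X_ctrl, alpha, kernel):
        # f(x_*) = sum_a k(x_*, x~_a) alpha_a
        return kernel(X_test, X_ctrl) @ alpha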

Fitting Eigenvalues

The Gaussian process scheme discussed above can be extended to fit the eigenvalues of the Kohn-Sham Hamiltonian. [3] The \(i\)-th eigenvalue \(\epsilon_i^m\) of chemical system \(m\) is the partial derivative of the total energy with respect to the occupation number \(f_i^m\) of the orbital: [4]

\[\epsilon_i^m = \frac{\partial E}{\partial f_i^m}\]

Most of the Kohn-Sham eigenvalues \(\epsilon_i^m\) are fictitious and lack a direct physical meaning, but the LUMO and HOMO eigenvalues of the exact functional correspond to the electron affinity and the negative of the ionization potential, respectively. Therefore, we are interested in explicitly fitting the derivative of our target quantity, \(\frac{\partial F}{\partial f_i^m}\). In the case of fitting exact exchange \(E_\text{x}^\text{exact}\), one can explicitly compute \(\frac{\partial E_\text{x}^\text{exact}}{\partial f_i^m}\) for a given set of orbitals. For the full exchange-correlation energy, no explicit formula exists, but \(\frac{\partial E_\text{xc}^\text{exact}}{\partial f_i^m}\) can be extracted from an experimental or quantum chemistry value of the electron affinity (EA) or ionization potential (IP). The total energy in DFT is

\[\begin{split}E[n] &= T[n] + V[n] + U[n] + E_\text{xc}[n] \\ E[n] &= E_0[n] + E_\text{xc}[n]\end{split}\]

where \(E_0[n]\) is the sum of the kinetic (\(T\)), external (\(V\)), and Hartree (\(U\)) energies, all of which can be written explicitly in terms of the Kohn-Sham orbitals. Therefore, if we know an eigenvalue \(\epsilon_i^m\) from an experimental or quantum chemistry measurement of the IP or EA, we can write

\[\frac{\partial E_\text{xc}[n]}{\partial f_i^m} = \epsilon_i^m - \frac{\partial E_0[n]}{\partial f_i^m}\]

This gives us an explicit expression for \(\frac{\partial E_\text{xc}[n]}{\partial f_i^m}\) that can be used as training data for the XC functional.
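
As a minimal sketch, constructing the training label is then just this subtraction, assuming \(\epsilon_i^m\) and the occupation derivative of \(E_0\) are available in consistent units (e.g. Hartree):

    def xc_occupation_derivative(eps_im, dE0_dfim):
        # dE_xc/df_i^m = eps_i^m - dE_0/df_i^m
        # e.g. for a HOMO label of the exact functional, eps_im = -IP
        return eps_im - dE0_dfim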

Given such training data, we can relate the occupation derivative of \(F\) to our model energy density \(f(\mathbf{x})\) as

\[\frac{\partial F^m}{\partial f_i^m} = \sum_{g \in m} w_g^m \frac{\partial f(\mathbf{x}_g^m)}{\partial\mathbf{x}_g^m} \cdot \frac{\partial \mathbf{x}_g^m}{\partial f_i^m}\]

To fit our model to this training data, we only need the covariance of \(\frac{\partial F^m}{\partial f_i^m}\) with \(F^n\) and with \(\frac{\partial F^n}{\partial f_j^n}\). These covariances are given by

\[\begin{split}\text{Cov}\left(\frac{\partial F^m}{\partial f_i^m}, F^n\right) &= \tilde{\mathbf{d}}_{mi} \tilde{\mathbf{K}}^{-1} \tilde{\mathbf{k}}_n \\ \text{Cov}\left(\frac{\partial F^m}{\partial f_i^m}, \frac{\partial F^n}{\partial f_j^n}\right) &= \tilde{\mathbf{d}}_{mi} \tilde{\mathbf{K}}^{-1} \tilde{\mathbf{d}}_{nj} \\ \left(\tilde{\mathbf{d}}_{mi}\right)_a &= \sum_{g\in m} w_g^m \frac{\partial \mathbf{x}_g^m}{\partial f_i^m} \cdot \frac{\partial}{\partial \mathbf{x}_g^m} k(\mathbf{x}_g^m, \tilde{\mathbf{x}}_a)\end{split}\]

which allows occupation derivative training data to be included in the Gaussian process. For further details, see Bystrom et al.[3]
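
As a final sketch, the occupation-derivative vectors \(\tilde{\mathbf{d}}_{mi}\) can be computed with the same placeholder squared-exponential kernel used in the earlier sketches; the actual CIDER kernels and the feature derivatives \(\partial\mathbf{x}_g^m/\partial f_i^m\) come from the feature implementation and are assumed to be supplied as inputs here.

    import numpy as np

    def rbf_kernel_gradient(X, X_ctrl, length_scale=1.0):
        # d/dx k(x, x~) for the placeholder kernel k(x, x~) = exp(-|x - x~|^2 / (2 l^2));
        # returns an array of shape (n_grid, n_control, n_features)
        diff = X[:, None, :] - X_ctrl[None, :, :]
        k = np.exp(-0.5 * (diff ** 2).sum(-1) / length_scale**2)
        return -diff / length_scale**2 * k[:, :, None]

    def occ_derivative_vector(weights_m, X_m, dX_dfi, X_ctrl, length_scale=1.0):
        # (d~_mi)_a = sum_g w_g^m (dx_g^m/df_i^m) . (d/dx) k(x_g^m, x~_a)
        grad = rbf_kernel_gradient(X_m, X_ctrl, length_scale)  # (n_grid, n_ctrl, n_feat)
        chain = np.einsum('gaf,gf->ga', grad, dX_dfi)          # chain-rule dot product
        return weights_m @ chain                               # quadrature sum over g

The resulting vectors can then be combined with \(\tilde{\mathbf{K}}^{-1}\) and \(\tilde{\mathbf{k}}_n\) in the same way as in the covariance expressions above.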