Examples

Gaussian Multipole (GMP) Descriptors

In addition to conventional atom-centered symmetry functions (ACSFs) as a fingerprinting scheme, AmpTorch also supports GMP descriptors, which use multipole expansions of the reconstructed electronic density around each central atom and its neighbors to encode local environments. Because the formulation of symmetry functions does not encode element types directly, the interactions among different elements are split into separate input columns. As a result, the number of feature dimensions grows undesirably with the number of elements present. A major advantage of GMPs is that the input dimension stays constant regardless of the number of chemical elements, so GMPs can be applied to chemically complex datasets. For more technical details and theoretical background, please refer to Lei, X., & Medford, A. J. (2021). A Universal Framework for Featurization of Atomistic Systems. http://arxiv.org/abs/2102.02390
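To make the scaling argument concrete, the sketch below counts input features for a hypothetical ACSF set, with radial functions repeated for every neighbor element and angular functions repeated for every unordered pair of neighbor elements, against a hypothetical GMP set whose size is the number of multipole orders times the number of Gaussian widths. The specific counts (8 radial, 8 angular, 5 orders, 5 widths) and the one-feature-per-(order, width) assumption are illustrative choices, not AmpTorch defaults.

```python
def acsf_feature_dim(n_elements, n_radial=8, n_angular=8):
    """ACSF input size: radial terms per neighbor element, angular terms per element pair."""
    n_pairs = n_elements * (n_elements + 1) // 2  # unordered pairs, including same-element pairs
    return n_radial * n_elements + n_angular * n_pairs


def gmp_feature_dim(n_elements, n_orders=5, n_sigmas=5):
    """GMP input size: one feature per (multipole order, Gaussian width) pair.

    n_elements is accepted only to make the point explicit: it does not enter the count.
    """
    return n_orders * n_sigmas


for n in (1, 2, 4, 8):
    print(f"{n} elements: ACSF = {acsf_feature_dim(n):4d}, GMP = {gmp_feature_dim(n):2d}")
```

Under these assumptions the ACSF input grows from 16 features for one element to 40, 112, and 352 for two, four, and eight elements, while the GMP input stays at 25.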

For example scripts that use GMPs to map structures to energies (and forces), please refer to:

examples/1_GMP

SingleNN Atomistic Neural Network Architecture

Because GMPs encode chemical-element information through the reconstructed electronic environment, and every atom's fingerprint has the same length regardless of its element, GMPs work naturally with SingleNN, the atomistic neural network architecture with shared weights published by Liu and Kitchin (Liu, M., & Kitchin, J. R. (2020). SingleNN: Modified Behler-Parrinello Neural Network with Shared Weights for Atomistic Simulations with Transferability. Journal of Physical Chemistry C, 124(32), 17811–17818. https://doi.org/10.1021/acs.jpcc.0c04225).
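The shared-weights idea can be illustrated in NumPy. This is a minimal sketch, not AmpTorch's implementation: it assumes a shared stack of tanh hidden layers applied to every atom and an output layer with one node per element, from which each atom reads the node for its own element. The class name, layer sizes, and initialization are hypothetical. As in Behler-Parrinello networks, the total energy is the sum of atomic energies.

```python
import numpy as np


class SingleNNSketch:
    """Hidden layers shared by all elements; one output node per element (hypothetical sketch)."""

    def __init__(self, n_features, n_hidden, elements, seed=0):
        rng = np.random.default_rng(seed)
        self.elements = list(elements)
        sizes = [n_features, n_hidden, n_hidden]
        # Shared hidden layers: the same weights are applied to every atom.
        self.hidden = [
            (rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])
        ]
        # Output layer: column j produces the atomic energy for element j.
        self.w_out = rng.normal(scale=0.1, size=(n_hidden, len(self.elements)))
        self.b_out = np.zeros(len(self.elements))

    def atomic_energies(self, fingerprints, symbols):
        h = np.asarray(fingerprints, dtype=float)
        for w, b in self.hidden:
            h = np.tanh(h @ w + b)
        per_element = h @ self.w_out + self.b_out  # shape (n_atoms, n_elements)
        cols = [self.elements.index(s) for s in symbols]
        return per_element[np.arange(len(symbols)), cols]

    def energy(self, fingerprints, symbols):
        # Total energy is the sum of per-atom contributions.
        return float(self.atomic_energies(fingerprints, symbols).sum())

    def n_parameters(self):
        hidden = sum(w.size + b.size for w, b in self.hidden)
        return hidden + self.w_out.size + self.b_out.size
```

With 25 input features, 20 hidden nodes, and three elements, this sketch has 1003 parameters, whereas three separate per-element networks of the same shape (one output node each) would have 3 × 961 = 2883. Only the shared design lets the parameter count grow slowly as elements are added.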

To use SingleNN instead of the default Behler-Parrinello high-dimensional neural network scheme, set the model name in the config for the NN trainer:

config["model"]["name"] = "singlenn"

as shown in examples/1_GMP/1_GMP_S2E.py
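The setting lives inside the nested "model" section of the trainer config. The excerpt below is illustrative: the keys other than "name" and their values are assumed from AmpTorch's example configs and may differ in your version, and the remaining sections a full trainer config needs are omitted.

```python
# Illustrative excerpt: hyperparameter values are placeholders, and the
# other sections of the trainer config are omitted.
config = {
    "model": {
        "get_forces": True,
        "num_layers": 3,
        "num_nodes": 20,
    },
}

# Select SingleNN; note the assignment operator (=), not a comparison (==).
config["model"]["name"] = "singlenn"

# With the full config in place, training would proceed as in the example script, e.g.:
# from amptorch.trainer import AtomsTrainer
# trainer = AtomsTrainer(config)
# trainer.train()
```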

lmdb as a Database Management Solution for Large Datasets

To let AmpTorch train on large datasets such as those from the Open Catalyst Project, we use lmdb, a B-tree-based database library, to avoid memory issues during data loading and training. Depending on whether the whole dataset fits into RAM, lmdb can be used with either a full or a partial cache.
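The caching modes can be illustrated with a small disk-backed fingerprint store. This sketch swaps in Python's standard-library shelve module for lmdb so that it runs without extra dependencies; the class and function names are hypothetical and do not reflect AmpTorch's API. With no cache every sample is read from disk, a full cache loads everything into RAM once, and a partial cache keeps a bounded least-recently-used subset in memory.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


def write_store(path, fingerprints):
    """Write one fingerprint per key ("0", "1", ...) to a disk-backed store."""
    with shelve.open(path, flag="n") as db:
        for i, fp in enumerate(fingerprints):
            db[str(i)] = fp


class CachedFingerprintStore:
    """Read fingerprints from disk with "no", "full", or "partial" caching.

    shelve stands in for lmdb here; only the caching logic is illustrated.
    """

    def __init__(self, path, cache="no", max_cached=1000):
        if cache not in ("no", "full", "partial"):
            raise ValueError(f"unknown cache mode: {cache!r}")
        self.db = shelve.open(path, flag="r")
        self.cache = cache
        self.max_cached = max_cached
        self.mem = OrderedDict()
        self.disk_reads = 0
        if cache == "full":
            # Load every entry into RAM once; later lookups never touch disk.
            for key in list(self.db.keys()):
                self.mem[key] = self.db[key]
                self.disk_reads += 1

    def __len__(self):
        return len(self.db)

    def __getitem__(self, idx):
        key = str(idx)
        if key in self.mem:
            if self.cache == "partial":
                self.mem.move_to_end(key)  # mark as most recently used
            return self.mem[key]
        value = self.db[key]
        self.disk_reads += 1
        if self.cache == "partial":
            self.mem[key] = value
            if len(self.mem) > self.max_cached:
                self.mem.popitem(last=False)  # evict the least recently used entry
        return value

    def close(self):
        self.db.close()


def make_demo_store(n=10):
    """Create a temporary store holding n toy 3-element fingerprints."""
    path = os.path.join(tempfile.mkdtemp(), "fps")
    write_store(path, [[float(i)] * 3 for i in range(n)])
    return path
```

The partial mode bounds memory use at max_cached entries, at the price of repeated disk reads for samples that have been evicted; the full mode trades the largest memory footprint for zero disk reads after start-up.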

Examples using no cache, a full cache, or a partial cache can be found in:

examples/3_lmdb

Uncertainty Quantification (UQ) via Conformal Prediction (CP)

AmpTorch implements UQ as an optional feature at prediction time. It uses the conformal prediction method with distances in the neural network's latent space to output the uncertainty associated with each predicted energy. The CP method ensures calibration and, when tested with trained models on benchmark systems such as MD17, QM9, and OC20, also showed the advantage of being sharp and scalable.
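One common way to turn latent-space distances into calibrated energy uncertainties is normalized split conformal prediction. The NumPy sketch below assumes that latent vectors for training, calibration, and test structures are already available from a trained model, uses the mean distance to the k nearest training points as the difficulty measure, and calibrates a single multiplier on a held-out set. The function names, k, and alpha are hypothetical, and AmpTorch's exact scoring may differ.

```python
import numpy as np


def knn_latent_distance(train_latent, query_latent, k=5):
    """Mean Euclidean distance from each query vector to its k nearest training vectors."""
    diff = query_latent[:, None, :] - train_latent[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (n_query, n_train)
    k = min(k, train_latent.shape[0])
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)


def conformal_multiplier(calib_abs_err, calib_dist, alpha=0.1, eps=1e-12):
    """Split-conformal quantile of the normalized scores |error| / distance."""
    scores = np.sort(calib_abs_err / (calib_dist + eps))
    n = scores.shape[0]
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)), capped at n
    # (strictly, a rank above n would mean an unbounded interval).
    rank = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return scores[rank - 1]


def energy_uncertainty(multiplier, test_dist, eps=1e-12):
    """Half-width of the predicted energy interval for each test structure."""
    return multiplier * (test_dist + eps)
```

When calibration and test structures are exchangeable, roughly a fraction 1 - alpha of test errors should fall within the returned half-widths, and structures far from the training data in latent space receive proportionally wider intervals.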

An example Python script can be found in:

examples/1_GMP/3_GMP_S2E_w_uncertainty.py