API

amptorch.descriptor

amptorch.ase_utils

class amptorch.ase_utils.AmpTorch(trainer)[source]

Bases: Calculator

Create an ase.calculators.calculator.Calculator class to compute the energy (and forces) for the given ase.Atoms object.

calculate : Calculates the corresponding energy (and forces) with loaded parameters in the model.

calculate(atoms, properties, system_changes)[source]

Do the calculation.

properties: list of str: List of what needs to be calculated. Can be any combination of ‘energy’, ‘forces’, ‘stress’, ‘dipole’, ‘charges’, ‘magmom’ and ‘magmoms’.
system_changes: list of str: List of what has changed since last calculation. Can be any combination of these six: ‘positions’, ‘numbers’, ‘cell’, ‘pbc’, ‘initial_charges’ and ‘initial_magmoms’.

Subclasses need to implement this, but can ignore properties and system_changes if they want. Calculated properties should be inserted into the results dictionary as shown in this dummy example:

self.results = {'energy': 0.0,
                'forces': np.zeros((len(atoms), 3)),
                'stress': np.zeros(6),
                'dipole': np.zeros(3),
                'charges': np.zeros(len(atoms)),
                'magmom': 0.0,
                'magmoms': np.zeros(len(atoms))}

The subclass implementation should first call this implementation to set the atoms attribute and create any missing directories.

implemented_properties: List[str] = ['energy', 'forces']: Properties calculator can handle (energy, forces, …)
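The results-dictionary contract described above can be illustrated without ASE installed. The sketch below is a stand-in, not the AmpTorch implementation: ToyCalculator and its harmonic toy potential are hypothetical, and the class deliberately does not subclass ase.calculators.calculator.Calculator. It only shows a calculate method that rejects properties outside implemented_properties and fills self.results with the ones it supports.

```python
class ToyCalculator:
    """Hypothetical stand-in that mimics the results-dictionary contract."""

    implemented_properties = ["energy", "forces"]

    def __init__(self, k=1.0):
        self.k = k  # spring constant of the toy potential (assumption)
        self.results = {}

    def calculate(self, positions, properties=("energy",)):
        unsupported = [p for p in properties if p not in self.implemented_properties]
        if unsupported:
            raise NotImplementedError(f"cannot compute: {unsupported}")
        # Toy potential: E = 0.5 * k * sum(|r_i|^2), hence F_i = -k * r_i.
        energy = 0.5 * self.k * sum(x * x for pos in positions for x in pos)
        forces = [[-self.k * x for x in pos] for pos in positions]
        self.results = {"energy": energy, "forces": forces}
        return self.results


calc = ToyCalculator(k=2.0)
results = calc.calculate(
    [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], properties=["energy", "forces"]
)
```

With a real trainer, the equivalent object would come from AtomsTrainer.get_calc() and be attached to an ase.Atoms object instead of being called directly.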

amptorch.data_parallel

Adapted from https://github.com/Open-Catalyst-Project/ocp/blob/master/ocpmodels/common/data_parallel.py

class amptorch.data_parallel.DataParallel(module, output_device, num_gpus)[source]

Bases: DataParallel

Data parallelization scheme for multi-GPU training.

forward(batch_list)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

class amptorch.data_parallel.ParallelCollater(num_gpus, collater)[source]

Bases: object

Data collater for multi-GPU training.

amptorch.dataset

class amptorch.dataset.AtomsDataset(images, descriptor_setup, forcetraining=True, save_fps=True, scaling={'range': (0, 1), 'threshold': 1e-06, 'type': 'normalize'}, cores=1, process=True)[source]

Bases: Dataset

Dataset class to hold information about the ase.Atoms including element, energy, fingerprint (and forces).

Args:

images (list): A list of ase.Atoms objects.

descriptor_setup (dict): A dictionary containing parameters for fingerprint generation.

forcetraining (bool): Whether to train with forces (default is True).

save_fps (bool): Whether to save the fingerprints (default is True).

scaling (dict): A dictionary on how to scale the fingerprints (default is {“type”: “normalize”, “range”: (0, 1), “threshold”: 1e-6}).

cores (int): The number of cores to use for parallel processing (default is 1).

process (bool): Whether to process the data during initialization (default is True).
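To make the scaling argument concrete, here is a minimal sketch of min-max normalization into a target range. It is not the AmpTorch code: normalize_columns is a hypothetical helper, and treating threshold as the smallest feature spread that still gets rescaled is an assumption about what that key means.

```python
def normalize_columns(rows, feature_range=(0.0, 1.0), threshold=1e-6):
    """Min-max scale each column of `rows` into `feature_range`.

    Columns whose spread (max - min) is below `threshold` are mapped to the
    lower bound instead of being divided by a near-zero number (assumed
    meaning of "threshold").
    """
    lo, hi = feature_range
    scaled_columns = []
    for col in zip(*rows):
        cmin, cmax = min(col), max(col)
        spread = cmax - cmin
        if spread < threshold:
            scaled_columns.append([lo] * len(col))
        else:
            scaled_columns.append(
                [lo + (hi - lo) * (v - cmin) / spread for v in col]
            )
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]


# Two fingerprint components per atom; the second one is constant.
fingerprints = [[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]]
scaled = normalize_columns(fingerprints)
```

The constant second column shows why some guard is needed: without it, the division by a zero spread would fail.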

property input_dim

process()[source]: Compute the fingerprints according to the defined fingerprinting scheme and parameters, scale the feature and targets.

class amptorch.dataset.DataCollater(train=True, forcetraining=True)[source]

Bases: object

Helper class to batch the dataset.

amptorch.dataset.construct_descriptor(descriptor_setup)[source]: Pass descriptor_setup to the matching fingerprinting class to obtain the corresponding atomic representations as fingerprints.

amptorch.dataset_lmdb

class amptorch.dataset_lmdb.AtomsLMDBDataset(db_paths)[source]

Bases: Dataset

lmdb dataset with no cache

This is the straightforward yet slow way of using lmdb files. To access a given image (i.e. the __getitem__ function), it has to go to disk, connect to the corresponding lmdb file, and read the desired image. Since every image is a random access, performance is slow.

It does support large amounts of data, limited only by disk space and NOT by memory (RAM).

It should be avoided where possible because of its poor access performance.

Parameter: db_paths [str] : a list of strings pointing to the paths of lmdb files.

connect_db(lmdb_path)[source]

get_descriptor(descriptor_setup)[source]

property input_dim

class amptorch.dataset_lmdb.AtomsLMDBDatasetCache(db_paths)[source]

Bases: Dataset

lmdb dataset with full cache

This is the fastest way to train with multiple lmdb files, provided they CAN all fit into RAM at once. It loads all images from disk into RAM up front.

It does not support large amounts of data, as it is limited by RAM size.

It is the fastest of the three options for training, ~3x faster than partial caching.

Parameter: db_paths [str] : a list of strings pointing to the paths of lmdb files.

connect_db(lmdb_path)[source]

get_descriptor(descriptor_setup)[source]

property input_dim

class amptorch.dataset_lmdb.AtomsLMDBDatasetPartialCache(db_paths)[source]

Bases: Dataset

lmdb dataset with partial cache

This is the optimized way to train with multiple lmdb files that CAN NOT all fit into RAM at once. It assumes that the trainer looks at the images and lmdb files sequentially (i.e., first the images in lmdb_file1 in order, then the images in lmdb_file2 in order, and so on). Under that assumption, the dataset loads and caches all the images of the lmdb file currently being accessed into RAM, and serves the desired image (__getitem__) from RAM. This is ~1-2 orders of magnitude faster than no cache, because the lmdb files are accessed serially.

It does support large amounts of data, limited only by disk space, as long as each lmdb file can be loaded into RAM entirely.

It has to be used with an in-order splitter and a randomized dataset.

Parameter: db_paths [str] : a list of strings pointing to the paths of lmdb files.
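The partial-cache idea can be sketched in plain Python. This is an illustration under stated assumptions, not the AmpTorch class: PartialCacheDemo is hypothetical, in-memory lists stand in for lmdb files, and the loads counter exists only to show how often a whole file would be read from disk.

```python
import bisect


class PartialCacheDemo:
    """Hypothetical per-file cache; Python lists stand in for lmdb files."""

    def __init__(self, files):
        self.files = files
        # Cumulative end index of each file, e.g. sizes [3, 2] -> [3, 5].
        self.ends = []
        total = 0
        for f in files:
            total += len(f)
            self.ends.append(total)
        self.cached_file = None  # index of the file currently held in "RAM"
        self.cache = []
        self.loads = 0  # how many times a whole file was (re)loaded

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        file_idx = bisect.bisect_right(self.ends, idx)
        if file_idx != self.cached_file:
            # Simulate reading the whole file from disk into RAM.
            self.cache = list(self.files[file_idx])
            self.cached_file = file_idx
            self.loads += 1
        start = self.ends[file_idx - 1] if file_idx > 0 else 0
        return self.cache[idx - start]


files = [["a0", "a1", "a2"], ["b0", "b1"]]

in_order = PartialCacheDemo(files)
in_order_items = [in_order[i] for i in range(len(in_order))]

interleaved = PartialCacheDemo(files)
interleaved_items = [interleaved[i] for i in (0, 3, 1, 4, 2)]
```

Reading the five items file by file triggers one load per file, while the interleaved order reloads a file on every access. That is the reason the scheme depends on in-order access.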

connect_db(lmdb_path)[source]

get_descriptor(descriptor_setup)[source]

get_length_list()[source]

property input_dim

class amptorch.dataset_lmdb.PartialCacheSampler(length_list, val_frac)[source]

Bases: Sampler

Sampling strategy for partial cache scheme.

amptorch.dataset_lmdb.get_lmdb_dataset(lmdb_paths, cache_type)[source]: A helper function to assign lmdb dataset types.

amptorch.metrics

class amptorch.metrics.MemEffEpochScoring(scoring, lower_is_better=True, on_train=False, name=None, target_extractor=None, use_caching=True)[source]

Bases: EpochScoring

Memory-efficient epoch scorer that caches the predictions for all batches during the epoch.

on_batch_end(net, y, y_pred, training, **kwargs)[source]: Called at the end of each batch.

amptorch.metrics.evaluator(val_split, metric, identifier, forcetraining, cp_metric)[source]: For metric display with callbacks.

amptorch.metrics.mae_energy_score(net, X, y)[source]: Compute the energy MAE of the model.

amptorch.metrics.mae_forces_score(net, X, y)[source]: Compute the force MAE of the model.

amptorch.metrics.mse_energy_score(net, X, y)[source]: Compute the energy MSE of the model.

amptorch.metrics.mse_forces_score(net, X, y)[source]: Compute the force MSE of the model.
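For reference, the two error measures behind these scorers can be written out on plain lists. This sketch shows only the formulas: the helper names are hypothetical, the real functions take (net, X, y) as listed above, and any per-atom normalization they may apply is not described here and is not reproduced.

```python
def mean_absolute_error(predicted, target):
    """MAE: average of |prediction - target|."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def mean_squared_error(predicted, target):
    """MSE: average of (prediction - target)^2."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def flatten(force_rows):
    """Turn per-atom (fx, fy, fz) rows into one flat list of components."""
    return [component for row in force_rows for component in row]


predicted_energies = [1.0, 2.0, 4.0]
target_energies = [1.5, 2.0, 3.0]
energy_mae = mean_absolute_error(predicted_energies, target_energies)
energy_mse = mean_squared_error(predicted_energies, target_energies)

predicted_forces = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.0]]
target_forces = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
force_mae = mean_absolute_error(flatten(predicted_forces), flatten(target_forces))
```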

amptorch.metrics.to_cpu(X)[source]: Detach to cpu.

amptorch.model

class amptorch.model.BPNN(elements, input_dim, num_nodes=20, num_layers=5, hidden_layers=None, get_forces=True, batchnorm=False, dropout=False, dropout_rate=0.5, activation=<class 'torch.nn.modules.activation.Tanh'>, name='bpnn', initialization='xavier')[source]

Bases: Module

Atomistic neural network structure described as 2nd generation or Behler-Parrinello neural network for energy (and force) training.

Args:

elements: list of str: List of unique element symbols in the system.
input_dim: int: Dimensionality of the input. The dimension depends on the atomistic fingerprinting scheme.
num_nodes: int, optional (default=20): Number of nodes in each hidden layer.
num_layers: int, optional (default=5): Number of hidden layers in the network.
hidden_layers: list of int, optional (default=None): A list of integers, where each element corresponds to the number of nodes in a hidden layer. Overrides num_nodes and num_layers. E.g. [10, 10, 10]
get_forces: bool, optional (default=True): Whether to train with the forces in addition to the energy.
batchnorm: bool, optional (default=False): Whether to use batch normalization in the network.
dropout: bool, optional (default=False): Whether to apply dropout in the network.
dropout_rate: float, optional (default=0.5): The dropout probability in [0, 1].
activation: torch.nn.Module, optional (default=Tanh): The activation function to use in the network.
name: str, optional (default=’bpnn’): Name of the network.
initialization: str, optional (default=’xavier’): Initialization method to use for weights in the network.
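The interplay between num_nodes, num_layers and hidden_layers can be sketched as follows. Both helpers are hypothetical and follow only the argument descriptions above. The parameter count assumes one plain fully connected stack with biases, so the real num_params property may differ, for example if there is one network per element or if batch normalization is enabled.

```python
def resolve_hidden_layers(num_nodes=20, num_layers=5, hidden_layers=None):
    """Return the list of hidden-layer widths implied by the arguments."""
    if hidden_layers is not None:
        # An explicit list overrides num_nodes and num_layers.
        return list(hidden_layers)
    return [num_nodes] * num_layers


def count_dense_parameters(input_dim, hidden, n_outputs=1):
    """Weights plus biases of a plain dense stack (assumed layout)."""
    sizes = [input_dim] + list(hidden) + [n_outputs]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


default_layers = resolve_hidden_layers()
custom_layers = resolve_hidden_layers(
    num_nodes=20, num_layers=5, hidden_layers=[10, 10, 10]
)
params_custom = count_dense_parameters(input_dim=8, hidden=custom_layers)
```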

forward(batch)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property num_params

training: bool

class amptorch.model.CustomLoss(force_coefficient=0, loss='mae')[source]

Bases: Module

Customize the loss function based on Parrinello’s publication with alpha as the force coefficient.
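One simple reading of a loss weighted by a force coefficient is an energy term plus alpha times a force term. The sketch below implements that reading on plain lists. It is an assumption, not the CustomLoss code: the function names are hypothetical, "mse" is assumed to be the alternative to the documented default "mae", and any extra normalization used by the library is left out.

```python
def _mean_error(pred, true, loss):
    """Average absolute or squared error over paired values."""
    if loss == "mae":
        errors = [abs(p - t) for p, t in zip(pred, true)]
    elif loss == "mse":
        errors = [(p - t) ** 2 for p, t in zip(pred, true)]
    else:
        raise ValueError(f"unknown loss: {loss!r}")
    return sum(errors) / len(errors)


def combined_loss(energy_pred, energy_true, forces_pred, forces_true,
                  force_coefficient=0.0, loss="mae"):
    """Energy error plus force_coefficient (alpha) times the force error."""
    energy_term = _mean_error(energy_pred, energy_true, loss)
    if force_coefficient == 0:
        # Energy-only training: the force term is skipped entirely.
        return energy_term
    force_term = _mean_error(forces_pred, forces_true, loss)
    return energy_term + force_coefficient * force_term


energies_pred, energies_true = [1.0, 2.0], [1.5, 2.5]
forces_pred, forces_true = [0.0, 1.0, -1.0, 0.0], [0.0, 0.0, 0.0, 0.0]

energy_only = combined_loss(energies_pred, energies_true, forces_pred, forces_true)
with_forces = combined_loss(energies_pred, energies_true, forces_pred, forces_true,
                            force_coefficient=0.04)
```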

forward(prediction, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

class amptorch.model.ElementMask(elements)[source]

Bases: Module

Mask for different chemical element types for BPNN.

Args: elements (List[str]): a list of strings of unique chemical elements in the system.
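An element mask of this kind can be pictured as a one-hot table with one row per atom and one column per element. The sketch below is a plain-Python illustration, not the ElementMask module: element_mask is a hypothetical helper and ATOMIC_NUMBERS is a small hand-written lookup used only for this example.

```python
# Small illustrative subset of the periodic table (assumption for the demo).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "O": 8, "Cu": 29}


def element_mask(elements, atomic_numbers):
    """Return one row per atom with a 1 in the column of its element."""
    columns = [ATOMIC_NUMBERS[symbol] for symbol in elements]
    return [[1 if z == col else 0 for col in columns] for z in atomic_numbers]


# A Cu atom, a C atom and two O atoms, with elements ordered Cu, C, O.
mask = element_mask(["Cu", "C", "O"], [29, 6, 8, 8])

# Hypothetical outputs of three per-element networks evaluated on every atom.
per_element_outputs = [
    [-1.0, 9.0, 9.0],
    [9.0, -2.0, 9.0],
    [9.0, 9.0, -3.0],
    [9.0, 9.0, -3.5],
]
# The mask keeps, for each atom, only the output of its own element's network.
atomic_energies = [
    sum(m * e for m, e in zip(mask_row, out_row))
    for mask_row, out_row in zip(mask, per_element_outputs)
]
total_energy = sum(atomic_energies)
```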

forward(atomic_numbers)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

class amptorch.model.MLP(n_input_nodes, n_layers, n_hidden_size, activation, batchnorm, dropout, dropout_rate, hidden_layers=None, n_output_nodes=1, initialization='xavier')[source]

Bases: Module

Multi-layer perceptron model modified for atomistic input.

Args:

n_input_nodes (int): Number of input nodes for the network.
n_layers (int): Number of hidden layers in the network.
n_hidden_size (int): Number of hidden units per layer.
activation (torch.nn.Module): Activation function to use in each layer.
batchnorm (bool): Whether to use batch normalization after each layer.
dropout (bool): Whether to use dropout after each layer.
dropout_rate (float): Dropout rate to use if dropout is True.
hidden_layers (Optional[List[int]]): List of hidden layer sizes. If not None, n_layers and n_hidden_size will be ignored.
n_output_nodes (int): Number of output nodes for the network.
initialization (str): Initialization method for the network weights. “xavier” or “zero”.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset_parameters(initialization)[source]

training: bool

class amptorch.model.SingleNN(elements, input_dim, num_nodes=20, num_layers=5, hidden_layers=None, get_forces=True, batchnorm=False, dropout=False, dropout_rate=0.5, activation=<class 'torch.nn.modules.activation.Tanh'>, name='singlenn', initialization='xavier')[source]

Bases: Module

A modified version of the Behler-Parrinello atomistic neural network in which all elements share the same network for energy (and force) training.

Args:

elements: list of str: List of unique element symbols in the system.
input_dim: int: Dimensionality of the input. The dimension depends on the atomistic fingerprinting scheme.
num_nodes: int, optional (default=20): Number of nodes in each hidden layer.
num_layers: int, optional (default=5): Number of hidden layers in the network.
hidden_layers: list of int, optional (default=None): A list of integers, where each element corresponds to the number of nodes in a hidden layer. Overrides num_nodes and num_layers. E.g. [10, 10, 10]
get_forces: bool, optional (default=True): Whether to train with the forces in addition to the energy.
batchnorm: bool, optional (default=False): Whether to use batch normalization in the network.
dropout: bool, optional (default=False): Whether to apply dropout in the network.
dropout_rate: float, optional (default=0.5): The dropout probability in [0, 1].
activation: torch.nn.Module, optional (default=Tanh): The activation function to use in the network.
name: str, optional (default=’singlenn’): Name of the network.
initialization: str, optional (default=’xavier’): Initialization method to use for weights in the network.

forward(batch)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property num_params

training: bool

amptorch.trainer

class amptorch.trainer.AtomsTrainer(config=None)[source]

Bases: object

Main trainer class to define the atomistic neural network force field for energy (and force) prediction.

config: dict: A dictionary that defines the configuration of the trainer in its model, optim, dataset and cmd parts. Please refer to the Usage and Example sections of the documentation for more information.
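The four-part layout of the config dictionary can be sketched as below. Only the top-level part names (model, optim, dataset, cmd) come from this page. Every key inside the parts is an assumption, chosen by analogy with the BPNN, CustomLoss and AtomsDataset signatures documented here, and should be checked against the Usage and Example sections. The missing_parts helper is hypothetical.

```python
REQUIRED_PARTS = ("model", "optim", "dataset", "cmd")

config = {
    # Assumed to mirror the BPNN / SingleNN constructor arguments.
    "model": {"num_layers": 3, "num_nodes": 10, "get_forces": True,
              "batchnorm": False},
    # Assumed to mirror the CustomLoss constructor arguments.
    "optim": {"force_coefficient": 0.04, "loss": "mae"},
    # Assumed to mirror the AtomsDataset constructor arguments.
    "dataset": {"save_fps": True,
                "scaling": {"type": "normalize", "range": (0, 1),
                            "threshold": 1e-6}},
    # Assumed run-level switches (seed, debug mode).
    "cmd": {"seed": 1, "debug": False},
}


def missing_parts(cfg):
    """List the documented top-level parts that are absent from cfg."""
    return [part for part in REQUIRED_PARTS if part not in cfg]


problems = missing_parts(config)
```

A check like this is cheap to run before handing the dictionary to AtomsTrainer, since a misspelled part name would otherwise only surface later during loading.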

get_calc()[source]

Convert the AtomsTrainer class to an ase.Calculator class for interfacing with ase.

AmpTorch: ase.Calculator class: After attaching the Calculator to an ase.Atoms object, the user can call the get_potential_energy() method to obtain the corresponding energy in ase.

get_unique_elements(training_images)[source]: Get a list of chemical elements in str if not given in config.

load(load_dataset=True)[source]: Load the parameters passed from config.

load_config()[source]: Set up attributes from input configuration dictionary.

load_criterion()[source]: Load the custom loss function for the optimization scheme from config.

load_dataset()[source]: Load dataset either in the format of a list of ase.Atoms objects or a path to those objects, or alternatively, the saved lmdb files with generated fingerprints.

load_extras()[source]: Load extra commands such as validation splits, CV splits, debug mode and various callbacks.

load_logger()[source]: Load wandb logger from config.

load_model()[source]: Load the parameters for atomistic neural network models from config.

load_optimizer()[source]: Load set-up parameters in optimizers from config.

load_pretrained(checkpoint_path=None, gpu2cpu=False)[source]

Load pretrained model with configuration and parameters in the checkpoint.

Args:

checkpoint_path: str: Path to the checkpoint directory.
gpu2cpu: bool: True if the checkpoint was trained with GPUs and you wish to load it on CPU instead.

load_rng_seed()[source]: Load a fixed random seed for reproducibility.

load_skorch()[source]: Load the skorch atomistic neural network regression model with parameters from config.

predict(images, disable_tqdm=True, get_latent=None, get_descriptor=False, save_fps=False)[source]

Method used to make energy (and force) predictions for input images.

images: List[ase.Atoms]: A list of ase.Atoms objects for the model to make predictions on.
disable_tqdm: bool: Option to disable tqdm for Jupyter notebook compatibility.
get_latent: int (defaults to None): Option to record the last-layer latent representation for uncertainty quantification purposes. Usually used with -2 to get the last-layer neural network latent representation. If this flag is set to an integer instead of the default None, the output dictionary contains an extra entry, “latent”.
get_descriptor: bool: Option to record the feature space representation. If set to True, the output dictionary contains an extra entry, “feature”.
save_fps: bool: Option to save the calculated fingerprints for accelerated computation.

predictions: dict: A dictionary that contains two basic entries, “energy” and “forces”. prediction[“energy”] is a list of energies, and prediction[“forces”] is a list of force components in Cartesian coordinates, present only if the model is trained with energy and forces.
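A returned dictionary of this kind might be post-processed as sketched below. The data is a hand-written mock, not real model output, and the nesting of the “forces” entry (one list of per-atom (fx, fy, fz) triples per image) is an assumption about the layout. max_force_per_image is a hypothetical helper.

```python
import math

# Mock of the documented structure; the values are made up for illustration.
predictions = {
    "energy": [-3.2, -3.1],
    "forces": [
        [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]],  # image 0, two atoms
        [[0.0, 0.0, 1.0]],                   # image 1, one atom
    ],
}


def max_force_per_image(pred):
    """Largest force magnitude in each image, or None without forces."""
    if "forces" not in pred:
        # Energy-only models do not provide a "forces" entry.
        return None
    return [
        max(math.sqrt(sum(c * c for c in atom)) for atom in image)
        for image in pred["forces"]
    ]


max_forces = max_force_per_image(predictions)
lowest_energy_index = min(
    range(len(predictions["energy"])), key=predictions["energy"].__getitem__
)
```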

train(raw_data=None)[source]: Train the model with the config dictionary supplied when the AtomsTrainer instance was created. A list of ase.Atoms objects can be passed as training data.

amptorch.utils

class amptorch.utils.InOrderSplit(val_frac)[source]: Bases: object

class amptorch.utils.check_memory[source]

Bases: Callback

on_batch_end(net, **kwargs)[source]: Called at the end of each batch.

amptorch.utils.save_normalizers(normalizers, path)[source]

amptorch.utils.target_extractor(y)[source]

amptorch.utils.to_tensor(X, device, accept_sparse=False)[source]

class amptorch.utils.train_end_load_best_loss(filename)[source]

Bases: Callback

on_train_end(net, X, y)[source]: Called at the end of training.