Input Configuration

The file inp.yaml in the working directory configures the input data for the model. It is parsed by sys_utils.ParseConfig into a dictionary.

Input controlling file structure

The tables below list every option that can be set in inp.yaml. Optional entries may be omitted, in which case their default values are used. The type column follows Python typing conventions. The default column states whether a variable is required; if it is not, the column gives its default value.

All path-related variables accept either a relative path or an absolute path.
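
As a minimal sketch of what accepting both kinds of path can look like, the hypothetical helper below leaves absolute paths untouched and interprets relative ones against a working directory. The name `resolve_path` is an assumption made for illustration; how SALTED itself joins paths is not shown on this page.

```python
from pathlib import Path


def resolve_path(value: str, workdir: str) -> Path:
    """Return `value` unchanged if absolute, else interpret it relative to `workdir`."""
    path = Path(value)
    return path if path.is_absolute() else Path(workdir) / path


# A relative entry is joined onto the working directory,
# an absolute entry is returned as given.
relative = resolve_path("qmdata", "/home/user/run")
absolute = resolve_path("/scratch/qmdata", "/home/user/run")
```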

SALTED definition inp.salted

| var name | type | default | usage |
| --- | --- | --- | --- |
| saltedname | str | Required | A label to identify a particular SALTED setup. |
| saltedpath | str | Required | Location of all files produced by SALTED. Either relative to the working directory or an absolute path. |
| saltedtype | str | density | Option for selecting the type of SALTED target (density or density-response). |
| seed | int | 42 | Random seed. Only implemented in parsing input; not implemented in the code. |
| verbose | bool | False | Output verbosity. |

System definition inp.system

| var name | type | default | usage |
| --- | --- | --- | --- |
| filename | str | Required | An extended-XYZ file consisting of input structures. |
| species | list[str] | Required | List of element species considered in the electron density expansion. |
| average | bool | True | Whether we use averaged coefficients to set a baseline for the density. Normally this should be true, unless a density difference is learned. |

Information about QM training set generation inp.qm

| var name | type | default | usage |
| --- | --- | --- | --- |
| path2qm | str | Required | Location of the quantum-mechanical training data. |
| qmcode | Literal["aims"] \| Literal["cp2k"] \| Literal["pyscf"] | Required | Which ab initio software was used to generate training data. |
| dfbasis | str | Required | A label for the auxiliary basis set used to expand the density. |
| qmbasis | str | Required if qmcode=pyscf | Wavefunction basis set to use when generating the training data (only for PySCF). |
| functional | str | Required if qmcode=pyscf | DFT functional to use when generating the training data (only for PySCF). |
| pseudocharge | float | Required if qmcode=cp2k | Pseudo nuclear charge (only for CP2K). |
| coeffile | str | Required if qmcode=cp2k | Density coefficients file name as printed by CP2K. |
| ovlpfile | str | Required if qmcode=cp2k | Overlap matrix file name as printed by CP2K. |
| periodic | bool | Required if qmcode=cp2k | The periodic boundary conditions (only for CP2K). |
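
The conditional requirements in this table can be sketched as a small lookup keyed by qmcode. The helper `missing_qm_keys` and the two constants are hypothetical names introduced for illustration, and the basis-set labels are placeholder values. The actual validation is performed by ParseConfig.check_input and may be organised differently.

```python
# Keys the table marks as required only for a particular qmcode.
CONDITIONAL_QM_KEYS = {
    "aims": [],
    "pyscf": ["qmbasis", "functional"],
    "cp2k": ["pseudocharge", "coeffile", "ovlpfile", "periodic"],
}
# Keys the table marks as unconditionally required.
ALWAYS_REQUIRED_QM_KEYS = ["path2qm", "qmcode", "dfbasis"]


def missing_qm_keys(qm: dict) -> list:
    """Return the inp.qm keys that are required but absent (hypothetical helper)."""
    missing = [key for key in ALWAYS_REQUIRED_QM_KEYS if key not in qm]
    conditional = CONDITIONAL_QM_KEYS.get(qm.get("qmcode"), [])
    missing += [key for key in conditional if key not in qm]
    return missing


# Illustrative section: PySCF is selected but no functional is given.
qm_section = {
    "path2qm": "./qmdata",
    "qmcode": "pyscf",
    "dfbasis": "my-aux-basis",
    "qmbasis": "my-basis",
}
print(missing_qm_keys(qm_section))  # ['functional']
```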

Rascaline atomic environment parameters inp.descriptor.rep[n]

| var name | type | default | usage |
| --- | --- | --- | --- |
| type | Literal["rho"] \| Literal["V"] | Required | Representation type, "rho" for atomic density and "V" for atomic potential. |
| rcut | float | Required | Radial cutoff (Angstrom) of the structural representation. |
| nrad | int | Required | Number of radial functions to be used for the structural representation. |
| nang | int | Required | Maximum angular momentum to be used for the structural representation. |
| sig | float | Required | Gaussian function width (Angstrom) for atomic density and/or derived atomic potential. |
| neighspe | list[str] | Required | List of atomic species to be used for structural representation. |

Feature sparsification parameters inp.descriptor.sparsify

| var name | type | default | usage |
| --- | --- | --- | --- |
| nsamples | int | 100 | Number of structures to use for feature sparsification. |
| ncut | int | 0 | Sets maximum number of sparse (by FPS) descriptor features to retain. 0 for no sparsification. |

Prediction variables inp.predict

The inp.predict section must be set in order to predict densities.

| var name | type | default | usage |
| --- | --- | --- | --- |
| filename | str | Required if predict | An extended-XYZ file consisting of structures whose densities we wish to predict. |
| predname | str | Required if predict | A label to identify a particular set of predictions. |
| predict_data | str | Required if predict and qmcode=aims | Path to ab initio output for prediction, relative to path2qm. |
| alpha_only | bool | False | Whether to limit predictions to the subset of coefficients associated with the calculation of the polarizability tensor. Only usable in combination with saltedtype=density-response and qmcode=cp2k. |

ML (GPR) variables inp.gpr

| var name | type | default | usage |
| --- | --- | --- | --- |
| z | float | 2.0 | Kernel exponent \(\zeta\). |
| Menv | int | Required | Number of reference environments. |
| Ntrain | int | Required | Number of training structures. |
| trainfrac | float | 1.0 | Training dataset fraction. Training dataset size is Ntrain * trainfrac. |
| regul | float | 1e-6 | Regularization parameter \(\eta\). |
| eigcut | float | 1e-10 | Eigenvalues cutoff for RKHS projection. |
| gradtol | float | 1e-5 | Minimum gradient norm tolerance for CG minimization. |
| restart | bool | False | Whether to restart from previous minimization checkpoint. |
| trainsel | Literal["sequential"] \| Literal["random"] | "random" | Select the training set at random or sequentially from the entire dataset. |
| sparse_algorithm | Literal["dense"] \| Literal["omp_sparse"] | "omp_sparse" | The algorithm to compute Hessian matrices, if making use of the RKHS vector sparsity. |
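
To show how the sections fit together, the sketch below writes a configuration as a nested Python dictionary, which is roughly the shape the parsed inp.yaml would take. All values are illustrative, the file and basis names are made up, and the key names rep1 and rep2 are an assumption based on the rep[n] notation. The helper `training_set_size` is hypothetical and assumes the product Ntrain * trainfrac is truncated to an integer.

```python
# Illustrative values only; key names follow the tables in this section.
inp = {
    "salted": {"saltedname": "demo", "saltedpath": "./"},
    "system": {"filename": "./structures.xyz", "species": ["H", "O"]},
    "qm": {"path2qm": "./qmdata", "qmcode": "aims", "dfbasis": "my-aux-basis"},
    "descriptor": {
        # rep1/rep2 as key names are an assumption based on "rep[n]".
        "rep1": {"type": "rho", "rcut": 4.0, "nrad": 8, "nang": 6,
                 "sig": 0.3, "neighspe": ["H", "O"]},
        "rep2": {"type": "rho", "rcut": 4.0, "nrad": 8, "nang": 6,
                 "sig": 0.3, "neighspe": ["H", "O"]},
    },
    "gpr": {"Menv": 100, "Ntrain": 1000, "trainfrac": 0.5},
}


def training_set_size(gpr: dict) -> int:
    """Ntrain * trainfrac, truncated to an integer (truncation is an assumption)."""
    return int(gpr["Ntrain"] * gpr.get("trainfrac", 1.0))


print(training_set_size(inp["gpr"]))  # 500
```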

API

For details please check the source code.

salted.sys_utils.ParseConfig

Input configuration file parser

To use it, make sure an inp.yaml file exists in the current working directory, and simply run ParseConfig().parse_input().

Here, "input file" and "configuration file" are used interchangeably; both refer to the SALTED input file named inp.yaml.

__init__(_dev_inp_fpath=None)

Initialize configuration parser

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| _dev_inp_fpath | str \| None | Path to the input file. Defaults to None. This argument is for testing only; do not use it. | None |

check_input(inp)

Check keys (required, optional, not allowed), and value types and ranges

Format: (required, default value, value type, value extra check)

About required
  • True -> required
  • False -> optional; the default value is filled in if the key is not found
  • False + PLACEHOLDER -> optional in some cases, but required in others
About PLACEHOLDER
  • If a key is optional in some cases but required in others, its default value is set to PLACEHOLDER.
  • The extra value check must take the PLACEHOLDER value into account.
About sparsify
  • The config doesn't explicitly require the sparsify section, and ncut is 0 by default (don't sparsify).
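
The four-field format described above can be sketched as follows. This is not the SALTED implementation: `SPEC`, `check_section`, and the sentinel string used for `PLACEHOLDER` are hypothetical stand-ins, and the convention that the extra check receives the whole section plus the value is an assumption made so that conditional requirements can be expressed.

```python
PLACEHOLDER = "__PLACEHOLDER__"  # stand-in sentinel; the real value is not shown here

# key -> (required, default value, value type, extra check)
# The extra check takes (section, value) and returns True if the value is acceptable.
SPEC = {
    "qmcode": (True, None, str, lambda sec, v: v in ("aims", "cp2k", "pyscf")),
    "dfbasis": (True, None, str, None),
    # Optional in general, but required when qmcode is "pyscf".
    "functional": (False, PLACEHOLDER, str,
                   lambda sec, v: sec["qmcode"] != "pyscf" or v != PLACEHOLDER),
}


def check_section(section: dict, spec: dict) -> dict:
    """Validate one config section and return a copy with defaults filled in."""
    unknown = set(section) - set(spec)
    if unknown:
        raise ValueError(f"keys not allowed: {sorted(unknown)}")
    checked = {}
    for key, (required, default, vtype, _) in spec.items():
        if key in section:
            value = section[key]
        elif required:
            raise ValueError(f"missing required key: {key}")
        else:
            value = default
        # The PLACEHOLDER default is exempt from the type check.
        if value != PLACEHOLDER and not isinstance(value, vtype):
            raise TypeError(f"{key} must be of type {vtype.__name__}")
        checked[key] = value
    # Run the extra checks last, so each one sees the filled-in section.
    for key, (_, _, _, extra) in spec.items():
        if extra is not None and not extra(checked, checked[key]):
            raise ValueError(f"invalid value for {key}: {checked[key]!r}")
    return checked


filled = check_section({"qmcode": "aims", "dfbasis": "my-aux-basis"}, SPEC)
```

With qmcode set to "aims" the missing functional key is simply filled with the placeholder, whereas the same input with qmcode set to "pyscf" would raise a ValueError in this sketch.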

get_all_params()

Return all parameters as a tuple.

About sparsify in the returned tuple:
  • If ncut <= 0, sparsify = False.
  • If ncut > 0, sparsify = True.
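
The rule for the sparsify flag reduces to a single comparison. The function name `sparsify_flag` below is hypothetical and only restates the rule given here.

```python
def sparsify_flag(ncut: int) -> bool:
    """True only when a positive number of features is to be retained."""
    return ncut > 0


# The default ncut of 0 disables sparsification; any positive value enables it.
flags = {ncut: sparsify_flag(ncut) for ncut in (-1, 0, 200)}
print(flags)  # {-1: False, 0: False, 200: True}
```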

Please copy & paste:

(saltedname, saltedpath, saltedtype,
 filename, species, average,
 path2qm, qmcode, qmbasis, dfbasis,
 filename_pred, predname, predict_data, alpha_only,
 rep1, rcut1, sig1, nrad1, nang1, neighspe1,
 rep2, rcut2, sig2, nrad2, nang2, neighspe2,
 sparsify, nsamples, ncut,
 zeta, Menv, Ntrain, trainfrac, regul, eigcut,
 gradtol, restart, trainsel,
 nspe1, nspe2, HP1, HP2) = ParseConfig().get_all_params()

HP1 and HP2 are the featomic hyperparameter dicts for rep1 and rep2, built from their respective configs via build_featomic_hyper_params().

get_all_params_simple1()

Return all parameters as a tuple.

Please copy & paste:

(
    filename, species, average,
    rep1, rcut1, sig1, nrad1, nang1, neighspe1,
    rep2, rcut2, sig2, nrad2, nang2, neighspe2,
    sparsify, nsamples, ncut,
    z, Menv, Ntrain, trainfrac, regul, eigcut,
    gradtol, restart, trainsel
) = ParseConfig().get_all_params_simple1()

get_loader()

Add constructors to yaml.SafeLoader. For details, see: https://pyyaml.org/wiki/PyYAMLDocumentation

parse_input()

Parse the input file.

Procedure:
  • get loader (for constructors and resolvers)
  • load yaml

Returns:

| Name | Type | Description |
| --- | --- | --- |
| AttrDict | AttrDict | Parsed input file |
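
The returned AttrDict suggests a dictionary whose entries can also be read as attributes. The class below is a hypothetical stand-in written to illustrate that idea; the class shipped with SALTED may be implemented differently.

```python
class AttrDict(dict):
    """Hypothetical stand-in: a dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so that chained attribute access works.
        return AttrDict(value) if isinstance(value, dict) else value


inp = AttrDict({"gpr": {"Menv": 100, "z": 2.0}})
print(inp.gpr.Menv)  # 100
```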