Input Configuration
The file inp.yaml in the working directory configures the input data for the model.
This file is parsed by sys_utils.ParseConfig into a dictionary.
Input controlling file structure
This section lists all options that can be set in inp.yaml.
Some options may be omitted, in which case default values are used.
The type column follows the Python typing conventions.
The default column states whether a variable is required; for optional variables it gives the default value.
For all the path-related variables, the path can be either a relative path or an absolute path.
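As a minimal sketch of the rule above, the helper below resolves a path option against a base directory. The name `resolve_config_path` and the example directories are hypothetical and not part of SALTED; whether SALTED resolves paths in exactly this way is an assumption.

```python
from pathlib import Path


def resolve_config_path(value: str, workdir: str) -> Path:
    """Hypothetical helper: resolve a path option read from inp.yaml.

    Absolute paths are returned unchanged; relative paths are
    interpreted relative to `workdir` (assumed to be the working directory).
    """
    path = Path(value)
    if path.is_absolute():
        return path
    return Path(workdir) / path


# Illustrative directories only (POSIX-style paths).
print(resolve_config_path("qmdata", "/home/user/run1"))
print(resolve_config_path("/scratch/qmdata", "/home/user/run1"))
```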
SALTED definition inp.salted
| var name | type | default | usage |
|---|---|---|---|
| saltedname | str | Required | A label to identify a particular SALTED setup. |
| saltedpath | str | Required | Location of all files produced by SALTED. Either relative to the working directory or an absolute path. |
| saltedtype | str | density | Option for selecting the type of SALTED target (density or density-response). |
| seed | int | 42 | Random seed. Only implemented in parsing input; not implemented in the code. |
| verbose | bool | False | Output verbosity. |
System definition inp.system
| var name | type | default | usage |
|---|---|---|---|
| filename | str | Required | An extended-XYZ file consisting of input structures. |
| species | list[str] | Required | List of element species considered in the electron density expansion. |
| average | bool | True | Whether averaged coefficients are used to set a baseline for the density. Normally this should be true, unless a density difference is learned. |
Information about QM training set generation inp.qm
| var name | type | default | usage |
|---|---|---|---|
| path2qm | str | Required | Location of the quantum-mechanical training data. |
| qmcode | Literal["aims"] \| Literal["cp2k"] \| Literal["pyscf"] | Required | Which ab initio software was used to generate training data. |
| dfbasis | str | Required | A label for the auxiliary basis set used to expand the density. |
| qmbasis | str | Required if qmcode=pyscf | Wavefunction basis set to use when generating the training data (only for PySCF). |
| functional | str | Required if qmcode=pyscf | DFT functional to use when generating the training data (only for PySCF). |
| pseudocharge | float | Required if qmcode=cp2k | Pseudo nuclear charge (only for CP2K). |
| coeffile | str | Required if qmcode=cp2k | Density coefficients file name as printed by CP2K. |
| ovlpfile | str | Required if qmcode=cp2k | Overlap matrix file name as printed by CP2K. |
| periodic | bool | Required if qmcode=cp2k | The periodic boundary conditions (only for CP2K). |
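The conditional requirements in this table can be sketched as a small lookup. The function `missing_qm_keys`, the two constants, and the example values are hypothetical illustrations built only from the table above; they do not reproduce the checks that SALTED performs internally.

```python
# Keys that the table marks as required only for a given qmcode.
CONDITIONALLY_REQUIRED = {
    "aims": [],
    "cp2k": ["pseudocharge", "coeffile", "ovlpfile", "periodic"],
    "pyscf": ["qmbasis", "functional"],
}
# Keys that the table marks as unconditionally required.
ALWAYS_REQUIRED = ["path2qm", "qmcode", "dfbasis"]


def missing_qm_keys(qm: dict) -> list[str]:
    """Hypothetical check: return the required inp.qm keys that are absent."""
    missing = [key for key in ALWAYS_REQUIRED if key not in qm]
    extra = CONDITIONALLY_REQUIRED.get(qm.get("qmcode"), [])
    missing += [key for key in extra if key not in qm]
    return missing


# Placeholder values, for illustration only.
qm_section = {"path2qm": "qmdata", "qmcode": "pyscf", "dfbasis": "my-df-basis"}
print(missing_qm_keys(qm_section))  # ['qmbasis', 'functional']
```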
Rascaline atomic environment parameters inp.descriptor.rep[n]
| var name | type | default | usage |
|---|---|---|---|
| type | Literal["rho"] \| Literal["V"] | Required | Representation type, "rho" for atomic density and "V" for atomic potential. |
| rcut | float | Required | Radial cutoff (Angstrom) of the structural representation. |
| nrad | int | Required | Number of radial functions to be used for the structural representation. |
| nang | int | Required | Maximum angular momentum to be used for the structural representation. |
| sig | float | Required | Gaussian function width (Angstrom) for atomic density and/or derived atomic potential. |
| neighspe | list[str] | Required | List of atomic species to be used for structural representation. |
Feature sparsification parameters inp.descriptor.sparsify
| var name | type | default | usage |
|---|---|---|---|
| nsamples | int | 100 | Number of structures to use for feature sparsification. |
| ncut | int | 0 | Sets maximum number of sparse (by FPS) descriptor features to retain. 0 for no sparsification. |
Prediction variables inp.predict
Remember to set inp.predict if one wants to predict densities.
| var name | type | default | usage |
|---|---|---|---|
| filename | str | Required if predict | An extended-XYZ file consisting of structures whose densities we wish to predict. |
| predname | str | Required if predict | A label to identify a particular set of predictions. |
| predict_data | str | Required if predict and qmcode=aims | Path to ab initio output for prediction, relative to path2qm. |
| alpha_only | bool | False | Whether to limit predictions to the subset of coefficients associated with the calculation of the polarizability tensor. Only usable in combination with saltedtype=density-response and qmcode=cp2k. |
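The restriction on alpha_only can be expressed as a simple consistency check. The function `check_alpha_only` below is a hypothetical sketch based only on the table text; how SALTED itself enforces the restriction, if it does, is not shown here.

```python
def check_alpha_only(alpha_only: bool, saltedtype: str, qmcode: str) -> None:
    """Hypothetical check: reject alpha_only outside its documented scope."""
    allowed = saltedtype == "density-response" and qmcode == "cp2k"
    if alpha_only and not allowed:
        raise ValueError(
            "alpha_only requires saltedtype=density-response and qmcode=cp2k"
        )


check_alpha_only(True, "density-response", "cp2k")  # accepted
check_alpha_only(False, "density", "aims")          # accepted: option is off
```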
ML (GPR) variables inp.gpr
| var name | type | default | usage |
|---|---|---|---|
| z | float | 2.0 | Kernel exponent \(\zeta\). |
| Menv | int | Required | Number of reference environments. |
| Ntrain | int | Required | Number of training structures. |
| trainfrac | float | 1.0 | Training dataset fraction. Training dataset size is Ntrain * trainfrac. |
| regul | float | 1e-6 | Regularization parameter \(\eta\). |
| eigcut | float | 1e-10 | Eigenvalues cutoff for RKHS projection. |
| gradtol | float | 1e-5 | Minimum gradient norm tolerance for CG minimization. |
| restart | bool | False | Whether to restart from previous minimization checkpoint. |
| trainsel | Literal["sequential"] \| Literal["random"] | "random" | Select the training set at random or sequentially from the entire dataset. |
| sparse_algorithm | Literal["dense"] \| Literal["omp_sparse"] | "omp_sparse" | The algorithm to compute Hessian matrices, if making use of the RKHS vector sparsity. |
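To illustrate how Ntrain, trainfrac, and trainsel interact, the sketch below picks Ntrain structures and keeps a fraction of them. The function `select_training_indices` is hypothetical; the rounding (truncation via `int`), the order of operations, and the use of a seed for reproducibility are assumptions of this example and may differ from what SALTED does.

```python
import random


def select_training_indices(ndata: int, ntrain: int, trainfrac: float,
                            trainsel: str, seed: int = 42) -> list[int]:
    """Hypothetical sketch of training-set selection.

    Picks `ntrain` structures out of `ndata`, sequentially or at random,
    then keeps the first int(trainfrac * ntrain) of them.
    """
    if trainsel == "sequential":
        pool = list(range(ntrain))
    elif trainsel == "random":
        # Seeded only to make this example reproducible.
        pool = random.Random(seed).sample(range(ndata), ntrain)
    else:
        raise ValueError(f"unknown trainsel: {trainsel!r}")
    ntraining = int(trainfrac * ntrain)  # rounding down is an assumption
    return pool[:ntraining]


print(select_training_indices(10, 8, 0.5, "sequential"))  # [0, 1, 2, 3]
```

With Ntrain = 8 and trainfrac = 0.5 the training dataset size is 8 * 0.5 = 4 structures.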
API
For details please check the source code.
salted.sys_utils.ParseConfig
Input configuration file parser
To use it, make sure an inp.yaml file exists in the current working directory,
and simply run ParseConfig().parse_input().
In this context, "input file" and "configuration file" are used interchangeably; both refer to the SALTED input file named inp.yaml.
__init__(_dev_inp_fpath=None)
Initialize configuration parser
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| _dev_inp_fpath | str \| None | Path to the input file. Do not use this argument; it is for testing only. | None |
check_input(inp)
Check keys (required, optional, not allowed), and value types and ranges
Format: (required, default value, value type, value extra check)
About required
- True -> required
- False -> optional, will fill in default value if not found
- False + PLACEHOLDER -> optional in some cases, but required in other cases (signalled by a default value of PLACEHOLDER)
About PLACEHOLDER
- If a key is optional in some cases, but required in others, the default value is set to PLACEHOLDER.
- The extra value checking should consider the PLACEHOLDER value!
About sparsify
- The config doesn't explicitly require the sparsify section, and ncut is 0 by default (don't sparsify).
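The (required, default value, value type, value extra check) format described above can be sketched as follows. Everything in this block is hypothetical: the sentinel string, the `QM_RULES` table, and `check_section` are illustrations of the idea, not the data structures or logic used in salted.sys_utils.

```python
PLACEHOLDER = "__PLACEHOLDER__"  # assumed sentinel value, for illustration only

# key -> (required, default value, value type, extra check)
QM_RULES = {
    "qmcode": (True, None, str,
               lambda inp, val: val in ("aims", "cp2k", "pyscf")),
    "qmbasis": (False, PLACEHOLDER, str,
                # the extra check accounts for PLACEHOLDER:
                # the key may stay unset unless qmcode is pyscf
                lambda inp, val: val != PLACEHOLDER or inp["qmcode"] != "pyscf"),
}


def check_section(section: dict, rules: dict) -> dict:
    """Hypothetical checker: fill defaults, then validate types and extra checks."""
    unknown = set(section) - set(rules)
    if unknown:
        raise KeyError(f"keys not allowed: {sorted(unknown)}")
    checked = dict(section)
    for key, (required, default, _type, _extra) in rules.items():
        if key not in checked:
            if required:
                raise KeyError(f"missing required key: {key}")
            checked[key] = default  # optional key: fill in the default
    for key, (_required, _default, val_type, extra_check) in rules.items():
        val = checked[key]
        if val != PLACEHOLDER and not isinstance(val, val_type):
            raise TypeError(f"{key} must be {val_type.__name__}")
        if not extra_check(checked, val):
            raise ValueError(f"invalid value for {key}: {val!r}")
    return checked


print(check_section({"qmcode": "aims"}, QM_RULES))
```

In this sketch, omitting qmbasis is accepted for qmcode "aims" but raises a ValueError for qmcode "pyscf", which mirrors the "optional in some cases, required in others" behaviour.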
get_all_params()
Return all parameters as a tuple.
About sparsify in the return tuple:
- If ncut <=0, sparsify = False.
- If ncut > 0, sparsify = True.
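The rule above, combined with the documented defaults (nsamples = 100, ncut = 0), could be sketched like this. The helper `sparsify_settings` and the assumption that the descriptor section is available as a plain dictionary are hypothetical.

```python
def sparsify_settings(descriptor: dict) -> tuple[bool, int, int]:
    """Hypothetical helper: derive (sparsify, nsamples, ncut) from inp.descriptor."""
    section = descriptor.get("sparsify", {})  # the section may be absent
    nsamples = section.get("nsamples", 100)   # documented default
    ncut = section.get("ncut", 0)             # documented default
    return ncut > 0, nsamples, ncut


print(sparsify_settings({}))                           # (False, 100, 0)
print(sparsify_settings({"sparsify": {"ncut": 500}}))  # (True, 100, 500)
```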
Please copy & paste:
(saltedname, saltedpath, saltedtype,
filename, species, average,
path2qm, qmcode, qmbasis, dfbasis,
filename_pred, predname, predict_data, alpha_only,
rep1, rcut1, sig1, nrad1, nang1, neighspe1,
rep2, rcut2, sig2, nrad2, nang2, neighspe2,
sparsify, nsamples, ncut,
zeta, Menv, Ntrain, trainfrac, regul, eigcut,
gradtol, restart, trainsel,
nspe1, nspe2, HP1, HP2) = ParseConfig().get_all_params()
HP1 and HP2 are the featomic hyperparameter dicts for rep1 and rep2,
built from their respective configs via build_featomic_hyper_params().
get_all_params_simple1()
Return all parameters as a tuple.
Please copy & paste:
(
filename, species, average,
rep1, rcut1, sig1, nrad1, nang1, neighspe1,
rep2, rcut2, sig2, nrad2, nang2, neighspe2,
sparsify, nsamples, ncut,
z, Menv, Ntrain, trainfrac, regul, eigcut,
gradtol, restart, trainsel
) = ParseConfig().get_all_params_simple1()
get_loader()
Add constructors to the yaml.SafeLoader.
For details, see: https://pyyaml.org/wiki/PyYAMLDocumentation
parse_input()
Parse input file.
Procedure:
- get loader (for constructors and resolvers)
- load yaml
Returns:
| Name | Type | Description |
|---|---|---|
| AttrDict | AttrDict | Parsed input file |
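For readers unfamiliar with attribute-style dictionaries, the sketch below shows the general idea behind such a return type. `SimpleAttrDict` is a hypothetical stand-in written for this example; it is not SALTED's AttrDict, whose implementation may differ, and the values are placeholders.

```python
class SimpleAttrDict(dict):
    """Minimal attribute-access dictionary (illustrative, not SALTED's class)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dictionaries so that chained access works.
        return SimpleAttrDict(value) if isinstance(value, dict) else value


# Placeholder values, for illustration only.
inp = SimpleAttrDict({"salted": {"saltedname": "demo", "saltedtype": "density"}})
print(inp.salted.saltedname)  # demo
```

Section names such as inp.salted or inp.gpr in the tables above read naturally with this kind of access, while plain key access (`inp["salted"]`) keeps working.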