Prepare Dataset
This section describes how to prepare the dataset for training the SALTED model with different ab initio software packages.
What do we need?
- Product basis overlap matrices
- Density fitting coefficients
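As a brief orientation, the two quantities can be related in generic density-fitting notation. This is a sketch under stated assumptions, not a formula taken from the SALTED source: the symbols $\phi_i$ (real auxiliary basis functions), $c_i$ (density fitting coefficients) and $S_{ij}$ (overlap matrix) are names chosen here for illustration. With $\Delta\mathbf{c}$ the difference between two coefficient vectors and $\Delta\rho$ the corresponding density difference, the overlap matrix turns a coefficient error into an integrated density error:

```latex
\rho(\mathbf{r}) \approx \sum_i c_i\,\phi_i(\mathbf{r}),
\qquad
S_{ij} = \int \phi_i(\mathbf{r})\,\phi_j(\mathbf{r})\,\mathrm{d}\mathbf{r},
\qquad
\int \left|\Delta\rho(\mathbf{r})\right|^2 \mathrm{d}\mathbf{r}
  = \Delta\mathbf{c}^{\mathsf{T}}\,\mathbf{S}\,\Delta\mathbf{c}
```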
Generate Dataset
To date, support for generating these overlap matrices and coefficients is included in three electronic structure packages: PySCF, FHI-aims and CP2K. If you develop another package and would like to add SALTED integration, please contact one of the developers.
Whichever code is used, the result should be two new directories, named overlaps and coefficients, in the saltedpath directory. These will be used to train a SALTED model as described in the next section.
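A quick way to confirm that a dataset generation run produced the expected output is to look for the two directories named above. The following is a minimal sketch, not part of SALTED: the function name check_salted_outputs is hypothetical, and it only tests that the directories exist, not that their contents are complete.

```python
from pathlib import Path


def check_salted_outputs(saltedpath):
    """Return the expected output directories that are missing from saltedpath.

    Hypothetical helper: it checks only for the presence of the `overlaps`
    and `coefficients` directories named in the text above.
    """
    expected = ("overlaps", "coefficients")
    root = Path(saltedpath)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list means both directories are present; otherwise the returned names show which part of the generation step still has to be run or repeated.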
PySCF
- The following input arguments must be added to the inp.qm section:
  - qmcode: define the quantum-mechanical code as pyscf
  - path2qm: set the path where the PySCF data are going to be saved
  - qmbasis: define the wave function basis set for the Kohn-Sham calculation (example: cc-pvqz)
  - functional: define the functional for the Kohn-Sham calculation (example: b3lyp)
- Define the auxiliary basis set using the input variable dfbasis, as provided in the inp.qm section. This must be chosen consistently with the wave function basis set (example: RI-cc-pvqz). Then, add this basis set information to SALTED by running:
  python3 -m salted.get_basis_info
- Run PySCF to compute the Kohn-Sham density matrices:
  python3 -m salted.pyscf.run_pyscf
- From the computed density matrices, perform the density fitting on the selected auxiliary basis set by running:
  python3 -m salted.pyscf.dm2df
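The three commands above have to run in the order given, since each one consumes the output of the previous step. The sketch below chains them from Python. The module names are those listed in the steps; the names PYSCF_STEPS and run_steps, and the dry_run switch, are illustrative assumptions rather than part of SALTED. It uses sys.executable in place of python3 so that the same interpreter (and environment) is used throughout.

```python
import subprocess
import sys

# Module names copied from the steps above, in the order they must run.
PYSCF_STEPS = (
    "salted.get_basis_info",
    "salted.pyscf.run_pyscf",
    "salted.pyscf.dm2df",
)


def run_steps(modules, dry_run=True):
    """Run each module with `python -m`, stopping at the first failure.

    With dry_run=True the commands are only printed, which is a convenient
    way to inspect what would be executed. Returns the list of commands.
    """
    commands = [[sys.executable, "-m", module] for module in modules]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on incomplete data.
            subprocess.run(command, check=True)
    return commands
```

Calling run_steps(PYSCF_STEPS, dry_run=False) from the directory that holds the SALTED input file would execute the three steps in sequence; whether that working directory is the right one depends on your setup.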
FHI-aims
A detailed description of how to generate the training data for SALTED using FHI-aims can be found at the dedicated SALTED/FHI-aims tutorial.
CP2K
- The following input arguments must be included in the inp.qm section:
  - qmcode: define the quantum-mechanical code as cp2k
  - path2qm: set the path where the CP2K data are going to be saved
  - periodic: set the periodicity of the system (0D, 2D, 3D)
  - coeffile: filename of RI density coefficients as printed by CP2K
  - ovlpfile: filename of 2-center RI integrals as printed by CP2K
  - dfbasis: RI (density-fitting) basis filename appended for each species, extracted from CP2K
  - pseudocharge: list of pseudocharges associated with the adopted GTH pseudopotential. NB: the list ordering must be consistent with the ordering of species provided in inp.system.species.
- Initialize the systems used for the CP2K calculation by running:
  python3 -m salted.cp2k.xyz2sys
  System cells and coordinates are extracted from the configuration dataset in XYZ format and saved in folders named conf_1, conf_2, ... located in the path inp.qm.path2qm. NB: cell information (Lattice) must be included in the second line of each XYZ configuration, even if it does not change.
- Run SCF calculations and save the optimized wavefunction for each configuration in the corresponding folders previously generated. An example CP2K input is provided in cp2k-inputs/SCF.inp.
- Print the RI density-fitting coefficients and 2-center RI integrals by restarting the CP2K calculation from the optimized wavefunction. The restart is used because the RI fitting procedure is memory-intensive and might require larger computational resources than the plain SCF cycle. An example CP2K input is provided in cp2k-inputs/rho-RI-print.inp. NB: The RI basis is automatically generated by CP2K from the selected wavefunction basis set, as described in https://doi.org/10.1021/acs.jctc.6b01041, following SMALL, MEDIUM, or LARGE tiers.
- Print the RI basis set information required for SALTED postprocessing of the CP2K density. An example CP2K input is provided in cp2k-inputs/RI-basis.inp. This operation needs to be performed only once, on any configuration in the dataset that adopts the given choice of RI basis. The output is a single file including wavefunction and RI basis set information of all the species included in the selected configuration. To extract the RI basis information for each species, run
  python3 -m salted.cp2k.extract_basis cp2k_basis_filename
  with cp2k_basis_filename the output basis set filename. This will create a separate file for each species in the format, e.g., H-dfbasis, O-dfbasis.
- Add the RI basis set information to SALTED by running:
  python3 -m salted.get_basis_info
- Set the inp.qm.coeffile and inp.qm.ovlpfile input arguments according to the filenames of the RI density-fitting coefficients and 2-center RI integrals generated in the fourth step above. Then, convert the full training dataset to SALTED format by running:
  python3 -m salted.cp2k.cp2k2salted
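Because the system initialization step requires cell information in the second line of every XYZ configuration, it can be worth checking the dataset before starting the CP2K workflow. The sketch below is a hypothetical helper, not part of SALTED. It assumes the common extended-XYZ layout (atom count on the first line of each frame, a comment line next, then one line per atom) and that the cell is given with a `Lattice=` key in that comment line; it does not validate the cell values themselves.

```python
def frames_missing_lattice(xyz_text):
    """Return 1-based indices of XYZ frames whose second line lacks `Lattice=`.

    Assumes each frame is: atom count, comment line, then one line per atom.
    """
    lines = xyz_text.splitlines()
    missing = []
    frame = 0
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i])
        frame += 1
        comment = lines[i + 1] if i + 1 < len(lines) else ""
        if "Lattice=" not in comment:
            missing.append(frame)
        i += natoms + 2  # skip the count line, the comment line and the atoms
    return missing
```

Reading the dataset file into a string and passing it to frames_missing_lattice gives the list of configurations that need a cell added before running salted.cp2k.xyz2sys.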