Gaussian 03 Online Manual
Gaussian has been designed to work efficiently on a variety of computer configurations. In general, the program attempts to select the most efficient algorithm given the memory and disk constraints imposed upon it. Because Gaussian offers a wide choice of algorithms, an understanding of the possibilities and tradeoffs can help you achieve optimal performance.
Before proceeding, however, let us emphasize two very important points:
-M- available-memory
-#- MaxDisk=available-disk
Estimating Calculation Memory Requirements
The following formula can be used to estimate the memory requirement of various types of Gaussian jobs (in 8-byte words):
$$M + 2N_B^2$$
where $N_B$ is the number of basis functions used in the calculation, and M is a minimum value that depends on the job type, given in the following table:
For example, on a 32-bit system, a 300-basis-function HF geometry optimization using g functions would require about 5.2 MW (~42 MB) of memory.
Note that 1 MW = 1,048,576 words (= 8,388,608 bytes). The values in the table are for 32-bit computer systems; they must be doubled for 64-bit systems. They also assume uncontracted higher angular momentum functions (f and above), which are the default type. Derivatives of contracted high angular momentum functions may require larger amounts of memory.
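The estimate above can be written as a small helper. This is a sketch with hypothetical function names: M must still be read from the job-type table, and the 64-bit doubling is applied to M as the note above describes. The M = 5 MW used in the usage line is back-derived from the 300-basis-function example (5.2 MW minus 2 × 300² = 180,000 words, about 0.17 MW), not taken from the table.

```python
WORDS_PER_MW = 1_048_576  # 1 MW = 1,048,576 words
BYTES_PER_WORD = 8        # the formula counts 8-byte words


def estimate_memory_words(n_basis: int, m_table_mw: float, is_64bit: bool = False) -> int:
    """Estimate job memory, in words, as M + 2*N_B**2.

    m_table_mw: job-type minimum M in MW, as listed for 32-bit systems.
    On 64-bit systems the table value is doubled.
    """
    m_mw = 2 * m_table_mw if is_64bit else m_table_mw
    return round(m_mw * WORDS_PER_MW) + 2 * n_basis ** 2


def words_to_mw(words: int) -> float:
    """Convert words to MW (1 MW = 1,048,576 words)."""
    return words / WORDS_PER_MW


def words_to_mib(words: int) -> float:
    """Convert words to mebibytes (8 bytes per word)."""
    return words * BYTES_PER_WORD / 2 ** 20


# 300 basis functions on a 32-bit system, with the back-derived M = 5 MW
words = estimate_memory_words(300, 5)
print(f"{words_to_mw(words):.1f} MW, {words_to_mib(words):.0f} MiB")  # prints: 5.2 MW, 41 MiB
```

The 41 MiB here versus the ~42 MB quoted above is only rounding: 5.2 MW × 8 bytes per word is about 41.6 MiB.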
The remainder of this chapter is designed for users who wish to understand more about the tradeoffs inherent in the various choices in order to obtain optimal performance for an individual job, not just good overall performance. Techniques for both very large and small jobs will be covered. Additional, related information may be found in reference .
Memory Requirements for Parallel Calculations
When using multiple processors with shared memory, a good estimate of the memory required is the amount of memory from the preceding table for each processor. Thus, if the value from the table is 10 MW and you want to use four shared memory processors, set %Mem to at least 40 MW.
For distributed memory calculations (i.e., those performed via Linda), the amount of memory specified in %Mem should be equal to or greater than the value from the preceding table.
In Gaussian 03, these two parallelization methods can be combined. For example, you would use the following directives to run a job on 8 CPUs located on four dual-processor shared memory multiprocessors (assuming that the memory value from the table is 10 MW):
%Mem=20MW          Memory required by each multiprocessor.
%NProcLinda=4      Use four Linda workers (one per multiprocessor).
%NProcShared=2     Use two shared memory processors on each multiprocessor computer.
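The three sizing rules above (per processor for shared memory, at least the table value per Linda worker, and the product of the two when combined) can be captured in a short sketch. The function name is hypothetical; it emits only the Link 0 directives shown in the example, rounds %Mem up to whole MW because the text asks for "at least" the estimate, and omits a processor count when it equals 1 as a stylistic choice.

```python
import math
from typing import List


def link0_memory_lines(table_mw: float, nproc_linda: int = 1, nproc_shared: int = 1) -> List[str]:
    """Build Link 0 lines following the rules of thumb above.

    Shared memory: the table value is needed for each processor, so %Mem
    scales with nproc_shared. Linda: each worker needs at least the table
    value, so %Mem does not scale with nproc_linda.
    """
    mem_mw = math.ceil(table_mw * nproc_shared)  # "at least": round up to whole MW
    lines = [f"%Mem={mem_mw}MW"]
    if nproc_linda > 1:
        lines.append(f"%NProcLinda={nproc_linda}")
    if nproc_shared > 1:
        lines.append(f"%NProcShared={nproc_shared}")
    return lines


# The 8-CPU example: 4 Linda workers, each with 2 shared-memory processors
print("\n".join(link0_memory_lines(10, nproc_linda=4, nproc_shared=2)))
```

Calling it with `nproc_shared=4` alone reproduces the 40 MW shared-memory example, and with `nproc_linda=4` alone it keeps %Mem at the table value, as the Linda rule states.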
Storage, Transformation, and Recomputation of Integrals
One of the most important performance-related choices is how the program processes the numerous two-electron repulsion integrals. Gaussian implements five approaches to handling them:
At least two of these approaches are available for all methods in Gaussian. The default method for a given job is chosen to give good performance on small to medium sized molecules. The various options and tradeoffs for each method are described in the following sections.
SCF Energies and Gradients
The performance issues that arise for SCF calculations include how the integrals are to be handled, and which alternative calculation method to select in the event that the default procedure fails to converge.
By default, SCF calculations use the direct algorithm. It might seem that direct SCF would be preferable only when disk space is insufficient, but in practice this is not the case. Because of the use of cutoffs, the cost of direct SCF scales with molecular size as $N^{2.7}$ or better, while conventional SCF scales in practice as $N^{3.5}$. Consequently, a point is reached fairly quickly beyond which recomputing the integrals (really, only those integrals that are needed) consumes less CPU time than reading them from external storage. Where this crossover occurs depends on how fast the integral evaluation in direct SCF is, and it varies from machine to machine. On modern computer systems, however, the most efficient strategy is to run an in-core SCF as long as it is feasible and to use the direct algorithm beyond that point; the conventional algorithm is virtually never a good choice on such systems.
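To see why the crossover arrives quickly, suppose (hypothetically) that one SCF costs $aN^{2.7}$ with the direct algorithm and $bN^{3.5}$ with the conventional one, where a and b are machine-dependent prefactors not given in this chapter. Equating the two costs gives the crossover size, and the ratio shows how the gap widens beyond it:

$$aN^{2.7} = bN^{3.5} \;\Rightarrow\; N^{*} = \left(\frac{a}{b}\right)^{1/0.8} = \left(\frac{a}{b}\right)^{1.25}$$

$$\frac{\mathrm{cost}_{\mathrm{conv}}}{\mathrm{cost}_{\mathrm{direct}}} = \frac{b}{a}\,N^{0.8}$$

Beyond $N^{*}$, each doubling of N makes the conventional approach another $2^{0.8} \approx 1.74$ times more expensive relative to direct SCF, so even a sizable per-integral advantage for disk storage is soon overtaken. The exponents are the rough empirical values quoted above, not exact scaling laws.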
The change to direct SCF as the default algorithm in Gaussian 98 was made in consideration of these facts. The SCF=Conven keyword is needed only on small-memory computer systems such as obsolete PCs.
In-core SCF is also available. Direct SCF calculations that have enough memory to store the integrals are automatically converted to in-core runs. SCF=InCore can be requested explicitly, in which case the job is terminated if insufficient memory is available to store the integrals. Generally, about $N^4/8 + 500{,}000$ words of memory are necessary for closed-shell in-core SCF, and $N^4/4 + 500{,}000$ words for UHF or ROHF in-core SCF, where N is the number of basis functions. For closed-shell calculations, this corresponds to about 100 MB for a 100 basis function job, 1.6 GB for a 200 basis function job, and 8.1 GB for a 300 basis function job.
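The in-core estimates can be checked with a few lines of arithmetic. This sketch uses hypothetical function names, converts with 8 bytes per word and decimal gigabytes, and reproduces the roughly 100 MB, 1.6 GB, and 8.1 GB closed-shell figures quoted above. The `fits_in_core` test is an assumption built on the approximate formula; the program's exact internal criterion for switching to in-core is not documented here.

```python
BYTES_PER_WORD = 8


def incore_scf_words(n_basis: int, open_shell: bool = False) -> int:
    """Approximate memory (words) for in-core SCF.

    Closed shell: N**4/8 + 500,000 words; UHF/ROHF: N**4/4 + 500,000 words.
    """
    divisor = 4 if open_shell else 8
    return -(-n_basis ** 4 // divisor) + 500_000  # ceiling division, integer math


def incore_scf_gb(n_basis: int, open_shell: bool = False) -> float:
    """Same estimate in decimal gigabytes."""
    return incore_scf_words(n_basis, open_shell) * BYTES_PER_WORD / 1e9


def fits_in_core(n_basis: int, available_words: int, open_shell: bool = False) -> bool:
    """Hypothetical check: does the approximate in-core requirement fit in memory?"""
    return available_words >= incore_scf_words(n_basis, open_shell)


for n in (100, 200, 300):
    print(f"N={n}: closed shell {incore_scf_gb(n):.2f} GB, "
          f"open shell {incore_scf_gb(n, open_shell=True):.2f} GB")
```

Because the requirement grows as $N^4$, a 50% larger basis needs about five times the memory, which is why in-core SCF is practical only up to modest sizes.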
GVB and MCSCF calculations can also be done using direct or in-core algorithms. Memory requirements are similar to those of the open-shell Hartree-Fock case described above. The primary difference is that many Fock operators must be formed in each iteration. For GVB, there are $2N_{orb}$ operators, where $N_{orb}$ is the number of orbitals in GVB pairs. For MCSCF, there are $N_{active}(N_{active}-1)/2 + 1$ operators, where $N_{active}$ is the number of orbitals in the active space. Consequently:
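The operator counts above are easy to tabulate. This sketch (hypothetical function names) simply transcribes the two formulas, which makes the quadratic growth of the MCSCF count with active-space size visible.

```python
def gvb_fock_operators(n_orb: int) -> int:
    """GVB: 2*N_orb Fock operators per iteration (N_orb = orbitals in GVB pairs)."""
    return 2 * n_orb


def mcscf_fock_operators(n_active: int) -> int:
    """MCSCF: N_active*(N_active - 1)/2 + 1 Fock operators per iteration."""
    return n_active * (n_active - 1) // 2 + 1  # n*(n-1) is always even


for n in (4, 6, 8, 12):
    print(f"N_active={n:2d}: {mcscf_fock_operators(n)} Fock operators")
```

Doubling the active space from 6 to 12 orbitals raises the count from 16 to 67 operators, each of which must be formed in every iteration.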
Direct SCF Procedure
In order to speed up direct HF calculations, the iterations are done in two phases:
This approach is substantially faster than using full integral accuracy throughout, and it has not slowed convergence in any case tested so far. In the event of difficulties, full integral accuracy throughout can be requested using SCF=NoVarAcc, at the expense of additional CPU time. See the discussion of the SCF keyword for more details.
Single-Point Direct SCF Convergence
In order to improve performance for single-point direct and in-core SCF calculations, a modification of the default SCF approach is used:
This is sufficient accuracy for the usual uses of single-point SCF calculations, including relative energies, population analysis, multipole moments, electrostatic potentials, and electrostatic potential derived charges. Conventional SCF single points and all jobs other than single points use tight convergence of $10^{-8}$ on the density. The tighter convergence can be applied to single-point direct SCF by requesting SCF=Tight. See the discussion of the SCF keyword for more details.
Problem Convergence Cases
The default SCF algorithm now uses a combination of two Direct Inversion in the Iterative Subspace (DIIS) extrapolation methods, EDIIS and CDIIS. EDIIS uses energies for extrapolation, and it dominates the early iterations of the SCF convergence process. CDIIS, which performs extrapolation based on the commutators of the Fock and density matrices, handles the later phases of SCF convergence. This new algorithm is very reliable, and previously troublesome SCF convergence cases now almost always converge with the default algorithm. For the few remaining pathological convergence cases, Gaussian 03 offers Fermi broadening and damping in combination with CDIIS (including automatic level shifting).
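For readers unfamiliar with DIIS, the following sketch shows the textbook Pulay extrapolation step used in commutator (CDIIS-style) schemes: find coefficients $c_i$ that sum to 1 and minimize the norm of the combined error vector, then mix the stored Fock matrices with them. It is a generic illustration, not Gaussian's implementation, and the energy-based EDIIS step is not shown. The error vectors (typically the flattened FDS - SDF commutators) are passed in as plain lists, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def _dot(u: List[float], v: List[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("singular DIIS system (linearly dependent error vectors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def diis_coefficients(errors: List[List[float]]) -> List[float]:
    """Pulay DIIS: minimise |sum_i c_i e_i|^2 subject to sum_i c_i = 1.

    Lagrangian conditions: sum_j B_ij c_j - lam = 0 for each i, and
    sum_i c_i = 1, where B_ij = e_i . e_j.
    """
    k = len(errors)
    a = [[_dot(errors[i], errors[j]) for j in range(k)] + [-1.0] for i in range(k)]
    a.append([1.0] * k + [0.0])
    rhs = [0.0] * k + [1.0]
    return _solve(a, rhs)[:k]  # drop the Lagrange multiplier


def extrapolate(focks: List[Matrix], coeffs: List[float]) -> Matrix:
    """Form the extrapolated Fock matrix sum_i c_i F_i."""
    n = len(focks[0])
    return [[sum(c * f[i][j] for c, f in zip(coeffs, focks)) for j in range(n)]
            for i in range(n)]


# Two toy iterations whose error vectors cancel when mixed equally
c = diis_coefficients([[1.0, 0.0], [-1.0, 0.0]])
print(c)                                   # [0.5, 0.5]
print(extrapolate([[[1.0]], [[3.0]]], c))  # [[2.0]]
```

The sum-to-one constraint keeps the extrapolated Fock matrix a weighted average of previous ones, so the step can only combine information the SCF has already generated.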
These are the available alternatives if the default approach fails to converge (labeled by their corresponding keyword):
These approaches all tend to force convergence to the closest stationary point in the orbital space, which may not be a minimum with respect to orbital rotations. A stability calculation can be used to verify that a proper SCF solution has been obtained (see the Stable keyword). Note also that you should verify that the final wavefunction corresponds to the desired electronic state, especially when using Guess=Alter.
Four alternatives for integral processing are available for Hartree-Fock second derivatives:
By default, during in-core frequencies, the integrals are computed once by each link that needs them. This keeps the disk storage down to the same modest amount as for the direct-$O(N^3)$ method. If $N^4/8$ words of disk are available, and in-core is being used only for speed, then specifying SCF=(InCore,Pass) causes the integrals to be stored on disk (on the read-write file) after they are first computed, and then read from disk rather than recomputed by later steps.
HF frequency calculations include prediction of the infrared and Raman vibrational intensities by default. The IR intensities add negligible overhead to the calculation, but the Raman intensities add 10-20%. If the Raman intensities are not of interest, they can be suppressed by specifying Freq=NoRaman.
Freq=Raman produces Raman intensities by numerical differentiation for DFT and MP2 frequency calculations. Using this option does not change the calculation's disk requirements, but it will increase the CPU time for the job. Computing pre-resonance Raman intensities (with CPHF=RdFreq) will approximately double the job's CPU requirements.
While frequency calculations can be done using very modest amounts of memory, performance on very large jobs is considerably better if enough memory is available to complete the major steps in one pass. Link 1110 must form a "skeleton derivative Fock matrix" for every degree of freedom (i.e., $3N_A$, where $N_A$ is the number of atoms), and if only some of the matrices can be held in memory, it computes the integral derivatives more than once. Similarly, in every iteration of the CPHF solution, link 1002 must form updates to all the derivative Fock matrices. To run optimally, link 1110 requires $3N_AN^2/2$ words of memory plus a constant amount for the integral derivatives, and link 1002 requires $3N_AN^2$ words plus a constant amount.
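The single-pass requirements of the two links can be estimated as follows. This sketch takes N to be the number of basis functions, as elsewhere in this chapter; the constant terms are not given in the text, so they are omitted and the results are lower bounds. The function names and the 30-atom, 300-basis-function job are hypothetical, and the freqmem utility remains the authoritative source for these sizes.

```python
WORDS_PER_MW = 1_048_576


def l1110_words(n_atoms: int, n_basis: int) -> int:
    """Link 1110 optimum, excluding its constant term: 3*N_A*N**2/2 words."""
    return (3 * n_atoms * n_basis ** 2 + 1) // 2  # round up


def l1002_words(n_atoms: int, n_basis: int) -> int:
    """Link 1002 optimum, excluding its constant term: 3*N_A*N**2 words."""
    return 3 * n_atoms * n_basis ** 2


# Hypothetical job: 30 atoms, 300 basis functions
for link, words in (("L1110", l1110_words(30, 300)), ("L1002", l1002_words(30, 300))):
    print(f"{link}: {words:,} words ({words / WORDS_PER_MW:.1f} MW) plus a constant")
```

For this example the variable parts come to about 3.9 MW for link 1110 and 7.7 MW for link 1002, so link 1002 sets the memory needed for a single pass.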
The freqmem utility program returns the optimal memory size for different parameters of frequency calculation (i.e., the amount required to perform the major steps in a single pass).
Four algorithms are available for MP2, but most of the decision-making is done automatically by the program. The critical element in this decision is the value of MaxDisk, which should be set according to your particular system configuration (see chapter 3). MaxDisk indicates the maximum amount of disk space available, in words. If no value is specified for MaxDisk, either in the route section or in the Default.Route file, Gaussian assumes that enough disk is available to perform the calculation with no redundant work, which may not be the case for larger runs. Thus, specifying the amount of available memory and disk is by far the most important way of optimizing performance for MP2 calculations: doing so allows the program to choose among the available algorithms and select the optimal one for your system configuration. This is best accomplished with the -M- directive and the MaxDisk keyword in the Default.Route file (although MaxDisk and %Mem may also be included in the input file).
The algorithms available for MP2 energies are:
In addition, when the direct, semi-direct, and in-core MP2 algorithms are used, the SCF phase can be either conventional, direct, or in-core. The default is direct or in-core SCF.
The choices for MP2 gradients are much the same as for MP2 energies, except:
As for the MP2 energy, the default is to do direct or in-core SCF and then dynamically choose between semi-direct, direct, or in-core E2.
Only semi-direct methods are available for analytic MP2 second derivatives. These reduce the disk storage required below what a conventional algorithm requires.
MP2 frequency jobs also require significant amounts of memory. The default of six million words should be increased for larger jobs. If f functions are used, eight million words should be provided for computer systems using 64-bit integers.
Higher Correlated Methods
The correlation methods beyond MP2 (MP3, MP4, CCSD, CISD, QCISD, etc.) all require that some transformed (MO) integrals be stored on disk and thus (unlike MP2 energies and gradients) have disk space requirements that rise quartically with the size of the molecule. There are, however, several alternatives as to how the transformed integrals are generated, how many are stored, and how the remaining terms are computed:
If a post-SCF calculation can be done using a full integral transformation while keeping disk usage under MaxDisk, a full transformation is done; if not, a partial transformation is done and some terms are computed in the AO basis. Thus, it is crucial that MaxDisk be specified explicitly for these types of jobs, either within the route section or via a system-wide setting in the Default.Route file. If MaxDisk is left unset, the program assumes that disk is abundant and performs a full transformation; if sufficient disk space is then not available for the full transformation, the job fails.
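The MaxDisk rule for post-SCF methods amounts to a three-way decision, sketched below. The function name is hypothetical, and `full_transform_words` is a hypothetical input: the text gives no formula for the disk a full transformation needs, only that it grows quartically with molecular size.

```python
from typing import Optional


def plan_transformation(full_transform_words: int,
                        max_disk_words: Optional[int] = None,
                        free_disk_words: Optional[int] = None) -> str:
    """Return "full" or "partial" per the MaxDisk rule described above.

    full_transform_words: hypothetical estimate of the disk needed for a
    full integral transformation.
    free_disk_words: disk actually free, used only to flag the failure
    mode when MaxDisk is unset.
    """
    if max_disk_words is None:
        # MaxDisk unset: disk is assumed abundant, so a full transformation is attempted
        if free_disk_words is not None and full_transform_words > free_disk_words:
            raise RuntimeError("full transformation exceeds free disk: the job would fail")
        return "full"
    if full_transform_words <= max_disk_words:
        return "full"
    # Store only part of the MO integrals; compute the remaining terms in the AO basis
    return "partial"


print(plan_transformation(100, max_disk_words=200))
print(plan_transformation(300, max_disk_words=200))
```

Setting MaxDisk turns the failure case (an unset MaxDisk on a disk that is too small) into a partial transformation that completes, at the cost of extra computation in the AO basis.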
The following points summarize the effect of MaxDisk for post-SCF methods:
Excited State Energies and Gradients
In addition to integral storage selection, the judicious use of the restart facilities can improve the economy of CIS and TD calculations.
Excited states using CI with single excitations can be done using five methods (labeled by their corresponding option to the CIS keyword). Note that only the first two options are available for the TD method:
Restarting Jobs and Reuse of Wavefunctions
CIS and TD jobs can be restarted from a Gaussian checkpoint file. This is of limited use for smaller calculations, which may be performed in the MO basis, since the integrals must be recomputed and transformed again, but it is invaluable for direct CIS. If a direct CIS job is aborted during the CIS phase, then SCF=Restart should be specified in addition to CIS=Restart or TD=Restart, because the final SCF wavefunction is not moved to its permanent location (suitable for Guess=Read) until the entire job step (or optimization step) completes.
CIS Excited State Densities
If only density analysis is desired, and the excited states have already been found, the CIS density can be recovered from the checkpoint file, using Density=(Check,Current) Guess=Only, which recovers whatever generalized density was stored for the current method (presumably CIS) and repeats the population analysis. Note that the one-particle (unrelaxed) density as well as the generalized (relaxed) density can be examined, but that dipole moments and other properties at the CIS level are known to be much less accurate if the one-particle density is used (i.e., if the orbital relaxation terms are neglected) [108,447]. Consequently, the use of the CIS one-particle density is strongly discouraged, except for comparison with the correct density and with other programs that cannot compute the generalized density.
Separate calculations are required to produce the generalized density for several states, since a CPHF calculation must be performed for each state. To do this, first solve for all the states and the density for the first excited state:
# CIS=(Root=1,NStates=N) Density=Current
if N states are of interest. Then do N-1 additional runs, using a route section of the form:
for states M=2 through N.
Pitfalls for Open-Shell Excited States
Since the UHF reference state is not an eigenfunction of $S^2$, neither are the excited states produced by CIS or TD.
Tests for triplet and singlet instabilities of RHF and UHF wavefunctions, and of restricted and unrestricted DFT wavefunctions, can be requested using the Stable keyword. The MO, AO, Direct, and InCore options are available and request the corresponding algorithm; the default is Direct. Direct stability calculations can be restarted as described above for CIS.
The primary challenge in using the CASSCF method is selecting appropriate active space orbitals. There are several possible tactics:
In all cases, a single-point calculation should be performed before any optimization, so that the converged active space can be checked to ensure that the desired electrons have been correlated before proceeding. There are additional considerations in solving for CASSCF wavefunctions for excited states (see the discussion of the CASSCF keyword for details).
CASSCF frequencies require large amounts of memory. Increasing the amount of available memory will always improve performance for CASSCF frequency jobs (the same is not true of frequency calculations performed with other methods). These calculations also require $O^2N^2$ disk space.