The Python script 'clusterstat.py' is used perform a random walk
over
a potential energy surface and generate raw data concerning a
particular cluster expansion defined in the input file.
Within the cluster-expansion a high-dimensional potential V is
expanded into n-particle interaction terms, called the clusters.
(Elsewhere, this expansion is also refered to as 'HDMR' or
'n-mode representation'.) The
raw data can be
further analyzed with the Python script 'clusterana.py'.
The program is executed in three stages:
Remark:
The script relies on many features of the NumPy/SciPy
package which is not
installed by default on many platforms. Please make sure that NumPy
1.1 or later is installed. (Note: for Apple computers the Numpy/SciPy
developers strongly recommentd using the official Python distribution
and not the version shipped with Mac OS X.)
Possible options can be obtained with
clusterstat [-opt] input_file
clusterstat -h
------------------------------------------------------------------------------- Purpose: Statistical analysis for the cluster expansion of a potential surface.
Usage: clusterstat [-mnd -c -D -deb -w -ver -h -? ] inputfile.inp
-mnd Make name directory -c Continuation run -n <int> Perform <int> steps (ignore value in input file) -t <file> Use a trajectory stored in <file> -D <dir> Denotes the directory where files are written to (name in .inp file ignored). -deb Run in debug mode (more detailed logging) -w Allow to overwrite existing data. -ver Print version info -h -? Print this help text. -------------------------------------------------------------------------------
Note: Continuation runs (-c option) can be used to re-start a
crashed or canceled job.
If -c is given, the program attempts to read a previously
created restart-file in the name directory and reads the current state.
Section | Description |
---|---|
RUN | What is to be done. |
COORDINATE | Definition of the physical coordinates and their ranges. |
MODES | Definition of logical coordinates, i.e., the mode-combination scheme (similar to the definition of single-particle functions). |
REFERENCE | Reference points around which the PES is expanded. |
EXPANSION | Specification of the expansion terms. |
Example inputs can be found in $MCTDH_DIR/inputs/clusters/statistics.
|
||
|
|
|
name = S | The 'name' directory to which output files are written. | |
user-source = S | The path (relative or absolute) to the module containing the potential energy routine and, if required, a probe routine and a switching function. See using the 'user-source' keyword. | |
steps = I | The number of steps of the random walker. | |
stepnorm = R | The size, i.e, the Euclidean norm, of a step of the random walker | |
|
||
|
|
|
title = S | The title of the calculation. | |
potential-routine = S | The name of the routine that evaluates the PES for a given coordinate vector (see below). The routine must be found in the module set by the 'user-source' keyword (see above). Default: S = "potential" | |
probe-routine = S | The name of the routine that evaluates a probe function for a given coordinate vector (see below). This routine can be different from the PES routine (see below). The routine must be found in the module set by the 'user-source' keyword (see ). Default: the same routine set by the 'potential-routine' keyword | |
temperature = R(,S) | The 'temperature' used in the Monte-Carlo process to
calculate the trajectory of the random walker. S can be a unit
(see 'outunit' keyword for possible units). Default unit is "au"
. Note: there is no multiplication with the Boltzmann constant.
The number given here really is kBT = β-1
already. This keyword can be omitted if 'trajectory' is set. |
|
outunit = S | By default, all output is written in atomic units ("au"). It can, however, be changed to other units. 'outunit' can be one of "cm-1", "eV", "meV", "au", "kcal/mol" or "kJ/mol". | |
trajectory = S | Equivalent to option -t. The name of or path to a file containing a trajectory. If trajectory is given, no random walk is performed but the coordinates are obtained from this file. Format: ASCII, one set of whitespace separated coordinate values (float) per line. Optional additional column containing a weight. See coordinates file in output documentation. |
|
overwrite | Allow overwriting of existing files in the 'name' directory. Similar to option -w in the command line. | |
mean | While generating the cluster-data calculate the cumulative and order-dependent mean error of the expansion. See mean_error_order file in output documentation. | |
rms | While generating the cluster-data calculate the cumulative and order-dependent root-mean-square of the error of the expansion. See rms_error_order file in output documentation. | |
walk-only | Do not generate any data concerning the cluster-expansion but only perform a random walk following the probe-function. | |
ignore-weights | If trajectory = S or comand line option -t is set, possibly existing weight factors in the trajectory file are ignored and all coordinates are weighted equally. |
The potential-routine and probe-routine keywords
The potential-routine keyword is used to specify the name of the PES routine in the module specified as 'user-source'. The subroutine receives a 1D numpy array (type float) containing a coordinate vector and must return a single float number, the potential energy for the given coordinates in atomic units. The program assumes that the result is in atomic units.
The probe-routine keyword is used to specify the name of a probe routine. The probe routine is only used to determine the trajectory of the random walker during the Monte-Carlo process. I.e, the trajectory of the random walker follows the Boltzmann distribution of the probe function (~exp(-βVprobe)). If the probe-function is not specified, the routine specified by the potential-routine keyword is used.
Simple example of a PES subroutine:
#!/usr/bin/env python
#
# file: my_pes.py
import numpy
def potential(Q):
"My PES routine - just a harmonic oscillator"
return numpy.dot(Q,Q)/2
The coordinates are set up one per line, starting with the coordinate label, followed by the start of the coordinate range and its end.
In addition to the range, also a scaling factor for the random walker can be added. Since the step size of the random walker is always of the same length, one can add a weighting factor for each component after the range (default is unity). If the weighting factor is set to zero, the initial value of coordinate is not altered.
Note: The initial position of the random walker is set to the first reference point.
Example
MODES-SECTION
Q1, Q2 # mode 1
Q3 # mode 2
Q5, Q4 # mode 3
Q6 # mode 4
end-modes-section
Within the REFERENCE-SECTION, these reference points are specified one per line between the begin-reference and end-reference statements. The first entry in a line is the label of the coordinate as specified in the COORDINATES-SECTION. The label is followed by whitespace separated coordinate values for each reference point. Note: The initial position of the random walker is set to the first reference point.
Other keywords
In the case of multiple reference points additional keywords can be
used
to define how the clusters are summed. By default all clusters will be
summed with the same relative weight.
|
|
|
|
weights = R,R1,... | The clusters for each reference point are summed together with weights R,R1, etc. If weights are given, there must be as many weights as reference points. weights cannot be negative and the sum must be unity. If neither weights nor a switching function are given, all weights are set to 1/N where N is the number of reference points. |
switch-function = S | Alternatively to the weights keyword, a switching function can be used to generate coordinate dependent weights. S is the name of a routine in the module specified by the 'user-source' keyword in the RUN-SECTION (see below). |
Using a switching function
Using a user-defined switching function is similar to defining the
potential energy routine.
A routine (or other callable object) of the name set by the
'switch-function' keyword must be found in the module or library defined by
the 'user-source' keyword in the run-section. The switching function
receives a complete coordinate vector containing the currunt position of the
random walker (1D numpy array, type float) and must
return a 1D float array (or any other ordered sequence) of weights.
The array must contain as
many entries as there are reference points. There must not be negative
elements and the sum of all elements must be unity.
Example: two reference points connected by a switching function
REFERENCE-SECTION begin-reference # label ref1 ref2 Q1 0.0 0.0 Q2 0.0 0.0 Q3 0.0 0.0 Q4 -1.0 1.0 Q5 1.0 -1.0 Q6 1.0 1.0 end-reference switch-function = my_switch_routine end-reference-section
Complete modes are specified by integers, while single coordinates are specified by their labels as given within the COORDINATE-SECTION or PRIMITIVE-BASIS-SECTION. The integer defining a mode is determined by its position in the MODES-SECTION, i.e., the first mode defined in the MODES-SECTION is adressed by the integer 1, the second by 2, etc. For instance, the integer 3 will refer to mode three containing the coordinates 'Q4' and 'Q5' in the example above.
A tuple is a list of mode integers and/or coordinate labels enclosed in round brackets. For example the tuple (1,2,Q4) defines a cluster containing the coordinates 'Q1', 'Q2', 'Q3' and 'Q4', where the first two modes are completely taken into account by specification through integers, but only one coordinate ('Q4') of mode '3' is included. The same cluster could of course also be specified by the tuple (Q1,Q2,Q3,Q4).
Ordering of the tuples
Within a tuple, the mode number (whether or not a mode is specified by
its integer or only parts of it are included by a coordinate label)
must always increase from left to right so that the smallest mode number is in
the most left position. Also, a mode cannot be specified multiply, for
example when a mode integer and a coordinate label are used.
Within the example above, the tuples (2,1,3) and (2,3,Q5) are invalid;
the first because of the order of integers, the second because of
double definition of the coordinate label 'Q5' as it is part of mode
3.
The empty tuple
The empty tuple () is used for clusters with no coordinate
dependence, i.e, a singe (reference) point (see REFERENCE-SECTION).
Using wildcards
If a lage number of clusters is to be calculated, specification of all
single clusters can be cumbersome. The character '*' can be used as a
wildcard. Wildcards always represent mode integers and cannot be used for
coordinate labels. Depending on the position within a tuple
a wildcard is expanded to all possible mode integers for this particular position.
Examples using the mode definition above:
The tuple (*,3,*) is expanded into (1,3,4),(2,3,4).
The tuple (*,Q5) is expanded into (1,Q5),(2,Q5).
The tuple (*,*,4) is an acronym for the tuples (1,2,4), (1,3,4) and (2,3,4).
The tuple (*,*,2) is invalid as the first mode integer after the wildcards is too small.
The tuple (*,*) is expanded to tuples (1,2),(1,3),(1,4),(2,3),(2,4),(3,4).
Example 1
Expansion to second order around a single point:
EXPANSION-SECTION
(),(*),(*,*)
end-expansion-section
Example 2:
Full expansion to first order around logical
coordinate 4 plus two additional 2nd-order clusters:
EXPANSION-SECTION
(4)
(*,4)
(*,3,4)
end-expansion-section
1. Initialization
In the very beginning the program opens a log file clusterstat.log which remains open until termination of the program. It recieves all messages sent through the logging system and contains time-stamps, log levels and messages.
Also in the very beginning a file input containing a string representation of the original input is generated. This file is reproduced from allready processed input and is not merely a copy of the original input file.
In Addition, the restart file is generated. This file is continuously updated to allow restarting the calculation after a crash or manual interruption. Note: The restart file is generated using the Python built-in cPickle module and can be easily inspected and altered in an interactive Python shell (see cPickle documentation).
2. Random walk
After the initialization, if required, a random walk using a Metropolis algorithm is performed. During the random walk the file coordinates is generated. It contains the coordinate vector for each accepted Monte-Carlo step in ASCII format, one vector per line with whitespace separated floating point numbers. Also generated at this stage is the file probe. The n-th line of this file contains the energy returned by the probe function of the n-th accepted Monte-Carlo step, the mean of the first n energies and the root-mean-square of the first n energies. These files are not generated if the trajectory is read from file.
3. Calcululation of the Clusters
When the trajectory is generated, the clusters are calculated for
each coordinate vector and reference point. The following files are
created (all ASCII and whitespace separated for multiple columns):