CARD FILES: FROM http://www.scripps.edu/brooks/c29docs/io.html#%20Coordinate
Reading coordinates
Coordinates are read with the READ COOR command, which takes several
options (these may change in future versions).
Coordinates may be read into the main set or the comparison coordinate set
using the COMP keyword.
There are three possible file formats that can be used to read in
coordinates: coordinate binary files, dynamics coordinate
trajectories, and coordinate card images. Protein Data Bank (PDB) formatted
files can also be read, but they require some editing first. The
HEADER and all other records before the actual coordinate section must
be removed and optionally replaced by a standard CHARMM title. There should
be no line with NATOM (= number of atoms) preceding the actual coordinates.
CHARMM does no translation whatsoever of residue or atom names, so if
the names differ you must rename the entries in either the PSF or the
coordinate file.
For all formats, a subset of the atoms in the PSF may be selected
using the standard atom selection syntax. For binary files, this is a
risky maneuver, and warning messages are given when it is attempted.
Only coordinates of selected atoms may be modified. When reading binary
files, or using the IGNOre keyword, coordinate values are mapped into
the selected atoms sequentially (NO checking is done!).
The reading of the first two file formats is specified with the
FILE option. The program reads the file header to tell which format it
is dealing with. The coordinate binary files have a file header of
'COOR' and contain only one set of coordinates. These are created with a
WRIT COOR FILE command. The dynamics coordinate trajectories have a file
header of 'CORD' and have multiple coordinate sets. These files are
created by the dynamics function of the program. The IFILE option
specifies which coordinate set in the trajectory is to be read, by giving
the set's position within the file. The default value for this option
causes the first coordinate set to be read. If the IFILE value is
negative, then the next coordinate set (other than the first one) will be
read. This will only work if a set has already been read from the file
with a positive IFILE value.
For binary files, the APPEnd option will 'deselect' all atoms
up to the highest one with a known position. This is done in addition
to the normal atom selection. This is useful for structures with several
distinct segments where it is desirable to keep separate coordinate
modules.
The CARD file format is the standard means in CHARMM for
providing a human readable and writable coordinate file. The format is
as follows:

       title
       NATOM (I5)
       ATOMNO RESNO   RES  TYPE  X     Y     Z   SEGID RESID Weighting
         I5    I5  1X A4 1X A4 F10.5 F10.5 F10.5 1X A4 1X A4 F10.5

The title is a title for the coordinates; see the Syntactic Glossary
in usage.doc for details. Next comes the
number of coordinates. If this number is zero or too large, the entire
file will be read. Finally, there is one line for each coordinate.
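
The fixed-width layout above can be sliced by column position. The
following Python sketch is illustrative only and is not part of CHARMM: the
names CardAtom, parse_card_line and parse_card are hypothetical, title
lines are assumed to begin with "*", a blank weighting field is assumed to
mean 0.0, and the two sample atoms are invented.

```python
from typing import List, NamedTuple


class CardAtom(NamedTuple):
    atomno: int      # ATOMNO, I5 (ignored by CHARMM on reading)
    resno: int       # RESNO, I5
    res: str         # RES, A4
    atom_type: str   # TYPE, A4
    x: float         # X, F10.5
    y: float         # Y, F10.5
    z: float         # Z, F10.5
    segid: str       # SEGID, A4
    resid: str       # RESID, A4
    weight: float    # Weighting, F10.5


def parse_card_line(line: str) -> CardAtom:
    """Slice one coordinate line at the column positions of the format."""
    s = line.rstrip("\n").ljust(70)
    weight = s[60:70].strip()
    return CardAtom(
        atomno=int(s[0:5]),
        resno=int(s[5:10]),
        res=s[11:15].strip(),        # column 10 is the 1X separator
        atom_type=s[16:20].strip(),  # column 15 is the 1X separator
        x=float(s[20:30]),
        y=float(s[30:40]),
        z=float(s[40:50]),
        segid=s[51:55].strip(),
        resid=s[56:60].strip(),
        weight=float(weight) if weight else 0.0,  # assumption: blank -> 0.0
    )


def parse_card(text: str) -> List[CardAtom]:
    """Skip the title, read NATOM, then read the coordinate lines."""
    # Assumption: every title line starts with "*".
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("*")]
    natom = int(lines[0][0:5])
    records = lines[1:]
    if 0 < natom <= len(records):
        records = records[:natom]
    # If NATOM is zero or too large, the entire file is read.
    return [parse_card_line(ln) for ln in records]


# Build a small sample with the same field widths (invented values).
FMT = "%5d%5d %-4s %-4s%10.5f%10.5f%10.5f %-4s %-4s%10.5f"
sample = "\n".join([
    "* HYPOTHETICAL TWO-ATOM EXAMPLE",
    "*",
    "%5d" % 2,
    FMT % (1, 1, "ARG", "HT1", 1.0, -2.5, 3.25, "MAIN", "1", 0.0),
    FMT % (2, 1, "ARG", "N", 0.5, 0.25, -1.75, "MAIN", "1", 0.0),
])
atoms = parse_card(sample)
print(atoms[0].atom_type, atoms[1].z)
```

Slicing by column rather than splitting on whitespace keeps the parse
correct when a field is blank or when adjacent numeric fields touch.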
ATOMNO gives the number of the atom in the file. It is ignored
on reading. RESNO gives the residue number of the atom. It must be
specified relative to the first residue in the PSF. The OFFSet option
should be specified if one wishes to read coordinates into other positions.
The APPEnd option adds an additional offset which points to
the residue just beyond the highest one with known positions. This option
also 'deselects' all atoms below this residue (inclusive).
For example, if one is reading in coordinates for the second segment of a
two chain protein using two card files, and the APPEnd option is used,
RESNO must start at 1 in both files for the file reading to work
correctly.
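
The residue-number arithmetic described above can be sketched in Python as
follows. This is one reading of the text, not CHARMM's actual
implementation: the function name target_residue and its parameters are
hypothetical, and the chain length of 58 residues is an invented example.

```python
def target_residue(resno: int, offset: int = 0,
                   append: bool = False, highest_known: int = 0) -> int:
    """Return the PSF residue number that a card-file RESNO maps to.

    offset        -- value given with OFFSet (may be negative)
    append        -- True when APPEnd is used
    highest_known -- highest PSF residue that already has known positions
    """
    shift = offset
    if append:
        # APPEnd adds an extra offset so that RESNO 1 lands on the residue
        # just beyond the highest one with known positions.
        shift += highest_known
    return resno + shift


# Second chain of a two-chain protein, first chain (58 residues) already read:
print(target_residue(1, append=True, highest_known=58))
```

With this reading, the second card file numbers its residues from 1, and
APPEnd supplies the shift that places them after the first chain.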
It should also be remembered that for card images, residues are
identified by RESIDUE NUMBER. This number can be modified by using the
OFFSet feature, which allows coordinates to be read from a different PSF.
Both positive and negative values are allowed. The RESId option will
cause the residue number field to be ignored and map atoms from SEGID
and RESID labels instead.
RES gives the residue type of the atom. RES is checked against
the residue type in the PSF for consistency. TYPE gives the IUPAC name
of the atom. The coordinates of an atom within a residue need not be
specified in any particular order. A search is made within each residue
in the PSF for an atom whose IUPAC name is given in the coordinate file.
The RESId option overrides the residue number and fills coordinates
based on the SEGID and RESID identifiers in the coordinate file.
This is the recommended method where different PSFs are used.
The IGNORE option allows one to read in a card coordinate file
while bypassing the normal tests of the residue name, number, and atom
name. When IGNORE is specified in place of CARD, the identifying
information is ignored completely. Starting from the first selected
atom, the coordinates are copied sequentially from the file.
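
The sequential copy performed under IGNORE (and for binary files with an
atom selection) can be pictured with the short Python sketch below. It is
only an illustration of the behaviour described in the text; the function
name fill_sequential and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

XYZ = Tuple[float, float, float]


def fill_sequential(coords: Sequence[XYZ], selected: Sequence[bool],
                    from_file: Sequence[XYZ]) -> List[XYZ]:
    """Copy file coordinates, in file order, into the selected atoms.

    No names or numbers are compared: the first coordinate in the file goes
    to the first selected atom, the second to the next selected atom, etc.
    """
    out = list(coords)
    targets = [i for i, flag in enumerate(selected) if flag]
    for i, xyz in zip(targets, from_file):
        out[i] = xyz
    return out


unset = (9999.0, 9999.0, 9999.0)
coords = [unset] * 4
selected = [False, True, False, True]          # atoms 2 and 4 are selected
from_file = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
print(fill_sequential(coords, selected, from_file))
```

Because nothing is checked, a selection that does not match the file's
atom order silently assigns coordinates to the wrong atoms.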
The PDB option works very much like the CARD option, but expects the
actual file format to be according to Protein Data Bank standards:

       text  IATOM    TYPE  RES  IRES       X     Y     Z       W
        A6    I5   2X  A4    A4   I5   4X      3F8.3       6X  F6.2

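
The PDB layout above can be read the same way as a card line, by slicing
fixed columns. The Python sketch below follows only the field widths listed
in this document; the names PdbAtom and parse_pdb_line are hypothetical and
the sample record is invented.

```python
from typing import NamedTuple


class PdbAtom(NamedTuple):
    text: str        # record label, A6
    iatom: int       # IATOM, I5
    atom_type: str   # TYPE, A4 (after 2X)
    res: str         # RES, A4
    ires: int        # IRES, I5
    x: float         # X, F8.3 (after 4X)
    y: float         # Y, F8.3
    z: float         # Z, F8.3
    w: float         # W, F6.2 (after 6X)


def parse_pdb_line(line: str) -> PdbAtom:
    """Slice one line according to A6 I5 2X A4 A4 I5 4X 3F8.3 6X F6.2."""
    s = line.rstrip("\n").ljust(66)
    return PdbAtom(
        text=s[0:6].strip(),
        iatom=int(s[6:11]),
        atom_type=s[13:17].strip(),
        res=s[17:21].strip(),
        ires=int(s[21:26]),
        x=float(s[30:38]),
        y=float(s[38:46]),
        z=float(s[46:54]),
        w=float(s[60:66]),
    )


# Invented sample record written with the same field widths.
PDB_FMT = "%-6s%5d  %-4s%-4s%5d    %8.3f%8.3f%8.3f      %6.2f"
line = PDB_FMT % ("ATOM", 1, "N", "ARG", 1, 26.465, 27.452, -2.490, 25.18)
atom = parse_pdb_line(line)
print(atom.atom_type, atom.res, atom.ires, atom.x, atom.w)
```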
Normally, the coordinates are not reinitialized before new values
are read, but if this is desired, the INITialize keyword will cause the
coordinate values for all selected atoms to be initialized. Note that only
atoms that have been selected will be initialized (to 9999.0). The COOR INIT
command provides a more general way to initialize coordinates.
The READ COOR DYNR variant reads a full coordinate set from a dynamics
RESTart file. It REQUIRES a matching PSF and allows no selections or
other manipulations. A restart file (usually) contains three sets of
atom data, and you choose which one to read in with keywords:
        CURR    the current coordinates
        DELT    the displacement to be taken from the current coordinates
        VEL     the current velocities (in AKMA units)
NOTE: The restart file written after a crash may be slightly different;
at present (c28a2) it contains the previous coordinates instead of velocities.
Protein Structure Files (PSF) are used by EGO as a summary of the atom types, masses, partial charges and connectivity of the molecular system. PSFs are generated with X-PLOR from the original PDB file in combination with an X-PLOR topology file.
The topology data files used by X-PLOR specify the atom parameters and connectivity for all amino acids and nucleotides. X-PLOR extracts all the information necessary for a given molecule (along with patches and modifications from the default configuration) into the PSF file.
An example PSF file follows (pti.psf):
PSF

       4 !NTITLE
 REMARKS FILENAME="pti.psf"
 REMARKS BPTI COORDINATES TAKEN FROM CRISTALLOGRAPHIC DATA W/O WATERS
 REMARKS HYDROGEN POSITIONS GENERATED USING HBUILD (2 ITERATIONS)
 REMARKS RMS FLUCTUATIONS (T. ICHEYE) FOR HEAVY PROTEIN ATOMS INCLUDED
 REMARKS DATE:24-Apr-89 02:34:58 created by user: heller

     568 !NATOM
       1 MAIN 1    ARG  HT1  HC     0.260000       1.00800           0
       2 MAIN 1    ARG  HT2  HC     0.260000       1.00800           0
       3 MAIN 1    ARG  N    NH3    0.000000E+00   14.0067           0
       4 MAIN 1    ARG  HT3  HC     0.260000       1.00800           0
 ...

     582 !NBOND: bonds
       3       5       5      18      18      19       5       6
       6       7       7       8       8       9       9      10
 ...

     834 !NTHETA: angles
       3       5      18       3       5       6       5      18      19
      18       5       6       5       6       7       6       7       8
 ...

     351 !NPHI: dihedrals
       3       5       6       7       5       6       7       8
       6       7       8       9       7       8       9      11
 ...

     259 !NIMPHI: impropers
       5       3      18       6       9       8      11      10
      11      12      15       9      22      20      25      23
 ...

     114 !NDON: donors
       9      10      12      13      12      14      15      16
      15      17       3       1       3       2       3       4
 ...

      79 !NACC: acceptors
      19      18      26      25      32      31      33      31
      35      34      47      46      54      53      63      62
 ...

      24 !NNB
      45      44      43      97      96      95     210     209
     208     224     223     222     236     235     234     328
 ...

     222       0 !NGRP
       0       0       0       5       0       0       7       0       0
 ...
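
Each section of a PSF such as the one above is introduced by a count
followed by a keyword that starts with "!". As a minimal sketch, assuming
one section header per line, the counts can be collected in Python as shown
below. The function name psf_section_counts is hypothetical, and the sample
text is an abbreviated copy of the header lines from the example.

```python
import re
from typing import Dict

# A header line is: one or more integers, then "!KEYWORD" (e.g. "568 !NATOM").
HEADER = re.compile(r"\s*(\d+)(?:\s+\d+)*\s+!(\w+)")


def psf_section_counts(text: str) -> Dict[str, int]:
    """Map each section keyword to the first integer on its header line."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


sample_psf = """PSF

       4 !NTITLE
 REMARKS FILENAME="pti.psf"

     568 !NATOM
       1 MAIN 1    ARG  HT1  HC     0.260000       1.00800           0

     582 !NBOND: bonds
       3       5       5      18      18      19       5       6

     222       0 !NGRP
       0       0       0       5       0       0       7       0       0
"""
counts = psf_section_counts(sample_psf)
print(counts)
```

Data lines contain no "!" keyword, so they are skipped without needing to
know how many values each section holds.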
From: http://www.sinica.edu.tw/~scimath/msi/insight2K/charmm_principles/Ch02_model_build.FM5.html#444493