CHARMM: Trajectory and Analysis Tutorial


Objective and Overview

The objective of this tutorial is to introduce common analyses of structures and simulation trajectories using various CHARMM analysis facilities. All examples shown in this tutorial are based on analysis of a short MD trajectory of the protein G B1 domain solvated in a cubic box of TIP3P water (shown on the right).


Solvated protein G B1 domain (PDB:3GB1)

Tutorial Components:

  1. Extract RMSD vs Initial Structure and Radius of Gyration of the Protein
  2. Extract Histograms of φ/ψ Distributions
  3. Analyze the Secondary Structure Using the Kabsch and Sander DSSP Algorithm
  4. Find Hydrogen Bonds in the Initial Structure, and from the Trajectory
  5. Analyze Water Diffusion and Water Structure Around the Protein
  6. Reorient/Recenter the Trajectory, and also Remove All Solvent
  7. Extract NMR Relxation Rates and Order Parameters for the Backbone Amides
  8. Analyze the Time-dependent Fluorescence Anisotropy Decay of a Trp Residue
  9. Cluster Structures from the Trajectory, Based on Selected φ and ψ dihedrals

These examples represent somewhat routine analyses but can be combined with the CHARMM scripting language to achieve many other more extensive analyses. Along with performing the examples, please examine the structure and trajectory using VMD or other visualization software. Also, one can choose favorite software for making plots from the data files produced following CHARMM analysis, although all data plots shown here are were generated by the Gnuplot, and these scripts provided.

Please also keep in mind that the provided 100 ps trajectory with 100 coordinate frames sampled every 1 ps is far too short for many of the phenomena are analyzed here.

To run any of the CHARMM analysis scripts:

 
charmm < scriptname.inp > scriptname.log

To run the gnuplot scripts:

gnuplot scriptname.gnu


1. Root Mean Square Deviation and Radius of Gyration

  
         rmsd-singlecrd.inp  
  rmsd-rgyr-traj-corman.inp
  rmsd-rgyr-traj-correl.inp

Two simple characteristics of a structure are

  1. The Root Mean Squared Difference (RMSD) with respect to a reference structure (e.g., the starting point of a simulation), tells us something about the dissimilarity between the structures - 1 Å RMSD is barely visible to the eye, while an RMSD 10 Å is leading to a significantly different structure.
  2. The Radius of Gyration (RGYR) is a combined measure of its overall size and shape. A sphere has the smallest RGYR of any structure with the same volume. Increasing the volume, making the shape more anisotropic, or having more of its mass at the periphery, all lead to an increased RGYR. The proper definition of RGYR in mechanics is mass-weighted, which can be obtained by adding keyword MASS to the COOR RGYR command.

Three scripts are provided to demonstrate how to evaluate RMSD and RGYR using CHARMM. The rmsd-singlecrd.inp script is extremely simple and calculates only the Cα (CHARMM atom type CA) RMSD between two structures. The rmsd-rgyr-traj-corman.inp and rmsd-rgyr-traj-correl.inp scripts calculate RMSD and RGYR of a trajectory from a reference structure using two methods (a loop with CORMAN versus no loop with CORREL). In CORREL, the results could have been output to one file for each property, but the ability to edit the dimensions of the extracted time series in CORREL (edit ... veccod) is used to put all the information into one file.


CA RMSD and Rg as Functions of Time (Generated by plot_rmsd.gnu).

Back to Top

2. φ/ψ Distributions

  phi-psi-dist.inp

Peptide backbone dihdrals φ and ψ determine the overall fold of a protein. In this example, histograms of φ and ψ are extracted, and a distribution from the trajectory for any selected residue using correl.doc is obtained. This script has the ability to take commandline arguments. The argument, @rid (residue ID), has a default value of 41 by the virtue of the following command in phi-psi-dist.inp:

if @?rid eq 0   set rid = 41
However, the value of @rid can be overwritten by following CHARMM excution:
charmm rid=15 < phi-psi-dist.inp | tee phipsi.log

After the TRAJ command in the CORREL module has been executed averages and fluctuations are printed out for each time series. Both histograms are output to a single file named phi-psi-dist.dat, and the φ/ψ time series are output to phipsi.dat. Following is a plot of of the results for Gly41.


Gly41 φ/ψ Histograms (left) and Time Series (Generated by plot_phipsi.gnu).

Back to Top

3. Secondary Structure

2nd-structure.inp

COOR SECS (corman.doc) computes the secondary structure characteristics of a protein using the DSSP algorithm (Kabsch and Sander 1983), which is based on backbone hydrogen bond patterns. The CHARMM implementation is a slight simplification and uses the donor(hydrogen atom)-acceptor-atom definition of a hydrogen bond. Results are summarized in the output file shown below:

  

 ...

 CHARMM>    COOR SECS SELE .not. resn tip3 end VERBOSE
 SELRPN>    855 atoms have been selected out of  17088

Secondary structure (Kabsch&Sander) analysis.
Using     56 aa in a context of     56 aa.
     14 aa in alpha-helix ( 25%), and     24 aa in beta-strands ( 42%).

      1: MET-THR-TYR-LYS-LEU-ILE-LEU-ASN-GLY-LYS-THR-LEU-LYS-GLY-GLU
              E   E   E   E   E   E   E                   E   E   E

     16: THR-THR-THR-GLU-ALA-VAL-ASP-ALA-ALA-THR-ALA-GLU-LYS-VAL-PHE
          E   E   E   E               H   H   H   H   H   H   H   H

     31: LYS-GLN-TYR-ALA-ASN-ASP-ASN-GLY-VAL-ASP-GLY-GLU-TRP-THR-TYR
          H   H   H   H   H   H                       E   E   E   E

     46: ASP-ASP-ALA-THR-LYS-THR-PHE-THR-VAL-THR-GLU-
          E                   E   E   E   E   E    

     ...

Results from the COOR SECS command are also returned as a numerical flag in the CHARMM WMAIN array. An example where WMAIN is plotted together with the order parameters from NMR relaxation analysis with the plot_nmr-s2.gnu input file is shown in NMR section below.

Back to Top

4. Hydrogen Bond Analysis

hbond.inp

Hydrogen bonding patterns often provide useful information. In this exercise the COOR HBOND (corman.doc) command is used to find hydrogen bonds in a single structure, as well as average number of hydrogen bonds and their average life times from a trajectory. Hydrogen bonds can be defined in many ways, but a simple geometric criterion will be used: a hydrogen bond exists if the distance between the hydrogen donor (D) and acceptor atoms (A) is less than 2.4 Å. This works well in practice. Note that when there is information about the hydrogen position, the hydrogen bond definition is not very sensitive to angular criteria, as are often used when determining hydrogen bonds in X-ray structures (lacking hydrogens).

Hydrogen bonds within the protein, between protein and water, and also on water-mediated hydrogen bonding contacts between different parts of the protein will be examined. Hydrogen bonded interactions of the form A/D - water - A/D, where A/D denotes a hydrogen bond donor or acceptor in the protein will also be examined. COOR HBOND uses the information about acceptors and donors in the PSF, so quite simple and general atom-selections will be used. All hydrogen bonds between the acceptors/donors in the two selections are calculated. A second form of the command (COOR CONTact) applies only distance criterion to all the selected atoms.

The results are presented as hydrogen bonds per each acceptor/donor in the first atom-selection; if the VERBose keyword is present a break-down in terms of accptors/donors in the second atom-selection is given. The VERBose keyword is not useful in a CHARMM loop when it is desired to extract total number of hydrogen bonds through CHARMM variables (subst.doc). <occupancy> is the average number of hydrogen bonds formed by a given accptor/donor during the trajectory. The <lifetime> (in ps) is the average duration of each instance of a given hydrogen bond.

Back to Top

5. Solvent Dynamics and Structure

solvent.inp

All cells live an aqueous environment and their insides (cytoplasm) are also largely aqueous. Biomacromolecules thus have evolved to function (Membrane proteins are, of course, different in this respect); consquently CHARMM has well-developed facilities to analyze solvation behavior (see also the Hydrogen Bonds exercise). Here, the COOR ANAL (corman.doc) module is used to extract solvent dynamics (translational and rotational diffusion) and structure, in terms of radial distribution functions. The example analyzes these properties for different regions of the water depending on the distance from protein surface (<5 A, 5-10 A, >10 A). As shown in the following plots, water molecules near the protein surface tend to diffuse and rotate slower (with larger effective rotational correlation time and smaller diffusion constant). The Rg of different residues also indicate the degree of solvation.


Solvent Structure and Dynamics: MSD(t) (left), g(r) (center) and P2(t) (Generated by plot_solvent.gnu).

Back to Top

6. Recentering/Orienting and Removing Water from the Trajectory

orient.inp

The raw trajectory often needs to be pre-processed to facilitate some of the later analyses. The trajectrory was obtained from a simulation using periodic boundary conditions (PBC), which means that it is possible for the solute (the protein) to be partly out of the primary simulation box. By recentering the trajectory solvent molecules are moved according to the PBC so that the protein is in the center of the box in each frame. Furthermore, each frame is then re-oriented, such that the protein is superimposed on the coordinates of the initial protein structure, thus removing overall protein rotation/translation motions. For a very flexible system, the removal of overall rotation may be a non-trivial task, but G B1 is compact and quite rigid, so the superposition is based on all atoms in the protein.

The script uses the CHARMM MERGE command (dynamc.doc) and demonstates only the part that we re-orient the protein and remove all water molecules. A new trajectory file is produced. The script also contains an example on how to re-center and orient the protein without deleting water molecules (this line is not excuted as it is after the "stop" command). MERGE can also be used to join or split trajectory files or to change (reduce) sampling frequency.

Back to Top

7. NMR Relaxation Rates and Generalized Order Parameters

nmr.inp

MD simulations and NMR relaxation experiments often cover similar time-scales (ps-ns). Relaxation phenomena observed in NMR experiments depend on motional behavior of nuclear dipoles in the macromolecule, and the NMR module in CHARMM has been designed to allow efficient extraction of NMR related parameters from a trajectory (nmr.doc). This example analyzes relaxation parameters, including relaxation rates, generalized order parameters, and configurational entropy estimates for the backbone amide groups.

Results, with some details such as the computed correlation functions are given in the output file nmr-rt_ct.dat, and are also summarized in nmr-r1r2.dat. Note that the protein trajectory with overall translation/rotation removed is used. It is assumed that the internal and overall motions occur on very different time scales and can confidently be separated, allowing the focus on the internal motions. Such an assumption is necessary, as most simulations are not long enough to sufficiently sample global rotational diffusion. The following figure plots the general order parameter S2 for each HN vector together with the secondary structure (generated by 2nd-structure.inp, see above). Clearly, even for a 100 ps simulation, it is evident that loops are more flexible with smaller S2.


General order parameter and secondary structures (generated by plot_nmr-s2.gnu).

Back to Top

8. Fluorescence Anisotropy

trp--anisotropy.inp

The motion of chromophores may be detected by following the time-dependence of the polarized components of fluorescence emission following a brief pulse of polarized excitation. This time-dependent anisotropy can also be computed from the (rank two) autocorrelation of a unit vector along the transmission dipole, or more strictly, the correlation between absorption and emission dipoles. Trp is the most useful intrinsic chromophore in proteins, but Tyr also has a certain fluorescence which is less intense than that of Trp. Extrinsic probes, including dansyl chloride and many others, are often used. The fast decay of this fluorescence anisotropy means that internal motions can be decoupled from the overall rotational diffusion of the protein (See also the NMR Exercise.). Protein G B1 only has one buried Trp. Within 100 ps simulation, Trp43 does not change its rotamer state, and the correlation fuction does not decay significantly.


Trp43 Ch1/Ch2 Time Series and Dipole Orientational Correlation Function (Generated by plot_trp.gnu).

Back to Top

9. φ/ψ Based Clustering

cluster.inp

As shown above in φ/ψ distribution analysis, CORREL (correl.doc) can be used to extract time series of backbone dihedral angles, and many other properties. Some of the dihiedral angles that vary during the simulation can be used for clustering the structures in the trajectory. As shown from the RMSD analysis, the protein does not move much within 200 ps. Thus, the loop 38-41 is chosen as one that has the largest fluctuation (or smaller order parameters from nmr analysis) to demonstrate clustering.

The number of clusters and cluster centers are summarized in clus-center.dat. Membership and distance from center of each frame are output in clus-member.dat.

Back to Top