# Basic analysis of MD simulation¶

In this example, we will analyze a trajectory of a Gromacs MD simulation: The trajectory contains simulation data of lysozyme over the course of 1 ns. The data is the result of the famous GromacsLysozyme in Water’ tutorial.

The trajectory file can be downloaded here and the template PDB can be downloaded here.

We begin by loading the template PDB file as AtomArray, sanitizing it and using it to load the trajectory as AtomArrayStack.

# Code source: Patrick Kunzmann

import biotite
import biotite.structure as struc
import biotite.structure.io as strucio
import biotite.structure.io.xtc as xtc
import numpy as np
import matplotlib.pyplot as plt

# Gromacs does not set the element symbol in its PDB files,
# but Biotite guesses the element names from the atom names,
# emitting a warning
# The structure still has water and ions, that are not needed for our
# calculations, we are only interested in the protein itself
# These are removed for the sake of computational speed using a boolean
# We could have loaded the trajectory also with
# those coordinates that belong to the already selected atoms of the
# template structure.
# Hence, we use the 'XTCFile' class directly to load the trajectory
# This gives us the additional option that allows us to select the
# coordinates belonging to the amino acids.
xtc_file = xtc.XTCFile()
trajectory = xtc_file.get_structure(template)
# Get simulation time for plotting purposes
time = xtc_file.get_time()


Since the MD simulation used periodic boundaries, the protein might be segmented over the box boundary. For further analysis we need to reassemble the protein chain into a whole molecule, without periodic boundaries. in Gromacs we could have used gmx trjconv for this, but this problem can be handled in Biotite, too.

trajectory = struc.remove_pbc(trajectory)


Now our trajectory is ready for some analysis! At first we want to see if the simulation converged. For this purpose we take the RMSD of a frame compared to the initial model as measure. In order to calculate the RMSD we must superimpose all models onto a reference, in this case we also choose the initial structure.

trajectory, transform = struc.superimpose(trajectory[0], trajectory)
rmsd = struc.rmsd(trajectory[0], trajectory)

figure = plt.figure(figsize=(6,3))
ax.plot(time, rmsd, color=biotite.colors["dimorange"])
ax.set_xlim(time[0], time[-1])
ax.set_ylim(0, 2)
ax.set_xlabel("Time (ps)")
ax.set_ylabel("RMSD (Å)")
figure.tight_layout()


As we can see the simulation seems to converge already early in the simulation. After a about 200 ps the RMSD stays in a range of approx. 2 - 3 Å.

In order to futher evaluate the unfolding of our enzyme in the course of simulation, we calculate and plot the radius of gyration (a measure for the protein radius).

radius = struc.gyration_radius(trajectory)

figure = plt.figure(figsize=(6,3))
ax.set_xlim(time[0], time[-1])
ax.set_ylim(14.0, 14.5)
ax.set_xlabel("Time (ps)")
figure.tight_layout()


From this perspective, the protein seems really stable. The radius does merely fluctuate in a range of approximately 0.3 Å during the entire simulation.

Let’s have a look at single amino acids: Which residues fluctuate most? For answering this question we calculate the RMSF (Root mean square fluctuation). It is similar to the RMSD, but instead of averaging over the atoms and looking at each time step, we average over the time and look at each residue. Usually the average model is taken as reference (compared to the starting model for RMSD).

Since side chain atoms fluctuate quite a lot, they are not suitable for evaluation of the residue flexibility. Therefore, we consider only CA atoms.

# In all models, mask the CA atoms
ca_trajectory = trajectory[:, trajectory.atom_name == "CA"]
rmsf = struc.rmsf(struc.average(ca_trajectory), ca_trajectory)

figure = plt.figure(figsize=(6,3))