Working with molecular trajectories#

As biotite.structure provides efficient tools for analyzing multi-model structures, the package is predestined for handling trajectories from molecular dynamics (MD) simulations. If you like, you can even use the biotite.interface.openmm subpackage with the OpenMM molecular simulation toolkit to integrate MD simulations.

Reading trajectory files#

Biotite provides a read/write interface for different trajectory file formats. All supported trajectory formats have in common, that they store only coordinates. These can be extracted as ndarray with the XTCFile.get_coord() method.

from tempfile import NamedTemporaryFile, gettempdir
import requests
import biotite.structure.io.xtc as xtc

# Download 1L2Y as XTC file for demonstration purposes
temp_xtc_file = NamedTemporaryFile("wb", suffix=".xtc", delete=False)
response = requests.get(
    "https://raw.githubusercontent.com/biotite-dev/biotite/master/"
    "tests/structure/data/1l2y.xtc"
)
temp_xtc_file.write(response.content)

traj_file = xtc.XTCFile.read(temp_xtc_file.name)
coord = traj_file.get_coord()
print(coord.shape)
(38, 304, 3)

If only an excerpt of frames is desired, the behavior of the read() function can be customized with the start, stop and step parameters.

# Read only every second frame
traj_file = xtc.XTCFile.read(temp_xtc_file.name, step=2)
coord = traj_file.get_coord()
print(coord.shape)
(19, 304, 3)

In order to extract an entire structure, i.e. an AtomArrayStack, from a trajectory file, a template structure must be given, since the trajectory file contains only coordinate information.

import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx

pdbx_file = pdbx.BinaryCIFFile.read(rcsb.fetch("1l2y", "bcif", gettempdir()))
template = pdbx.get_structure(pdbx_file, model=1)

traj_file = xtc.XTCFile.read(temp_xtc_file.name)
trajectory = traj_file.get_structure(template)
temp_xtc_file.close()

Comparing frames with each other#

In the analysis of MD simulations, the frames are seldom only observed in isolation, but also compared with each other. For example, the root mean square deviation (RMSD) over the course of the simulation gives a hint about the convergence of the simulation. On the other hand, the root mean square fluctuation (RMSF) describes the structural flexibility of each atom.

import biotite.structure as struc

# The comparison requires that the frames are properly superimposed
# Arbitrarily choose the first frame as reference
trajectory, _ = struc.superimpose(trajectory[0], trajectory)
# Compute RMSD/RMSF only for CA atoms
trajectory = trajectory[:, trajectory.atom_name == "CA"]
rmsd = struc.rmsd(trajectory[0], trajectory)
print("RMSD:")
print(rmsd)
# Compute RMSF for each residue relative to the average over the trajectory
rmsf = struc.rmsf(struc.average(trajectory), trajectory)
print("RMSF:")
print(rmsf)
RMSD:
[0.   0.85 1.03 0.62 0.85 1.12 0.9  0.64 1.03 0.87 0.91 1.39 0.95 0.97
 1.02 0.6  0.44 1.21 1.27 0.92 0.62 0.84 0.86 1.15 0.74 1.18 1.05 1.27
 0.83 0.93 0.98 0.88 0.71 1.02 1.04 0.62 1.19 0.92]
RMSF:
[1.4  0.4  0.3  0.31 0.32 0.26 0.27 0.46 0.53 0.38 0.32 0.37 0.48 0.55
 0.75 0.41 0.37 0.44 0.49 0.99]

Simulation boxes and unit cells#

Usually each frame in a trajectory has an associated simulation box, that describes the box boundaries in terms of vectors. Biotite represents these boxes (and unit cells from crystal structures alike), as (3,3)-shaped ndarray objects. Each element in the array is one of the three vectors spanning the box or unit cell. Let’s create an orthorhombic box from the vector lengths and the angles between the vectors.

import numpy as np

# The function uses angles in radians
box = struc.vectors_from_unitcell(10, 20, 30, np.pi/2, np.pi/2, np.pi/2)
print("Box:")
print(box)
print("Box volume:", struc.box_volume(box))
print("Is the box orthogonal?", struc.is_orthogonal(box))
len_a, len_b, len_c, alpha, beta, gamma = struc.unitcell_from_vectors(box)
print("Cell lengths:")
print(len_a, len_b, len_c)
print("Cell angles:")
print(np.rad2deg(alpha), np.rad2deg(beta), np.rad2deg(gamma))
Box:
[[10.  0.  0.]
 [ 0. 20.  0.]
 [ 0.  0. 30.]]
Box volume: 6000.0
Is the box orthogonal? True
Cell lengths:
10.0 20.0 30.0
Cell angles:
90.0 90.0 90.0

An AtomArray may have an associated box, which is used in functions, that consider periodic boundary conditions. AtomArrayStack objects require a (m,3,3)-shaped ndarray, that contains the box vectors for each frame. The box is accessed via the box attribute, which is None by default.

array = struc.AtomArray(length=100)
print(array.box)
array.box = struc.vectors_from_unitcell(10, 20, 30, np.pi/2, np.pi/2, np.pi/2)
print(array.box)
None
[[10.  0.  0.]
 [ 0. 20.  0.]
 [ 0.  0. 30.]]

When loaded from a structure file, the box described in the file is automatically used.

pdbx_file = pdbx.BinaryCIFFile.read(rcsb.fetch("1aki", "bcif", gettempdir()))
stack = pdbx.get_structure(pdbx_file)
print(stack.box)
[[[59.06  0.    0.  ]
  [ 0.   68.45  0.  ]
  [ 0.    0.   30.52]]]

Measurements over periodic boundaries#

A common feature (and issue) in MD simulations are periodic boundary conditions (PBC). This means the simulation box is virtually repeated in all directions, an infinite number of times, or in other words, an atom leaving the box on the right side reenters on the left side. Many measurement functions in biotite.structure accept a periodic or box parameter to deal with PBCs. If the respective parameter are set, the function considers periodic boundary conditions for the measurement, i.e. the nearest periodic copies of the atoms are considered.

Note

The measurements in the following snippet are nonsensical, as the associated box of the used structure is a placeholder from the PDB entry. However, the code works also for actual trajectories.

# PBC-aware distance between first and second CA atoms
distances = struc.distance(
    trajectory[:, 0], trajectory[:, 1], box=trajectory.box
)
# PBC-aware radial distribution function of each CA atom to all CA atoms
bins, rdf = struc.rdf(
    center=trajectory, atoms=trajectory, interval=(0, 10), periodic=True
)

As PBC-aware measurements are computationally more expensive, it can be reasonable to remove the segmentation of a structure over the periodic boundary via remove_pbc() to be able to use the faster non-PBC measurements.

trajectory = struc.remove_pbc(trajectory)
# The same distance calculation but PBC-awareness is not necessary anymore
distances = struc.distance(trajectory[:, 0], trajectory[:, 1])