Working with molecular trajectories#
As biotite.structure provides efficient tools for analyzing multi-model
structures, the package is predestined for handling trajectories from
molecular dynamics (MD) simulations.
If you like, you can even use the biotite.interface.openmm subpackage with the
OpenMM molecular simulation toolkit to integrate MD
simulations.
Reading trajectory files#
Biotite provides a read/write interface for different trajectory file
formats.
All supported trajectory formats have in common, that they store
only coordinates.
These can be extracted as ndarray with the
XTCFile.get_coord() method.
from tempfile import NamedTemporaryFile, gettempdir
import requests
import biotite.structure.io.xtc as xtc
# Download 1L2Y as XTC file for demonstration purposes
temp_xtc_file = NamedTemporaryFile("wb", suffix=".xtc", delete=False)
response = requests.get(
"https://raw.githubusercontent.com/biotite-dev/biotite/master/"
"tests/structure/data/1l2y.xtc"
)
temp_xtc_file.write(response.content)
traj_file = xtc.XTCFile.read(temp_xtc_file.name)
coord = traj_file.get_coord()
print(coord.shape)
(38, 304, 3)
If only an excerpt of frames is desired, the behavior of the
read() function can be customized with the start, stop and
step parameters.
# Read only every second frame
traj_file = xtc.XTCFile.read(temp_xtc_file.name, step=2)
coord = traj_file.get_coord()
print(coord.shape)
(19, 304, 3)
In order to extract an entire structure, i.e. an
AtomArrayStack, from a trajectory file, a template
structure must be given, since the trajectory file contains only
coordinate information.
import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx
pdbx_file = pdbx.BinaryCIFFile.read(rcsb.fetch("1l2y", "bcif", gettempdir()))
template = pdbx.get_structure(pdbx_file, model=1)
traj_file = xtc.XTCFile.read(temp_xtc_file.name)
trajectory = traj_file.get_structure(template)
temp_xtc_file.close()
Comparing frames with each other#
In the analysis of MD simulations, the frames are seldom only observed in isolation, but also compared with each other. For example, the root mean square deviation (RMSD) over the course of the simulation gives a hint about the convergence of the simulation. On the other hand, the root mean square fluctuation (RMSF) describes the structural flexibility of each atom.
import biotite.structure as struc
# The comparison requires that the frames are properly superimposed
# Arbitrarily choose the first frame as reference
trajectory, _ = struc.superimpose(trajectory[0], trajectory)
# Compute RMSD/RMSF only for CA atoms
trajectory = trajectory[:, trajectory.atom_name == "CA"]
rmsd = struc.rmsd(trajectory[0], trajectory)
print("RMSD:")
print(rmsd)
# Compute RMSF for each residue relative to the average over the trajectory
rmsf = struc.rmsf(struc.average(trajectory), trajectory)
print("RMSF:")
print(rmsf)
RMSD:
[0. 0.85 1.03 0.62 0.85 1.12 0.9 0.64 1.03 0.87 0.91 1.39 0.95 0.97
1.02 0.6 0.44 1.21 1.27 0.92 0.62 0.84 0.86 1.15 0.74 1.18 1.05 1.27
0.83 0.93 0.98 0.88 0.71 1.02 1.04 0.62 1.19 0.92]
RMSF:
[1.4 0.4 0.3 0.31 0.32 0.26 0.27 0.46 0.53 0.38 0.32 0.37 0.48 0.55
0.75 0.41 0.37 0.44 0.49 0.99]
Simulation boxes and unit cells#
Usually each frame in a trajectory has an associated simulation box, that
describes the box boundaries in terms of vectors.
Biotite represents these boxes (and unit cells from crystal structures alike),
as (3,3)-shaped ndarray objects.
Each element in the array is one of the three vectors spanning the box or unit
cell.
Let’s create an orthorhombic box from the vector lengths and the angles between
the vectors.
import numpy as np
# The function uses angles in radians
box = struc.vectors_from_unitcell(10, 20, 30, np.pi/2, np.pi/2, np.pi/2)
print("Box:")
print(box)
print("Box volume:", struc.box_volume(box))
print("Is the box orthogonal?", struc.is_orthogonal(box))
len_a, len_b, len_c, alpha, beta, gamma = struc.unitcell_from_vectors(box)
print("Cell lengths:")
print(len_a, len_b, len_c)
print("Cell angles:")
print(np.rad2deg(alpha), np.rad2deg(beta), np.rad2deg(gamma))
Box:
[[10. 0. 0.]
[ 0. 20. 0.]
[ 0. 0. 30.]]
Box volume: 6000.0
Is the box orthogonal? True
Cell lengths:
10.0 20.0 30.0
Cell angles:
90.0 90.0 90.0
An AtomArray may have an associated box,
which is used in functions, that consider periodic boundary conditions.
AtomArrayStack objects require a (m,3,3)-shaped ndarray,
that contains the box vectors for each frame.
The box is accessed via the box attribute, which is None by
default.
array = struc.AtomArray(length=100)
print(array.box)
array.box = struc.vectors_from_unitcell(10, 20, 30, np.pi/2, np.pi/2, np.pi/2)
print(array.box)
None
[[10. 0. 0.]
[ 0. 20. 0.]
[ 0. 0. 30.]]
When loaded from a structure file, the box described in the file is automatically used.
pdbx_file = pdbx.BinaryCIFFile.read(rcsb.fetch("1aki", "bcif", gettempdir()))
stack = pdbx.get_structure(pdbx_file)
print(stack.box)
[[[59.06 0. 0. ]
[ 0. 68.45 0. ]
[ 0. 0. 30.52]]]
Measurements over periodic boundaries#
A common feature (and issue) in MD simulations are
periodic boundary conditions (PBC).
This means the simulation box is virtually repeated in all directions, an
infinite number of times, or in other words, an atom leaving the box on the
right side reenters on the left side.
Many measurement functions in biotite.structure accept a periodic
or box parameter to deal with PBCs.
If the respective parameter are set, the function considers periodic boundary
conditions for the measurement, i.e. the nearest periodic copies of the atoms
are considered.
Note
The measurements in the following snippet are nonsensical, as the associated box of the used structure is a placeholder from the PDB entry. However, the code works also for actual trajectories.
# PBC-aware distance between first and second CA atoms
distances = struc.distance(
trajectory[:, 0], trajectory[:, 1], box=trajectory.box
)
# PBC-aware radial distribution function of each CA atom to all CA atoms
bins, rdf = struc.rdf(
center=trajectory, atoms=trajectory, interval=(0, 10), periodic=True
)
As PBC-aware measurements are computationally more expensive, it can be
reasonable to remove the segmentation of a structure over the periodic
boundary via remove_pbc() to be able to use the faster non-PBC
measurements.
trajectory = struc.remove_pbc(trajectory)
# The same distance calculation but PBC-awareness is not necessary anymore
distances = struc.distance(trajectory[:, 0], trajectory[:, 1])