biotite.structure

A subpackage for handling molecular structures.

In this context, an atom is described by two kinds of attributes: the coordinates and the annotations. The annotations include the polypeptide chain ID, residue ID, residue name, hetero atom information, atom name and optionally more. The coordinates are a NumPy float ndarray of length 3, containing the x, y and z coordinates.

An Atom contains data for a single atom: it stores the annotations as scalar values and the coordinates as a length 3 ndarray. An AtomArray stores data for an entire structure model containing n atoms. Therefore the annotations are represented as ndarray objects of length n, the so-called annotation arrays, and the coordinates are an (n x 3) ndarray. An AtomArrayStack stores data for m models. Each AtomArray in the AtomArrayStack has the same annotation arrays, since each atom must be represented in all models in the stack, but each model may differ in atom coordinates. Therefore the annotation arrays are represented as ndarray objects of length n, while the coordinates are an (m x n x 3) ndarray. These types must not be subclassed.

The following annotation categories are mandatory:

Category   Type         Examples             Description
chain_id   string (U3)  'A', 'S', 'AB', …    Polypeptide chain
res_id     int          1, 2, 3, …           Sequence position of residue
res_name   string (U3)  'GLY', 'ALA', …      Residue name
hetero     bool         True, False          True for non AA/NUC residues
atom_name  string (U6)  'CA', 'N', …         Atom name
element    string (U2)  'C', 'O', 'SE', …    Chemical element

For all Atom, AtomArray and AtomArrayStack objects these annotations must be set; otherwise some functions will not work or errors will occur. In addition to these annotations, an arbitrary number of annotation categories can be added (use add_annotation() or set_annotation() for this). The annotation arrays can be accessed either via the method get_annotation() or directly (e.g. array.res_id).

The following annotation categories are optionally used by some functions:

Category   Type   Examples             Description
atom_id    int    1, 2, 3, …           Atom serial number
b_factor   float  0.9, 12.3, …         Temperature factor
occupancy  float  .1, .3, .9, …        Occupancy
charge     int    -2, -1, 0, 1, 2, …   Electric charge of the atom

For each type, the attributes can be accessed directly. Both AtomArray and AtomArrayStack support NumPy-style indexing; the index is propagated to each attribute. If a single integer is used as index, an object with one dimension less is returned (AtomArrayStack -> AtomArray, AtomArray -> Atom). Do not expect a deep copy when slicing an AtomArray or AtomArrayStack: the attributes of the sliced object may still point to the original ndarray.

AtomArray and AtomArrayStack instances can optionally have an associated BondList object, which specifies the indices of atoms that form chemical bonds.

Based on the implementation in NumPy arrays, this package furthermore contains functions for structure analysis, manipulation and visualization.

The universal length unit in this package is Å.

Structure types

Atom A representation of a single atom.
AtomArray An array representation of a model consisting of multiple atoms.
AtomArrayStack A collection of multiple AtomArray instances, where each atom array has equal annotation arrays.
array Create an AtomArray from a list of Atom.
stack Create an AtomArrayStack from a list of AtomArray.

Boxes and unit cells

vectors_from_unitcell Calculate the three vectors spanning a box from the unit cell lengths and angles.
unitcell_from_vectors Get the unit cell lengths and angles from box vectors.
box_volume Get the volume of one or multiple boxes.
repeat_box Repeat the atoms in a box by duplicating and placing them in adjacent boxes.
repeat_box_coord Similar to repeat_box(), repeat the coordinates in a box by duplicating and placing them in adjacent boxes.
move_inside_box Move all coordinates into the given box, with the box vectors originating at (0,0,0).
remove_pbc Remove segmentation caused by periodic boundary conditions from a given structure.
remove_pbc_from_coord Remove segmentation caused by periodic boundary conditions from given coordinates.
coord_to_fraction Transform coordinates to fractions of box vectors.
fraction_to_coord Transform fractions of box vectors to coordinates.
is_orthogonal Check whether one or multiple boxes are orthogonal.

Bonds

BondList A bond list stores indices of atoms (usually of an AtomArray or AtomArrayStack) that form chemical bonds together with the type (or order) of the bond.
BondType This enum type represents the type of a chemical bond.

Geometry

displacement Measure the displacement vector.
index_displacement Measure the displacement.
distance Measure the Euclidean distance between atoms.
index_distance Measure the Euclidean distance between pairs of atoms.
angle Measure the angle between 3 atoms.
index_angle Measure the angle between triples of atoms.
dihedral Measure the dihedral angle between 4 atoms.
index_dihedral Measure the dihedral angle between quadruples of atoms.
dihedral_backbone Measure the characteristic backbone dihedral angles of a structure.
centroid Measure the centroid of a structure.
mass_center Calculate the center(s) of mass of an atom array or stack.
gyration_radius Compute the radius/radii of gyration of an atom array or stack.
rdf Compute the radial distribution function g(r) (RDF) for one or multiple given central positions based on a given system of particles.

Manipulation

translate Translate a list of atoms by a given vector.
rotate Rotates a list of atoms by given angles.
rotate_centered Rotates a list of atoms by given angles.
superimpose Superimpose structures onto a fixed structure.
superimpose_apply Superimpose structures using a given transformation tuple.

Filters

filter_amino_acids Filter all atoms of one array that belong to amino acid residues.
filter_backbone Filter all peptide backbone atoms of one array.
filter_solvent Filter all atoms of one array that are part of the solvent.
filter_monoatomic_ions Filter all atoms of an atom array that are monoatomic ions.
filter_intersection Filter all atoms of one array that exist also in another array.
filter_inscode_and_altloc Filter all atoms having the desired altloc or inscode.

Checks

check_bond_continuity Check if the peptide backbone atoms ("N", "CA", "C") have an unreasonable distance to the next atom.
check_id_continuity Check if the residue IDs are incremented by more than 1 or decremented, from one atom to the next one.
check_duplicate_atoms Check if a structure contains duplicate atoms.

Residue level utility

get_residue_starts Get the indices in an atom array that indicate the beginning of a residue.
get_residues Get the residue IDs and names of an atom array (stack).
apply_residue_wise Apply a function to intervals of data, where each interval corresponds to one residue.
spread_residue_wise Creates an ndarray with residue-wise spread values from an input ndarray.
get_residue_count Get the number of residues in an atom array (stack).
residue_iter Iterate over all residues in an atom array (stack).

Structure comparison

average Calculate an average structure.
rmsd Calculate the RMSD between two structures.
rmsf Calculate the RMSF between two structures.

Advanced analysis

sasa Calculate the Solvent Accessible Surface Area (SASA) of a protein.
annotate_sse Calculate the secondary structure elements (SSE) of a peptide chain based on the P-SEA algorithm.
hbond Find hydrogen bonds in a structure using the Baker-Hubbard algorithm.
hbond_frequency Get the relative frequency of each hydrogen bond in a multi-model structure.

Miscellaneous

BadStructureError Indicates that a structure is not suitable for a certain operation.
CellList This class enables the efficient search of atoms in vicinity of a defined location.
coord Get the atom coordinates of the given array.

Subpackages

biotite.structure.io A subpackage for reading and writing structure related data.
biotite.structure.io.pdb This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the popular PDB format.
biotite.structure.io.gro This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the Gro format used by the Gromacs software package.
biotite.structure.io.tng This subpackage is used for reading and writing trajectories in the compressed Gromacs TNG format.
biotite.structure.io.trr This subpackage is used for reading and writing trajectories in the uncompressed Gromacs TRR format.
biotite.structure.io.pdbx This subpackage provides support for the modern PDBx/mmCIF file format.
biotite.structure.io.netcdf This subpackage is used for reading and writing trajectories in the AMBER NetCDF format.
biotite.structure.io.dcd This subpackage is used for reading and writing trajectories in the DCD format used by software like CHARMM, OpenMM and NAMD.
biotite.structure.io.mmtf This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the binary MMTF format.
biotite.structure.io.xtc This subpackage is used for reading and writing trajectories in the compressed Gromacs XTC format.
biotite.structure.io.npz This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the internal NPZ file format.
biotite.structure.info A subpackage for obtaining all kinds of chemical information about atoms and residues, including masses, radii, bonds, etc.