biotite.structure

A subpackage for handling molecular structures.

In this context an atom is described by two kinds of attributes: the coordinates and the annotations. The annotations include information about the polypeptide chain ID, residue ID, residue name, hetero atom information, atom name and optionally more. The coordinates are a NumPy float ndarray of length 3, containing the x, y and z coordinates.

An Atom contains data for a single atom: it stores the annotations as scalar values and the coordinates as a length 3 ndarray. An AtomArray stores data for an entire structure model containing n atoms. Therefore the annotations are represented as ndarray objects of length n, the so-called annotation arrays, and the coordinates are an (n x 3) ndarray. An AtomArrayStack stores data for m models. Each AtomArray in the AtomArrayStack has the same annotation arrays, since each atom must be represented in all models in the stack, but each model may differ in atom coordinates. Therefore the annotation arrays are represented as ndarray objects of length n, while the coordinates are an (m x n x 3) ndarray. These types must not be subclassed.

The following annotation categories are mandatory:

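=========  ===========  =================  ============================
Category   Type         Examples           Description
=========  ===========  =================  ============================
chain_id   string (U3)  'A', 'S', 'AB', …  Polypeptide chain
res_id     int          1, 2, 3, …         Sequence position of residue
res_name   string (U3)  'GLY', 'ALA', …    Residue name
hetero     bool         True, False        True for non AA/NUC residues
atom_name  string (U6)  'CA', 'N', …       Atom name
element    string (U2)  'C', 'O', 'SE', …  Chemical element
=========  ===========  =================  ============================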

These annotations must be set for all Atom, AtomArray and AtomArrayStack objects; otherwise some functions will not work or errors will occur. In addition to these annotations, an arbitrary number of annotation categories can be added (use add_annotation() or set_annotation() for this). The annotation arrays can be accessed either via the method get_annotation() or directly (e.g. array.res_id).

The following annotation categories are optionally used by some functions:

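=========  =====  ==================  ===========================
Category   Type   Examples            Description
=========  =====  ==================  ===========================
atom_id    int    1, 2, 3, …          Atom serial number
b_factor   float  0.9, 12.3, …        Temperature factor
occupancy  float  .1, .3, .9, …       Occupancy
charge     int    -2, -1, 0, 1, 2, …  Electric charge of the atom
=========  =====  ==================  ===========================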

For each type, the attributes can be accessed directly. Both AtomArray and AtomArrayStack support NumPy style indexing; the index is propagated to each attribute. If a single integer is used as index, an object with one dimension less is returned (AtomArrayStack -> AtomArray, AtomArray -> Atom). Do not expect a deep copy when slicing an AtomArray or AtomArrayStack: the attributes of the sliced object may still point to the original ndarray.

AtomArray and AtomArrayStack instances can optionally have an associated BondList object that specifies the indices of atoms that form chemical bonds.

Building on this NumPy-based representation, the package also provides functions for structure analysis, manipulation and visualization.

The universal length unit in this package is Å.

Structure types

Atom

A representation of a single atom.

AtomArray

An array representation of a model consisting of multiple atoms.

AtomArrayStack

A collection of multiple AtomArray instances, where each atom array has equal annotation arrays.

array

Create an AtomArray from a list of Atom.

stack

Create an AtomArrayStack from a list of AtomArray.

Boxes and unit cells

vectors_from_unitcell

Calculate the three vectors spanning a box from the unit cell lengths and angles.

unitcell_from_vectors

Get the unit cell lengths and angles from box vectors.

box_volume

Get the volume of one or multiple boxes.

repeat_box

Repeat the atoms in a box by duplicating and placing them in adjacent boxes.

repeat_box_coord

Similar to repeat_box(), repeat the coordinates in a box by duplicating and placing them in adjacent boxes.

move_inside_box

Move all coordinates into the given box, with the box vectors originating at (0,0,0).

remove_pbc

Remove segmentation caused by periodic boundary conditions from a given structure.

remove_pbc_from_coord

Remove segmentation caused by periodic boundary conditions from given coordinates.

coord_to_fraction

Transform coordinates to fractions of box vectors.

fraction_to_coord

Transform fractions of box vectors to coordinates.

is_orthogonal

Check whether one or multiple boxes are orthogonal.

Bonds

BondList

A bond list stores indices of atoms (usually of an AtomArray or AtomArrayStack) that form chemical bonds together with the type (or order) of the bond.

BondType

This enum type represents the type of a chemical bond.

Geometry

displacement

Measure the displacement vector.

index_displacement

Measure the displacement.

distance

Measure the Euclidean distance between atoms.

index_distance

Measure the Euclidean distance between pairs of atoms.

angle

Measure the angle between 3 atoms.

index_angle

Measure the angle between triples of atoms.

dihedral

Measure the dihedral angle between 4 atoms.

index_dihedral

Measure the dihedral angle between quadruples of atoms.

dihedral_backbone

Measure the characteristic backbone dihedral angles of a structure.

centroid

Measure the centroid of a structure.

mass_center

Calculate the center(s) of mass of an atom array or stack.

gyration_radius

Compute the radius/radii of gyration of an atom array or stack.

rdf

Compute the radial distribution function g(r) (RDF) for one or multiple given central positions based on a given system of particles.

Manipulation

translate

Translate a list of atoms by a given vector.

rotate

Rotates a list of atoms by given angles.

rotate_centered

Rotates a list of atoms by given angles about the centroid of the structure.

superimpose

Superimpose structures onto a fixed structure.

superimpose_apply

Superimpose structures using a given transformation tuple.

Filters

filter_amino_acids

Filter all atoms of one array that belong to amino acid residues.

filter_backbone

Filter all peptide backbone atoms of one array.

filter_solvent

Filter all atoms of one array that are part of the solvent.

filter_monoatomic_ions

Filter all atoms of an atom array that are monoatomic ions.

filter_intersection

Filter all atoms of one array that exist also in another array.

filter_inscode_and_altloc

Filter all atoms having the desired altloc or inscode.

Checks

check_bond_continuity

Check if the peptide backbone atoms ("N", "CA", "C") have an unreasonable distance to the next atom.

check_id_continuity

Check if the residue IDs are incremented by more than 1 or decremented, from one atom to the next one.

check_duplicate_atoms

Check if a structure contains duplicate atoms.

Residue level utility

get_residue_starts

Get the indices in an atom array that indicate the beginning of each residue.

get_residues

Get the residue IDs and names of an atom array (stack).

apply_residue_wise

Apply a function to intervals of data, where each interval corresponds to one residue.

spread_residue_wise

Creates an ndarray with residue-wise spread values from an input ndarray.

get_residue_count

Get the number of residues in an atom array (stack).

residue_iter

Iterate over all residues in an atom array (stack).

Chain level utility

get_chain_starts

Get the indices in an atom array that indicate the beginning of each new chain.

get_chains

Get the chain IDs of an atom array (stack).

get_chain_count

Get the number of chains in an atom array (stack).

chain_iter

Iterate over all chains in an atom array (stack).

Structure comparison

average

Calculate an average structure.

rmsd

Calculate the RMSD between two structures.

rmsf

Calculate the RMSF between two structures.

Advanced analysis

sasa

Calculate the Solvent Accessible Surface Area (SASA) of a protein.

annotate_sse

Calculate the secondary structure elements (SSE) of a peptide chain based on the P-SEA algorithm.

hbond

Find hydrogen bonds in a structure using the Baker-Hubbard algorithm.

hbond_frequency

Get the relative frequency of each hydrogen bond in a multi-model structure.

Miscellaneous

BadStructureError

Indicates that a structure is not suitable for a certain operation.

CellList

This class enables the efficient search of atoms in vicinity of a defined location.

coord

Get the atom coordinates of the given array.

Subpackages

biotite.structure.io

A subpackage for reading and writing structure related data.

biotite.structure.io.pdb

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the popular PDB format.

biotite.structure.io.gro

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the GRO format used by the Gromacs software package.

biotite.structure.io.tng

This subpackage is used for reading and writing trajectories in the compressed Gromacs TNG format.

biotite.structure.io.trr

This subpackage is used for reading and writing trajectories in the uncompressed Gromacs TRR format.

biotite.structure.io.pdbx

This subpackage provides support for the modern PDBx/mmCIF file format.

biotite.structure.io.netcdf

This subpackage is used for reading and writing trajectories in the AMBER NetCDF format.

biotite.structure.io.dcd

This subpackage is used for reading and writing trajectories in the DCD format used by software like CHARMM, OpenMM and NAMD.

biotite.structure.io.mmtf

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the binary MMTF format.

biotite.structure.io.xtc

This subpackage is used for reading and writing trajectories in the compressed Gromacs XTC format.

biotite.structure.io.npz

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the internal NPZ file format.

biotite.structure.info

A subpackage for obtaining all kinds of chemical information about atoms and residues, including masses, radii, bonds, etc.