biotite.structure

A subpackage for handling molecular structures.

In this context, an atom is described by two kinds of attributes: the coordinates and the annotations. The annotations include the polypeptide chain ID, residue ID, residue name, hetero atom information, atom name and optionally more. The coordinates are a NumPy float ndarray of length 3, containing the x, y and z coordinates.

An Atom contains data for a single atom: it stores the annotations as scalar values and the coordinates as an ndarray of length 3.

An AtomArray stores data for an entire structure model containing n atoms. Therefore, the annotations are represented as ndarray objects of length n, the so-called annotation arrays. The coordinates are an (n x 3) ndarray.

An AtomArrayStack stores data for m models, where each model contains the same atoms at different positions. Hence, the annotation arrays are ndarray objects of length n, as in an AtomArray, while the coordinates are an (m x n x 3) ndarray.

Just as iterating over an AtomArray yields Atom objects, iterating over an AtomArrayStack yields AtomArray objects. None of the three types may be subclassed.

The following annotation categories are mandatory:

Category     Type           Examples              Description
chain_id     string (U4)    'A', 'S', 'AB', …     Polypeptide chain
res_id       int            1, 2, 3, …            Sequence position of residue
ins_code     string (U1)    '', 'A', 'B', …       PDB insertion code (iCode)
res_name     string (U5)    'GLY', 'ALA', …       Residue name
hetero       bool           True, False           False for ATOM, True for HETATM
atom_name    string (U6)    'CA', 'N', …          Atom name
element      string (U2)    'C', 'O', 'SE', …     Chemical element

For all Atom, AtomArray and AtomArrayStack objects, these annotations are initially set to default values. In addition to these annotations, an arbitrary number of annotation categories can be added via add_annotation() or set_annotation(). The annotation arrays can be accessed either via the method get_annotation() or directly (e.g. array.res_id).

The following annotation categories are optionally used by some functions:

Category     Type     Examples               Description
atom_id      int      1, 2, 3, …             Atom serial number
b_factor     float    0.9, 12.3, …           Temperature factor
occupancy    float    .1, .3, .9, …          Occupancy
charge       int      -2, -1, 0, 1, 2, …     Electric charge of the atom

For each type, the attributes can be accessed directly. Both AtomArray and AtomArrayStack support NumPy-style indexing, and the index is propagated to each attribute. If a single integer is used as index, an object with one dimension less is returned (AtomArrayStack -> AtomArray, AtomArray -> Atom). If a slice, index array or boolean mask is given, a substructure is returned (AtomArrayStack -> AtomArrayStack, AtomArray -> AtomArray). As in NumPy, these are not necessarily deep copies of the originals: the attributes of the sliced object may still point to the original ndarray. Use the copy() method if a deep copy is required.

Bond information can be associated with an AtomArray or AtomArrayStack by setting the bonds attribute to a BondList. A BondList specifies the indices of atoms that form chemical bonds. Some functions require that the input structure has an associated BondList. If no BondList is associated, the bonds attribute is None.

Building on this NumPy-based representation, the package also contains a comprehensive set of functions for structure analysis, manipulation and visualization.

The universal length unit in this package is Å.

Structure types

Atom

A representation of a single atom.

AtomArray

An array representation of a model consisting of multiple atoms.

AtomArrayStack

A collection of multiple AtomArray instances, where each atom array has equal annotation arrays.

array

Create an AtomArray from a list of Atom.

stack

Create an AtomArrayStack from a list of AtomArray.

repeat

Repeat atoms (AtomArray or AtomArrayStack) multiple times in the same model with different coordinates.

from_template

Create an AtomArrayStack using template atoms and given coordinates.

Boxes and unit cells

vectors_from_unitcell

Calculate the three vectors spanning a box from the unit cell lengths and angles.

unitcell_from_vectors

Get the unit cell lengths and angles from box vectors.

box_volume

Get the volume of one or multiple boxes.

repeat_box

Repeat the atoms in a box by duplicating and placing them in adjacent boxes.

repeat_box_coord

Similar to repeat_box(), repeat the coordinates in a box by duplicating and placing them in adjacent boxes.

move_inside_box

Move all coordinates into the given box, with the box vectors originating at (0,0,0).

remove_pbc

Remove segmentation caused by periodic boundary conditions from each molecule in the given structure.

remove_pbc_from_coord

Remove segmentation caused by periodic boundary conditions from given coordinates.

coord_to_fraction

Transform coordinates to fractions of box vectors.

fraction_to_coord

Transform fractions of box vectors to coordinates.

is_orthogonal

Check whether one or multiple boxes are orthogonal.

Bonds

BondList

A bond list stores indices of atoms (usually of an AtomArray or AtomArrayStack) that form chemical bonds together with the type (or order) of the bond.

BondType

This enum type represents the type of a chemical bond.

connect_via_residue_names

Create a BondList for a given atom array (stack), based on the deposited bonds for each residue in the RCSB components.cif dataset.

connect_via_distances

Create a BondList for a given atom array, based on pairwise atom distances.

find_connected

Get indices to all atoms that are directly or indirectly connected to the root atom indicated by the given index.

find_rotatable_bonds

Find all rotatable bonds in a given BondList.

Geometry

displacement

Measure the displacement vector, i.e. the vector difference, from one array of atom coordinates to another array of coordinates.

index_displacement

Measure the displacement, i.e. the vector difference, between pairs of atoms.

distance

Measure the Euclidean distance between atoms.

index_distance

Measure the Euclidean distance between pairs of atoms.

angle

Measure the angle between 3 atoms.

index_angle

Measure the angle between triples of atoms.

dihedral

Measure the dihedral angle between 4 atoms.

index_dihedral

Measure the dihedral angle between quadruples of atoms.

centroid

Measure the centroid of a structure.

mass_center

Calculate the center(s) of mass of an atom array or stack.

gyration_radius

Compute the radius/radii of gyration of an atom array or stack.

rdf

Compute the radial distribution function g(r) (RDF) for one or multiple given central positions based on a given system of particles.

Transformations

translate

Translate the given atoms or coordinates by a given vector.

rotate

Rotate the given atoms or coordinates about the x, y and z axes by given angles.

rotate_centered

Rotate the given atoms or coordinates about the x, y and z axes by given angles, with the rotation centered at the centroid of the coordinates instead of the coordinate origin.

rotate_about_axis

Rotate the given atoms or coordinates about a given axis by a given angle.

align_vectors

Apply a transformation to atoms or coordinates that would transfer an origin vector to a target vector.

orient_principal_components

Translate and rotate the atoms to be centered at the origin with the principal axes aligned to the Cartesian axes, as specified by the order parameter.

superimpose

Superimpose structures onto a fixed structure.

superimpose_apply

Superimpose structures using a given AffineTransformation.

Filters

filter_canonical_nucleotides

Filter all atoms of one array that belong to canonical nucleotides.

filter_nucleotides

Filter all atoms of one array that belong to nucleotides.

filter_canonical_amino_acids

Filter all atoms of one array that belong to canonical amino acid residues.

filter_amino_acids

Filter all atoms of one array that belong to amino acid residues.

filter_carbohydrates

Filter all atoms of one array that belong to carbohydrates.

filter_backbone

Filter all peptide backbone atoms of one array.

filter_peptide_backbone

Filter all peptide backbone atoms of one array.

filter_phosphate_backbone

Filter all phosphate backbone atoms of one array.

filter_linear_bond_continuity

Filter for atoms such that their bond length with the next atom lies within the provided boundaries.

filter_polymer

Filter for atoms that are a part of a consecutive standard macromolecular polymer entity.

filter_solvent

Filter all atoms of one array that are part of the solvent.

filter_monoatomic_ions

Filter all atoms of an atom array that are monoatomic ions.

filter_intersection

Filter all atoms of one array that exist also in another array.

filter_first_altloc

Filter all atoms that have the first altloc ID appearing in a residue.

filter_highest_occupancy_altloc

For each residue, filter all atoms that have the altloc ID with the highest occupancy for this residue.

Checks

check_id_continuity

Check if the residue IDs are incremented by more than 1 or decremented, from one atom to the next one.

check_atom_id_continuity

Check if the atom IDs are incremented by more than 1 or decremented, from one atom to the next one.

check_res_id_continuity

Check if the residue IDs are incremented by more than 1 or decremented, from one atom to the next one.

check_backbone_continuity

Check if the (peptide or phosphate) backbone atoms have an unreasonable distance to the next atom.

check_duplicate_atoms

Check if a structure contains duplicate atoms, i.e. two atoms in a structure have the same annotations (coordinates may be different).

check_bond_continuity

Check if the peptide or phosphate backbone atoms have an unreasonable distance to the next residue.

check_linear_continuity

Check linear (consecutive) bond continuity of atoms in an atom array.

Residue level utility

get_residue_starts

Get indices for an atom array, each indicating the beginning of a residue.

get_residues

Get the residue IDs and names of an atom array (stack).

apply_residue_wise

Apply a function to intervals of data, where each interval corresponds to one residue.

spread_residue_wise

Expand residue-wise data to atom-wise data.

get_residue_masks

Get boolean masks indicating the residues to which the given atom indices belong.

get_residue_starts_for

For each given atom index, get the index that points to the start of the residue that atom belongs to.

get_residue_positions

For each given atom index, obtain the position of the residue corresponding to this index in the input array.

get_residue_count

Get the number of residues in an atom array (stack).

residue_iter

Iterate over all residues in an atom array (stack).

Chain level utility

get_chain_starts

Get the indices in an atom array, each indicating the beginning of a new chain.

apply_chain_wise

Apply a function to intervals of data, where each interval corresponds to one chain.

spread_chain_wise

Expand chain-wise data to atom-wise data.

get_chain_masks

Get boolean masks indicating the chains to which the given atom indices belong.

get_chain_starts_for

For each given atom index, get the index that points to the start of the chain that atom belongs to.

get_chain_positions

For each given atom index, obtain the position of the chain corresponding to this index in the input array.

chain_iter

Iterate over all chains in an atom array (stack).

get_chains

Get the chain IDs of an atom array (stack).

get_chain_count

Get the number of chains in an atom array (stack).

Molecule level utility

get_molecule_indices

Get an index array for each molecule in the given structure.

get_molecule_masks

Get a boolean mask for each molecule in the given structure.

molecule_iter

Iterate over each molecule in an input structure.

Structure comparison

average

Calculate an average structure.

rmsd

Calculate the RMSD between two structures.

rmspd

Calculate the RMSD of atom pair distances for given structures relative to those found in a reference structure.

rmsf

Calculate the RMSF of each atom between a reference structure and the models of a subject structure.

General analysis

sasa

Calculate the Solvent Accessible Surface Area (SASA) of a protein.

hbond

Find hydrogen bonds in a structure using the Baker-Hubbard algorithm.

hbond_frequency

Get the relative frequency of each hydrogen bond in a multi-model structure.

partial_charges

Compute the partial charge of the individual atoms comprised in a given AtomArray depending on their electronegativity.

density

Compute the density of the selected atoms.

Proteins

dihedral_backbone

Measure the characteristic backbone dihedral angles of a protein structure.

annotate_sse

Calculate the secondary structure elements (SSEs) of a peptide chain based on the P-SEA algorithm.

Nucleic acids

Edge

This enum type represents the interacting edge for a given base.

GlycosidicBond

This enum type represents the relative glycosidic bond orientation for a given base pair.

map_nucleotide

Map a nucleotide to one of the 5 common bases Adenine, Guanine, Thymine, Cytosine, and Uracil.

base_pairs

Use DSSR criteria to find the base pairs in an AtomArray.

base_stacking

Find pi-stacking interactions between aromatic rings in nucleic acids.

pseudoknots

Identify the pseudoknot order for each base pair in a given set of base pairs.

base_pairs_edge

Get the interacting edges for given base pairs in an AtomArray according to the Leontis-Westhof nomenclature.

base_pairs_glycosidic_bond

Calculate the glycosidic bond orientation for given base pairs in an AtomArray according to the Leontis-Westhof nomenclature.

dot_bracket

Represent a nucleic acid strand in dot-bracket-letter-notation (DBL-notation).

dot_bracket_from_structure

Represent a nucleic acid strand of a given structure in dot-bracket-letter-notation (DBL-notation).

base_pairs_from_dot_bracket

Extract the base pairs from a nucleic acid strand in dot-bracket-letter-notation (DBL-notation).

Miscellaneous

AffineTransformation

An affine transformation, consisting of translations and a rotation.

BadStructureError

Indicates that a structure is not suitable for a certain operation.

CellList

This class enables the efficient search of atoms in vicinity of a defined location.

IncompleteStructureWarning

Indicates that a structure is not complete.

UnexpectedStructureWarning

Indicates that a structure was not expected.

coord

Get the atom coordinates of the given array.

renumber_atom_ids

Renumber the atom IDs of the given array.

renumber_res_ids

Renumber the residue IDs of the given array.

Subpackages

biotite.structure.io

A subpackage for reading and writing structure related data.

biotite.structure.io.netcdf

This subpackage is used for reading and writing trajectories in the AMBER NetCDF format.

biotite.structure.io.mol

The MOL format is used to depict atom positions and bonds for small molecules.

biotite.structure.io.dcd

This subpackage is used for reading and writing trajectories in the DCD format used by software like CHARMM, OpenMM and NAMD.

biotite.structure.io.trr

This subpackage is used for reading and writing trajectories in the uncompressed Gromacs TRR format.

biotite.structure.io.pdbx

This subpackage provides support for the modern PDBx file formats.

biotite.structure.io.pdbqt

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the PDBQT format used by the AutoDock software series.

biotite.structure.io.xtc

This subpackage is used for reading and writing trajectories in the compressed Gromacs XTC format.

biotite.structure.io.npz

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the internal NPZ file format.

biotite.structure.io.mmtf

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the binary MMTF format.

biotite.structure.io.gro

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the GRO format used by the GROMACS software package.

biotite.structure.io.pdb

This subpackage is used for reading and writing an AtomArray or AtomArrayStack using the popular PDB format.

biotite.structure.io.tng

This subpackage is used for reading and writing trajectories in the compressed Gromacs TNG format.

biotite.structure.info

A subpackage for obtaining all kinds of chemical information about atoms and residues, including masses, radii, bonds, etc.

biotite.structure.graphics

A subpackage for visualizing structure related objects.