A subpackage for handling molecular structures.
In this context an atom is described by two kinds of attributes: the
coordinates and the annotations. The annotations include information
about polypeptide chain ID, residue ID, residue name, hetero atom
information, atom name and optionally more. The coordinates are a
NumPy float ndarray of length 3, containing the x, y and z coordinates.
An Atom contains data for a single atom. It stores the
annotations as scalar values and the coordinates as a length 3 ndarray.
An AtomArray stores data for an entire structure model
containing n atoms.
Therefore the annotations are represented as ndarray objects of
length n, the so-called annotation arrays.
The coordinates are an (n x 3) ndarray.
An AtomArrayStack stores data for m models, where each model
contains the same atoms at different positions.
Hence, the annotation arrays are represented as ndarray objects
of length n like in the AtomArray, while the coordinates are an
(m x n x 3) ndarray.
Just as an AtomArray can be iterated to obtain Atom
objects, iterating an AtomArrayStack yields AtomArray objects.
All three types must not be subclassed.
The following annotation categories are mandatory:
Category | Type | Examples | Description |
chain_id | string (U4) | 'A', 'S', 'AB', … | Polypeptide chain |
res_id | int | 1, 2, 3, … | Sequence position of residue |
ins_code | string (U1) | '', 'A', 'B', … | PDB insertion code (iCode) |
res_name | string (U5) | 'GLY', 'ALA', … | Residue name |
hetero | bool | True, False | False for ATOM, true for HETATM |
atom_name | string (U6) | 'CA', 'N', … | Atom name |
element | string (U2) | 'C', 'O', 'SE', … | Chemical element |
For all Atom, AtomArray and AtomArrayStack
objects these annotations are initially set with default values.
In addition to these annotations, an arbitrary number of annotation
categories can be added via add_annotation().
The annotation arrays can be accessed either via the method
get_annotation() or directly (e.g. array.res_id).
The following annotation categories are optionally used by some functions:
Category | Type | Examples | Description |
atom_id | int | 1, 2, 3, … | Atom serial number |
b_factor | float | 0.9, 12.3, … | Temperature factor |
occupancy | float | .1, .3, .9, … | Occupancy |
charge | int | -2, -1, 0, 1, 2, … | Electric charge of the atom |
sym_id | string | '1', '2', '3', … | Symmetry ID for assemblies/symmetry mates |
For each type, the attributes can be accessed directly.
Both AtomArray and AtomArrayStack support NumPy style indexing.
The index is propagated to each attribute.
If a single integer is used as index,
an object with one dimension less is returned
(AtomArrayStack -> AtomArray, AtomArray -> Atom).
If a slice, index array or a boolean mask is given, a substructure is
returned
(AtomArrayStack -> AtomArrayStack, AtomArray -> AtomArray).
As in NumPy, these are not necessarily deep copies of the originals:
The attributes of the sliced object may still point to the original
ndarray.
Use the copy() method if a deep copy is required.
Bond information can be associated with an AtomArray
by setting the bonds attribute with a BondList.
A BondList specifies the indices of atoms that form chemical
bonds.
Some functionalities require that the input structure has an associated
BondList.
If no BondList is associated, the bonds attribute is None.
Based on the implementation in NumPy arrays, this package furthermore contains a comprehensive set of functions for structure analysis, manipulation and visualization.
The universal length unit in this package is Å.
Structure types#
A representation of a single atom. |
An array representation of a model consisting of multiple atoms. |
A collection of multiple |
Concatenate multiple |
Create an |
Repeat atoms ( |
Create an |
Boxes and unit cells#
Calculate the three vectors spanning a box from the unit cell lengths and angles. |
Get the unit cell lengths and angles from box vectors. |
Get the volume of one or multiple boxes. |
Repeat the atoms in a box by duplicating and placing them in adjacent boxes. |
Similar to |
Move all coordinates into the given box, with the box vectors originating at (0,0,0). |
Remove segmentation caused by periodic boundary conditions from each molecule in the given structure. |
Remove segmentation caused by periodic boundary conditions from given coordinates. |
Transform coordinates to fractions of box vectors. |
Transform fractions of box vectors to coordinates. |
Check, whether a box or multiple boxes is/are orthogonal. |
A bond list stores indices of atoms (usually of an |
This enum type represents the type of a chemical bond. |
Create a |
Create a |
Get indices to all atoms that are directly or indirectly connected to the root atom indicated by the given index. |
Find all rotatable bonds in a given |
Measure the displacement vector, i.e. the vector difference, from one array of atom coordinates to another array of coordinates. |
Measure the displacement, i.e. the vector difference, between pairs of atoms. |
Measure the Euclidean distance between atoms. |
Measure the Euclidean distance between pairs of atoms. |
Measure the angle between 3 atoms. |
Measure the angle between triples of atoms. |
Measure the dihedral angle between 4 atoms. |
Measure the dihedral angle between quadruples of atoms. |
Measure the centroid of a structure. |
Calculate the center(s) of mass of an atom array or stack. |
Compute the radius/radii of gyration of an atom array or stack. |
Compute the radial distribution function g(r) (RDF) for one or multiple given central positions based on a given system of particles. |
Translate the given atoms or coordinates by a given vector. |
Rotate the given atoms or coordinates about the x, y and z axes by given angles. |
Rotate the given atoms or coordinates about the x, y and z axes by given angles. |
Rotate the given atoms or coordinates about a given axis by a given angle. |
Apply a transformation to atoms or coordinates, that would transfer an origin vector to a target vector. |
Translate and rotate the atoms to be centered at the origin with the principal axes aligned to the Cartesian axes, as specified by the order parameter. |
Superimpose structures onto each other, minimizing the RMSD between them. |
Superimpose one protein or nucleotide chain onto another one, considering sequence differences and conformational outliers. |
Superimpose structures onto a fixed structure, ignoring conformational outliers. |
An affine transformation, consisting of translations and a rotation. |
Filter all atoms of one array that belong to canonical nucleotides. |
Filter all atoms of one array that belong to nucleotides. |
Filter all atoms of one array that belong to canonical amino acid residues. |
Filter all atoms of one array that belong to amino acid residues. |
Filter all atoms of one array that belong to carbohydrates. |
Filter all peptide backbone atoms of one array. |
Filter all phosphate backbone atoms of one array. |
Filter for atoms such that their bond length with the next atom lies within the provided boundaries. |
Filter for atoms that are a part of a consecutive standard macromolecular polymer entity. |
Filter all atoms of one array that are part of the solvent. |
Filter all atoms of an atom array, that are monoatomic ions (e.g. sodium or chloride ions). |
Filter all atoms of one array that exist also in another array. |
Filter all atoms, that have the first altloc ID appearing in a residue. |
For each residue, filter all atoms, that have the altloc ID with the highest occupancy for this residue. |
Check if the atom IDs are incremented by more than 1 or decremented, from one atom to the next one. |
Check if the residue IDs are incremented by more than 1 or decremented, from one atom to the next one. |
Check if the (peptide or phosphate) backbone atoms have non-reasonable distance to the next atom. |
Check if a structure contains duplicate atoms, i.e. two atoms in a structure have the same annotations (coordinates may be different). |
Check linear (consecutive) bond continuity of atoms in atom array. |
Create an array of continuous residue IDs for a given structure. |
Infer the elements of atoms based on their atom name. |
Create atom names for a single residue based on elements. |
Residue level utility#
Get indices for an atom array, each indicating the beginning of a residue. |
Get the residue IDs and names of an atom array (stack). |
Apply a function to intervals of data, where each interval corresponds to one residue. |
Expand residue-wise data to atom-wise data. |
Get boolean masks indicating the residues to which the given atom indices belong. |
For each given atom index, get the index that points to the start of the residue that atom belongs to. |
For each given atom index, obtain the position of the residue corresponding to this index in the input array. |
Get the amount of residues in an atom array (stack). |
Iterate over all residues in an atom array (stack). |
Chain level utility#
Get the indices in an atom array, which indicates the beginning of a new chain. |
Apply a function to intervals of data, where each interval corresponds to one chain. |
Expand chain-wise data to atom-wise data. |
Get boolean masks indicating the chains to which the given atom indices belong. |
For each given atom index, get the index that points to the start of the chain that atom belongs to. |
For each given atom index, obtain the position of the chain corresponding to this index in the input array. |
Iterate over all chains in an atom array (stack). |
Get the chain IDs of an atom array (stack). |
Get the amount of chains in an atom array (stack). |
Iterate over all chains in an atom array (stack). |
Molecule level utility#
Get an index array for each molecule in the given structure. |
Get a boolean mask for each molecule in the given structure. |
Iterate over each molecule in an input structure. |
Structure comparison#
General analysis#
Calculate the Solvent Accessible Surface Area (SASA) of a protein. |
Find hydrogen bonds in a structure using the Baker-Hubbard algorithm. |
Get the relative frequency of each hydrogen bond in a multi-model structure. |
Compute the partial charge of the individual atoms comprised in a given |
Compute the density of the selected atoms. |
Measure the characteristic backbone dihedral angles of a chain. |
Calculate the secondary structure elements (SSEs) of a peptide chain based on the P-SEA algorithm. |
Nucleic acids#
This enum type represents the interacting edge for a given base. |
This enum type represents the relative glycosidic bond orientation for a given base pair. |
Map a nucleotide to one of the 5 common bases Adenine, Guanine, Thymine, Cytosine, and Uracil. |
Use DSSR criteria to find the base pairs in an |
Find pi-stacking interactions between aromatic rings in nucleic acids. |
Identify the pseudoknot order for each base pair in a given set of base pairs. |
Get the interacting edges for given base pairs in an |
Calculate the glycosidic bond orientation for given base pairs in an |
Represent a nucleic acid strand in dot-bracket-letter-notation (DBL-notation). |
Represent a nucleic-acid-strand in dot-bracket-letter-notation (DBL-notation). |
Extract the base pairs from a nucleic-acid-strand in dot-bracket-letter-notation (DBL-notation). |
Indicates that a structure is not suitable for a certain operation. |
This class enables the efficient search of atoms in vicinity of a defined location. |
Indicates that a structure is not complete. |
Indicates that a structure was not expected. |
Get the atom coordinates of the given array. |
Convert each chain in a structure into a sequence. |
A subpackage for obtaining all kinds of chemical information about atoms and residues, including masses, radii, bonds, etc. |
A subpackage for reading and writing structure related data. |
The MOL format is used to depict atom positions and bonds for small molecules. |
This subpackage is used for reading and writing trajectories in the compressed Gromacs XTC format. |
This subpackage is used for reading and writing trajectories in the AMBER NetCDF format. |
This subpackage is used for reading and writing trajectories in the DCD format used by software like CHARMM, OpenMM and NAMD. |
This subpackage is used for reading and writing an |
This subpackage is used for reading and writing an |
This subpackage is used for reading and writing trajectories in the uncompressed Gromacs TRR format. |
This subpackage is used for reading and writing an |
This subpackage provides support for the modern PDBx file formats. |
A subpackage for converting structures to structural alphabet sequences. |
A subpackage for visualizing structure related objects. |