biotite.structure#
A subpackage for handling molecular structures.
In this context an atom is described by two kinds of attributes: the
coordinates and the annotations. The annotations include information
about the polypeptide chain ID, residue ID, residue name, hetero atom
information, atom name and optionally more. The coordinates are a
NumPy float ndarray of length 3, containing the x, y and z coordinates.
An Atom contains data for a single atom: it stores the
annotations as scalar values and the coordinates as a length 3 ndarray.
An AtomArray stores data for an entire structure model containing n atoms.
Therefore the annotations are represented as ndarray objects of
length n, the so-called annotation arrays.
The coordinates are an (n x 3) ndarray.
An AtomArrayStack stores data for m models, where each model
contains the same atoms at different positions.
Hence, the annotation arrays are represented as ndarray objects
of length n, as in the AtomArray, while the coordinates are an
(m x n x 3) ndarray.
Just as an AtomArray can be iterated to obtain Atom objects,
an AtomArrayStack yields AtomArray objects.
All three types must not be subclassed.
The following annotation categories are mandatory:
Category | Type | Examples | Description |
---|---|---|---|
chain_id | string (U4) | 'A', 'S', 'AB', … | Polypeptide chain |
res_id | int | 1, 2, 3, … | Sequence position of residue |
ins_code | string (U1) | '', 'A', 'B', … | PDB insertion code (iCode) |
res_name | string (U5) | 'GLY', 'ALA', … | Residue name |
hetero | bool | True, False | False for |
atom_name | string (U6) | 'CA', 'N', … | Atom name |
element | string (U2) | 'C', 'O', 'SE', … | Chemical Element |
For all Atom, AtomArray and AtomArrayStack objects these annotations
are initially set with default values.
In addition to these annotations, an arbitrary number of annotation
categories can be added via add_annotation() or set_annotation().
The annotation arrays can be accessed either via the method
get_annotation() or directly (e.g. array.res_id).
The following annotation categories are optionally used by some functions:
Category | Type | Examples | Description |
---|---|---|---|
atom_id | int | 1, 2, 3, … | Atom serial number |
b_factor | float | 0.9, 12.3, … | Temperature factor |
occupancy | float | .1, .3, .9, … | Occupancy |
charge | int | -2, -1, 0, 1, 2, … | Electric charge of the atom |
sym_id | string | '1', '2', '3', … | Symmetry ID for assemblies/symmetry mates |
For each type, the attributes can be accessed directly.
Both AtomArray and AtomArrayStack support NumPy style indexing.
The index is propagated to each attribute.
If a single integer is used as index,
an object with one dimension less is returned
(AtomArrayStack -> AtomArray, AtomArray -> Atom).
If a slice, index array or a boolean mask is given, a substructure is
returned (AtomArrayStack -> AtomArrayStack, AtomArray -> AtomArray).
As in NumPy, these are not necessarily deep copies of the originals:
The attributes of the sliced object may still point to the original
ndarray.
Use the copy() method if a deep copy is required.
Bond information can be associated with an AtomArray or
AtomArrayStack by setting the bonds attribute with a BondList.
A BondList specifies the indices of atoms that form chemical bonds.
Some functionalities require that the input structure has an associated
BondList.
If no BondList is associated, the bonds attribute is None.
Building on this NumPy-based representation, the package also contains a comprehensive set of functions for structure analysis, manipulation and visualization.
The universal length unit in this package is Å.
Structure types#
A representation of a single atom. |
|
An array representation of a model consisting of multiple atoms. |
|
A collection of multiple |
|
Concatenate multiple |
|
Create an |
|
Repeat atoms ( |
|
Create an |
Boxes and unit cells#
Calculate the three vectors spanning a box from the unit cell lengths and angles. |
|
Get the unit cell lengths and angles from box vectors. |
|
Get the volume of one or multiple boxes. |
|
Repeat the atoms in a box by duplicating and placing them in adjacent boxes. |
|
Similar to |
|
Move all coordinates into the given box, with the box vectors originating at (0,0,0). |
|
Remove segmentation caused by periodic boundary conditions from each molecule in the given structure. |
|
Remove segmentation caused by periodic boundary conditions from given coordinates. |
|
Transform coordinates to fractions of box vectors. |
|
Transform fractions of box vectors to coordinates. |
|
Check whether a box or multiple boxes is/are orthogonal. |
Bonds#
A bond list stores indices of atoms (usually of an |
|
This enum type represents the type of a chemical bond. |
|
Create a |
|
Create a |
|
Get indices to all atoms that are directly or indirectly connected to the root atom indicated by the given index. |
|
Find all rotatable bonds in a given |
Geometry#
Measure the displacement vector, i.e. the vector difference, from one array of atom coordinates to another array of coordinates. |
|
Measure the displacement, i.e. the vector difference, between pairs of atoms. |
|
Measure the Euclidean distance between atoms. |
|
Measure the Euclidean distance between pairs of atoms. |
|
Measure the angle between 3 atoms. |
|
Measure the angle between triples of atoms. |
|
Measure the dihedral angle between 4 atoms. |
|
Measure the dihedral angle between quadruples of atoms. |
|
Measure the centroid of a structure. |
|
Calculate the center(s) of mass of an atom array or stack. |
|
Compute the radius/radii of gyration of an atom array or stack. |
|
Compute the radial distribution function g(r) (RDF) for one or multiple given central positions based on a given system of particles. |
Transformations#
Translate the given atoms or coordinates by a given vector. |
|
Rotate the given atoms or coordinates about the x, y and z axes by given angles. |
|
Rotate the given atoms or coordinates about the x, y and z axes by given angles. |
|
Rotate the given atoms or coordinates about a given axis by a given angle. |
|
Apply a transformation to atoms or coordinates that would transfer an origin vector to a target vector. |
|
Translate and rotate the atoms to be centered at the origin with the principal axes aligned to the Cartesian axes, as specified by the order parameter. |
Superimpositions#
Superimpose structures onto each other, minimizing the RMSD between them. |
|
Superimpose one protein or nucleotide chain onto another one, considering sequence differences and conformational outliers. |
|
Superimpose structures onto a fixed structure, ignoring conformational outliers. |
|
An affine transformation, consisting of translations and a rotation. |
Filters#
Filter all atoms of one array that belong to canonical nucleotides. |
|
Filter all atoms of one array that belong to nucleotides. |
|
Filter all atoms of one array that belong to canonical amino acid residues. |
|
Filter all atoms of one array that belong to amino acid residues. |
|
Filter all atoms of one array that belong to carbohydrates. |
|
Filter all peptide backbone atoms of one array. |
|
Filter all phosphate backbone atoms of one array. |
|
Filter for atoms such that their bond length with the next atom lies within the provided boundaries. |
|
Filter for atoms that are a part of a consecutive standard macromolecular polymer entity. |
|
Filter all atoms of one array that are part of the solvent. |
|
Filter all atoms of an atom array that are monoatomic ions (e.g. sodium or chloride ions). |
|
Filter all atoms of one array that exist also in another array. |
|
Filter all atoms that have the first altloc ID appearing in a residue. |
|
For each residue, filter all atoms that have the altloc ID with the highest occupancy for this residue. |
Checks#
Check if the atom IDs are incremented by more than 1 or decremented, from one atom to the next one. |
|
Check if the residue IDs are incremented by more than 1 or decremented, from one atom to the next one. |
|
Check if the (peptide or phosphate) backbone atoms have non-reasonable distance to the next atom. |
|
Check if a structure contains duplicate atoms, i.e. two atoms in a structure have the same annotations (coordinates may be different). |
|
Check linear (consecutive) bond continuity of atoms in atom array. |
Repair#
Create an array of continuous residue IDs for a given structure. |
|
Infer the elements of atoms based on their atom name. |
|
Create atom names for a single residue based on elements. |
Residue level utility#
Get indices for an atom array, each indicating the beginning of a residue. |
|
Get the residue IDs and names of an atom array (stack). |
|
Apply a function to intervals of data, where each interval corresponds to one residue. |
|
Expand residue-wise data to atom-wise data. |
|
Get boolean masks indicating the residues to which the given atom indices belong. |
|
For each given atom index, get the index that points to the start of the residue that atom belongs to. |
|
For each given atom index, obtain the position of the residue corresponding to this index in the input array. |
|
Get the number of residues in an atom array (stack). |
|
Iterate over all residues in an atom array (stack). |
Chain level utility#
Get the indices in an atom array that indicate the beginning of a new chain. |
|
Apply a function to intervals of data, where each interval corresponds to one chain. |
|
Expand chain-wise data to atom-wise data. |
|
Get boolean masks indicating the chains to which the given atom indices belong. |
|
For each given atom index, get the index that points to the start of the chain that atom belongs to. |
|
For each given atom index, obtain the position of the chain corresponding to this index in the input array. |
|
Iterate over all chains in an atom array (stack). |
|
Get the chain IDs of an atom array (stack). |
|
Get the number of chains in an atom array (stack). |
|
Iterate over all chains in an atom array (stack). |
Molecule level utility#
Get an index array for each molecule in the given structure. |
|
Get a boolean mask for each molecule in the given structure. |
|
Iterate over each molecule in an input structure. |
Structure comparison#
General analysis#
Calculate the Solvent Accessible Surface Area (SASA) of a protein. |
|
Find hydrogen bonds in a structure using the Baker-Hubbard algorithm. |
|
Get the relative frequency of each hydrogen bond in a multi-model structure. |
|
Compute the partial charge of the individual atoms comprised in a given |
|
Compute the density of the selected atoms. |
Proteins#
Measure the characteristic backbone dihedral angles of a chain. |
|
Calculate the secondary structure elements (SSEs) of a peptide chain based on the P-SEA algorithm. |
Nucleic acids#
This enum type represents the interacting edge for a given base. |
|
This enum type represents the relative glycosidic bond orientation for a given base pair. |
|
Map a nucleotide to one of the 5 common bases Adenine, Guanine, Thymine, Cytosine, and Uracil. |
|
Use DSSR criteria to find the base pairs in an |
|
Find pi-stacking interactions between aromatic rings in nucleic acids. |
|
Identify the pseudoknot order for each base pair in a given set of base pairs. |
|
Get the interacting edges for given base pairs in an |
|
Calculate the glycosidic bond orientation for given base pairs in an |
|
Represent a nucleic acid strand in dot-bracket-letter-notation (DBL-notation). |
|
Represent a nucleic-acid-strand in dot-bracket-letter-notation (DBL-notation). |
|
Extract the base pairs from a nucleic-acid-strand in dot-bracket-letter-notation (DBL-notation). |
Miscellaneous#
Indicates that a structure is not suitable for a certain operation. |
|
This class enables the efficient search of atoms in the vicinity of a defined location. |
|
Indicates that a structure is not complete. |
|
Indicates that a structure was not expected. |
|
Get the atom coordinates of the given array. |
|
Convert each chain in a structure into a sequence. |
Subpackages#
A subpackage for obtaining all kinds of chemical information about atoms and residues, including masses, radii, bonds, etc. |
|
A subpackage for reading and writing structure related data. |
|
The MOL format is used to depict atom positions and bonds for small molecules. |
|
This subpackage is used for reading and writing trajectories in the compressed Gromacs XTC format. |
|
This subpackage is used for reading and writing trajectories in the AMBER NetCDF format. |
|
This subpackage is used for reading and writing trajectories in the DCD format used by software like CHARMM, OpenMM and NAMD. |
|
This subpackage is used for reading and writing an |
|
This subpackage is used for reading and writing an |
|
This subpackage is used for reading and writing trajectories in the uncompressed Gromacs TRR format. |
|
This subpackage is used for reading and writing an |
|
This subpackage provides support for the modern PDBx file formats. |
|
A subpackage for converting structures to structural alphabet sequences. |
|
A subpackage for visualizing structure related objects. |