`biotite.structure`

A subpackage for handling molecular structures.

In this context, an atom is described by two kinds of attributes: the coordinates and the annotations. The annotations include information about the polypeptide chain ID, residue ID, residue name, hetero atom information, atom name and optionally more. The coordinates are a NumPy float ndarray of length 3, containing the x, y and z coordinates.

An Atom contains data for a single atom: it stores the annotations as scalar values and the coordinates as an ndarray of length 3.

An AtomArray stores data for an entire structure model containing n atoms. Therefore, the annotations are represented as ndarray objects of length n, the so-called annotation arrays. The coordinates are an (n x 3) ndarray.

An AtomArrayStack stores data for m models, where each model contains the same atoms at different positions. Hence, the annotation arrays are ndarray objects of length n, as in an AtomArray, while the coordinates are an (m x n x 3) ndarray.

Just as iterating over an AtomArray yields Atom objects, iterating over an AtomArrayStack yields AtomArray objects. None of the three types may be subclassed.

The following annotation categories are mandatory:

Category	Type	Examples	Description
chain_id	string (U4)	'A', 'S', 'AB', …	Polypeptide chain
res_id	int	1, 2, 3, …	Sequence position of residue
ins_code	string (U1)	'', 'A', 'B', …	PDB insertion code (iCode)
res_name	string (U5)	'GLY', 'ALA', …	Residue name
hetero	bool	True, False	False for `ATOM`, True for `HETATM`
atom_name	string (U6)	'CA', 'N', …	Atom name
element	string (U2)	'C', 'O', 'SE', …	Chemical element

For all Atom, AtomArray and AtomArrayStack objects, these annotations are initially set to default values. In addition to these annotations, an arbitrary number of annotation categories can be added via add_annotation() or set_annotation(). The annotation arrays can be accessed either via the get_annotation() method or directly as attributes (e.g. array.res_id).

The following annotation categories are optionally used by some functions:

Category	Type	Examples	Description
atom_id	int	1,2,3, …	Atom serial number
b_factor	float	0.9, 12.3, …	Temperature factor
occupancy	float	.1, .3, .9, …	Occupancy
charge	int	-2,-1,0,1,2, …	Electric charge of the atom
sym_id	string	1,2,3, …	Symmetry ID for assemblies/symmetry mates

For each type, the attributes can be accessed directly. Both AtomArray and AtomArrayStack support NumPy-style indexing, and the index is propagated to each attribute. If a single integer is used as index, an object with one dimension less is returned (AtomArrayStack -> AtomArray, AtomArray -> Atom). If a slice, index array or boolean mask is given, a substructure is returned (AtomArrayStack -> AtomArrayStack, AtomArray -> AtomArray). As in NumPy, these are not necessarily deep copies of the original: the attributes of the indexed object may still point to the original ndarray. Use the copy() method if a deep copy is required.

Bond information can be associated with an AtomArray or AtomArrayStack by setting its bonds attribute to a BondList. A BondList specifies the indices of atoms that form chemical bonds. Some functions require that the input structure has an associated BondList. If no BondList is associated, the bonds attribute is None.

Building on this NumPy-based representation, the package also contains a comprehensive set of functions for structure analysis, manipulation and visualization.

The universal length unit in this package is Å.

Structure types

`Atom`	A representation of a single atom.
`AtomArray`	An array representation of a model consisting of multiple atoms.
`AtomArrayStack`	A collection of multiple `AtomArray` instances, where each atom array has equal annotation arrays.
`concatenate`	Concatenate multiple `AtomArray` or `AtomArrayStack` objects into a single `AtomArray` or `AtomArrayStack`, respectively.
`array`	Create an `AtomArray` from a list of `Atom`.
`stack`	Create an `AtomArrayStack` from a list of `AtomArray`.
`repeat`	Repeat atoms (`AtomArray` or `AtomArrayStack`) multiple times in the same model with different coordinates.
`from_template`	Create an `AtomArrayStack` using template atoms and given coordinates.

Boxes and unit cells

`space_group_transforms`	Get the coordinate transformations for a given space group.
`vectors_from_unitcell`	Calculate the three vectors spanning a box from the unit cell lengths and angles.
`unitcell_from_vectors`	Get the unit cell lengths and angles from box vectors.
`box_volume`	Get the volume of one or multiple boxes.
`repeat_box`	Repeat the atoms in a box by duplicating and placing them in adjacent boxes.
`repeat_box_coord`	Similar to `repeat_box()`, repeat the coordinates in a box by duplicating and placing them in adjacent boxes.
`move_inside_box`	Move all coordinates into the given box, with the box vectors originating at (0,0,0).
`remove_pbc`	Remove segmentation caused by periodic boundary conditions from each molecule in the given structure.
`remove_pbc_from_coord`	Remove segmentation caused by periodic boundary conditions from given coordinates.
`coord_to_fraction`	Transform coordinates to fractions of box vectors.
`fraction_to_coord`	Transform fractions of box vectors to coordinates.
`is_orthogonal`	Check whether one or multiple boxes are orthogonal.

Bonds

`BondList`	A bond list stores indices of atoms (usually of an `AtomArray` or `AtomArrayStack`) that form chemical bonds together with the type (or order) of the bond.
`BondType`	This enum type represents the type of a chemical bond.
`connect_via_residue_names`	Create a `BondList` for a given atom array (stack), based on the deposited bonds for each residue in the RCSB `components.cif` dataset.
`connect_via_distances`	Create a `BondList` for a given atom array, based on pairwise atom distances.
`find_connected`	Get the indices of all atoms that are directly or indirectly connected to the root atom indicated by the given index.
`find_rotatable_bonds`	Find all rotatable bonds in a given `BondList`.

Geometry

`displacement`	Measure the displacement vector, i.e. the vector difference, from one array of atom coordinates to another array of coordinates.
`index_displacement`	Measure the displacement, i.e. the vector difference, between pairs of atoms.
`distance`	Measure the Euclidean distance between atoms.
`index_distance`	Measure the Euclidean distance between pairs of atoms.
`angle`	Measure the angle between 3 atoms.
`index_angle`	Measure the angle between triples of atoms.
`dihedral`	Measure the dihedral angle between 4 atoms.
`index_dihedral`	Measure the dihedral angle between quadruples of atoms.
`centroid`	Measure the centroid of a structure.
`mass_center`	Calculate the center(s) of mass of an atom array or stack.
`gyration_radius`	Compute the radius/radii of gyration of an atom array or stack.
`rdf`	Compute the radial distribution function g(r) (RDF) for one or multiple given central positions based on a given system of particles.

Transformations

`AffineTransformation`	An affine transformation, consisting of translations and a rotation.
`translate`	Translate the given atoms or coordinates by a given vector.
`rotate`	Rotate the given atoms or coordinates about the x, y and z axes by given angles.
`rotate_centered`	Rotate the given atoms or coordinates about the x, y and z axes by given angles, with the rotation centered at the centroid of the structure.
`rotate_about_axis`	Rotate the given atoms or coordinates about a given axis by a given angle.
`align_vectors`	Apply a transformation to atoms or coordinates that would transfer an origin vector to a target vector.
`orient_principal_components`	Translate and rotate the atoms to be centered at the origin with the principal axes aligned to the Cartesian axes, as specified by the order parameter.

Superimpositions

`superimpose`	Superimpose structures onto each other, minimizing the RMSD between them.
`superimpose_without_outliers`	Superimpose structures onto a fixed structure, ignoring conformational outliers.
`superimpose_homologs`	Superimpose a protein or nucleotide structure onto another one, considering sequence differences and conformational outliers.
`superimpose_structural_homologs`	Superimpose two remotely homologous protein structures.

Filters

`filter_canonical_nucleotides`	Filter all atoms of one array that belong to canonical nucleotides.
`filter_nucleotides`	Filter all atoms of one array that belong to nucleotides.
`filter_canonical_amino_acids`	Filter all atoms of one array that belong to canonical amino acid residues.
`filter_amino_acids`	Filter all atoms of one array that belong to amino acid residues.
`filter_carbohydrates`	Filter all atoms of one array that belong to carbohydrates.
`filter_peptide_backbone`	Filter all peptide backbone atoms of one array.
`filter_phosphate_backbone`	Filter all phosphate backbone atoms of one array.
`filter_linear_bond_continuity`	Filter for atoms such that their bond length with the next atom lies within the provided boundaries.
`filter_polymer`	Filter for atoms that are a part of a consecutive standard macromolecular polymer entity.
`filter_solvent`	Filter all atoms of one array that are part of the solvent.
`filter_monoatomic_ions`	Filter all atoms of an atom array that are monoatomic ions (e.g. sodium or chloride ions).
`filter_heavy`	Filter all non-hydrogen atoms of an atom array.
`filter_intersection`	Filter all atoms of one array that also exist in another array.
`filter_first_altloc`	Filter all atoms that have the first altloc ID appearing in a residue.
`filter_highest_occupancy_altloc`	For each residue, filter all atoms that have the altloc ID with the highest occupancy for this residue.

Checks

`check_atom_id_continuity`	Check if the atom IDs are incremented by more than 1 or decremented from one atom to the next one.
`check_res_id_continuity`	Check if the residue IDs are incremented by more than 1 or decremented from one atom to the next one.
`check_backbone_continuity`	Check if the (peptide or phosphate) backbone atoms have an unreasonable distance to the next atom.
`check_duplicate_atoms`	Check if a structure contains duplicate atoms, i.e. two atoms in a structure have the same annotations (coordinates may be different).
`check_linear_continuity`	Check linear (consecutive) bond continuity of atoms in atom array.

Repair

`create_continuous_res_ids`	Create an array of continuous residue IDs for a given structure.
`infer_elements`	Infer the elements of atoms based on their atom name.
`create_atom_names`	Create atom names for a single residue based on elements.

Residue level utility

`get_residue_starts`	Get indices for an atom array, each indicating the beginning of a residue.
`get_residues`	Get the residue IDs and names of an atom array (stack).
`apply_residue_wise`	Apply a function to intervals of data, where each interval corresponds to one residue.
`spread_residue_wise`	Expand residue-wise data to atom-wise data.
`get_residue_masks`	Get boolean masks indicating the residues to which the given atom indices belong.
`get_residue_starts_for`	For each given atom index, get the index that points to the start of the residue that atom belongs to.
`get_residue_positions`	For each given atom index, obtain the position of the residue corresponding to this index in the input array.
`get_all_residue_positions`	For each atom, obtain the position of the residue corresponding to this atom in the input array.
`get_residue_count`	Get the number of residues in an atom array (stack).
`residue_iter`	Iterate over all residues in an atom array (stack).
`get_atom_name_indices`	For each residue, get the index of the atom with the given atom name.

Chain level utility

`get_chain_starts`	Get the indices in an atom array that indicate the beginning of a new chain.
`apply_chain_wise`	Apply a function to intervals of data, where each interval corresponds to one chain.
`spread_chain_wise`	Expand chain-wise data to atom-wise data.
`get_chain_masks`	Get boolean masks indicating the chains to which the given atom indices belong.
`get_chain_starts_for`	For each given atom index, get the index that points to the start of the chain that atom belongs to.
`get_chain_positions`	For each given atom index, obtain the position of the chain corresponding to this index in the input array.
`get_all_chain_positions`	For each atom, obtain the position of the chain corresponding to this atom in the input array.
`get_chains`	Get the chain IDs of an atom array (stack).
`get_chain_count`	Get the number of chains in an atom array (stack).
`chain_iter`	Iterate over all chains in an atom array (stack).

Molecule level utility

`get_molecule_indices`	Get an index array for each molecule in the given structure.
`get_molecule_masks`	Get a boolean mask for each molecule in the given structure.
`molecule_iter`	Iterate over each molecule in an input structure.

Structure comparison

`average`	Calculate an average structure.
`rmsd`	Calculate the RMSD between two structures.
`rmspd`	Calculate the RMSD of atom pair distances for given structures relative to those found in a reference structure.
`rmsf`	Calculate the root-mean-square fluctuation (RMSF) of each atom in a multi-model structure relative to a reference structure.
`lddt`	Calculate the local Distance Difference Test (lDDT) score of a structure with respect to its reference.
`tm_score`	Compute the TM-score for the given protein structures.

General analysis

`sasa`	Calculate the Solvent Accessible Surface Area (SASA) of a protein.
`hbond`	Find hydrogen bonds in a structure using the Baker-Hubbard algorithm.
`hbond_frequency`	Get the relative frequency of each hydrogen bond in a multi-model structure.
`partial_charges`	Compute the partial charges of the individual atoms in a given `AtomArray` based on their electronegativity.
`density`	Compute the density of the selected atoms.

Proteins

`dihedral_backbone`	Measure the characteristic backbone dihedral angles of a chain.
`dihedral_side_chain`	Measure the side chain χ dihedral angles of amino acid residues.
`annotate_sse`	Calculate the secondary structure elements (SSEs) of a peptide chain based on the P-SEA algorithm.

Nucleic acids

`Edge`	This enum type represents the interacting edge for a given base.
`GlycosidicBond`	This enum type represents the relative glycosidic bond orientation for a given base pair.
`map_nucleotide`	Map a nucleotide to one of the five common bases: adenine, guanine, thymine, cytosine and uracil.
`base_pairs`	Use DSSR criteria to find the base pairs in an `AtomArray`.
`base_stacking`	Find pi-stacking interactions between aromatic rings in nucleic acids.
`pseudoknots`	Identify the pseudoknot order for each base pair in a given set of base pairs.
`base_pairs_edge`	Get the interacting edges for given base pairs in an `AtomArray` according to the Leontis-Westhof nomenclature.
`base_pairs_glycosidic_bond`	Calculate the glycosidic bond orientation for given base pairs in an `AtomArray` according to the Leontis-Westhof nomenclature.
`dot_bracket`	Represent a nucleic acid strand in dot-bracket-letter notation (DBL notation), based on its base pairs.
`dot_bracket_from_structure`	Represent a nucleic acid strand, given as a structure, in dot-bracket-letter notation (DBL notation).
`base_pairs_from_dot_bracket`	Extract the base pairs from a nucleic acid strand in dot-bracket-letter notation (DBL notation).

Aromatic rings

`find_aromatic_rings`	Find (anti-)aromatic rings in a structure.
`find_stacking_interactions`	Find pi-stacking interactions between aromatic rings.
`find_pi_cation_interactions`	Find pi-cation interactions between aromatic rings and cations.
`PiStacking`	The type of pi-stacking interaction.

Miscellaneous

`BadStructureError`	Indicates that a structure is not suitable for a certain operation.
`CellList`	This class enables the efficient search of atoms in vicinity of a defined location.
`IncompleteStructureWarning`	Indicates that a structure is not complete.
`UnexpectedStructureWarning`	Indicates that a structure was not expected.
`coord`	Get the atom coordinates of the given array.
`set_print_limits`	Set the maximum number of models and atoms to print in the `str()` and `repr()` representations.
`to_sequence`	Convert each chain in a structure into a sequence.

Subpackages

`biotite.structure.alphabet`	A subpackage for converting structures to structural alphabet sequences.
`biotite.structure.graphics`	A subpackage for visualizing structure related objects.
`biotite.structure.io`	A subpackage for reading and writing structure related data.
`biotite.structure.io.gro`	This subpackage is used for reading and writing an `AtomArray` or `AtomArrayStack` using the GRO format used by the GROMACS software package.
`biotite.structure.io.pdbx`	This subpackage provides support for the modern PDBx file formats.
`biotite.structure.io.pdbqt`	This subpackage is used for reading and writing an `AtomArray` or `AtomArrayStack` using the PDBQT format used by the AutoDock software series.
`biotite.structure.io.xtc`	This subpackage is used for reading and writing trajectories in the compressed GROMACS XTC format.
`biotite.structure.io.dcd`	This subpackage is used for reading and writing trajectories in the DCD format used by software like CHARMM, OpenMM and NAMD.
`biotite.structure.io.mol`	The MOL format is used to depict atom positions and bonds for small molecules.
`biotite.structure.io.pdb`	This subpackage is used for reading and writing an `AtomArray` or `AtomArrayStack` using the popular PDB format.
`biotite.structure.io.trr`	This subpackage is used for reading and writing trajectories in the uncompressed GROMACS TRR format.
`biotite.structure.io.netcdf`	This subpackage is used for reading and writing trajectories in the AMBER NetCDF format.
`biotite.structure.info`	A subpackage for obtaining all kinds of chemical information about atoms and residues, including masses, radii, bonds, etc.
