base_pairs

biotite.structure.base_pairs(atom_array, min_atoms_per_base=3, unique=True)

Use DSSR criteria to find the base pairs in an AtomArray.

The algorithm identifies canonical and non-canonical base pairs between the five common bases adenine, guanine, thymine, cytosine, and uracil bound to deoxyribose or ribose. Each base is mapped with map_nucleotide() to one of these five bases in the standard reference frame described in [1].

The DSSR criteria are as follows [2]:

  1. Distance between base origins <=15 Å

  2. Vertical separation between the base planes <=2.5 Å

  3. Angle between the base normal vectors <=65°

  4. Absence of stacking between the two bases

  5. Presence of at least one hydrogen bond involving a base atom
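The two purely geometric criteria that need only origins and normals (1 and 3) can be sketched with NumPy as follows. This is a minimal illustration, not the Biotite implementation: the helper name `passes_distance_and_angle` and its arguments are hypothetical, and the sketch assumes that the sign of a base normal is arbitrary, so angles are folded into the range 0-90°.

```python
import numpy as np

MAX_ORIGIN_DISTANCE = 15.0  # Å, criterion 1
MAX_NORMAL_ANGLE = 65.0     # degrees, criterion 3


def passes_distance_and_angle(origin1, normal1, origin2, normal2):
    """Hypothetical helper: check criteria 1 and 3 for two bases."""
    origin1 = np.asarray(origin1, dtype=float)
    origin2 = np.asarray(origin2, dtype=float)
    normal1 = np.asarray(normal1, dtype=float)
    normal2 = np.asarray(normal2, dtype=float)

    # Criterion 1: distance between the base origins
    if np.linalg.norm(origin2 - origin1) > MAX_ORIGIN_DISTANCE:
        return False

    # Criterion 3: angle between the base normal vectors.
    # Assumption: the sign of a normal is arbitrary, so the absolute
    # value of the cosine is used (angles fold into 0-90 degrees).
    cos_angle = abs(np.dot(normal1, normal2)) / (
        np.linalg.norm(normal1) * np.linalg.norm(normal2)
    )
    angle = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))
    return bool(angle <= MAX_NORMAL_ANGLE)
```

A candidate pair that passes this check would still have to satisfy the vertical separation, stacking, and hydrogen bond criteria.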

Parameters:
atom_array : AtomArray

The AtomArray to find base pairs in.

min_atoms_per_base : integer, optional (default: 3)

The minimum number of atoms a nucleotide’s base must have to be considered a candidate for a base pair.

unique : bool, optional (default: True)

If True, each base is assumed to be only paired with one other base. If multiple pairings are plausible, the pairing with the most hydrogen bonds is selected.
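One way to picture the unique option is a greedy selection over candidate pairs, ranked by their number of hydrogen bonds. The sketch below is an assumption about how such a selection could work, not the Biotite implementation; `select_unique_pairs`, `candidates`, and `hbond_counts` are hypothetical names.

```python
def select_unique_pairs(candidates, hbond_counts):
    """Hypothetical helper: keep at most one pairing per base.

    candidates   -- list of (base_i, base_j) index tuples
    hbond_counts -- number of hydrogen bonds for each candidate
    """
    # Visit candidates from most to fewest hydrogen bonds
    order = sorted(
        range(len(candidates)),
        key=lambda i: hbond_counts[i],
        reverse=True,
    )
    used = set()
    selected = []
    for i in order:
        base_i, base_j = candidates[i]
        # Skip a pairing if either base is already paired
        if base_i in used or base_j in used:
            continue
        used.update((base_i, base_j))
        selected.append((base_i, base_j))
    return sorted(selected)
```

For example, if base 0 could pair with base 5 (three hydrogen bonds) or base 6 (one hydrogen bond), only the pairing with base 5 is kept.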

Returns:
basepairs : ndarray, dtype=int, shape=(n,2)

Each row represents one base pair and contains the index of the first atom of each of the two paired residues.

Notes

The bases from the standard reference frame described in [1] were modified so that only the base atoms are included. Sugar atoms (specifically C1’) are disregarded because nucleosides such as PSU do not possess the usual N-glycosidic linkage, so including them would lead to inaccurate results.

The vertical separation is implemented as the scalar projection of the distance vectors between the base origins according to [3] onto the averaged base normal vectors.
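The scalar projection can be sketched as below. This is an illustration under stated assumptions rather than the Biotite code: `vertical_separation` is a hypothetical helper, "averaged base normal vectors" is interpreted as the normalized mean of the two unit normals, and one normal is flipped when the two point in opposite directions so that they do not cancel.

```python
import numpy as np


def vertical_separation(origin1, normal1, origin2, normal2):
    """Hypothetical helper: project the origin distance vector
    onto the averaged base normal."""
    origin1 = np.asarray(origin1, dtype=float)
    origin2 = np.asarray(origin2, dtype=float)
    n1 = np.asarray(normal1, dtype=float)
    n2 = np.asarray(normal2, dtype=float)
    n1 = n1 / np.linalg.norm(n1)
    n2 = n2 / np.linalg.norm(n2)

    # Assumption: normals may point in opposite directions,
    # so align them before averaging
    if np.dot(n1, n2) < 0:
        n2 = -n2
    mean_normal = n1 + n2
    mean_normal = mean_normal / np.linalg.norm(mean_normal)

    # Scalar projection of the distance vector onto the mean normal
    return float(abs(np.dot(origin2 - origin1, mean_normal)))
```

A pair would satisfy the vertical separation criterion if the returned value does not exceed 2.5 Å.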

The presence of base stacking is assumed if the following criteria are met [4]:

  1. Distance between aromatic ring centers <=4.5 Å

  2. Angle between the ring normal vectors <=23°

  3. Angle between normalized distance vector between two ring centers and both bases’ normal vectors <=40°

Please note that ring normal vectors are assumed to be equal to the base normal vectors.
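The three stacking criteria above can be combined into a single check, sketched here with NumPy. The helper names `_angle_deg` and `is_stacked` are hypothetical, and the sketch assumes that the direction of a normal vector is arbitrary, so every angle is folded into the range 0-90°.

```python
import numpy as np


def _angle_deg(v1, v2):
    """Angle between two vectors in degrees, ignoring their sign."""
    cos_angle = abs(np.dot(v1, v2)) / (
        np.linalg.norm(v1) * np.linalg.norm(v2)
    )
    return np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))


def is_stacked(center1, normal1, center2, normal2):
    """Hypothetical helper: apply the three stacking criteria."""
    center1 = np.asarray(center1, dtype=float)
    center2 = np.asarray(center2, dtype=float)
    normal1 = np.asarray(normal1, dtype=float)
    normal2 = np.asarray(normal2, dtype=float)

    dist_vec = center2 - center1
    distance = np.linalg.norm(dist_vec)
    # Criterion 1: ring centers at most 4.5 Å apart
    # (coincident centers are treated as not stacked)
    if distance == 0 or distance > 4.5:
        return False
    # Criterion 2: ring normals at most 23 degrees apart
    if _angle_deg(normal1, normal2) > 23.0:
        return False
    # Criterion 3: distance vector at most 40 degrees
    # from both normals
    return bool(
        _angle_deg(dist_vec, normal1) <= 40.0
        and _angle_deg(dist_vec, normal2) <= 40.0
    )
```

Two parallel rings directly on top of each other at 3.4 Å pass all three checks, whereas two coplanar rings side by side fail the third.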

For structures without hydrogens, the accuracy of the algorithm is limited, as hydrogen bonds can only be checked for plausibility. A hydrogen bond is considered plausible if a pair of N/O atoms lies within a cutoff of 3.6 Å. This value was chosen because hydrogen bonds are typically 1.5-2.5 Å in length, and N-H and O-H bonds have lengths of 1.00 Å and 0.96 Å, respectively, so the donor and acceptor heavy atoms are at most about 3.5 Å apart. Thus, including some buffer, a 3.6 Å cutoff should cover all hydrogen bonds.
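The plausibility check for structures without hydrogens can be sketched as a distance test between the N/O atoms of two bases. This is a minimal illustration, not the Biotite implementation; `has_plausible_hbond` and its arguments are hypothetical, and element symbols are assumed to be upper-case strings.

```python
import numpy as np


def has_plausible_hbond(coord1, elements1, coord2, elements2,
                        cutoff=3.6):
    """Hypothetical helper: True if any N/O atom of base 1 lies
    within the cutoff (in Å) of any N/O atom of base 2."""
    coord1 = np.asarray(coord1, dtype=float).reshape(-1, 3)
    coord2 = np.asarray(coord2, dtype=float).reshape(-1, 3)

    # Only nitrogen and oxygen atoms are candidate donors/acceptors
    polar1 = coord1[np.isin(np.asarray(elements1), ["N", "O"])]
    polar2 = coord2[np.isin(np.asarray(elements2), ["N", "O"])]
    if len(polar1) == 0 or len(polar2) == 0:
        return False

    # Pairwise distances between all N/O atoms of the two bases
    distances = np.linalg.norm(
        polar1[:, np.newaxis, :] - polar2[np.newaxis, :, :], axis=-1
    )
    return bool((distances <= cutoff).any())
```

Because no hydrogen positions are involved, this test cannot distinguish donors from acceptors; it only reports whether a hydrogen bond is geometrically possible.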

References

Examples

Compute the base pairs for the structure with the PDB ID 1QXB:

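>>> from os.path import join
>>> from biotite.structure import base_pairs
>>> from biotite.structure.io import load_structure
>>> # path_to_structures is assumed to be the directory with the example files
>>> dna_helix = load_structure(
...     join(path_to_structures, "base_pairs", "1qxb.cif")
... )
>>> basepairs = base_pairs(dna_helix)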
>>> print(dna_helix[basepairs].res_name)
[['DC' 'DG']
 ['DG' 'DC']
 ['DC' 'DG']
 ['DG' 'DC']
 ['DA' 'DT']
 ['DA' 'DT']
 ['DT' 'DA']
 ['DT' 'DA']
 ['DC' 'DG']
 ['DG' 'DC']
 ['DC' 'DG']
 ['DG' 'DC']]