Representing bonds
Up to now we have only looked at atom arrays whose atoms are described merely
by their coordinates and annotations.
But there is more:
Chemical bonds can be described, too, using a BondList.
Consider the following case, where our AtomArray contains four atoms:
N, CA, C and CB. CA is a central atom that is connected to
N, C and CB.
A BondList is created by passing an ndarray containing pairs
of integers, where each integer represents an index in a corresponding
AtomArray.
The pairs indicate which atoms share a bond.
Additionally, it is required to specify the number of atoms in the
AtomArray.
import numpy as np
import biotite.structure as struc
# Four atoms; the coordinates are irrelevant for this example
array = struc.array([
    struc.Atom([0,0,0], atom_name="N"),
    struc.Atom([0,0,0], atom_name="CA"),
    struc.Atom([0,0,0], atom_name="C"),
    struc.Atom([0,0,0], atom_name="CB")
])
print("Atoms:", array.atom_name)
# Each pair holds the indices of two bonded atoms; CA has index 1
bond_list = struc.BondList(
    array.array_length(),
    np.array([[1,0], [1,2], [1,3]])
)
print("Bonds (indices and type):")
print(bond_list.as_array())
print("Bonds (atoms names):")
print(array.atom_name[bond_list.as_array()[:, :2]])
ca_bonds, ca_bond_types = bond_list.get_bonds(1)
print("Bonds of CA:", array.atom_name[ca_bonds])
Atoms: ['N' 'CA' 'C' 'CB']
Bonds (indices and type):
[[0 1 0]
[1 2 0]
[1 3 0]]
Bonds (atoms names):
[['N' 'CA']
['CA' 'C']
['CA' 'CB']]
Bonds of CA: ['N' 'C' 'CB']
When you look at the internal ndarray (as given by
BondList.as_array()), you see a third column containing zeros.
This column describes each bond with values from the BondType enum:
0 corresponds to BondType.ANY, which means that the type of the bond
is undefined.
This makes sense, since we did not define the bond types when we created the
BondList.
The other thing that has changed is the index order:
Each bond is sorted so that the lower atom index is the first element.
Although a BondList uses an ndarray under the hood, indexing
works a little bit differently:
The indexing operation is not applied to the internal ndarray. Instead,
it behaves as if the same indexing operation was applied to a corresponding
atom array:
The bond list adjusts its indices so that they still point to the same atoms as
before.
Bonds that involve at least one removed atom are deleted as well.
We will try that by deleting the C atom.
mask = (array.atom_name != "C")
sub_array = array[mask]
sub_bond_list = bond_list[mask]
print("Atoms:", sub_array.atom_name)
print("Bonds (indices and type):")
print(sub_bond_list.as_array())
print("Bonds (atoms names):")
print(sub_array.atom_name[sub_bond_list.as_array()[:, :2]])
Atoms: ['N' 'CA' 'CB']
Bonds (indices and type):
[[0 1 0]
[1 2 0]]
Bonds (atoms names):
[['N' 'CA']
['CA' 'CB']]
As you see, the bond involving the C atom is removed and the remaining
indices are shifted.
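The index adjustment can be reproduced with plain NumPy. The sketch below is only meant to illustrate the idea and is not Biotite's actual implementation; the function name filter_bonds is hypothetical.

```python
import numpy as np

def filter_bonds(bonds, mask):
    """Drop bonds that touch removed atoms and renumber the remaining ones."""
    mask = np.asarray(mask, dtype=bool)
    # New index of a kept atom = number of kept atoms in front of it
    new_index = np.cumsum(mask) - 1
    # A bond survives only if both of its atoms survive
    keep = mask[bonds[:, 0]] & mask[bonds[:, 1]]
    kept = bonds[keep].copy()
    # Translate the old atom indices into the new numbering
    kept[:, :2] = new_index[kept[:, :2]]
    return kept

# Same bonds as above: N-CA, CA-C, CA-CB (third column is the bond type)
bonds = np.array([[0, 1, 0], [1, 2, 0], [1, 3, 0]])
# Remove the C atom (index 2)
mask = np.array([True, True, False, True])
filtered = filter_bonds(bonds, mask)
print(filtered)
```

The CA-C bond disappears and CB moves from index 3 to index 2, which matches the behavior of the boolean mask index on the BondList shown above.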
Connecting atoms and bonds
We do not need to index the atom array and the bond list separately.
For the sake of convenience you can associate a BondList to an
AtomArray via the bonds attribute.
If no BondList is associated, bonds is None.
Every time the atom array is indexed, the index is also applied to the
associated bond list.
array.bonds = bond_list
sub_array = array[array.atom_name != "C"]
print("Bonds (atoms names):")
print(sub_array.atom_name[sub_array.bonds.as_array()[:, :2]])
Bonds (atoms names):
[['N' 'CA']
['CA' 'CB']]
Keep in mind that some functionalities in Biotite even require that the
input AtomArray or AtomArrayStack has an associated
BondList.
Reading and writing bonds
Up to now the bond information has been created manually, which is impractical in most cases. Instead bond information can be loaded from and saved to most file formats. We’ll try that on the structure of TC5b and look at the bond information of the third residue, a tyrosine.
from tempfile import gettempdir
import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx
file_path = rcsb.fetch("1l2y", "bcif", gettempdir())
pdbx_file = pdbx.BinaryCIFFile.read(file_path)
# Essential: set the 'include_bonds' parameter to True
stack = pdbx.get_structure(pdbx_file, include_bonds=True)
tyrosine = stack[:, (stack.res_id == 3)]
print("Bonds (indices and type):")
print(tyrosine.bonds.as_array())
print("Bonds (atoms names):")
print(tyrosine.atom_name[tyrosine.bonds.as_array()[:, :2]])
Bonds (indices and type):
[[ 0 1 1]
[ 0 12 1]
[ 1 2 1]
[ 1 4 1]
[ 1 13 1]
[ 2 3 2]
[ 4 5 1]
[ 4 14 1]
[ 4 15 1]
[ 5 6 6]
[ 5 7 5]
[ 6 8 5]
[ 6 16 1]
[ 7 9 6]
[ 7 17 1]
[ 8 10 6]
[ 8 18 1]
[ 9 10 5]
[ 9 19 1]
[10 11 1]
[11 20 1]]
Bonds (atoms names):
[['N' 'CA']
['N' 'H']
['CA' 'C']
['CA' 'CB']
['CA' 'HA']
['C' 'O']
['CB' 'CG']
['CB' 'HB2']
['CB' 'HB3']
['CG' 'CD1']
['CG' 'CD2']
['CD1' 'CE1']
['CD1' 'HD1']
['CD2' 'CE2']
['CD2' 'HD2']
['CE1' 'CZ']
['CE1' 'HE1']
['CE2' 'CZ']
['CE2' 'HE2']
['CZ' 'OH']
['OH' 'HH']]
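Not only the connected atoms but also the bond types are defined:
Here we have both BondType.SINGLE and BondType.DOUBLE bonds
(enum values 1 and 2, respectively).
The values 5 and 6 within the aromatic ring correspond to
BondType.AROMATIC_SINGLE and BondType.AROMATIC_DOUBLE.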
Bond information can also be automatically inferred from an AtomArray
or AtomArrayStack:
connect_via_residue_names() is able to connect atoms in all residues
that appear in the
Chemical Component Dictionary, comprising
every molecule that appears in any PDB entry.
stack = pdbx.get_structure(pdbx_file, include_bonds=False)
stack.bonds = struc.connect_via_residue_names(stack)
tyrosine = stack[:, (stack.res_id == 3)]
print("Bonds (indices):")
print(tyrosine.bonds.as_array())
print("Bonds (atoms names):")
print(tyrosine.atom_name[tyrosine.bonds.as_array()[:, :2]])
Bonds (indices):
[[ 0 1 1]
[ 0 12 1]
[ 1 2 1]
[ 1 4 1]
[ 1 13 1]
[ 2 3 2]
[ 4 5 1]
[ 4 14 1]
[ 4 15 1]
[ 5 6 6]
[ 5 7 5]
[ 6 8 5]
[ 6 16 1]
[ 7 9 6]
[ 7 17 1]
[ 8 10 6]
[ 8 18 1]
[ 9 10 5]
[ 9 19 1]
[10 11 1]
[11 20 1]]
Bonds (atoms names):
[['N' 'CA']
['N' 'H']
['CA' 'C']
['CA' 'CB']
['CA' 'HA']
['C' 'O']
['CB' 'CG']
['CB' 'HB2']
['CB' 'HB3']
['CG' 'CD1']
['CG' 'CD2']
['CD1' 'CE1']
['CD1' 'HD1']
['CD2' 'CE2']
['CD2' 'HD2']
['CE1' 'CZ']
['CE1' 'HE1']
['CE2' 'CZ']
['CE2' 'HE2']
['CZ' 'OH']
['OH' 'HH']]
Filtering and editing bonds
The recommended way to apply changes to a BondList (apart from adding/removing
single bonds) is to use the ndarray obtained via BondList.as_array()
as a transient representation and to create a new BondList from the modified
ndarray.
# Transiently convert the bond list to an array
bond_array = tyrosine.bonds.as_array()
# As an example, remove all single bonds
bond_array = bond_array[bond_array[:, 2] != struc.BondType.SINGLE]
# Create a new bond list from the modified array
tyrosine.bonds = struc.BondList(tyrosine.array_length(), bond_array)
print(tyrosine.atom_name[tyrosine.bonds.as_array()[:, :2]])
[['C' 'O']
['CG' 'CD1']
['CG' 'CD2']
['CD1' 'CE1']
['CD2' 'CE2']
['CE1' 'CZ']
['CE2' 'CZ']]