Interface to RDKit#
RDKit is a popular cheminformatics package
and thus can be used to supplement Biotite with a variety of functionalities focused
on small molecules, such as conversion from/to textual representations
(e.g. SMILES and InChI) and visualization as structural formulas.
Basically, the biotite.interface.rdkit subpackage provides only two functions:
to_mol() to obtain a rdkit.Chem.rdchem.Mol from an AtomArray
and from_mol() for the reverse direction.
The rest happens within the realm of RDKit.
This tutorial will only give a small glance on how the interface can be used.
For comprehensive documentation refer to the
RDKit documentation.
First example: Depiction as structural formula#
RDKit allows rendering structural formulas using pillow. For a proper structural formula, we need to compute proper 2D coordinates first.
import biotite.interface.rdkit as rdkit_interface
import biotite.structure.info as struc
import rdkit.Chem.AllChem as Chem
from rdkit.Chem.Draw import MolToImage
penicillin = struc.residue("PNN")
mol = rdkit_interface.to_mol(penicillin)
# We do not want to include explicit hydrogen atoms in the structural formula
mol = Chem.RemoveHs(mol)
Chem.Compute2DCoords(mol)
image = MolToImage(mol, size=(600, 400))
display(image)
Second example: Creating a molecule from SMILES#
Although the Chemical Component Dictionary accessible from
biotite.structure.info already provides all compounds found in the PDB,
there are a myriad of compounds out there that are not part of it.
One way to to obtain them as AtomArray is passing a SMILES string to
RDKit to obtain the topology of the molecule and then computing the coordinates.
ERTAPENEM_SMILES = "C[C@@H]1[C@@H]2[C@H](C(=O)N2C(=C1S[C@H]3C[C@H](NC3)C(=O)NC4=CC=CC(=C4)C(=O)O)C(=O)O)[C@@H](C)O"
mol = Chem.MolFromSmiles(ERTAPENEM_SMILES)
# RDKit uses implicit hydrogen atoms by default, but Biotite requires explicit ones
mol = Chem.AddHs(mol)
# Create a 3D conformer
conformer_id = Chem.EmbedMolecule(mol)
Chem.UFFOptimizeMolecule(mol)
ertapenem = rdkit_interface.from_mol(mol, conformer_id)
print(ertapenem)
0 C -3.565 1.846 1.730
0 C -3.308 1.007 0.473
0 C -4.556 0.264 -0.110
0 C -5.818 0.140 0.608
0 C -5.508 -1.318 0.782
0 O -6.084 -2.297 1.325
0 N -4.192 -1.100 0.139
0 C -2.871 -1.312 0.559
0 C -2.314 -0.118 0.756
0 S -0.625 0.174 1.307
0 C 0.039 0.453 -0.378
0 C 1.450 0.984 -0.335
0 C 2.042 0.529 -1.666
0 N 1.300 -0.664 -2.100
0 C 0.171 -0.830 -1.182
0 C 3.514 0.236 -1.540
0 O 3.954 -0.916 -1.807
0 N 4.415 1.289 -1.161
0 C 5.781 1.081 -0.761
0 C 6.729 2.081 -1.020
0 C 8.057 1.916 -0.621
0 C 8.449 0.758 0.053
0 C 7.511 -0.248 0.346
0 C 6.175 -0.074 -0.058
0 C 7.917 -1.469 1.077
0 O 9.110 -1.621 1.456
0 O 6.984 -2.464 1.354
0 C -2.310 -2.644 0.840
0 O -2.972 -3.671 0.534
0 O -1.074 -2.786 1.462
0 C -7.092 0.439 -0.230
0 C -7.353 -0.527 -1.402
0 O -8.219 0.480 0.610
0 H -3.846 1.200 2.589
0 H -2.644 2.403 2.002
0 H -4.370 2.586 1.536
0 H -2.889 1.672 -0.313
0 H -4.652 0.440 -1.204
0 H -5.850 0.650 1.591
0 H -0.599 1.178 -0.932
0 H 2.003 0.529 0.519
0 H 1.464 2.091 -0.238
0 H 1.910 1.346 -2.413
0 H 0.924 -0.492 -3.063
0 H -0.770 -1.047 -1.735
0 H 0.394 -1.687 -0.511
0 H 4.068 2.273 -1.231
0 H 6.439 2.986 -1.539
0 H 8.783 2.689 -0.834
0 H 9.485 0.653 0.352
0 H 5.437 -0.822 0.200
0 H 7.249 -3.302 1.857
0 H -0.695 -3.704 1.665
0 H -6.971 1.456 -0.661
0 H -6.495 -0.553 -2.104
0 H -8.247 -0.189 -1.966
0 H -7.547 -1.556 -1.031
0 H -8.317 -0.418 1.022