Interface to RDKit#

RDKit is a popular cheminformatics package and thus can be used to supplement Biotite with a variety of functionalities focused on small molecules, such as conversion from/to textual representations (e.g. SMILES and InChI) and visualization as structural formulas. Basically, the biotite.interface.rdkit subpackage provides only two functions: to_mol() to obtain a rdkit.Chem.rdchem.Mol from an AtomArray and from_mol() for the reverse direction. The rest happens within the realm of RDKit. This tutorial will only give a small glance on how the interface can be used. For comprehensive documentation refer to the RDKit documentation.

First example: Depiction as structural formula#

RDKit allows rendering structural formulas using pillow. For a proper structural formula, we need to compute proper 2D coordinates first.

import biotite.interface.rdkit as rdkit_interface
import biotite.structure.info as struc
import rdkit.Chem.AllChem as Chem
from rdkit.Chem.Draw import MolToImage

penicillin = struc.residue("PNN")
mol = rdkit_interface.to_mol(penicillin)
# We do not want to include explicit hydrogen atoms in the structural formula
mol = Chem.RemoveHs(mol)
Chem.Compute2DCoords(mol)
image = MolToImage(mol, size=(600, 400))
display(image)
../../_images/rdkit_2_0.png

Second example: Creating a molecule from SMILES#

Although the Chemical Component Dictionary accessible from biotite.structure.info already provides all compounds found in the PDB, there are a myriad of compounds out there that are not part of it. One way to to obtain them as AtomArray is passing a SMILES string to RDKit to obtain the topology of the molecule and then computing the coordinates.

ERTAPENEM_SMILES = "C[C@@H]1[C@@H]2[C@H](C(=O)N2C(=C1S[C@H]3C[C@H](NC3)C(=O)NC4=CC=CC(=C4)C(=O)O)C(=O)O)[C@@H](C)O"

mol = Chem.MolFromSmiles(ERTAPENEM_SMILES)
# RDKit uses implicit hydrogen atoms by default, but Biotite requires explicit ones
mol = Chem.AddHs(mol)
# Create a 3D conformer
conformer_id = Chem.EmbedMolecule(mol)
Chem.UFFOptimizeMolecule(mol)
ertapenem = rdkit_interface.from_mol(mol, conformer_id)
print(ertapenem)
            0             C        -3.565    1.846    1.730
            0             C        -3.308    1.007    0.473
            0             C        -4.556    0.264   -0.110
            0             C        -5.818    0.140    0.608
            0             C        -5.508   -1.318    0.782
            0             O        -6.084   -2.297    1.325
            0             N        -4.192   -1.100    0.139
            0             C        -2.871   -1.312    0.559
            0             C        -2.314   -0.118    0.756
            0             S        -0.625    0.174    1.307
            0             C         0.039    0.453   -0.378
            0             C         1.450    0.984   -0.335
            0             C         2.042    0.529   -1.666
            0             N         1.300   -0.664   -2.100
            0             C         0.171   -0.830   -1.182
            0             C         3.514    0.236   -1.540
            0             O         3.954   -0.916   -1.807
            0             N         4.415    1.289   -1.161
            0             C         5.781    1.081   -0.761
            0             C         6.729    2.081   -1.020
            0             C         8.057    1.916   -0.621
            0             C         8.449    0.758    0.053
            0             C         7.511   -0.248    0.346
            0             C         6.175   -0.074   -0.058
            0             C         7.917   -1.469    1.077
            0             O         9.110   -1.621    1.456
            0             O         6.984   -2.464    1.354
            0             C        -2.310   -2.644    0.840
            0             O        -2.972   -3.671    0.534
            0             O        -1.074   -2.786    1.462
            0             C        -7.092    0.439   -0.230
            0             C        -7.353   -0.527   -1.402
            0             O        -8.219    0.480    0.610
            0             H        -3.846    1.200    2.589
            0             H        -2.644    2.403    2.002
            0             H        -4.370    2.586    1.536
            0             H        -2.889    1.672   -0.313
            0             H        -4.652    0.440   -1.204
            0             H        -5.850    0.650    1.591
            0             H        -0.599    1.178   -0.932
            0             H         2.003    0.529    0.519
            0             H         1.464    2.091   -0.238
            0             H         1.910    1.346   -2.413
            0             H         0.924   -0.492   -3.063
            0             H        -0.770   -1.047   -1.735
            0             H         0.394   -1.687   -0.511
            0             H         4.068    2.273   -1.231
            0             H         6.439    2.986   -1.539
            0             H         8.783    2.689   -0.834
            0             H         9.485    0.653    0.352
            0             H         5.437   -0.822    0.200
            0             H         7.249   -3.302    1.857
            0             H        -0.695   -3.704    1.665
            0             H        -6.971    1.456   -0.661
            0             H        -6.495   -0.553   -2.104
            0             H        -8.247   -0.189   -1.966
            0             H        -7.547   -1.556   -1.031
            0             H        -8.317   -0.418    1.022