From single atoms to multi-model structures

To understand how Atom, AtomArray and AtomArrayStack relate to each other, we will create them from scratch. In an actual application one would usually read a structure from a file, as explained in the next chapter.

import biotite.structure as struc

atom1 = struc.Atom(
    [0,0,0], chain_id="A", res_id=1, res_name="GLY",
    atom_name="N", element="N"
)
atom2 = struc.Atom(
    [0,1,1], chain_id="A", res_id=1, res_name="GLY",
    atom_name="CA", element="C"
)
atom3 = struc.Atom(
    [0,0,2], chain_id="A", res_id=1, res_name="GLY",
    atom_name="C", element="C"
)

The first parameter is the coordinates (internally converted into an ndarray); the other parameters are annotations. Biotite defines a set of mandatory annotation categories: the chain ID, residue ID, residue name, insertion code, atom name, element and a flag for whether the atom is outside a protein/nucleotide chain (hetero). If you omit any of these, as the insertion code and the hetero flag are omitted in this example, it gets a default value. The mandatory annotation categories originate from the ATOM and HETATM records in the PDB format. Additionally, you can specify any number of custom annotations, such as B-factors or charges.

In most cases you won’t work with single Atom instances, because one usually deals with entire molecular structures containing an arbitrary number of atoms. For this purpose biotite.structure offers the AtomArray. An atom array can be seen as an array of Atom instances (hence the name). However, instead of storing Atom instances in a list, an AtomArray instance contains one ndarray for each annotation category and one for the coordinates. To see this in action, we first create an array from the atoms constructed above. We can then access the annotations and coordinates of the atom array simply by specifying the attribute.

import numpy as np

array = struc.array([atom1, atom2, atom3])
print("Chain ID:", array.chain_id)
print("Residue ID:", array.res_id)
print("Atom name:", array.atom_name)
print("Coordinates:", array.coord)
print()
print(array)
Chain ID: ['A' 'A' 'A']
Residue ID: [1 1 1]
Atom name: ['N' 'CA' 'C']
Coordinates: [[0. 0. 0.]
 [0. 1. 1.]
 [0. 0. 2.]]

    A       1  GLY N      N         0.000    0.000    0.000
    A       1  GLY CA     C         0.000    1.000    1.000
    A       1  GLY C      C         0.000    0.000    2.000

The array() builder function takes any iterable object containing Atom instances. If you wanted to, you could even pass another AtomArray, which also functions as an iterable of Atom objects. Alternatively, you can create an AtomArray with its constructor, which fills the annotation arrays and coordinates with the type-specific zero value. In our example all annotation arrays have a length of 3, since we used 3 atoms to create the array. In general, a structure containing n atoms is represented by annotation arrays of length n and coordinates of shape (n,3). As the annotations and coordinates are simply ndarray objects, they can be edited using NumPy functionality.

array.chain_id[:] = "B"
array.coord[array.element == "C", 0] = 42
# It is also possible to replace an entire annotation with another array
array.res_id = np.array([1,2,3])
print(array)
    B       1  GLY N      N         0.000    0.000    0.000
    B       2  GLY CA     C        42.000    1.000    1.000
    B       3  GLY C      C        42.000    0.000    2.000

Apart from the structure manipulation functions we will see later on, this is the usual way to edit structures in Biotite.

Warning

To edit an annotation, apply the index to the annotation array and not to the AtomArray. For example, you should write array.chain_id[...] = "B" instead of array[...].chain_id = "B". The latter is incorrect: it creates a subarray of the initial AtomArray (discussed in a later chapter) and then tries to replace the annotation array of that subarray with the new value, so the initial AtomArray is left unchanged.

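If you want to add further annotation categories to an array, you first have to call the add_annotation() or set_annotation() method. As the example shows, add_annotation() creates an annotation array of the given dtype filled with its zero value, while set_annotation() takes the values directly. Afterwards you can access the new annotation array like any other annotation array.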

array.add_annotation("foo", dtype=bool)
array.set_annotation("bar", [1, 2, 3])
print(array.foo)
print(array.bar)
[False False False]
[1 2 3]

In some cases, you might need to handle structures where each atom is present in multiple locations, for example multiple models in NMR structures or frames of an MD trajectory. This is where AtomArrayStack objects enter the stage: they represent a list of atom arrays with the same atoms in each model/frame, but differing coordinates. Hence the annotation arrays in AtomArrayStack objects still have the same length n as in an AtomArray. However, a stack stores the coordinates in an ndarray of shape (m,n,3), where m is the number of models/frames. A stack is constructed with stack(), analogous to array() in the code snippets above. It is crucial that all AtomArray objects to be stacked have the same annotation arrays; otherwise an exception is raised. For simplicity, we create a stack containing two identical models, derived from the previous example.

stack = struc.stack([array, array.copy()])
print(stack)
Model 1
    B       1  GLY N      N         0.000    0.000    0.000
    B       2  GLY CA     C        42.000    1.000    1.000
    B       3  GLY C      C        42.000    0.000    2.000

Model 2
    B       1  GLY N      N         0.000    0.000    0.000
    B       2  GLY CA     C        42.000    1.000    1.000
    B       3  GLY C      C        42.000    0.000    2.000