# biotite.structure.CellList

class biotite.structure.CellList(atom_array, cell_size, periodic=False, box=None)

Bases: object

This class enables the efficient search of atoms in the vicinity of a defined location.

This class stores the indices of an atom array in virtual “cells”, each corresponding to a specific coordinate interval. When the atoms in the vicinity of a specific location are searched, only the atoms in the relevant cells are checked. Once the CellList has been created, this reduces the time for finding atoms within a maximum distance of given coordinates from O(n) to O(1). A CellList therefore saves computation time whenever the vicinity is checked for multiple locations.
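The idea of binning atoms into cells can be illustrated with a minimal pure-Python sketch. It is not the biotite implementation; the helper names `build_cells` and `atoms_within` are hypothetical, and the grid is simply anchored at the coordinate origin.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def build_cells(coords, cell_size):
    """Map each integer cell index (i, j, k) to the atom indices inside it."""
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords):
        key = (floor(x / cell_size), floor(y / cell_size), floor(z / cell_size))
        cells[key].append(index)
    return cells


def atoms_within(coords, cells, cell_size, position, radius):
    """Return the sorted indices of atoms within `radius` of `position`."""
    # Number of cells to inspect in each direction around the central cell
    reach = int(radius // cell_size) + 1
    center = tuple(floor(c / cell_size) for c in position)
    hits = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        key = tuple(c + o for c, o in zip(center, offset))
        # Only atoms stored in nearby cells are distance-checked
        for index in cells.get(key, ()):
            if dist(coords[index], position) <= radius:
                hits.append(index)
    return sorted(hits)


# Hypothetical coordinates, for illustration only
coords = [(0.5, 0.5, 0.5), (1.0, 2.0, 3.5), (9.0, 9.0, 9.0), (2.5, 2.0, 3.0)]
cells = build_cells(coords, cell_size=3.0)
print(atoms_within(coords, cells, 3.0, (1.0, 2.0, 3.0), radius=2.0))  # prints [1, 3]
```

Building the cells costs one pass over all atoms, while each later query only touches the few cells around the query position, which is why the approach pays off for repeated searches.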

Parameters
atom_array : AtomArray or ndarray, dtype=float, shape=(n,3)

The AtomArray to create the CellList for. Alternatively, the atom coordinates are accepted directly. In this case box must be set if periodic is true.

cell_size : float

The coordinate interval each cell spans along the x, y and z axes. The number of cells depends on the range of coordinates in the atom_array and the cell_size.

periodic : bool, optional

If true, the cell list considers periodic copies of atoms. The periodicity is based on the box attribute of atom_array. (Default: False)

box : ndarray, dtype=float, shape=(3,3), optional

If provided, this parameter will be used instead of the box attribute of atom_array.
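Periodic handling depends on the box vectors. As a rough illustration, and not the biotite implementation, the following sketch wraps coordinates back into the box. It assumes that each row of `box` is one box vector; the function name `wrap_into_box` is hypothetical.

```python
import numpy as np


def wrap_into_box(coord, box):
    """Move coordinates into the box spanned by the rows of `box`."""
    # Express the coordinates as fractions of the box vectors
    fractions = coord @ np.linalg.inv(box)
    # Keep only the fractional part, so every value lies in [0, 1)
    fractions = fractions - np.floor(fractions)
    # Convert the fractions back to Cartesian coordinates
    return fractions @ box


# Hypothetical orthorhombic box with an edge length of 10 in each direction
box = np.diag([10.0, 10.0, 10.0])
coord = np.array([[11.0, -1.0, 5.0]])
wrapped = wrap_into_box(coord, box)
```

Here the position (11, -1, 5) ends up at approximately (1, 9, 5), the periodic image that lies inside the box.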

Examples

>>> cell_list = CellList(atom_array, cell_size=5)

create_adjacency_matrix(threshold_distance)

Create an adjacency matrix for the atoms in this cell list.

An adjacency matrix indicates which atom pairs i and j lie within a given threshold distance of each other. The values in the adjacency matrix m are: m[i,j] = 1 if distance(i,j) <= threshold else 0

Parameters
threshold_distance : float

The threshold distance. All atom pairs that have a distance lower than this value are indicated by True values in the resulting matrix.

Returns
matrix : ndarray, dtype=bool

An n x n adjacency matrix.

Notes

The highest performance is achieved when the cell size is equal to the threshold distance. However, this is purely optional: The resulting adjacency matrix is the same for every cell size.

Although the adjacency matrix should be symmetric in most cases, numerical errors may cause m[i,j] != m[j,i] when distance(i,j) is very close to the threshold_distance. The matrix can be symmetrized with numpy.maximum(m, m.T).
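The symmetrization can be tried out on a small hand-made matrix. The values below are hypothetical and only mimic the asymmetric case described above.

```python
import numpy as np

# Hypothetical 3 x 3 adjacency matrix with one asymmetric pair:
# m[0, 2] is True, but m[2, 0] is False
m = np.array([
    [True,  True,  True],
    [True,  True,  False],
    [False, False, True],
])

# Element-wise maximum of the matrix and its transpose:
# an entry becomes True if either m[i, j] or m[j, i] is True
symmetric = np.maximum(m, m.T)
```

For boolean arrays the element-wise maximum behaves like a logical OR, so a contact found in only one direction is kept in both.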

Examples

Create adjacency matrix for CA atoms in a structure:

>>> atom_array = atom_array[atom_array.atom_name == "CA"]
>>> cell_list = CellList(atom_array, 5)
>>> matrix = cell_list.create_adjacency_matrix(5)

get_atoms(coord, radius, as_mask=False)

Find atoms with a maximum distance from given coordinates.

Parameters
coord : ndarray, dtype=float, shape=(3,) or shape=(m,3)

The central coordinates, around which the atoms are searched. If a single position is given, the indices of atoms in its radius are returned. Multiple positions (2-D ndarray) have a vectorized behavior: Each row in the resulting ndarray contains the indices for the corresponding position. Since the positions may have different numbers of adjacent atoms, trailing -1 values are used to indicate nonexistent indices.

radius : float or ndarray, shape=(n,), dtype=float, optional

The radius around coord, in which the atoms are searched, i.e. all atoms within the distance radius of coord are returned. Either a single radius can be given as a scalar, or individual radii for each position in coord can be provided as an ndarray.

as_mask : bool, optional

If true, the result is returned as a boolean mask instead of an index array. (Default: False)

Returns
indices : ndarray, dtype=int32, shape=(n,) or shape=(m,n)

The indices of the atom array, where the atoms are in the defined radius around coord. If coord contains multiple positions, this return value is two-dimensional with trailing -1 values for empty values. Only returned with as_mask set to false.

mask : ndarray, dtype=bool

Same as indices, but as a boolean mask. The values are true for atoms in the atom array that are in the defined vicinity. Only returned with as_mask set to true.
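The relation between the padded index array and a boolean mask can be sketched with plain NumPy. This is only an illustration under stated assumptions: the helper `indices_to_mask` is hypothetical, it handles the two-dimensional case only, and the mask shape it produces (one row per position, one column per atom) is an assumption rather than a documented property of the method.

```python
import numpy as np


def indices_to_mask(indices, n_atoms):
    """Convert a 2-D index array padded with -1 into a boolean mask."""
    mask = np.zeros((indices.shape[0], n_atoms), dtype=bool)
    # Locate all entries that hold a real atom index
    rows, cols = np.nonzero(indices != -1)
    # Mark the referenced atoms in the row of the respective position
    mask[rows, indices[rows, cols]] = True
    return mask


# Hypothetical result for two positions in a four-atom structure
padded = np.array([[0, 2, -1],
                   [1, -1, -1]])
mask = indices_to_mask(padded, n_atoms=4)
```

A mask has a fixed width and needs no padding, which makes it convenient for directly indexing an atom array, whereas the index array is more compact when only a few atoms are adjacent.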

Examples

Get adjacent atoms for a single position:

>>> cell_list = CellList(atom_array, 3)
>>> pos = np.array([1.0, 2.0, 3.0])
>>> indices = cell_list.get_atoms(pos, radius=2.0)
>>> print(indices)
[102 104 112]
>>> print(atom_array[indices])
A       6 TRP CE3    C         0.779    0.524    2.812
A       6 TRP CZ3    C         1.439    0.433    4.053
A       6 TRP HE3    H        -0.299    0.571    2.773
>>> indices = cell_list.get_atoms(pos, radius=3.0)
>>> print(atom_array[indices])
A       6 TRP CD2    C         1.508    0.564    1.606
A       6 TRP CE3    C         0.779    0.524    2.812
A       6 TRP CZ3    C         1.439    0.433    4.053
A       6 TRP HE3    H        -0.299    0.571    2.773
A       6 TRP HZ3    H         0.862    0.400    4.966
A       3 TYR CZ     C        -0.639    3.053    5.043
A       3 TYR HH     H         1.187    3.395    5.567
A      19 PRO HD2    H         0.470    3.937    1.260
A       6 TRP CE2    C         2.928    0.515    1.710
A       6 TRP CH2    C         2.842    0.407    4.120
A      18 PRO HA     H         2.719    3.181    1.316
A      18 PRO HB3    H         2.781    3.223    3.618
A      18 PRO CB     C         3.035    4.190    3.187


Get adjacent atoms for multiple positions:

>>> cell_list = CellList(atom_array, 3)
>>> pos = np.array([[1.0,2.0,3.0], [2.0,3.0,4.0], [3.0,4.0,5.0]])
>>> indices = cell_list.get_atoms(pos, radius=3.0)
>>> print(indices)
[[ 99 102 104 112 114  45  55 290 101 105 271 273 268  -1  -1]
[104 114  45  46  55  44  54 105 271 273 265 268 269 272 275]
[ 46  55 273 268 269 272 274 275  -1  -1  -1  -1  -1  -1  -1]]
>>> # Convert to list of arrays and remove trailing -1
>>> indices = [row[row != -1] for row in indices]
>>> for row in indices:
...     print(row)
[ 99 102 104 112 114  45  55 290 101 105 271 273 268]
[104 114  45  46  55  44  54 105 271 273 265 268 269 272 275]
[ 46  55 273 268 269 272 274 275]

get_atoms_in_cells(coord, cell_radius=1, as_mask=False)

Find atoms with a maximum cell distance from given coordinates.

Instead of using the radius as the maximum Euclidean distance to the given coordinates, the radius is measured as a number of cells: A radius of 0 means that only the atoms in the same cell as the given coordinates are considered. A radius of 1 means that the atom indices from this cell and the directly surrounding cells (26 in a three-dimensional grid) are returned, and so forth. This is more efficient than get_atoms().
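To make the cell radius concrete, the following sketch enumerates the cell offsets covered by a given cell radius and derives rough distance bounds. It assumes a regular three-dimensional grid of cubic cells and is not taken from the biotite implementation; both helper names are hypothetical.

```python
from itertools import product
from math import sqrt


def neighbor_offsets(cell_radius):
    """List the (i, j, k) offsets of all cells within the cell radius."""
    span = range(-cell_radius, cell_radius + 1)
    return list(product(span, repeat=3))


def distance_bounds(cell_radius, cell_size):
    """Return (guaranteed, upper_bound) Euclidean distances for a cell search.

    Every atom closer than `guaranteed` to the position is found,
    and no found atom is farther away than `upper_bound`.
    """
    guaranteed = cell_radius * cell_size
    upper_bound = (cell_radius + 1) * cell_size * sqrt(3)
    return guaranteed, upper_bound


print(len(neighbor_offsets(0)))  # prints 1
print(len(neighbor_offsets(1)))  # prints 27
```

Because no distances are computed, a cell-based search is cheap, but its result is only a coarse superset of the atoms within the guaranteed distance.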

Parameters
coord : ndarray, dtype=float, shape=(3,) or shape=(m,3)

The central coordinates, around which the atoms are searched. If a single position is given, the indices of atoms in its cell radius are returned. Multiple positions (2-D ndarray) have a vectorized behavior: Each row in the resulting ndarray contains the indices for the corresponding position. Since the positions may have different numbers of adjacent atoms, trailing -1 values are used to indicate nonexistent indices.

cell_radius : int or ndarray, shape=(n,), dtype=int, optional

The radius around coord (in number of cells), in which the atoms are searched. This does not correspond to the Euclidean distance used in get_atoms(). In this case, all atoms in the cell corresponding to coord and in adjacent cells are returned. Either a single radius can be given as a scalar, or individual radii for each position in coord can be provided as an ndarray. By default, atoms are searched in the cell of coord and directly adjacent cells (cell_radius = 1).

Returns
indices : ndarray, dtype=int32, shape=(n,) or shape=(m,n)

The indices of the atom array, where the atoms are in the defined radius around coord. If coord contains multiple positions, this return value is two-dimensional with trailing -1 values for empty values. Only returned with as_mask set to false.

mask : ndarray, dtype=bool

Same as indices, but as a boolean mask. The values are true for atoms in the atom array that are in the defined vicinity. Only returned with as_mask set to true.

post_process()

Post-process the resulting indices of adjacent atoms, including periodicity handling and optional conversion into a boolean matrix.