biotite.structure.CellList

class biotite.structure.CellList(atom_array, cell_size, periodic=False, box=None, selection=None)

Bases: object

This class enables the efficient search of atoms in the vicinity of a defined location.

This class stores the indices of an atom array in virtual “cells”, each corresponding to a specific coordinate interval. When atoms in the vicinity of a specific location are searched, only the atoms in the relevant cells are checked. Once the CellList has been created, this effectively reduces the time for finding atoms within a maximum distance of given coordinates from O(n) to O(1). A CellList therefore saves computation time when the vicinity is checked for multiple locations.
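The idea described above can be illustrated with a minimal NumPy sketch. This is not biotite's implementation; the helper names `build_cells` and `atoms_within` are hypothetical and only show why a query needs to inspect a bounded number of cells instead of all n atoms.

```python
from itertools import product

import numpy as np


def build_cells(coord, cell_size):
    """Map each integer cell index (i, j, k) to the atom indices it contains."""
    origin = coord.min(axis=0)
    cell_idx = np.floor((coord - origin) / cell_size).astype(int)
    cells = {}
    for atom_i, key in enumerate(map(tuple, cell_idx.tolist())):
        cells.setdefault(key, []).append(atom_i)
    return origin, cells


def atoms_within(coord, origin, cells, cell_size, center, radius):
    """Return the sorted indices of atoms within `radius` of `center`."""
    # Number of cells the search sphere can reach along each axis
    reach = int(np.ceil(radius / cell_size))
    center_cell = np.floor((center - origin) / cell_size).astype(int)
    candidates = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        key = tuple((center_cell + np.array(offset)).tolist())
        candidates.extend(cells.get(key, []))
    candidates = np.array(candidates, dtype=int)
    if candidates.size == 0:
        return candidates
    # Only the candidate atoms are distance-checked, not the whole array
    dist = np.linalg.norm(coord[candidates] - center, axis=1)
    return np.sort(candidates[dist <= radius])


# Random test coordinates (illustrative data, not a real structure)
rng = np.random.default_rng(0)
coord = rng.uniform(0, 50, size=(1000, 3))
origin, cells = build_cells(coord, cell_size=5.0)
center = np.array([25.0, 25.0, 25.0])
found = atoms_within(coord, origin, cells, 5.0, center, radius=7.0)

# Brute-force O(n) reference for comparison
brute = np.flatnonzero(np.linalg.norm(coord - center, axis=1) <= 7.0)
```

Building the cells costs one pass over all atoms, so the structure pays off only when it is queried several times.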

Parameters
atom_array : AtomArray or ndarray, dtype=float, shape=(n,3)

The AtomArray to create the CellList for. Alternatively, the atom coordinates are accepted directly. In this case, box must be set if periodic is true.

cell_size : float

The coordinate interval each cell spans along the x, y and z axes. The number of cells depends on the range of coordinates in the atom_array and the cell_size.

periodic : bool, optional

If true, the cell list considers periodic copies of atoms. The periodicity is based on the box attribute of atom_array. (Default: False)

box : ndarray, dtype=float, shape=(3,3), optional

If provided, the periodicity is based on this parameter instead of the box attribute of atom_array. Only has an effect if periodic is True.

selection : ndarray, dtype=bool, shape=(n,), optional

If provided, only the atoms masked by this array are stored in the cell list. However, the indices stored in the cell list will still refer to the original unfiltered atom_array.

Examples

>>> cell_list = CellList(atom_array, cell_size=5)
>>> near_atoms = atom_array[cell_list.get_atoms(np.array([1,2,3]), radius=7.0)]

create_adjacency_matrix(threshold_distance)

Create an adjacency matrix for the atoms in this cell list.

An adjacency matrix indicates which atoms i and j are separated by at most a given threshold distance. The values in the adjacency matrix m are m[i,j] = True if distance(i,j) <= threshold else False
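For small inputs, the definition above can be reproduced with a brute-force NumPy computation, which is handy for checking results. The sketch below does not use the cell list at all; the coordinates and the threshold are made-up illustration values.

```python
import numpy as np

# Four illustrative atom positions
coord = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [9.0, 9.0, 9.0],
])
threshold = 4.5

# Pairwise difference vectors, shape (n, n, 3)
diff = coord[:, np.newaxis, :] - coord[np.newaxis, :, :]
# Pairwise Euclidean distances, shape (n, n)
dist = np.sqrt((diff ** 2).sum(axis=-1))
# m[i, j] is True if distance(i, j) <= threshold
matrix = dist <= threshold
```

Under this formula the diagonal is True, since every atom has distance 0 to itself. The brute-force version needs O(n²) time and memory, so it is only suitable as a reference for small arrays.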

Parameters
threshold_distance : float

The threshold distance. All atom pairs whose distance does not exceed this value are indicated by True values in the resulting matrix.

Returns
matrix : ndarray, dtype=bool, shape=(n,n)

An n x n adjacency matrix. If a selection was given to the constructor of the CellList, the rows and columns corresponding to atoms that are not masked by the selection have all elements set to False.

Notes

The highest performance is achieved when the cell size is equal to the threshold distance. However, this is purely optional: The resulting adjacency matrix is the same for every cell size.

Although the adjacency matrix should be symmetric in most cases, numerical errors may cause m[i,j] != m[j,i] when distance(i,j) is very close to the threshold_distance. The matrix can be symmetrized with numpy.maximum(m, m.T).
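A short sketch of the symmetrization step, using a hand-made boolean matrix in place of a real result. For boolean arrays, `numpy.maximum` acts as an element-wise logical OR, so a pair counts as adjacent if either direction reported it.

```python
import numpy as np

# Illustrative asymmetric matrix: m[0, 1] is True but m[1, 0] is False
m = np.array([
    [True,  True,  False],
    [False, True,  True],
    [False, True,  True],
])

# Element-wise maximum of the matrix and its transpose
sym = np.maximum(m, m.T)
```

The expression `m | m.T` gives the same result for boolean input.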

Examples

Create adjacency matrix for CA atoms in a structure:

>>> atom_array = atom_array[atom_array.atom_name == "CA"]
>>> cell_list = CellList(atom_array, 5)
>>> matrix = cell_list.create_adjacency_matrix(5)

get_atoms(coord, radius, as_mask=False)

Find atoms with a maximum distance from given coordinates.

Parameters
coord : ndarray, dtype=float, shape=(3,) or shape=(m,3)

The central coordinates around which the atoms are searched. If a single position is given, the indices of atoms within its radius are returned. Multiple positions (2-D ndarray) are handled in a vectorized manner: Each row in the resulting ndarray contains the indices for the corresponding position. Since the positions may have different numbers of adjacent atoms, trailing -1 values are used to indicate nonexistent indices.

radius : float or ndarray, shape=(m,), dtype=float

The radius around coord in which the atoms are searched, i.e. all atoms within radius distance of coord are returned. Either a single radius can be given as a scalar, or individual radii for each position in coord can be provided as an ndarray.

as_mask : bool, optional

If true, the result is returned as boolean mask, instead of an index array.

Returns
indices : ndarray, dtype=int32, shape=(p,) or shape=(m,p)

The indices of the atom array, where the atoms are in the defined radius around coord. If coord contains multiple positions, this return value is two-dimensional with trailing -1 values for empty values. Only returned with as_mask set to false.

mask : ndarray, dtype=bool, shape=(m,n) or shape=(n,)

Same as indices, but as a boolean mask. The values are true for atoms in the atom array that are in the defined vicinity. Only returned with as_mask set to true.
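The relation between the two return forms can be sketched in plain NumPy. The example below converts a hand-made padded index array into the equivalent boolean mask; `n_atoms` and the index values are illustrative assumptions, and the code does not call the CellList itself.

```python
import numpy as np

n_atoms = 6
# Padded index array for m = 2 positions, -1 marks unused slots
indices = np.array([
    [0,  2,  5],
    [3, -1, -1],
], dtype=np.int32)

# One mask row per position, one column per atom
mask = np.zeros((indices.shape[0], n_atoms), dtype=bool)
# Locate the valid (non-padding) entries and set the matching atoms
rows, cols = np.nonzero(indices != -1)
mask[rows, indices[rows, cols]] = True
```

The mask form has a fixed shape of (m, n) and can be used directly for boolean indexing, while the index form is more compact when only a few atoms are found.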

Notes

In case of a CellList with periodic set to True: If more than one periodic copy of an atom is within the threshold radius, the returned indices array contains the corresponding index multiple times. Please use numpy.unique() if this is undesirable.
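A brief sketch of the deduplication mentioned above, using hand-made index arrays in place of real query results. For the padded two-dimensional form, the -1 padding is removed before each row is deduplicated.

```python
import numpy as np

# Single position: index 12 and 40 appear twice (illustrative values)
indices = np.array([12, 40, 12, 7, 40], dtype=np.int32)
unique_indices = np.unique(indices)

# Multiple positions: strip the -1 padding, then deduplicate per row
padded = np.array([
    [12, 40, 12, -1],
    [7,   7,  7,  3],
], dtype=np.int32)
per_position = [np.unique(row[row != -1]) for row in padded]
```

Note that `numpy.unique()` returns the values in sorted order, so the original ordering of the indices is not preserved.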

Examples

Get adjacent atoms for a single position:

>>> cell_list = CellList(atom_array, 3)
>>> pos = np.array([1.0, 2.0, 3.0])
>>> indices = cell_list.get_atoms(pos, radius=2.0)
>>> print(indices)
[102 104 112]
>>> print(atom_array[indices])
    A       6  TRP CE3    C         0.779    0.524    2.812
    A       6  TRP CZ3    C         1.439    0.433    4.053
    A       6  TRP HE3    H        -0.299    0.571    2.773
>>> indices = cell_list.get_atoms(pos, radius=3.0)
>>> print(atom_array[indices])
    A       6  TRP CD2    C         1.508    0.564    1.606
    A       6  TRP CE3    C         0.779    0.524    2.812
    A       6  TRP CZ3    C         1.439    0.433    4.053
    A       6  TRP HE3    H        -0.299    0.571    2.773
    A       6  TRP HZ3    H         0.862    0.400    4.966
    A       3  TYR CZ     C        -0.639    3.053    5.043
    A       3  TYR HH     H         1.187    3.395    5.567
    A      19  PRO HD2    H         0.470    3.937    1.260
    A       6  TRP CE2    C         2.928    0.515    1.710
    A       6  TRP CH2    C         2.842    0.407    4.120
    A      18  PRO HA     H         2.719    3.181    1.316
    A      18  PRO HB3    H         2.781    3.223    3.618
    A      18  PRO CB     C         3.035    4.190    3.187

Get adjacent atoms for multiple positions:

>>> cell_list = CellList(atom_array, 3)
>>> pos = np.array([[1.0,2.0,3.0], [2.0,3.0,4.0], [3.0,4.0,5.0]])
>>> indices = cell_list.get_atoms(pos, radius=3.0)
>>> print(indices)
[[ 99 102 104 112 114  45  55 290 101 105 271 273 268  -1  -1]
 [104 114  45  46  55  44  54 105 271 273 265 268 269 272 275]
 [ 46  55 273 268 269 272 274 275  -1  -1  -1  -1  -1  -1  -1]]
>>> # Convert to list of arrays and remove trailing -1
>>> indices = [row[row != -1] for row in indices]
>>> for row in indices:
...     print(row)
[ 99 102 104 112 114  45  55 290 101 105 271 273 268]
[104 114  45  46  55  44  54 105 271 273 265 268 269 272 275]
[ 46  55 273 268 269 272 274 275]

get_atoms_in_cells(coord, cell_radius=1, as_mask=False)

Find atoms with a maximum cell distance from given coordinates.

Instead of using the radius as the maximum Euclidean distance to the given coordinates, the radius is measured in number of cells: A radius of 0 means that only the atoms in the same cell as the given coordinates are considered. A radius of 1 means that the atom indices from this cell and the 26 surrounding cells are returned, and so forth. This is more efficient than get_atoms().
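To make the cell-based radius concrete, the following sketch enumerates the cells covered by a given cell radius, assuming cubic cells in three dimensions. The function `cells_in_radius`, the grid origin and the cell size are hypothetical illustration values, not part of the CellList API.

```python
from itertools import product

import numpy as np


def cells_in_radius(center_cell, cell_radius):
    """List the (i, j, k) indices of all cells within `cell_radius` cells."""
    offsets = product(range(-cell_radius, cell_radius + 1), repeat=3)
    return [
        tuple(int(c + o) for c, o in zip(center_cell, offset))
        for offset in offsets
    ]


cell_size = 5.0
origin = np.zeros(3)  # assumed lower corner of the cell grid
coord = np.array([12.0, 3.0, 7.5])

# Integer index of the cell that contains `coord`
center_cell = np.floor((coord - origin) / cell_size).astype(int)

# Number of cells covered for cell radii 0, 1 and 2
counts = [len(cells_in_radius(center_cell, r)) for r in (0, 1, 2)]
```

A cell radius of r covers (2r + 1)³ cells, so the number of inspected cells grows quickly with r. The sketch ignores grid boundaries: cells outside the actual grid would have to be skipped, or wrapped around in the periodic case.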

Parameters
coord : ndarray, dtype=float, shape=(3,) or shape=(m,3)

The central coordinates around which the atoms are searched. If a single position is given, the indices of atoms within its cell radius are returned. Multiple positions (2-D ndarray) are handled in a vectorized manner: Each row in the resulting ndarray contains the indices for the corresponding position. Since the positions may have different numbers of adjacent atoms, trailing -1 values are used to indicate nonexistent indices.

cell_radius : int or ndarray, shape=(m,), dtype=int, optional

The radius around coord (in number of cells) in which the atoms are searched. This does not correspond to the Euclidean distance used in get_atoms(). In this case, all atoms in the cell corresponding to coord and in adjacent cells are returned. Either a single radius can be given as a scalar, or individual radii for each position in coord can be provided as an ndarray. By default, atoms are searched in the cell of coord and directly adjacent cells (cell_radius = 1).

as_mask : bool, optional

If true, the result is returned as boolean mask, instead of an index array.

Returns
indices : ndarray, dtype=int32, shape=(p,) or shape=(m,p)

The indices of the atom array, where the atoms are in the defined radius around coord. If coord contains multiple positions, this return value is two-dimensional with trailing -1 values for empty values. Only returned with as_mask set to false.

mask : ndarray, dtype=bool, shape=(m,n) or shape=(n,)

Same as indices, but as a boolean mask. The values are true for atoms in the atom array that are in the defined vicinity. Only returned with as_mask set to true.

See also

get_atoms

Notes

In case of a CellList with periodic set to True: If more than one periodic copy of an atom is within the threshold radius, the returned indices array contains the corresponding index multiple times. Please use numpy.unique() if this is undesirable.