biotite.structure.CellList
- class biotite.structure.CellList(atom_array, cell_size, periodic=False, box=None, selection=None)
Bases: object
This class enables the efficient search for atoms in the vicinity of a defined location.
This class stores the indices of an atom array in virtual “cells”, each corresponding to a specific coordinate interval. When the atoms in the vicinity of a specific location are searched, only the atoms in the relevant cells are checked. Effectively, this decreases the operation time for finding atoms within a maximum distance of given coordinates from O(n) to O(1), after the CellList has been created. Therefore, a CellList saves calculation time in those cases where the vicinity is checked for multiple locations.
- Parameters:
- atom_array : AtomArray or ndarray, dtype=float, shape=(n,3)
The AtomArray to create the CellList for. Alternatively, the atom coordinates are accepted directly. In this case, box must be set if periodic is true.
- cell_size : float
The coordinate interval each cell spans along the x, y and z axes. The number of cells depends on the range of coordinates in the atom_array and the cell_size.
- periodic : bool, optional
If true, the cell list considers periodic copies of atoms. The periodicity is based on the box attribute of atom_array. (Default: False)
- box : ndarray, dtype=float, shape=(3,3), optional
If provided, the periodicity is based on this parameter instead of the box attribute of atom_array. Only has an effect if periodic is True.
- selection : ndarray, dtype=bool, shape=(n,), optional
If provided, only the atoms masked by this array are stored in the cell list. However, the indices stored in the cell list will still refer to the original unfiltered atom_array.
Examples
>>> cell_list = CellList(atom_array, cell_size=5)
>>> near_atoms = atom_array[cell_list.get_atoms(np.array([1,2,3]), radius=7.0)]
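To illustrate the idea of restricting the search to the relevant cells, the following is a minimal pure-Python sketch. It is not Biotite's implementation: the helper names `build_cells` and `atoms_near` are hypothetical, the coordinates are made up, and periodicity and selections are ignored.

```python
import math
from collections import defaultdict
from itertools import product


def build_cells(coords, cell_size):
    """Map each integer cell index (i, j, k) to the atom indices inside it."""
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        cells[key].append(index)
    return cells


def atoms_near(cells, coords, cell_size, position, radius):
    """Return sorted indices of atoms within `radius` of `position`.

    Only cells that can overlap the search sphere are inspected.
    """
    # Number of cells the radius can reach in each direction
    reach = math.ceil(radius / cell_size)
    center = tuple(math.floor(c / cell_size) for c in position)
    found = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        key = tuple(c + o for c, o in zip(center, offset))
        # .get() avoids creating empty entries in the defaultdict
        for index in cells.get(key, ()):
            if math.dist(coords[index], position) <= radius:
                found.append(index)
    return sorted(found)


# Hypothetical coordinates, not taken from a real structure
coords = [(0.5, 0.5, 0.5), (1.0, 2.0, 3.5), (9.0, 9.0, 9.0), (4.0, 2.0, 3.0)]
cells = build_cells(coords, cell_size=5.0)
near = atoms_near(cells, coords, 5.0, position=(1.0, 2.0, 3.0), radius=3.5)
print(near)
```

Building the dictionary touches every atom once. Each later query only inspects the cells around the query position, which is why the class pays off when many positions are queried.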
- create_adjacency_matrix(threshold_distance)
Create an adjacency matrix for the atoms in this cell list.
An adjacency matrix indicates which atoms i and j have a distance lower than or equal to a given threshold distance. The values in the adjacency matrix m are
m[i,j] = 1 if distance(i,j) <= threshold else 0
- Parameters:
- threshold_distance : float
The threshold distance. All atom pairs that have a distance lower than or equal to this value are indicated by True values in the resulting matrix.
- Returns:
- matrix : ndarray, dtype=bool, shape=(n,n)
An n x n adjacency matrix. If a selection was given to the constructor of the CellList, the rows and columns corresponding to atoms that are not masked by the selection have all elements set to False.
Notes
The highest performance is achieved when the cell size is equal to the threshold distance. However, this is purely optional: The resulting adjacency matrix is the same for every cell size.
Although the adjacency matrix should be symmetric in most cases, it may occur that m[i,j] != m[j,i] when distance(i,j) is very close to the threshold_distance, due to numerical errors. The matrix can be symmetrized with numpy.maximum(a, a.T).
Examples
Create adjacency matrix for CA atoms in a structure:
>>> atom_array = atom_array[atom_array.atom_name == "CA"]
>>> cell_list = CellList(atom_array, 5)
>>> matrix = cell_list.create_adjacency_matrix(5)
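The definition of the matrix and the symmetrization mentioned in the notes can be sketched with plain NumPy. This is a brute-force O(n²) illustration on made-up coordinates, not a use of CellList itself. The asymmetry is introduced by hand to stand in for a numerical error.

```python
import numpy as np

# Hypothetical coordinates for four atoms (not taken from a real structure)
coord = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [10.0, 10.0, 10.0],
])
threshold = 4.5

# Brute-force pairwise distances via broadcasting
diff = coord[:, np.newaxis, :] - coord[np.newaxis, :, :]
distance = np.sqrt((diff ** 2).sum(axis=-1))
# m[i, j] is True if distance(i, j) <= threshold;
# in this sketch the diagonal is True, since distance(i, i) is 0
matrix = distance <= threshold

# Simulate a numerical asymmetry, then symmetrize with numpy.maximum
asymmetric = matrix.copy()
asymmetric[0, 1] = False
symmetric = np.maximum(asymmetric, asymmetric.T)
print(np.array_equal(symmetric, matrix))
```

For boolean arrays, `numpy.maximum(a, a.T)` acts as an element-wise logical OR, so a pair counts as adjacent if either direction reported it.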
- get_atoms(coord, radius, as_mask=False)
Find atoms with a maximum distance from given coordinates.
- Parameters:
- coord : ndarray, dtype=float, shape=(3,) or shape=(m,3)
The central coordinates around which the atoms are searched. If a single position is given, the indices of atoms in its radius are returned. Multiple positions (2-D ndarray) have a vectorized behavior: Each row in the resulting ndarray contains the indices for the corresponding position. Since the positions may have different numbers of adjacent atoms, trailing -1 values are used to indicate nonexisting indices.
- radius : float or ndarray, shape=(n,), dtype=float, optional
The radius around coord in which the atoms are searched, i.e. all atoms within radius distance of coord are returned. Either a single radius can be given as scalar, or individual radii for each position in coord can be provided as ndarray.
- as_mask : bool, optional
If true, the result is returned as a boolean mask instead of an index array.
- Returns:
- indices : ndarray, dtype=int32, shape=(p,) or shape=(m,p)
The indices of the atom array, where the atoms are in the defined radius around coord. If coord contains multiple positions, this return value is two-dimensional with trailing -1 values for empty values. Only returned with as_mask set to false.
- mask : ndarray, dtype=bool, shape=(m,n) or shape=(n,)
Same as indices, but as boolean mask. The values are true for atoms in the atom array that are in the defined vicinity. Only returned with as_mask set to true.
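How the two return forms relate can be sketched with plain NumPy. The index values and the atom count below are hypothetical and do not come from a real CellList query.

```python
import numpy as np

n_atoms = 6  # hypothetical length of the atom array
# Hypothetical index result; atom 4 is listed twice on purpose
indices = np.array([1, 4, 4], dtype=np.int32)

# Build the boolean mask that marks the same atoms
mask = np.zeros(n_atoms, dtype=bool)
mask[indices] = True

# Converting back yields sorted indices without duplicates
recovered = np.flatnonzero(mask)
print(mask)
print(recovered)
```

A mask has a fixed length and therefore needs no -1 padding, but it cannot record how often an index occurred.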
Notes
In case of a CellList with periodic set to True: If more than one periodic copy of an atom is within the threshold radius, the returned indices array contains the corresponding index multiple times. Please use numpy.unique() if this is undesirable.
Examples
Get adjacent atoms for a single position:
>>> cell_list = CellList(atom_array, 3)
>>> pos = np.array([1.0, 2.0, 3.0])
>>> indices = cell_list.get_atoms(pos, radius=2.0)
>>> print(indices)
[102 104 112]
>>> print(atom_array[indices])
    A       6  TRP CE3    C         0.779    0.524    2.812
    A       6  TRP CZ3    C         1.439    0.433    4.053
    A       6  TRP HE3    H        -0.299    0.571    2.773
>>> indices = cell_list.get_atoms(pos, radius=3.0)
>>> print(atom_array[indices])
    A       6  TRP CD2    C         1.508    0.564    1.606
    A       6  TRP CE3    C         0.779    0.524    2.812
    A       6  TRP CZ3    C         1.439    0.433    4.053
    A       6  TRP HE3    H        -0.299    0.571    2.773
    A       6  TRP HZ3    H         0.862    0.400    4.966
    A       3  TYR CZ     C        -0.639    3.053    5.043
    A       3  TYR HH     H         1.187    3.395    5.567
    A      19  PRO HD2    H         0.470    3.937    1.260
    A       6  TRP CE2    C         2.928    0.515    1.710
    A       6  TRP CH2    C         2.842    0.407    4.120
    A      18  PRO HA     H         2.719    3.181    1.316
    A      18  PRO HB3    H         2.781    3.223    3.618
    A      18  PRO CB     C         3.035    4.190    3.187
Get adjacent atoms for multiple positions:
>>> cell_list = CellList(atom_array, 3)
>>> pos = np.array([[1.0,2.0,3.0], [2.0,3.0,4.0], [3.0,4.0,5.0]])
>>> indices = cell_list.get_atoms(pos, radius=3.0)
>>> print(indices)
[[ 99 102 104 112 114  45  55 290 101 105 271 273 268  -1  -1]
 [104 114  45  46  55  44  54 105 271 273 265 268 269 272 275]
 [ 46  55 273 268 269 272 274 275  -1  -1  -1  -1  -1  -1  -1]]
>>> # Convert to list of arrays and remove trailing -1
>>> indices = [row[row != -1] for row in indices]
>>> for row in indices:
...     print(row)
[ 99 102 104 112 114  45  55 290 101 105 271 273 268]
[104 114  45  46  55  44  54 105 271 273 265 268 269 272 275]
[ 46  55 273 268 269 272 274 275]
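The deduplication suggested in the notes for periodic cell lists might look like the following sketch. The index arrays are hypothetical stand-ins for what get_atoms() could return when several periodic copies of an atom are in range. They were not produced by a real query.

```python
import numpy as np

# Hypothetical result for a single position: atom 7 appears twice
# because two of its periodic copies lie within the radius
indices = np.array([12, 7, 30, 7], dtype=np.int32)
unique_indices = np.unique(indices)

# Hypothetical result for two positions, padded with trailing -1
padded = np.array([[5, 5, 9, -1],
                   [2, 8, 2, 2]], dtype=np.int32)
# Strip the padding per row before removing duplicates,
# otherwise -1 would be kept as if it were an atom index
unique_rows = [np.unique(row[row != -1]) for row in padded]

print(unique_indices)
for row in unique_rows:
    print(row)
```

Note that `numpy.unique()` returns its result sorted, so the original order of the indices is not preserved.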
- get_atoms_in_cells(coord, cell_radius=1, as_mask=False)
Find atoms with a maximum cell distance from given coordinates.
Instead of using the radius as maximum Euclidean distance to the given coordinates, the radius is measured as a number of cells: A radius of 0 means that only the atoms in the same cell as the given coordinates are considered. A radius of 1 means that the atom indices from this cell and the 26 directly surrounding cells (a 3 x 3 x 3 block) are returned, and so forth. This is more efficient than get_atoms().
- Parameters:
- coord : ndarray, dtype=float, shape=(3,) or shape=(m,3)
The central coordinates around which the atoms are searched. If a single position is given, the indices of atoms in its cell radius are returned. Multiple positions (2-D ndarray) have a vectorized behavior: Each row in the resulting ndarray contains the indices for the corresponding position. Since the positions may have different numbers of adjacent atoms, trailing -1 values are used to indicate nonexisting indices.
- cell_radius : int or ndarray, shape=(n,), dtype=int, optional
The radius around coord (in number of cells) in which the atoms are searched. This does not correspond to the Euclidean distance used in get_atoms(). In this case, all atoms in the cell corresponding to coord and in adjacent cells are returned. Either a single radius can be given as scalar, or individual radii for each position in coord can be provided as ndarray. By default, atoms are searched in the cell of coord and directly adjacent cells (cell_radius = 1).
- as_mask : bool, optional
If true, the result is returned as a boolean mask instead of an index array.
- Returns:
- indices : ndarray, dtype=int32, shape=(p,) or shape=(m,p)
The indices of the atom array, where the atoms are in the defined radius around coord. If coord contains multiple positions, this return value is two-dimensional with trailing -1 values for empty values. Only returned with as_mask set to false.
- mask : ndarray, dtype=bool, shape=(m,n) or shape=(n,)
Same as indices, but as boolean mask. The values are true for atoms in the atom array that are in the defined vicinity. Only returned with as_mask set to true.
Notes
In case of a CellList with periodic set to True: If more than one periodic copy of an atom is within the threshold radius, the returned indices array contains the corresponding index multiple times. Please use numpy.unique() if this is undesirable.
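The cell radius used by get_atoms_in_cells() can be pictured as a block of cells around the cell containing coord. The sketch below assumes a three-dimensional grid in which the radius applies along each axis, so a cell radius r covers (2r + 1)³ cells. The helper `cell_offsets` is hypothetical and not part of Biotite.

```python
from itertools import product


def cell_offsets(cell_radius):
    """List the relative (i, j, k) cell offsets covered by a cell radius."""
    span = range(-cell_radius, cell_radius + 1)
    return list(product(span, repeat=3))


# Number of cells inspected for increasing cell radii
for r in (0, 1, 2):
    print(r, len(cell_offsets(r)))
```

Because the number of inspected cells grows cubically with the cell radius, a small cell_radius combined with a suitable cell_size keeps each query cheap.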
Gallery
Construction of an adjacency matrix
Contact sites of protein-DNA interaction
Identification of lipid bilayer leaflets
Cavity solvation in different states of HCN4