biotite.structure.filter_highest_occupancy_altloc

biotite.structure.filter_highest_occupancy_altloc(atoms, altloc_ids, occupancies)

For each residue, select the atoms whose altloc ID has the highest occupancy in that residue.

Structure files (PDB, PDBx) allow duplicate atom records when a residue is found in multiple alternate locations (altlocs). This function removes such duplicate atoms by keeping a single altloc ID and discarding the atoms that belong to the other altlocs.

Parameters
atoms : AtomArray, shape=(n,) or AtomArrayStack, shape=(m,n)

The structure to be filtered.

altloc_ids : ndarray, shape=(n,), dtype='U1'

An array containing the alternate location IDs for each atom in atoms. Can contain '.', '?', ' ', '' or a letter at each position.

occupancies : ndarray, shape=(n,), dtype=float

An array containing the occupancy values for each atom in atoms.

Returns
filter : ndarray, dtype=bool

For each atom, this array is True in either of the following cases:

  • The atom has no altloc ID ('.', '?', ' ', '').

  • The atom has the altloc ID (e.g. 'A' or 'B') whose occupancy values are the highest within the atom's residue.
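The two cases above can be illustrated with a dependency-free sketch in plain Python. It is not the library's implementation: the helper name sketch_highest_occupancy_filter is hypothetical, residues are identified by res_id alone, and altlocs are compared by the sum of their occupancies within a residue, which is an assumption about how "highest" is evaluated.

```python
from collections import defaultdict

# Values that mean "this atom has no altloc ID"
NO_ALTLOC = {".", "?", " ", ""}


def sketch_highest_occupancy_filter(res_ids, altloc_ids, occupancies):
    """Return one boolean per atom (hypothetical helper, not biotite API)."""
    # Sum the occupancy for every (residue, altloc ID) pair.
    # Assumption: altlocs are ranked by their summed occupancy.
    totals = defaultdict(float)
    for res, alt, occ in zip(res_ids, altloc_ids, occupancies):
        if alt not in NO_ALTLOC:
            totals[(res, alt)] += occ

    # Pick the altloc ID with the largest total in each residue.
    best = {}
    for (res, alt), total in totals.items():
        if res not in best or total > totals[(res, best[res])]:
            best[res] = alt

    # Keep atoms without an altloc ID and atoms with the winning ID.
    return [
        alt in NO_ALTLOC or best.get(res) == alt
        for res, alt in zip(res_ids, altloc_ids)
    ]


mask = sketch_highest_occupancy_filter(
    res_ids=[1, 1, 1, 2, 2],
    altloc_ids=[".", "A", "B", "A", "B"],
    occupancies=[1.0, 0.1, 0.9, 0.6, 0.4],
)
print(mask)  # [True, False, True, True, False]
```

In residue 1 the atom without an altloc ID is always kept and altloc 'B' wins over 'A'. In residue 2 altloc 'A' wins. The choice is made independently for each residue.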

Notes

End users will rarely need this function, since this kind of filtering is usually performed automatically when a structure is loaded from a file. The exception is a structure that was read with altloc set to True.

Examples

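>>> import numpy as np
>>> from biotite.structure import Atom, array, filter_highest_occupancy_altloc
>>> atoms = array([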
...     Atom(coord=[1, 2, 3], res_id=1, atom_name="CA"),
...     Atom(coord=[4, 5, 6], res_id=1, atom_name="CB"),
...     Atom(coord=[6, 5, 4], res_id=1, atom_name="CB")
... ])
>>> altloc_ids = np.array([".", "A", "B"])
>>> occupancies = np.array([1.0, 0.1, 0.9])
>>> filtered = atoms[filter_highest_occupancy_altloc(
...     atoms, altloc_ids, occupancies
... )]
>>> print(filtered)
            1      CA               1.000    2.000    3.000
            1      CB               6.000    5.000    4.000