get_sse

biotite.structure.io.pdbx.get_sse(pdbx_file, data_block=None, match_model=None)

Get the secondary structure from a PDBx file.

Parameters:
pdbx_file : CIFFile or CIFBlock or BinaryCIFFile or BinaryCIFBlock

The file object. The following categories are required:

  • entity_poly

  • struct_conf (if alpha-helices are present)

  • struct_sheet_range (if beta-strands are present)

  • atom_site (if match_model is set)

data_block : str, optional

The name of the data block. By default, the first (and usually only) data block of the file is used. If a data block object is passed directly as pdbx_file, this parameter is ignored.

match_model : int, optional

If a model number is given, only secondary structure elements of residues that are resolved in the given model are kept. In other words, elements for residues that would not appear in a corresponding AtomArray from get_structure() are removed. By default, all residues in the sequence are kept.
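The effect of match_model can be pictured with a small plain-Python sketch. This is not biotite's implementation: the helper `keep_resolved` and the toy data are hypothetical, and the set of resolved IDs stands in for the label_seq_id values that appear in atom_site for the chosen model.

```python
def keep_resolved(sse, resolved_seq_ids):
    """Return the SSE codes of residues whose label_seq_id is resolved.

    `sse` is a per-residue sequence of codes in which position i belongs to
    label_seq_id i + 1 (hypothetical toy convention for this sketch);
    `resolved_seq_ids` is the set of label_seq_id values that have
    coordinates in the chosen model.
    """
    return [code for i, code in enumerate(sse) if i + 1 in resolved_seq_ids]


# Toy chain of 8 residues; residues 1 and 8 are assumed to be unresolved
full = ["c", "a", "a", "a", "c", "b", "b", "c"]
resolved = {2, 3, 4, 5, 6, 7}
print(keep_resolved(full, resolved))  # ['a', 'a', 'a', 'c', 'b', 'b']
```

In this toy example the filtered list is shorter than the full sequence, so its positions no longer line up with label_seq_id minus one.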

Returns:
sse_dict : dict of str -> ndarray, dtype=str

The dictionary maps each chain ID (derived from auth_asym_id) to the secondary structure of that chain, encoded as one character per residue:

  • "a": alpha-helix

  • "b": beta-strand

  • "c": coil or not an amino acid

Each element of the array corresponds to one label_seq_id of the atom_site category: position 0 of the array corresponds to the residue in atom_site with label_seq_id 1, position 1 to label_seq_id 2, and so on.
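Because position i maps to label_seq_id i + 1, a per-residue array can be collapsed into residue ranges. The following plain-Python sketch uses a hypothetical helper, `sse_segments`, and a toy list in place of a real chain. It assumes the array was obtained without match_model, since dropping unresolved residues could break the index-to-label_seq_id correspondence.

```python
from itertools import groupby


def sse_segments(sse):
    """Collapse a per-residue SSE array into contiguous segments.

    Returns (code, first_seq_id, last_seq_id) tuples. Position i of `sse`
    is assumed to correspond to label_seq_id i + 1 (no match_model filtering).
    Works on lists as well as on NumPy string arrays.
    """
    segments = []
    start = 0
    for code, run in groupby(sse):
        length = sum(1 for _ in run)
        # Convert 0-based array positions to 1-based label_seq_id values
        segments.append((str(code), start + 1, start + length))
        start += length
    return segments


# Toy input standing in for one chain of the get_sse() result
chain_sse = ["c", "c", "a", "a", "a", "c", "b", "b"]
for code, first, last in sse_segments(chain_sse):
    print(f"{code}: label_seq_id {first}-{last}")
```

For the toy input this prints four segments: coil at 1-2, helix at 3-5, coil at 6-6 and strand at 7-8.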

Examples

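>>> import os.path
>>> from biotite.structure import filter_amino_acids, get_residue_count
>>> from biotite.structure.io.pdbx import CIFFile, get_sse, get_structure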
>>> file = CIFFile.read(os.path.join(path_to_structures, "1aki.cif"))
>>> sse = get_sse(file, match_model=1)
>>> print(sse)
{'A': array(['c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a',
             'a', 'c', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'c', 'c', 'a', 'a',
             'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'c', 'c',
             'c', 'c', 'c', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'b', 'b',
             'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
             'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
             'c', 'a', 'a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a',
             'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'c', 'a',
             'a', 'a', 'a', 'c', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'c', 'c',
             'c', 'c', 'a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'c', 'c'],
             dtype='<U1')}
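A result like the one above can be summarized per chain as the fraction of residues in each class. The sketch below is plain Python with a hypothetical helper, `sse_composition`, and a toy dictionary; it relies only on the dictionary-of-arrays shape and the three one-letter codes documented for the return value.

```python
from collections import Counter

# One-letter codes as documented for the get_sse() return value
SSE_NAMES = {"a": "alpha-helix", "b": "beta-strand", "c": "coil or other"}


def sse_composition(sse_dict):
    """Return, for each chain, the fraction of residues in each SSE class."""
    composition = {}
    for chain_id, sse in sse_dict.items():
        # str() normalizes NumPy string scalars to plain Python strings
        counts = Counter(str(code) for code in sse)
        total = len(sse)
        composition[chain_id] = {
            name: (counts[code] / total if total else 0.0)
            for code, name in SSE_NAMES.items()
        }
    return composition


# Toy dictionary shaped like the get_sse() result (lists instead of ndarrays)
toy = {"A": ["c", "a", "a", "b", "c", "c", "a", "b"]}
print(sse_composition(toy))
# {'A': {'alpha-helix': 0.375, 'beta-strand': 0.25, 'coil or other': 0.375}}
```

Empty chains are reported with all fractions set to 0.0 rather than raising a division error.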

When match_model is set, so that only secondary structure elements of resolved residues are kept, the length of the returned array for a chain matches the number of amino acid residues of that chain in the structure.

>>> file = CIFFile.read(os.path.join(path_to_structures, "3o5r.cif"))
>>> print(len(get_sse(file, match_model=1)["A"]))
128
>>> atoms = get_structure(file, model=1)
>>> atoms = atoms[filter_amino_acids(atoms) & (atoms.chain_id == "A")]
>>> print(get_residue_count(atoms))
128