to_sequence

biotite.structure.to_sequence(atoms, allow_hetero=False)

Convert each chain in a structure into a sequence.

Parameters:
atoms : AtomArray or AtomArrayStack

The structure. May contain multiple chains. Each chain must be either a peptide or a nucleic acid.

allow_hetero : bool, optional

If true, residues inside an amino acid or nucleotide chain that have no one-letter code are replaced by the respective ‘any’ symbol (“X” or “N”, respectively). The same applies to amino acids in nucleotide chains and vice versa. By default, an exception is raised.
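The effect of allow_hetero can be pictured with a small pure-Python model. This is only a sketch of the behavior described above, not biotite's implementation: the lookup table `ONE_LETTER`, the helper `chain_to_string`, the placeholder residue name "XYZ", and the choice of `ValueError` as the exception type are all assumptions made for illustration.

```python
# Toy model of the allow_hetero behavior for a single peptide chain.
# ONE_LETTER and chain_to_string are hypothetical and not part of biotite;
# the table covers only a handful of residues.
ONE_LETTER = {"ALA": "A", "GLY": "G", "LEU": "L", "SER": "S", "TRP": "W"}


def chain_to_string(res_names, allow_hetero=False, any_symbol="X"):
    """Map the three-letter residue names of one chain to a one-letter string."""
    symbols = []
    for name in res_names:
        symbol = ONE_LETTER.get(name)
        if symbol is None:
            if not allow_hetero:
                # Default: a residue without a one-letter code is an error.
                # ValueError is an assumption; the real exception type may differ.
                raise ValueError(f"Residue '{name}' has no one-letter code")
            # With allow_hetero=True the 'any' symbol is substituted instead
            symbol = any_symbol
        symbols.append(symbol)
    return "".join(symbols)


# "XYZ" is a made-up residue name with no entry in the toy table
print(chain_to_string(["GLY", "XYZ", "SER"], allow_hetero=True))  # GXS
```

For a nucleotide chain the same idea would apply with "N" as the substituted symbol.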

Returns:
sequences : list of Sequence, length=n

The sequence for each chain in the structure.

chain_start_indices : ndarray, shape=(n,), dtype=int

The atom index where each chain starts.
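One way the start indices might be used is to recover the atom range that belongs to each returned sequence. The sketch below assumes that chains are stored contiguously, so that each chain ends where the next one starts. The helper `chain_ranges` is hypothetical, and plain lists with made-up numbers stand in for the returned ndarray and the atom count.

```python
# Hypothetical helper: turn chain start indices into half-open (start, stop)
# atom ranges, assuming each chain ends where the next one begins.
def chain_ranges(chain_start_indices, n_atoms):
    # The stop of chain i is the start of chain i+1; the last chain
    # runs to the end of the structure.
    stops = list(chain_start_indices[1:]) + [n_atoms]
    return list(zip(chain_start_indices, stops))


# Made-up values: three chains in a structure of 900 atoms
print(chain_ranges([0, 304, 610], 900))
# [(0, 304), (304, 610), (610, 900)]
```

Under that assumption, the i-th range covers the atoms from which the i-th sequence was derived.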

Notes

Residues are considered amino acids or nucleotides based on whether their names appear in info.amino_acid_names() or info.nucleotide_names(), respectively.

Examples

>>> sequences, chain_starts = to_sequence(atom_array)
>>> print(sequences)
[ProteinSequence("NLYIQWLKDGGPSSGRPPPS")]