get_sequence

biotite.structure.io.pdbx.get_sequence(pdbx_file, data_block=None)

Get the protein and nucleotide sequences from the entity_poly.pdbx_seq_one_letter_code_can entry.

Supported polymer types (_entity_poly.type) are: 'polypeptide(D)', 'polypeptide(L)', 'polydeoxyribonucleotide', 'polyribonucleotide' and 'polydeoxyribonucleotide/polyribonucleotide hybrid'. Uracil is converted to Thymine.

Parameters:
pdbx_file : CIFFile or CIFBlock or BinaryCIFFile or BinaryCIFBlock

The file object.

data_block : str, optional

The name of the data block. Defaults to the first (and usually the only) data block of the file. If a data block object is passed directly as pdbx_file, this parameter is ignored.

Returns:
sequence_dict : Dictionary of Sequences

Dictionary keys are derived from entity_poly.pdbx_strand_id (equivalent to atom_site.auth_asym_id). Dictionary values are sequences.

Notes

The entity_poly.pdbx_seq_one_letter_code_can field contains the initial complete sequence. If the structure represents a truncated or spliced version of this initial sequence, the structure will include only a subset of it. Use biotite.structure.get_residues to retrieve only the residues that are represented in the structure.