biotite.sequence.ProteinSequence

class biotite.sequence.ProteinSequence(sequence=())[source]

Bases: biotite.sequence.sequence.Sequence

Representation of a protein sequence.

This class also converts amino acid symbols between their 3-letter and 1-letter codes.

Parameters:
sequence : iterable object, optional

The initial protein sequence, given as either a string or a list. Letters may be upper or lower case. If a list is given, its elements can be 1-letter or 3-letter amino acid representations. By default, the sequence is empty.
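The accepted input forms can be illustrated with a small standalone sketch. It is not Biotite's implementation: the helper `normalize_symbols` and the deliberately partial table `THREE_TO_ONE` are hypothetical names introduced here, and the sketch assumes that every 3-character element is a 3-letter code.

```python
# Hypothetical, deliberately partial lookup table (illustration only).
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "SER": "S", "TRP": "W"}


def normalize_symbols(sequence):
    """Return a list of upper-case 1-letter symbols.

    `sequence` may be a string of 1-letter codes or a list that mixes
    1-letter and 3-letter codes in any letter case.
    """
    normalized = []
    for symbol in sequence:
        symbol = symbol.upper()
        # Assumption: an element of length 3 is a 3-letter code.
        if len(symbol) == 3:
            symbol = THREE_TO_ONE[symbol]
        normalized.append(symbol)
    return normalized


print(normalize_symbols("agsw"))
print(normalize_symbols(["Ala", "G", "ser", "TRP"]))
```

Both calls yield the same list of 1-letter symbols, which mirrors the statement that a string and a list of mixed representations describe the same sequence.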

get_alphabet()[source]

Get the Alphabet of the Sequence.

This method must be overridden when subclassing Sequence.

Returns:
alphabet : Alphabet

Sequence alphabet.

remove_stops()[source]

Remove stop signals from the sequence.

Returns:
no_stop : ProteinSequence

A copy of this sequence without stop signals.
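The copy-without-stops behavior can be sketched in plain Python. The sketch assumes that the stop signal is written as `*`; the names `STOP_SYMBOL` and `remove_stops` below are hypothetical and operate on a plain list of symbols, not on a ProteinSequence.

```python
# Assumption: the stop signal is represented by the symbol "*".
STOP_SYMBOL = "*"


def remove_stops(symbols):
    """Return a new list without stop symbols; the input is not modified."""
    return [s for s in symbols if s != STOP_SYMBOL]


original = list("MKT*AL*")
no_stop = remove_stops(original)

print("".join(no_stop))   # MKTAL
print("".join(original))  # MKT*AL* (the original is unchanged)
```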

static convert_letter_3to1(symbol)[source]

Convert a 3-letter to a 1-letter amino acid representation.

Parameters:
symbol : string

3-letter amino acid representation.

Returns:
convert : string

1-letter amino acid representation.
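Conceptually, this conversion is a table lookup. The following standalone sketch covers only the 20 standard amino acids and is not Biotite's own table; the names `THREE_TO_ONE`, `ONE_TO_THREE`, `three_to_one` and `one_to_three` are hypothetical, and the upper-case spelling of the 3-letter codes is an assumption of the sketch. Inverting the table gives the opposite direction, which corresponds to `convert_letter_1to3`.

```python
# Standard 3-letter to 1-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# The mapping is one-to-one, so it can be inverted without losing entries.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def three_to_one(symbol):
    """Convert a 3-letter code (any letter case) to its 1-letter code."""
    return THREE_TO_ONE[symbol.upper()]


def one_to_three(symbol):
    """Convert a 1-letter code (any letter case) to its 3-letter code."""
    return ONE_TO_THREE[symbol.upper()]


print(three_to_one("Trp"))  # W
print(one_to_three("w"))    # TRP
```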

copy(new_seq_code=None)

Copy the object.

Parameters:
new_seq_code : ndarray, optional

If this parameter is set, the sequence code is set to this value, rather than the original sequence code.

Returns:
copy

A copy of this object.
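The effect of `new_seq_code` can be illustrated with a minimal stand-in class. `TinySequence` is hypothetical and stores its code as a plain list rather than an ndarray; it only sketches the idea that a copy keeps the alphabet and either duplicates or replaces the sequence code.

```python
class TinySequence:
    """Hypothetical stand-in for a sequence that stores integer codes."""

    def __init__(self, alphabet, code):
        self.alphabet = alphabet
        self.code = list(code)  # always store an independent list

    def copy(self, new_seq_code=None):
        # Use the replacement code if one is given, else the current code.
        code = self.code if new_seq_code is None else new_seq_code
        return TinySequence(self.alphabet, code)

    def __str__(self):
        return "".join(self.alphabet[c] for c in self.code)


seq = TinySequence("ACDE", [0, 1, 2, 3])
same = seq.copy()
shorter = seq.copy(new_seq_code=[3, 0])

print(same, shorter)  # ACDE EA
```

A typical use of the optional argument is to derive a modified sequence, for example a filtered one, while keeping everything else about the object.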

get_symbol_frequency()

Get the number of occurrences of each symbol in the sequence.

If a symbol is in the alphabet but does not occur in the sequence, its number of occurrences is 0.

Returns:
frequency : dict

A dictionary containing the symbols as keys and the corresponding number of occurrences in the sequence as values.
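The zero-count rule for unused alphabet symbols can be sketched with the standard library. The function `symbol_frequency` is a hypothetical helper working on plain strings, not the method itself.

```python
from collections import Counter


def symbol_frequency(symbols, alphabet):
    """Count each alphabet symbol, reporting 0 for symbols that never occur."""
    counts = Counter(symbols)
    # A Counter returns 0 for missing keys, so absent symbols are included.
    return {symbol: counts[symbol] for symbol in alphabet}


freq = symbol_frequency("AACA", alphabet="ACD")
print(freq)  # {'A': 3, 'C': 1, 'D': 0}
```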

is_valid()

Check whether the sequence contains a valid sequence code.

A sequence code is valid if, at each sequence position, the code is smaller than the size of the alphabet.

An invalid code cannot be decoded into symbols. It can also lead to serious errors in alignments, because the substitution matrix would be indexed with an invalid index.

Returns:
valid : bool

True if the sequence is valid, False otherwise.
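The validity rule can be written down directly. This is a sketch on plain Python lists, assuming the codes are non-negative integers; the function name `is_valid_code` and the example alphabet are hypothetical.

```python
def is_valid_code(code, alphabet_size):
    """Return True if every code is smaller than the alphabet size.

    Assumption: codes are non-negative integers.
    """
    return all(c < alphabet_size for c in code)


alphabet = ["A", "C", "D", "E"]

print(is_valid_code([0, 3, 1], len(alphabet)))  # True
print(is_valid_code([0, 4, 1], len(alphabet)))  # False: 4 has no symbol
```

In the second call, code 4 has no symbol at that index in the four-letter alphabet, so it could neither be decoded nor used as a substitution matrix index.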

reverse()

Reverse the Sequence.

Returns:
reversed : Sequence

The reversed Sequence.

Examples

>>> from biotite.sequence import NucleotideSequence
>>> dna_seq = NucleotideSequence("ACGTA")
>>> dna_seq_rev = dna_seq.reverse()
>>> print(dna_seq_rev)
ATGCA

static convert_letter_1to3(symbol)[source]

Convert a 1-letter to a 3-letter amino acid representation.

Parameters:
symbol : string

1-letter amino acid representation.

Returns:
convert : string

3-letter amino acid representation.