biotite.sequence.GeneralSequence

class biotite.sequence.GeneralSequence(alphabet, sequence=())

Bases: Sequence

This class allows the creation of a sequence with a custom Alphabet without the need to subclass Sequence.

Parameters

alphabet (Alphabet): The alphabet of this sequence.
sequence (iterable object, optional): The symbol sequence the Sequence is initialized with. For alphabets containing single-letter strings, this parameter may also be a str object. By default, the sequence is empty.

as_type(sequence)

Convert the GeneralSequence into a sequence of another Sequence type.

This function simply replaces the sequence code of the given sequence with the sequence code of this object.

Parameters

sequence (Sequence): The Sequence whose sequence code is replaced with that of this object. Its alphabet must equal or extend the alphabet of this object.

Returns

sequence (Sequence): The input sequence with the replaced sequence code.

Raises

AlphabetError: If the Alphabet of the input sequence does not extend the Alphabet of this sequence.

copy(new_seq_code=None)

Copy the object.

Parameters

new_seq_code (ndarray, optional): If this parameter is set, the sequence code of the copy is set to this value rather than to the original sequence code.

Returns

copy: A copy of this object.

static dtype(alphabet_size)

Get the sequence code dtype required for the given size of the alphabet.

Parameters

alphabet_size (int): The size of the alphabet.

Returns

dtype: The dtype that is large enough to store the symbol codes of an Alphabet of the given size.

get_alphabet()

Get the Alphabet of the Sequence.

This method must be overridden when subclassing Sequence.

Returns

alphabet (Alphabet): The sequence alphabet.

get_symbol_frequency()

Get the number of occurrences of each symbol in the sequence.

If a symbol is part of the alphabet but does not occur in the sequence, its number of occurrences is 0.

Returns

frequency (dict): A dictionary containing the symbols as keys and the corresponding number of occurrences in the sequence as values.

is_valid()

Check if the sequence contains a valid sequence code.

A sequence code is valid if the code at each sequence position is smaller than the size of the alphabet.

Invalid code cannot be decoded into symbols. Furthermore, invalid code can lead to serious errors in alignments, since the substitution matrix would be indexed with an invalid index.

Returns

valid (bool): True if the sequence is valid, False otherwise.

reverse(copy=True)

Reverse the Sequence.

Parameters

copy (bool, optional): If set to False, the code ndarray of the returned sequence is an array view of the sequence code of this object. In this case, manipulations of the returned sequence also affect this object. Otherwise (the default), the sequence code is copied.

Returns

reversed (Sequence): The reversed Sequence.

Examples

>>> from biotite.sequence import NucleotideSequence
>>> dna_seq = NucleotideSequence("ACGTA")
>>> dna_seq_rev = dna_seq.reverse()
>>> print(dna_seq_rev)
ATGCA

Gallery

Structural alignment of lysozyme variants using ‘Protein Blocks’
