PositionalSequence
- class biotite.sequence.PositionalSequence(original_sequence)
Bases: Sequence
A sequence where each symbol is associated with a position.
For each individual position, the sequence contains a separate PositionalSequence.Symbol, encoded by a custom alphabet for this sequence. In consequence, the symbol code is the position in the sequence itself. This is useful for aligning sequences based on a position-specific substitution matrix.
- Parameters:
- original_sequence : seq.Sequence
The original sequence to create the positional sequence from.
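The idea described above can be illustrated with a minimal sketch in plain Python. It does not use biotite: `PosSymbol` and `to_positional` are hypothetical names, and the sketch only models the statement that every position gets its own alphabet entry, so that the code of a position equals its index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PosSymbol:
    """Hypothetical stand-in for a positional symbol (not biotite's class)."""

    symbol: str    # symbol from the original alphabet
    position: int  # 0-based position in the sequence


def to_positional(symbols):
    """Build a per-sequence alphabet and the matching sequence code.

    Each position contributes one alphabet entry, so the code at a
    position is simply the index of that position.
    """
    alphabet = [PosSymbol(s, i) for i, s in enumerate(symbols)]
    code = list(range(len(alphabet)))
    return alphabet, code


alphabet, code = to_positional("ACGA")
# Both 'A' symbols are distinct alphabet entries, as their positions differ
print(alphabet[0] == alphabet[3])  # False
print(code)  # [0, 1, 2, 3]
```

Because every position has its own code, a score table indexed by code can assign different scores to the same symbol at different positions.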
- class Symbol(original_alphabet: ..., original_code: ..., position: ...)
Bases: object
Combination of a symbol and its position in a sequence.
See also
PositionalSequence
The sequence type containing PositionalSequence.Symbol objects.
- Attributes:
- original_alphabet : Alphabet
The original alphabet, where the symbol stems from.
- original_code : int
The code of the original symbol in the original alphabet.
- position : int
The 0-based position of the symbol in the sequence.
- symbol : object
The symbol from the original alphabet.
- copy(new_seq_code=None)
Copy the object.
- Parameters:
- new_seq_code : ndarray, optional
If this parameter is set, the sequence code is set to this value, rather than the original sequence code.
- Returns:
- copy
A copy of this object.
- static dtype(alphabet_size)
Get the sequence code dtype required for the given size of the alphabet.
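As a rough illustration of what "required for the given size of the alphabet" could mean, the sketch below picks the narrowest unsigned integer type that can hold every code in `range(alphabet_size)`. This is an assumption, not taken from the biotite source: `smallest_uint_name` is a hypothetical helper, and it returns type names as strings rather than NumPy dtypes.

```python
def smallest_uint_name(alphabet_size):
    """Name the narrowest unsigned integer type for the given alphabet size.

    Hypothetical helper: the thresholds are an assumption and are not
    copied from biotite. Codes run from 0 to alphabet_size - 1, so an
    alphabet of 256 symbols still fits into 8 bits.
    """
    for bits in (8, 16, 32):
        if alphabet_size <= 2**bits:
            return f"uint{bits}"
    return "uint64"


print(smallest_uint_name(4))    # uint8
print(smallest_uint_name(256))  # uint8
print(smallest_uint_name(257))  # uint16
```

For a positional sequence the alphabet has one symbol per position, so under this assumption a sequence longer than 256 symbols would need more than 8 bits per code.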
- get_alphabet()
Get the Alphabet of the Sequence.
This method must be overridden when subclassing Sequence.
- Returns:
- alphabet : Alphabet
Sequence alphabet.
- get_symbol_frequency()
Get the number of occurrences of each symbol in the sequence.
If a symbol does not occur in the sequence but is in the alphabet, its number of occurrences is 0.
- Returns:
- frequency : dict
A dictionary containing the symbols as keys and the corresponding number of occurrences in the sequence as values.
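The counting behavior described here, including the zero entries for absent symbols, can be sketched in plain Python. `symbol_frequency` is a hypothetical helper operating on a list alphabet and a list of codes; it is not biotite's implementation.

```python
def symbol_frequency(alphabet, code):
    """Count how often each alphabet symbol occurs in a sequence code."""
    # Start every symbol at 0 so that absent symbols are still reported
    counts = {symbol: 0 for symbol in alphabet}
    for c in code:
        counts[alphabet[c]] += 1
    return counts


# Code [0, 1, 0, 0] over the alphabet A, C, G, T encodes "ACAA"
freq = symbol_frequency(["A", "C", "G", "T"], [0, 1, 0, 0])
print(freq)  # {'A': 3, 'C': 1, 'G': 0, 'T': 0}
```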
- is_valid()
Check whether the sequence contains a valid sequence code.
A sequence code is valid if, at each sequence position, the code is smaller than the size of the alphabet.
An invalid code cannot be decoded into symbols. Furthermore, an invalid code can lead to serious errors in alignments, since the substitution matrix would be indexed with an invalid index.
- Returns:
- valid : bool
True if the sequence is valid, False otherwise.
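The validity rule stated above amounts to a bounds check on every code. A minimal sketch with plain Python lists follows; `is_valid_code` is a hypothetical helper and not biotite's implementation.

```python
def is_valid_code(code, alphabet_size):
    """Check that every code can be used as an index into the alphabet."""
    # Plain Python ints can be negative, so the lower bound is checked too
    return all(0 <= c < alphabet_size for c in code)


print(is_valid_code([0, 3, 2], 4))  # True
print(is_valid_code([0, 4, 2], 4))  # False, 4 is not below the alphabet size
```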
- reconstruct()
Reconstruct the original sequence from the positional sequence.
- Returns:
- original_sequence : GeneralSequence
The original sequence. Although the actual type of the returned sequence is always GeneralSequence, the alphabet and the symbols of the returned sequence are equal to those of the original sequence.
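Conceptually, reconstruction only needs the original code stored with each positional symbol. The sketch below models this in plain Python under that assumption; `PosSymbol` and `reconstruct_code` are hypothetical names, and the sketch is not biotite's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PosSymbol:
    """Hypothetical stand-in for a positional symbol (not biotite's class)."""

    original_code: int  # code of the symbol in the original alphabet
    position: int       # 0-based position in the sequence


def reconstruct_code(pos_alphabet, pos_code):
    """Map positional codes back to codes of the original alphabet."""
    return [pos_alphabet[c].original_code for c in pos_code]


original_code = [0, 2, 1, 0]
pos_alphabet = [PosSymbol(c, i) for i, c in enumerate(original_code)]
pos_code = list(range(len(pos_alphabet)))

print(reconstruct_code(pos_alphabet, pos_code))  # [0, 2, 1, 0]
```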
- reverse(copy=True)
Reverse the Sequence.
- Parameters:
- copy : bool, optional
If set to False, the code ndarray of the returned sequence is an array view of the sequence code of this object. In this case, manipulations on the returned sequence would also affect this object. Otherwise, the sequence code is copied.
- Returns:
- reversed : Sequence
The reversed Sequence.
Examples
>>> dna_seq = NucleotideSequence("ACGTA")
>>> dna_seq_rev = dna_seq.reverse()
>>> print(dna_seq_rev)
ATGCA