LetterAlphabet

class biotite.sequence.LetterAlphabet(symbols)

Bases: Alphabet

LetterAlphabet is an Alphabet subclass specialized for letter-based alphabets, such as DNA or protein sequence alphabets. The alphabet size is limited to the 94 printable, non-whitespace characters. Internally, the symbols are stored as bytes objects. Encoding and decoding are much faster than with a plain Alphabet.

The performance gain comes from using NumPy and Cython for encoding and decoding, which removes the need for a dictionary.

Parameters:
symbols : iterable object or str or bytes

The symbols that are allowed in this alphabet. The code for a symbol is the index of that symbol in this list.
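To illustrate how a symbol's code can be its position in the symbol list, and how encoding can work by table lookup rather than dictionary lookup, here is a minimal pure-Python sketch. It is not biotite's implementation, which according to the description uses NumPy and Cython. The helper names `build_encode_table` and `encode_letters` and the sentinel value 255 are assumptions made for this illustration.

```python
ILLEGAL = 255  # hypothetical sentinel; never a valid code, since at most 94 symbols exist


def build_encode_table(symbols: bytes) -> bytes:
    """Map each byte value to its index in `symbols` (hypothetical helper)."""
    table = bytearray([ILLEGAL] * 256)
    for code, byte_value in enumerate(symbols):
        table[byte_value] = code
    return bytes(table)


def encode_letters(text: str, table: bytes) -> list[int]:
    """Encode ASCII text by table lookup instead of dictionary lookup."""
    codes = text.encode("ascii").translate(table)
    if ILLEGAL in codes:
        raise ValueError("text contains a symbol that is not in the alphabet")
    return list(codes)


table = build_encode_table(b"ACGT")
codes = encode_letters("GATTACA", table)  # each code is the symbol's index in b"ACGT"
```

Because every symbol is a single byte, a 256-entry table covers all possible inputs, and a symbol outside the alphabet shows up as the sentinel.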

decode(code, as_bytes=False)

Use the alphabet to decode a symbol code.

Parameters:
code : int

The symbol code to be decoded.

Returns:
symbol : object

The symbol corresponding to code.

Raises:
AlphabetError

If code is not a valid code in the alphabet.

decode_multiple(code, as_bytes=False)

Decode a sequence code into a list of symbols.

Parameters:
code : ndarray, dtype=uint8

The sequence code to decode. Works fastest if an ndarray is provided.

as_bytes : bool, optional

If true, the output array will contain bytes (dtype 'S1'). Otherwise, the output array will contain str (dtype 'U1').

Returns:
symbols : ndarray, dtype='U1' or dtype='S1'

The decoded list of symbols.
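The effect of `as_bytes` can be pictured with a small pure-Python sketch. This is only an illustration under stated assumptions: `decode_codes` is a hypothetical helper, it returns plain lists rather than an ndarray, and it raises `ValueError` where the library documents an `AlphabetError`.

```python
def decode_codes(codes, symbols: bytes, as_bytes: bool = False):
    """Turn symbol codes back into single-character symbols (hypothetical helper)."""
    decoded = []
    for code in codes:
        if not 0 <= code < len(symbols):
            raise ValueError(f"{code} is not a valid code")
        # Slicing keeps a length-1 bytes object; plain indexing would give an int
        decoded.append(symbols[code:code + 1])
    if as_bytes:
        return decoded
    return [symbol.decode("ascii") for symbol in decoded]


as_str = decode_codes([2, 0, 3], b"ACGT")
as_raw = decode_codes([2, 0, 3], b"ACGT", as_bytes=True)
```

The list of `str` corresponds to the 'U1' output and the list of length-1 `bytes` to the 'S1' output.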

encode(symbol)

Use the alphabet to encode a symbol.

Parameters:
symbol : object

The object to encode into a symbol code.

Returns:
code : int

The symbol code of symbol.

Raises:
AlphabetError

If symbol is not in the alphabet.

encode_multiple(symbols, dtype=None)

Encode multiple symbols.

Parameters:
symbols : iterable object or str or bytes

The symbols to encode. The method is fastest when an ndarray, str or bytes object containing the symbols is provided, instead of e.g. a list.

dtype : dtype, optional

For compatibility with the superclass. The value is ignored.

Returns:
code : ndarray

The sequence code.

extends(alphabet)

Check whether this alphabet extends another alphabet.

Parameters:
alphabet : Alphabet

The potential parent alphabet.

Returns:
result : bool

True if this object extends alphabet, False otherwise.
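The text here does not define what "extends" means. One plausible reading, treated as an assumption in the sketch below, is that the parent alphabet's symbols form a prefix of this alphabet's symbols, so every code that is valid in the parent keeps its meaning. `symbols_extend` is a hypothetical helper operating on plain symbol sequences, not the library method.

```python
def symbols_extend(child_symbols, parent_symbols) -> bool:
    """Return True if parent_symbols is a prefix of child_symbols.

    Assumption: this prefix rule is what "extends" means; the
    documentation above does not spell it out.
    """
    child = tuple(child_symbols)
    parent = tuple(parent_symbols)
    if len(parent) > len(child):
        return False
    return child[:len(parent)] == parent


same_order = symbols_extend("ACGTN", "ACGT")   # parent is a prefix
reordered = symbols_extend("NACGT", "ACGT")    # same symbols, shifted codes
```

Under this reading, merely containing the same symbols is not enough: a different order changes the codes, so sequence codes would no longer be interchangeable.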

get_symbols()

Get the symbols in the alphabet.

Returns:
symbols : tuple

The symbols.

is_letter_alphabet()

Check whether the symbols in this alphabet are single printable letters. If so, the alphabet could be expressed by a LetterAlphabet.

Returns:
is_letter_alphabet : bool

True if all symbols in the alphabet are 'str' or 'bytes', have length 1, and are printable.
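The three conditions can be spelled out in a short standard-library sketch. It assumes that "printable" refers to the 94 non-whitespace printable ASCII characters, matching the size limit stated in the class description. `could_be_letter_alphabet` and `PRINTABLES` are hypothetical names, not part of the library.

```python
import string

# Assumption: "printable" means the 94 non-whitespace printable ASCII characters
PRINTABLES = set(string.digits + string.ascii_letters + string.punctuation)


def could_be_letter_alphabet(symbols) -> bool:
    """Check type, length and printability of every symbol (hypothetical helper)."""
    for symbol in symbols:
        if not isinstance(symbol, (str, bytes)):
            return False
        if len(symbol) != 1:
            return False
        # latin-1 maps every byte value to one character, so decoding cannot fail
        char = symbol.decode("latin-1") if isinstance(symbol, bytes) else symbol
        if char not in PRINTABLES:
            return False
    return True


dna_ok = could_be_letter_alphabet(["A", "C", "G", "T"])
codon_ok = could_be_letter_alphabet(["ATG", "TAA"])  # symbols longer than one letter
```

An alphabet of multi-letter symbols, such as codons, fails the length check and therefore could not be expressed by a LetterAlphabet.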