biotite.sequence.LetterAlphabet

class biotite.sequence.LetterAlphabet(symbols)

Bases: Alphabet

LetterAlphabet is an Alphabet subclass specialized for letter-based alphabets, such as DNA or protein sequence alphabets. The alphabet size is limited to the 94 printable, non-whitespace characters. Internally, the symbols are stored as bytes objects. Encoding and decoding are considerably faster than with a plain Alphabet.

The performance gain comes from using NumPy and Cython for encoding and decoding, so no dictionary lookup is needed.
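The dictionary-free idea can be illustrated with a small pure-NumPy sketch. This is not biotite's actual Cython code; the helpers `build_lookup`, `encode_text` and `decode_codes` are hypothetical. The sketch assumes ASCII symbols and uses a 256-entry table indexed by byte value, with 255 as a sentinel for bytes outside the alphabet (safe, because a letter alphabet has at most 94 symbols).

```python
import numpy as np

ILLEGAL = 255  # sentinel code: no letter alphabet can have 256 symbols


def build_lookup(symbols: bytes) -> np.ndarray:
    """Hypothetical helper: map every possible byte value to its symbol code."""
    table = np.full(256, ILLEGAL, dtype=np.uint8)
    byte_values = np.frombuffer(symbols, dtype=np.uint8)
    # The code of a symbol is its index in the symbol list
    table[byte_values] = np.arange(len(byte_values), dtype=np.uint8)
    return table


def encode_text(table: np.ndarray, text: str) -> np.ndarray:
    """Hypothetical helper: encode a whole string with one vectorized lookup."""
    raw = np.frombuffer(text.encode("ascii"), dtype=np.uint8)
    codes = table[raw]
    if np.any(codes == ILLEGAL):
        raise ValueError("text contains symbols outside the alphabet")
    return codes


def decode_codes(symbols: bytes, codes: np.ndarray) -> str:
    """Hypothetical helper: decoding is plain indexing into the symbol array."""
    byte_values = np.frombuffer(symbols, dtype=np.uint8)
    return byte_values[codes].tobytes().decode("ascii")


table = build_lookup(b"ACGT")
codes = encode_text(table, "GATTACA")
print(codes)                         # [2 0 3 3 0 1 0]
print(decode_codes(b"ACGT", codes))  # GATTACA
```

Both directions are a single array indexing operation over the whole sequence, which is why no per-symbol dictionary access is required.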

Parameters
symbols : iterable object or str or bytes

The symbols that are allowed in this alphabet. The code for a symbol is the index of that symbol in this list.

decode(code, as_bytes=False)

Use the alphabet to decode a symbol code.

Parameters
code : int

The symbol code to be decoded.

Returns
symbol : object

The symbol corresponding to code.

Raises
AlphabetError

If code is not a valid code in the alphabet.

decode_multiple(code, as_bytes=False)

Decode a sequence code into a list of symbols.

Parameters
code : ndarray, dtype=uint8

The sequence code to decode. Works fastest if an ndarray is provided.

as_bytes : bool, optional

If True, the output array will contain bytes (dtype 'S1'). Otherwise, the output array will contain str (dtype 'U1').

Returns
symbols : ndarray, dtype='U1' or dtype='S1'

The decoded list of symbols.

encode(symbol)

Use the alphabet to encode a symbol.

Parameters
symbol : object

The object to encode into a symbol code.

Returns
code : int

The symbol code of symbol.

Raises
AlphabetError

If symbol is not in the alphabet.

encode_multiple(symbols, dtype=None)

Encode multiple symbols.

Parameters
symbols : iterable object or str or bytes

The symbols to encode. The method is fastest when an ndarray, str or bytes object containing the symbols is provided, instead of e.g. a list.

dtype : dtype, optional

For compatibility with the superclass. The value is ignored.

Returns
code : ndarray

The sequence code.

extends(alphabet)

Check whether this alphabet extends another alphabet.

Parameters
alphabet : Alphabet

The potential parent alphabet.

Returns
result : bool

True if this object extends alphabet, False otherwise.

get_symbols()

Get the symbols in the alphabet.

Returns
symbols : list

Copy of the internal list of symbols.

is_letter_alphabet()

Check whether the symbols in this alphabet are single printable letters. If so, the alphabet could be expressed by a LetterAlphabet.

Returns
is_letter_alphabet : bool

True if all symbols in the alphabet are 'str' or 'bytes', have length 1 and are printable.