# biotite.sequence.Alphabet

class biotite.sequence.Alphabet(symbols)

Bases: object

This class defines the allowed symbols for a Sequence and handles the encoding/decoding between symbols and symbol codes.

An Alphabet is created from a list of the symbols that can be used in this context. In most cases a symbol is simply a letter, i.e. a string of length 1, but in principle every hashable Python object can serve as a symbol.

A symbol is encoded into a symbol code as follows: find the first index in the symbol list at which the list element equals the symbol. This index is the symbol code. If the symbol is not found in the list, an AlphabetError is raised.

Internally, a dictionary is used for encoding, with symbols as keys and symbol codes as values; therefore, every symbol must be hashable. For decoding, the symbol list is indexed with the symbol code.
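
The encoding and decoding rules above can be illustrated with a minimal pure-Python sketch. The class `SketchAlphabet` and the exception class `AlphabetError` defined here are hypothetical stand-ins written for this illustration; they are not biotite's implementation. They only assume the behavior described above: the first matching index wins, a dictionary handles encoding, and list indexing handles decoding.

```python
class AlphabetError(Exception):
    """Hypothetical stand-in for biotite's AlphabetError."""


class SketchAlphabet:
    """Hypothetical minimal alphabet illustrating the documented rules."""

    def __init__(self, symbols):
        # Store the symbols as an immutable tuple; a symbol's code is its position.
        self._symbols = tuple(symbols)
        # Build the encoding dictionary. setdefault() keeps the FIRST index
        # of a symbol, so a duplicated symbol maps to its first occurrence.
        self._symbol_dict = {}
        for code, symbol in enumerate(self._symbols):
            self._symbol_dict.setdefault(symbol, code)

    def encode(self, symbol):
        # Dictionary lookup; this is why every symbol must be hashable.
        try:
            return self._symbol_dict[symbol]
        except KeyError:
            raise AlphabetError(f"'{symbol}' is not in the alphabet") from None

    def decode(self, code):
        # Decoding is plain indexing into the symbol tuple.
        return self._symbols[code]


alph = SketchAlphabet(["A", "C", "G", "T"])
print(alph.encode("G"))  # 2
print(alph.decode(2))    # G
print(SketchAlphabet(["A", "C", "A"]).encode("A"))  # 0 (first occurrence wins)
```

With a dictionary, encoding is an average O(1) lookup instead of a linear scan of the symbol list.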

If alphabet 1 contains the same symbols with the same symbol-code mappings as alphabet 2, but also introduces new symbols, then alphabet 1 extends alphabet 2. By definition, every alphabet also extends itself.
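
Because a symbol's code is its index in the symbol list, "same symbols with the same codes" amounts to a prefix check: the parent's symbol list must be a prefix of the child's. The sketch below uses a hypothetical helper `extends` on plain lists. It illustrates the definition above and is not biotite's own code.

```python
def extends(child_symbols, parent_symbols):
    """Hypothetical helper: does an alphabet with child_symbols extend one with parent_symbols?"""
    child = list(child_symbols)
    parent = list(parent_symbols)
    # Codes are indices, so the shared symbols must occupy the same leading positions.
    return child[:len(parent)] == parent


dna = ["A", "C", "G", "T"]
ambiguous_dna = ["A", "C", "G", "T", "N"]
print(extends(ambiguous_dna, dna))         # True: 'N' is appended after the shared codes
print(extends(dna, ambiguous_dna))         # False: 'N' is missing
print(extends(dna, dna))                   # True: every alphabet extends itself
print(extends(["C", "A", "G", "T"], dna))  # False: same symbols, different codes
```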

Objects of this class are immutable.

Parameters:

- symbols (iterable object): The symbols that are allowed in this alphabet. The code for a symbol is the index of that symbol in this list.

Examples

Create an Alphabet containing DNA letters and encode/decode a letter/code:

```python
>>> from biotite.sequence import Alphabet
>>> alph = Alphabet(["A","C","G","T"])
>>> print(alph.encode("G"))
2
>>> print(alph.decode(2))
G
>>> try:
...     alph.encode("foo")
... except Exception as e:
...     print(e)
'foo' is not in the alphabet
```


Create an Alphabet of arbitrary objects:

```python
>>> alph = Alphabet(["foo", 42, (1,2,3), 5, 3.141])
>>> print(alph.encode((1,2,3)))
2
>>> print(alph.decode(4))
3.141
```

get_symbols()

Get the symbols in the alphabet.

Returns:

- symbols (list): A copy of the internal list of symbols.
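
Returning a copy keeps an immutable object safe from callers who modify the returned list. The following is a minimal sketch of this defensive-copy pattern. It uses a hypothetical `SymbolHolder` class, not biotite's code.

```python
class SymbolHolder:
    """Hypothetical class illustrating the defensive-copy pattern."""

    def __init__(self, symbols):
        # Keep an internal tuple so the stored symbols cannot be mutated in place.
        self._symbols = tuple(symbols)

    def get_symbols(self):
        # Return a fresh list on every call; callers may modify it freely.
        return list(self._symbols)


holder = SymbolHolder(["A", "C", "G", "T"])
symbols = holder.get_symbols()
symbols.append("N")          # modifies only the returned copy
print(holder.get_symbols())  # ['A', 'C', 'G', 'T']
```
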

extends(alphabet)

Check whether this alphabet extends another alphabet.

Parameters:

- alphabet (Alphabet): The potential parent alphabet.

Returns:

- result (bool): True if this object extends `alphabet`, False otherwise.

encode(symbol)

Use the alphabet to encode a symbol.

Parameters:

- symbol (object): The object to encode into a symbol code.

Returns:

- code (int): The symbol code of `symbol`.

Raises:

- AlphabetError: If `symbol` is not in the alphabet.

decode(code)

Use the alphabet to decode a symbol code.

Parameters:

- code (int): The symbol code to be decoded.

Returns:

- symbol (object): The symbol corresponding to `code`.

Raises:

- AlphabetError: If `code` is not a valid code in the alphabet.
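
Because decoding indexes a list, a bare index lookup would not reject every invalid code: Python treats negative indices as offsets from the end. The sketch below shows an explicit range check that turns such codes into an error, matching the documented AlphabetError behavior. It uses a hypothetical function `decode_code` and a stand-in `AlphabetError`. The error message is an assumption chosen for illustration.

```python
class AlphabetError(Exception):
    """Hypothetical stand-in for biotite's AlphabetError."""


def decode_code(symbols, code):
    """Hypothetical helper: decode one symbol code against a symbol list."""
    # Reject out-of-range codes explicitly; without this check,
    # symbols[-1] would silently return the last symbol.
    if not 0 <= code < len(symbols):
        raise AlphabetError(f"'{code}' is not a valid code")
    return symbols[code]


symbols = ["A", "C", "G", "T"]
print(decode_code(symbols, 3))  # T
try:
    decode_code(symbols, -1)
except AlphabetError as e:
    print(e)  # '-1' is not a valid code
```
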

encode_multiple(symbols, dtype=numpy.int64)

Encode a list of symbols.

Parameters:

- symbols (array-like): The symbols to encode.
- dtype (dtype, optional): The dtype of the output ndarray. (Default: int64)

Returns:

- code (ndarray): The sequence code.
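
The effect of the `dtype` parameter can be sketched with NumPy. A smaller integer type such as `uint8` is enough for alphabets with at most 256 symbols and uses an eighth of the memory of `int64`. The helper `encode_multiple` below is hypothetical. It takes the alphabet's symbol list explicitly and is not biotite's implementation.

```python
import numpy as np


def encode_multiple(symbols, alphabet_symbols, dtype=np.int64):
    """Hypothetical helper: encode an iterable of symbols into an ndarray of codes."""
    # Map each alphabet symbol to the index of its first occurrence.
    symbol_dict = {}
    for code, symbol in enumerate(alphabet_symbols):
        symbol_dict.setdefault(symbol, code)
    # Look up every symbol; a KeyError signals a symbol outside the alphabet.
    codes = [symbol_dict[symbol] for symbol in symbols]
    # Convert to the requested integer dtype.
    return np.array(codes, dtype=dtype)


code = encode_multiple("GATTACA", ["A", "C", "G", "T"], dtype=np.uint8)
print(code)        # [2 0 3 3 0 1 0]
print(code.dtype)  # uint8
```
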

decode_multiple(code)

Decode a sequence code into a list of symbols.

Parameters:

- code (ndarray): The sequence code to decode.

Returns:

- symbols (list): The decoded list of symbols.
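
Decoding a whole code array can be sketched with NumPy fancy indexing on an object array, which can hold arbitrary symbols such as strings, integers or tuples. The helper `decode_multiple` below is hypothetical and takes the alphabet's symbol list explicitly. Raising `IndexError` for invalid codes is an assumption of this sketch, not documented biotite behavior.

```python
import numpy as np


def decode_multiple(code, alphabet_symbols):
    """Hypothetical helper: decode an array of symbol codes into a list of symbols."""
    # Copy the symbols into a 1-D object array element by element, so that
    # every symbol (string, int, tuple, ...) is stored as a single object.
    symbol_array = np.empty(len(alphabet_symbols), dtype=object)
    for i, symbol in enumerate(alphabet_symbols):
        symbol_array[i] = symbol
    code = np.asarray(code)
    # Reject invalid codes explicitly; NumPy would wrap negative codes around.
    if code.size > 0 and (code.min() < 0 or code.max() >= len(symbol_array)):
        raise IndexError("symbol code out of range")
    # Fancy indexing decodes all codes in a single operation.
    return symbol_array[code].tolist()


print("".join(decode_multiple(np.array([2, 0, 3, 3, 0, 1, 0]), ["A", "C", "G", "T"])))  # GATTACA
print(decode_multiple(np.array([2, 4]), ["foo", 42, (1, 2, 3), 5, 3.141]))  # [(1, 2, 3), 3.141]
```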