- class biotite.sequence.Alphabet(symbols)
This class defines the allowed symbols for a Sequence and handles the encoding/decoding between symbols and symbol codes.
An Alphabet is created with the list of symbols that can be used in this context. In most cases a symbol is simply a letter, hence a string of length 1, but in principle every hashable Python object can serve as a symbol.
The encoding of a symbol into a symbol code works as follows: find the first index in the symbol list where the list element equals the symbol. This index is the symbol code. If the symbol is not found in the list, an exception is raised.
Internally, a dictionary is used for encoding, with symbols as keys and symbol codes as values. Therefore, every symbol must be hashable. For decoding the symbol list is indexed with the symbol code.
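The scheme described above can be sketched in plain Python. This is only an illustration of the idea, not biotite's actual implementation: the class name `SketchAlphabet`, its attributes, and the use of `ValueError` are assumptions made for this example.

```python
class SketchAlphabet:
    """Hypothetical stand-in that mimics the described encoding scheme."""

    def __init__(self, symbols):
        # Keep the symbols in order; the index of a symbol is its code
        self._symbols = list(symbols)
        # Build the lookup table; setdefault() keeps the first index
        # if a symbol appears more than once
        self._codes = {}
        for code, symbol in enumerate(self._symbols):
            self._codes.setdefault(symbol, code)

    def encode(self, symbol):
        # Dictionary lookup: symbol -> symbol code
        try:
            return self._codes[symbol]
        except KeyError:
            # ValueError is a placeholder chosen for this sketch
            raise ValueError(
                f"Symbol {symbol!r} is not in the alphabet"
            ) from None

    def decode(self, code):
        # List indexing: symbol code -> symbol
        return self._symbols[code]


alph = SketchAlphabet(["A", "C", "G", "T"])
print(alph.encode("G"))  # 2
print(alph.decode(2))    # G
```

Because the dictionary keys are the symbols themselves, an unhashable object such as a list cannot be used as a symbol in this sketch, which mirrors the hashability requirement stated above.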
If an alphabet 1 contains the same symbols and the same symbol-to-code mappings as another alphabet 2, but alphabet 1 also introduces new symbols, then alphabet 1 extends alphabet 2. By definition, every alphabet also extends itself.
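One way to read this definition: since a symbol's code is its index, alphabet 1 extends alphabet 2 exactly when the symbol list of alphabet 2 is a prefix of the symbol list of alphabet 1. The following sketch checks this on plain lists; the function name `extends_sketch` is hypothetical and the code is not taken from the library.

```python
def extends_sketch(child_symbols, parent_symbols):
    """Return True if the child symbol list extends the parent one."""
    child = list(child_symbols)
    parent = list(parent_symbols)
    # The parent must be a prefix of the child: every parent symbol
    # keeps its code, and new symbols may only be appended at the end.
    # If the parent is longer, the slice is shorter than the parent
    # and the comparison is False.
    return child[:len(parent)] == parent


print(extends_sketch("ACGT", "ACGT"))   # True
print(extends_sketch("ACGTU", "ACGT"))  # True
print(extends_sketch("ACGT", "ACGTU"))  # False
print(extends_sketch("ACGT", "ACTG"))   # False
```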
Objects of this class are immutable.
- symbols : iterable object
The symbols that are allowed in this alphabet. The code of a symbol is the index of that symbol in this list.
Create an Alphabet containing DNA letters and encode/decode a letter/code:
>>> alph = Alphabet(["A","C","G","T"])
>>> print(alph.encode("G"))
2
>>> print(alph.decode(2))
G
>>> try:
...     alph.encode("foo")
... except Exception as e:
...     print(e)
Symbol 'foo' is not in the alphabet
Create an Alphabet of arbitrary objects:
>>> alph = Alphabet(["foo", 42, (1,2,3), 5, 3.141])
>>> print(alph.encode((1,2,3)))
2
>>> print(alph.decode(4))
3.141
On the subject of alphabet extension: An alphabet always extends itself.
>>> Alphabet(["A","C","G","T"]).extends(Alphabet(["A","C","G","T"]))
True
An alphabet extends an alphabet when it contains additional symbols…
>>> Alphabet(["A","C","G","T","U"]).extends(Alphabet(["A","C","G","T"]))
True
…but not vice versa
>>> Alphabet(["A","C","G","T"]).extends(Alphabet(["A","C","G","T","U"]))
False
Two alphabets with the same symbols but different symbol-to-code mappings do not extend each other:
>>> Alphabet(["A","C","G","T"]).extends(Alphabet(["A","C","T","G"]))
False
- decode(code)
Use the alphabet to decode a symbol code.
The symbol code to be decoded.
The symbol corresponding to code.
If code is not a valid code in the alphabet.
- decode_multiple(code)
Decode a sequence code into a list of symbols.
The sequence code to decode.
The decoded list of symbols.
- encode(symbol)
Use the alphabet to encode a symbol.
The object to encode into a symbol code.
The symbol code of symbol.
If symbol is not in the alphabet.
- encode_multiple(symbols, dtype=<class 'numpy.int64'>)
Encode a list of symbols.
The symbols to encode.
- dtype : dtype, optional
The dtype of the output ndarray. (Default: int64)
The sequence code.
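Encoding or decoding several symbols at once amounts to applying the single-symbol mapping element by element. The sketch below shows this round trip with plain Python lists so that it has no dependencies; the library method instead returns an ndarray with the requested dtype. The names `encode_multiple_sketch` and `decode_multiple_sketch` are hypothetical.

```python
symbols = ["A", "C", "G", "T"]
# Symbol -> code lookup table, the code being the index in the list
codes = {symbol: code for code, symbol in enumerate(symbols)}


def encode_multiple_sketch(seq):
    # Encode every symbol of the input into its symbol code
    return [codes[symbol] for symbol in seq]


def decode_multiple_sketch(seq_code):
    # Decode every symbol code back into its symbol
    return [symbols[code] for code in seq_code]


seq_code = encode_multiple_sketch("GATTACA")
print(seq_code)                                    # [2, 0, 3, 3, 0, 1, 0]
print("".join(decode_multiple_sketch(seq_code)))   # GATTACA
```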
- extends(alphabet)
Check whether this alphabet extends another alphabet.
The potential parent alphabet.
True if this object extends alphabet, False otherwise.
- get_symbols()
Get the symbols in the alphabet.
Copy of the internal list of symbols.
- is_letter_alphabet()
Check whether the symbols in this alphabet are single printable letters. If so, the alphabet could be expressed by a LetterAlphabet.
True, if all symbols in the alphabet are ‘str’ or ‘bytes’, have length 1 and are printable.
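The three conditions (type, length, printability) can be checked symbol by symbol. The sketch below is one possible reading, not the library's code: the function name is hypothetical, and "printable" is assumed here to mean an ASCII code from 33 to 126, which may differ from the set of characters the library accepts.

```python
def is_letter_alphabet_sketch(symbols):
    """Return True if all symbols are single printable letters."""
    for symbol in symbols:
        # Only 'str' and 'bytes' symbols of length 1 qualify
        if not isinstance(symbol, (str, bytes)) or len(symbol) != 1:
            return False
        # ord() accepts a length-1 str as well as a length-1 bytes object.
        # Assumption: printable means ASCII 33..126 (no whitespace).
        if not 33 <= ord(symbol) <= 126:
            return False
    return True


print(is_letter_alphabet_sketch(["A", "C", "G", "T"]))  # True
print(is_letter_alphabet_sketch([b"A", b"C"]))          # True
print(is_letter_alphabet_sketch(["foo", 42]))           # False
print(is_letter_alphabet_sketch(["A", "\n"]))           # False
```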