`biotite.sequence.align`

This subpackage provides functionality for sequence alignments.

The two central classes involved are SubstitutionMatrix and Alignment:

Every function that performs an alignment requires a SubstitutionMatrix that provides similarity scores for each symbol combination of two alphabets (usually both alphabets are equal). The alphabets in the SubstitutionMatrix must match or extend the alphabets of the sequences to be aligned.
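The scoring contract can be pictured as a lookup keyed by symbol pairs. Below is a minimal pure-Python sketch, not Biotite's implementation. The class `ToySubstitutionMatrix` and its methods are hypothetical, and the +1/-1 scores are made-up example values. It also checks that the matrix alphabets cover the sequence alphabets, which mirrors the requirement stated above. In Biotite itself, ready-made matrices are available, e.g. via `SubstitutionMatrix.std_protein_matrix()`.

```python
class ToySubstitutionMatrix:
    """Hypothetical stand-in for a substitution matrix (not Biotite's API)."""

    def __init__(self, alphabet1, alphabet2, scores):
        # scores: dict mapping (symbol1, symbol2) -> integer score
        self.alphabet1 = set(alphabet1)
        self.alphabet2 = set(alphabet2)
        self.scores = scores

    def score(self, symbol1, symbol2):
        return self.scores[(symbol1, symbol2)]

    def covers(self, seq1, seq2):
        # The matrix alphabets must match or extend the sequence alphabets
        return set(seq1) <= self.alphabet1 and set(seq2) <= self.alphabet2


# Toy nucleotide matrix: +1 for identical symbols, -1 otherwise
bases = "ACGT"
matrix = ToySubstitutionMatrix(
    bases, bases,
    {(a, b): (1 if a == b else -1) for a in bases for b in bases},
)
print(matrix.score("A", "A"), matrix.score("A", "G"))  # 1 -1
print(matrix.covers("ACGT", "GATTACA"))  # True
print(matrix.covers("ACGU", "GATTACA"))  # False ('U' is not in the alphabet)
```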

An alignment cannot be directly represented as a list of Sequence objects, since a gap indicates the absence of any symbol. Instead, the aligning functions return one or more Alignment instances. These objects contain the original sequences and a trace that describes which positions (indices) in the sequences are aligned. Optionally, they also contain the similarity score.
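To illustrate the trace idea, the following pure-Python sketch renders a trace as gapped strings. It assumes that a trace is a list of rows (one row per alignment column) holding one index per sequence, with -1 marking a gap. The helper `trace_to_gapped_strings` is hypothetical and not part of Biotite.

```python
def trace_to_gapped_strings(sequences, trace, gap_char="-"):
    """Hypothetical helper: render each sequence as a gapped string.

    sequences: list of plain strings
    trace: list of rows; row[i] is an index into sequences[i], or -1 for a gap
    """
    rendered = []
    for i, sequence in enumerate(sequences):
        rendered.append(
            "".join(gap_char if row[i] == -1 else sequence[row[i]] for row in trace)
        )
    return rendered


seq1 = "TACCA"
seq2 = "TCCGA"
# Each row is one alignment column: (index in seq1, index in seq2)
trace = [(0, 0), (1, -1), (2, 1), (3, 2), (-1, 3), (4, 4)]
for line in trace_to_gapped_strings([seq1, seq2], trace):
    print(line)
# TACC-A
# T-CCGA
```

Storing indices rather than gapped strings keeps the original sequences untouched, which is the motivation given above.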

The aligning functions align_optimal() and align_multiple() cover most use cases for pairwise and multiple sequence alignments respectively.
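align_optimal() is based on dynamic programming. The sketch below shows the classic global (Needleman-Wunsch) recurrence with a linear gap penalty and returns only the optimal score. It is a textbook illustration, not Biotite's implementation, which additionally handles affine gap penalties, local alignment and traceback into Alignment objects. The name `nw_score` and the match, mismatch and gap values are assumptions for the example.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score via Needleman-Wunsch with a linear gap penalty."""
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, len(b) + 1):
        dp[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[len(a)][len(b)]


# Six matching columns plus one gap: 6 * 1 + (-2) = 4
print(nw_score("GATTACA", "GATACA"))  # 4
```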

However, Biotite also provides a modular system for building performant heuristic alignment search methods, e.g. for finding homologous sequences in a sequence database or mapping reads to a genome. The table below summarizes this functionality. Its rows follow the typical stages of an alignment search from top to bottom, so each component appears at the stage where it is used.

Stage	Method	Class/function
Entire k-mer set	–	–
k-mer subset selection	Minimizers	MinimizerSelector
	Syncmers	SyncmerSelector, CachedSyncmerSelector
	Mincode	MincodeSelector
k-mer indexing and matching	Perfect hashing	KmerTable
	Space-efficient hashing	BucketKmerTable, bucket_number()
Ungapped seed extension	–	align_local_ungapped()
Gapped alignment	Banded local/semiglobal alignment	align_banded()
	Local alignment (X-drop)	align_local_gapped()
Significance evaluation	–	EValueEstimator
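The k-mer indexing and matching stage can be pictured with a plain dictionary. This pure-Python sketch is not KmerTable's implementation or API. It indexes every k-mer of a target sequence and reports (query position, target position) pairs that share a k-mer. Such matches are the seeds handed to the extension stages. The function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map each k-mer to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def find_kmer_matches(query, index, k):
    """Return (query_pos, target_pos) pairs for every shared k-mer."""
    matches = []
    for q_pos in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for t_pos in index.get(query[q_pos:q_pos + k], []):
            matches.append((q_pos, t_pos))
    return matches


target = "ACGTACGTTT"
query = "CGTAC"
index = build_kmer_index(target, k=3)
print(find_kmer_matches(query, index, k=3))
# [(0, 1), (0, 5), (1, 2), (2, 3)]
```

Three of the matches lie on the same diagonal (query position minus target position equals -1), which is the kind of signal the subsequent extension stages build on.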

Substitution matrices

`SubstitutionMatrix`	A `SubstitutionMatrix` is the foundation for scoring in sequence alignments.

Aligners

`align_ungapped`	Align two sequences without insertion of gaps.
`align_optimal`	Perform an optimal alignment of two sequences based on a dynamic programming algorithm.
`align_local_ungapped`	Perform a local alignment extending from a given seed position without inserting gaps.
`align_local_gapped`	Perform a local gapped alignment extending from a given seed position.
`align_banded`	Perform a local or semi-global alignment within a defined diagonal band.
`align_multiple`	Perform a multiple sequence alignment using a progressive alignment algorithm.
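Ungapped seed extension, as done by align_local_ungapped(), grows a seed along its diagonal. The sketch below illustrates one common stopping strategy, an X-drop-style cutoff: extension stops once the running score falls more than `x_drop` below the best score seen so far. This is a simplified, assumed scheme for illustration (identity scoring, extension to the right only, hypothetical function name), not code taken from Biotite.

```python
def extend_right_xdrop(a, b, i, j, x_drop, match=1, mismatch=-1):
    """Extend an ungapped seed starting at a[i], b[j] to the right.

    Returns (best_score, length) of the best-scoring extension.
    """
    score = best = 0
    best_len = 0
    length = 0
    while i + length < len(a) and j + length < len(b):
        score += match if a[i + length] == b[j + length] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score dropped too far below the best: stop extending
    return best, best_len


a = "ACGTTGCAAAAA"
b = "ACGTAGCTTTTT"
print(extend_right_xdrop(a, b, 0, 0, x_drop=2))  # (5, 7)
print(extend_right_xdrop(a, b, 0, 0, x_drop=0))  # (4, 4)
```

A larger `x_drop` tolerates longer stretches of mismatches before giving up, at the cost of more work per seed.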

Alignments

`Alignment`	An `Alignment` object stores which symbols of n sequences are aligned to each other and, optionally, the corresponding alignment score.
`get_codes`	Get the sequence codes of the sequences in the alignment.
`get_symbols`	Similar to `get_codes()`, but contains the decoded symbols instead of codes.
`get_sequence_identity`	Calculate the sequence identity for an alignment.
`get_pairwise_sequence_identity`	Calculate the pairwise sequence identity for an alignment.
`score`	Calculate the similarity score of an alignment.
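Sequence identity can be computed directly from a trace, but definitions differ in the denominator. This pure-Python sketch divides the number of identical columns by the number of columns that lie between the first and last column where both sequences have a symbol, so terminal gaps are excluded while internal gaps count. That convention is an assumption for illustration and may not match the modes offered by get_sequence_identity(). The function name is hypothetical, and -1 marks a gap in the trace.

```python
def pairwise_identity(seq1, seq2, trace):
    """Fraction of identical columns among non-terminal columns.

    trace: list of (index1, index2) rows; -1 marks a gap.
    """
    # Columns in which both sequences contain a symbol
    aligned = [col for col, (i, j) in enumerate(trace) if i != -1 and j != -1]
    # Drop terminal gaps: keep the columns from the first to the last aligned one
    core = trace[aligned[0]:aligned[-1] + 1]
    identical = sum(
        1 for i, j in core if i != -1 and j != -1 and seq1[i] == seq2[j]
    )
    return identical / len(core)


seq1 = "GGACGT"
seq2 = "ACATT"
# Columns: G/-  G/-  A/A  C/C  -/A  G/T  T/T
trace = [(0, -1), (1, -1), (2, 0), (3, 1), (-1, 2), (4, 3), (5, 4)]
print(pairwise_identity(seq1, seq2, trace))  # 0.6 (3 identical of 5 columns)
```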

k-mers

`KmerAlphabet`	This type of alphabet uses k-mers as symbols, i.e. all combinations of k symbols from its base alphabet.
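`KmerTable`	This class represents a k-mer index table that uses perfect hashing.
`BucketKmerTable`	This class represents a k-mer index table that uses space-efficient hashing, distributing the k-mers into a limited number of buckets.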
`SimilarityRule`	This is the abstract base class for all similarity rules.
`ScoreThresholdRule`	This similarity rule calculates all k-mers whose similarity score with a given k-mer is greater than or equal to a defined threshold score.
`bucket_number`	Find an appropriate number of buckets for a `BucketKmerTable` based on the number of elements (i.e. k-mers) that should be stored in the table.
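A k-mer alphabet can assign each k-mer an integer code. A natural choice, assumed here for illustration, is to read the k symbol codes of the base alphabet as the digits of a base-n number, where n is the size of the base alphabet. The pure-Python sketch below (hypothetical function names) shows that mapping and its inverse; the exact digit order used by KmerAlphabet may differ.

```python
def kmer_code(symbols, base_alphabet):
    """Encode a k-mer as a base-n integer (n = size of the base alphabet)."""
    n = len(base_alphabet)
    code = 0
    for symbol in symbols:
        # The first symbol ends up as the most significant digit
        code = code * n + base_alphabet.index(symbol)
    return code


def kmer_from_code(code, k, base_alphabet):
    """Inverse of kmer_code for k-mers of length k."""
    n = len(base_alphabet)
    symbols = []
    for _ in range(k):
        code, digit = divmod(code, n)
        symbols.append(base_alphabet[digit])
    return "".join(reversed(symbols))


bases = "ACGT"
print(kmer_code("ACG", bases))      # 0*16 + 1*4 + 2 = 6
print(kmer_from_code(6, 3, bases))  # ACG
print(len(bases) ** 3)              # number of possible 3-mers: 64
```

Such dense integer codes are what make a perfect-hashing index feasible: every possible k-mer maps to a unique slot in the range 0 to n^k - 1.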

k-mer subset selections

`MinimizerSelector`	Selects the minimizers in sequences.
`SyncmerSelector`	Selects the syncmers in sequences.
`CachedSyncmerSelector`	Selects the syncmers in sequences.
`MincodeSelector`	Selects the smallest \(1/\text{compression}\) fraction of all k-mers from a `KmerAlphabet`.
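Minimizer selection can be sketched as follows: slide a window of `window` consecutive k-mers over the sequence and keep, from each window, the k-mer that is smallest under some order. This pure-Python version uses plain lexicographic order and breaks ties by taking the leftmost position. Both choices are assumptions for illustration (Biotite's selectors can instead take an order from the Permutation classes listed below), and the function name is hypothetical.

```python
def minimizers(sequence, k, window):
    """Return the sorted start positions of the minimizers.

    Each window covers `window` consecutive k-mers; within a window the
    lexicographically smallest k-mer is selected (leftmost on ties).
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected = set()
    for start in range(len(kmers) - window + 1):
        # min() returns the first minimal position, i.e. the leftmost on ties
        best = min(range(start, start + window), key=lambda p: kmers[p])
        selected.add(best)
    return sorted(selected)


seq = "GATTACAGATT"
positions = minimizers(seq, k=3, window=4)
print([(p, seq[p:p + 3]) for p in positions])
# [(1, 'ATT'), (4, 'ACA'), (6, 'AGA')]
```

Only 3 of the 9 k-mers are kept, yet every window of 4 consecutive k-mers contributes one, which is why two sequences sharing a long enough stretch also share minimizers.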

k-mer permutations

`Permutation`	Provides an order for k-mers, usually used by k-mer subset selectors such as `MinimizerSelector`.
`RandomPermutation`	Provides a pseudo-randomized order for k-mers.
`FrequencyPermutation`	Provides an order for k-mers from a given `KmerAlphabet`, such that less frequent k-mers are smaller than more frequent k-mers.
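The idea behind a frequency-based order, ranking rare k-mers before frequent ones, can be sketched by counting k-mers and sorting by count. In this pure-Python illustration, ties are broken lexicographically and k-mers that never occur are left out. Both are simplifying assumptions, and the function name is hypothetical rather than part of FrequencyPermutation's API.

```python
from collections import Counter


def frequency_order(sequences, k):
    """Rank k-mers so that less frequent k-mers get smaller ranks."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    # Sort by (count, k-mer): rare k-mers first, lexicographic on ties
    ordered = sorted(counts, key=lambda kmer: (counts[kmer], kmer))
    return {kmer: rank for rank, kmer in enumerate(ordered)}


ranks = frequency_order(["AAAAT", "AAAC"], k=2)
print(sorted(ranks.items(), key=lambda item: item[1]))
# [('AC', 0), ('AT', 1), ('AA', 2)]
```

Used as the order for a minimizer-style selector, such a ranking favors rare, more informative k-mers over repetitive ones like `AA` here.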

CIGAR strings

`CigarOp`	An enum for the different CIGAR operations.
`read_alignment_from_cigar`	Create an `Alignment` from a CIGAR string.
`write_alignment_to_cigar`	Convert an `Alignment` into a CIGAR string.
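A CIGAR string run-length encodes the column types of a pairwise alignment. This pure-Python sketch derives one from a trace. It assumes that the first sequence is the reference, that -1 marks a gap, and that the SAM operation letters apply: M (aligned column, match or mismatch), I (insertion, symbol only in the second sequence) and D (deletion, symbol only in the reference). It ignores clipping and terminal-gap handling, which write_alignment_to_cigar() may treat differently, and the function name is hypothetical.

```python
from itertools import groupby


def trace_to_cigar(trace):
    """Run-length encode alignment columns as a CIGAR string.

    trace: list of (ref_index, query_index) rows; -1 marks a gap.
    """
    ops = []
    for ref_i, query_i in trace:
        if ref_i != -1 and query_i != -1:
            ops.append("M")  # both sequences have a symbol in this column
        elif ref_i == -1:
            ops.append("I")  # symbol only in the query: insertion
        else:
            ops.append("D")  # symbol only in the reference: deletion
    # Collapse runs of identical operations into "<length><op>"
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


trace = [(0, 0), (1, 1), (-1, 2), (2, 3), (3, -1), (4, -1), (5, 4)]
print(trace_to_cigar(trace))  # 2M1I1M2D1M
```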

Miscellaneous

`EValueEstimator`	This class is used to calculate expect values (E-values) for local pairwise sequence alignments.
`find_terminal_gaps`	Find the slice indices that would remove terminal gaps from an alignment.
`remove_gaps`	Remove all gap columns from an alignment.
`remove_terminal_gaps`	Remove terminal gaps from an alignment.
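One way to locate terminal gaps is sketched below in pure Python. The kept slice starts at the first column by which every sequence has begun and ends after the last column before any sequence ends, so internal gaps are preserved. This definition is an assumption made for the example; `terminal_gap_slice` is a hypothetical name and not the implementation of find_terminal_gaps(). As in a trace, -1 marks a gap.

```python
def terminal_gap_slice(trace):
    """Return (start, stop) so that trace[start:stop] has no terminal gaps.

    trace: list of rows with one index per sequence; -1 marks a gap.
    """
    n_seqs = len(trace[0])
    firsts, lasts = [], []
    for s in range(n_seqs):
        present = [col for col, row in enumerate(trace) if row[s] != -1]
        firsts.append(present[0])
        lasts.append(present[-1])
    # Start once all sequences have begun; stop as soon as any sequence ends
    return max(firsts), min(lasts) + 1


trace = [(0, -1), (1, 0), (2, 1), (-1, 2), (3, 3), (4, -1)]
start, stop = terminal_gap_slice(trace)
print(start, stop)        # 1 5
print(trace[start:stop])  # [(1, 0), (2, 1), (-1, 2), (3, 3)]
```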
