biotite.sequence.align#
This subpackage provides functionality for sequence alignments.
The two central classes involved are SubstitutionMatrix and Alignment:
Every function that performs an alignment requires a SubstitutionMatrix that provides similarity scores for each symbol combination of two alphabets (usually both alphabets are equal). The alphabets in the SubstitutionMatrix must match or extend the alphabets of the sequences to be aligned.
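The idea of a score for every symbol pair can be illustrated without the library. The following is a minimal sketch in plain Python, not Biotite's SubstitutionMatrix class: the helper names `build_matrix` and `covers` and the simple match/mismatch scoring are assumptions made only for illustration.

```python
from itertools import product


def build_matrix(alphabet1, alphabet2, match=1, mismatch=-1):
    """Map every symbol pair of two alphabets to a score (toy identity scoring)."""
    return {
        (a, b): (match if a == b else mismatch)
        for a, b in product(alphabet1, alphabet2)
    }


def covers(matrix_alphabet, sequence):
    """Return True if every symbol of the sequence is in the matrix alphabet."""
    return set(sequence) <= set(matrix_alphabet)


dna = "ACGT"
matrix = build_matrix(dna, dna)
print(matrix[("A", "A")], matrix[("A", "C")])  # 1 -1
# A sequence containing a symbol outside the alphabet cannot be scored
print(covers(dna, "GATTACA"), covers(dna, "GATNACA"))  # True False
```

The second check mirrors the requirement above: a matrix whose alphabet lacks a symbol of the sequence has no score to offer for it.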
An alignment cannot be directly represented as a list of Sequence objects, since a gap indicates the absence of any symbol. Instead, the aligning functions return one or more Alignment instances. These objects contain the original sequences and a trace that describes which positions (indices) in the sequences are aligned. Optionally, they also contain the similarity score.
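To make the role of the trace concrete, here is a small sketch in plain Python rather than the Alignment class itself. It assumes the trace holds one row per alignment column and one sequence index per sequence, with -1 marking a gap; the function name `render` is hypothetical.

```python
def render(sequences, trace, gap="-"):
    """Turn a trace into gapped strings, one per sequence."""
    rows = []
    for seq_idx, sequence in enumerate(sequences):
        rows.append("".join(
            gap if column[seq_idx] == -1 else sequence[column[seq_idx]]
            for column in trace
        ))
    return rows


sequences = ["BIQTITE", "IQLITE"]
# One row per alignment column, one entry per sequence; -1 marks a gap
trace = [(0, -1), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5)]
print("\n".join(render(sequences, trace)))
# BIQTITE
# -IQLITE
```

Because the sequences themselves stay ungapped, the same sequence objects can be shared between several alternative alignments that differ only in their trace.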
The aligning functions align_optimal() and align_multiple() cover most use cases for pairwise and multiple sequence alignments, respectively.
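As a rough picture of what an optimal pairwise alignment involves, the sketch below implements a textbook global alignment (Needleman-Wunsch) with a linear gap penalty in plain Python. It is not the implementation behind align_optimal(); the function name `align_global`, the scoring callback and the penalty value are assumptions for illustration, and only one of possibly several optimal traces is returned.

```python
def align_global(seq1, seq2, score, gap_penalty=-2):
    """Global alignment with a linear gap penalty.

    Returns the optimal score and one optimal trace, where -1 marks a gap.
    """
    n, m = len(seq1), len(seq2)
    # table[i][j]: best score for aligning seq1[:i] with seq2[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        table[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]),
                table[i - 1][j] + gap_penalty,
                table[i][j - 1] + gap_penalty,
            )

    # Walk back from the bottom-right cell to recover one optimal trace
    trace = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and table[i][j]
                == table[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1])):
            trace.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap_penalty:
            trace.append((i - 1, -1))  # gap in the second sequence
            i -= 1
        else:
            trace.append((-1, j - 1))  # gap in the first sequence
            j -= 1
    trace.reverse()
    return table[n][m], trace


def identity_score(a, b):
    """Toy scoring: +1 for identical symbols, -1 otherwise."""
    return 1 if a == b else -1


best_score, trace = align_global("GATTACA", "GATACA", identity_score)
print(best_score)  # 4
print(trace)
```

Six matches and one gap give 6 - 2 = 4. Which of the two T positions is paired with the gap depends on the tie-breaking order in the traceback, which is why the example does not state the expected trace.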
However, Biotite also provides a modular system for building performant heuristic alignment search methods, e.g. for finding homologies in a sequence database or mapping reads to a genome. The table below summarizes the provided functionality. The typical stages of an alignment search, in which this functionality is used, are arranged from top to bottom.
k-mer subset selection: Entire k-mer set, Minimizers, Mincode
k-mer indexing and matching: Perfect hashing
Ungapped seed extension
Gapped alignment: Banded local/semiglobal alignment, Local alignment (X-drop)
Significance evaluation
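The first stages of such a search (k-mer indexing, matching and ungapped seed extension) can be sketched in plain Python as follows. This is a conceptual illustration, not Biotite's API: the names `index_kmers`, `find_seeds` and `extend_ungapped`, the dictionary-based index and the `drop` threshold are assumptions made for this example.

```python
from collections import defaultdict


def index_kmers(sequence, k):
    """Map each k-mer to the list of positions where it starts."""
    table = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        table[sequence[pos:pos + k]].append(pos)
    return table


def find_seeds(query, table, k):
    """Return (query_pos, reference_pos) pairs of identical k-mers."""
    return [
        (q, r)
        for q in range(len(query) - k + 1)
        for r in table.get(query[q:q + k], ())
    ]


def extend_ungapped(query, reference, seed, k, drop=3, match=1, mismatch=-1):
    """Extend a seed in both directions without gaps.

    Extension in one direction stops once the running score has fallen
    more than `drop` below the best score seen in that direction.
    Returns (score, (query_start, reference_start), length).
    """
    def extend(q, r, step):
        best = score = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= r < len(reference):
            score += match if query[q] == reference[r] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > drop:
                break
            q += step
            r += step
        return best, best_len

    q, r = seed
    right_score, right_len = extend(q + k, r + k, +1)
    left_score, left_len = extend(q - 1, r - 1, -1)
    score = k * match + left_score + right_score
    return score, (q - left_len, r - left_len), k + left_len + right_len


reference = "GGTCAGATTACACTTG"
query = "CAGATTACAGT"
k = 4
seeds = find_seeds(query, index_kmers(reference, k), k)
print(seeds)
# [(0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8)]
print(extend_ungapped(query, reference, seeds[0], k))
# (9, (0, 3), 9)
```

All six seeds lie on the same diagonal and extend to the same nine-symbol hit, which is why a real pipeline typically deduplicates seeds per diagonal before the more expensive gapped alignment stage.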
Substitution matrices#
A [...]
Aligners#
Align two sequences without insertion of gaps.
Perform an optimal alignment of two sequences based on a dynamic programming algorithm.
Perform a local alignment extending from a given seed position without inserting gaps.
Perform a local gapped alignment extending from a given seed position.
Perform a local or semi-global alignment within a defined diagonal band.
Perform a multiple sequence alignment using a progressive alignment algorithm.
Alignments#
An [...]
Get the sequence codes of the sequences in the alignment.
Similar to [...]
Calculate the sequence identity for an alignment.
Calculate the pairwise sequence identity for an alignment.
Calculate the similarity score of an alignment.
k-mers#
This type of alphabet uses k-mers as symbols, i.e. all combinations of k symbols from its base alphabet.
This class represents a k-mer index table.
This class represents a k-mer index table.
This is the abstract base class for all similarity rules.
This similarity rule calculates all k-mers that have a greater or equal similarity score with a given k-mer than a defined threshold score.
Find an appropriate number of buckets for a [...]
k-mer subset selections#
Selects the minimizers in sequences.
Selects the syncmers in sequences.
Selects the syncmers in sequences.
Selects the \(1/\text{compression}\) smallest k-mers from [...]
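As an illustration of k-mer subset selection, the following plain-Python sketch picks minimizers: in every window of consecutive k-mers, the smallest k-mer is kept. It is not the library's selector class; the function name `select_minimizers`, the lexicographic ordering and the definition of `window` as a number of k-mers are assumptions for this example.

```python
def select_minimizers(sequence, k, window):
    """Return sorted (position, k-mer) pairs of the minimizers.

    For each window of `window` consecutive k-mers, the lexicographically
    smallest k-mer is selected; the leftmost one wins ties.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected = {}
    for start in range(len(kmers) - window + 1):
        pos = min(range(start, start + window), key=lambda i: (kmers[i], i))
        selected[pos] = kmers[pos]
    return sorted(selected.items())


print(select_minimizers("GATTACAGG", k=3, window=4))
# [(1, 'ATT'), (4, 'ACA')]
```

Neighboring windows often share their minimizer, so only two of the seven k-mers are kept here. Replacing the lexicographic order with a different k-mer order changes which k-mers are selected.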
k-mer permutations#
Provides an order for k-mers, usually used by k-mer subset selectors such as [...]
Provide a pseudo-randomized order for k-mers.
Provide an order for k-mers from a given [...]
CIGAR strings#
Miscellaneous#
This class is used to calculate expect values (E-values) for local pairwise sequence alignments.
Find the slice indices that would remove terminal gaps from an alignment.
Remove terminal gaps from an alignment.
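For orientation on the E-values mentioned above, the sketch below evaluates the standard Karlin-Altschul relation E = K * m * n * exp(-lambda * S) in plain Python. Whether the class listed here computes its values in exactly this form is an assumption, the function name `expect_value` is hypothetical, and the parameters lambda and K are made-up values chosen only for illustration.

```python
import math


def expect_value(score, query_length, database_length, lam, k):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S)."""
    return k * query_length * database_length * math.exp(-lam * score)


# Hypothetical statistical parameters, for illustration only
LAMBDA, K = 0.3, 0.1

for score in (30, 60, 90):
    print(score, expect_value(score, 100, 1_000_000, LAMBDA, K))
```

The E-value falls exponentially with the alignment score and grows linearly with the size of the search space, so the same score is less significant in a larger database.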