biotite.sequence.align

This subpackage provides functionality for sequence alignments.

The two central classes involved are SubstitutionMatrix and Alignment:

Every function that performs an alignment requires a SubstitutionMatrix that provides similarity scores for each symbol combination of two alphabets (usually both alphabets are equal). The alphabets in the SubstitutionMatrix must match or extend the alphabets of the sequences to be aligned.

An alignment cannot be directly represented as a list of Sequence objects, since a gap indicates the absence of any symbol. Instead, the aligning functions return one or more Alignment instances. These objects contain the original sequences and a trace that describes which positions (indices) in the sequences are aligned. Optionally, they also contain the similarity score.
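The following minimal pure-Python sketch makes the trace idea concrete. It assumes that a trace is a list of rows, one per alignment column, with one index per sequence and -1 marking a gap. This is the convention biotite's Alignment.trace follows. The helper `trace_to_gapped_strings` is hypothetical and not part of biotite.

```python
def trace_to_gapped_strings(seqs, trace):
    """Render each sequence as a gapped string from a trace.

    Hypothetical helper: `trace` holds one row per alignment column and
    one index per sequence in each row; -1 marks a gap.
    """
    gapped = []
    for k, seq in enumerate(seqs):
        # Take the symbol at the traced index, or '-' where the index is -1
        gapped.append("".join("-" if row[k] == -1 else seq[row[k]] for row in trace))
    return gapped


# "TITAN" vs "TAN": the second sequence has a gap opposite "IT"
trace = [(0, 0), (1, -1), (2, -1), (3, 1), (4, 2)]
print(trace_to_gapped_strings(["TITAN", "TAN"], trace))  # ['TITAN', 'T--AN']
```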

The aligning functions are usually C-accelerated, reducing the computation time substantially.

This subpackage also contains functionality for finding k-mer matches between two sequences, allowing fast heuristic pairwise alignments.

Substitution matrices

SubstitutionMatrix

A SubstitutionMatrix is the foundation for scoring in sequence alignments.

Aligners

align_ungapped

Align two sequences without insertion of gaps.
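Without gaps, each alignment column pairs the symbols at the same offset, so the score is a plain sum of substitution scores. This sketch uses a hypothetical dict-based toy matrix (+1 for a match, -1 for a mismatch) instead of biotite's SubstitutionMatrix, and it only handles equal-length sequences. A symbol missing from the matrix raises a KeyError. That behavior mirrors the requirement that the matrix alphabets cover the sequence alphabets.

```python
def identity_matrix(alphabet, match=1, mismatch=-1):
    """Toy substitution matrix: a dict mapping symbol pairs to scores."""
    return {(a, b): match if a == b else mismatch for a in alphabet for b in alphabet}


def ungapped_score(seq1, seq2, matrix):
    """Score two equal-length sequences aligned position by position."""
    if len(seq1) != len(seq2):
        raise ValueError("this sketch requires sequences of equal length")
    # Without gaps, column i simply pairs seq1[i] with seq2[i]
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))


nuc = identity_matrix("ACGT")
print(ungapped_score("ACGTA", "ACCTA", nuc))  # 4 matches, 1 mismatch -> 3
```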

align_optimal

Perform an optimal alignment of two sequences based on a dynamic programming algorithm.
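The classic Needleman-Wunsch recurrence illustrates optimal alignment by dynamic programming. This minimal sketch assumes a global alignment, a linear gap penalty and fixed match/mismatch scores. The function is hypothetical and does not mirror the signature or options of align_optimal.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, gapped seq1, gapped seq2) for one optimal alignment.
    """
    n, m = len(seq1), len(seq2)
    # dp[i][j]: best score for aligning seq1[:i] with seq2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align two symbols
                           dp[i - 1][j] + gap,      # gap in seq2
                           dp[i][j - 1] + gap)      # gap in seq1
    # Trace back from the bottom-right cell to recover one optimal alignment
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if dp[i][j] == dp[i - 1][j - 1] + sub:
                a1.append(seq1[i - 1])
                a2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("TITAN", "TAN")
print(score)  # -1: at most 3 matches, and 2 gap columns are unavoidable
print(top)
print(bottom)
```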

align_local_ungapped

Perform a local alignment extending from a given seed position without inserting gaps.
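Seed extension is often stopped with an X-drop criterion: extension in one direction ends once the running score falls more than a threshold below the best score seen so far. The sketch below assumes that criterion and a toy +1/-1 matrix. The function name, its return tuple and the handling of the seed are choices made for this illustration, not biotite's API. The rightward pass starts at the seed and the leftward pass starts just before it.

```python
def extend_ungapped(seq1, seq2, seed, matrix, threshold):
    """Hypothetical X-drop extension of an ungapped local alignment from a seed.

    Returns (score, start1, start2, length) of the best-scoring segment.
    """
    i0, j0 = seed

    def extend(step):
        # step=+1: walk right starting at the seed; step=-1: walk left before it
        i, j = (i0, j0) if step == 1 else (i0 - 1, j0 - 1)
        best, best_len, score, length = 0, 0, 0, 0
        while 0 <= i < len(seq1) and 0 <= j < len(seq2):
            score += matrix[(seq1[i], seq2[j])]
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > threshold:
                break  # X-drop: the score fell too far below the best one
            i += step
            j += step
        return best, best_len

    right_score, right_len = extend(1)
    left_score, left_len = extend(-1)
    return (right_score + left_score, i0 - left_len, j0 - left_len,
            left_len + right_len)


mat = {(a, b): 1 if a == b else -1 for a in "ACGT" for b in "ACGT"}
print(extend_ungapped("GGACGTACGTTT", "CCACGTACGAAA", (4, 4), mat, 2))
# (7, 2, 2, 7): the segment ACGTACG at offset 2 in both sequences
```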

align_local_gapped

Perform a local gapped alignment extending from a given seed position.

align_banded

Perform a local or semi-global alignment within a defined diagonal band.
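Restricting dynamic programming to a band of diagonals means that only cells near the expected diagonal are ever filled. The sketch below is a banded Smith-Waterman (local) variant with a linear gap penalty. It assumes the band is given as (lower, upper) bounds on the diagonal d = j - i in DP cell coordinates. The function and this band convention are hypothetical and not taken from align_banded.

```python
def banded_local_score(seq1, seq2, band, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score restricted to diagonals lower <= j - i <= upper."""
    lower, upper = band
    n, m = len(seq1), len(seq2)
    # Cells outside the band stay at -inf, so no path can pass through them
    dp = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if not lower <= j - i <= upper:
                continue  # outside the band
            if i == 0 or j == 0:
                dp[i][j] = 0  # local alignment: free start on the borders
                continue
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            dp[i][j] = max(0,
                           dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
            best = max(best, dp[i][j])
    return best


print(banded_local_score("AAAACGT", "CGT", (-5, 5)))  # 3: CGT lies on diagonal -4
print(banded_local_score("AAAACGT", "CGT", (-1, 1)))  # 0: the band excludes it
```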

align_multiple

Perform a multiple sequence alignment using a progressive alignment algorithm.

Alignments

Alignment

An Alignment object stores which symbols of n sequences are aligned to each other, together with the corresponding alignment score.

get_codes

Get the sequence codes of the sequences in the alignment.

get_symbols

Similar to get_codes(), but returns the decoded symbols instead of the codes.

get_sequence_identity

Calculate the sequence identity for an alignment.
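Sequence identity is the fraction of alignment columns with identical symbols, and the result depends on which columns form the denominator. The sketch below works on gapped strings and offers two hypothetical modes:

- "all" counts every column.
- "not_terminal" drops columns before both sequences have started and after either one has ended.

These definitions are assumptions of the sketch, not a description of the modes of get_sequence_identity.

```python
def symbol_span(gapped):
    """Return (first, last + 1) of the columns that hold a symbol."""
    cols = [c for c, sym in enumerate(gapped) if sym != "-"]
    return cols[0], cols[-1] + 1


def sequence_identity(gapped1, gapped2, mode="not_terminal"):
    """Fraction of identical columns in a pairwise alignment of gapped strings."""
    columns = list(zip(gapped1, gapped2))
    if mode == "not_terminal":
        # Drop columns before both sequences have started or after either ended
        (s1, e1), (s2, e2) = symbol_span(gapped1), symbol_span(gapped2)
        columns = columns[max(s1, s2):min(e1, e2)]
    elif mode != "all":
        raise ValueError(f"unknown mode: {mode}")
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return matches / len(columns)


print(sequence_identity("--ACGTA", "TTACCT-"))              # 0.75 (3 of 4 columns)
print(sequence_identity("--ACGTA", "TTACCT-", mode="all"))  # 3 of 7 columns
```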

get_pairwise_sequence_identity

Calculate the pairwise sequence identity for an alignment.

score

Calculate the similarity score of an alignment.
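The score of a gapped alignment adds substitution scores for aligned symbol pairs and penalties for gaps. The sketch below makes these assumptions:

- An affine penalty: the first column of a gap run costs `gap_open` and each further column costs `gap_extend`.
- Terminal gaps are treated like internal ones.
- A toy +1/-1 matrix.

The function and its defaults are hypothetical and not the signature of score().

```python
def alignment_score(gapped1, gapped2, matrix, gap_open=-10, gap_extend=-1):
    """Score a pairwise alignment given as two gapped strings (affine gaps)."""
    score = 0
    in_gap = [False, False]  # is sequence 1 / sequence 2 inside a gap run?
    for a, b in zip(gapped1, gapped2):
        if a == "-" and b == "-":
            continue  # gap-only columns are ignored
        if a != "-" and b != "-":
            score += matrix[(a, b)]
            in_gap = [False, False]
            continue
        k = 0 if a == "-" else 1  # which sequence has the gap
        score += gap_extend if in_gap[k] else gap_open
        in_gap = [k == 0, k == 1]
    return score


mat = {(x, y): 1 if x == y else -1 for x in "ACGT" for y in "ACGT"}
# 3 matches, two gap openings (-10 each), one extension (-1)
print(alignment_score("ACG--T", "AC-AAT", mat))  # -18
```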

k-mers

KmerAlphabet

This type of alphabet uses k-mers as symbols, i.e. all combinations of k symbols from its base alphabet.
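A KmerAlphabet contains all n^k combinations of k symbols from a base alphabet of size n, so each k-mer can be mapped to an integer in [0, n^k). The sketch assumes a positional base-n encoding in which the first symbol is most significant. The helpers `fuse_codes` and `split_code` are hypothetical and only illustrate the idea.

```python
def fuse_codes(codes, base_size):
    """Combine k symbol codes into one k-mer code (first symbol most significant)."""
    kmer_code = 0
    for c in codes:
        kmer_code = kmer_code * base_size + c
    return kmer_code


def split_code(kmer_code, k, base_size):
    """Inverse of fuse_codes(): recover the k symbol codes of a k-mer code."""
    codes = []
    for _ in range(k):
        kmer_code, c = divmod(kmer_code, base_size)
        codes.append(c)
    return codes[::-1]


base = "ACGT"  # symbol code = index in this string
codes = [base.index(s) for s in "GAT"]  # [2, 0, 3]
code = fuse_codes(codes, len(base))     # 2*16 + 0*4 + 3 = 35
print(code, "".join(base[c] for c in split_code(code, 3, len(base))))  # 35 GAT
```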

KmerTable

This class represents a k-mer index table.
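At its core, a k-mer index table maps each k-mer to the positions where it occurs. Matches between a query and the indexed sequences are then found by lookup instead of by comparing all positions. The sketch below uses a plain dict keyed by k-mer strings. It is a hypothetical illustration and does not reflect how KmerTable stores its data.

```python
from collections import defaultdict


def build_kmer_table(sequences, k):
    """Map each k-mer to the (sequence index, position) pairs where it occurs."""
    table = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            table[seq[pos:pos + k]].append((seq_id, pos))
    return table


def find_matches(table, query, k):
    """Yield (query position, sequence index, reference position) for shared k-mers."""
    for q_pos in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for seq_id, ref_pos in table.get(query[q_pos:q_pos + k], []):
            yield q_pos, seq_id, ref_pos


table = build_kmer_table(["ACGTACGT", "TTACG"], 3)
print(list(find_matches(table, "TACG", 3)))
# [(0, 0, 3), (0, 1, 1), (1, 0, 0), (1, 0, 4), (1, 1, 2)]
```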

BucketKmerTable

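This class represents a k-mer index table. Unlike KmerTable, it distributes the k-mers over a limited number of buckets, similar to a hash table, so different k-mers may share a bucket.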

SimilarityRule

This is the abstract base class for all similarity rules.

ScoreThresholdRule

This similarity rule finds all k-mers whose similarity score with a given k-mer is greater than or equal to a defined threshold score.
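A brute-force search illustrates the rule. It enumerates every k-mer over the alphabet and keeps those whose ungapped score against the given k-mer reaches the threshold. There are n^k candidates, so this only scales to small k and small alphabets. The function and the toy +1/-1 matrix are assumptions of this sketch. It does not show how ScoreThresholdRule computes its result.

```python
from itertools import product


def similar_kmers(kmer, alphabet, matrix, threshold):
    """All k-mers whose ungapped score against `kmer` is >= threshold."""
    result = []
    for candidate in product(alphabet, repeat=len(kmer)):
        score = sum(matrix[(a, b)] for a, b in zip(kmer, candidate))
        if score >= threshold:
            result.append("".join(candidate))
    return result


mat = {(a, b): 1 if a == b else -1 for a in "ACGT" for b in "ACGT"}
similar = similar_kmers("ACG", "ACGT", mat, threshold=1)
# 3 matches score 3, one mismatch scores 1, two or more score <= -1:
# "ACG" itself plus its 9 single-substitution variants
print(len(similar))  # 10
```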

bucket_number

Find an appropriate number of buckets for a BucketKmerTable based on the number of elements (i.e. k-mers) that should be stored in the table.

k-mer subset selections

MinimizerSelector

Selects the minimizers in sequences.
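A minimizer is the smallest k-mer, under some order, within each window of consecutive k-mers. Identical windows in two sequences therefore always select the same k-mer. This sketch assumes lexicographic order by default, and ties go to the leftmost k-mer. The `order` parameter is a hypothetical hook standing in for a k-mer order such as a Permutation provides. The function itself is not biotite's API.

```python
def minimizers(seq, k, window, order=None):
    """Return (position, k-mer) of the minimizer of each window of `window` k-mers.

    A position selected by consecutive windows is reported once.
    """
    key = order or (lambda kmer: kmer)  # default: lexicographic order
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = []
    for start in range(len(kmers) - window + 1):
        # min() returns the leftmost position among equal keys
        pos = min(range(start, start + window), key=lambda i: key(kmers[i]))
        if not selected or selected[-1][0] != pos:
            selected.append((pos, kmers[pos]))
    return selected


print(minimizers("GATTACA", k=2, window=3))  # [(1, 'AT'), (4, 'AC')]
```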

SyncmerSelector

Selects the syncmers in sequences.
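A syncmer is selected by looking only at the k-mer itself. It is chosen when its smallest s-mer (s < k) starts at one of the designated offsets, so the choice does not depend on neighboring k-mers. This sketch assumes lexicographic s-mer order with ties going to the leftmost s-mer. With these rules, offsets (0,) gives open syncmers and (0, k - s) gives closed syncmers. The function is hypothetical.

```python
def syncmers(seq, k, s, offsets=(0,)):
    """Return start positions of k-mers whose smallest s-mer starts at an offset."""
    positions = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # Offset of the smallest s-mer within this k-mer (leftmost on ties)
        smallest = min(range(len(smers)), key=smers.__getitem__)
        if smallest in offsets:
            positions.append(i)
    return positions


print(syncmers("GATTACA", k=4, s=2))                  # open syncmers: [1]
print(syncmers("GATTACA", k=4, s=2, offsets=(0, 2)))  # closed syncmers: [1, 2]
```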

CachedSyncmerSelector

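Selects the syncmers in sequences, like SyncmerSelector, but precomputes for every possible k-mer whether it is a syncmer, trading memory for speed.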

MincodeSelector

Selects the \(1/\text{compression}\) smallest k-mers from KmerAlphabet.

k-mer permutations

Permutation

Provides an order for k-mers, usually used by k-mer subset selectors such as MinimizerSelector.

RandomPermutation

Provide a pseudo-randomized order for k-mers.

FrequencyPermutation

Provide an order for k-mers from a given KmerAlphabet, such that less frequent k-mers are smaller than more frequent k-mers.
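A frequency-based order can be sketched by counting the k-mers in a set of sequences and sorting them by count. This sketch is hypothetical. It ranks only k-mers that actually occur and breaks ties lexicographically. An order over a whole KmerAlphabet would also have to place the k-mers that never occur.

```python
from collections import Counter


def frequency_order(sequences, k):
    """Rank the observed k-mers so that rarer ones get smaller ranks."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    # Sort by count first, then lexicographically to break ties
    ranked = sorted(counts, key=lambda kmer: (counts[kmer], kmer))
    return {kmer: rank for rank, kmer in enumerate(ranked)}


order = frequency_order(["ACGTACGA"], 2)
print(order)  # {'GA': 0, 'GT': 1, 'TA': 2, 'AC': 3, 'CG': 4}
```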

CIGAR strings

CigarOp

An enum for the different CIGAR operations.

read_alignment_from_cigar

Create an Alignment from a CIGAR string.

write_alignment_to_cigar

Convert an Alignment into a CIGAR string.
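A CIGAR string run-length encodes the column types of a pairwise alignment relative to a reference. This sketch assumes the first gapped string is the reference. It handles only three operations:

- M: both sequences have a symbol.
- I: insertion, a symbol only in the query.
- D: deletion, a symbol only in the reference.

Other SAM operations such as S, N, = and X are out of scope. Both functions are hypothetical and are not biotite's implementation.

```python
import re


def to_cigar(ref_gapped, query_gapped):
    """Encode a pairwise alignment given as gapped strings as a CIGAR string."""
    ops = []  # list of [count, operation]
    for r, q in zip(ref_gapped, query_gapped):
        if r != "-" and q != "-":
            op = "M"
        elif q != "-":
            op = "I"  # gap in the reference
        elif r != "-":
            op = "D"  # gap in the query
        else:
            continue  # gap-only column
        if ops and ops[-1][1] == op:
            ops[-1][0] += 1
        else:
            ops.append([1, op])
    return "".join(f"{n}{op}" for n, op in ops)


def from_cigar(cigar, ref, query):
    """Rebuild the gapped strings from a CIGAR string (M, I and D only)."""
    ref_out, query_out = [], []
    i = j = 0
    for count, op in re.findall(r"(\d+)([MID])", cigar):
        for _ in range(int(count)):
            if op in "MD":  # consumes a reference symbol
                ref_out.append(ref[i])
                i += 1
            else:
                ref_out.append("-")
            if op in "MI":  # consumes a query symbol
                query_out.append(query[j])
                j += 1
            else:
                query_out.append("-")
    return "".join(ref_out), "".join(query_out)


cigar = to_cigar("AC-GTT", "ACAG-T")
print(cigar)                                # 2M1I1M1D1M
print(from_cigar(cigar, "ACGTT", "ACAGT"))  # ('AC-GTT', 'ACAG-T')
```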

Miscellaneous

EValueEstimator

This class is used to calculate expect values (E-values) for local pairwise sequence alignments.
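E-values for local alignments are commonly derived from Karlin-Altschul statistics, E = K m n e^(-λS). Here m and n are the sequence lengths, S is the alignment score, and λ and K are parameters that depend on the scoring scheme. This sketch assumes that formula. The parameter values are made up for illustration and do not correspond to any real substitution matrix or gap penalty.

```python
import math


def e_value(score, len1, len2, lam, k):
    """Expected number of chance local alignments scoring >= score."""
    return k * len1 * len2 * math.exp(-lam * score)


# Illustrative parameters only (not fitted to any real scoring scheme)
lam, k = 0.3, 0.1
for s in (20, 30, 40):
    # Every 10 score units multiply E by exp(-lam * 10)
    print(s, e_value(s, 100, 100, lam, k))
```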

find_terminal_gaps

Find the slice indices that would remove terminal gaps from an alignment.

remove_terminal_gaps

Remove terminal gaps from an alignment.
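One way to define terminal gap columns is as those before every sequence has started and after the first sequence has ended. The sketch below assumes that definition, works on gapped strings, and returns a (start, stop) column slice. Both helper names are hypothetical.

```python
def find_terminal_gap_slice(gapped):
    """Return (start, stop) so that the column slice [start:stop] has no terminal gaps."""
    starts, stops = [], []
    for s in gapped:
        symbol_cols = [c for c, sym in enumerate(s) if sym != "-"]
        if not symbol_cols:
            raise ValueError("a sequence consists only of gaps")
        starts.append(symbol_cols[0])
        stops.append(symbol_cols[-1] + 1)
    # Start once all sequences have begun; stop as soon as any has ended
    start, stop = max(starts), min(stops)
    if start >= stop:
        raise ValueError("the sequences do not overlap")
    return start, stop


def remove_terminal_gap_columns(gapped):
    """Slice every gapped string to the region without terminal gaps."""
    start, stop = find_terminal_gap_slice(gapped)
    return [s[start:stop] for s in gapped]


alignment = ["--ACGT-", "TTAC-TA"]
print(find_terminal_gap_slice(alignment))      # (2, 6)
print(remove_terminal_gap_columns(alignment))  # ['ACGT', 'AC-T']
```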