biotite.sequence.align.align_optimal¶

biotite.sequence.align.
align_optimal
(seq1, seq2, matrix, gap_penalty=10, terminal_penalty=True, local=False, max_number=1000)¶ Perform an optimal alignment of two sequences based on the dynamic programming algorithm. [1]
This algorithm yields an optimal alignment, i.e. the sequences are aligned in the way that results in the highest similarity score. This operation can be very time and space consuming, because both scale linearly with each sequence length.
The aligned sequences do not need to be instances from the same Sequence subclass, since they do not need to have the same alphabet. The only requirement is that the substitution matrix’ alphabets extend the alphabets of the two sequences.
This function can either perform a global alignment, based on the NeedlemanWunsch algorithm [1] or a local alignment, based on the Smith–Waterman algorithm [2].
Furthermore this function supports affine gap penalties using the Gotoh algorithm [3], however, this requires approximately 4 times the RAM space and execution time.
 Parameters
 seq1, seq2Sequence
The sequences to be aligned.
 matrixSubstitutionMatrix
The substitution matrix used for scoring.
 gap_penaltyint or (tuple, dtype=int), optional
If an integer is provided, the value will be interpreted as general gap penalty. If a tuple is provided, an affine gap penalty is used. The first integer in the tuple is the gap opening penalty, the second integer is the gap extension penalty. The values need to be negative. (Default: 10)
 terminal_penaltybool, optional
If true, gap penalties are applied to terminal gaps. If local is true, this parameter has no effect. (Default: True)
 localbool, optional
If false, a global alignment is performed, otherwise a local alignment is performed. (Default: False)
 max_numberint, optional
The maximum number of alignments returned. When the number of branches exceeds this value in the traceback step, no further branches are created. (Default: 1000)
 Returns
 alignmentslist, type=Alignment
A list of alignments. Each alignment in the list has the same maximum similarity score.
References
 1(1,2)
SB Needleman, CD Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins.” J Mol Biol, 48, 443453 (1970).
 2
TF Smith, MS Waterman, “Identification of common molecular subsequences.” J Mol Biol, 147, 195197 (1981).
 3
O Gotoh, “An improved algorithm for matching biological sequences.” J Mol Biol, 162, 705708 (1982).
Examples
>>> seq1 = NucleotideSequence("ATACGCTTGCT") >>> seq2 = NucleotideSequence("AGGCGCAGCT") >>> matrix = SubstitutionMatrix.std_nucleotide_matrix() >>> ali = align_optimal(seq1, seq2, matrix, gap_penalty=6) >>> for a in ali: ... print(a, "\n") ATACGCTTGCT AGGCGCAGCT <BLANKLINE> ATACGCTTGCT AGGCGCAGCT <BLANKLINE>