biotite.sequence.align.align_local_gapped

biotite.sequence.align.align_local_gapped(seq1, seq2, matrix, seed, threshold, gap_penalty=- 10, max_number=1, direction='both', score_only=False, max_table_size=None)[source]

Perform a local gapped alignment extending from a given seed position.

The alignment extends into one or both directions (controlled by direction) until the total alignment score falls more than threshold below the maximum score found (X-Drop). 1 The returned alignment contains the range that yielded the maximum score.

Parameters
seq1, seq2Sequence

The sequences to be aligned.

matrixSubstitutionMatrix

The substitution matrix used for scoring.

seedtuple(int, int)

The indices in seq1 and seq2 where the local alignment starts. The indices must be non-negative.

thresholdint

If the current score falls this value below the maximum score found, the alignment terminates.

gap_penaltyint or tuple(int, int), optional

If an integer is provided, the value will be interpreted as linear gap penalty. If a tuple is provided, an affine gap penalty is used 2. The first integer in the tuple is the gap opening penalty, the second integer is the gap extension penalty. threshold : int If the current score falls this value below the maximum score found, the alignment terminates.

max_numberint, optional

The maximum number of alignments returned. When the number of branches exceeds this value in the traceback step, no further branches are created. By default, only a single alignment is returned.

direction{‘both’, ‘upstream’, ‘downstream’}, optional

Controls in which direction the alignment extends starting from the seed. If 'upstream', the alignment starts before the seed and ends at the seed. If 'downstream', the alignment starts at the seed and ends behind the seed. If 'both' (default) the alignment starts before the seed and ends behind the seed. The seed position itself is always included in the alignment.

score_onlybool, optional

If set to True, only the similarity score is returned instead of the Alignment, decreasing the runtime substantially.

max_table_sizeint, optional

A MemoryError is raised, if the number of cells in the internal dynamic programming table, i.e. approximately the product of the lengths of the aligned regions, would exceed the given value.

Returns
alignmentslist of Alignment

A list of found alignments. Each alignment in the list has the same similarity score. Only returned, if score_only is False.

scoreint

The alignment similarity score. Only returned, if score_only is True.

See also

align_ungapped

For ungapped local alignments with the same X-Drop technique.

Notes

Unilke align_optimal(), this function does not allocate memory proportional to the length of both sequences, but only approximately proportional to lengths of the aligned regions. In principle, this makes this function viable for local alignments of sequences of any length. However, if the product of the lengths of the homologous regions is too large to fit into memory, a MemoryError or even a crash may occur. This may also happen in spurious long alignments due to poor choice of substitution matrix or gap penalty. You may set max_table_size to avoid excessive memory use and crashes.

References

1

Z. Zhang, S. Schwartz, L. Wagner, W. Miller, “A greedy algorithm for aligning DNA sequences,” Journal of Computational Biology, vol. 7, pp. 203–214, February 2000. doi: 10.1089/10665270050081478

2

O. Gotoh, “An improved algorithm for matching biological sequences,” Journal of Molecular Biology, vol. 162, pp. 705–708, December 1982. doi: 10.1016/0022-2836(82)90398-9

Examples

>>> seq1 = NucleotideSequence("CGTAGCTATCGCCTGTACGGTT")
>>> seq2 = NucleotideSequence("TATATGCCTTACGGAATTGCTTTTT")
>>> matrix = SubstitutionMatrix.std_nucleotide_matrix()
>>> alignment = align_local_gapped(
...     seq1, seq2, matrix, seed=(16, 10), threshold=20
... )[0]
>>> print(alignment)
TATCGCCTGTACGG
TAT-GCCT-TACGG
>>> alignment = align_local_gapped(
...     seq1, seq2, matrix, seed=(16, 10), threshold=20, direction="upstream"
... )[0]
>>> print(alignment)
TATCGCCTGTA
TAT-GCCT-TA
>>> alignment = align_local_gapped(
...     seq1, seq2, matrix, seed=(16, 10), threshold=20, direction="downstream"
... )[0]
>>> print(alignment)
ACGG
ACGG
>>> score = align_local_gapped(
...     seq1, seq2, matrix, seed=(16, 10), threshold=20, score_only=True
... )
>>> print(score)
40