biotite.sequence.align.read_alignment_from_cigar

biotite.sequence.align.read_alignment_from_cigar(cigar, position, reference_sequence, segment_sequence)[source]

Create an Alignment from a CIGAR string.

Parameters
cigarstr

The CIGAR string.

positionint

0-based position of the first aligned base in the reference. 0-based equivalent to the POS field in the SAM/BAM file.

reference_sequenceSequence

The reference sequence.

segment_sequenceSequence

The segment, read or query sequence.

Returns
alignmentAlignment

The alignment.

Notes

This function expects that the segment_sequence was taken from the SAM/BAM file, hence hard-clipped bases are not part of the sequence. Therefore, hard clipped bases are simply ignored in the CIGAR string.

Examples

>>> ref = NucleotideSequence("TATAAAAGGTTTCCGACCGTAGGTAGCTGA")
>>> seg = NucleotideSequence("CCCCGGTTTGACCGTATGTAG")
>>> print(read_alignment_from_cigar("9M2D12M", 3, ref, seg))
AAAAGGTTTCCGACCGTAGGTAG
CCCCGGTTT--GACCGTATGTAG
>>> print(read_alignment_from_cigar("4X5=2D7=1X4=", 3, ref, seg))
AAAAGGTTTCCGACCGTAGGTAG
CCCCGGTTT--GACCGTATGTAG

If bases in the segment sequence are soft-clipped, they do not appear in the alignment. Furthermore, the start of the reference sequence must be adapted.

>>> print(read_alignment_from_cigar("4S5M2D12M", 7, ref, seg))
GGTTTCCGACCGTAGGTAG
GGTTT--GACCGTATGTAG

Hard-clipped bases are not part of the segment sequence. Hence H operations are completely ignored.

>>> seg = NucleotideSequence("GGTTTGACCGTATGTAG")
>>> print(read_alignment_from_cigar("4H5M2D12M", 7, ref, seg))
GGTTTCCGACCGTAGGTAG
GGTTT--GACCGTATGTAG

Reading from BAM codes is also possible:

>>> seg = NucleotideSequence("CCCCGGTTTGACCGTATGTAG")
>>> op_tuples = [
...     (CigarOp.MATCH, 9),
...     (CigarOp.DELETION, 2),
...     (CigarOp.MATCH, 12)
... ]
>>> print(read_alignment_from_cigar(op_tuples, 3, ref, seg))
AAAAGGTTTCCGACCGTAGGTAG
CCCCGGTTT--GACCGTATGTAG