biotite.sequence.align.write_alignment_to_cigar

biotite.sequence.align.write_alignment_to_cigar(alignment, reference_index=0, segment_index=1, introns=(), distinguish_matches=False, hard_clip=False, as_string=True)[source]

Convert an Alignment into a CIGAR string.

Parameters
alignmentAlignment

The alignment to be converted.

reference_indexint, optional

The index of the reference sequence in the alignment. By default the first sequence is used.

segment_indexint, optional

The index of the segment, read or query sequence in the alignment. By default the second sequence is used.

intronsiterable object of tuple(int, int), optional

The introns in the reference sequence. The introns are given as tuples of start and exclusive stop index. In those regions gaps in the reference sequence are reflected by ‘N’ in the CIGAR string. By default no introns are assumed.

distinguish_matchesbool, optional

If true, matches (‘=’) are distinguished from mismatches (‘X’). Otherwise, matches and mismatches are reflected equally by an ‘M’ in the CIGAR string.

hard_clipbool, optional

If true, clipped bases are hard-clipped. Otherwise, clipped bases are soft-clipped.

as_stringbool, optional

If true, the CIGAR string is returned. Otherwise, a list of tuples is returned, where the first element of each tuple specifies the CigarOp and the second element specifies the number of repetitions.

Returns
cigarstr or ndarray, shape=(n,2) dtype=int

If as_string is true, the CIGAR string is returned. Otherwise, an array is returned, where the first column specifies the CigarOp and the second column specifies the number of repetitions of that operation.

Examples

>>> ref = NucleotideSequence("TATAAAAGGTTTCCGACCGTAGGTAGCTGA")
>>> seg = NucleotideSequence("CCCCGGTTTGACCGTATGTAG")
>>> matrix = SubstitutionMatrix.std_nucleotide_matrix()
>>> semiglobal_alignment = align_optimal(
...     ref, seg, matrix, local=False, terminal_penalty=False
... )[0]
>>> print(semiglobal_alignment)
TATAAAAGGTTTCCGACCGTAGGTAGCTGA
---CCCCGGTTT--GACCGTATGTAG----
>>> print(write_alignment_to_cigar(semiglobal_alignment))
9M2D12M
>>> print(write_alignment_to_cigar(semiglobal_alignment, introns=[(12, 14)]))
9M2N12M
>>> print(write_alignment_to_cigar(semiglobal_alignment, distinguish_matches=True))
4X5=2D7=1X4=
>>> local_alignment = align_optimal(ref, seg, matrix, local=True)[0]
>>> print(local_alignment)
GGTTTCCGACCGTAGGTAG
GGTTT--GACCGTATGTAG
>>> print(write_alignment_to_cigar(local_alignment, hard_clip=False))
4S5M2D12M
>>> print(write_alignment_to_cigar(local_alignment, hard_clip=True))
4H5M2D12M

Writing operations as BAM codes is also possible:

>>> op_tuples = write_alignment_to_cigar(semiglobal_alignment, as_string=False)
>>> for op, length in op_tuples:
...     print(CigarOp(op), length)
CigarOp.MATCH 9
CigarOp.DELETION 2
CigarOp.MATCH 12