write_alignment_to_cigar#

biotite.sequence.align.write_alignment_to_cigar(alignment, reference_index=0, segment_index=1, introns=(), distinguish_matches=False, hard_clip=False, include_terminal_gaps=False, as_string=True)[source]#

Convert an Alignment into a CIGAR string.

Parameters:
alignmentAlignment

The alignment to be converted.

reference_indexint, optional

The index of the reference sequence in the alignment. By default the first sequence is used.

segment_indexint, optional

The index of the segment, read or query sequence in the alignment. By default the second sequence is used.

intronsiterable object of tuple(int, int), optional

The introns in the reference sequence. The introns are given as tuples of start and exclusive stop index. In those regions gaps in the reference sequence are reflected by ‘N’ in the CIGAR string. By default no introns are assumed.

distinguish_matchesbool, optional

If true, matches (‘=’) are distinguished from mismatches (‘X’). Otherwise, matches and mismatches are reflected equally by an ‘M’ in the CIGAR string.

hard_clipbool, optional

If true, clipped bases are hard-clipped. Otherwise, clipped bases are soft-clipped.

include_terminal_gapsbool, optional

If true, terminal gaps in the segment sequence are included in the CIGAR string. These are represented by D operations at the start and/or end of the string. By default, those terminal gaps are omitted in the CIGAR, which is the way SAM/BAM expects a CIGAR to be.

as_stringbool, optional

If true, the CIGAR string is returned. Otherwise, a list of tuples is returned, where the first element of each tuple specifies the CigarOp and the second element specifies the number of repetitions.

Returns:
cigarstr or ndarray, shape=(n,2) dtype=int

If as_string is true, the CIGAR string is returned. Otherwise, an array is returned, where the first column specifies the CigarOp and the second column specifies the number of repetitions of that operation.

Notes

If include_terminal_gaps is set to true, you usually want to set position=0 in read_alignment_from_cigar() to get the correct alignment.

Examples

>>> ref = NucleotideSequence("TATAAAAGGTTTCCGACCGTAGGTAGCTGA")
>>> seg = NucleotideSequence("CCCCGGTTTGACCGTATGTAG")
>>> matrix = SubstitutionMatrix.std_nucleotide_matrix()
>>> semiglobal_alignment = align_optimal(
...     ref, seg, matrix, local=False, terminal_penalty=False
... )[0]
>>> print(semiglobal_alignment)
TATAAAAGGTTTCCGACCGTAGGTAGCTGA
---CCCCGGTTT--GACCGTATGTAG----
>>> print(write_alignment_to_cigar(semiglobal_alignment))
9M2D12M
>>> print(write_alignment_to_cigar(semiglobal_alignment, introns=[(12, 14)]))
9M2N12M
>>> print(write_alignment_to_cigar(semiglobal_alignment, distinguish_matches=True))
4X5=2D7=1X4=
>>> print(write_alignment_to_cigar(semiglobal_alignment, include_terminal_gaps=True))
3D9M2D12M4D
>>> local_alignment = align_optimal(ref, seg, matrix, local=True)[0]
>>> print(local_alignment)
GGTTTCCGACCGTAGGTAG
GGTTT--GACCGTATGTAG
>>> print(write_alignment_to_cigar(local_alignment, hard_clip=False))
4S5M2D12M
>>> print(write_alignment_to_cigar(local_alignment, hard_clip=True))
4H5M2D12M

Writing operations as BAM codes is also possible:

>>> op_tuples = write_alignment_to_cigar(semiglobal_alignment, as_string=False)
>>> for op, length in op_tuples:
...     print(CigarOp(op).name, length)
MATCH 9
DELETION 2
MATCH 12