write_alignment_to_cigar
#
- biotite.sequence.align.write_alignment_to_cigar(alignment, reference_index=0, segment_index=1, introns=(), distinguish_matches=False, hard_clip=False, include_terminal_gaps=False, as_string=True)[source]#
Convert an
Alignment
into a CIGAR string.- Parameters:
- alignmentAlignment
The alignment to be converted.
- reference_indexint, optional
The index of the reference sequence in the alignment. By default the first sequence is used.
- segment_indexint, optional
The index of the segment, read or query sequence in the alignment. By default the second sequence is used.
- intronsiterable object of tuple(int, int), optional
The introns in the reference sequence. The introns are given as tuples of start and exclusive stop index. In those regions gaps in the reference sequence are reflected by ‘N’ in the CIGAR string. By default no introns are assumed.
- distinguish_matchesbool, optional
If true, matches (‘=’) are distinguished from mismatches (‘X’). Otherwise, matches and mismatches are reflected equally by an ‘M’ in the CIGAR string.
- hard_clipbool, optional
If true, clipped bases are hard-clipped. Otherwise, clipped bases are soft-clipped.
- include_terminal_gapsbool, optional
If true, terminal gaps in the segment sequence are included in the CIGAR string. These are represented by
D
operations at the start and/or end of the string. By default, those terminal gaps are omitted in the CIGAR, which is the way SAM/BAM expects a CIGAR to be.- as_stringbool, optional
If true, the CIGAR string is returned. Otherwise, a list of tuples is returned, where the first element of each tuple specifies the
CigarOp
and the second element specifies the number of repetitions.
- Returns:
- cigarstr or ndarray, shape=(n,2) dtype=int
If as_string is true, the CIGAR string is returned. Otherwise, an array is returned, where the first column specifies the
CigarOp
and the second column specifies the number of repetitions of that operation.
See also
Notes
If include_terminal_gaps is set to true, you usually want to set
position=0
inread_alignment_from_cigar()
to get the correct alignment.Examples
>>> ref = NucleotideSequence("TATAAAAGGTTTCCGACCGTAGGTAGCTGA") >>> seg = NucleotideSequence("CCCCGGTTTGACCGTATGTAG") >>> matrix = SubstitutionMatrix.std_nucleotide_matrix() >>> semiglobal_alignment = align_optimal( ... ref, seg, matrix, local=False, terminal_penalty=False ... )[0] >>> print(semiglobal_alignment) TATAAAAGGTTTCCGACCGTAGGTAGCTGA ---CCCCGGTTT--GACCGTATGTAG---- >>> print(write_alignment_to_cigar(semiglobal_alignment)) 9M2D12M >>> print(write_alignment_to_cigar(semiglobal_alignment, introns=[(12, 14)])) 9M2N12M >>> print(write_alignment_to_cigar(semiglobal_alignment, distinguish_matches=True)) 4X5=2D7=1X4= >>> print(write_alignment_to_cigar(semiglobal_alignment, include_terminal_gaps=True)) 3D9M2D12M4D >>> local_alignment = align_optimal(ref, seg, matrix, local=True)[0] >>> print(local_alignment) GGTTTCCGACCGTAGGTAG GGTTT--GACCGTATGTAG >>> print(write_alignment_to_cigar(local_alignment, hard_clip=False)) 4S5M2D12M >>> print(write_alignment_to_cigar(local_alignment, hard_clip=True)) 4H5M2D12M
Writing operations as BAM codes is also possible:
>>> op_tuples = write_alignment_to_cigar(semiglobal_alignment, as_string=False) >>> for op, length in op_tuples: ... print(CigarOp(op).name, length) MATCH 9 DELETION 2 MATCH 12