plot_alignment_similarity_based
- biotite.sequence.graphics.plot_alignment_similarity_based(axes, alignment, symbols_per_line=50, show_numbers=False, number_size=None, number_functions=None, labels=None, label_size=None, show_line_position=False, spacing=1, color=None, cmap=None, matrix=None, color_symbols=False, symbol_spacing=None, symbol_size=None, symbol_param=None)
Plot a pairwise or multiple sequence alignment highlighting the similarity per alignment column.
This function works like plot_alignment() with a SymbolPlotter that colors the symbols based on their similarity to the other symbols in the same column. The color intensity (or colormap value, respectively) of a symbol scales with the similarity of that symbol to the other symbols in the same alignment column.
- Parameters:
- axes : Axes
A Matplotlib axes that is used as the plotting area.
- alignment : Alignment
The pairwise or multiple sequence alignment to be plotted. The alphabet of each sequence in the alignment must be the same.
- symbols_per_line : int, optional
The number of alignment columns that are displayed per line.
- show_numbers : bool, optional
If true, the sequence position of the symbols in the last alignment column of a line is shown on the right side of the plot. If the last symbol is a gap, the position of the last actual symbol before this gap is taken. If the first symbol did not occur up to this point, no number is shown for this line. By default the first symbol of a sequence has the position 1, but this behavior can be changed using the number_functions parameter.
- number_size : float, optional
The font size of the position numbers.
- number_functions : list of (None or Callable(int -> int)), optional
By default the position of the first symbol in a sequence is 1, i.e. the sequence position is the sequence index incremented by 1. The behavior can be changed with this parameter: If supplied, the length of the list must match the number of sequences in the alignment. Every entry is a function that maps a sequence index (int) to a sequence position (int) for the respective sequence. A None entry means that the default numbering is applied to the respective sequence.
- labels : list of str, optional
The sequence labels. Must be the same size and order as the sequences in the alignment.
- label_size : float, optional
Font size of the labels.
- show_line_position : bool, optional
If true, the position within a line is plotted below the alignment.
- spacing : float, optional
The spacing between the alignment lines. 1.0 means that the size is equal to the size of a symbol box.
- color : tuple or str, optional
A Matplotlib compatible color. If this parameter is given, the box color is an interpolated value between white and the given color, or, if color_symbols is set to true, between the given color and black. The interpolation percentage is given by the average normalized similarity.
- cmap : Colormap or str, optional
The boxes (or symbols, if color_symbols is set) are colored based on the normalized similarity value on the given Matplotlib Colormap.
- matrix : SubstitutionMatrix, optional
The substitution matrix used to determine the similarity of two symbols. By default an identity matrix is used, i.e. only match and mismatch are distinguished.
- color_symbols : bool, optional
If true, the symbols themselves are colored. If false, the symbols are black, and the boxes behind the symbols are colored.
- symbol_spacing : int, optional
If given, a space is placed after every symbol_spacing symbols, so that the symbols of a line appear in groups of that size.
- symbol_size : float, optional
Font size of the sequence symbols.
- symbol_param : dict, optional
Additional parameters that are given to the matplotlib.Text instance of each symbol.
See also
plot_alignment
Analogous functionality with a customizable SymbolPlotter.
LetterSimilarityPlotter
The SymbolPlotter used in this function.
Notes
For determination of the color, a measure called average normalized similarity is used.
The normalized similarity of one symbol a to another symbol b (both in alphabet X) is defined as
\[S_{norm}(a,b) = \frac{S(a,b) - \min\limits_x(S(a,x))} {\max\limits_x(S(a,x)) - \min\limits_x(S(a,x))}\]
\[a,b,x \in X\]
where S(x,y) is the similarity score of the two symbols x and y described in the substitution matrix. The similarity S(x,-) is always 0. As the normalization is conducted only with respect to a, the normalized similarity is not commutative.
The average normalized similarity of a symbol a is determined by averaging the normalized similarity over each symbol b_i in the same alignment column.
\[S_{norm,av}(a) = \frac{1}{n-1} \left[\left(\sum\limits_{i=1}^n S_{norm}(a,b_i)\right) - S_{norm}(a,a)\right]\]
The normalized similarity of a to itself is subtracted, because a also occurs among the b_i.
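The two formulas above can be sketched in plain Python. This is an illustration of the definitions only, not Biotite's implementation: the three-letter alphabet, the scores in TOY_MATRIX, and both function names are hypothetical, and gap symbols are left out for simplicity.

```python
# Toy symmetric substitution matrix over the alphabet {A, B, C}.
# The values are made up for illustration and are not taken from Biotite.
TOY_MATRIX = {
    "A": {"A": 4, "B": 1, "C": -2},
    "B": {"A": 1, "B": 5, "C": 0},
    "C": {"A": -2, "B": 0, "C": 6},
}


def normalized_similarity(a, b, matrix=TOY_MATRIX):
    """S_norm(a, b): S(a, b) rescaled by the minimum and maximum of row a."""
    row = matrix[a]
    lowest = min(row.values())
    highest = max(row.values())
    return (row[b] - lowest) / (highest - lowest)


def average_normalized_similarity(a, column, matrix=TOY_MATRIX):
    """S_norm,av(a): mean of S_norm(a, b_i) over the column, excluding a itself."""
    n = len(column)
    total = sum(normalized_similarity(a, b, matrix) for b in column)
    # The column contains a itself, so its self-similarity is subtracted once
    return (total - normalized_similarity(a, a, matrix)) / (n - 1)


# One alignment column with three sequences
column = ["A", "B", "A"]
scores = [average_normalized_similarity(a, column) for a in column]
```

With this toy matrix, normalized_similarity("A", "B") is 0.5 while normalized_similarity("B", "A") is 0.2, which illustrates why the measure is not commutative: each value is normalized with respect to the row of its first argument.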
Gallery

Searching for structural homologs in a protein structure database