plot_alignment_similarity_based

biotite.sequence.graphics.plot_alignment_similarity_based(axes, alignment, symbols_per_line=50, show_numbers=False, number_size=None, number_functions=None, labels=None, label_size=None, show_line_position=False, spacing=1, color=None, cmap=None, matrix=None, color_symbols=False, symbol_spacing=None, symbol_size=None, symbol_param=None)

Plot a pairwise or multiple sequence alignment highlighting the similarity per alignment column.

This function works like plot_alignment() with a SymbolPlotter that colors the symbols based on their similarity to the other symbols in the same column. The color intensity (or colormap value, respectively) of a symbol scales with its similarity to the other symbols in the same alignment column.

Parameters:
axes : Axes

A Matplotlib axes that is used as plotting area.

alignment : Alignment

The pairwise or multiple sequence alignment to be plotted. The alphabet of each sequence in the alignment must be the same.

symbols_per_line : int, optional

The number of alignment columns that are displayed per line.

show_numbers : bool, optional

If true, the sequence position of the symbols in the last alignment column of a line is shown on the right side of the plot. If the last symbol is a gap, the position of the last actual symbol before this gap is taken. If the first symbol of the sequence has not occurred up to this point, no number is shown for this line. By default the first symbol of a sequence has the position 1, but this behavior can be changed using the number_functions parameter.

number_size : float, optional

The font size of the position numbers.

number_functions : list of (None or Callable(int -> int)), optional

By default the position of the first symbol in a sequence is 1, i.e. the sequence position is the sequence index incremented by 1. The behavior can be changed with this parameter: If supplied, the length of the list must match the number of sequences in the alignment. Every entry is a function that maps a sequence index (int) to a sequence position (int) for the respective sequence. A None entry means that the default numbering is applied for the sequence.

labels : list of str, optional

The sequence labels. Must be the same size and order as the sequences in the alignment.

label_size : float, optional

Font size of the labels.

show_line_position : bool, optional

If true, the position within a line is plotted below the alignment.

spacing : float, optional

The spacing between the alignment lines. 1.0 means that the size is equal to the size of a symbol box.

color : tuple or str, optional

A Matplotlib compatible color. If this parameter is given, the box color is an interpolated value between white and the given color, or, if color_symbols is set to true, between the given color and black. The interpolation percentage is given by the average normalized similarity.

cmap : Colormap or str, optional

The boxes (or symbols, if color_symbols is set) are colored based on the normalized similarity value on the given Matplotlib Colormap.

matrix : SubstitutionMatrix, optional

The substitution matrix used to determine the similarity of two symbols. By default an identity matrix is used, i.e. only match and mismatch are distinguished.

color_symbols : bool, optional

If true, the symbols themselves are colored. If false, the symbols are black, and the boxes behind the symbols are colored.

symbol_spacing : int, optional

If given, a space is placed after every symbol_spacing symbols in a line.

symbol_size : float, optional

Font size of the sequence symbols.

symbol_param : dict, optional

Additional parameters that are passed to the matplotlib.text.Text instance of each symbol.
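
The number_functions parameter described above could be prepared as in the following minimal sketch. The helpers offset_numbering and position are hypothetical and not part of Biotite; position only mimics the documented default (index incremented by 1) so that the mapping can be checked without plotting. The example assumes an alignment of two sequences, where the second one is a fragment whose first symbol should be numbered 101.

```python
def offset_numbering(first_position):
    """Return a function mapping a 0-based sequence index to a position.

    Hypothetical helper: the returned callable has the documented
    shape Callable(int -> int).
    """
    def number_function(index):
        return index + first_position
    return number_function


# One entry per sequence in the alignment:
# None keeps the default numbering, the second entry starts at 101.
number_functions = [None, offset_numbering(101)]


def position(seq_number, index):
    """Resolve a position as documented: a None entry means index + 1."""
    func = number_functions[seq_number]
    return index + 1 if func is None else func(index)


print(position(0, 0))  # first symbol of sequence 0
print(position(1, 0))  # first symbol of sequence 1
```

Such a list would be passed as number_functions=number_functions, together with show_numbers=True.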

See also

plot_alignment

Analogous functionality with a customizable SymbolPlotter.

LetterSimilarityPlotter

The SymbolPlotter used in this function.

Notes

For determination of the color, a measure called average normalized similarity is used.

The normalized similarity of one symbol a to another symbol b (both in alphabet X) is defined as

\[S_{norm}(a,b) = \frac{S(a,b) - \min\limits_x(S(a,x))} {\max\limits_x(S(a,x)) - \min\limits_x(S(a,x))}\]
\[a,b,x \in X\]

where S(x,y) is the similarity score of the two symbols x and y described in the substitution matrix. The similarity S(x,-) is always 0. As the normalization is conducted only with respect to a, the normalized similarity is not commutative.
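
The definition can be tried out with a small plain-Python sketch. The three-letter alphabet and the scores in SCORES are made up for illustration and do not correspond to any real substitution matrix; gaps are not handled here. The two calls at the end show that swapping the arguments changes the result, because the minimum and maximum are taken from the row of the first symbol only.

```python
# Toy substitution matrix over the hypothetical alphabet {A, B, C}.
# The matrix itself is symmetric: SCORES[x][y] == SCORES[y][x].
SCORES = {
    "A": {"A": 4, "B": 1, "C": -2},
    "B": {"A": 1, "B": 5, "C": 0},
    "C": {"A": -2, "B": 0, "C": 6},
}


def normalized_similarity(a, b, scores=SCORES):
    """Rescale S(a, b) to the range 0..1 using the min/max of the row of a."""
    row = scores[a].values()
    lowest, highest = min(row), max(row)
    return (scores[a][b] - lowest) / (highest - lowest)


print(normalized_similarity("A", "B"))  # normalized with respect to A
print(normalized_similarity("B", "A"))  # normalized with respect to B
```

With these toy scores, the first call evaluates (1 + 2) / (4 + 2) and the second (1 - 0) / (5 - 0), so the two values differ although S(A,B) = S(B,A).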

The average normalized similarity of a symbol a is determined by averaging the normalized similarity over each symbol b_i in the same alignment column.

\[S_{norm,av}(a) = \frac{1}{n-1} \left[\left(\sum\limits_{i=1}^n S_{norm}(a,b_i)\right) - S_{norm}(a,a)\right]\]

The normalized similarity of a to itself is subtracted, because a itself is also one of the symbols b_i.
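
The averaging step, and how its result could drive the box color described for the color parameter, might look as follows. This is only a sketch under stated assumptions: the scores are invented, the column contains no gaps, and box_color assumes a linear blend in RGB space between white and the given color, which the text above does not specify. All function names are hypothetical and not part of Biotite.

```python
# Invented scores over the hypothetical alphabet {A, B, C}
SCORES = {
    "A": {"A": 4, "B": 1, "C": -2},
    "B": {"A": 1, "B": 5, "C": 0},
    "C": {"A": -2, "B": 0, "C": 6},
}


def normalized_similarity(a, b):
    """S_norm(a, b), normalized with respect to the row of a."""
    row = SCORES[a].values()
    return (SCORES[a][b] - min(row)) / (max(row) - min(row))


def average_normalized_similarity(column, index):
    """S_norm,av for the symbol at `index` of a gap-free alignment column."""
    a = column[index]
    total = sum(normalized_similarity(a, b) for b in column)
    # The sum runs over all symbols, including a itself,
    # so the self-comparison is subtracted again.
    return (total - normalized_similarity(a, a)) / (len(column) - 1)


def box_color(similarity, color=(0.0, 0.5, 0.0)):
    """Assumed linear RGB blend: white at similarity 0, `color` at 1."""
    return tuple(1.0 + similarity * (c - 1.0) for c in color)


column = ["A", "A", "B"]
for i, symbol in enumerate(column):
    sim = average_normalized_similarity(column, i)
    print(symbol, round(sim, 3), box_color(sim))
```

For the toy column A, A, B, each A is compared with one identical symbol and one B, giving (1 + 0.5) / 2 = 0.75, whereas B is compared with two A symbols and ends up near 0.2, so its box would be drawn closer to white.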