ScoreThresholdRule

class biotite.sequence.align.ScoreThresholdRule(matrix, threshold)

Bases: SimilarityRule

This similarity rule finds all k-mers whose similarity score with a given k-mer is greater than or equal to a defined threshold score.

The similarity score \(S\) of two k-mers \(a\) and \(b\) is defined as the sum of the scores that a substitution matrix \(M\) assigns to the symbol pairs at each position:

\[S(a,b) = \sum_{i=1}^k M(a_i, b_i)\]

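Therefore, this similarity rule tolerates substitutions between similar symbols within a k-mer: a k-mer still counts as similar as long as its total score does not drop below the threshold.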
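As a minimal sketch of this definition in plain Python, the scores below are made-up toy values for a hypothetical three-letter alphabet (they are not taken from BLOSUM62 or any real matrix), and `kmer_score` is a hypothetical helper, not part of biotite:

```python
# Made-up, symmetric toy scores for a hypothetical three-letter alphabet.
# These are NOT values from BLOSUM62 or any other real substitution matrix.
M = {
    "I": {"I": 4, "L": 2, "W": -3},
    "L": {"I": 2, "L": 4, "W": -2},
    "W": {"I": -3, "L": -2, "W": 11},
}


def kmer_score(a: str, b: str) -> int:
    """Return S(a, b), the sum of M(a_i, b_i) over all positions i."""
    if len(a) != len(b):
        raise ValueError("both k-mers must have the same length k")
    return sum(M[x][y] for x, y in zip(a, b))


print(kmer_score("IIW", "IIW"))  # 4 + 4 + 11 = 19
print(kmer_score("IIW", "LIW"))  # 2 + 4 + 11 = 17 (similar substitution I -> L)
print(kmer_score("IIW", "WIW"))  # -3 + 4 + 11 = 12 (dissimilar substitution I -> W)
```

With a threshold of 15, `LIW` would count as similar to `IIW` because `I` and `L` score well against each other, whereas `WIW` would not.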

This class is especially useful for finding similar k-mers in protein sequences.

Parameters:
matrix : SubstitutionMatrix

The similarity scores are taken from this matrix. The matrix must be symmetric.

threshold : int

The threshold score. A k-mer \(b\) is regarded as similar to a k-mer \(a\) if the similarity score \(S(a,b)\) is greater than or equal to the threshold.
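The threshold criterion can be checked directly by brute force: enumerate every possible k-mer and keep those that reach the threshold. The sketch below uses made-up toy scores for a hypothetical alphabet, and `similar_kmers_brute_force` is a hypothetical name. It only illustrates the definition; it scales as n^k for an alphabet of size n, and the Notes mention that the class uses a branch-and-bound algorithm instead:

```python
from itertools import product

# Made-up, symmetric toy scores (hypothetical; not a real substitution matrix).
ALPHABET = "ILW"
M = {
    "I": {"I": 4, "L": 2, "W": -3},
    "L": {"I": 2, "L": 4, "W": -2},
    "W": {"I": -3, "L": -2, "W": 11},
}


def similar_kmers_brute_force(kmer: str, threshold: int) -> list[str]:
    """Return every k-mer b over ALPHABET with S(kmer, b) >= threshold.

    Enumerates all len(ALPHABET) ** k candidates, so it is only suitable
    as a reference for checking faster methods on tiny inputs.
    """
    hits = []
    for candidate in product(ALPHABET, repeat=len(kmer)):
        score = sum(M[x][y] for x, y in zip(kmer, candidate))
        if score >= threshold:  # "greater than or equal to" the threshold
            hits.append("".join(candidate))
    return hits


print(similar_kmers_brute_force("IIW", threshold=15))  # ['IIW', 'ILW', 'LIW', 'LLW']
```

Raising the threshold shrinks the result: at 16, `LLW` (score 15) drops out.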

Notes

To generate similar k-mers efficiently, an implementation of the branch-and-bound algorithm [1] is used.
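A minimal sketch of the general branch-and-bound idea: the k-mer is extended one position at a time, and a branch is discarded as soon as its score plus the best score still reachable at the remaining positions falls below the threshold. The toy scores and the function name `similar_kmers_bnb` are hypothetical; this illustrates the technique only and is not biotite's implementation or necessarily the exact variant described in [1]:

```python
# Made-up, symmetric toy scores (hypothetical; not a real substitution matrix).
ALPHABET = "ILW"
M = {
    "I": {"I": 4, "L": 2, "W": -3},
    "L": {"I": 2, "L": 4, "W": -2},
    "W": {"I": -3, "L": -2, "W": 11},
}


def similar_kmers_bnb(kmer: str, threshold: int) -> list[str]:
    """Depth-first branch-and-bound search for all b with S(kmer, b) >= threshold."""
    k = len(kmer)
    # best_rest[i] = highest score still reachable at positions i..k-1,
    # i.e. the sum of the row maxima of M for the remaining query symbols.
    best_rest = [0] * (k + 1)
    for i in range(k - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(M[kmer[i]].values())

    hits: list[str] = []

    def extend(prefix: list[str], score: int) -> None:
        i = len(prefix)
        if i == k:
            # The bound check below guarantees score >= threshold here.
            hits.append("".join(prefix))
            return
        for symbol in ALPHABET:
            new_score = score + M[kmer[i]][symbol]
            # Bound: only descend if the best possible completion can still
            # reach the threshold; otherwise prune the whole subtree.
            if new_score + best_rest[i + 1] >= threshold:
                prefix.append(symbol)
                extend(prefix, new_score)
                prefix.pop()

    if best_rest[0] >= threshold:  # the root itself may already be infeasible
        extend([], 0)
    return hits


print(similar_kmers_bnb("IIW", threshold=15))  # ['IIW', 'ILW', 'LIW', 'LLW']
```

Because the bound is an upper bound on every completion of a prefix, pruning never discards a qualifying k-mer, so the result matches exhaustive enumeration while typically visiting far fewer prefixes when the threshold is high.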

References

Examples

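>>> from biotite.sequence import ProteinSequence
>>> from biotite.sequence.align import KmerAlphabet, ScoreThresholdRule, SubstitutionMatrix
>>> kmer_alphabet = KmerAlphabet(ProteinSequence.alphabet, k=3)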
>>> matrix = SubstitutionMatrix.std_protein_matrix()
>>> rule = ScoreThresholdRule(matrix, threshold=15)
>>> similars = rule.similar_kmers(kmer_alphabet, kmer_alphabet.encode("AIW"))
>>> print(["".join(s) for s in kmer_alphabet.decode_multiple(similars)])
['AFW', 'AIW', 'ALW', 'AMW', 'AVW', 'CIW', 'GIW', 'SIW', 'SVW', 'TIW', 'VIW', 'XIW']

similar_kmers(kmer_alphabet, kmer)

Calculate all similar k-mers for a given k-mer.

Parameters:
kmer_alphabet : KmerAlphabet

The reference k-mer alphabet to select the k-mers from.

kmer : int

The symbol code of the k-mer to find similar k-mers for.

Returns:
similar_kmers : ndarray, dtype=np.int64

The symbol codes for all similar k-mers.
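Both the input `kmer` and the returned array use single integer codes per k-mer. To make that concrete, the sketch below assumes a base-n positional encoding with the first symbol as the most significant digit; this ordering, the toy alphabet, and the helpers `kmer_to_code` and `code_to_kmer` are assumptions for illustration only. In practice, `KmerAlphabet.encode()` and `KmerAlphabet.decode_multiple()`, as used in the example above, perform this conversion:

```python
import numpy as np

# Hypothetical base alphabet; each symbol's code is its index 0..n-1.
ALPHABET = "ILW"


def kmer_to_code(kmer: str) -> int:
    """Encode a k-mer as one integer, ASSUMING base-n positional encoding
    with the first symbol most significant (not necessarily KmerAlphabet's scheme)."""
    n = len(ALPHABET)
    code = 0
    for symbol in kmer:
        code = code * n + ALPHABET.index(symbol)
    return code


def code_to_kmer(code: int, k: int) -> str:
    """Inverse of kmer_to_code for k-mers of length k."""
    n = len(ALPHABET)
    symbols = []
    for _ in range(k):
        code, digit = divmod(code, n)
        symbols.append(ALPHABET[digit])
    return "".join(reversed(symbols))


# Pack similar k-mers into an int64 array of codes, like the return value above.
similars = ["IIW", "ILW", "LIW", "LLW"]
codes = np.array([kmer_to_code(s) for s in similars], dtype=np.int64)
print(codes.tolist())  # [2, 5, 11, 14]
print([code_to_kmer(int(c), k=3) for c in codes])  # ['IIW', 'ILW', 'LIW', 'LLW']
```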