ScoreThresholdRule
- class biotite.sequence.align.ScoreThresholdRule(matrix, threshold)
Bases: SimilarityRule
This similarity rule calculates all k-mers whose similarity score with a given k-mer is greater than or equal to a defined threshold score.
The similarity score \(S\) of two k-mers \(a\) and \(b\) is defined as the sum of the pairwise similarity scores from a substitution matrix \(M\):
\[S(a,b) = \sum_{i=1}^{k} M(a_i, b_i)\]
Therefore, this similarity rule allows substitutions between similar symbols within a k-mer.
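The score definition above can be illustrated with a few lines of plain Python. This is a minimal sketch, not Biotite code: the dictionary `TOY_MATRIX`, the three-letter alphabet, and the helper `kmer_score` are hypothetical stand-ins for a real `SubstitutionMatrix`, and the scores are made-up values chosen only for illustration.

```python
# Hypothetical toy substitution matrix M over the alphabet "AIW".
# Keys are (symbol of a, symbol of b); values are the pairwise scores.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "I"): -1, ("A", "W"): -3,
    ("I", "A"): -1, ("I", "I"): 4, ("I", "W"): -3,
    ("W", "A"): -3, ("W", "I"): -3, ("W", "W"): 11,
}


def kmer_score(a: str, b: str, matrix: dict) -> int:
    """Return S(a, b), the sum of M(a_i, b_i) over all k positions."""
    if len(a) != len(b):
        raise ValueError("both k-mers must have the same length k")
    return sum(matrix[(x, y)] for x, y in zip(a, b))


print(kmer_score("AIW", "AIW", TOY_MATRIX))  # 4 + 4 + 11 = 19
print(kmer_score("AIW", "IIW", TOY_MATRIX))  # -1 + 4 + 11 = 14
```

A single substitution (`A` to `I` in the first position) lowers the score but does not necessarily drop it below a given threshold, which is what lets the rule tolerate similar symbols.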
This class is especially useful for finding similar k-mers in protein sequences.
- Parameters:
- matrix : SubstitutionMatrix
The similarity scores are taken from this matrix. The matrix must be symmetric.
- threshold : int
The threshold score. A k-mer \(b\) is regarded as similar to a k-mer \(a\) if the similarity score between \(a\) and \(b\) is equal to or greater than the threshold.
Notes
For efficient generation of similar k-mers, an implementation of the branch-and-bound algorithm [1] is used.
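The Notes mention a branch-and-bound algorithm. The following is a generic depth-first branch-and-bound sketch under stated assumptions, not Biotite's actual implementation: it builds candidate k-mers one position at a time and prunes a branch as soon as the partial score plus the best achievable score for the remaining positions falls below the threshold. The upper bound used here (sum of per-position maxima), `TOY_MATRIX`, and `similar_kmers_bnb` are illustrative assumptions.

```python
# Hypothetical toy alphabet and symmetric substitution matrix.
ALPHABET = "AIW"
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "I"): -1, ("A", "W"): -3,
    ("I", "A"): -1, ("I", "I"): 4, ("I", "W"): -3,
    ("W", "A"): -3, ("W", "I"): -3, ("W", "W"): 11,
}


def similar_kmers_bnb(kmer: str, alphabet: str, matrix: dict, threshold: int) -> list:
    """Find all k-mers b with S(kmer, b) >= threshold via branch-and-bound."""
    k = len(kmer)
    # best[i]: highest score any symbol can reach against kmer[i].
    best = [max(matrix[(kmer[i], y)] for y in alphabet) for i in range(k)]
    # suffix_best[i]: upper bound on the score contributed by positions i..k-1.
    suffix_best = [0] * (k + 1)
    for i in range(k - 1, -1, -1):
        suffix_best[i] = suffix_best[i + 1] + best[i]

    results = []

    def extend(prefix: list, score: int) -> None:
        i = len(prefix)
        # Bound: even the best possible completion cannot reach the threshold.
        if score + suffix_best[i] < threshold:
            return
        if i == k:
            # Complete k-mer whose score is >= threshold.
            results.append("".join(prefix))
            return
        # Branch: try every symbol at position i.
        for y in alphabet:
            prefix.append(y)
            extend(prefix, score + matrix[(kmer[i], y)])
            prefix.pop()

    extend([], 0)
    return results


print(similar_kmers_bnb("AIW", ALPHABET, TOY_MATRIX, threshold=14))
# ['AAW', 'AIW', 'IIW']
```

The result matches an exhaustive enumeration, but whole subtrees (for example every k-mer not ending in `W`) are discarded without being expanded, which is where the efficiency gain comes from for large alphabets and longer k.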
References
Examples
>>> from biotite.sequence import ProteinSequence
>>> from biotite.sequence.align import KmerAlphabet, ScoreThresholdRule, SubstitutionMatrix
>>> kmer_alphabet = KmerAlphabet(ProteinSequence.alphabet, k=3)
>>> matrix = SubstitutionMatrix.std_protein_matrix()
>>> rule = ScoreThresholdRule(matrix, threshold=15)
>>> similars = rule.similar_kmers(kmer_alphabet, kmer_alphabet.encode("AIW"))
>>> print(["".join(s) for s in kmer_alphabet.decode_multiple(similars)])
['AFW', 'AIW', 'ALW', 'AMW', 'AVW', 'CIW', 'GIW', 'SIW', 'SVW', 'TIW', 'VIW', 'XIW']
- similar_kmers(kmer_alphabet, kmer)
Calculate all similar k-mers for a given k-mer.
- Parameters:
- kmer_alphabet : KmerAlphabet
The reference k-mer alphabet to select the k-mers from.
- kmer : int
The symbol code of the k-mer for which similar k-mers are searched.
- Returns:
- similar_kmers : ndarray, dtype=np.int64
The symbol codes for all similar k-mers.
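`similar_kmers()` takes and returns integer symbol codes rather than strings. The sketch below shows one way such codes can be formed, assuming a positional (base-n) scheme in which the first symbol is the most significant digit, \(c = \sum_{i=1}^{k} s_i n^{k-i}\) for symbol codes \(s_i\) and alphabet size \(n\). This scheme is an assumption for illustration; consult the `KmerAlphabet` documentation for the exact encoding, and note that `encode_kmer` and `decode_kmer` are hypothetical helpers.

```python
def encode_kmer(kmer: str, alphabet: str) -> int:
    """Hypothetical helper: map a k-mer to an int, assuming base-n positional codes."""
    n = len(alphabet)
    code = 0
    for symbol in kmer:
        # Shift previous digits one place and add this symbol's code.
        code = code * n + alphabet.index(symbol)
    return code


def decode_kmer(code: int, alphabet: str, k: int) -> str:
    """Hypothetical inverse of encode_kmer() for k-mers of length k."""
    n = len(alphabet)
    symbols = []
    for _ in range(k):
        # The least significant digit is the last symbol of the k-mer.
        code, s = divmod(code, n)
        symbols.append(alphabet[s])
    return "".join(reversed(symbols))


codes = [encode_kmer(kmer, "AIW") for kmer in ["AAW", "AIW", "IIW"]]
print(codes)  # [2, 5, 14]
print(decode_kmer(14, "AIW", 3))  # IIW
```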