FrequencyPermutation
#
- class biotite.sequence.align.FrequencyPermutation(kmer_alphabet, counts)[source]#
Bases:
Permutation
Provide an order for k-mers from a given
KmerAlphabet
, such that less frequent k-mers are smaller than more frequent k-mers. The frequency of each k-mer can either be given directly via the constructor or can be computed from aKmerTable
viafrom_table()
.- Parameters:
- kmer_alphabetKmerAlphabet, length=n
The k-mer alphabet that defines the range of possible k-mers that should be permuted.
- countsndarray, shape=(n,), dtype=np.int64
The absolute frequency, i.e. the number of occurrences, of each k-mer in kmer_alphabet in the sequence database of interest.
counts[c] = f
, wherec
is the k-mer code andf
is the corresponding frequency.
Notes
In actual sequences some sequence patterns appear in high quantity. When selecting a subset of k-mers, e.g. via
MinimizerSelector
, it is desireable to select the low-frequency informative k-mers to avoid spurious matches. To achieve such selection this class can be used.This class uses a table to look up the order. Hence, the memory consumption is \(8 n^k\) bytes, where \(n\) is the size of the base alphabet and \(k\) is the k-mer size.
Examples
>>> alphabet = LetterAlphabet("abcdr") >>> sequence = GeneralSequence(alphabet, "abracadabra") >>> kmer_table = KmerTable.from_sequences(k=2, sequences=[sequence]) >>> print(kmer_table) ab: (0, 0), (0, 7) ac: (0, 3) ad: (0, 5) br: (0, 1), (0, 8) ca: (0, 4) da: (0, 6) ra: (0, 2), (0, 9) >>> # Create all k-mers in lexicographic order >>> kmer_alph = kmer_table.kmer_alphabet >>> kmer_codes = np.arange(0, len(kmer_alph)) >>> print(["..."] + ["".join(kmer_alph.decode(c)) for c in kmer_codes[-10:]]) ['...', 'da', 'db', 'dc', 'dd', 'dr', 'ra', 'rb', 'rc', 'rd', 'rr'] >>> # After applying the permutation the k-mers are ordered >>> # by their frequency in the table >>> # -> the most frequent k-mers have low rank >>> permutation = FrequencyPermutation.from_table(kmer_table) >>> order = permutation.permute(kmer_codes) >>> print(order) [ 0 22 18 19 1 2 3 4 5 23 20 6 7 8 9 21 10 11 12 13 24 14 15 16 17] >>> kmer_codes = kmer_codes[np.argsort(order)] >>> print(["..."] + ["".join(kmer_alph.decode(c)) for c in kmer_codes[-10:]]) ['...', 'rc', 'rd', 'rr', 'ac', 'ad', 'ca', 'da', 'ab', 'br', 'ra']
- Attributes:
- min, max: int
The minimum and maximum value, the permutated value (i.e. the return value of
permute()
) can take.- kmer_alphabetKmerAlphabet
The k-mer alphabet that defines the range of possible k-mers that should be permuted.
- static from_table(kmer_table)#
Create a
FrequencyPermutation
from the k-mer counts of aKmerTable
.- Parameters:
- kmer_tableKmerTable
The k-mer counts are taken from this table.
- Returns:
- permutationFrequencyPermutation
The permutation is based on the counts.
- permute(kmers)#
Give the given k-mers a new order.
- Parameters:
- kmersndarray, dtype=np.int64
The k-mers to reorder given as k-mer code.
- Returns:
- orderndarray, dtype=np.int64
The sort key for the new order, i.e. a k-mer
A
is smaller than k-merB
, iforder[A] < order[B]
The order value may not only contain positive but also negative integers. The order is unambiguous: IfA != B
, thenorder[A] != order[B]
.