biotite.sequence.align.FrequencyPermutation¶
- class biotite.sequence.align.FrequencyPermutation(kmer_alphabet, counts)[source]¶
Bases:
Permutation
Provide an order for k-mers from a given
KmerAlphabet
, such that less frequent k-mers are smaller than more frequent k-mers. The frequency of each k-mer can either be given directly via the constructor or can be computed from aKmerTable
viafrom_table()
.- Parameters
- kmer_alphabetKmerAlphabet, length=n
The k-mer alphabet that defines the range of possible k-mers that should be permuted.
- countsndarray, shape=(n,), dtype=np.int64
The absolute frequency, i.e. the number of occurrences, of each k-mer in kmer_alphabet in the sequence database of interest.
counts[c] = f
, wherec
is the k-mer code andf
is the corresponding frequency.
Notes
In actual sequences some sequence patterns appear in high quantity. When selecting a subset of k-mers, e.g. via
MinimizerSelector
, it is desireable to select the low-frequency informative k-mers to avoid spurious matches. To achieve such selection this class can be used.This class uses a table to look up the order. Hence, the memory consumption is \(8 n^k\) bytes, where \(n\) is the size of the base alphabet and \(k\) is the k-mer size.
Examples
>>> alphabet = LetterAlphabet("abcdr") >>> sequence = GeneralSequence(alphabet, "abracadabra") >>> kmer_table = KmerTable.from_sequences(k=2, sequences=[sequence]) >>> print(kmer_table) ab: (0, 0), (0, 7) ac: (0, 3) ad: (0, 5) br: (0, 1), (0, 8) ca: (0, 4) da: (0, 6) ra: (0, 2), (0, 9) >>> # Create all k-mers in lexicographic order >>> kmer_alph = kmer_table.kmer_alphabet >>> kmer_codes = np.arange(0, len(kmer_alph)) >>> print(["..."] + ["".join(kmer_alph.decode(c)) for c in kmer_codes[-10:]]) ['...', 'da', 'db', 'dc', 'dd', 'dr', 'ra', 'rb', 'rc', 'rd', 'rr'] >>> # After applying the permutation the k-mers are ordered >>> # by their frequency in the table >>> # -> the most frequent k-mers have low rank >>> permutation = FrequencyPermutation.from_table(kmer_table) >>> order = permutation.permute(kmer_codes) >>> print(order) [ 0 24 20 19 16 15 14 13 12 22 21 10 11 8 7 18 6 5 4 3 23 2 1 9 17] >>> kmer_codes = kmer_codes[np.argsort(order)] >>> print(["..."] + ["".join(kmer_alph.decode(c)) for c in kmer_codes[-10:]]) ['...', 'ba', 'ar', 'rr', 'da', 'ad', 'ac', 'ca', 'br', 'ra', 'ab']
- Attributes
- min, max: int
The minimum and maximum value, the permutated value (i.e. the return value of
permute()
) can take.
- from_table()¶
Create a
FrequencyPermutation
from the k-mer counts of aKmerTable
.- Parameters
- kmer_tableKmerTable
The k-mer counts are taken from this table.
- Returns
- permutationFrequencyPermutation
The permutation is based on the counts.
- permute(kmers)¶
Give the given k-mers a new order.
- Parameters
- kmersndarray, dtype=np.int64
The k-mers to reorder given as k-mer code.
- Returns
- orderndarray, dtype=np.int64
The sort key for the new order, i.e. a k-mer
A
is smaller than k-merB
, iforder[A] < order[B]
The order value may not only contain positive but also negative integers. The order is unambiguous: IfA != B
, thenorder[A] != order[B]
.