biotite.sequence.align.RandomPermutation

class biotite.sequence.align.RandomPermutation[source]

Bases: Permutation

Provide a pseudo-randomized order for k-mers.

Notes

This class uses a simple full-period linear congruential generator (LCG) to provide pseudo-randomized values:

\[\text{order} = (a c_\text{k-mer} + 1) \mod 2^64.\]

The factor \(a\) is taken from 1 to ensure full periodicity and good random behavior. However, note that LCGs in general do not provide perfect random behavior, but only good-enough values for this purpose.

References

1

G. Steele, S. Vigna, “Computationally easy, spectrally good multipliers for congruential pseudorandom number generators,” arXiv, September 2021. doi: 10.48550/arXiv.2001.05304

Examples

>>> kmer_alph = KmerAlphabet(NucleotideSequence.alphabet_unamb, k=2)
>>> permutation = RandomPermutation()
>>> # k-mer codes representing the k-mers from 'AA' to 'TT'
>>> # in lexicographic order
>>> kmer_codes = np.arange(len(kmer_alph))
>>> print(kmer_codes)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
>>> print(["".join(kmer_alph.decode(c)) for c in kmer_codes])
['AA', 'AC', 'AG', 'AT', 'CA', 'CC', 'CG', 'CT', 'GA', 'GC', 'GG', 'GT', 'TA', 'TC', 'TG', 'TT']
>>> # Shuffle order of these k-mer codes using the permutation
>>> order = permutation.permute(kmer_codes)
>>> print(order)
[                   1 -3372029247567499370 -6744058495134998741
  8330656331007053504  4958627083439554133  1586597835872054762
 -1785431411695444609 -5157460659262943980 -8529489906830443351
  6545224919311608894  3173195671744109523  -198833575823389848
 -3570862823390889219 -6942892070958388590  8131822755183663655
  4759793507616164284]
>>> # The order is not lexicographic anymore
>>> kmer_codes = kmer_codes[np.argsort(order)]
>>> print(["".join(kmer_alph.decode(c)) for c in kmer_codes])
['GA', 'TC', 'AG', 'CT', 'TA', 'AC', 'CG', 'GT', 'AA', 'CC', 'GG', 'TT', 'CA', 'GC', 'TG', 'AT']
Attributes
min, max: int

The minimum and maximum value, the permutated value (i.e. the return value of permute()) can take.

permute(kmers)

Give the given k-mers a new order.

Parameters
kmersndarray, dtype=np.int64

The k-mers to reorder given as k-mer code.

Returns
orderndarray, dtype=np.int64

The sort key for the new order, i.e. a k-mer A is smaller than k-mer B, if order[A] < order[B] The order value may not only contain positive but also negative integers. The order is unambiguous: If A != B, then order[A] != order[B].