biotite.structure.pseudoknots

biotite.structure.pseudoknots(base_pairs, scores=None, max_pseudoknot_order=None)

Identify the pseudoknot order for each base pair in a given set of base pairs.

By default, the algorithm removes base pairs until the remaining base pairs are completely nested, i.e. no pseudoknots appear. The pseudoknot order of the removed base pairs is incremented, and the procedure is repeated with these base pairs. Base pairs are removed in a way that maximizes the number of remaining base pairs. Alternatively, an optional score can be provided for each base pair; in that case, the total score of the remaining base pairs is maximized instead.
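The iterative scheme described above can be sketched in plain Python. This is a minimal sketch, not biotite's implementation: the names `max_nested_subset` and `pseudoknot_orders` are hypothetical, a simple interval dynamic program is used for the maximum-score nested subset (standing in for the method of Smit et al.), only one of possibly several optimal solutions is returned, and each base is assumed to occur in at most one pair with strictly positive scores.

```python
from functools import lru_cache


def max_nested_subset(pairs, scores):
    """Return indices of one maximum-score subset of `pairs` without crossings.

    Assumes each base occurs in at most one pair (simplification).
    """
    positions = sorted(p for pair in pairs for p in pair)
    rank = {pos: r for r, pos in enumerate(positions)}
    # Map each opening position to (closing position, index of the pair)
    partner = {min(a, b): (max(a, b), idx) for idx, (a, b) in enumerate(pairs)}

    @lru_cache(maxsize=None)
    def best(lo, hi):
        # Best (score, chosen pair indices) using only positions[lo..hi]
        if lo > hi:
            return 0, ()
        result = best(lo + 1, hi)  # option 1: positions[lo] stays unpaired
        if positions[lo] in partner:
            close, idx = partner[positions[lo]]
            c = rank[close]
            if c <= hi:  # option 2: keep the pair opening at positions[lo]
                inner_score, inner = best(lo + 1, c - 1)
                outer_score, outer = best(c + 1, hi)
                score = scores[idx] + inner_score + outer_score
                if score > result[0]:
                    result = (score, (idx,) + inner + outer)
        return result

    return set(best(0, len(positions) - 1)[1])


def pseudoknot_orders(pairs, scores=None, max_order=None):
    """Assign pseudoknot orders by repeatedly extracting nested subsets."""
    scores = [1] * len(pairs) if scores is None else list(scores)
    orders = [-1] * len(pairs)  # -1: order not determined (above max_order)
    remaining = list(range(len(pairs)))
    order = 0
    while remaining and (max_order is None or order <= max_order):
        kept = max_nested_subset(
            [pairs[i] for i in remaining], [scores[i] for i in remaining]
        )
        if not kept:  # only possible with non-positive scores
            break
        for k in kept:
            orders[remaining[k]] = order
        # Pairs that were not kept move on to the next, higher order
        remaining = [i for k, i in enumerate(remaining) if k not in kept]
        order += 1
    return orders


print(pseudoknot_orders([(0, 4), (1, 3), (2, 5)]))  # [0, 0, 1]
```

Each pass keeps a crossing-free subset at the current order and hands the rest to the next pass, which mirrors the "remove, increment, repeat" loop described in the text.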

Parameters
base_pairs : ndarray, dtype=int, shape=(n,2)

The base pairs to determine the pseudoknot order of. Each row contains the indices of two paired bases. The array has the same structure as the output of base_pairs(), where the indices refer to the beginning of the residues.

scores : ndarray, dtype=int, shape=(n,), optional

The score for each base pair. By default, the score of each base pair is 1.

max_pseudoknot_order : int, optional

The maximum pseudoknot order to be found. If a base pair would be of a higher order, its order is specified as -1. By default, the algorithm is run until all base pairs have an assigned pseudoknot order.

Returns
pseudoknot_order : ndarray, dtype=int, shape=(m,n)

The pseudoknot order of the input base_pairs. Multiple solutions may maximize the number of base pairs (or the total score, if scores is given) equally well. Therefore, all m such solutions are returned, one per row.
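Why several rows can be returned is easiest to see on a tiny input where two pairs cross and have equal scores. The brute-force sketch below enumerates every optimal order assignment; it is exponential and only meant for a handful of pairs, and `crosses`, `optimal_nested_subsets` and `all_order_solutions` are hypothetical helpers, not part of biotite.

```python
from itertools import combinations


def crosses(p, q):
    """True if base pairs p and q cross, i.e. form a pseudoknot."""
    (i, j), (k, l) = sorted(p), sorted(q)
    return i < k < j < l or k < i < l < j


def optimal_nested_subsets(indices, pairs, scores):
    """All non-empty, crossing-free, maximum-score subsets of `indices`."""
    best, found = None, []
    for size in range(1, len(indices) + 1):
        for subset in combinations(indices, size):
            if any(crosses(pairs[a], pairs[b]) for a, b in combinations(subset, 2)):
                continue
            score = sum(scores[i] for i in subset)
            if best is None or score > best:
                best, found = score, [subset]
            elif score == best:
                found.append(subset)
    return found


def all_order_solutions(pairs, scores=None):
    """Enumerate every optimal pseudoknot order assignment (tiny inputs only)."""
    scores = [1] * len(pairs) if scores is None else scores
    solutions = []

    def recurse(remaining, order, orders):
        if not remaining:
            solutions.append(orders)
            return
        # Branch on every equally good nested subset for the current order
        for subset in optimal_nested_subsets(remaining, pairs, scores):
            new_orders = list(orders)
            for i in subset:
                new_orders[i] = order
            rest = [i for i in remaining if i not in subset]
            recurse(rest, order + 1, new_orders)

    recurse(list(range(len(pairs))), 0, [-1] * len(pairs))
    return solutions


print(all_order_solutions([(0, 2), (1, 3)]))  # [[0, 1], [1, 0]]
```

With two crossing pairs of equal score, either one can be kept at order 0, so both assignments are equally optimal.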

Notes

The dynamic programming approach by Smit et al. [1] is applied to detect pseudoknots. The algorithm was originally developed to remove pseudoknots from a structure. However, if it is run iteratively on the removed knotted pairs, it can be used to identify the pseudoknot order.

The pseudoknot order is defined as the minimum number of base pair set decompositions resulting in a nested structure [2]. Therefore, there are no pseudoknots between base pairs with the same pseudoknot order.
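The defining property, that base pairs of equal order never cross, can be checked directly. A minimal sketch: `crosses` and `is_valid_decomposition` are hypothetical helpers, pairs are given as (i, j) index tuples, and entries with order -1 are ignored. It only verifies that each order class is nested, not that the number of classes is minimal.

```python
def crosses(p, q):
    """True if base pairs p and q cross, i.e. form a pseudoknot."""
    (i, j), (k, l) = sorted(p), sorted(q)
    return i < k < j < l or k < i < l < j


def is_valid_decomposition(pairs, orders):
    """Check that no two pairs with the same (non-negative) order cross."""
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            if orders[a] == orders[b] != -1 and crosses(pairs[a], pairs[b]):
                return False
    return True


pairs = [(0, 4), (1, 3), (2, 5)]
print(is_valid_decomposition(pairs, [0, 0, 1]))  # True
print(is_valid_decomposition(pairs, [0, 0, 0]))  # False
```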

References

[1] S. Smit, K. Rother, J. Heringa, R. Knight, “From knotted to nested RNA structures: A variety of computational methods for pseudoknot removal,” RNA, vol. 14, pp. 410–416, March 2008. doi: 10.1261/rna.881308

[2] M. Antczak, M. Popenda, T. Zok, M. Zurkowski, R. W. Adamiak, M. Szachniuk, “New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation,” Bioinformatics, vol. 34, pp. 1304–1312, April 2018. doi: 10.1093/bioinformatics/btx783

Examples

Remove the pseudoknotted base pair for the sequence ABCbac, where each uppercase letter forms a base pair with its corresponding lowercase letter:

Define the base pairs as an ndarray:

>>> import numpy as np
>>> from biotite.structure import pseudoknots, dot_bracket
>>> basepairs = np.array([[0, 4],
...                       [1, 3],
...                       [2, 5]])
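The same base pairs can also be derived from the letter notation. A minimal sketch with a hypothetical helper `pairs_from_letters`, assuming each uppercase letter opens a pair that the matching lowercase letter closes and that each letter is used only once:

```python
def pairs_from_letters(sequence):
    """Base pairs encoded as matching upper-/lowercase letters (hypothetical)."""
    opening = {}
    pairs = []
    for pos, char in enumerate(sequence):
        if char.isupper():
            opening[char] = pos  # remember where this pair opens
        elif char.islower():
            # KeyError here means a lowercase letter without a prior uppercase one
            pairs.append([opening.pop(char.upper()), pos])
    return sorted(pairs)


print(pairs_from_letters("ABCbac"))  # [[0, 4], [1, 3], [2, 5]]
```

Wrapping the result in np.array() gives the array used in this example.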

Find the unknotted base pairs, optimizing for the maximum number of base pairs:

>>> print(pseudoknots(basepairs, max_pseudoknot_order=0))
[[ 0  0 -1]]

This indicates that the base pair Cc is a pseudoknot.

Given the length of the sequence (6 bases), we can also represent the unknotted structure in dot bracket notation:

>>> print(dot_bracket(basepairs, 6, max_pseudoknot_order=0)[0])
((.)).

If the maximum pseudoknot order is not restricted, the order of the knotted pairs is determined and can be represented in dot-bracket-letter notation, in which base pairs of pseudoknot order 1 are written with square brackets:

>>> print(pseudoknots(basepairs))
[[0 0 1]]
>>> print(dot_bracket(basepairs, 6)[0])
(([))]