Writing source code#
Scope#
The scope of Biotite is the set of methods that make up the backbone of computational molecular biology. Thus, new functionalities added to Biotite should be relatively general and well established.
Code whose purpose is too specialized can be published as an extension package instead.
Consistency#
New functionalities should act on the existing central classes, if applicable, to keep the code as uniform as possible. Specifically, these include biotite.sequence.Sequence and its subclasses, biotite.sequence.Annotation (including biotite.sequence.Feature and biotite.sequence.Location), biotite.sequence.Profile, biotite.application.Application and its subclasses, and in general numpy.ndarray.
If you think that the currently available classes miss a central object in bioinformatics, consider opening an issue on GitHub or reaching out to the maintainers.
Small helper classes for a functionality (for example an Enum for a function parameter) are also permitted, as long as they do not introduce redundancy with the classes mentioned above.
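As a minimal sketch of such a helper class, the hypothetical function below acts on a plain numpy.ndarray and uses a small Enum for one of its parameters. The names Direction and orient_symbols are invented for illustration and are not part of Biotite.

```python
from enum import Enum

import numpy as np


class Direction(Enum):
    """Hypothetical helper Enum: reading direction of a symbol array."""

    FORWARD = 1
    REVERSE = 2


def orient_symbols(symbols, direction=Direction.FORWARD):
    """Return a copy of the symbol array in the requested direction."""
    symbols = np.asarray(symbols)
    if direction == Direction.REVERSE:
        # Reverse via slicing; copy() returns an independent array
        return symbols[::-1].copy()
    return symbols.copy()
```

The Enum only names the allowed parameter values. The data itself stays in an ndarray, so no new container class is introduced.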
Python version and interpreter#
The package supports all minor Python versions released in the last 42 months (NEP 29). Consequently, language features that were introduced after the oldest supported Python version are not allowed. This time span balances support for older Python versions against the ability to use more recent features of the programming language.
Furthermore, this package is currently made for usage with CPython. Official support for PyPy might be added someday.
Code style#
Biotite is compliant with PEP 8 and uses Ruff for code formatting and linting. The maximum line length is 88 characters. An exception is made for docstring lines, if it is not possible to use a maximum of 88 characters (e.g. tables and parameter type descriptions). To make code changes ready for a pull request, simply run
$ ruff format
$ ruff check --fix
and fix the remaining linter complaints.
Dependencies#
Biotite aims to rely on only a few dependencies to keep the installation small. However, optional dependencies for a specific functionality are also allowed if necessary. In this case, add your special dependency to the list of extra requirements in install.rst.
The import statement for the dependency should be located directly inside the function or class, rather than at module level, to ensure that the package is not required for any other functionality or for building the API documentation. An example of this approach is the plotting functions in biotite.sequence.graphics, which require Matplotlib.
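A minimal sketch of this pattern could look as follows. Both functions are hypothetical and not taken from Biotite. Only the plotting function needs Matplotlib, so only that function imports it.

```python
def feature_lengths(locations):
    """Return the length of each (first, last) location; needs no extra package."""
    return [last - first + 1 for first, last in locations]


def plot_locations(axes, locations):
    """Draw each (first, last) location as a box on the given Matplotlib axes."""
    # Optional dependency: imported here instead of at module level, so that
    # importing this module also works when Matplotlib is not installed
    from matplotlib.patches import Rectangle

    for first, last in locations:
        # Rectangle(xy, width, height): one box of height 1 per location
        axes.add_patch(Rectangle((first, 0), last - first + 1, 1))
```

With this layout, feature_lengths() can be used without Matplotlib. An ImportError can only occur when plot_locations() is actually called.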
Code efficiency#
The central aim of Biotite is to be both convenient and fast. Therefore, the code should be vectorized as much as possible using NumPy. In cases where the problem cannot be reasonably or conveniently solved this way, writing modules in Cython is the preferred way to go. Writing extensions directly in C/C++ is discouraged due to poor readability. Writing extensions in other programming languages (e.g. in Rust via PyO3) is currently not permitted, to keep the build process simple.
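To illustrate what vectorization means here, the following sketch computes the same result twice: once with a Python loop and once with whole-array NumPy operations. The function names are made up for this example.

```python
import numpy as np


def distances_loop(coord, center):
    """Euclidean distance of each coordinate to a center, via a Python loop."""
    result = []
    for x, y, z in coord:
        dx, dy, dz = x - center[0], y - center[1], z - center[2]
        result.append((dx * dx + dy * dy + dz * dz) ** 0.5)
    return np.array(result)


def distances_vectorized(coord, center):
    """Same computation, expressed as whole-array NumPy operations."""
    # The center is broadcast over all rows of the (n, 3) coordinate array
    diff = np.asarray(coord) - np.asarray(center)
    return np.sqrt((diff * diff).sum(axis=-1))
```

The vectorized variant is the preferred style: it is shorter and avoids a Python-level iteration for every coordinate.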
Docstrings#
Biotite uses numpydoc-formatted docstrings for its documentation. These docstrings can be interpreted by Sphinx via the numpydoc extension. All publicly accessible attributes must be fully documented. This includes functions, classes, methods, instance and class variables and the __init__ modules:
The __init__ module documentation summarizes the content of the entire subpackage, since the single modules are not visible to the user.
In the class docstring, the class itself is described and the constructor is documented.
The publicly accessible instance variables are documented under the Attributes headline, while class variables are documented in their separate docstrings.
Methods do not need to be summarized in the class docstring.
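The rules above can be sketched with a small class. Interval is a hypothetical class, not part of Biotite. Its class docstring describes the class, documents the constructor parameters and lists the instance variables under Attributes. The method carries its own docstring.

```python
class Interval:
    """
    A closed interval of positions on a sequence.

    Parameters
    ----------
    first : int
        The first position of the interval (inclusive).
    last : int
        The last position of the interval (inclusive).

    Attributes
    ----------
    first, last : int
        The inclusive bounds of the interval.
    """

    def __init__(self, first, last):
        self.first = first
        self.last = last

    def length(self):
        """
        Get the number of positions covered by the interval.

        Returns
        -------
        length : int
            The number of positions, including both bounds.
        """
        return self.last - self.first + 1
```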
Module imports#
In Biotite, the user imports packages rather than single modules (similar to NumPy). For this to work, the __init__.py file of each Biotite subpackage needs to import all of its modules whose content is publicly accessible, in a relative manner.
from .module1 import *
from .module2 import *
Import statements should be the only statements in an __init__.py file.
In case a module needs functionality from another subpackage of Biotite, use an absolute import as suggested by PEP 8. This import should target the module directly and not the package, to avoid circular imports and thus an ImportError. Import statements like the following are therefore fine:
from biotite.subpackage.module import foo
In order to prevent namespace pollution, all modules must define the __all__ variable with all publicly accessible attributes of the module.
Versioning#
Biotite adopts Semantic Versioning for its releases. This means that the version number is composed of three parts:
Major version: Incremented when incompatible API changes are made.
Minor version: Incremented when a new functionality is added in a backwards compatible manner.
Patch version: Incremented when backwards compatible bug fixes are made.
Note that backwards incompatible changes in minor/patch versions are disallowed only with regard to the public API. This means that names and types of parameters and the type of the return value must not be changed in any function/class documented in the API reference. However, behavioral changes (especially small ones) are allowed.
Although minor versions may not remove existing functionalities, they can deprecate them by
marking them as deprecated via a notice in the docstring and
raising a DeprecationWarning when a deprecated functionality is used.
This gives the user a heads-up that the functionality will be removed soon. In the next major version, deprecated functionalities can be removed entirely.
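The two deprecation steps could be sketched as follows, using the standard library warnings module. Both functions are hypothetical, and the exact wording of the docstring notice is an assumption rather than a prescribed Biotite format.

```python
import warnings


def scale(values, factor):
    """Multiply each value by a factor (hypothetical replacement function)."""
    return [v * factor for v in values]


def double(values):
    """
    Multiply each value by two.

    DEPRECATED: Use ``scale(values, 2)`` instead.
    """
    warnings.warn(
        "'double()' is deprecated, use 'scale(values, 2)' instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # Keep the old behavior until removal in the next major version
    return scale(values, 2)
```

The deprecated function still returns the same result as before, so existing user code keeps working within the current major version.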
Extension packages#
Biotite extension packages are Python packages that provide further functionality for Biotite objects (AtomArray, Sequence, etc.) or offer objects that build upon them.
There can be good reasons to publish code as an extension package instead of contributing it directly to the Biotite project:
Independent development
An incompatible license
The code’s use cases are too specialized
Unsuitable dependencies
Extensions written in a non-permitted programming language
If your code fulfills the following conditions
extends Biotite functionality
is documented
is well tested
you can open an issue to ask for the package to be added to the extension package page.