Tutorial
Getting started
Biotite is a Python package that facilitates everyday tasks in sequence and structure bioinformatics by providing a broad set of tools for handling files, analyzing data and visualizing results. This tutorial gives newcomers a quick tour of the package's central functionality and shows how its parts can be combined. For that reason, the following chapters use rather simple examples. If you are more interested in applying Biotite to real-world problems, have a look at the example gallery.
Installation
Biotite is available via the pip and Conda package managers. You can install the package with
$ pip install biotite
or
$ conda install -c conda-forge biotite
respectively.
If the installation was successful, you should be able to import and use Biotite, for example
import biotite.sequence as seq
print(seq.ProteinSequence("BIQTITE*IS*INSTALLED"))
BIQTITE*IS*INSTALLED
If you experience issues or are looking for other installation methods, have a look at the installation page.
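If the import above fails, it can help to check whether the package is visible to the interpreter you are running. The following is a minimal sketch that uses only the Python standard library. The helper name `biotite_status` is hypothetical and not part of Biotite.

```python
import importlib.util
from importlib import metadata


def biotite_status():
    """Return a short message describing whether Biotite is importable.

    Hypothetical helper for illustration, not part of Biotite.
    """
    # find_spec() looks the package up without actually importing it
    if importlib.util.find_spec("biotite") is None:
        return "biotite is not installed in this environment"
    try:
        # Version string as recorded by the installer (pip or Conda)
        version = metadata.version("biotite")
    except metadata.PackageNotFoundError:
        version = "unknown"
    return f"biotite {version} is installed"


print(biotite_status())
```

If the message reports that the package is missing even though the installation succeeded, the install command probably ran in a different environment than the interpreter.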
Overview
Biotite is split into four subpackages:
The biotite.sequence
subpackage contains functionality for
working with sequence information of any kind.
By default, the package provides sequence types for nucleotides and
proteins, but the alphabet-based implementation allows simple
integration of custom sequence types, even if they do not rely on letters.
Besides support for different file formats, the package includes
general-purpose functions for sequence manipulation and a comprehensive
modular system for sequence alignments.
The biotite.structure
subpackage enables handling of 3D
structures of biomolecules.
In simplified terms, a structure is represented by NumPy arrays for atom coordinates
and annotations (residue names, elements, charges, etc.).
This makes operations on this structure representation very fast and
lets them scale from single models to entire ensembles (e.g. molecular dynamics
trajectories).
Structures can be read from and written to many popular file formats, from the
ancient PDB to the modern BinaryCIF.
The subpackage provides functionality for filtering, measuring, editing and
superimposing structures, and much more.
The biotite.database
subpackage is all about downloading data
from biological databases, which can subsequently be used in the aforementioned
subpackages.
It allows searching for database entries by specifying and combining criteria
in a Pythonic way and thereby conceals the complexity of the underlying REST
API of the database.
The biotite.application
subpackage extends the repertoire of Biotite’s
analysis functions with interfaces for external software.
These range from locally installed programs (e.g. Clustal Omega) to web
applications (e.g. NCBI BLAST).
The interfaces are seamless:
Inputs and outputs are sequence and structure objects, while file input/output and
the command line interface are handled internally.
Using them is very similar to calling normal functions.
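To make the idea of an alphabet-based sequence representation more concrete, here is a small sketch that does not depend on any library. It is a conceptual illustration under simplifying assumptions, not Biotite's actual implementation. The class name `ToyAlphabet` and the example symbols are hypothetical.

```python
class ToyAlphabet:
    """Map an ordered set of hashable symbols to integer codes and back.

    Hypothetical class for illustration, not part of Biotite.
    """

    def __init__(self, symbols):
        self._symbols = list(symbols)
        # The position of a symbol in the alphabet is its code
        self._codes = {
            symbol: code for code, symbol in enumerate(self._symbols)
        }

    def encode(self, sequence):
        """Convert a sequence of symbols into a list of integer codes."""
        try:
            return [self._codes[symbol] for symbol in sequence]
        except KeyError as err:
            raise ValueError(
                f"Symbol {err.args[0]!r} is not in the alphabet"
            ) from None

    def decode(self, codes):
        """Convert a list of integer codes back into symbols."""
        return [self._symbols[code] for code in codes]


# Symbols need not be letters: any hashable object works
alphabet = ToyAlphabet(["helix", "sheet", "coil", 42])
codes = alphabet.encode(["coil", "helix", 42, "helix"])
print(codes)                   # [2, 0, 3, 0]
print(alphabet.decode(codes))  # ['coil', 'helix', 42, 'helix']
```

Because a sequence is stored as integer codes, the same operations can work on it regardless of what the symbols stand for, and symbols outside the alphabet are rejected when the sequence is encoded.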
The following chapters will take you on a journey through the functionalities provided by the mentioned subpackages.
Note
The files used in this tutorial will be stored in a temporary directory, so make sure to put the files you want to keep somewhere else.