Identification of a binding site by sequence conservation

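In this example, we identify the ribosome binding site on mRNA, also called the Shine-Dalgarno sequence, in Escherichia coli. To do so, we build a sequence profile from the regions directly upstream of all annotated coding sequences and look for conserved positions.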

At the beginning of translation, the 16S rRNA of the ribosome recognizes this purine-rich region on the mRNA, which typically lies a few bases upstream of the start codon. After binding the sequence, the ribosome starts scanning the mRNA in the downstream direction to locate the start codon.
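
The spatial relationship described above can be illustrated on a toy sequence. The following is a minimal sketch in plain Python and not part of the original example: the sequence is invented, `AGGAGG` is assumed as the commonly cited Shine-Dalgarno core motif, and `find_sd_offset` is a hypothetical helper. It reports how many bases lie between the motif and the start codon.

```python
# Toy illustration only: the sequence below is made up, not taken from E. coli
SD_CORE = "AGGAGG"  # assumed Shine-Dalgarno core motif
START_CODON = "ATG"


def find_sd_offset(mrna, max_upstream=20):
    """
    Return the number of bases between the end of the SD core motif
    and the start codon, or None if either of them is not found.
    The sequence is written in the DNA alphabet, as in a GenBank record.
    """
    # Simplification: the first 'ATG' is taken as the start codon
    start = mrna.find(START_CODON)
    if start == -1:
        return None
    # Only look at the window directly upstream of the start codon
    window = mrna[max(0, start - max_upstream) : start]
    # Take the motif occurrence closest to the start codon
    motif_pos = window.rfind(SD_CORE)
    if motif_pos == -1:
        return None
    return len(window) - (motif_pos + len(SD_CORE))


toy_mrna = "TTCACACAGGAGGTTTAACATGAAACGC"
spacer = find_sd_offset(toy_mrna)
print(f"Bases between motif and start codon: {spacer}")
```

A real genome rarely contains the exact core motif upstream of every gene, which is why the example below relies on position-wise conservation instead of a fixed string search.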

Figure: sequence logo produced by this example (rbs identification)
# Code source: Patrick Kunzmann
# License: BSD 3 clause

import tempfile
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
from matplotlib.patches import Patch
import biotite
import biotite.database.entrez as entrez
import biotite.sequence as seq
import biotite.sequence.graphics as graphics
import biotite.sequence.io.genbank as gb

UTR_LENGTH = 20


### Get the E. coli K-12 genome as annotated sequence

gb_file = gb.GenBankFile.read(
    entrez.fetch("U00096", tempfile.gettempdir(), "gb", "nuccore", "gb")
)
# We are only interested in CDS features
# (despite the variable name, U00096 is the K-12 genome)
bl21_genome = gb.get_annotated_sequence(gb_file, include_only=["CDS"])


### Extract sequences for 5' untranslated regions (UTRs)

# In this case we define the untranslated region as the
# UTR_LENGTH bases directly upstream of the start codon
utrs = []
for cds in bl21_genome.annotation:
    # Expect a single location for the feature,
    # since no splicing can occur
    # Ignore special cases like ribosomal slippage sites, etc.
    # for simplicity
    if len(cds.locs) != 1:
        continue
    # Get the only location for this feature
    loc = list(cds.locs)[0]
    # Get the region behind or before the CDS, based on the strand the
    # CDS is on
    if loc.strand == seq.Location.Strand.FORWARD:
        utr_start = loc.first - UTR_LENGTH
        utr_stop = loc.first
        # Include the start codon (3 bases) in the UTRs for later
        # visualization
        utrs.append(bl21_genome[utr_start : utr_stop + 3].sequence)
    else:
        utr_start = loc.last + 1
        utr_stop = loc.last + 1 + UTR_LENGTH
        utrs.append(
            bl21_genome[utr_start - 3 : utr_stop].sequence.reverse().complement()
        )


### Create profile

# Increase the counter for each base and position
# while iterating over the sequences
frequencies = np.zeros((UTR_LENGTH + 3, len(bl21_genome.sequence.alphabet)), dtype=int)
for utr in utrs:
    frequencies[np.arange(len(utr)), utr.code] += 1

profile = seq.SequenceProfile(
    symbols=frequencies,
    gaps=np.zeros(len(frequencies)),
    alphabet=bl21_genome.sequence.alphabet,
)


### Visualize the profile


# Spend extra effort for correct sequence position labels
def normalize_seq_pos(x):
    """
    Normalize sequence position, so that the position of the upstream bases is negative.
    """
    # Sequence positions are always integers
    x = int(x)
    x -= UTR_LENGTH
    # There is no '0' position
    if x <= 0:
        x -= 1
    return x


@ticker.FuncFormatter
def sequence_loc_formatter(x, pos):
    x = normalize_seq_pos(x)
    return f"{x:+}"


COLOR_SCHEME = [
    biotite.colors["lightgreen"],  # A
    biotite.colors["orange"],  # C
    biotite.colors["dimgreen"],  # G
    biotite.colors["brightorange"],  # T
]

fig, ax = plt.subplots(figsize=(8.0, 3.0))
graphics.plot_sequence_logo(ax, profile, COLOR_SCHEME)

normalized_pos = np.array([normalize_seq_pos(x) for x in range(len(profile.symbols))])
tick_locs = np.where(np.isin(normalized_pos, [-15, -10, -5, -1, 1]))[0]
ax.set_xticks(tick_locs)
# sequence_loc_formatter is already a FuncFormatter (see decorator above)
ax.xaxis.set_major_formatter(sequence_loc_formatter)

ax.set_xlabel("Residue position")
ax.set_ylabel("Conservation (Bits)")
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.legend(
    loc="upper left",
    handles=[
        Patch(color=biotite.colors["green"], label="Purine"),
        Patch(color=biotite.colors["lightorange"], label="Pyrimidine"),
    ],
)

fig.tight_layout()

plt.show()
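
The index arithmetic in the UTR extraction step is easy to get wrong, so here is a minimal sketch of the same idea on a plain Python string. It is not part of the original example: the toy genome, the coordinates and the helper `upstream_with_start_codon` are hypothetical. It assumes 1-based, inclusive feature coordinates, as used in GenBank records, and converts them to 0-based Python slices by hand, whereas the script above passes feature positions directly to the annotated sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def upstream_with_start_codon(genome, first, last, forward, utr_length):
    """
    Return `utr_length` bases upstream of a CDS plus its start codon.
    `first` and `last` are 1-based, inclusive positions on the
    forward strand.
    """
    if forward:
        # The start codon begins at `first`
        begin = (first - 1) - utr_length
        end = (first - 1) + 3
    else:
        # On the reverse strand the start codon ends at `last`
        begin = last - 3
        end = last + utr_length
    if begin < 0 or end > len(genome):
        raise ValueError("Upstream region exceeds the sequence bounds")
    region = genome[begin:end]
    return region if forward else reverse_complement(region)


# Toy genome: the same gene once on each strand
gene_region = "AGGAGGTT" + "ATGAAATAA"
toy_genome = "CC" + gene_region + "CC" + reverse_complement(gene_region) + "CC"

forward_utr = upstream_with_start_codon(
    toy_genome, first=11, last=19, forward=True, utr_length=8
)
reverse_utr = upstream_with_start_codon(
    toy_genome, first=22, last=30, forward=False, utr_length=8
)
print(forward_utr, reverse_utr)
```

Both calls should yield the same string, since the reverse-strand gene is the reverse complement of the forward-strand one. Unlike the script above, this sketch raises an error when the upstream window would run past the sequence ends.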

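The y-axis of the logo is given in bits. As a rough illustration of what this measures, the sketch below computes the per-position information content from a count table shaped like `frequencies`. It is a sketch under stated assumptions and not necessarily the exact computation performed by `plot_sequence_logo`: it assumes a uniform background over the four bases and applies no small-sample correction. `information_content` and the toy counts are hypothetical.

```python
import numpy as np


def information_content(counts):
    """
    Per-position information content in bits, computed as
    log2(number of symbols) minus the Shannon entropy of the position.
    """
    counts = np.asarray(counts, dtype=float)
    probs = counts / counts.sum(axis=1, keepdims=True)
    # Treat 0 * log2(0) as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log2(probs), 0.0)
    entropy = -terms.sum(axis=1)
    return np.log2(counts.shape[1]) - entropy


# Rows are positions, columns are the bases A, C, G, T
toy_counts = np.array(
    [
        [25, 25, 25, 25],  # no preference
        [0, 0, 100, 0],  # always G
        [50, 0, 50, 0],  # A or G, i.e. any purine
    ]
)
bits = information_content(toy_counts)
print(bits)
```

A position without any preference carries 0 bits, a fully conserved base carries 2 bits, and a position that only distinguishes purines from pyrimidines carries 1 bit.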

© Copyright The Biotite contributors.
