biotite.sequence.io.genbank.GenBankFile

class biotite.sequence.io.genbank.GenBankFile[source]

Bases: biotite.file.TextFile

This class represents a file in GenBank format.

A GenBank file provides three kinds of information. First, it contains general information about the record, such as IDs, database links and the source organism. Second, it contains sequence annotations, i.e. positions in a reference sequence that fulfill certain roles, such as promoters or coding sequences. Finally, the file optionally contains the reference sequence itself.
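To make the layout described above concrete, the following standalone sketch groups the lines of a small GenBank-style record by top-level keyword. It uses only the standard library and is not how this class is implemented; the helper name `split_fields` and the sample record are assumptions made up for illustration.

```python
def split_fields(text):
    """Group the lines of a GenBank-style record by top-level keyword."""
    fields = []  # list of (keyword, lines) pairs, in file order
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        if not line.strip():
            continue  # skip empty lines
        if not line[0].isspace():
            # A keyword starting in the first column opens a new field
            keyword, _, rest = line.partition(" ")
            fields.append((keyword, [rest.strip()]))
        elif fields:
            # Indented lines belong to the field opened most recently
            fields[-1][1].append(line.strip())
    return fields


# Hypothetical record, invented for this sketch
record = """\
LOCUS       EXAMPLE1                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical record for illustration.
FEATURES             Location/Qualifiers
     CDS             1..12
                     /gene="demo"
ORIGIN
        1 atggcctaat ga
//
"""

fields = split_fields(record)
keywords = [keyword for keyword, _ in fields]
print(keywords)
```

In this toy record, LOCUS and DEFINITION carry the general information, FEATURES holds the annotation and ORIGIN holds the sequence.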

Currently, GenBank files can only be parsed; writing GenBank files is not yet supported.

Examples

>>> import os.path
>>> from biotite.sequence.io.genbank import GenBankFile
>>> file = GenBankFile()
>>> file.read(os.path.join(path_to_sequences, "ec_bl21.gb"))
>>> print(file.get_definition())
Escherichia coli BL21(DE3), complete genome.
>>> # Select the CDS features whose gene name contains "lac"
>>> features = [f for f in file.get_annotation(include_only=["CDS"])
...             if "gene" in f.qual and "lac" in f.qual["gene"]]
>>> for f in sorted(features):
...     for loc in f.locs:
...         print(f.qual["gene"], loc.strand, loc.first, loc.last)
lacA Strand.REVERSE 330784 331395
lacY Strand.REVERSE 331461 332714
lacZ Strand.REVERSE 332766 335840
lacI Strand.REVERSE 335963 337045
lacI Strand.FORWARD 748736 749818

read(file)[source]

Parse a file (or file-like object) and store the content in this object.

Parameters:
file_name : file-like object or str

The file to be read. Alternatively, a file path can be supplied.

write(file)[source]

Not implemented yet.

get_locus()[source]

Parse the LOCUS field of the file.

Returns:
locus_dict : dict

A dictionary storing the locus name, length, type, division and date.

Examples

>>> import os.path
>>> file = GenBankFile()
>>> file.read(os.path.join(path_to_sequences, "ec_bl21.gb"))
>>> for key, val in file.get_locus().items():
...     print(key, ":", val)
name : CP001509
length : 4558953
type : DNA circular
division : BCT
date : 16-FEB-2017
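As an illustration of where these five values come from, here is a minimal standalone sketch that splits a nucleotide LOCUS line on whitespace. It is not the parser used by this class: `parse_locus_line` is a hypothetical helper, the sample line is reconstructed from the values printed above, and converting the length to `int` is a choice made for this sketch.

```python
def parse_locus_line(line):
    """Split a nucleotide LOCUS line into its components."""
    tokens = line.split()
    # Assumed layout: LOCUS <name> <length> bp <type ...> <division> <date>
    if len(tokens) < 7 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("Unexpected LOCUS line layout")
    return {
        "name": tokens[1],
        "length": int(tokens[2]),
        # Molecule type plus topology, e.g. "DNA circular"
        "type": " ".join(tokens[4:-2]),
        "division": tokens[-2],
        "date": tokens[-1],
    }


# Line reconstructed from the values in the example above
line = ("LOCUS       CP001509             4558953 bp    "
        "DNA     circular BCT 16-FEB-2017")
locus = parse_locus_line(line)
for key, val in locus.items():
    print(key, ":", val)
```

The sketch assumes a nucleotide record; protein records use `aa` instead of `bp` and would need a different check.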
get_definition()[source]

Parse the DEFINITION field of the file.

Returns:
definition : str

Content of the DEFINITION field.

get_accession()[source]

Parse the ACCESSION field of the file.

Returns:
accession : str

Content of the ACCESSION field.

get_version()[source]

Parse the VERSION field of the file.

Returns:
version : str

Content of the VERSION field. Does not include GI.

get_gi()[source]

Get the GI of the file.

Returns:
gi : str

The GI of the file.
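In records that carry a GI, it appears on the VERSION line after the versioned accession. The following standalone sketch shows how the two values can be separated; it is not the implementation of this class, and both the helper `parse_version_line` and the sample line are hypothetical.

```python
def parse_version_line(line):
    """Return (version, gi) from a VERSION line; gi is None if absent."""
    tokens = line.split()
    if not tokens or tokens[0] != "VERSION":
        raise ValueError("Not a VERSION line")
    # The first value is the accession with its version suffix
    version = tokens[1] if len(tokens) > 1 else None
    gi = None
    for token in tokens[2:]:
        if token.startswith("GI:"):
            gi = token[len("GI:"):]
    return version, gi


# Hypothetical lines, invented for this sketch
with_gi = parse_version_line("VERSION     EXAMPLE1.1  GI:12345")
without_gi = parse_version_line("VERSION     EXAMPLE1.1")
print(with_gi, without_gi)
```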

get_db_link()[source]

Parse the DBLINK field of the file.

Returns:
link_dict : dict

A dictionary storing the database links, with the database name as key, and the corresponding ID as value.

Examples

>>> import os.path
>>> file = GenBankFile()
>>> file.read(os.path.join(path_to_sequences, "ec_bl21.gb"))
>>> for key, val in file.get_db_link().items():
...     print(key, ":", val)
BioProject : PRJNA20713
BioSample : SAMN02603478
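The mapping from database name to ID can be pictured with the standalone sketch below, which splits each DBLINK line at the first colon. It is only an approximation of the idea, not the parser used by this class; `parse_dblink` is a hypothetical helper and the input lines are reconstructed from the output shown above.

```python
def parse_dblink(lines):
    """Map database names to IDs from the lines of a DBLINK field."""
    links = {}
    for line in lines:
        content = line.strip()
        # The first line of the field starts with the keyword itself
        if content.startswith("DBLINK"):
            content = content[len("DBLINK"):].strip()
        if not content:
            continue
        # Everything before the first colon is the database name
        database, _, identifier = content.partition(":")
        links[database.strip()] = identifier.strip()
    return links


# Lines reconstructed from the example output above
dblink_lines = [
    "DBLINK      BioProject: PRJNA20713",
    "            BioSample: SAMN02603478",
]
links = parse_dblink(dblink_lines)
print(links)
```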
get_source()[source]

Parse the SOURCE field of the file.

Returns:
source : str

Organism name corresponding to this file.

get_references()[source]

Parse the REFERENCE fields of the file.

Returns:
ref_list : list

A list, where each element is a dictionary storing the reference information for one reference.

get_comment()[source]

Parse the COMMENT field of the file.

Returns:
comment : str

Content of the COMMENT field.

get_annotation(include_only=None)[source]

Get the sequence annotation from the FEATURES field (the feature table).

Parameters:
include_only : iterable object, optional

Names of the feature keys (str) that should be included in the annotation. By default, all features are included.

Returns:
annotation : Annotation

Sequence annotation from the file.
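The effect of `include_only` can be pictured with the standalone sketch below, which filters a list of simplified features by key. `SimpleFeature` and `select_features` are hypothetical stand-ins introduced for this illustration; they are not the classes or functions of this library.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed feature, not the library's Feature class
SimpleFeature = namedtuple("SimpleFeature", ["key", "first", "last"])


def select_features(features, include_only=None):
    """Keep only features whose key is listed in include_only."""
    if include_only is None:
        # No filter given: keep every feature
        return list(features)
    wanted = set(include_only)
    return [f for f in features if f.key in wanted]


# Invented features for this sketch
features = [
    SimpleFeature("gene", 1, 12),
    SimpleFeature("CDS", 1, 12),
    SimpleFeature("misc_feature", 4, 9),
]
cds_only = select_features(features, include_only=["CDS"])
everything = select_features(features)
print(cds_only)
```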

get_sequence()[source]

Get the sequence from the ORIGIN field.

Returns:
sequence : NucleotideSequence

The reference sequence in the file.
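In the ORIGIN field, the sequence is stored in lines that start with a position counter followed by blocks of bases. The standalone sketch below shows one way to turn such lines into a plain string. It is not this class's implementation: `parse_origin` is a hypothetical helper, the sample lines are invented, and upper-casing the result is a choice made here.

```python
def parse_origin(lines):
    """Join the sequence lines of an ORIGIN field into one string.

    Assumes the lines follow the ORIGIN keyword line, which is not passed in.
    """
    bases = []
    for line in lines:
        # Keep letters only, dropping position counters and spaces
        bases.extend(ch for ch in line if ch.isalpha())
    return "".join(bases).upper()


# Invented sequence lines for this sketch
origin_lines = [
    "        1 atggcctaat gatcggatcc",
    "       21 ttaa",
]
sequence = parse_origin(origin_lines)
print(sequence, len(sequence))
```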

get_annotated_sequence(include_only=None)[source]

Get an annotated sequence by combining the FEATURES and ORIGIN fields.

Parameters:
include_only : iterable object, optional

Names of the feature keys (str) that should be included in the annotation. By default, all features are included.

Returns:
annot_seq : AnnotatedSequence

The annotated sequence.

copy()

Create a deep copy of this object.

Returns:
copy

A copy of this object.