biotite.sequence.io.fasta.FastaFile

class biotite.sequence.io.fasta.FastaFile(chars_per_line=80)

Bases: TextFile, MutableMapping

This class represents a file in FASTA format.

A FASTA file contains so-called header lines, beginning with >, each of which describes the sequence that follows it. The corresponding sequence starts at the line after the header line and ends at the next header line or at the end of the file. A header together with its sequence forms an entry.
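The entry layout described above can be sketched in plain Python. This is only an illustration of the format, not the parser this class uses; the function name parse_fasta_text is hypothetical.

```python
def parse_fasta_text(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence.

    Hypothetical helper for illustration only.
    """
    entries = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new entry; drop the leading '>'
            header = line[1:]
            entries[header] = ""
        elif header is not None:
            # Sequence lines belong to the most recent header
            entries[header] += line
    return entries


example = ">seq1\nATACT\n>seq2\nAAAA\nTT\n"
parsed = parse_fasta_text(example)
print(parsed)  # {'seq1': 'ATACT', 'seq2': 'AAAATT'}
```

Note that the sequence of seq2 spans two lines in the input and is joined into a single string, since an entry only ends at the next header line or at the end of the file.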

This class is used in a dictionary-like manner, implementing the MutableMapping interface: headers (without the leading >) are used as keys, and strings containing the sequences are the corresponding values. Entries can be accessed by indexing with the header, and del deletes the entry with the given header.

Parameters
chars_per_line : int, optional

The number of characters in a line containing sequence data after which a line break is inserted. Only relevant when adding sequences to a file. Default is 80.
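The effect of chars_per_line can be pictured as splitting a sequence string into fixed-width lines. The following is a minimal sketch of that idea, not the class's own code; wrap_sequence is a hypothetical name.

```python
def wrap_sequence(sequence, chars_per_line=80):
    """Split a sequence string into lines of at most chars_per_line characters.

    Hypothetical helper for illustration only.
    """
    return [
        sequence[start:start + chars_per_line]
        for start in range(0, len(sequence), chars_per_line)
    ]


lines = wrap_sequence("A" * 200, chars_per_line=80)
print([len(line) for line in lines])  # [80, 80, 40]
```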

Examples

>>> import os.path
>>> from biotite.sequence.io.fasta import FastaFile
>>> file = FastaFile()
>>> file["seq1"] = "ATACT"
>>> print(file["seq1"])
ATACT
>>> file["seq2"] = "AAAATT"
>>> print(file)
>seq1
ATACT
>seq2
AAAATT
>>> print(dict(file.items()))
{'seq1': 'ATACT', 'seq2': 'AAAATT'}
>>> for header, seq in file.items():
...     print(header, seq)
seq1 ATACT
seq2 AAAATT
>>> del file["seq1"]
>>> print(dict(file.items()))
{'seq2': 'AAAATT'}
>>> file.write(os.path.join(path_to_directory, "test.fasta"))

clear() → None. Remove all items from D.
copy()

Create a deep copy of this object.

Returns
copy

A copy of this object.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

classmethod read(file, chars_per_line=80)

Read a FASTA file.

Parameters
file : file-like object or str

The file to be read. Alternatively a file path can be supplied.

chars_per_line : int, optional

The number of characters in a line containing sequence data after which a line break is inserted. Only relevant when adding sequences to a file. Default is 80.

Returns
file_object : FastaFile

The parsed file.

static read_iter(file)

Create an iterator over each sequence of the given FASTA file.

Parameters
file : file-like object or str

The file to be read. Alternatively a file path can be supplied.

Yields
header : str

The header of the current sequence.

seq_str : str

The current sequence as a string.

Notes

This approach gives the same results as FastaFile.read(file).items(), but is slightly faster and much more memory efficient.
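The memory advantage comes from producing one entry at a time instead of holding the whole file. A minimal sketch of such a lazy reader over a file-like object is shown below; it is an assumption about the general technique, not the code behind read_iter(), and iter_fasta is a hypothetical name.

```python
import io


def iter_fasta(handle):
    """Lazily yield (header, sequence) tuples from a file-like object.

    Hypothetical helper for illustration only. At any time, only the
    lines of the current entry are kept in memory.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # The previous entry is complete: hand it out
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        # The last entry ends at the end of the file
        yield header, "".join(chunks)


handle = io.StringIO(">seq1\nATACT\n>seq2\nAAAA\nTT\n")
entries = iter_fasta(handle)
first = next(entries)   # only the first entry has been read so far
rest = list(entries)    # consume the remaining entries
print(first)  # ('seq1', 'ATACT')
print(rest)   # [('seq2', 'AAAATT')]
```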

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
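The three cases of update() can be illustrated with a plain dict, which follows the same rules. The dict is used here only as a stand-in; a FastaFile inherits update() from MutableMapping.

```python
# A plain dict stands in for a FastaFile in this sketch
entries = {"seq1": "ATACT"}

# E has a .keys() method (a mapping)
entries.update({"seq2": "AAAATT"})

# E lacks .keys() (an iterable of (key, value) pairs)
entries.update([("seq3", "GGC")])

# E is applied first, then the keyword arguments F
entries.update({"seq1": "TTTT"}, seq4="CCA")

print(entries)
# {'seq1': 'TTTT', 'seq2': 'AAAATT', 'seq3': 'GGC', 'seq4': 'CCA'}
```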

values() → an object providing a view on D's values
write(file)

Write the contents of this object into a file (or file-like object).

Parameters
file : file-like object or str

The file to be written to. Alternatively a file path can be supplied.

static write_iter(file, items, chars_per_line=80)

Iterate over the given items and write each item into the specified file.

In contrast to write(), the lines of text are not stored in an intermediate TextFile, but are written directly to the file. Hence, this static method may save a large amount of memory when a large file is written, especially if the items are provided as a generator.

Parameters
file : file-like object or str

The file to be written to. Alternatively a file path can be supplied.

items : generator or array-like of tuple(str, str)

The entries to be written into the file. Each entry consists of a header string and a sequence string.

chars_per_line : int, optional

The number of characters in a line containing sequence data after which a line break is inserted. Only relevant when adding sequences to a file. Default is 80.

Notes

This method does not test whether the given identifiers are unambiguous.
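The streaming behavior described for write_iter() can be sketched as follows: each item is formatted and written immediately, so a generator never has to be materialized as a whole. This is an illustration of the technique under that assumption, not the method's actual code; write_fasta_stream and generate_entries are hypothetical names.

```python
import io


def write_fasta_stream(handle, items, chars_per_line=80):
    """Write (header, sequence) tuples to a file-like object one by one.

    Hypothetical helper for illustration only. No check for duplicate
    headers is performed.
    """
    for header, sequence in items:
        handle.write(">" + header + "\n")
        # Break the sequence into lines of at most chars_per_line characters
        for start in range(0, len(sequence), chars_per_line):
            handle.write(sequence[start:start + chars_per_line] + "\n")


def generate_entries():
    """Produce entries on demand instead of building a list first."""
    for i in range(3):
        yield f"seq{i}", "ACGT" * (i + 1)


buffer = io.StringIO()
write_fasta_stream(buffer, generate_entries(), chars_per_line=6)
output = buffer.getvalue()
print(output)
```

Here an io.StringIO buffer stands in for an open file; the same function would work with a handle obtained from open().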