biotite.structure.io.pdb.PDBFile

class biotite.structure.io.pdb.PDBFile

Bases: TextFile

This class represents a PDB file.

The usage of biotite.structure.io.pdbx is encouraged in favor of this class.

This class only provides support for reading and writing the pure atom information (ATOM, HETATM, MODEL and ENDMDL records). TER records cannot be written. Additionally, REMARK records can be read.

See also

CIFFile
BinaryCIFFile

Examples

Load a *.pdb file, modify the structure and save the new structure into a new file:

>>> import os.path
>>> from biotite.structure import rotate
>>> from biotite.structure.io.pdb import PDBFile
>>> file = PDBFile.read(os.path.join(path_to_structures, "1l2y.pdb"))
>>> array_stack = file.get_structure()
>>> array_stack_mod = rotate(array_stack, [1,2,3])
>>> file = PDBFile()
>>> file.set_structure(array_stack_mod)
>>> file.write(os.path.join(path_to_directory, "1l2y_mod.pdb"))

copy()

Create a deep copy of this object.

Returns
copy

A copy of this object.

get_assembly(assembly_id=None, model=None, altloc='first', extra_fields=[], include_bonds=False)

Build the given biological assembly.

This function reads the data from the REMARK 350 records in the file. Consequently, this remark must be present in the file.

Parameters
assembly_id : str

The assembly to build. Available assembly IDs can be obtained via list_assemblies().

model : int, optional

If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.

altloc : {‘first’, ‘occupancy’, ‘all’}
This parameter defines how altloc IDs are handled:
  • 'first' - Use atoms that have the first altloc ID appearing in a residue.

  • 'occupancy' - Use atoms that have the altloc ID with the highest occupancy for a residue.

  • 'all' - Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.

extra_fields : list of str, optional

The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.

include_bonds : bool, optional

If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (especially inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.

Returns
assembly : AtomArray or AtomArrayStack

The assembly. The return type depends on the model parameter.
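
The idea behind building an assembly can be illustrated with a minimal sketch: each operator in REMARK 350 is given as three BIOMT rows holding a rotation matrix and a translation, and every operator is applied to the coordinates. The helpers `parse_biomt()` and `apply_transform()` are hypothetical names for illustration only, not part of the library, and the sketch ignores the chain selection that real REMARK 350 records also carry.

```python
def parse_biomt(remark_lines):
    """Collect (rotation, translation) pairs from REMARK 350 content lines."""
    rows = {}
    for line in remark_lines:
        fields = line.split()
        if not fields or not fields[0].startswith("BIOMT"):
            continue
        # Fields: BIOMTn, operator number, three matrix entries, translation
        operator_id = int(fields[1])
        rows.setdefault(operator_id, []).append([float(v) for v in fields[2:6]])
    transforms = []
    for operator_id in sorted(rows):
        matrix = [row[:3] for row in rows[operator_id]]
        vector = [row[3] for row in rows[operator_id]]
        transforms.append((matrix, vector))
    return transforms


def apply_transform(coord, matrix, vector):
    """Rotate and then translate a list of [x, y, z] positions."""
    return [
        [sum(matrix[i][j] * atom[j] for j in range(3)) + vector[i] for i in range(3)]
        for atom in coord
    ]


# Made-up remark content with two operators: identity and a 180 degree
# rotation around z combined with a shift along x
remark_350 = [
    "APPLY THE FOLLOWING TO CHAINS: A",
    "  BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "  BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "  BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "  BIOMT1   2 -1.000000  0.000000  0.000000       10.00000",
    "  BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "  BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]
coord = [[1.0, 2.0, 3.0]]
assembly_coord = []
for matrix, vector in parse_biomt(remark_350):
    assembly_coord.extend(apply_transform(coord, matrix, vector))
print(assembly_coord)
```

The resulting list contains one transformed copy of the input coordinates per operator.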

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1f2n.pdb"))
>>> assembly = file.get_assembly(model=1)

get_b_factor(model=None)

Get only the B-factors from the PDB file.

Parameters
model : int, optional

If this parameter is given, the function will return a 1D B-factor array from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, a 2D B-factor array containing all models will be returned, even if the structure contains only one model.

Returns
b_factor : ndarray, shape=(m,n) or shape=(n,), dtype=float

The B-factors read from the ATOM and HETATM records of the file.

Notes

Note that get_b_factor() may output more B-factors than the atom array (stack) from the corresponding get_structure() call has atoms. The reason is that get_structure() filters altloc IDs, while get_b_factor() does not.
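
The length mismatch described in the note can be reproduced with a small stand-alone sketch. It does not use the library; the `Atom` tuple and `filter_altloc_first()` are hypothetical stand-ins that mimic the 'first' altloc strategy on made-up records.

```python
from collections import namedtuple

# Hypothetical minimal atom record, only for this illustration
Atom = namedtuple("Atom", ["res_id", "name", "altloc", "occupancy", "b_factor"])

atoms = [
    Atom(1, "N", " ", 1.00, 10.0),
    Atom(1, "CA", "A", 0.40, 12.0),  # first alternate location
    Atom(1, "CA", "B", 0.60, 15.0),  # second alternate location
    Atom(2, "N", " ", 1.00, 11.0),
]


def filter_altloc_first(atoms):
    """Keep atoms without altloc ID or with the first altloc ID seen in their residue."""
    first_altloc = {}
    kept = []
    for atom in atoms:
        if atom.altloc == " ":
            kept.append(atom)
            continue
        chosen = first_altloc.setdefault(atom.res_id, atom.altloc)
        if atom.altloc == chosen:
            kept.append(atom)
    return kept


# Unfiltered values, analogous to reading every ATOM/HETATM record
all_b_factors = [atom.b_factor for atom in atoms]
# Filtered atoms, analogous to a structure read with altloc='first'
structure_atoms = filter_altloc_first(atoms)
print(len(all_b_factors), len(structure_atoms))
```

Here the unfiltered list has one more entry than the filtered structure, because the second alternate location of the CA atom is dropped by the filter.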

get_coord(model=None)

Get only the coordinates from the PDB file.

Parameters
model : int, optional

If this parameter is given, the function will return a 2D coordinate array from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, a 3D coordinate array containing all models will be returned, even if the structure contains only one model.

Returns
coord : ndarray, shape=(m,n,3) or shape=(n,3), dtype=float

The coordinates read from the ATOM and HETATM records of the file.
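
The model numbering convention (1-based, negative values counting from the end, all models if omitted) can be sketched with plain nested lists. `select_model()` is a hypothetical helper written for this illustration; how the library handles an invalid model number is an assumption here.

```python
def select_model(models, model=None):
    """Return all models, or one model addressed by a 1-based or negative number."""
    if model is None:
        # All models: corresponds to the (m, n, 3) case
        return models
    if model == 0 or abs(model) > len(models):
        # Assumption of this sketch: invalid numbers raise an error
        raise ValueError(f"Invalid model number {model}")
    index = model - 1 if model > 0 else len(models) + model
    # Single model: corresponds to the (n, 3) case
    return models[index]


# Two models with two atoms each
models = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.5, 0.0, 0.0], [1.5, 0.0, 0.0]],
]
first_model = select_model(models, model=1)
last_model = select_model(models, model=-1)
all_models = select_model(models)
print(first_model, last_model, len(all_models))
```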

Notes

Note that get_coord() may output more coordinates than the atom array (stack) from the corresponding get_structure() call has atoms. The reason is that get_structure() filters altloc IDs, while get_coord() does not.

Examples

Read an AtomArrayStack from multiple PDB files, where each PDB file contains the same atoms but different positions. This is an efficient approach when a trajectory is spread into multiple PDB files, as done e.g. by the Rosetta modeling software.

For the purpose of this example, the PDB files are created from an existing AtomArrayStack.

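>>> import os.path
>>> from tempfile import gettempdir
>>> import numpy as np
>>> from biotite.structure import from_template
>>> from biotite.structure.io.pdb import PDBFile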
>>> file_names = []
>>> for i in range(atom_array_stack.stack_depth()):
...     pdb_file = PDBFile()
...     pdb_file.set_structure(atom_array_stack[i])
...     file_name = os.path.join(gettempdir(), f"model_{i+1}.pdb")
...     pdb_file.write(file_name)
...     file_names.append(file_name)
>>> print(file_names)
['...model_1.pdb', '...model_2.pdb', ..., '...model_38.pdb']

Now the PDB files are used to create an AtomArrayStack, where each model originates from a different file.

Construct a new AtomArrayStack with annotations taken from one of the created files used as template and coordinates from all of the PDB files.

>>> template_file = PDBFile.read(file_names[0])
>>> template = template_file.get_structure()
>>> coord = []
>>> for i, file_name in enumerate(file_names):
...     pdb_file = PDBFile.read(file_name)
...     coord.append(pdb_file.get_coord(model=1))
>>> new_stack = from_template(template, np.array(coord))

The newly created AtomArrayStack should now be equal to the AtomArrayStack the PDB files were created from.

>>> print(np.allclose(new_stack.coord, atom_array_stack.coord))
True

get_model_count()

Get the number of models contained in the PDB file.

Returns
model_count : int

The number of models.

get_remark(number)

Get the lines containing the REMARK records with the given number.

Parameters
number : int

The REMARK number, i.e. the XXX in REMARK XXX.

Returns
remark_lines : None or list of str

The content of the selected REMARK lines. Each line is an element of this list. The REMARK XXX part of each line is omitted. Furthermore, the first line, which must always be empty, is not included. None is returned if the selected REMARK records do not exist in the file.
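
The described return value can be mimicked on raw text lines with a short sketch. `extract_remark()` is a hypothetical function for illustration, not the library's implementation; it assumes the fixed-column layout in which the remark number is right-aligned in three columns after the REMARK keyword.

```python
def extract_remark(lines, number):
    """Return the content of all 'REMARK <number>' lines, or None if there are none."""
    prefix = f"REMARK {number:>3d}"
    # Cut off the prefix and the separating space
    selected = [
        line[len(prefix) + 1:].rstrip()
        for line in lines
        if line.startswith(prefix)
    ]
    if not selected:
        return None
    # The first line of a remark is empty and is skipped
    return selected[1:]


lines = [
    "REMARK   2",
    "REMARK   2 RESOLUTION.    1.74 ANGSTROMS.",
    "REMARK 900",
    "REMARK 900 RELATED ENTRIES",
    "REMARK 900 RELATED ID: 1JRJ   RELATED DB: PDB",
]
print(extract_remark(lines, 900))
print(extract_remark(lines, 999))
```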

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1l2y.pdb"))
>>> remarks = file.get_remark(900)
>>> print("\n".join(remarks))
RELATED ENTRIES
RELATED ID: 5292   RELATED DB: BMRB
BMRB 5292 IS CHEMICAL SHIFTS FOR TC5B IN BUFFER AND BUFFER
CONTAINING 30 VOL-% TFE.
RELATED ID: 1JRJ   RELATED DB: PDB
1JRJ IS AN ANALAGOUS C-TERMINAL STRUCTURE.
>>> nonexistent_remark = file.get_remark(999)
>>> print(nonexistent_remark)
None

get_structure(model=None, altloc='first', extra_fields=[], include_bonds=False)

Get an AtomArray or AtomArrayStack from the PDB file.

This function parses standard base-10 PDB files as well as hybrid-36 PDB files.
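
For readers unfamiliar with hybrid-36, the following sketch decodes a fixed-width field according to the publicly documented hybrid-36 scheme: plain decimal numbers first, then base-36 with uppercase letters, then base-36 with lowercase letters. `decode_hybrid36()` is a hypothetical helper for illustration and not necessarily how the library decodes these fields.

```python
def decode_hybrid36(field):
    """Decode a fixed-width hybrid-36 field, e.g. a 5-character atom serial number."""
    width = len(field)
    first = field[0]
    if first in " -" or first.isdigit():
        # Ordinary base-10 range
        return int(field)
    if first.isupper():
        # Uppercase base-36 range continues directly after the decimal range
        return int(field, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if first.islower():
        # Lowercase base-36 range continues after the uppercase range
        return int(field, 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError(f"Invalid hybrid-36 field: {field!r}")


print(decode_hybrid36("99999"))  # last value of the decimal range
print(decode_hybrid36("A0000"))  # first value of the uppercase range
print(decode_hybrid36("a0000"))  # first value of the lowercase range
```

The same function works for 4-character fields such as residue numbers, since the width is taken from the field itself.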

Parameters
model : int, optional

If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.

altloc : {‘first’, ‘occupancy’, ‘all’}
This parameter defines how altloc IDs are handled:
  • 'first' - Use atoms that have the first altloc ID appearing in a residue.

  • 'occupancy' - Use atoms that have the altloc ID with the highest occupancy for a residue.

  • 'all' - Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.

extra_fields : list of str, optional

The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.

include_bonds : bool, optional

If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (especially inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.

Returns
array : AtomArray or AtomArrayStack

The return type depends on the model parameter.

get_symmetry_mates(model=None, altloc='first', extra_fields=[], include_bonds=False)

Build a structure model containing all symmetric copies of the structure within a single unit cell, given by the space group.

This function reads the data from the REMARK 290 records in the file. Consequently, this remark must be present in the file, which is usually only true for crystal structures.

Parameters
model : int, optional

If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.

altloc : {‘first’, ‘occupancy’, ‘all’}
This parameter defines how altloc IDs are handled:
  • 'first' - Use atoms that have the first altloc ID appearing in a residue.

  • 'occupancy' - Use atoms that have the altloc ID with the highest occupancy for a residue.

  • 'all' - Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.

extra_fields : list of str, optional

The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.

include_bonds : bool, optional

If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (especially inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.

Returns
symmetry_mates : AtomArray or AtomArrayStack

All atoms within a single unit cell. The return type depends on the model parameter.

Notes

To expand the structure beyond a single unit cell, use repeat_box() with the return value as its input.

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1aki.pdb"))
>>> atoms_in_unit_cell = file.get_symmetry_mates(model=1)

list_assemblies()

List the biological assemblies that are available for the structure in the given file.

This function reads the data from the REMARK 300 records in the file. Consequently, this remark must be present in the file.

Returns
assemblies : list of str

A list that contains the available assembly IDs.

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1f2n.pdb"))
>>> print(file.list_assemblies())
['1']

classmethod read(file)

Parse a file (or file-like object).

Parameters
file : file-like object or str

The file to be read. Alternatively a file path can be supplied.

Returns
file_object : File

An instance of the respective File subclass representing the parsed file.

static read_iter(file)

Create an iterator over each line of the given text file.

Parameters
file : file-like object or str

The file to be read. Alternatively a file path can be supplied.

Yields
line : str

The current line in the file.
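
The benefit of iterating over lines is that the whole file never needs to be held in memory. The following stand-alone sketch shows the pattern with a hypothetical generator `iter_lines()`; whether the library strips line breaks in the same way is an assumption of this sketch, and the record lines are abbreviated placeholders.

```python
import io


def iter_lines(file):
    """Yield each line of an open text file without its trailing line break."""
    for line in file:
        yield line.rstrip("\r\n")


# An in-memory file with abbreviated records stands in for a file on disk
pdb_text = io.StringIO(
    "MODEL        1\n"
    "ATOM      1  N   ASN A   1\n"
    "ATOM      2  CA  ASN A   1\n"
    "HETATM    3  O   HOH A 101\n"
    "ENDMDL\n"
)
# Count atom records while reading only one line at a time
atom_record_count = sum(
    1 for line in iter_lines(pdb_text) if line.startswith(("ATOM", "HETATM"))
)
print(atom_record_count)
```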

set_structure(array, hybrid36=False)

Set the AtomArray or AtomArrayStack for the file.

This also makes use of the optional annotation arrays 'atom_id', 'b_factor', 'occupancy' and 'charge'. If the atom array (stack) contains the annotation 'atom_id', these values will be used for atom numbering instead of continuous numbering.

Parameters
array : AtomArray or AtomArrayStack

The array or stack to be saved into this file. If a stack is given, each array in the stack is saved as a separate model.

hybrid36 : bool, optional

Defines whether the file should be written in hybrid-36 format.

Notes

If array has an associated BondList, CONECT records are also written for all non-water hetero residues and all inter-residue connections.

write(file)

Write the contents of this object into a file (or file-like object).

Parameters
file : file-like object or str

The file to be written to. Alternatively a file path can be supplied.

static write_iter(file, lines)

Iterate over the given lines of text and write each line into the specified file.

In contrast to write(), the lines of text are not stored in an intermediate TextFile, but are written directly to the file. Hence, this static method may save a large amount of memory when a large file is written, especially if the lines are provided as a generator.

Parameters
file : file-like object or str

The file to be written to. Alternatively a file path can be supplied.

lines : generator or array-like of str

The lines of text to be written. Must not include line break characters.
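
The memory argument can be illustrated with a stand-alone sketch in which lines come from a generator and are written one at a time. `write_lines()` and `generate_records()` are hypothetical names for this illustration and only mirror the described behavior; they are not the library's implementation.

```python
import io


def write_lines(file, lines):
    """Write each line to an open text file, appending the line break."""
    for line in lines:
        file.write(line + "\n")


def generate_records(count):
    """Produce lines lazily, so that only one line exists in memory at a time."""
    for i in range(1, count + 1):
        yield f"REMARK   1 LINE {i}"


# An in-memory buffer stands in for a file on disk
buffer = io.StringIO()
write_lines(buffer, generate_records(3))
print(buffer.getvalue())
```

Because the lines must not contain line break characters, the writer adds exactly one line break per line.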