`PDBFile`#

class biotite.structure.io.pdb.PDBFile#

Bases: TextFile

This class represents a PDB file.

The usage of biotite.structure.io.pdbx is encouraged in favor of this class.

This class only supports reading and writing the pure atom information (ATOM, HETATM, MODEL and ENDMDL records). TER records cannot be written. Additionally, REMARK records can be read.
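The ATOM and HETATM records mentioned above store their fields in fixed column ranges. The following is a minimal standard-library sketch of how such a line can be sliced. It is not the parser used by this class; `parse_atom_record` and `EXAMPLE_LINE` are hypothetical names, and the example line is assembled piece by piece only to make the column widths visible.

```python
def parse_atom_record(line):
    """Split one ATOM/HETATM line into a dict using the fixed PDB columns."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {record!r}")
    return {
        "record": record,
        "atom_id": int(line[6:11]),          # columns 7-11
        "atom_name": line[12:16].strip(),    # columns 13-16
        "altloc_id": line[16].strip(),       # column 17
        "res_name": line[17:20].strip(),     # columns 18-20
        "chain_id": line[21].strip(),        # column 22
        "res_id": int(line[22:26]),          # columns 23-26
        "coord": (
            float(line[30:38]),              # x, columns 31-38
            float(line[38:46]),              # y, columns 39-46
            float(line[46:54]),              # z, columns 47-54
        ),
        "occupancy": float(line[54:60]),     # columns 55-60
        "b_factor": float(line[60:66]),      # columns 61-66
        "element": line[76:78].strip(),      # columns 77-78
    }


# Illustrative 80-column line, built from its fields (hypothetical data)
EXAMPLE_LINE = (
    "ATOM  " + "    1" + " " + " N  " + " " + "ASN" + " " + "A" + "   1"
    + " " + "   " + "  -8.901" + "   4.127" + "  -0.555"
    + "  1.00" + "  0.00" + " " * 10 + " N" + "  "
)

atom = parse_atom_record(EXAMPLE_LINE)
print(atom["atom_name"], atom["res_name"], atom["coord"])
```

In practice, PDBFile.get_structure() should be preferred, since it also handles models, altloc IDs and hybrid-36 numbering.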

See also

CIFFile: Interface to CIF files, a modern replacement for PDB files.
BinaryCIFFile: Interface to BinaryCIF files, a binary variant of CIF files.

Examples

Load a *.pdb file, modify the structure and save the new structure into a new file:

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1l2y.pdb"))
>>> array_stack = file.get_structure()
>>> array_stack_mod = rotate(array_stack, [1,2,3])
>>> file = PDBFile()
>>> file.set_structure(array_stack_mod)
>>> file.write(os.path.join(path_to_directory, "1l2y_mod.pdb"))

copy()#

Create a deep copy of this object.

Returns:

copy: A copy of this object.

get_assembly(assembly_id=None, model=None, altloc='first', extra_fields=[], include_bonds=False)#

Build the given biological assembly.

This function receives the data from REMARK 350 records in the file. Consequently, this remark must be present in the file.

Parameters:

assembly_id : str

The assembly to build. Available assembly IDs can be obtained via list_assemblies().

model : int, optional

If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.

altloc : {'first', 'occupancy', 'all'}

This parameter defines how altloc IDs are handled:

'first' - Use atoms that have the first altloc ID appearing in a residue.
'occupancy' - Use atoms that have the altloc ID with the highest occupancy for a residue.
'all' - Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.

extra_fields : list of str, optional

The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.

include_bonds : bool, optional

If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (especially inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.

Returns:

assembly : AtomArray or AtomArrayStack

The assembly. The return type depends on the model parameter. Contains the sym_id annotation, which enumerates the copies of the asymmetric unit in the assembly.

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1f2n.pdb"))
>>> assembly = file.get_assembly(model=1)
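To illustrate what building an assembly from REMARK 350 involves, here is a rough standard-library sketch that collects BIOMT rows and applies each operation to a coordinate. It assumes the whitespace-separated BIOMT layout commonly found in PDB files; `parse_biomt`, `apply_operation` and the sample lines are hypothetical and are not part of the biotite API.

```python
def parse_biomt(remark_lines):
    """Collect BIOMT rows into {operation_id: (rotation_rows, translation)}."""
    operations = {}
    for line in remark_lines:
        fields = line.split()
        if not fields or not fields[0].startswith("BIOMT"):
            continue  # skip headers such as 'BIOMOLECULE: 1'
        op_id = int(fields[1])
        rotation, translation = operations.setdefault(op_id, ([], []))
        rotation.append([float(v) for v in fields[2:5]])
        translation.append(float(fields[5]))
    return operations


def apply_operation(coord, rotation, translation):
    """Return rotation @ coord + translation for a single (x, y, z) tuple."""
    return tuple(
        sum(r * c for r, c in zip(row, coord)) + t
        for row, t in zip(rotation, translation)
    )


# Hypothetical REMARK 350 content with the 'REMARK 350' prefix already removed
lines = [
    "BIOMOLECULE: 1",
    "  BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "  BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "  BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "  BIOMT1   2 -1.000000  0.000000  0.000000       10.00000",
    "  BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "  BIOMT3   2  0.000000  0.000000  1.000000        5.00000",
]
ops = parse_biomt(lines)
# One transformed copy of the atom per symmetry operation
copies = [apply_operation((1.0, 2.0, 3.0), *ops[i]) for i in sorted(ops)]
print(copies)
```

Each operation yields one copy of the asymmetric unit, which corresponds to the copies that the sym_id annotation enumerates in the returned assembly.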

get_b_factor(model=None)#

Get only the B-factors from the PDB file.

Parameters:

model : int, optional

If this parameter is given, the function will return a 1D B-factor array from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, a 2D B-factor array containing all models will be returned, even if the structure contains only one model.

Returns:

b_factor : ndarray, shape=(m,n) or shape=(n,), dtype=float

The B-factors read from the ATOM and HETATM records of the file.

Notes

Note that get_b_factor() may return more B-factors than the atom array (stack) from the corresponding get_structure() call has atoms. The reason is that get_structure() filters altloc IDs, while get_b_factor() does not.
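The length mismatch described in this note can be reproduced with a small pure-Python sketch of altloc filtering. This is a simplified stand-in, not the library's implementation: atoms are plain tuples, `filter_altloc` is a hypothetical helper, and the 'occupancy' mode here simply picks the altloc ID of the single highest-occupancy atom in each residue.

```python
def filter_altloc(atoms, mode="first"):
    """Keep one altloc ID per residue.

    Each atom is a tuple (res_id, atom_name, altloc_id, occupancy).
    """
    if mode not in ("first", "occupancy"):
        raise ValueError(f"unsupported mode: {mode!r}")
    chosen = {}    # res_id -> selected altloc ID
    best_occ = {}  # res_id -> occupancy of the selected altloc ID
    for res_id, _, altloc, occ in atoms:
        if not altloc:
            continue  # atoms without an altloc ID are always kept
        if res_id not in chosen:
            chosen[res_id] = altloc
            best_occ[res_id] = occ
        elif mode == "occupancy" and occ > best_occ[res_id]:
            chosen[res_id] = altloc
            best_occ[res_id] = occ
    return [a for a in atoms if not a[2] or a[2] == chosen[a[0]]]


# Hypothetical records: residue 1 has two alternative CA positions
atoms = [
    (1, "N", "", 1.0),
    (1, "CA", "A", 0.4),
    (1, "CA", "B", 0.6),
    (2, "N", "", 1.0),
]
first = filter_altloc(atoms, "first")
by_occupancy = filter_altloc(atoms, "occupancy")
print(len(atoms), len(first), len(by_occupancy))
```

The unfiltered list has one entry more than either filtered list, which mirrors the difference between the raw per-record values and the filtered structure.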

get_coord(model=None)#

Get only the coordinates from the PDB file.

Parameters:

model : int, optional

If this parameter is given, the function will return a 2D coordinate array from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, a 3D coordinate array containing all models will be returned, even if the structure contains only one model.

Returns:

coord : ndarray, shape=(m,n,3) or shape=(n,3), dtype=float

The coordinates read from the ATOM and HETATM records of the file.

Notes

Note that get_coord() may return more coordinates than the atom array (stack) from the corresponding get_structure() call has atoms. The reason is that get_structure() filters altloc IDs, while get_coord() does not.

Examples

Read an AtomArrayStack from multiple PDB files, where each PDB file contains the same atoms but different positions. This is an efficient approach when a trajectory is spread across multiple PDB files, as done e.g. by the Rosetta modeling software.

For the purpose of this example, the PDB files are created from an existing AtomArrayStack.

>>> import os.path
>>> from tempfile import gettempdir
>>> file_names = []
>>> for i in range(atom_array_stack.stack_depth()):
...     pdb_file = PDBFile()
...     pdb_file.set_structure(atom_array_stack[i])
...     file_name = os.path.join(gettempdir(), f"model_{i+1}.pdb")
...     pdb_file.write(file_name)
...     file_names.append(file_name)
>>> print(file_names)
['...model_1.pdb', '...model_2.pdb', ..., '...model_38.pdb']

Now the PDB files are used to create an AtomArrayStack, where each model is taken from a different file.

Construct a new AtomArrayStack with annotations taken from one of the created files used as template and coordinates from all of the PDB files.

>>> template_file = PDBFile.read(file_names[0])
>>> template = template_file.get_structure()
>>> coord = []
>>> for file_name in file_names:
...     pdb_file = PDBFile.read(file_name)
...     coord.append(pdb_file.get_coord(model=1))
>>> new_stack = from_template(template, np.array(coord))

The newly created AtomArrayStack should now be equal to the AtomArrayStack the PDB files were created from.

>>> print(np.allclose(new_stack.coord, atom_array_stack.coord))
True

get_model_count()#

Get the number of models contained in the PDB file.

Returns:

model_count : int

The number of models.

get_remark(number)#

Get the lines containing the REMARK records with the given number.

Parameters:

number : int

The REMARK number, i.e. the XXX in REMARK XXX.

Returns:

remark_lines : None or list of str

The content of the selected REMARK lines. Each line is an element of this list. The REMARK XXX part of each line is omitted. Furthermore, the first line, which must always be empty, is not included. None is returned if the selected REMARK records do not exist in the file.

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1l2y.pdb"))
>>> remarks = file.get_remark(900)
>>> print("\n".join(remarks))
RELATED ENTRIES
RELATED ID: 5292   RELATED DB: BMRB
BMRB 5292 IS CHEMICAL SHIFTS FOR TC5B IN BUFFER AND BUFFER
CONTAINING 30 VOL-% TFE.
RELATED ID: 1JRJ   RELATED DB: PDB
1JRJ IS AN ANALAGOUS C-TERMINAL STRUCTURE.
>>> nonexistent_remark = file.get_remark(999)
>>> print(nonexistent_remark)
None
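The behavior described above (prefix removed, first empty line dropped, None for a missing remark) can be mimicked in a few lines of plain Python. This is only a sketch of the documented behavior under the assumption that the remark text starts at column 12; `extract_remark` and the sample lines are hypothetical.

```python
def extract_remark(lines, number):
    """Return the content of all 'REMARK <number>' lines, or None if absent."""
    # 'REMARK' in columns 1-6, the right-justified number in columns 8-10
    prefix = f"REMARK {number:>3d}"
    selected = [
        line[11:].rstrip()  # the remark text starts at column 12
        for line in lines
        if line.startswith(prefix)
    ]
    if not selected:
        return None
    # The first line of a remark is empty by convention and is skipped
    return selected[1:]


pdb_lines = [
    "REMARK 900",
    "REMARK 900 RELATED ENTRIES",
    "REMARK 900 RELATED ID: 1JRJ   RELATED DB: PDB",
    "ATOM      1  N   ASN A   1      -8.901   4.127  -0.555  1.00  0.00           N",
]
print(extract_remark(pdb_lines, 900))
print(extract_remark(pdb_lines, 999))
```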

get_space_group()#

Extract the space group and Z value from the CRYST1 record.

Returns:

space_group : str

The extracted space group.

z_val : int

The extracted Z value.
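For orientation, the CRYST1 record keeps the space group in columns 56-66 and the Z value in columns 67-70. The sketch below shows this layout with two hypothetical helpers, `format_cryst1` and `parse_space_group`; it is not how the class itself is implemented, and the cell parameters are illustrative values.

```python
def format_cryst1(a, b, c, alpha, beta, gamma, space_group, z_val):
    """Build a CRYST1 line from cell parameters, space group and Z value."""
    return (
        f"CRYST1{a:9.3f}{b:9.3f}{c:9.3f}"
        f"{alpha:7.2f}{beta:7.2f}{gamma:7.2f}"
        f" {space_group:<11}{z_val:>4d}"
    )


def parse_space_group(line):
    """Return (space_group, z_val) from a CRYST1 line."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    space_group = line[55:66].strip()  # columns 56-66
    z_val = int(line[66:70])           # columns 67-70
    return space_group, z_val


line = format_cryst1(59.062, 68.451, 30.517, 90, 90, 90, "P 21 21 21", 4)
print(line)
print(parse_space_group(line))
```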

get_structure(model=None, altloc='first', extra_fields=[], include_bonds=False)#

Get an AtomArray or AtomArrayStack from the PDB file.

This function parses standard base-10 PDB files as well as hybrid-36 PDB files.

Parameters:

model : int, optional

If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.

altloc : {'first', 'occupancy', 'all'}

This parameter defines how altloc IDs are handled:

'first' - Use atoms that have the first altloc ID appearing in a residue.
'occupancy' - Use atoms that have the altloc ID with the highest occupancy for a residue.
'all' - Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.

extra_fields : list of str, optional

The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.

include_bonds : bool, optional

If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (especially inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.

Returns:

array : AtomArray or AtomArrayStack

The return type depends on the model parameter.
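The model numbering used by the model parameter (1-based, with negative values counting from the last model) can be summarized in a short sketch. `resolve_model_index` is a hypothetical helper written for illustration; how the library reports an invalid model number is an assumption here.

```python
def resolve_model_index(model, model_count):
    """Map a 1-based (or negative) model number to a 0-based index."""
    if model == 0:
        raise ValueError("model numbers start at 1; 0 is not valid")
    # Positive numbers count from the first model, negative from the last
    index = model - 1 if model > 0 else model_count + model
    if not 0 <= index < model_count:
        raise ValueError(
            f"the file has {model_count} models, model {model} does not exist"
        )
    return index


# With 38 models, model=1 is the first and model=-1 the last one
print(resolve_model_index(1, 38), resolve_model_index(-1, 38))
```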

get_symmetry_mates(model=None, altloc='first', extra_fields=[], include_bonds=False)#

Build a structure model containing all symmetric copies of the structure within a single unit cell, given by the space group.

This function receives the data from REMARK 290 records in the file. Consequently, this remark must be present in the file, which is usually only true for crystal structures.

DEPRECATED: Use get_unit_cell() instead.

Parameters:

model : int, optional

If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.

altloc : {'first', 'occupancy', 'all'}

This parameter defines how altloc IDs are handled:

'first' - Use atoms that have the first altloc ID appearing in a residue.
'occupancy' - Use atoms that have the altloc ID with the highest occupancy for a residue.
'all' - Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.

extra_fields : list of str, optional

The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.

include_bonds : bool, optional

If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (especially inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.

Returns:

symmetry_mates : AtomArray or AtomArrayStack

All atoms within a single unit cell. The return type depends on the model parameter.

Notes

To expand the structure beyond a single unit cell, use repeat_box() with the return value as its input.

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1aki.pdb"))
>>> atoms_in_unit_cell = file.get_symmetry_mates(model=1)

get_unit_cell(model=None, altloc='first', extra_fields=[], include_bonds=False)#

Build a structure model containing all symmetric copies of the structure within a single unit cell, given by the space group.

This function receives the data from REMARK 290 records in the file. Consequently, this remark must be present in the file, which is usually only true for crystal structures.

Parameters:

model : int, optional

If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.

altloc : {'first', 'occupancy', 'all'}

This parameter defines how altloc IDs are handled:

'first' - Use atoms that have the first altloc ID appearing in a residue.
'occupancy' - Use atoms that have the altloc ID with the highest occupancy for a residue.
'all' - Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.

extra_fields : list of str, optional

The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.

include_bonds : bool, optional

If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (especially inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.

Returns:

symmetry_mates : AtomArray or AtomArrayStack

All atoms within a single unit cell. The return type depends on the model parameter.

Notes

To expand the structure beyond a single unit cell, use repeat_box() with the return value as its input.

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1aki.pdb"))
>>> atoms_in_unit_cell = file.get_unit_cell(model=1)

list_assemblies()#

List the biological assemblies that are available for the structure in the given file.

This function receives the data from the REMARK 300 records in the file. Consequently, this remark must be present in the file.

Returns:

assemblies : list of str

A list that contains the available assembly IDs.

Examples

>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1f2n.pdb"))
>>> print(file.list_assemblies())
['1']

classmethod read(file)#

Parse a file (or file-like object).

Parameters:

file : file-like object or str

The file to be read. Alternatively a file path can be supplied.

Returns:

file : File

An instance of the respective File subclass representing the parsed file.

static read_iter(file)#

Create an iterator over each line of the given text file.

Parameters:

file : file-like object or str

The file to be read. Alternatively a file path can be supplied.

Yields:

line : str

The current line in the file.

set_space_group(info)#

Update the CRYST1 record with the provided space group and Z value.

Parameters:

info : tuple(str, int) or SpaceGroupInfo

Contains the space group and Z value.

set_structure(array, hybrid36=False)#

Set the AtomArray or AtomArrayStack for the file.

This also makes use of the optional annotation arrays 'atom_id', 'b_factor', 'occupancy' and 'charge'. If the atom array (stack) contains the annotation 'atom_id', these values will be used for atom numbering instead of continuous numbering.

Parameters:

array : AtomArray or AtomArrayStack

The array or stack to be saved into this file. If a stack is given, each array in the stack is saved as a separate model.

hybrid36 : bool, optional

Defines whether the file should be written in hybrid-36 format.

Notes

If array has an associated BondList, CONECT records are also written for all non-water hetero residues and all inter-residue connections.
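As background for the hybrid36 option: hybrid-36 keeps plain decimal numbers as long as they fit into the fixed field width and then continues with upper-case and finally lower-case base-36 strings. The decoder below is a minimal sketch of that scheme; `hy36_decode` is a hypothetical name and this is not the routine used by the class.

```python
def hy36_decode(width, text):
    """Decode a hybrid-36 field of the given width into an integer."""
    text = text.strip()
    if not text:
        raise ValueError("empty field")
    first = text[0]
    if first.isdigit() or first == "-":
        return int(text)  # plain decimal range
    if text != text.upper() and text != text.lower():
        raise ValueError(f"mixed-case value is not valid hybrid-36: {text!r}")
    value = int(text, 36)  # int() accepts both letter cases for base 36
    if first.isupper():
        # Upper-case range continues right after the largest decimal value
        return value - 10 * 36 ** (width - 1) + 10 ** width
    # Lower-case range continues right after the upper-case range
    return value + 16 * 36 ** (width - 1) + 10 ** width


# Atom serial numbers use a field width of 5
for field in ("99999", "A0000", "ZZZZZ", "a0000"):
    print(field, hy36_decode(5, field))
```

With a width of 5, this extends the atom numbering from 99999 in plain decimal to more than 87 million.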

write(file)#

Write the contents of this object into a file (or file-like object).

Parameters:

file : file-like object or str

The file to be written to. Alternatively a file path can be supplied.

static write_iter(file, lines)#

Iterate over the given lines of text and write each line into the specified file.

In contrast to write(), each line of text is not stored in an intermediate TextFile, but is directly written to the file. Hence, this static method may save a large amount of memory if a large file should be written, especially if the lines are provided as generator.

Parameters:

file : file-like object or str

The file to be written to. Alternatively a file path can be supplied.

lines : generator or array-like of str

The lines of text to be written. Must not include line break characters.
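The memory argument behind write_iter() can be illustrated with a standard-library analogue: lines from a generator are written one at a time, so the full text never has to exist in memory. `write_lines` is a hypothetical stand-in that only mirrors the documented contract (lines without line breaks); it is not the biotite implementation.

```python
import io


def write_lines(file, lines):
    """Write each line to an open text file, appending the line break."""
    for line in lines:
        if "\n" in line or "\r" in line:
            raise ValueError("lines must not include line break characters")
        file.write(line + "\n")


# The generator produces lines lazily, so only one line exists at a time
buffer = io.StringIO()
write_lines(buffer, (f"REMARK   1 LINE {i}" for i in range(3)))
print(buffer.getvalue(), end="")
```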

Gallery#

Plotting the base pairs of a tRNA-like-structure

Leontis-Westhof Nomenclature
