PDBFile
- class biotite.structure.io.pdb.PDBFile
Bases: TextFile
This class represents a PDB file.
The usage of biotite.structure.io.pdbx is encouraged in favor of this class.
This class only provides support for reading/writing the pure atom information (ATOM, HETATM, MODEL and ENDMDL records). TER records cannot be written. Additionally, REMARK records can be read.
See also
CIFFile
BinaryCIFFile
Examples
Load a *.pdb file, modify the structure and save the new structure into a new file:
>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1l2y.pdb"))
>>> array_stack = file.get_structure()
>>> array_stack_mod = rotate(array_stack, [1,2,3])
>>> file = PDBFile()
>>> file.set_structure(array_stack_mod)
>>> file.write(os.path.join(path_to_directory, "1l2y_mod.pdb"))
- copy()
Create a deep copy of this object.
- Returns:
- copy
A copy of this object.
- get_assembly(assembly_id=None, model=None, altloc='first', extra_fields=[], include_bonds=False)
Build the given biological assembly.
This function receives the data from REMARK 350 records in the file. Consequently, this remark must be present in the file.
- Parameters:
- assembly_id : str
The assembly to build. Available assembly IDs can be obtained via list_assemblies().
- model : int, optional
If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.
- altloc : {'first', 'occupancy', 'all'}
This parameter defines how altloc IDs are handled:
- 'first': Use atoms that have the first altloc ID appearing in a residue.
- 'occupancy': Use atoms that have the altloc ID with the highest occupancy for a residue.
- 'all': Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.
- extra_fields : list of str, optional
The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.
- include_bonds : bool, optional
If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (e.g. inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.
- Returns:
- assembly : AtomArray or AtomArrayStack
The assembly. The return type depends on the model parameter.
Examples
>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1f2n.pdb"))
>>> assembly = file.get_assembly(model=1)
- get_b_factor(model=None)
Get only the B-factors from the PDB file.
- Parameters:
- model : int, optional
If this parameter is given, the function will return a 1D B-factor array from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, a 2D B-factor array containing all models will be returned, even if the structure contains only one model.
- Returns:
- b_factor : ndarray, shape=(m,n) or shape=(n,), dtype=float
The B-factors read from the ATOM and HETATM records of the file.
Notes
Note that get_b_factor() may output more B-factors than the atom array (stack) from the corresponding get_structure() call has atoms. The reason for this is that get_structure() filters altloc IDs, while get_b_factor() does not.
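The size difference described in this note can be illustrated with a plain-Python sketch. It does not use biotite: the tuple layout, the helper name filter_altloc and the sample atoms are hypothetical, and the 'occupancy' rule used here (highest summed occupancy per residue) is an assumption about how such a filter might behave.

```python
from collections import defaultdict

# Hypothetical atoms: (chain_id, res_id, atom_name, altloc_id, occupancy).
# An empty altloc_id means the atom has no alternate location.
atoms = [
    ("A", 1, "N", "", 1.0),
    ("A", 1, "CA", "A", 0.3),
    ("A", 1, "CA", "B", 0.7),
    ("A", 2, "N", "", 1.0),
]


def filter_altloc(atoms, mode="first"):
    """Keep one altloc ID per residue ('first' or 'occupancy') or keep all atoms."""
    if mode not in ("first", "occupancy", "all"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "all":
        return list(atoms)
    # Sum the occupancy per altloc ID for each residue.
    # Dicts keep insertion order, so the first key is the first altloc ID seen.
    occupancy_sums = defaultdict(dict)
    for chain_id, res_id, _, altloc_id, occupancy in atoms:
        if altloc_id:
            sums = occupancy_sums[(chain_id, res_id)]
            sums[altloc_id] = sums.get(altloc_id, 0.0) + occupancy
    chosen = {}
    for residue, sums in occupancy_sums.items():
        if mode == "first":
            chosen[residue] = next(iter(sums))
        else:
            chosen[residue] = max(sums, key=sums.get)
    # Atoms without an altloc ID are always kept
    return [
        atom for atom in atoms
        if not atom[3] or chosen[(atom[0], atom[1])] == atom[3]
    ]


print(len(filter_altloc(atoms, "all")))    # unfiltered, as in get_b_factor()
print(len(filter_altloc(atoms, "first")))  # filtered, as in get_structure()
```

In this sketch the unfiltered list has one more entry than the filtered one, which is the same kind of mismatch the note describes.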
- get_coord(model=None)
Get only the coordinates from the PDB file.
- Parameters:
- model : int, optional
If this parameter is given, the function will return a 2D coordinate array from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, a 3D coordinate array containing all models will be returned, even if the structure contains only one model.
- Returns:
- coord : ndarray, shape=(m,n,3) or shape=(n,3), dtype=float
The coordinates read from the ATOM and HETATM records of the file.
Notes
Note that get_coord() may output more coordinates than the atom array (stack) from the corresponding get_structure() call has. The reason for this is that get_structure() filters altloc IDs, while get_coord() does not.
Examples
Read an AtomArrayStack from multiple PDB files, where each PDB file contains the same atoms but different positions. This is an efficient approach when a trajectory is spread into multiple PDB files, as done e.g. by the Rosetta modeling software.
For the purpose of this example, the PDB files are created from an existing AtomArrayStack.
>>> import os.path
>>> from tempfile import gettempdir
>>> file_names = []
>>> for i in range(atom_array_stack.stack_depth()):
...     pdb_file = PDBFile()
...     pdb_file.set_structure(atom_array_stack[i])
...     file_name = os.path.join(gettempdir(), f"model_{i+1}.pdb")
...     pdb_file.write(file_name)
...     file_names.append(file_name)
>>> print(file_names)
['...model_1.pdb', '...model_2.pdb', ..., '...model_38.pdb']
Now the PDB files are used to create an AtomArrayStack, where each file represents a different model.
Construct a new AtomArrayStack with annotations taken from one of the created files used as template and coordinates from all of the PDB files.
>>> template_file = PDBFile.read(file_names[0])
>>> template = template_file.get_structure()
>>> coord = []
>>> for i, file_name in enumerate(file_names):
...     pdb_file = PDBFile.read(file_name)
...     coord.append(pdb_file.get_coord(model=1))
>>> new_stack = from_template(template, np.array(coord))
The newly created AtomArrayStack should now be equal to the AtomArrayStack the PDB files were created from.
>>> print(np.allclose(new_stack.coord, atom_array_stack.coord))
True
- get_model_count()
Get the number of models contained in the PDB file.
- Returns:
- model_count : int
The number of models.
- get_remark(number)
Get the lines containing the REMARK records with the given number.
- Parameters:
- number : int
The REMARK number, i.e. the XXX in REMARK XXX.
- Returns:
- remark_lines : None or list of str
The content of the selected REMARK lines. Each line is an element of this list. The REMARK XXX part of each line is omitted. Furthermore, the first line, which always must be empty, is not included. None is returned if the selected REMARK records do not exist in the file.
Examples
>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1l2y.pdb"))
>>> remarks = file.get_remark(900)
>>> print("\n".join(remarks))
RELATED ENTRIES
RELATED ID: 5292 RELATED DB: BMRB
BMRB 5292 IS CHEMICAL SHIFTS FOR TC5B IN BUFFER AND BUFFER
CONTAINING 30 VOL-% TFE.
RELATED ID: 1JRJ RELATED DB: PDB
1JRJ IS AN ANALAGOUS C-TERMINAL STRUCTURE.
>>> nonexistent_remark = file.get_remark(999)
>>> print(nonexistent_remark)
None
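The selection rule described for get_remark() can be sketched in plain Python. This is not biotite's implementation: the function name extract_remark and the sample lines are hypothetical, and the sketch assumes that the remark content starts after the 11-character "REMARK XXX " prefix.

```python
def extract_remark(lines, number):
    """Return the content of all 'REMARK <number>' lines, or None if there are none."""
    prefix = f"REMARK {number:>3d}"
    # Drop the 'REMARK XXX ' part (assumed to span the first 11 characters)
    content = [line[11:] for line in lines if line.startswith(prefix)]
    if not content:
        return None
    # The first line of a remark is expected to be empty and is skipped
    return content[1:]


# Hand-written lines in the style of a PDB file header (hypothetical data)
pdb_lines = [
    "REMARK 900",
    "REMARK 900 RELATED ENTRIES",
    "REMARK 900 RELATED ID: 1JRJ   RELATED DB: PDB",
    "END",
]

print(extract_remark(pdb_lines, 900))
print(extract_remark(pdb_lines, 999))
```

Returning None rather than an empty list lets a caller tell a missing remark apart from a remark that consists only of its empty first line.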
- get_structure(model=None, altloc='first', extra_fields=[], include_bonds=False)
Get an AtomArray or AtomArrayStack from the PDB file.
This function parses standard base-10 PDB files as well as hybrid-36 PDB.
- Parameters:
- model : int, optional
If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.
- altloc : {'first', 'occupancy', 'all'}
This parameter defines how altloc IDs are handled:
- 'first': Use atoms that have the first altloc ID appearing in a residue.
- 'occupancy': Use atoms that have the altloc ID with the highest occupancy for a residue.
- 'all': Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.
- extra_fields : list of str, optional
The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.
- include_bonds : bool, optional
If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (e.g. inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.
- Returns:
- array : AtomArray or AtomArrayStack
The return type depends on the model parameter.
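The model numbering convention used by the model parameter (counting from 1, with negative values counting from the last model) can be sketched as follows. The helper resolve_model_index is hypothetical and not part of biotite. Treating 0 and out-of-range numbers as errors is an assumption of this sketch.

```python
def resolve_model_index(model: int, model_count: int) -> int:
    """Translate a 1-based (or negative) model number into a 0-based index."""
    if model == 0:
        raise ValueError("Model numbers start at 1, so 0 is not a valid model")
    # Positive numbers count from the first model, negative ones from the last
    index = model - 1 if model > 0 else model_count + model
    if not 0 <= index < model_count:
        raise ValueError(
            f"Model {model} does not exist in a file with {model_count} models"
        )
    return index


# A hypothetical file containing 38 models
print(resolve_model_index(1, 38))   # first model
print(resolve_model_index(-1, 38))  # last model
```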
- get_symmetry_mates(model=None, altloc='first', extra_fields=[], include_bonds=False)
Build a structure model containing all symmetric copies of the structure within a single unit cell, given by the space group.
This function receives the data from REMARK 290 records in the file. Consequently, this remark must be present in the file, which is usually only true for crystal structures.
- Parameters:
- model : int, optional
If this parameter is given, the function will return an AtomArray from the atoms corresponding to the given model number (starting at 1). Negative values are used to index models starting from the last model instead of the first model. If this parameter is omitted, an AtomArrayStack containing all models will be returned, even if the structure contains only one model.
- altloc : {'first', 'occupancy', 'all'}
This parameter defines how altloc IDs are handled:
- 'first': Use atoms that have the first altloc ID appearing in a residue.
- 'occupancy': Use atoms that have the altloc ID with the highest occupancy for a residue.
- 'all': Use all atoms. Note that this leads to duplicate atoms. When this option is chosen, the altloc_id annotation array is added to the returned structure.
- extra_fields : list of str, optional
The strings in the list are optional annotation categories that should be stored in the output array or stack. These are valid values: 'atom_id', 'b_factor', 'occupancy' and 'charge'.
- include_bonds : bool, optional
If set to true, a BondList will be created for the resulting AtomArray containing the bond information from the file. Bonds whose order could not be determined from the Chemical Component Dictionary (e.g. inter-residue bonds) have BondType.ANY, since the PDB format itself does not support bond orders.
- Returns:
- symmetry_mates : AtomArray or AtomArrayStack
All atoms within a single unit cell. The return type depends on the model parameter.
Notes
To expand the structure beyond a single unit cell, use repeat_box() with the return value as its input.
Examples
>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1aki.pdb"))
>>> atoms_in_unit_cell = file.get_symmetry_mates(model=1)
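To give an idea of what the REMARK 290 data encodes, the following plain-Python sketch parses SMTRY rows into rotation matrices and translation vectors and applies one operation to a coordinate. It is not biotite's implementation: the function names and the sample operations are hypothetical, the input is assumed to be remark content with the "REMARK 290" prefix already removed, and any repositioning of the copies into the unit cell is omitted.

```python
def parse_symmetry_operations(remark_lines):
    """Collect (rotation, translation) per operation ID from SMTRY rows."""
    operations = {}
    for line in remark_lines:
        fields = line.split()
        if not fields or not fields[0].startswith("SMTRY"):
            continue
        row = int(fields[0][5:]) - 1  # SMTRY1..SMTRY3 -> matrix row 0..2
        op_id = int(fields[1])
        rotation, translation = operations.setdefault(
            op_id, ([None] * 3, [0.0] * 3)
        )
        rotation[row] = [float(value) for value in fields[2:5]]
        translation[row] = float(fields[5])
    return operations


def apply_operation(coord, rotation, translation):
    """Compute R * coord + t for a single 3D coordinate."""
    return [
        sum(r * c for r, c in zip(row, coord)) + t
        for row, t in zip(rotation, translation)
    ]


# Hypothetical remark content: the identity and one two-fold operation
remark_290 = [
    "  SMTRY1   1  1.000000  0.000000  0.000000        0.00000",
    "  SMTRY2   1  0.000000  1.000000  0.000000        0.00000",
    "  SMTRY3   1  0.000000  0.000000  1.000000        0.00000",
    "  SMTRY1   2 -1.000000  0.000000  0.000000       10.00000",
    "  SMTRY2   2  0.000000 -1.000000  0.000000        0.00000",
    "  SMTRY3   2  0.000000  0.000000  1.000000        5.00000",
]

operations = parse_symmetry_operations(remark_290)
print(apply_operation([1.0, 2.0, 3.0], *operations[2]))
```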
- list_assemblies()
List the biological assemblies that are available for the structure in the given file.
This function receives the data from the REMARK 300 records in the file. Consequently, this remark must be present in the file.
- Returns:
- assemblies : list of str
A list that contains the available assembly IDs.
Examples
>>> import os.path
>>> file = PDBFile.read(os.path.join(path_to_structures, "1f2n.pdb"))
>>> print(file.list_assemblies())
['1']
- classmethod read(file)
Parse a file (or file-like object).
- Parameters:
- file : file-like object or str
The file to be read. Alternatively a file path can be supplied.
- Returns:
- file_object : File
An instance from the respective File subclass representing the parsed file.
- static read_iter(file)
Create an iterator over each line of the given text file.
- Parameters:
- file : file-like object or str
The file to be read. Alternatively a file path can be supplied.
- Yields:
- line : str
The current line in the file.
- set_structure(array, hybrid36=False)
Set the AtomArray or AtomArrayStack for the file.
This also makes use of the optional annotation arrays 'atom_id', 'b_factor', 'occupancy' and 'charge'. If the atom array (stack) contains the annotation 'atom_id', these values will be used for atom numbering instead of continuous numbering.
- Parameters:
- array : AtomArray or AtomArrayStack
The array or stack to be saved into this file. If a stack is given, each array in the stack is saved as a separate model.
- hybrid36 : bool, optional
Defines whether the file should be written in hybrid-36 format.
Notes
If array has an associated BondList, CONECT records are also written for all non-water hetero residues and all inter-residue connections.
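As background for the hybrid36 option: hybrid-36 keeps plain decimal numbers for as long as they fit into a fixed-width field and then continues in base 36, first with upper-case and then with lower-case letters. The following is a minimal sketch of that encoding for non-negative numbers. The function names are hypothetical, negative values are left out, and this is not biotite's implementation.

```python
DIGITS_UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS_LOWER = DIGITS_UPPER.lower()


def _encode_base36(value, digits):
    """Encode a non-negative integer with the given 36-character alphabet."""
    if value == 0:
        return digits[0]
    chars = []
    while value > 0:
        value, remainder = divmod(value, 36)
        chars.append(digits[remainder])
    return "".join(reversed(chars))


def encode_hybrid36(value, width):
    """Encode a non-negative integer into a hybrid-36 field of the given width."""
    if value < 0:
        raise ValueError("Negative values are not covered by this sketch")
    # Decimal range: right-aligned plain number
    if value < 10**width:
        return f"{value:>{width}d}"
    value -= 10**width
    block = 26 * 36 ** (width - 1)   # amount of numbers per letter case
    offset = 10 * 36 ** (width - 1)  # makes the first character a letter
    if value < block:
        return _encode_base36(value + offset, DIGITS_UPPER)
    value -= block
    if value < block:
        return _encode_base36(value + offset, DIGITS_LOWER)
    raise ValueError("Value is too large for the given field width")


# Atom serial numbers use a 5-character field in the PDB format
print(encode_hybrid36(99999, 5))
print(encode_hybrid36(100000, 5))
```

The first call stays in the decimal range, while the second one is the first number that needs the base-36 range.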
- write(file)
Write the contents of this object into a file (or file-like object).
- Parameters:
- file : file-like object or str
The file to be written to. Alternatively a file path can be supplied.
- static write_iter(file, lines)
Iterate over the given lines of text and write each line into the specified file.
In contrast to write(), each line of text is not stored in an intermediate TextFile, but is directly written to the file. Hence, this static method may save a large amount of memory if a large file should be written, especially if the lines are provided as a generator.
- Parameters:
- file : file-like object or str
The file to be written to. Alternatively a file path can be supplied.
- lines : generator or array-like of str
The lines of text to be written. Must not include line break characters.
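The streaming idea behind write_iter() can be sketched with the standard library alone. This is not biotite's implementation: the helper names write_lines and generate_lines are hypothetical, and an in-memory buffer stands in for a real file. It also assumes the writer appends the line break itself, which would explain why the input lines must not contain one.

```python
import io


def generate_lines(count):
    """Yield lines one at a time instead of building a complete list."""
    for i in range(1, count + 1):
        yield f"line {i}"


def write_lines(file, lines):
    """Write each line as soon as it is produced, appending the line break."""
    for line in lines:
        file.write(line + "\n")


buffer = io.StringIO()
write_lines(buffer, generate_lines(3))
print(buffer.getvalue(), end="")
```

Because the generator is consumed lazily, only one line exists in memory at a time, regardless of how many lines are written in total.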
Gallery
Plotting the base pairs of a tRNA-like-structure