search
- biotite.database.rcsb.search(query, return_type='entry', range=None, sort_by=None, group_by=None, return_groups=False, content_types=('experimental',))
Get all PDB IDs that meet the given query requirements, via the RCSB search API.
This function requires an internet connection.
- Parameters:
- query : Query
The search query.
- return_type : {'entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance'}, optional
The type of the returned identifiers:
  - 'entry': Only the PDB ID is returned (e.g. 'XXXX'). These can be used directly as input to fetch().
  - 'assembly': The PDB ID appended with the assembly ID is returned (e.g. 'XXXX-1').
  - 'polymer_entity': The PDB ID appended with the entity ID of a polymer is returned (e.g. 'XXXX_1').
  - 'non_polymer_entity': The PDB ID appended with the entity ID of a non-polymeric entity is returned (e.g. 'XXXX_1').
  - 'polymer_instance': The PDB ID appended with the chain ID (more precisely, the 'asym_id') is returned (e.g. 'XXXX.A').
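The identifier formats listed above can be taken apart with plain string handling. The following is a minimal sketch; `split_identifier` is a hypothetical helper and not part of Biotite, and the separators are simply the ones shown in the examples for each `return_type`.

```python
# Hypothetical helper (not part of Biotite): split an identifier returned by
# search() into the PDB ID and the suffix that depends on return_type.
_SEPARATORS = {
    "assembly": "-",            # e.g. 'XXXX-1'
    "polymer_entity": "_",      # e.g. 'XXXX_1'
    "non_polymer_entity": "_",  # e.g. 'XXXX_1'
    "polymer_instance": ".",    # e.g. 'XXXX.A'
}


def split_identifier(identifier, return_type="entry"):
    """Return (pdb_id, suffix); suffix is None for return_type='entry'."""
    if return_type == "entry":
        return identifier, None
    separator = _SEPARATORS[return_type]
    # Split at the last separator, so IDs that contain the separator
    # themselves are still handled.
    pdb_id, found, suffix = identifier.rpartition(separator)
    if not found:
        raise ValueError(
            f"{identifier!r} does not look like a {return_type!r} identifier"
        )
    return pdb_id, suffix


print(split_identifier("1EJG"))
print(split_identifier("1I0T.B", "polymer_instance"))
print(split_identifier("3NIR_1", "polymer_entity"))
```

Only the first element of each tuple is a plain PDB ID of the kind that the text says can be passed to fetch().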
- range : tuple(int, int), optional
If this parameter is specified, only PDB IDs in this range are selected from all matching PDB IDs and returned (pagination). The range is zero-indexed and the stop value is exclusive.
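Because the range is zero-indexed with an exclusive stop, it behaves like a Python slice. The sketch below shows this on a local list, so no network access is needed. `page_ranges` is a hypothetical helper, and the total number of matches is assumed to be known in advance.

```python
def page_ranges(total, page_size):
    """Yield consecutive (start, stop) tuples that cover `total` results."""
    for start in range(0, total, page_size):
        # The stop value is exclusive and clamped to the number of results.
        yield (start, min(start + page_size, total))


# Stand-in for a sorted search result (IDs taken from the Examples section).
all_ids = ['7R0H', '7ATG', '5NW3', '5D8V', '4JLJ', '3P4J', '3NIR', '1I0T', '1EJG']

# range=(1, 4) selects the same elements as the slice [1:4].
start, stop = (1, 4)
print(all_ids[start:stop])  # ['7ATG', '5NW3', '5D8V']

# Ranges for fetching the results four at a time.
print(list(page_ranges(len(all_ids), 4)))  # [(0, 4), (4, 8), (8, 9)]
```

Each tuple produced by `page_ranges` has the shape expected by the `range` parameter.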
- sort_by : str or Sorting, optional
If specified, the returned PDB IDs are sorted by the values of the given field name. A complete list of the available fields is documented at https://search.rcsb.org/structure-search-attributes.html and https://search.rcsb.org/chemical-search-attributes.html. If a string is given, sorting is performed in descending order. To choose the order, a Sorting object needs to be provided.
- group_by : Grouping, optional
If this parameter is set, the PDB IDs that meet the query requirements are grouped according to the given criterion.
- return_groups : boolean, optional
Only has an effect if group_by is set. By default, the representative with the highest rank in each group is returned. The rank is determined by the sort_by parameter of the Grouping provided in group_by. If set to true, groups containing all structures belonging to the group are returned instead.
- content_types : iterable of {"experimental", "computational"}, optional
Specify whether experimental and computational structures should be included. At least one of them needs to be specified. By default, only experimental structures are included. Note that identifiers for computational structures cannot be downloaded via biotite.database.rcsb.fetch(), as they point to AlphaFold DB and ModelArchive.
- Returns:
- ids : list of str or dict (str -> list of str)
If return_groups is false (the default), a list of strings containing all PDB IDs that meet the query requirements is returned. If return_groups is set to true, a dictionary of groups is returned. This dictionary maps group identifiers to a list of all PDB IDs belonging to this group.
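Since the return value is either a list or a dictionary, downstream code may want to handle both shapes uniformly. The following is a minimal sketch: `flatten_ids` is a hypothetical helper, and the grouped result is copied from the Examples section rather than fetched.

```python
def flatten_ids(result):
    """Return a flat list of identifiers for either return shape."""
    if isinstance(result, dict):
        # return_groups=True: group identifier -> list of member identifiers
        return [identifier for members in result.values() for identifier in members]
    # return_groups=False: already a list of identifiers
    return list(result)


grouped = {
    'P24297': ['5NW3_1'],
    'P27707': ['4JLJ_1'],
    'P80176': ['5D8V_1'],
    'O29777': ['7R0H_1'],
    'P01542': ['3NIR_1', '1EJG_1'],
}

print(flatten_ids(grouped))
print(flatten_ids(['7ATG', '5NW3', '5D8V']))
```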
Notes
If group_by is set, the number of results may be lower than in an ungrouped query, as grouping is not applicable to all structures. For example, a DNA structure has no associated UniProt accession and hence is omitted by UniprotGrouping.
Also note that sort_by does not affect the order within a group. This order is determined by the sort_by parameter of the Grouping.
Examples
>>> from biotite.database.rcsb import FieldQuery, UniprotGrouping, search
>>> query = FieldQuery("reflns.d_resolution_high", less_or_equal=0.6)
>>> print(sorted(search(query)))
['1EJG', '1I0T', '3NIR', '3P4J', '4JLJ', '5D8V', '5NW3', '7ATG', '7R0H']
>>> print(search(query, sort_by="rcsb_accession_info.initial_release_date"))
['7R0H', '7ATG', '5NW3', '5D8V', '4JLJ', '3P4J', '3NIR', '1I0T', '1EJG']
>>> print(search(
...     query, range=(1,4), sort_by="rcsb_accession_info.initial_release_date"
... ))
['7ATG', '5NW3', '5D8V']
>>> print(sorted(search(query, return_type="polymer_instance")))
['1EJG.A', '1I0T.A', '1I0T.B', '3NIR.A', '3P4J.A', '3P4J.B', '4JLJ.A', '4JLJ.B', '5D8V.A', '5NW3.A', '7ATG.A', '7ATG.B', '7R0H.A']
>>> print(search(
...     query, return_type="polymer_entity", return_groups=True,
...     group_by=UniprotGrouping(sort_by="rcsb_accession_info.initial_release_date"),
... ))
{'P24297': ['5NW3_1'], 'P27707': ['4JLJ_1'], 'P80176': ['5D8V_1'], 'O29777': ['7R0H_1'], 'P01542': ['3NIR_1', '1EJG_1']}
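The note that a grouped query may return fewer results can be checked against the outputs listed in these examples. The sketch below works on copies of those outputs and needs no network access. It only shows which entries have no group in the example output and makes no claim about why a given entry is missing.

```python
# Outputs copied from the examples above.
entry_ids = ['1EJG', '1I0T', '3NIR', '3P4J', '4JLJ', '5D8V', '5NW3', '7ATG', '7R0H']
groups = {
    'P24297': ['5NW3_1'],
    'P27707': ['4JLJ_1'],
    'P80176': ['5D8V_1'],
    'O29777': ['7R0H_1'],
    'P01542': ['3NIR_1', '1EJG_1'],
}

# Polymer entity identifiers have the form 'XXXX_1'; keep the part before '_'.
grouped_entries = {
    identifier.split("_")[0]
    for members in groups.values()
    for identifier in members
}

# Entries that matched the query but do not appear in any UniProt group.
ungrouped = sorted(set(entry_ids) - grouped_entries)
print(ungrouped)  # ['1I0T', '3P4J', '7ATG']
```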
Gallery

Searching for structural homologs in a protein structure database