
search

biotite.database.rcsb.search(query, return_type='entry', range=None, sort_by=None, group_by=None, return_groups=False, content_types=('experimental',))

Get all PDB IDs that meet the given query requirements, via the RCSB search API.

This function requires an internet connection.

Parameters:
query : Query

The search query.

return_type : {'entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance'}, optional

The type of the returned identifiers:

  • 'entry': Only the PDB ID is returned (e.g. 'XXXX'). These IDs can be used directly as input to fetch().

  • 'assembly': The PDB ID appended with assembly ID is returned (e.g. 'XXXX-1').

  • 'polymer_entity': The PDB ID appended with entity ID of polymers is returned (e.g. 'XXXX_1').

  • 'non_polymer_entity': The PDB ID appended with entity ID of non-polymeric entities is returned (e.g. 'XXXX_1').

  • 'polymer_instance': The PDB ID appended with the chain ID (more precisely, the 'asym_id') is returned (e.g. 'XXXX.A').
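When only the PDB ID is needed, for example to pass it to fetch(), a returned identifier can be split back into its parts. The following is a minimal sketch, assuming the separator conventions listed above ('-' for assemblies, '_' for entities, '.' for polymer instances); the helper split_identifier is hypothetical and not part of Biotite.

```python
from typing import Optional, Tuple

# Separator between PDB ID and sub-identifier for each return type,
# assumed from the examples in the return_type description
SEPARATORS = {
    "entry": None,
    "assembly": "-",
    "polymer_entity": "_",
    "non_polymer_entity": "_",
    "polymer_instance": ".",
}


def split_identifier(identifier: str, return_type: str) -> Tuple[str, Optional[str]]:
    """Split an identifier into (PDB ID, sub-identifier) (hypothetical helper)."""
    if return_type not in SEPARATORS:
        raise ValueError(f"Unknown return type '{return_type}'")
    sep = SEPARATORS[return_type]
    if sep is None:
        # 'entry' identifiers are plain PDB IDs without a suffix
        return identifier, None
    # Split only at the first separator; raises ValueError if it is missing
    pdb_id, sub_id = identifier.split(sep, 1)
    return pdb_id, sub_id


print(split_identifier("1I0T.B", "polymer_instance"))  # ('1I0T', 'B')
print(split_identifier("3NIR_1", "polymer_entity"))    # ('3NIR', '1')
```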

range : tuple(int, int), optional

If this parameter is specified, only PDB IDs in this range are selected from all matching PDB IDs and returned (pagination). The range is zero-indexed and the stop value is exclusive.
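Because range is zero-indexed with an exclusive stop, consecutive pages line up like Python slices. A minimal sketch of collecting all hits page by page: page_ranges and fetch_all_ids are hypothetical helpers, and the search and count callables are passed in as parameters so the logic can be exercised without network access.

```python
from typing import Callable, Iterator, List, Tuple


def page_ranges(total: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield zero-indexed (start, stop) tuples with an exclusive stop."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        # Clamp the last page so that stop never exceeds the hit count
        yield start, min(start + page_size, total)


def fetch_all_ids(
    query,
    search: Callable[..., List[str]],
    count: Callable[..., int],
    page_size: int = 100,
) -> List[str]:
    """Collect all matching IDs page by page (hypothetical helper)."""
    ids: List[str] = []
    for page in page_ranges(count(query), page_size):
        ids.extend(search(query, range=page))
    return ids
```

With Biotite installed, a call might look like fetch_all_ids(query, rcsb.search, rcsb.count, page_size=1000), where rcsb is biotite.database.rcsb.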

sort_by : str or Sorting, optional

If specified, the returned PDB IDs are sorted by the values of the given field name. Complete lists of the available fields are documented at https://search.rcsb.org/structure-search-attributes.html and https://search.rcsb.org/chemical-search-attributes.html. If a string is given, sorting is performed in descending order. To choose the order, a Sorting object needs to be provided.

group_by : Grouping, optional

If this parameter is set, the PDB IDs that meet the query requirements are grouped according to the given criterion.

return_groups : boolean, optional

Only has an effect if group_by is set. By default, the representative with the highest rank in each group is returned. The rank is determined by the sort_by parameter of the Grouping provided in group_by. If set to true, groups containing all structures belonging to the group are returned instead.

content_types : iterable of {'experimental', 'computational'}, optional

Specify whether experimental and/or computational structures should be included. At least one of them needs to be specified. By default, only experimental structures are included. Note that identifiers for computational structures cannot be downloaded via biotite.database.rcsb.fetch(), as they point to AlphaFold DB and ModelArchive.
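When both content types are requested, it can be convenient to separate the identifiers that biotite.database.rcsb.fetch() can download from those it cannot. A minimal sketch, assuming (this is not stated above) that computational identifiers returned by RCSB start with the prefix 'AF_' (AlphaFold DB) or 'MA_' (ModelArchive); the helper split_by_content_type and the prefix constant are hypothetical and should be checked against the RCSB documentation.

```python
from typing import Iterable, List, Tuple

# Assumed prefixes of computational structure identifiers (not confirmed
# by the Biotite documentation; verify before relying on them)
COMPUTATIONAL_PREFIXES = ("AF_", "MA_")


def split_by_content_type(ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition identifiers into (experimental, computational) lists."""
    experimental: List[str] = []
    computational: List[str] = []
    for identifier in ids:
        # str.startswith() accepts a tuple of candidate prefixes
        if identifier.startswith(COMPUTATIONAL_PREFIXES):
            computational.append(identifier)
        else:
            experimental.append(identifier)
    return experimental, computational
```

Only the first list would then be passed on to fetch().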

Returns:
ids : list of str or dict (str -> list of str)

If return_groups is false (default case), a list of strings containing all PDB IDs that meet the query requirements is returned. If return_groups is set to true, a dictionary of groups is returned instead. This dictionary maps group identifiers to a list of all PDB IDs belonging to this group.
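Inverting the grouped dictionary gives a quick lookup from a structure identifier to its group. A minimal sketch, using the grouped result from the examples below as input; the helper invert_groups is hypothetical.

```python
from typing import Dict, List


def invert_groups(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every identifier to the key of the group it belongs to."""
    membership: Dict[str, str] = {}
    for group_id, members in groups.items():
        for member in members:
            # If an identifier appeared in several groups, the last one wins
            membership[member] = group_id
    return membership


groups = {
    "P24297": ["5NW3_1"], "P27707": ["4JLJ_1"], "P80176": ["5D8V_1"],
    "O29777": ["7R0H_1"], "P01542": ["3NIR_1", "1EJG_1"],
}
print(invert_groups(groups)["1EJG_1"])  # P01542
```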

Notes

If group_by is set, the number of results may be lower than in an ungrouped query, as grouping is not applicable to all structures. For example, a DNA structure has no associated UniProt accession and hence is omitted by UniprotGrouping.
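To check which entries a grouped query leaves out, the ungrouped and grouped results can be compared at the entry level. A minimal sketch, assuming entity identifiers of the form 'XXXX_1' as documented for return_type; omitted_by_grouping is a hypothetical helper, and the input data are the example results shown below.

```python
from typing import Dict, Iterable, List, Set


def omitted_by_grouping(
    ungrouped_entries: Iterable[str], groups: Dict[str, List[str]]
) -> Set[str]:
    """Return PDB IDs present in the ungrouped result but in no group."""
    grouped_entries = {
        # Strip the entity suffix ('_1') to compare at the entry level
        member.split("_", 1)[0]
        for members in groups.values()
        for member in members
    }
    return set(ungrouped_entries) - grouped_entries


entries = ['1EJG', '1I0T', '3NIR', '3P4J', '4JLJ', '5D8V', '5NW3', '7ATG', '7R0H']
groups = {
    "P24297": ["5NW3_1"], "P27707": ["4JLJ_1"], "P80176": ["5D8V_1"],
    "O29777": ["7R0H_1"], "P01542": ["3NIR_1", "1EJG_1"],
}
print(sorted(omitted_by_grouping(entries, groups)))  # ['1I0T', '3P4J', '7ATG']
```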

Also note that sort_by does not affect the order within a group. This order is determined by the sort_by parameter of the Grouping.

Examples

>>> from biotite.database.rcsb import FieldQuery, UniprotGrouping, search
>>> query = FieldQuery("reflns.d_resolution_high", less_or_equal=0.6)
>>> print(sorted(search(query)))
['1EJG', '1I0T', '3NIR', '3P4J', '4JLJ', '5D8V', '5NW3', '7ATG', '7R0H']
>>> print(search(query, sort_by="rcsb_accession_info.initial_release_date"))
['7R0H', '7ATG', '5NW3', '5D8V', '4JLJ', '3P4J', '3NIR', '1I0T', '1EJG']
>>> print(search(
...     query, range=(1,4), sort_by="rcsb_accession_info.initial_release_date"
... ))
['7ATG', '5NW3', '5D8V']
>>> print(sorted(search(query, return_type="polymer_instance")))
['1EJG.A', '1I0T.A', '1I0T.B', '3NIR.A', '3P4J.A', '3P4J.B', '4JLJ.A', '4JLJ.B', '5D8V.A', '5NW3.A', '7ATG.A', '7ATG.B', '7R0H.A']
>>> print(search(
...     query, return_type="polymer_entity", return_groups=True,
...     group_by=UniprotGrouping(sort_by="rcsb_accession_info.initial_release_date"),
... ))
{'P24297': ['5NW3_1'], 'P27707': ['4JLJ_1'], 'P80176': ['5D8V_1'], 'O29777': ['7R0H_1'], 'P01542': ['3NIR_1', '1EJG_1']}

Gallery

LDDT for predicted structure evaluation

Searching for structural homologs in a protein structure database

© Copyright The Biotite contributors.
