biotite.application.sra.FastqDumpApp
- class biotite.application.sra.FastqDumpApp(uid, output_path_prefix=None, bin_path='fasterq-dump', offset='Sanger')
Bases: biotite.application.localapp.LocalApp
Fetch sequencing data as FASTQ from the NCBI sequence read archive (SRA) using sra-tools.
- Parameters
- uid : str
A unique identifier (UID) of the file to be downloaded.
- output_path_prefix : str, optional
The prefix of the path to store the downloaded FASTQ file. .fastq is appended to this prefix if the run contains a single read per spot. _1.fastq, _2.fastq, etc. are appended if it contains multiple reads per spot. By default, the files are created in a temporary directory and deleted after the files have been read.
- bin_path : str, optional
Path to the fasterq-dump binary.
- offset : int or {‘Sanger’, ‘Solexa’, ‘Illumina-1.3’, ‘Illumina-1.5’, ‘Illumina-1.8’}, optional
This value is subtracted from the FASTQ ASCII code to obtain the quality score. It can be either the value itself or a string that indicates the score format.
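The offset rule above can be illustrated without biotite. The following is a minimal sketch, not the library's implementation: the helper name decode_scores and the table ASSUMED_OFFSETS are hypothetical, and the numeric offsets are the conventional values for these formats (33 or 64), assumed here rather than taken from the biotite source.

```python
# Conventional ASCII offsets for common FASTQ score formats.
# These values are an assumption for illustration, not read from biotite.
ASSUMED_OFFSETS = {
    "Sanger": 33,
    "Solexa": 64,
    "Illumina-1.3": 64,
    "Illumina-1.5": 64,
    "Illumina-1.8": 33,
}


def decode_scores(quality_line, offset="Sanger"):
    """Convert a FASTQ quality string into a list of integer scores.

    `offset` is either the integer itself or a format name that is
    looked up in ASSUMED_OFFSETS.
    """
    if isinstance(offset, str):
        offset = ASSUMED_OFFSETS[offset]
    # Subtract the offset from the ASCII code of every character
    return [ord(char) - offset for char in quality_line]


print(decode_scores("II5!"))            # [40, 40, 20, 0]
print(decode_scores("hT@", offset=64))  # [40, 20, 0]
```

Passing an integer covers formats that are not in the table, which mirrors the "int or format name" choice described for the offset parameter.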
- add_additional_options(options)
Add additional options for the command line program. These options are put before the arguments automatically determined by the respective LocalApp subclass.
This method is intended for advanced users who know the available options of the command line program and the options already used by the LocalApp subclasses. Ignoring the already used options may result in conflicting CLI arguments and unexpected results. It is recommended to use this method only when the respective LocalApp subclass does not provide a method to set the desired option.
- Parameters
- options : list of str
A list of strings representing the command line options.
Notes
To see which options the command line execution used, call the get_command() method.
Examples
>>> seq1 = ProteinSequence("BIQTITE")
>>> seq2 = ProteinSequence("TITANITE")
>>> seq3 = ProteinSequence("BISMITE")
>>> seq4 = ProteinSequence("IQLITE")
>>> # Run application without additional arguments
>>> app = ClustalOmegaApp([seq1, seq2, seq3, seq4])
>>> app.start()
>>> app.join()
>>> print(app.get_command())
clustalo --in ...fa --out ...fa --force --output-order=tree-order --seqtype Protein --guidetree-out ...tree
>>> # Run application with additional argument
>>> app = ClustalOmegaApp([seq1, seq2, seq3, seq4])
>>> app.add_additional_options(["--full"])
>>> app.start()
>>> app.join()
>>> print(app.get_command())
clustalo --full --in ...fa --out ...fa --force --output-order=tree-order --seqtype Protein --guidetree-out ...tree
- cancel()
Cancel the application when in RUNNING or FINISHED state.
- clean_up()
Do clean up work after the application terminates.
PROTECTED: Optionally override when inheriting.
- static fetch(uid, output_path_prefix=None, bin_path='fasterq-dump', offset='Sanger')
Get the sequences and score values belonging to the UID from the NCBI sequence read archive (SRA).
- Parameters
- uid : str
A unique identifier (UID) of the file to be downloaded.
- output_path_prefix : str, optional
The prefix of the path to store the downloaded FASTQ file. .fastq is appended to this prefix if the run contains a single read per spot. _1.fastq, _2.fastq, etc. are appended if it contains multiple reads per spot. By default, the files are created in a temporary directory and deleted after the files have been read.
- bin_path : str, optional
Path to the fasterq-dump binary.
- offset : int or {‘Sanger’, ‘Solexa’, ‘Illumina-1.3’, ‘Illumina-1.5’, ‘Illumina-1.8’}, optional
This value is subtracted from the FASTQ ASCII code to obtain the quality score. It can be either the value itself or a string that indicates the score format.
- Returns
- sequences_and_scores : list of dict (str -> (NucleotideSequence, ndarray))
This list contains the reads for each spot: the first item contains the first read for each spot, the second item contains the second read for each spot (if existing), etc. Each item in the list is a dictionary mapping identifiers to their corresponding sequence and score values.
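The nested return value (a list with one dictionary per read position, each mapping an identifier to a sequence and its scores) is easier to see when iterated. The sketch below uses a hypothetical helper, summarize_reads, and stand-in data made of plain strings and lists in place of NucleotideSequence and ndarray objects, so it runs without biotite, sra-tools, or network access.

```python
def summarize_reads(sequences_and_scores):
    """Return (read_number, identifier, sequence_length, min_score) tuples.

    `sequences_and_scores` is expected to have the shape described above:
    a list of dictionaries mapping identifier -> (sequence, scores).
    """
    summary = []
    # The outer list is ordered by read position within each spot
    for read_number, reads in enumerate(sequences_and_scores, start=1):
        # Each dictionary maps an identifier to a (sequence, scores) pair
        for identifier, (sequence, scores) in reads.items():
            summary.append(
                (read_number, identifier, len(sequence), min(scores))
            )
    return summary


# Stand-in data: strings and lists instead of NucleotideSequence/ndarray
example = [
    {"spot1": ("ACGT", [40, 40, 38, 12])},
    {"spot1": ("TTGCA", [35, 30, 30, 22, 9])},
]
for row in summarize_reads(example):
    print(row)
```

With biotite and the fasterq-dump binary available, the same helper should accept the real result, for example summarize_reads(FastqDumpApp.fetch(uid)) for a valid run UID.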
- get_app_state()
Get the current app state.
- Returns
- app_state : AppState
The current app state.
- get_command()
Get the executed command.
Cannot be called until the application has been started.
- Returns
- command : str
The executed command.
Examples
>>> seq1 = ProteinSequence("BIQTITE")
>>> seq2 = ProteinSequence("TITANITE")
>>> seq3 = ProteinSequence("BISMITE")
>>> seq4 = ProteinSequence("IQLITE")
>>> app = ClustalOmegaApp([seq1, seq2, seq3, seq4])
>>> app.start()
>>> print(app.get_command())
clustalo --in ...fa --out ...fa --force --output-order=tree-order --seqtype Protein --guidetree-out ...tree
- get_exit_code()
Get the exit code of the process.
PROTECTED: Do not call from outside.
- Returns
- code : int
The exit code.
- get_fastq()
Get the FastqFile objects from the downloaded file(s).
- Returns
- fastq_files : list of FastqFile
This list contains the reads for each spot: the first item contains the first read for each spot, the second item contains the second read for each spot (if existing), etc.
- get_file_paths()
Get the file paths to the downloaded FASTQ files.
- Returns
- paths : list of str
The file paths to the downloaded files.
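The file names follow the suffix rule given for output_path_prefix. As a rough illustration of that rule only, the sketch below derives the expected names from a prefix and the number of reads per spot. The function expected_fastq_paths is hypothetical and not part of biotite; the paths actually returned by get_file_paths() come from the application itself.

```python
def expected_fastq_paths(output_path_prefix, reads_per_spot):
    """Derive FASTQ file names from a prefix, following the documented rule.

    A single read per spot yields '<prefix>.fastq'; multiple reads per
    spot yield '<prefix>_1.fastq', '<prefix>_2.fastq', and so on.
    """
    if reads_per_spot < 1:
        raise ValueError("A spot contains at least one read")
    if reads_per_spot == 1:
        return [output_path_prefix + ".fastq"]
    return [
        f"{output_path_prefix}_{number}.fastq"
        for number in range(1, reads_per_spot + 1)
    ]


print(expected_fastq_paths("data/run", 1))  # ['data/run.fastq']
print(expected_fastq_paths("data/run", 2))  # ['data/run_1.fastq', 'data/run_2.fastq']
```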
- get_process()
Get the Popen instance.
PROTECTED: Do not call from outside.
- Returns
- process : Popen
The Popen instance.
- get_sequences()
Get the sequences and score values from the downloaded file(s).
- Returns
- sequences_and_scores : list of dict (str -> (NucleotideSequence, ndarray))
This list contains the reads for each spot: the first item contains the first read for each spot, the second item contains the second read for each spot (if existing), etc. Each item in the list is a dictionary mapping identifiers to their corresponding sequence and score values.
- get_stderr()
Get the STDERR pipe content of the process.
PROTECTED: Do not call from outside.
- Returns
- stderr : str
The standard error.
- get_stdout()
Get the STDOUT pipe content of the process.
PROTECTED: Do not call from outside.
- Returns
- stdout : str
The standard output.
- is_finished()
Check if the application has finished.
PROTECTED: Override when inheriting.
- Returns
- finished : bool
True if the application has finished, False otherwise.
- join(timeout=None)
Conclude the application run and set its state to JOINED. This can only be done from the RUNNING or FINISHED state.
If the application is FINISHED, joining happens immediately. If the application is still RUNNING, this method waits until it is FINISHED.
- Parameters
- timeout : float, optional
If this parameter is specified, the Application only waits for finishing until this value (in seconds) runs out. After this time is exceeded, a TimeoutError is raised and the application is cancelled.
- Raises
- TimeoutError
If the joining process exceeds the timeout value.
- set_arguments(arguments)
Set command line arguments for the application run.
PROTECTED: Do not call from outside.
- Parameters
- arguments : list of str
A list of strings representing the command line options.
- set_exec_dir(exec_dir)
Set the directory where the application should be executed. If not set, it will be executed in the working directory at the time the application was created.
PROTECTED: Do not call from outside.
- Parameters
- exec_dir : str
The execution directory.
- start()
Start the application run and set its state to RUNNING. This can only be done from the CREATED state.
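The state rules stated for start(), join() and cancel() can be summarised as a small transition table. This is a toy model for illustration, not biotite's AppState machinery: the names State, ALLOWED and transition are hypothetical, and the CANCELLED target state is an assumption, since this page does not name the state that cancel() leads to.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()
    RUNNING = auto()
    FINISHED = auto()
    JOINED = auto()
    CANCELLED = auto()  # assumed target of cancel(), not stated on this page


# action -> (states the action is allowed in, resulting state)
ALLOWED = {
    "start": ({State.CREATED}, State.RUNNING),
    "join": ({State.RUNNING, State.FINISHED}, State.JOINED),
    "cancel": ({State.RUNNING, State.FINISHED}, State.CANCELLED),
}


def transition(state, action):
    """Return the state after `action`, or raise if it is not allowed."""
    allowed_from, target = ALLOWED[action]
    if state not in allowed_from:
        raise RuntimeError(f"'{action}' is not allowed in state {state.name}")
    return target


state = State.CREATED
state = transition(state, "start")  # CREATED -> RUNNING
state = transition(state, "join")   # RUNNING -> JOINED
print(state.name)                   # JOINED
```

The move from RUNNING to FINISHED is not listed as an action because it happens when the underlying process ends, not through a method call.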
- wait_interval()
The time interval of is_finished() calls in the joining process.
PROTECTED: Override when inheriting.
- Returns
- interval : float
Time (in seconds) between calls of is_finished() in join().
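How is_finished(), wait_interval() and the timeout of join() fit together can be sketched as a polling loop. This is an assumption about the general pattern, not the actual LocalApp code: wait_until_finished is a hypothetical function that takes the check as a callable, and unlike join() it does not cancel anything when the timeout is exceeded.

```python
import time


def wait_until_finished(is_finished, wait_interval, timeout=None):
    """Poll `is_finished()` every `wait_interval` seconds.

    Raises TimeoutError if `timeout` (in seconds) is given and exceeded.
    """
    start = time.monotonic()
    while not is_finished():
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError("The application did not finish in time")
        time.sleep(wait_interval)


# Stand-in for a running process: reports completion on the third check
calls = {"n": 0}


def fake_is_finished():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until_finished(fake_is_finished, wait_interval=0.01)
print(calls["n"])  # 3
```

A shorter interval notices completion sooner at the cost of more frequent checks, which is presumably why the interval is left to subclasses to override.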