biotite.application.muscle.MuscleApp

class biotite.application.muscle.MuscleApp(sequences, bin_path='muscle', matrix=None)

Bases: biotite.application.msaapp.MSAApp

Perform a multiple sequence alignment using MUSCLE.

Parameters
sequences : list of Sequence

The sequences to be aligned.

bin_path : str, optional

Path of the MUSCLE binary.

matrix : SubstitutionMatrix, optional

A custom substitution matrix.

Examples

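>>> from biotite.sequence import ProteinSequence
>>> from biotite.application.muscle import MuscleApp
>>> seq1 = ProteinSequence("BIQTITE")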
>>> seq2 = ProteinSequence("TITANITE")
>>> seq3 = ProteinSequence("BISMITE")
>>> seq4 = ProteinSequence("IQLITE")
>>> app = MuscleApp([seq1, seq2, seq3, seq4])
>>> app.start()
>>> app.join()
>>> alignment = app.get_alignment()
>>> print(alignment)
BIQT-ITE
TITANITE
BISM-ITE
-IQL-ITE

run(self)

Commence the application run. Called in start().

PROTECTED: Override when inheriting.

evaluate(self)

Evaluate application results. Called in join().

PROTECTED: Override when inheriting.

set_gap_penalty(self, gap_penalty)

Set the gap penalty for the alignment.

Parameters
gap_penalty : float or (tuple, dtype=int)

If a float is provided, the value is interpreted as a general gap penalty. If a tuple is provided, an affine gap penalty is used: the first value is the gap opening penalty, the second value is the gap extension penalty. All values must be negative.
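
The two accepted forms of gap_penalty can be illustrated with a small stand-alone sketch. The helper normalize_gap_penalty is hypothetical and not part of Biotite; it only mirrors the rules stated above and assumes that a general penalty applies the same value to gap opening and gap extension.

```python
def normalize_gap_penalty(gap_penalty):
    """Return (gap_open, gap_extend) for a general or affine penalty.

    Hypothetical helper for illustration; not part of Biotite.
    """
    if isinstance(gap_penalty, tuple):
        if len(gap_penalty) != 2:
            raise ValueError("An affine gap penalty needs two values")
        gap_open, gap_extend = gap_penalty
    else:
        # Assumption: a general penalty uses one value for both cases
        gap_open = gap_extend = gap_penalty
    if gap_open >= 0 or gap_extend >= 0:
        raise ValueError("Gap penalties must be negative")
    return gap_open, gap_extend


general = normalize_gap_penalty(-7.0)     # single value
affine = normalize_gap_penalty((-10, -1))  # (opening, extension)
```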

get_guide_tree(self, iteration='identity')

Get the guide tree created for the progressive alignment.

Parameters
iteration : {‘kmer’, ‘identity’}

If ‘kmer’, the first iteration tree is returned. This tree uses the sequences' common k-mers as distance measure. If ‘identity’, the second iteration tree is returned. This tree uses distances based on the pairwise sequence identity after the first progressive alignment iteration.

Returns
tree : Tree

The guide tree.
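
To give an intuition for a distance based on common k-mers, the following sketch computes one minus the fraction of shared k-mers between two sequences. This is a simplified stand-in chosen for illustration, not the exact measure MUSCLE uses; the function names and the choice k=2 are assumptions.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count all overlapping k-mers in a sequence string."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_distance(seq1, seq2, k=2):
    """Distance in [0, 1]: one minus the fraction of shared k-mers.

    Simplified illustration; not MUSCLE's exact distance measure.
    """
    shortest = min(len(seq1), len(seq2)) - k + 1
    if shortest <= 0:
        raise ValueError("Sequences must be at least k residues long")
    # Multiset intersection: k-mers that occur in both sequences
    shared = sum((kmer_counts(seq1, k) & kmer_counts(seq2, k)).values())
    return 1.0 - shared / shortest


identical = kmer_distance("BIQTITE", "BIQTITE")
related = kmer_distance("BIQTITE", "BISMITE")
unrelated = kmer_distance("AAAA", "CCCC")
```

Smaller values indicate more similar sequences, so a tree built from such distances groups sequences that share many k-mers before any alignment has been computed.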

static supports_nucleotide()

Check whether this class supports nucleotide sequences for alignment.

Returns
support : bool

True, if the class has support, false otherwise.

PROTECTED: Override when inheriting.

static supports_protein()

Check whether this class supports protein sequences for alignment.

Returns
support : bool

True, if the class has support, false otherwise.

PROTECTED: Override when inheriting.

static supports_custom_nucleotide_matrix()

Check whether this class supports custom substitution matrices for nucleotide sequence alignment.

Returns
support : bool

True, if the class has support, false otherwise.

PROTECTED: Override when inheriting.

static supports_custom_protein_matrix()

Check whether this class supports custom substitution matrices for protein sequence alignment.

Returns
support : bool

True, if the class has support, false otherwise.

PROTECTED: Override when inheriting.

classmethod align(sequences, bin_path=None, matrix=None, gap_penalty=None)

Perform a multiple sequence alignment.

This is a convenience function that wraps the MSAApp execution.

Parameters
sequences : iterable object of Sequence

The sequences to be aligned.

bin_path : str, optional

Path of the MSA software binary. If omitted, the default path is used.

matrix : SubstitutionMatrix, optional

A custom substitution matrix.

gap_penalty : float or (tuple, dtype=int), optional

If a float is provided, the value is interpreted as a general gap penalty. If a tuple is provided, an affine gap penalty is used: the first value is the gap opening penalty, the second value is the gap extension penalty. All values must be negative.

Returns
alignment : Alignment

The global multiple sequence alignment.
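
As a hedged usage sketch, the convenience method can be wrapped as shown below. The wrapper assumes that Biotite is installed and that a MUSCLE binary is reachable under the default path; the name align_proteins and the penalty values are chosen for illustration only. The imports are placed inside the function so that defining it does not yet require Biotite.

```python
def align_proteins(raw_sequences, gap_penalty=(-10, -1)):
    """Align protein sequences given as strings via MuscleApp.align().

    Illustrative wrapper, not part of Biotite. Calling it requires
    Biotite and a MUSCLE binary at the default path.
    """
    from biotite.sequence import ProteinSequence
    from biotite.application.muscle import MuscleApp

    sequences = [ProteinSequence(s) for s in raw_sequences]
    # align() wraps the application run and returns the alignment
    return MuscleApp.align(sequences, gap_penalty=gap_penalty)
```

A call such as align_proteins(["BIQTITE", "TITANITE", "BISMITE", "IQLITE"]) would then return the alignment without explicit start() and join() calls.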

add_additional_options(self, options)

Add additional options for the command line program. These options are put before the arguments automatically determined by the respective LocalApp subclass.

This method is intended for advanced users who know the available options of the command line program and the options already used by the LocalApp subclasses. Ignoring the options already in use may result in conflicting CLI arguments and unexpected results. It is recommended to use this method only when the respective LocalApp subclass does not provide a method to set the desired option.

Parameters
options : list of str

A list of strings representing the command line options.

Notes

In order to see which options the command line execution used, try the get_command() method.

Examples

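>>> from biotite.sequence import ProteinSequence
>>> from biotite.application.clustalo import ClustalOmegaApp
>>> seq1 = ProteinSequence("BIQTITE")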
>>> seq2 = ProteinSequence("TITANITE")
>>> seq3 = ProteinSequence("BISMITE")
>>> seq4 = ProteinSequence("IQLITE")
>>> # Run application without additional arguments
>>> app = ClustalOmegaApp([seq1, seq2, seq3, seq4])
>>> app.start()
>>> app.join()
>>> print(app.get_command())
clustalo --in ...fa --out ...fa --output-order=tree-order --seqtype Protein --guidetree-out ...tree
>>> # Run application with additional argument
>>> app = ClustalOmegaApp([seq1, seq2, seq3, seq4])
>>> app.add_additional_options(["--full"])
>>> app.start()
>>> app.join()
>>> print(app.get_command())
clustalo --full --in ...fa --out ...fa --output-order=tree-order --seqtype Protein --guidetree-out ...tree
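
The ordering described above, with additional options placed before the automatically determined arguments, can be sketched with plain lists. The helper build_command and the file names are hypothetical placeholders; this is not Biotite's implementation.

```python
def build_command(bin_path, additional_options, arguments):
    """Assemble a command: binary, additional options, then arguments.

    Hypothetical helper mirroring the documented ordering.
    """
    return [bin_path] + list(additional_options) + list(arguments)


command = build_command(
    "clustalo",
    ["--full"],                                  # user-supplied options
    ["--in", "input.fa", "--out", "output.fa"],  # placeholder arguments
)
command_line = " ".join(command)
```

Because the additional options are simply prepended, an option that is also set automatically would appear twice on the command line, which is the conflict the warning above refers to.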

cancel(self)

Cancel the application when in RUNNING or FINISHED state.

clean_up(self)

Do clean up work after the application terminates.

PROTECTED: Optionally override when inheriting.

get_alignment(self)

Get the resulting multiple sequence alignment.

Returns
alignment : Alignment

The global multiple sequence alignment.

get_alignment_order(self)

Get the order of the resulting multiple sequence alignment.

Usually, the order of sequences in the output file differs from the input file, e.g. the sequences are ordered according to the guide tree. When using align(), this order is rearranged so that it is the same as the input order. This method returns the order of the sequences intended by the MSA software, which can be used to restore that order.

Returns
order : ndarray, dtype=int

The sequence order intended by the MSA software.

Examples

Align sequences and restore the original order:

app = ClustalOmegaApp(sequences)
app.start()
app.join()
alignment = app.get_alignment()
order = app.get_alignment_order()
alignment = alignment[:, order]
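
The reordering itself is an index permutation, which can be shown without running any MSA software. In the sketch below, the order values are invented for illustration and are not taken from an actual run; plain lists stand in for the alignment and the ndarray.

```python
# Alignment rows in input order (taken from the MuscleApp example)
rows_in_input_order = ["BIQT-ITE", "TITANITE", "BISM-ITE", "-IQL-ITE"]

# Hypothetical order array: position i holds the input index of the
# sequence that the MSA software would place at position i
order = [1, 0, 2, 3]

# Rearrange the rows into the order intended by the software
rows_in_software_order = [rows_in_input_order[i] for i in order]

# The inverse permutation maps the rows back to the input order
inverse = [0] * len(order)
for position, input_index in enumerate(order):
    inverse[input_index] = position
restored = [rows_in_software_order[i] for i in inverse]
```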

get_app_state(self)

Get the current app state.

Returns
app_state : AppState

The current app state.

get_command(self)

Get the executed command.

Cannot be called until the application has been started.

Returns
command : str

The executed command.

Examples

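>>> from biotite.sequence import ProteinSequence
>>> from biotite.application.clustalo import ClustalOmegaApp
>>> seq1 = ProteinSequence("BIQTITE")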
>>> seq2 = ProteinSequence("TITANITE")
>>> seq3 = ProteinSequence("BISMITE")
>>> seq4 = ProteinSequence("IQLITE")
>>> app = ClustalOmegaApp([seq1, seq2, seq3, seq4])
>>> app.start()
>>> print(app.get_command())
clustalo --in ...fa --out ...fa --output-order=tree-order --seqtype Protein --guidetree-out ...tree

get_exit_code(self)

Get the exit code of the process.

PROTECTED: Do not call from outside.

Returns
code : int

The exit code.

get_input_file_path(self)

Get input file path (FASTA format).

PROTECTED: Do not call from outside.

Returns
path : str

Path of input file.

get_matrix_file_path(self)

Get file path for custom substitution matrix.

PROTECTED: Do not call from outside.

Returns
path : str or None

Path of substitution matrix. None if no matrix was given.

get_output_file_path(self)

Get output file path (FASTA format).

PROTECTED: Do not call from outside.

Returns
path : str

Path of output file.

get_process(self)

Get the Popen instance.

PROTECTED: Do not call from outside.

Returns
process : Popen

The Popen instance.

get_seqtype(self)

Get the type of aligned sequences.

When a custom sequence type (neither nucleotide nor protein) is mapped onto a protein sequence, the return value is also 'protein'.

PROTECTED: Do not call from outside.

Returns
seqtype : {‘nucleotide’, ‘protein’}

Type of sequences to be aligned.

get_stderr(self)

Get the STDERR pipe content of the process.

PROTECTED: Do not call from outside.

Returns
stderr : str

The standard error.

get_stdout(self)

Get the STDOUT pipe content of the process.

PROTECTED: Do not call from outside.

Returns
stdout : str

The standard output.

is_finished(self)

Check if the application has finished.

PROTECTED: Override when inheriting.

Returns
finished : bool

True if the application has finished, false otherwise.

join(self, timeout=None)

Conclude the application run and set its state to JOINED. This can only be done from the RUNNING or FINISHED state.

If the application is FINISHED, joining happens immediately. If the application is still RUNNING, this method waits until it is FINISHED.

Parameters
timeout : float, optional

If specified, the application waits at most this many seconds to finish. Once this time is exceeded, a TimeoutError is raised and the application is cancelled.

Raises
TimeoutError

If the joining process exceeds the timeout value.
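
The waiting behavior described for join() can be sketched with the standard library alone. The function join_process below is an illustrative stand-in, not Biotite's implementation: it polls a Popen instance at a fixed interval, similar in spirit to is_finished() and wait_interval(), and raises a TimeoutError after terminating the process when the timeout is exceeded.

```python
import subprocess
import sys
import time


def join_process(process, timeout=None, wait_interval=0.01):
    """Poll a process until it has finished and return its exit code.

    Illustrative sketch; kills the process and raises TimeoutError
    if it does not finish within `timeout` seconds.
    """
    start = time.monotonic()
    while process.poll() is None:  # None means still running
        if timeout is not None and time.monotonic() - start > timeout:
            process.kill()
            process.wait()
            raise TimeoutError(f"Process did not finish within {timeout} s")
        time.sleep(wait_interval)
    return process.returncode


# A short-lived child process finishes well within the timeout
quick = subprocess.Popen([sys.executable, "-c", "pass"])
exit_code = join_process(quick, timeout=30)

# A long-running child process exceeds the timeout and is cancelled
slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
try:
    join_process(slow, timeout=0.2)
    timed_out = False
except TimeoutError:
    timed_out = True
```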

set_arguments(self, arguments)

Set command line arguments for the application run.

PROTECTED: Do not call from outside.

Parameters
arguments : list of str

A list of strings representing the command line options.

set_exec_dir(self, exec_dir)

Set the directory where the application should be executed. If not set, it will be executed in the working directory at the time the application was created.

PROTECTED: Do not call from outside.

Parameters
exec_dir : str

The execution directory.

start(self)

Start the application run and set its state to RUNNING. This can only be done from the CREATED state.
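
The state rules stated for start(), join() and cancel() can be summarized as a small state machine. The sketch below is an assumption-laden illustration rather than Biotite's code: the CANCELLED state name and the use of RuntimeError are assumptions, and the transition from RUNNING to FINISHED, which happens when the process ends, is not modeled.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()
    RUNNING = auto()
    FINISHED = auto()
    JOINED = auto()
    CANCELLED = auto()  # assumed name for the state after cancel()


# action -> (states the action is allowed from, resulting state)
TRANSITIONS = {
    "start": ({State.CREATED}, State.RUNNING),
    "join": ({State.RUNNING, State.FINISHED}, State.JOINED),
    "cancel": ({State.RUNNING, State.FINISHED}, State.CANCELLED),
}


def apply_action(state, action):
    """Return the state after `action`, or raise if it is not allowed."""
    allowed_from, target = TRANSITIONS[action]
    if state not in allowed_from:
        raise RuntimeError(f"Cannot {action} in state {state.name}")
    return target


state = State.CREATED
state = apply_action(state, "start")  # CREATED -> RUNNING
state = apply_action(state, "join")   # RUNNING -> JOINED
```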

wait_interval(self)

The time interval of is_finished() calls in the joining process.

PROTECTED: Override when inheriting.

Returns
interval : float

Time (in seconds) between calls of is_finished() in join().