comparison plasmidtrakr.xml @ 14:9a84b8511fc2 draft

planemo upload commit 6681312523efb1b57807ea244c63f9cbb02c574e
author galaxytrakr
date Thu, 30 Apr 2026 13:04:42 +0000
parents 6eaad34862cb
children 58006290e654
13:6eaad34862cb 14:9a84b8511fc2
1 <tool id="plasmidtrakr" name="PlasmidTrakr" version="0.1.12"> 1 <tool id="plasmidtrakr" name="PlasmidTrakr" version="0.2.0">
2 <description>Predicts isolate source from plasmid profiles using a trained machine learning model</description> 2 <description>Screens assemblies against a Mash database and predicts isolate source using a trained machine learning model</description>
3 3
4 <requirements> 4 <requirements>
5 <requirement type="package" version="2.3">mash</requirement>
5 <requirement type="package" version="2.3.3">pandas</requirement> 6 <requirement type="package" version="2.3.3">pandas</requirement>
6 <requirement type="package" version="1.6.1">scikit-learn</requirement> 7 <requirement type="package" version="1.6.1">scikit-learn</requirement>
7 </requirements> 8 </requirements>
8 9
9 <command detect_errors="exit_code"><![CDATA[ 10 <command detect_errors="exit_code"><![CDATA[
11 ## 1. Symlink the Mash database from the tool data table
12 ln -s '$mash_database.fields.path' queries.msh &&
13
14 ## 2. Run Mash Screen internally
15 mash screen
16 -w
17 -i $threshold
18 queries.msh
19 '$assembly_input'
20 > mash_results.tabular
21 &&
22
23 ## 3. Run PlasmidTrakr prediction
10 python $__tool_directory__/predict_source.py 24 python $__tool_directory__/predict_source.py
11 -i $mash_input 25 -i mash_results.tabular
12 -b $model_selection.fields.path 26 -b '$model_selection.fields.path'
13 -t $threshold 27 -t $threshold
14 -o $prediction_output 28 -o '$prediction_output'
15 ]]></command> 29 ]]></command>
16 30
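Note on the intermediate file: the new command writes `mash screen` output to `mash_results.tabular` and passes it to `predict_source.py`, which is not shown in this diff. The sketch below is therefore only an illustration of what that file looks like and what an identity filter on it could do. It assumes the standard six-column `mash screen` output (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment). The helper name `filter_hits` and the example rows are hypothetical and are not taken from the tool.

```python
import csv
import io

# Column order of `mash screen` tabular output, as described in the Mash docs.
MASH_SCREEN_COLUMNS = [
    "identity",
    "shared_hashes",
    "median_multiplicity",
    "p_value",
    "query_id",
    "query_comment",
]


def filter_hits(handle, threshold):
    """Return hits with identity >= threshold (hypothetical helper, not from the tool)."""
    # QUOTE_NONE keeps quote characters in query comments from confusing the parser.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        hit = dict(zip(MASH_SCREEN_COLUMNS, row))
        hit["identity"] = float(hit["identity"])
        if hit["identity"] >= threshold:
            hits.append(hit)
    return hits


# Made-up rows for illustration only; real output comes from `mash screen`.
example = io.StringIO(
    "0.99\t990/1000\t3\t0\tplasmid_A\tcomment A\n"
    "0.90\t150/1000\t1\t1e-20\tplasmid_B\tcomment B\n"
)
kept = filter_hits(example, threshold=0.95)
print([hit["query_id"] for hit in kept])
```

Because the command already passes `-i $threshold` to `mash screen`, hits below the threshold should not reach the file in the first place. The same value is also handed to the script via `-t`, presumably so that it can apply or record the threshold the model was trained with.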
17 <inputs> 31 <inputs>
18 <param name="mash_input" type="data" format="tabular" label="Mash Screen Output" help="The tabular output file from the Galaxy 'mash screen' tool."/> 32 <param name="assembly_input" type="data" format="fasta,fasta.gz,fastq,fastq.gz" label="Genome Assembly / Reads" help="The FASTA/FASTQ file containing the isolate sequence."/>
19 33
34 <param name="mash_database" type="select" label="Select Mash Database" help="Choose the pre-computed Mash sketch database to screen against.">
35 <options from_data_table="mash_sketches">
36 <validator type="no_options" message="No Mash databases are configured. Please contact your Galaxy administrator." />
37 </options>
38 </param>
39
20 <param name="model_selection" type="select" label="Select Prediction Model" help="Choose which trained model to use for prediction."> 40 <param name="model_selection" type="select" label="Select Prediction Model" help="Choose which trained model to use for prediction.">
21 <options from_data_table="plasmidtrakr_models"> 41 <options from_data_table="plasmidtrakr_models">
22 <validator type="no_options" message="No prediction models are configured. Please contact your Galaxy administrator." /> 42 <validator type="no_options" message="No prediction models are configured. Please contact your Galaxy administrator." />
23 </options> 43 </options>
24 </param> 44 </param>
25 45
26 <param name="threshold" type="float" value="0.95" label="Mash Identity Threshold" help="Filter plasmid hits below this identity. Must match the threshold used for model training."/> 46 <param name="threshold" type="float" value="0.95" min="0.0" max="1.0" label="Mash Identity Threshold" help="Filter plasmid hits below this identity. Must match the threshold used for model training."/>
27 </inputs> 47 </inputs>
28 48
29 <outputs> 49 <outputs>
30 <data name="prediction_output" format="tabular" label="Prediction for ${on_string}" /> 50 <data name="prediction_output" format="tabular" label="Prediction for ${on_string}" />
31 </outputs> 51 </outputs>
32 52
33 <help><![CDATA[ 53 <help><![CDATA[
34 **What it does** 54 **What it does**
35 55
36 This tool takes the list of plasmid hits from the Galaxy **mash screen** tool and uses a pre-trained **machine learning model** to predict the original source of the isolate. 56 This tool performs a complete workflow in a single step: it screens a genome assembly or read set against a built-in plasmid database using **mash screen**, and then feeds those plasmid hits into a pre-trained **machine learning model** to predict the original source of the isolate.
37 57
38 **Workflow for Genome Assemblies** 58 **Workflow**
39 59 1. Provide your **genome assembly (FASTA)** or raw reads.
40 1. Go to the **mash screen** tool in Galaxy. 60 2. Select your **Mash database** from the server's configured list.
41 2. In the **"Single or Paired-end reads"** dropdown, select **"Single"**. 61 3. Select the desired prediction model.
42 3. For the **"Select fastq dataset"** input, provide your **genome assembly FASTA file**. 62 4. Execute to screen and predict in one step.
43 4. Run the `mash screen` job against the appropriate plasmid database.
44 5. Use the tabular output from that job as the input for **this prediction tool**.
45 6. Select the desired prediction model from the dropdown menu.
46 7. Execute to get your prediction.
47 63
48 **Output** 64 **Output**
49
50 A tabular file containing the isolate ID, the predicted source, and a confidence score. 65 A tabular file containing the isolate ID, the predicted source, and a confidence score.
51 ]]></help> 66 ]]></help>
52 67
53 <citations> 68 <citations>
54 <citation type="bibtex"> 69 <citation type="bibtex">