comparison plasmidtrakr.xml @ 14:9a84b8511fc2 draft
planemo upload commit 6681312523efb1b57807ea244c63f9cbb02c574e
| author | galaxytrakr |
|---|---|
| date | Thu, 30 Apr 2026 13:04:42 +0000 |
| parents | 6eaad34862cb |
| children | 58006290e654 |
comparison
| 13:6eaad34862cb | 14:9a84b8511fc2 |
|---|---|
| 1 <tool id="plasmidtrakr" name="PlasmidTrakr" version="0.1.12"> | 1 <tool id="plasmidtrakr" name="PlasmidTrakr" version="0.2.0"> |
| 2 <description>Predicts isolate source from plasmid profiles using a trained machine learning model</description> | 2 <description>Screens assemblies against a Mash database and predicts isolate source using a trained machine learning model</description> |
| 3 | 3 |
| 4 <requirements> | 4 <requirements> |
| | 5 <requirement type="package" version="2.3">mash</requirement> |
| 5 <requirement type="package" version="2.3.3">pandas</requirement> | 6 <requirement type="package" version="2.3.3">pandas</requirement> |
| 6 <requirement type="package" version="1.6.1">scikit-learn</requirement> | 7 <requirement type="package" version="1.6.1">scikit-learn</requirement> |
| 7 </requirements> | 8 </requirements> |
| 8 | 9 |
| 9 <command detect_errors="exit_code"><![CDATA[ | 10 <command detect_errors="exit_code"><![CDATA[ |
| | 11 ## 1. Symlink the Mash database from the tool data table |
| | 12 ln -s '$mash_database.fields.path' queries.msh && |
| | 13 |
| | 14 ## 2. Run Mash Screen internally |
| | 15 mash screen |
| | 16 -w |
| | 17 -i $threshold |
| | 18 queries.msh |
| | 19 '$assembly_input' |
| | 20 > mash_results.tabular |
| | 21 && |
| | 22 |
| | 23 ## 3. Run PlasmidTrakr prediction |
| 10 python $__tool_directory__/predict_source.py | 24 python $__tool_directory__/predict_source.py |
| 11 -i $mash_input | 25 -i mash_results.tabular |
| 12 -b $model_selection.fields.path | 26 -b '$model_selection.fields.path' |
| 13 -t $threshold | 27 -t $threshold |
| 14 -o $prediction_output | 28 -o '$prediction_output' |
| 15 ]]></command> | 29 ]]></command> |
| 16 | 30 |
| 17 <inputs> | 31 <inputs> |
| 18 <param name="mash_input" type="data" format="tabular" label="Mash Screen Output" help="The tabular output file from the Galaxy 'mash screen' tool."/> | 32 <param name="assembly_input" type="data" format="fasta,fasta.gz,fastq,fastq.gz" label="Genome Assembly / Reads" help="The FASTA/FASTQ file containing the isolate sequence."/> |
| 19 | 33 |
| | 34 <param name="mash_database" type="select" label="Select Mash Database" help="Choose the pre-computed Mash sketch database to screen against."> |
| | 35 <options from_data_table="mash_sketches"> |
| | 36 <validator type="no_options" message="No Mash databases are configured. Please contact your Galaxy administrator." /> |
| | 37 </options> |
| | 38 </param> |
| | 39 |
| 20 <param name="model_selection" type="select" label="Select Prediction Model" help="Choose which trained model to use for prediction."> | 40 <param name="model_selection" type="select" label="Select Prediction Model" help="Choose which trained model to use for prediction."> |
| 21 <options from_data_table="plasmidtrakr_models"> | 41 <options from_data_table="plasmidtrakr_models"> |
| 22 <validator type="no_options" message="No prediction models are configured. Please contact your Galaxy administrator." /> | 42 <validator type="no_options" message="No prediction models are configured. Please contact your Galaxy administrator." /> |
| 23 </options> | 43 </options> |
| 24 </param> | 44 </param> |
| 25 | 45 |
| 26 <param name="threshold" type="float" value="0.95" label="Mash Identity Threshold" help="Filter plasmid hits below this identity. Must match the threshold used for model training."/> | 46 <param name="threshold" type="float" value="0.95" min="0.0" max="1.0" label="Mash Identity Threshold" help="Filter plasmid hits below this identity. Must match the threshold used for model training."/> |
| 27 </inputs> | 47 </inputs> |
| 28 | 48 |
| 29 <outputs> | 49 <outputs> |
| 30 <data name="prediction_output" format="tabular" label="Prediction for ${on_string}" /> | 50 <data name="prediction_output" format="tabular" label="Prediction for ${on_string}" /> |
| 31 </outputs> | 51 </outputs> |
| 32 | 52 |
| 33 <help><![CDATA[ | 53 <help><![CDATA[ |
| 34 **What it does** | 54 **What it does** |
| 35 | 55 |
| 36 This tool takes the list of plasmid hits from the Galaxy **mash screen** tool and uses a pre-trained **machine learning model** to predict the original source of the isolate. | 56 This tool performs a complete workflow in a single step: it screens a genome assembly or read set against a built-in plasmid database using **mash screen**, and then feeds those plasmid hits into a pre-trained **machine learning model** to predict the original source of the isolate. |
| 37 | 57 |
| 38 **Workflow for Genome Assemblies** | 58 **Workflow** |
| 39 | 59 1. Provide your **genome assembly (FASTA)** or raw reads. |
| 40 1. Go to the **mash screen** tool in Galaxy. | 60 2. Select your **Mash database** from the server's configured list. |
| 41 2. In the **"Single or Paired-end reads"** dropdown, select **"Single"**. | 61 3. Select the desired prediction model. |
| 42 3. For the **"Select fastq dataset"** input, provide your **genome assembly FASTA file**. | 62 4. Execute to screen and predict in one step. |
| 43 4. Run the `mash screen` job against the appropriate plasmid database. | |
| 44 5. Use the tabular output from that job as the input for **this prediction tool**. | |
| 45 6. Select the desired prediction model from the dropdown menu. | |
| 46 7. Execute to get your prediction. | |
| 47 | 63 |
| 48 **Output** | 64 **Output** |
| 49 | |
| 50 A tabular file containing the isolate ID, the predicted source, and a confidence score. | 65 A tabular file containing the isolate ID, the predicted source, and a confidence score. |
| 51 ]]></help> | 66 ]]></help> |
| 52 | 67 |
| 53 <citations> | 68 <citations> |
| 54 <citation type="bibtex"> | 69 <citation type="bibtex"> |
