<tool id="plasmidtrakr" name="PlasmidTrakr" version="0.1.9">
    <description>Predicts isolate source from plasmid profiles using a trained machine learning model</description>

    <requirements>
        <requirement type="package" version="1.5.3">pandas</requirement>
        <requirement type="package" version="1.2.2">scikit-learn</requirement>
    </requirements>

    <command detect_errors="exit_code"><![CDATA[
        python $__tool_directory__/predict_source.py
            -i $mash_input
            -b $model_selection.fields.path
            -t $threshold
            -o $prediction_output
    ]]></command>

    <inputs>
        <param name="mash_input" type="data" format="tabular" label="Mash Screen Output" help="The tabular output file from the Galaxy 'mash screen' tool."/>

        <param name="model_selection" type="select" label="Select Prediction Model" help="Choose which trained model to use for prediction.">
            <options from_data_table="plasmidtrakr_models">
                <validator type="no_options" message="No prediction models are configured. Please contact your Galaxy administrator." />
            </options>
        </param>

        <param name="threshold" type="float" value="0.95" label="Mash Identity Threshold" help="Filter plasmid hits below this identity. Must match the threshold used for model training."/>
    </inputs>

    <outputs>
        <data name="prediction_output" format="tabular" label="Prediction for ${on_string}" />
    </outputs>

    <help><![CDATA[
**What it does**

This tool takes the list of plasmid hits from the Galaxy **mash screen** tool and uses a pre-trained **machine learning model** to predict the original source of the isolate.

**Workflow for Genome Assemblies**

1.  Go to the **mash screen** tool in Galaxy.
2.  In the **"Single or Paired-end reads"** dropdown, select **"Single"**.
3.  For the **"Select fastq dataset"** input, provide your **genome assembly FASTA file**.
4.  Run the `mash screen` job against the appropriate plasmid database.
5.  Use the tabular output from that job as the input for **this prediction tool**.
6.  Select the desired prediction model from the dropdown menu.
7.  Execute to get your prediction.

**Output**

A tabular file containing the isolate ID, the predicted source, and a confidence score.
    ]]></help>

    <citations>
        <citation type="bibtex">
            @misc{strain_2026_plasmidtrakr,
                author = {Strain, Errol},
                title = {PlasmidTrakr: A tool for predicting isolate source from plasmid profiles},
                year = {2026},
                publisher = {GitHub},
                journal = {GitHub repository},
                howpublished = {\url{https://github.com/estrain/plasmidtrakr}}
            }
        </citation>
    </citations>
</tool>
