diff cfsan_cronology.xml @ 5:6e5ceea33843

"planemo upload"
author kkonganti
date Mon, 27 Nov 2023 14:50:43 -0500
parents 7ac696717239
children a3c1cba6f773
--- a/cfsan_cronology.xml	Mon Nov 27 14:37:10 2023 -0500
+++ b/cfsan_cronology.xml	Mon Nov 27 14:50:43 2023 -0500
@@ -75,7 +75,7 @@
         <param name="refgenome" optional="true" value="GCF_003516125" type="text"
                 label="NCBI reference genome accession"
                 help="Is the reference genome other than Cronobacter sakazakii? Reference genome FASTA is used as a model for gene prediction. DO NOT ENTER THE DECIMAL PART (Ex: GCF_003516125.1)." />
-        <param name="tuspy_n" optional="true" value="10" type="integer" label="Enter the number of top unique hits to retain after initial MASH screen step"
+        <param name="tuspy_n" optional="true" value="2" type="integer" label="Enter the number of top unique hits to retain after initial MASH screen step"
             help="These hits will be used to build a genome distance based tree for your experiment run. Default value of 2 is suitable for almost all scenarios."/>
         <param name="fq_filename_delim" type="text" value="_" label="File name delimitor by which samples are grouped together (--fq_filename_delim)" 
             help="This is the delimitor by which samples are grouped together to display in the final MultiQC report. For example, if your input data sets are mango_replicate1.fastq.gz, mango_replicate2.fastq.gz, orange_replicate1_maryland.fastq.gz, orange_replicate2_maryland.fastq.gz, then to create 2 samples mango and orange, the value for --fq_filename_delim would be _ (underscore) and the value for --fq_filename_delim_idx would be 1, since you want to group by the first word (i.e. mango or orange) after splitting the filename based on _ (underscore)."/>
@@ -115,11 +115,10 @@
 
 **Purpose**
 
-cronology is an automated workflow to assign Salmonella serotype based on NCBI Pathogen Detection Project for Salmonella. 
-It uses MASH to reduce the search space followed by additional genome filtering with sourmash. It then performs genome based 
-alignment with kma followed by count generation using salmon. This workflow can be used to analyze shotgun metagenomics 
-datasets, quasi-metagenomic datasets (enriched for Salmonella) and target enriched datasets (enriched with molecular baits specific for Salmonella) 
-and is especially useful in a case where a sample is of multi-serovar mixture.
+cronology is an automated workflow for Cronobacter isolate assembly,
+sequencing typing and traceback. The workflow version 0.1.0 takes in single-end
+or paired-end Illumina short read data, performs QC using fastp, assembly and polish using shovill and polypolish
+and whole genome distance based clustering using mashtree based on NCBI Pathogen Detection DB for Cronobacter.
 
 It is written in Nextflow and is part of the modular data analysis pipelines (CFSAN PIPELINES or CPIPES for short) at CFSAN.
 
@@ -130,8 +129,7 @@
 
 **Testing and Validation**
 
-The CPIPES - cronology Nextflow pipeline has been wrapped to make it work in Galaxy. It takes in either paired or unpaired short reads list as an input 
-and performs read quality control followed by de novo assembly, gene prediction and annotation, sequence typing and whole genome distance based clustering.
+The CPIPES - cronology Nextflow pipeline has been wrapped to make it work in Galaxy. 
 All the testing has been done on the command line on the CFSAN Raven2 HPC Cluster.