comparison cfsan_cronology.xml @ 5:6e5ceea33843

"planemo upload"
author kkonganti
date Mon, 27 Nov 2023 14:50:43 -0500
parents 7ac696717239
children a3c1cba6f773
comparison
4:7ac696717239 5:6e5ceea33843
73 </when> 73 </when>
74 </conditional> 74 </conditional>
75 <param name="refgenome" optional="true" value="GCF_003516125" type="text" 75 <param name="refgenome" optional="true" value="GCF_003516125" type="text"
76 label="NCBI reference genome accession" 76 label="NCBI reference genome accession"
77 help="Is the reference genome other than Cronobacter sakazakii? Reference genome FASTA is used as a model for gene prediction. DO NOT ENTER THE DECIMAL PART (Ex: GCF_003516125.1)." /> 77 help="Is the reference genome other than Cronobacter sakazakii? Reference genome FASTA is used as a model for gene prediction. DO NOT ENTER THE DECIMAL PART (Ex: GCF_003516125.1)." />
78 <param name="tuspy_n" optional="true" value="10" type="integer" label="Enter the number of top unique hits to retain after initial MASH screen step" 78 <param name="tuspy_n" optional="true" value="2" type="integer" label="Enter the number of top unique hits to retain after initial MASH screen step"
79 help="These hits will be used to build a genome distance based tree for your experiment run. Default value of 2 is suitable for almost all scenarios."/> 79 help="These hits will be used to build a genome distance based tree for your experiment run. Default value of 2 is suitable for almost all scenarios."/>
80 <param name="fq_filename_delim" type="text" value="_" label="File name delimitor by which samples are grouped together (--fq_filename_delim)" 80 <param name="fq_filename_delim" type="text" value="_" label="File name delimitor by which samples are grouped together (--fq_filename_delim)"
81 help="This is the delimitor by which samples are grouped together to display in the final MultiQC report. For example, if your input data sets are mango_replicate1.fastq.gz, mango_replicate2.fastq.gz, orange_replicate1_maryland.fastq.gz, orange_replicate2_maryland.fastq.gz, then to create 2 samples mango and orange, the value for --fq_filename_delim would be _ (underscore) and the value for --fq_filename_delim_idx would be 1, since you want to group by the first word (i.e. mango or orange) after splitting the filename based on _ (underscore)."/> 81 help="This is the delimitor by which samples are grouped together to display in the final MultiQC report. For example, if your input data sets are mango_replicate1.fastq.gz, mango_replicate2.fastq.gz, orange_replicate1_maryland.fastq.gz, orange_replicate2_maryland.fastq.gz, then to create 2 samples mango and orange, the value for --fq_filename_delim would be _ (underscore) and the value for --fq_filename_delim_idx would be 1, since you want to group by the first word (i.e. mango or orange) after splitting the filename based on _ (underscore)."/>
82 <param name="fq_filename_delim_idx" type="integer" value="1" label="File name delimitor index (--fq_filename_delim_idx)" /> 82 <param name="fq_filename_delim_idx" type="integer" value="1" label="File name delimitor index (--fq_filename_delim_idx)" />
83 </inputs> 83 </inputs>
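
Note on the grouping described in the help text for `fq_filename_delim` and `fq_filename_delim_idx`: the following is a minimal Python sketch of that rule, not code from the wrapper or the cronology pipeline. The function name `group_samples` is hypothetical, and it assumes that an index of N means "keep the first N delimiter-separated fields", which matches the documented case of index 1 but is not confirmed by the source for larger values.

```python
from collections import defaultdict


def group_samples(filenames, delim="_", idx=1):
    """Group file names by their first `idx` delimiter-separated fields.

    Assumption: idx=N keeps the first N fields joined by `delim`.
    The help text only documents the idx=1 case.
    """
    groups = defaultdict(list)
    for name in filenames:
        # Split on the delimiter and keep the leading fields as the sample key.
        key = delim.join(name.split(delim)[:idx])
        groups[key].append(name)
    return dict(groups)


# File names taken from the help text of fq_filename_delim.
files = [
    "mango_replicate1.fastq.gz",
    "mango_replicate2.fastq.gz",
    "orange_replicate1_maryland.fastq.gz",
    "orange_replicate2_maryland.fastq.gz",
]
samples = group_samples(files, delim="_", idx=1)
```

With the delimiter `_` and index 1, the four inputs collapse into the two samples named in the help text, mango and orange, each holding two files.
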
113 113
114 .. class:: infomark 114 .. class:: infomark
115 115
116 **Purpose** 116 **Purpose**
117 117
118 cronology is an automated workflow to assign Salmonella serotype based on NCBI Pathogen Detection Project for Salmonella. 118 cronology is an automated workflow for Cronobacter isolate assembly,
119 It uses MASH to reduce the search space followed by additional genome filtering with sourmash. It then performs genome based 119 sequencing typing and traceback. The workflow version 0.1.0 takes in single-end
120 alignment with kma followed by count generation using salmon. This workflow can be used to analyze shotgun metagenomics 120 or paired-end Illumina short read data, performs QC using fastp, assembly and polish using shovill and polypolish
121 datasets, quasi-metagenomic datasets (enriched for Salmonella) and target enriched datasets (enriched with molecular baits specific for Salmonella) 121 and whole genome distance based clustering using mashtree based on NCBI Pathogen Detection DB for Cronobacter.
122 and is especially useful in a case where a sample is of multi-serovar mixture.
123 122
124 It is written in Nextflow and is part of the modular data analysis pipelines (CFSAN PIPELINES or CPIPES for short) at CFSAN. 123 It is written in Nextflow and is part of the modular data analysis pipelines (CFSAN PIPELINES or CPIPES for short) at CFSAN.
125 124
126 125
127 ---- 126 ----
128 127
129 .. class:: infomark 128 .. class:: infomark
130 129
131 **Testing and Validation** 130 **Testing and Validation**
132 131
133 The CPIPES - cronology Nextflow pipeline has been wrapped to make it work in Galaxy. It takes in either paired or unpaired short reads list as an input 132 The CPIPES - cronology Nextflow pipeline has been wrapped to make it work in Galaxy.
134 and performs read quality control followed by de novo assembly, gene prediction and annotation, sequence typing and whole genome distance based clustering.
135 All the testing has been done on the command line on the CFSAN Raven2 HPC Cluster. 133 All the testing has been done on the command line on the CFSAN Raven2 HPC Cluster.
136 134
137 135
138 ---- 136 ----
139 137