Source: `0.4.0/readme/centriflaken_hy.md` @ 101:ce6d9548fe89 ("planemo upload"; author: kkonganti; date: Thu, 04 Aug 2022 10:45:55 -0400)

# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken_hy**

---
`centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.

#### Workflow Usage

```bash
module load cpipes/0.4.0

cpipes --pipeline centriflaken_hy [options]
```

Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.

```bash
cd /hpc/scratch/$USER
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
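
Because `--input` reads every file in the given directory, a stray non-FASTQ file there will be picked up by the pipeline. A minimal pre-flight check, sketched below in Bash, lists any entry that does not look like a FASTQ file before `cpipes` is launched. The function name `check_fastq_dir` is hypothetical, and the accepted extensions (`.fastq.gz`, `.fq.gz`, `.fastq`, `.fq`) are assumptions rather than anything CPIPES documents.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of CPIPES): print every entry in an
# --input directory whose name does not end in an assumed FASTQ extension.
# Returns 0 if all entries look like FASTQ files, 1 otherwise, 2 on bad input.
check_fastq_dir() {
    local dir="$1" f bad=0
    if [ ! -d "$dir" ]; then
        echo "Not a directory: $dir" >&2
        return 2
    fi
    for f in "$dir"/*; do
        [ -e "$f" ] || continue    # empty directory: the glob did not match
        case "$f" in
            *.fastq.gz|*.fq.gz|*.fastq|*.fq) ;;
            *) echo "Not a FASTQ file: $f" >&2; bad=1 ;;
        esac
    done
    return "$bad"
}

# Usage:
# check_fastq_dir /path/to/illumina/fastq/dir && echo "OK to launch cpipes"
```

Subdirectories are reported too, since they are not FASTQ files either.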

Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool is run in place of `SerotypeFinder`.

```bash
cd /hpc/scratch/$USER
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
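
The help output below describes `--metadata` as a rarely needed alternative to `--input`: a CSV file with the mandatory columns `sample`, `fq1`, `fq2`, `strandedness`, and `single_end`, where `fq1` and `fq2` hold absolute paths. The Bash function below is a hypothetical sketch of building such a file from one directory, reusing the documented defaults of `--fq_suffix`, `--fq2_suffix`, `--fq_filename_delim`, `--fq_filename_delim_idx`, and `--fq_strandedness`. The function name `make_samplesheet`, the header row, leaving `fq2` empty for single-end samples, writing `single_end` as `true`/`false`, and reading the delimiter index as "keep the first N fields" are all assumptions, not confirmed CPIPES behavior.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not part of CPIPES): write a --metadata samplesheet to
# stdout from a directory of FASTQ files. Defaults mirror the help text; the
# header row, empty fq2 for single-end samples, true/false values for
# single_end, and the "keep the first N fields" naming rule are assumptions.
make_samplesheet() {
    local dir="$1"
    local fq_suffix="${2:-_R1_001.fastq.gz}"    # --fq_suffix default
    local fq2_suffix="${3:-_R2_001.fastq.gz}"   # --fq2_suffix default
    local delim="${4:-_}"                       # --fq_filename_delim default
    local idx="${5:-1}"                         # --fq_filename_delim_idx default
    local fq1 fq2 sample single_end

    dir="$(cd "$dir" && pwd)" || return 1       # fq1/fq2 must be absolute paths
    echo "sample,fq1,fq2,strandedness,single_end"
    for fq1 in "$dir"/*"$fq_suffix"; do
        [ -e "$fq1" ] || continue               # no R1 files matched
        # Sample name: the first $idx fields of the file name, split on $delim.
        sample="$(basename "$fq1" | cut -d "$delim" -f "1-$idx")"
        fq2="${fq1%"$fq_suffix"}$fq2_suffix"
        if [ -e "$fq2" ]; then
            single_end=false
        else
            fq2=""
            single_end=true
        fi
        echo "$sample,$fq1,$fq2,unstranded,$single_end"   # --fq_strandedness default
    done
}

# Usage: make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv
```

The resulting file would then be passed as `--metadata samplesheet.csv` instead of `--input`.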

#### `centriflaken_hy` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
N E X T F L O W  ~  version 21.12.1-edge
Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name    : CPIPES
Author  : Kranti.Konganti@fda.hhs.gov
Version : 0.4.0
Center  : CFSAN, FDA.
================================================================================

Workflow                        : centriflaken_hy

Author                          : Kranti.Konganti@fda.hhs.gov

Version                         : 0.4.0


Usage                           : cpipes --pipeline centriflaken_hy [options]


Required                        :

--input                         : Absolute path to directory containing FASTQ
                                  files. The directory should contain only
                                  FASTQ files as all the files within the
                                  mentioned directory will be read. Ex:
                                  --input /path/to/fastq_pass

--output                        : Absolute path to directory where all the
                                  pipeline outputs should be stored. Ex:
                                  --output /path/to/output

Other options                   :

--metadata                      : Absolute path to metadata CSV file
                                  containing five mandatory columns: sample,
                                  fq1, fq2, strandedness, single_end. The fq1
                                  and fq2 columns contain absolute paths to
                                  the FASTQ files. This option can be used in
                                  place of --input option. This is rare. Ex:
                                  --metadata samplesheet.csv

--fq_suffix                     : The suffix of FASTQ files (Unpaired reads
                                  or R1 reads or Long reads) if an input
                                  directory is mentioned via --input option.
                                  Default: _R1_001.fastq.gz

--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads
                                  or R2 reads) if an input directory is
                                  mentioned via --input option. Default:
                                  _R2_001.fastq.gz

--fq_filter_by_len              : Remove FASTQ reads that are less than this
                                  many bases. Default: 75

--fq_strandedness               : The strandedness of the sequencing run.
                                  This is mostly needed if your sequencing
                                  run is RNA-SEQ. For most of the other runs,
                                  it is probably safe to use unstranded for
                                  the option. Default: unstranded

--fq_single_end                 : SINGLE-END information will be
                                  auto-detected but this option forces
                                  PAIRED-END FASTQ files to be treated as
                                  SINGLE-END so only read 1 information is
                                  included in auto-generated samplesheet.
                                  Default: false

--fq_filename_delim             : Delimiter by which the file name is split
                                  to obtain sample name. Default: _

--fq_filename_delim_idx         : After splitting FASTQ file name by using
                                  the --fq_filename_delim option, all
                                  elements before this index (1-based) will
                                  be joined to create final sample name.
                                  Default: 1

--seqkit_rmdup_run              : Remove duplicate sequences using seqkit
                                  rmdup. Default: false

--seqkit_rmdup_n                : Match and remove duplicate sequences by
                                  full name instead of just ID. Default: false

--seqkit_rmdup_s                : Match and remove duplicate sequences by
                                  sequence content. Default: true

--seqkit_rmdup_d                : Save the duplicated sequences to a file.
                                  Default: false

--seqkit_rmdup_D                : Save the number and list of duplicated
                                  sequences to a file. Default: false

--seqkit_rmdup_i                : Ignore case while using seqkit rmdup.
                                  Default: false

--seqkit_rmdup_P                : Only consider positive strand (i.e. 5')
                                  when comparing by sequence content. Default:
                                  false

--kraken2_db                    : Absolute path to kraken database. Default:
                                  /hpc/db/kraken2/standard-210914

--kraken2_confidence            : Confidence score threshold which must be
                                  between 0 and 1. Default: 0.0

--kraken2_quick                 : Quick operation (use first hit or hits).
                                  Default: false

--kraken2_use_mpa_style         : Report output like Kraken 1's
                                  kraken-mpa-report. Default: false

--kraken2_minimum_base_quality  : Minimum base quality used in classification
                                  which is only effective with FASTQ input.
                                  Default: 0

--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts
                                  are zero. Default: false

--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer
                                  count information in addition to normal
                                  Kraken report. Default: false

--kraken2_use_names             : Print scientific names instead of just
                                  taxids. Default: true

--kraken2_extract_bug           : Extract the reads or contigs belonging to
                                  this bug. Default: Escherichia coli

--centrifuge_x                  : Absolute path to centrifuge database.
                                  Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align.
                                  For PAIRED-END reads, save read pairs that
                                  did not align concordantly. Default: false

--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For
                                  PAIRED-END reads, save read pairs that
                                  aligned concordantly. Default: false

--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default:
                                  false

--centrifuge_extract_bug        : Extract this bug from centrifuge results.
                                  Default: Escherichia coli

--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred
                                  scale. Default: false

--megahit_run                   : Run MEGAHIT assembler. Default: true

--megahit_min_count             : <int>. Minimum multiplicity for filtering
                                  (k_min+1)-mers. Default: false

--megahit_k_list                : Comma-separated list of kmer size. All
                                  values must be odd, in the range 15-255,
                                  increment should be <= 28. Ex:
                                  '21,29,39,59,79,99,119,141'. Default: false

--megahit_no_mercy              : Do not add mercy k-mers. Default: false

--megahit_bubble_level          : <int>. Intensity of bubble merging (0-2), 0
                                  to disable. Default: false

--megahit_merge_level           : <l,s>. Merge complex bubbles of length <=
                                  l*kmer_size and similarity >= s. Default:
                                  false

--megahit_prune_level           : <int>. Strength of low depth pruning (0-3).
                                  Default: false

--megahit_prune_depth           : <int>. Remove unitigs with avg k-mer depth
                                  less than this value. Default: false

--megahit_low_local_ratio       : <float>. Ratio threshold to define low
                                  local coverage contigs. Default: false

--megahit_max_tip_len           : <int>. Remove tips less than this value
                                  [<int> * k]. Default: false

--megahit_no_local              : Disable local assembly. Default: false

--megahit_kmin_1pass            : Use 1pass mode to build SdBG of k_min.
                                  Default: false

--megahit_preset                : <str>. Override a group of parameters.
                                  Valid values are meta-sensitive which
                                  enforces '--min-count 1 --k-list
                                  21,29,39,49,...,129,141', meta-large (large
                                  & complex metagenomes, like soil) which
                                  enforces '--k-min 27 --k-max 127
                                  --k-step 10'. Default: meta-sensitive

--megahit_mem_flag              : <int>. SdBG builder memory mode. 0: minimum;
                                  1: moderate; 2: use all memory specified.
                                  Default: 2

--megahit_min_contig_len        : <int>. Minimum length of contigs to output.
                                  Default: false

--spades_run                    : Run SPAdes assembler. Default: false

--spades_isolate                : This flag is highly recommended for
                                  high-coverage isolate and multi-cell data.
                                  Default: false

--spades_sc                     : This flag is required for MDA (single-cell)
                                  data. Default: false

--spades_meta                   : This flag is required for metagenomic data.
                                  Default: true

--spades_bio                    : This flag is required for
                                  biosyntheticSPAdes mode. Default: false

--spades_corona                 : This flag is required for coronaSPAdes mode.
                                  Default: false

--spades_rna                    : This flag is required for RNA-Seq data.
                                  Default: false

--spades_plasmid                : Runs plasmidSPAdes pipeline for plasmid
                                  detection. Default: false

--spades_metaviral              : Runs metaviralSPAdes pipeline for virus
                                  detection. Default: false

--spades_metaplasmid            : Runs metaplasmidSPAdes pipeline for plasmid
                                  detection in metagenomics datasets. Default:
                                  false

--spades_rnaviral               : This flag enables virus assembly module
                                  from RNA-Seq data. Default: false

--spades_iontorrent             : This flag is required for IonTorrent data.
                                  Default: false

--spades_only_assembler         : Runs only the SPAdes assembler module
                                  (without read error correction). Default:
                                  false

--spades_careful                : Tries to reduce the number of mismatches
                                  and short indels in the assembly. Default:
                                  false

--spades_cov_cutoff             : Coverage cutoff value (a positive float
                                  number). Default: false

--spades_k                      : List of k-mer sizes (must be odd and less
                                  than 128). Default: false

--spades_hmm                    : Directory with custom hmms that replace the
                                  default ones (very rare). Default: false

--serotypefinder_run            : Run SerotypeFinder tool. Default: true

--serotypefinder_x              : Generate extended output files. Default:
                                  true

--serotypefinder_db             : Path to SerotypeFinder databases. Default:
                                  /hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold  : Minimum percent identity (in float)
                                  required for calling a hit. Default: 0.85

--serotypefinder_min_cov        : Minimum percent coverage (in float)
                                  required for calling a hit. Default: 0.80

--seqsero2_run                  : Run SeqSero2 tool. Default: false

--seqsero2_t                    : '1' for interleaved paired-end reads, '2'
                                  for separated paired-end reads, '3' for
                                  single reads, '4' for genome assembly, '5'
                                  for nanopore reads (fasta/fastq). Default:
                                  4

--seqsero2_m                    : Which workflow to apply, 'a'(raw reads
                                  allele micro-assembly), 'k'(raw reads and
                                  genome assembly k-mer). Default: k

--seqsero2_c                    : SeqSero2 will only output serotype
                                  prediction without the directory containing
                                  log files. Default: false

--seqsero2_s                    : SeqSero2 will not output header in
                                  SeqSero_result.tsv. Default: false

--mlst_run                      : Run MLST tool. Default: true

--mlst_minid                    : DNA %identity of full allele to consider
                                  'similar' [~]. Default: 95

--mlst_mincov                   : DNA %cov to report partial allele at all [?].
                                  Default: 10

--mlst_minscore                 : Minimum score out of 100 to match a scheme.
                                  Default: 50

--abricate_run                  : Run ABRicate tool. Default: true

--abricate_minid                : Minimum DNA %identity. Default: 90

--abricate_mincov               : Minimum DNA %coverage. Default: 80

--abricate_datadir              : ABRicate databases folder. Default:
                                  /hpc/db/abricate/1.0.1/db

Help options                    :

--help                          : Display this message.
```

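The help text above states the k-mer constraints for the two assemblers: `--megahit_k_list` values must be odd, within 15-255, with increments of at most 28, and `--spades_k` values must be odd and less than 128. Checking a list before submitting a long job can avoid a failed run. The Bash function below is a hypothetical sketch of such a check (`validate_k_list` is not part of CPIPES). Treating "increment" as "strictly ascending with step <= 28" is an assumption, and the lower bound of 1 used for SPAdes in the usage line only reflects that the help text states none.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of CPIPES) for comma-separated k-mer
# lists, using the constraints quoted in the help text:
#   --megahit_k_list: odd, 15-255, each increment <= 28
#   --spades_k:       odd, < 128
# Args: list, min, max, max_step (0 = no step check). Treating a step limit as
# "strictly ascending with step <= max_step" is an assumption.
validate_k_list() {
    local list="$1" min="$2" max="$3" max_step="${4:-0}"
    local k prev="" IFS=','
    for k in $list; do
        case "$k" in
            ''|*[!0-9]*) echo "Not a positive integer: '$k'" >&2; return 1 ;;
        esac
        k=$((10#$k))                     # avoid octal parsing of e.g. "09"
        if (( k % 2 == 0 )); then
            echo "k=$k is even" >&2; return 1
        fi
        if (( k < min || k > max )); then
            echo "k=$k is outside $min-$max" >&2; return 1
        fi
        if (( max_step > 0 )) && [ -n "$prev" ]; then
            if (( k <= prev || k - prev > max_step )); then
                echo "step $prev -> $k is not within 1-$max_step" >&2; return 1
            fi
        fi
        prev="$k"
    done
    return 0
}

# Usage:
# validate_k_list "21,29,39,59,79,99,119,141" 15 255 28   # --megahit_k_list
# validate_k_list "21,33,55,77" 1 127                      # --spades_k
```
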
### **BETA**

---
The modular structure and flow of CPIPES are still under active development and may change as computational and other considerations are assessed.