comparison 0.4.0/readme/centriflaken_hy.md @ 101:ce6d9548fe89

"planemo upload"
author kkonganti
date Thu, 04 Aug 2022 10:45:55 -0400
100:9d9537c907bd 101:ce6d9548fe89
1 # CPIPES (CFSAN PIPELINES)
2
3 ## The modular pipeline repository at CFSAN, FDA
4
5 **CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
6 mostly for bioinformatics data analysis at **CFSAN, FDA.**
7
8 ---
9
10 ### **centriflaken_hy**
11
12 ---
13 `centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.
14
15 #### Workflow Usage
16
17 ```bash
18 module load cpipes/0.4.0
19
20 cpipes --pipeline centriflaken_hy [options]
21 ```
22
23 Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.
24
25 ```bash
26 cd /hpc/scratch/$USER
27 mkdir -p nf-cpipes
28 cd nf-cpipes
29 cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
30 ```
31
32 Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.
33
34 ```bash
35 cd /hpc/scratch/$USER
36 mkdir -p nf-cpipes
37 cd nf-cpipes
38 cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
39 ```
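
The help text for `--input` states that the directory should contain only FASTQ files, because every file in it is read. A quick pre-flight check can catch stray files before a run is launched. The following is a minimal sketch, not part of `cpipes`: the helper name `non_fastq_files` is hypothetical, and the list of file extensions treated as FASTQ is an assumption that may need adjusting.

```bash
# Sketch only: the extension list below is an assumption, not something
# cpipes defines. Prints every regular file directly inside the directory
# given as $1 whose name does not look like a FASTQ file.
non_fastq_files() {
    find "$1" -maxdepth 1 -type f \
        ! -name '*.fastq.gz' ! -name '*.fq.gz' \
        ! -name '*.fastq' ! -name '*.fq'
}

# Usage: warn if anything unexpected is present in the input directory.
input_dir=/path/to/illumina/fastq/dir
if [ -d "$input_dir" ] && [ -n "$(non_fastq_files "$input_dir")" ]; then
    echo "Non-FASTQ files found in $input_dir; move them before running cpipes." >&2
fi
```

If the function prints nothing, the directory holds only files with the assumed FASTQ extensions.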
40
41 #### `centriflaken_hy` Help
42
43 ```text
44 [Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
45 N E X T F L O W ~ version 21.12.1-edge
46 Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311
47 ================================================================================
48 (o)
49 ___ _ __ _ _ __ ___ ___
50 / __|| '_ \ | || '_ \ / _ \/ __|
51 | (__ | |_) || || |_) || __/\__ \
52 \___|| .__/ |_|| .__/ \___||___/
53 | | | |
54 |_| |_|
55 --------------------------------------------------------------------------------
56 A collection of modular pipelines at CFSAN, FDA.
57 --------------------------------------------------------------------------------
58 Name : CPIPES
59 Author : Kranti.Konganti@fda.hhs.gov
60 Version : 0.4.0
61 Center : CFSAN, FDA.
62 ================================================================================
63
64 Workflow : centriflaken_hy
65
66 Author : Kranti.Konganti@fda.hhs.gov
67
68 Version : 0.4.0
69
70
71 Usage : cpipes --pipeline centriflaken_hy [options]
72
73
74 Required :
75
76 --input : Absolute path to directory containing FASTQ
77 files. The directory should contain only
78 FASTQ files as all the files within the
79 mentioned directory will be read. Ex: --
80 input /path/to/fastq_pass
81
82 --output : Absolute path to directory where all the
83 pipeline outputs should be stored. Ex: --
84 output /path/to/output
85
86 Other options :
87
88 --metadata : Absolute path to metadata CSV file
89 containing five mandatory columns: sample,
90 fq1,fq2,strandedness,single_end. The fq1
91 and fq2 columns contain absolute paths to
92 the FASTQ files. This option can be used in
93 place of --input option. This is rare. Ex: --
94 metadata samplesheet.csv
95
96 --fq_suffix : The suffix of FASTQ files (Unpaired reads
97 or R1 reads or Long reads) if an input
98 directory is mentioned via --input option.
99 Default: _R1_001.fastq.gz
100
101 --fq2_suffix : The suffix of FASTQ files (Paired-end reads
102 or R2 reads) if an input directory is
103 mentioned via --input option. Default:
104 _R2_001.fastq.gz
105
106 --fq_filter_by_len : Remove FASTQ reads that are less than this
107 many bases. Default: 75
108
109 --fq_strandedness : The strandedness of the sequencing run.
110 This is mostly needed if your sequencing
111 run is RNA-SEQ. For most of the other runs,
112 it is probably safe to use unstranded for
113 the option. Default: unstranded
114
115 --fq_single_end : SINGLE-END information will be auto-
116 detected but this option forces PAIRED-END
117 FASTQ files to be treated as SINGLE-END so
118 only read 1 information is included in auto-
119 generated samplesheet. Default: false
120
121 --fq_filename_delim : Delimiter by which the file name is split
122 to obtain sample name. Default: _
123
124 --fq_filename_delim_idx : After splitting FASTQ file name by using
125 the --fq_filename_delim option, all
126 elements before this index (1-based) will
127 be joined to create final sample name.
128 Default: 1
129
130 --seqkit_rmdup_run : Remove duplicate sequences using seqkit
131 rmdup. Default: false
132
133 --seqkit_rmdup_n : Match and remove duplicate sequences by
134 full name instead of just ID. Defaut: false
135
136 --seqkit_rmdup_s : Match and remove duplicate sequences by
137 sequence content. Defaut: true
138
139 --seqkit_rmdup_d : Save the duplicated sequences to a file.
140 Defaut: false
141
142 --seqkit_rmdup_D : Save the number and list of duplicated
143 sequences to a file. Defaut: false
144
145 --seqkit_rmdup_i : Ignore case while using seqkit rmdup.
146 Defaut: false
147
148 --seqkit_rmdup_P : Only consider positive strand (i.e. 5')
149 when comparing by sequence content. Defaut:
150 false
151
152 --kraken2_db : Absolute path to kraken database. Default: /
153 hpc/db/kraken2/standard-210914
154
155 --kraken2_confidence : Confidence score threshold which must be
156 between 0 and 1. Default: 0.0
157
158 --kraken2_quick : Quick operation (use first hit or hits).
159 Default: false
160
161 --kraken2_use_mpa_style : Report output like Kraken 1's kraken-mpa-
162 report. Default: false
163
164 --kraken2_minimum_base_quality : Minimum base quality used in classification
165 which is only effective with FASTQ input.
166 Default: 0
167
168 --kraken2_report_zero_counts : Report counts for ALL taxa, even if counts
169 are zero. Default: false
170
171 --kraken2_report_minmizer_data : Report minimizer and distinct minimizer
172 count information in addition to normal
173 Kraken report. Default: false
174
175 --kraken2_use_names : Print scientific names instead of just
176 taxids. Default: true
177
178 --kraken2_extract_bug : Extract the reads or contigs beloging to
179 this bug. Default: Escherichia coli
180
181 --centrifuge_x : Absolute path to centrifuge database.
182 Default: /hpc/db/centrifuge/2022-04-12/ab
183
184 --centrifuge_save_unaligned : Save SINGLE-END reads that did not align.
185 For PAIRED-END reads, save read pairs that
186 did not align concordantly. Default: false
187
188 --centrifuge_save_aligned : Save SINGLE-END reads that aligned. For
189 PAIRED-END reads, save read pairs that
190 aligned concordantly. Default: false
191
192 --centrifuge_out_fmt_sam : Centrifuge output should be in SAM. Default:
193 false
194
195 --centrifuge_extract_bug : Extract this bug from centrifuge results.
196 Default: Escherichia coli
197
198 --centrifuge_ignore_quals : Treat all quality values as 30 on Phred
199 scale. Default: false
200
201 --megahit_run : Run MEGAHIT assembler. Default: true
202
203 --megahit_min_count : <int>. Minimum multiplicity for filtering (
204 k_min+1)-mers. Defaut: false
205
206 --megahit_k_list : Comma-separated list of kmer size. All
207 values must be odd, in the range 15-255,
208 increment should be <= 28. Ex: '21,29,39,59,
209 79,99,119,141'. Default: false
210
211 --megahit_no_mercy : Do not add mercy k-mers. Default: false
212
213 --megahit_bubble_level : <int>. Intensity of bubble merging (0-2), 0
214 to disable. Default: false
215
216 --megahit_merge_level : <l,s>. Merge complex bubbles of length <= l*
217 kmer_size and similarity >= s. Default:
218 false
219
220 --megahit_prune_level : <int>. Strength of low depth pruning (0-3).
221 Default: false
222
223 --megahit_prune_depth : <int>. Remove unitigs with avg k-mer depth
224 less than this value. Default: false
225
226 --megahit_low_local_ratio : <float>. Ratio threshold to define low
227 local coverage contigs. Default: false
228
229 --megahit_max_tip_len : <int>. remove tips less than this value [<
230 int> * k]. Default: false
231
232 --megahit_no_local : Disable local assembly. Default: false
233
234 --megahit_kmin_1pass : Use 1pass mode to build SdBG of k_min.
235 Default: false
236
237 --megahit_preset : <str>. Override a group of parameters.
238 Valid values are meta-sensitive which
239 enforces '--min-count 1 --k-list 21,29,39,
240 49,...,129,141', meta-large (large &
241 complex metagenomes, like soil) which
242 enforces '--k-min 27 --k-max 127 --k-step
243 10'. Default: meta-sensitive
244
245 --megahit_mem_flag : <int>. SdBG builder memory mode. 0: minimum;
246 1: moderate; 2: use all memory specified.
247 Default: 2
248
249 --megahit_min_contig_len : <int>. Minimum length of contigs to output.
250 Default: false
251
252 --spades_run : Run SPAdes assembler. Default: false
253
254 --spades_isolate : This flag is highly recommended for high-
255 coverage isolate and multi-cell data.
256 Defaut: false
257
258 --spades_sc : This flag is required for MDA (single-cell)
259 data. Default: false
260
261 --spades_meta : This flag is required for metagenomic data.
262 Default: true
263
264 --spades_bio : This flag is required for biosytheticSPAdes
265 mode. Default: false
266
267 --spades_corona : This flag is required for coronaSPAdes mode.
268 Default: false
269
270 --spades_rna : This flag is required for RNA-Seq data.
271 Default: false
272
273 --spades_plasmid : Runs plasmidSPAdes pipeline for plasmid
274 detection. Default: false
275
276 --spades_metaviral : Runs metaviralSPAdes pipeline for virus
277 detection. Default: false
278
279 --spades_metaplasmid : Runs metaplasmidSPAdes pipeline for plasmid
280 detection in metagenomics datasets. Default:
281 false
282
283 --spades_rnaviral : This flag enables virus assembly module
284 from RNA-Seq data. Default: false
285
286 --spades_iontorrent : This flag is required for IonTorrent data.
287 Default: false
288
289 --spades_only_assembler : Runs only the SPAdes assembler module (
290 without read error correction). Default:
291 false
292
293 --spades_careful : Tries to reduce the number of mismatches
294 and short indels in the assembly. Default:
295 false
296
297 --spades_cov_cutoff : Coverage cutoff value (a positive float
298 number). Default: false
299
300 --spades_k : List of k-mer sizes (must be odd and less
301 than 128). Default: false
302
303 --spades_hmm : Directory with custom hmms that replace the
304 default ones (very rare). Default: false
305
306 --serotypefinder_run : Run SerotypeFinder tool. Default: true
307
308 --serotypefinder_x : Generate extended output files. Default:
309 true
310
311 --serotypefinder_db : Path to SerotypeFinder databases. Default: /
312 hpc/db/serotypefinder/2.0.2
313
314 --serotypefinder_min_threshold : Minimum percent identity (in float)
315 required for calling a hit. Default: 0.85
316
317 --serotypefinder_min_cov : Minumum percent coverage (in float)
318 required for calling a hit. Default: 0.80
319
320 --seqsero2_run : Run SeqSero2 tool. Default: false
321
322 --seqsero2_t : '1' for interleaved paired-end reads, '2'
323 for separated paired-end reads, '3' for
324 single reads, '4' for genome assembly, '5'
325 for nanopore reads (fasta/fastq). Default:
326 4
327
328 --seqsero2_m : Which workflow to apply, 'a'(raw reads
329 allele micro-assembly), 'k'(raw reads and
330 genome assembly k-mer). Default: k
331
332 --seqsero2_c : SeqSero2 will only output serotype
333 prediction without the directory containing
334 log files. Default: false
335
336 --seqsero2_s : SeqSero2 will not output header in
337 SeqSero_result.tsv. Default: false
338
339 --mlst_run : Run MLST tool. Default: true
340
341 --mlst_minid : DNA %identity of full allelle to consider '
342 similar' [~]. Default: 95
343
344 --mlst_mincov : DNA %cov to report partial allele at all [?].
345 Default: 10
346
347 --mlst_minscore : Minumum score out of 100 to match a scheme.
348 Default: 50
349
350 --abricate_run : Run ABRicate tool. Default: true
351
352 --abricate_minid : Minimum DNA %identity. Defaut: 90
353
354 --abricate_mincov : Minimum DNA %coverage. Defaut: 80
355
356 --abricate_datadir : ABRicate databases folder. Defaut: /hpc/db/
357 abricate/1.0.1/db
358
359 Help options :
360
361 --help : Display this message.
362 ```
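
The `--fq_filename_delim` and `--fq_filename_delim_idx` options in the help above determine how a sample name is derived from a FASTQ file name. The sketch below illustrates one reading of that description; it is not the pipeline's own code. It assumes the first `idx` delimiter-separated fields of the file name are kept and re-joined with the same delimiter. The helper name `sample_name` and the example file name are hypothetical.

```bash
# Sketch of the naming rule described by --fq_filename_delim and
# --fq_filename_delim_idx. Assumption: the first <idx> delimiter-separated
# fields of the file name are kept and re-joined with the same delimiter.
# $1: FASTQ path, $2: delimiter (default: _), $3: index (default: 1)
sample_name() {
    basename "$1" | cut -d "${2:-_}" -f "1-${3:-1}"
}

sample_name /path/to/SampleA_S1_L001_R1_001.fastq.gz      # prints SampleA
sample_name /path/to/SampleA_S1_L001_R1_001.fastq.gz _ 2  # prints SampleA_S1
```

Under this reading, the defaults (`_` and `1`) give each sample the text before the first underscore, so files from different samples that share that prefix would need a larger index to stay distinct.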
363
364 ### **BETA**
365
366 ---
367 Development of the modular structure and flow is ongoing, and both may change depending on the assessment of various computational topics and other considerations.