# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken_hy**

---
`centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.

#### Workflow Usage

```bash
module load cpipes/0.3.0

cpipes --pipeline centriflaken_hy [options]
```

Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
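
The `--help` output later in this README states that `--input` must be an absolute path to a directory that contains only FASTQ files. A quick check before launching can catch a bad path early. The following is a minimal sketch and not part of CPIPES: the function name `check_fastq_dir` is hypothetical, and the `.fastq.gz` ending is an assumption taken from the default `--fq_suffix` value `_R1_001.fastq.gz`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CPIPES): check that a directory is given
# as an absolute path and holds only entries ending in the assumed suffix.
check_fastq_dir() {
    local dir="$1"
    local suffix="${2:-.fastq.gz}"   # assumption based on the default --fq_suffix

    # --input is documented as an absolute path.
    case "$dir" in
        /*) ;;
        *) echo "not an absolute path: $dir" >&2; return 1 ;;
    esac

    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }

    # Collect up to five top-level entries that do not end in the suffix.
    local stray
    stray=$(find "$dir" -mindepth 1 -maxdepth 1 ! -name "*${suffix}" | head -n 5)
    if [ -n "$stray" ]; then
        echo "unexpected entries in $dir:" >&2
        echo "$stray" >&2
        return 1
    fi

    echo "ok: $dir"
}
```

It could be chained in front of a run, for example `check_fastq_dir /path/to/illumina/fastq/dir && cpipes --pipeline centriflaken_hy ...`, so that the pipeline only starts when the directory looks as expected.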

Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
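
When `--input` points at a directory, the help text says sample names are obtained by splitting each FASTQ file name on `--fq_filename_delim` (default `_`) and joining the elements selected by `--fq_filename_delim_idx` (default `1`). The sketch below illustrates one reading of that rule, namely that the first `idx` elements are kept. That interpretation is an assumption, and `derive_sample_name` is a hypothetical helper rather than a CPIPES command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate how a sample name might be derived from a
# FASTQ file name. Assumes the first <idx> delimiter-separated elements are
# kept and rejoined with the same delimiter.
derive_sample_name() {
    local file="$1"
    local delim="${2:-_}"   # mirrors the --fq_filename_delim default
    local idx="${3:-1}"     # mirrors the --fq_filename_delim_idx default
    basename "$file" | cut -d "$delim" -f "1-${idx}"
}

# Usage (file name is a placeholder):
#   derive_sample_name /data/SampleA_S1_L001_R1_001.fastq.gz       # SampleA
#   derive_sample_name /data/SampleA_S1_L001_R1_001.fastq.gz _ 2   # SampleA_S1
```

If sample identifiers themselves contain the delimiter, a larger index would be needed under this reading to keep the full identifier.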

#### `centriflaken_hy` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
N E X T F L O W  ~  version 21.12.1-edge
Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [condescending_jang] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : CPIPES
Author                          : Kranti.Konganti@fda.hhs.gov
Version                         : 0.3.0
Center                          : CFSAN, FDA.
================================================================================

Workflow                        : centriflaken_hy

Author                          : Kranti.Konganti@fda.hhs.gov

Version                         : 0.3.0


Usage                           : cpipes --pipeline centriflaken_hy [options]


Required                        :

--input                         : Absolute path to directory containing FASTQ
                                  files. The directory should contain only
                                  FASTQ files as all the files within the
                                  mentioned directory will be read. Ex: --
                                  input /path/to/fastq_pass

--output                        : Absolute path to directory where all the
                                  pipeline outputs should be stored. Ex: --
                                  output /path/to/output

Other options                   :

--metadata                      : Absolute path to metadata CSV file
                                  containing five mandatory columns: sample,
                                  fq1,fq2,strandedness,single_end. The fq1
                                  and fq2 columns contain absolute paths to
                                  the FASTQ files. This option can be used in
                                  place of --input option. This is rare. Ex: --
                                  metadata samplesheet.csv

--fq_suffix                     : The suffix of FASTQ files (Unpaired reads
                                  or R1 reads or Long reads) if an input
                                  directory is mentioned via --input option.
                                  Default: _R1_001.fastq.gz

--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads
                                  or R2 reads) if an input directory is
                                  mentioned via --input option. Default:
                                  _R2_001.fastq.gz

--fq_filter_by_len              : Remove FASTQ reads that are less than this
                                  many bases. Default: 75

--fq_strandedness               : The strandedness of the sequencing run.
                                  This is mostly needed if your sequencing
                                  run is RNA-SEQ. For most of the other runs,
                                  it is probably safe to use unstranded for
                                  the option. Default: unstranded

--fq_single_end                 : SINGLE-END information will be auto-
                                  detected but this option forces PAIRED-END
                                  FASTQ files to be treated as SINGLE-END so
                                  only read 1 information is included in auto-
                                  generated samplesheet. Default: false

--fq_filename_delim             : Delimiter by which the file name is split
                                  to obtain sample name. Default: _

--fq_filename_delim_idx         : After splitting FASTQ file name by using
                                  the --fq_filename_delim option, all
                                  elements before this index (1-based) will
                                  be joined to create final sample name.
                                  Default: 1

--kraken2_db                    : Absolute path to kraken database. Default: /
                                  hpc/db/kraken2/standard-210914

--kraken2_confidence            : Confidence score threshold which must be
                                  between 0 and 1. Default: 0.0

--kraken2_quick                 : Quick operation (use first hit or hits).
                                  Default: false

--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
                                  report. Default: false

--kraken2_minimum_base_quality  : Minimum base quality used in classification
                                  which is only effective with FASTQ input.
                                  Default: 0

--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts
                                  are zero. Default: false

--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer
                                  count information in addition to normal
                                  Kraken report. Default: false

--kraken2_use_names             : Print scientific names instead of just
                                  taxids. Default: true

--kraken2_extract_bug           : Extract the reads or contigs beloging to
                                  this bug. Default: Escherichia coli

--centrifuge_x                  : Absolute path to centrifuge database.
                                  Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align.
                                  For PAIRED-END reads, save read pairs that
                                  did not align concordantly. Default: false

--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For
                                  PAIRED-END reads, save read pairs that
                                  aligned concordantly. Default: false

--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default:
                                  false

--centrifuge_extract_bug        : Extract this bug from centrifuge results.
                                  Default: Escherichia coli

--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred
                                  scale. Default: false

--megahit_run                   : Run MEGAHIT assembler. Default: true

--megahit_min_count             : <int>. Minimum multiplicity for filtering (
                                  k_min+1)-mers. Defaut: false

--megahit_k_list                : Comma-separated list of kmer size. All
                                  values must be odd, in the range 15-255,
                                  increment should be <= 28. Ex: '21,29,39,59,
                                  79,99,119,141'. Default: false

--megahit_no_mercy              : Do not add mercy k-mers. Default: false

--megahit_bubble_level          : <int>. Intensity of bubble merging (0-2), 0
                                  to disable. Default: false

--megahit_merge_level           : <l,s>. Merge complex bubbles of length <= l*
                                  kmer_size and similarity >= s. Default:
                                  false

--megahit_prune_level           : <int>. Strength of low depth pruning (0-3).
                                  Default: false

--megahit_prune_depth           : <int>. Remove unitigs with avg k-mer depth
                                  less than this value. Default: false

--megahit_low_local_ratio       : <float>. Ratio threshold to define low
                                  local coverage contigs. Default: false

--megahit_max_tip_len           : <int>. remove tips less than this value [<
                                  int> * k]. Default: false

--megahit_no_local              : Disable local assembly. Default: false

--megahit_kmin_1pass            : Use 1pass mode to build SdBG of k_min.
                                  Default: false

--megahit_preset                : <str>. Override a group of parameters.
                                  Valid values are meta-sensitive which
                                  enforces '--min-count 1 --k-list 21,29,39,
                                  49,...,129,141', meta-large (large &
                                  complex metagenomes, like soil) which
                                  enforces '--k-min 27 --k-max 127 --k-step
                                  10'. Default: meta-sensitive

--megahit_mem_flag              : <int>. SdBG builder memory mode. 0: minimum;
                                  1: moderate; 2: use all memory specified.
                                  Default: 2

--megahit_min_contig_len        : <int>. Minimum length of contigs to output.
                                  Default: false

--spades_run                    : Run SPAdes assembler. Default: false

--spades_isolate                : This flag is highly recommended for high-
                                  coverage isolate and multi-cell data.
                                  Defaut: false

--spades_sc                     : This flag is required for MDA (single-cell)
                                  data. Default: false

--spades_meta                   : This flag is required for metagenomic data.
                                  Default: true

--spades_bio                    : This flag is required for biosytheticSPAdes
                                  mode. Default: false

--spades_corona                 : This flag is required for coronaSPAdes mode.
                                  Default: false

--spades_rna                    : This flag is required for RNA-Seq data.
                                  Default: false

--spades_plasmid                : Runs plasmidSPAdes pipeline for plasmid
                                  detection. Default: false

--spades_metaviral              : Runs metaviralSPAdes pipeline for virus
                                  detection. Default: false

--spades_metaplasmid            : Runs metaplasmidSPAdes pipeline for plasmid
                                  detection in metagenomics datasets. Default:
                                  false

--spades_rnaviral               : This flag enables virus assembly module
                                  from RNA-Seq data. Default: false

--spades_iontorrent             : This flag is required for IonTorrent data.
                                  Default: false

--spades_only_assembler         : Runs only the SPAdes assembler module (
                                  without read error correction). Default:
                                  false

--spades_careful                : Tries to reduce the number of mismatches
                                  and short indels in the assembly. Default:
                                  false

--spades_cov_cutoff             : Coverage cutoff value (a positive float
                                  number). Default: false

--spades_k                      : List of k-mer sizes (must be odd and less
                                  than 128). Default: false

--spades_hmm                    : Directory with custom hmms that replace the
                                  default ones (very rare). Default: false

--serotypefinder_run            : Run SerotypeFinder tool. Default: true

--serotypefinder_x              : Generate extended output files. Default:
                                  true

--serotypefinder_db             : Path to SerotypeFinder databases. Default: /
                                  hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold  : Minimum percent identity (in float)
                                  required for calling a hit. Default: 0.85

--serotypefinder_min_cov        : Minumum percent coverage (in float)
                                  required for calling a hit. Default: 0.80

--seqsero2_run                  : Run SeqSero2 tool. Default: false

--seqsero2_t                    : '1' for interleaved paired-end reads, '2'
                                  for separated paired-end reads, '3' for
                                  single reads, '4' for genome assembly, '5'
                                  for nanopore reads (fasta/fastq). Default:
                                  4

--seqsero2_m                    : Which workflow to apply, 'a'(raw reads
                                  allele micro-assembly), 'k'(raw reads and
                                  genome assembly k-mer). Default: k

--seqsero2_c                    : SeqSero2 will only output serotype
                                  prediction without the directory containing
                                  log files. Default: false

--seqsero2_s                    : SeqSero2 will not output header in
                                  SeqSero_result.tsv. Default: false

--mlst_run                      : Run MLST tool. Default: true

--mlst_minid                    : DNA %identity of full allelle to consider '
                                  similar' [~]. Default: 95

--mlst_mincov                   : DNA %cov to report partial allele at all [?].
                                  Default: 10

--mlst_minscore                 : Minumum score out of 100 to match a scheme.
                                  Default: 50

--abricate_run                  : Run ABRicate tool. Default: true

--abricate_minid                : Minimum DNA %identity. Defaut: 90

--abricate_mincov               : Minimum DNA %coverage. Defaut: 80

--abricate_datadir              : ABRicate databases folder. Defaut: /hpc/db/
                                  abricate/1.0.1/db

Help options                    :

--help                          : Display this message.
```
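
The `--metadata` entry above lists five mandatory CSV columns: `sample`, `fq1`, `fq2`, `strandedness` and `single_end`. As a rough illustration of what such a samplesheet might look like, the sketch below prints a header and one paired-end row. The helper names are hypothetical, and several details are assumptions rather than documented behavior: that a header row is expected, that `unstranded` (the documented `--fq_strandedness` default) is a valid `strandedness` value, and that `single_end` is written as `true` or `false`.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and not part of CPIPES.

# Print the five column names listed in the --metadata help entry.
samplesheet_header() {
    echo "sample,fq1,fq2,strandedness,single_end"
}

# Print one row for a paired-end sample.
# Assumptions: "unstranded" and "false" are accepted values in the CSV.
samplesheet_row_pe() {
    local sample="$1" fq1="$2" fq2="$3"
    echo "${sample},${fq1},${fq2},unstranded,false"
}

# Usage (sample name and paths are placeholders):
#   {
#     samplesheet_header
#     samplesheet_row_pe sampleA /path/to/sampleA_R1_001.fastq.gz /path/to/sampleA_R2_001.fastq.gz
#   } > samplesheet.csv
```

According to the help text, the resulting file can be passed with `--metadata` in place of `--input`, and the `fq1` and `fq2` columns should hold absolute paths.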

### **BETA**

---
Development of the modular structure and flow is ongoing, and both may change depending on the assessment of various computational topics and other considerations.