comparison 0.4.2/readme/centriflaken.md @ 105:52045ea4679d

"planemo upload"
author kkonganti
date Thu, 27 Jun 2024 14:17:26 -0400
104:17890124001d 105:52045ea4679d
1 # CPIPES (CFSAN PIPELINES)
2
3 ## The modular pipeline repository at CFSAN, FDA
4
5 **CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
6 mostly for bioinformatics data analysis at **CFSAN, FDA.**
7
8 ---
9
10 ### **centriflaken**
11
12 ---
13 Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing Escherichia coli.
14
15 #### Workflow Usage
16
17 ```bash
18 module load cpipes/0.4.0
19
20 cpipes --pipeline centriflaken [options]
21 ```
22
23 Example: Run the default `centriflaken` pipeline with *E. coli* as the taxon of interest.
24
25 ```bash
26 cd /hpc/scratch/$USER
27 mkdir nf-cpipes
28 cd nf-cpipes
29 cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
30 ```
31
32 Example: Run the `centriflaken` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.
33
34 ```bash
35 cd /hpc/scratch/$USER
36 mkdir nf-cpipes
37 cd nf-cpipes
38 cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
39 ```
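
The `--input` directory is expected to contain only FASTQ files, because every file in it is read. A pre-flight check along the following lines can catch stray files before a run is launched. This is a minimal sketch, not part of CPIPES: the function name `check_fastq_dir` is hypothetical, and the default suffix simply mirrors the documented `--fq_suffix` default of `.fastq.gz`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CPIPES): report files in an input
# directory that do not end with the expected FASTQ suffix.
check_fastq_dir() {
    local dir="$1"
    local suffix="${2:-.fastq.gz}"   # assumption: same default as --fq_suffix
    local bad=0
    local f
    for f in "$dir"/*; do
        [ -e "$f" ] || continue      # glob matched nothing (empty directory)
        case "$f" in
            *"$suffix") ;;           # looks like an expected FASTQ file
            *) echo "unexpected file: $f" >&2; bad=1 ;;
        esac
    done
    # Returns 0 when every file has the suffix, 1 otherwise.
    return "$bad"
}
```

One possible use is to gate the pipeline call on the check, for example `check_fastq_dir /path/to/fastq/dir && cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output`.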
40
41 #### `centriflaken` Help
42
43 ```text
44 [Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
45 N E X T F L O W ~ version 21.12.1-edge
46 Launching `/nfs/software/apps/cpipes/0.4.0/cpipes` [crazy_euler] - revision: 72db279311
47 ================================================================================
48 (o)
49 ___ _ __ _ _ __ ___ ___
50 / __|| '_ \ | || '_ \ / _ \/ __|
51 | (__ | |_) || || |_) || __/\__ \
52 \___|| .__/ |_|| .__/ \___||___/
53 | | | |
54 |_| |_|
55 --------------------------------------------------------------------------------
56 A collection of modular pipelines at CFSAN, FDA.
57 --------------------------------------------------------------------------------
58 Name : CPIPES
59 Author : Kranti.Konganti@fda.hhs.gov
60 Version : 0.4.0
61 Center : CFSAN, FDA.
62 ================================================================================
63
64 Workflow : centriflaken
65
66 Author : Kranti.Konganti@fda.hhs.gov
67
68 Version : 0.2.1
69
70
71 Usage : cpipes --pipeline centriflaken [options]
72
73
74 Required :
75
76 --input : Absolute path to directory containing FASTQ
77 files. The directory should contain only
78 FASTQ files as all the files within the
79 mentioned directory will be read. Ex: --
80 input /path/to/fastq_pass
81
82 --output : Absolute path to directory where all the
83 pipeline outputs should be stored. Ex: --
84 output /path/to/output
85
86 Other options :
87
88 --metadata : Absolute path to metadata CSV file
89 containing five mandatory columns: sample,
90 fq1,fq2,strandedness,single_end. The fq1
91 and fq2 columns contain absolute paths to
92 the FASTQ files. This option can be used in
93 place of --input option. This is rare. Ex: --
94 metadata samplesheet.csv
95
96 --fq_suffix : The suffix of FASTQ files (Unpaired reads
97 or R1 reads or Long reads) if an input
98 directory is mentioned via --input option.
99 Default: .fastq.gz
100
101 --fq2_suffix : The suffix of FASTQ files (Paired-end reads
102 or R2 reads) if an input directory is
103 mentioned via --input option. Default:
104 false
105
106 --fq_filter_by_len : Remove FASTQ reads that are less than this
107 many bases. Default: 4000
108
109 --fq_strandedness : The strandedness of the sequencing run.
110 This is mostly needed if your sequencing
111 run is RNA-SEQ. For most of the other runs,
112 it is probably safe to use unstranded for
113 the option. Default: unstranded
114
115 --fq_single_end : SINGLE-END information will be auto-
116 detected but this option forces PAIRED-END
117 FASTQ files to be treated as SINGLE-END so
118 only read 1 information is included in auto-
119 generated samplesheet. Default: false
120
121 --fq_filename_delim : Delimiter by which the file name is split
122 to obtain sample name. Default: _
123
124 --fq_filename_delim_idx : After splitting FASTQ file name by using
125 the --fq_filename_delim option, all
126 elements before this index (1-based) will
127 be joined to create final sample name.
128 Default: 1
129
130 --kraken2_db : Absolute path to kraken database. Default: /
131 hpc/db/kraken2/standard-210914
132
133 --kraken2_confidence : Confidence score threshold which must be
134 between 0 and 1. Default: 0.0
135
136 --kraken2_quick : Quick operation (use first hit or hits).
137 Default: false
138
139 --kraken2_use_mpa_style : Report output like Kraken 1's kraken-mpa-
140 report. Default: false
141
142 --kraken2_minimum_base_quality : Minimum base quality used in classification
143 which is only effective with FASTQ input.
144 Default: 0
145
146 --kraken2_report_zero_counts : Report counts for ALL taxa, even if counts
147 are zero. Default: false
148
149 --kraken2_report_minmizer_data : Report minimizer and distinct minimizer
150 count information in addition to normal
151 Kraken report. Default: false
152
153 --kraken2_use_names : Print scientific names instead of just
154 taxids. Default: true
155
156 --kraken2_extract_bug : Extract the reads or contigs belonging to
157 this bug. Default: Escherichia coli
158
159 --centrifuge_x : Absolute path to centrifuge database.
160 Default: /hpc/db/centrifuge/2022-04-12/ab
161
162 --centrifuge_save_unaligned : Save SINGLE-END reads that did not align.
163 For PAIRED-END reads, save read pairs that
164 did not align concordantly. Default: false
165
166 --centrifuge_save_aligned : Save SINGLE-END reads that aligned. For
167 PAIRED-END reads, save read pairs that
168 aligned concordantly. Default: false
169
170 --centrifuge_out_fmt_sam : Centrifuge output should be in SAM. Default:
171 false
172
173 --centrifuge_extract_bug : Extract this bug from centrifuge results.
174 Default: Escherichia coli
175
176 --centrifuge_ignore_quals : Treat all quality values as 30 on Phred
177 scale. Default: false
178
179 --flye_pacbio_raw : Input FASTQ reads are PacBio regular CLR
180 reads (<20% error). Default: false
181
182 --flye_pacbio_corr : Input FASTQ reads are PacBio reads that
183 were corrected with other methods (<3%
184 error). Default: false
185
186 --flye_pacbio_hifi : Input FASTQ reads are PacBio HiFi reads (<1%
187 error). Default: false
188
189 --flye_nano_raw : Input FASTQ reads are ONT regular reads,
190 pre-Guppy5 (<20% error). Default: true
191
192 --flye_nano_corr : Input FASTQ reads are ONT reads that were
193 corrected with other methods (<3% error).
194 Default: false
195
196 --flye_nano_hq : Input FASTQ reads are ONT high-quality
197 reads: Guppy5+ SUP or Q20 (<5% error).
198 Default: false
199
200 --flye_genome_size : Estimated genome size (for example, 5m or 2.
201 6g). Default: 5.5m
202
203 --flye_polish_iter : Number of genome polishing iterations.
204 Default: false
205
206 --flye_meta : Do a metagenome assembly (uneven coverage
207 mode). Default: true
208
209 --flye_min_overlap : Minimum overlap between reads. Default:
210 false
211
212 --flye_scaffold : Enable scaffolding using assembly graph.
213 Default: false
214
215 --serotypefinder_run : Run SerotypeFinder tool. Default: true
216
217 --serotypefinder_x : Generate extended output files. Default:
218 true
219
220 --serotypefinder_db : Path to SerotypeFinder databases. Default: /
221 hpc/db/serotypefinder/2.0.2
222
223 --serotypefinder_min_threshold : Minimum percent identity (in float)
224 required for calling a hit. Default: 0.85
225
226 --serotypefinder_min_cov : Minimum percent coverage (in float)
227 required for calling a hit. Default: 0.80
228
229 --seqsero2_run : Run SeqSero2 tool. Default: false
230
231 --seqsero2_t : '1' for interleaved paired-end reads, '2'
232 for separated paired-end reads, '3' for
233 single reads, '4' for genome assembly, '5'
234 for nanopore reads (fasta/fastq). Default:
235 4
236
237 --seqsero2_m : Which workflow to apply, 'a'(raw reads
238 allele micro-assembly), 'k'(raw reads and
239 genome assembly k-mer). Default: k
240
241 --seqsero2_c : SeqSero2 will only output serotype
242 prediction without the directory containing
243 log files. Default: false
244
245 --seqsero2_s : SeqSero2 will not output header in
246 SeqSero_result.tsv. Default: false
247
248 --mlst_run : Run MLST tool. Default: true
249
250 --mlst_minid : DNA %identity of full allele to consider '
251 similar' [~]. Default: 95
252
253 --mlst_mincov : DNA %cov to report partial allele at all [?].
254 Default: 10
255
256 --mlst_minscore : Minimum score out of 100 to match a scheme.
257 Default: 50
258
259 --abricate_run : Run ABRicate tool. Default: true
260
261 --abricate_minid : Minimum DNA %identity. Default: 90
262
263 --abricate_mincov : Minimum DNA %coverage. Default: 80
264
265 --abricate_datadir : ABRicate databases folder. Default: /hpc/db/
266 abricate/1.0.1/db
267
268 Help options :
269
270 --help : Display this message.
271 ```
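
The help text above says that `--metadata` takes a CSV with five mandatory columns (`sample,fq1,fq2,strandedness,single_end`) and that sample names are derived by splitting the file name on `--fq_filename_delim` and joining the leading elements. The sketch below shows one way such a samplesheet could be assembled by hand for single-end long reads. It is illustrative only and is not the pipeline's own code. The function names are hypothetical. The index is assumed to be inclusive (elements 1 through `idx`). The empty `fq2` field and the literal values `unstranded` and `true` are assumptions about what the pipeline accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of CPIPES).

# Derive a sample name from a file base name.
# Assumption: elements 1..idx (inclusive) are joined with the delimiter.
sample_name() {
    local base="$1" delim="${2:-_}" idx="${3:-1}"
    # cut keeps fields 1..idx and re-joins them with the same delimiter;
    # a name without the delimiter is passed through unchanged.
    printf '%s\n' "$base" | cut -d "$delim" -f "1-${idx}"
}

# Print a samplesheet for every file in a directory ending with the suffix.
# Pass an absolute directory so that the fq1 column holds absolute paths.
make_samplesheet() {
    local dir="$1" suffix="${2:-.fastq.gz}"
    local f base
    echo "sample,fq1,fq2,strandedness,single_end"
    for f in "$dir"/*"$suffix"; do
        [ -e "$f" ] || continue      # no matching files
        base="$(basename "$f" "$suffix")"
        # Assumed values: empty fq2, unstranded, single_end=true.
        echo "$(sample_name "$base"),${f},,unstranded,true"
    done
}
```

For example, `make_samplesheet /path/to/fastq/dir > samplesheet.csv` writes a file that could then be checked by eye before being passed via `--metadata samplesheet.csv`.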
272
273 ### **BETA**
274
275 ---
276 The modular structure and flow are still under development and may change as computational and other considerations are assessed.