annotate 0.4.2/readme/centriflaken.md @ 130:04f6ac8ca13c

planemo upload
author kkonganti
date Wed, 03 Jul 2024 15:16:39 -0400
# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken**

---
Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing *Escherichia coli*.

#### Workflow Usage

```bash
module load cpipes/0.4.0

cpipes --pipeline centriflaken [options]
```

Example: Run the default `centriflaken` pipeline with *E. coli* as the taxon of interest.

```bash
cd /hpc/scratch/$USER
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
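
The help text for `--input` notes that every file in the input directory is read, so the directory should contain only FASTQ files. A minimal pre-flight sketch of that check follows. The function name `check_fastq_dir` is hypothetical and not part of CPIPES; it assumes the `.fastq.gz` suffix (the documented `--fq_suffix` default) and looks only at regular, non-hidden files at the top level of the directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of cpipes.
# Succeeds only if DIR holds at least one file and every regular file
# in it ends with SUFFIX (default: .fastq.gz, the --fq_suffix default).
check_fastq_dir() {
    local dir="$1" suffix="${2:-.fastq.gz}" count=0 f
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        case "$f" in
            *"$suffix") count=$((count + 1)) ;;
            *) echo "Unexpected file: $f" >&2; return 1 ;;
        esac
    done
    [ "$count" -gt 0 ] || { echo "No *$suffix files in $dir" >&2; return 1; }
}

# Usage (paths are placeholders):
#   check_fastq_dir /path/to/fastq/dir && cpipes --pipeline centriflaken ...
```

Running the check before launching the pipeline surfaces a stray file early instead of partway through a run.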

Example: Run the `centriflaken` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.

```bash
cd /hpc/scratch/$USER
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
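
When the same reads need to be screened for more than one taxon, the two examples above can be generated from a single template. The sketch below only prints the commands (a dry run) and does not launch anything. The function name `print_centriflaken_cmd` and the per-taxon output sub-directory layout are assumptions made for illustration. Only flags shown in the examples above are used.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper, not part of cpipes: prints one command per taxon.
# The per-taxon output sub-directory naming is an illustration only.
print_centriflaken_cmd() {
    local taxon="$1" input="$2" outroot="$3"
    local tag="${taxon// /_}"   # "Escherichia coli" -> "Escherichia_coli"
    printf "cpipes --pipeline centriflaken --centrifuge_extract_bug '%s' --input %s --output %s/%s\n" \
        "$taxon" "$input" "$outroot" "$tag"
}

for taxon in 'Escherichia coli' 'Salmonella'; do
    print_centriflaken_cmd "$taxon" /path/to/fastq/dir /path/to/output
done
```

Review the printed commands before running them, since each taxon gets its own output directory.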

#### `centriflaken` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
N E X T F L O W  ~  version 21.12.1-edge
Launching `/nfs/software/apps/cpipes/0.4.0/cpipes` [crazy_euler] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name    : CPIPES
Author  : Kranti.Konganti@fda.hhs.gov
Version : 0.4.0
Center  : CFSAN, FDA.
================================================================================

Workflow                       : centriflaken

Author                         : Kranti.Konganti@fda.hhs.gov

Version                        : 0.2.1


Usage                          : cpipes --pipeline centriflaken [options]


Required                       :

--input                        : Absolute path to directory containing FASTQ
                                 files. The directory should contain only
                                 FASTQ files as all the files within the
                                 mentioned directory will be read.
                                 Ex: --input /path/to/fastq_pass

--output                       : Absolute path to directory where all the
                                 pipeline outputs should be stored.
                                 Ex: --output /path/to/output

Other options                  :

--metadata                     : Absolute path to metadata CSV file
                                 containing five mandatory columns: sample,
                                 fq1,fq2,strandedness,single_end. The fq1
                                 and fq2 columns contain absolute paths to
                                 the FASTQ files. This option can be used in
                                 place of --input option. This is rare.
                                 Ex: --metadata samplesheet.csv

--fq_suffix                    : The suffix of FASTQ files (Unpaired reads
                                 or R1 reads or Long reads) if an input
                                 directory is mentioned via --input option.
                                 Default: .fastq.gz

--fq2_suffix                   : The suffix of FASTQ files (Paired-end reads
                                 or R2 reads) if an input directory is
                                 mentioned via --input option.
                                 Default: false

--fq_filter_by_len             : Remove FASTQ reads that are less than this
                                 many bases. Default: 4000

--fq_strandedness              : The strandedness of the sequencing run.
                                 This is mostly needed if your sequencing
                                 run is RNA-SEQ. For most of the other runs,
                                 it is probably safe to use unstranded for
                                 the option. Default: unstranded

--fq_single_end                : SINGLE-END information will be
                                 auto-detected but this option forces
                                 PAIRED-END FASTQ files to be treated as
                                 SINGLE-END so only read 1 information is
                                 included in auto-generated samplesheet.
                                 Default: false

--fq_filename_delim            : Delimiter by which the file name is split
                                 to obtain sample name. Default: _

--fq_filename_delim_idx        : After splitting FASTQ file name by using
                                 the --fq_filename_delim option, all
                                 elements before this index (1-based) will
                                 be joined to create final sample name.
                                 Default: 1

--kraken2_db                   : Absolute path to kraken database.
                                 Default: /hpc/db/kraken2/standard-210914

--kraken2_confidence           : Confidence score threshold which must be
                                 between 0 and 1. Default: 0.0

--kraken2_quick                : Quick operation (use first hit or hits).
                                 Default: false

--kraken2_use_mpa_style        : Report output like Kraken 1's
                                 kraken-mpa-report. Default: false

--kraken2_minimum_base_quality : Minimum base quality used in classification
                                 which is only effective with FASTQ input.
                                 Default: 0

--kraken2_report_zero_counts   : Report counts for ALL taxa, even if counts
                                 are zero. Default: false

--kraken2_report_minmizer_data : Report minimizer and distinct minimizer
                                 count information in addition to normal
                                 Kraken report. Default: false

--kraken2_use_names            : Print scientific names instead of just
                                 taxids. Default: true

--kraken2_extract_bug          : Extract the reads or contigs belonging to
                                 this bug. Default: Escherichia coli

--centrifuge_x                 : Absolute path to centrifuge database.
                                 Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned    : Save SINGLE-END reads that did not align.
                                 For PAIRED-END reads, save read pairs that
                                 did not align concordantly. Default: false

--centrifuge_save_aligned      : Save SINGLE-END reads that aligned. For
                                 PAIRED-END reads, save read pairs that
                                 aligned concordantly. Default: false

--centrifuge_out_fmt_sam       : Centrifuge output should be in SAM.
                                 Default: false

--centrifuge_extract_bug       : Extract this bug from centrifuge results.
                                 Default: Escherichia coli

--centrifuge_ignore_quals      : Treat all quality values as 30 on Phred
                                 scale. Default: false

--flye_pacbio_raw              : Input FASTQ reads are PacBio regular CLR
                                 reads (<20% error). Default: false

--flye_pacbio_corr             : Input FASTQ reads are PacBio reads that
                                 were corrected with other methods (<3%
                                 error). Default: false

--flye_pacbio_hifi             : Input FASTQ reads are PacBio HiFi reads
                                 (<1% error). Default: false

--flye_nano_raw                : Input FASTQ reads are ONT regular reads,
                                 pre-Guppy5 (<20% error). Default: true

--flye_nano_corr               : Input FASTQ reads are ONT reads that were
                                 corrected with other methods (<3% error).
                                 Default: false

--flye_nano_hq                 : Input FASTQ reads are ONT high-quality
                                 reads: Guppy5+ SUP or Q20 (<5% error).
                                 Default: false

--flye_genome_size             : Estimated genome size (for example, 5m or
                                 2.6g). Default: 5.5m

--flye_polish_iter             : Number of genome polishing iterations.
                                 Default: false

--flye_meta                    : Do a metagenome assembly (uneven coverage
                                 mode). Default: true

--flye_min_overlap             : Minimum overlap between reads.
                                 Default: false

--flye_scaffold                : Enable scaffolding using assembly graph.
                                 Default: false

--serotypefinder_run           : Run SerotypeFinder tool. Default: true

--serotypefinder_x             : Generate extended output files.
                                 Default: true

--serotypefinder_db            : Path to SerotypeFinder databases.
                                 Default: /hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold : Minimum percent identity (in float)
                                 required for calling a hit. Default: 0.85

--serotypefinder_min_cov       : Minimum percent coverage (in float)
                                 required for calling a hit. Default: 0.80

--seqsero2_run                 : Run SeqSero2 tool. Default: false

--seqsero2_t                   : '1' for interleaved paired-end reads, '2'
                                 for separated paired-end reads, '3' for
                                 single reads, '4' for genome assembly, '5'
                                 for nanopore reads (fasta/fastq).
                                 Default: 4

--seqsero2_m                   : Which workflow to apply, 'a'(raw reads
                                 allele micro-assembly), 'k'(raw reads and
                                 genome assembly k-mer). Default: k

--seqsero2_c                   : SeqSero2 will only output serotype
                                 prediction without the directory containing
                                 log files. Default: false

--seqsero2_s                   : SeqSero2 will not output header in
                                 SeqSero_result.tsv. Default: false

--mlst_run                     : Run MLST tool. Default: true

--mlst_minid                   : DNA %identity of full allele to consider
                                 'similar' [~]. Default: 95

--mlst_mincov                  : DNA %cov to report partial allele at all [?].
                                 Default: 10

--mlst_minscore                : Minimum score out of 100 to match a scheme.
                                 Default: 50

--abricate_run                 : Run ABRicate tool. Default: true

--abricate_minid               : Minimum DNA %identity. Default: 90

--abricate_mincov              : Minimum DNA %coverage. Default: 80

--abricate_datadir             : ABRicate databases folder.
                                 Default: /hpc/db/abricate/1.0.1/db

Help options                   :

--help                         : Display this message.
```
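
The `--metadata`, `--fq_filename_delim` and `--fq_filename_delim_idx` descriptions above can be illustrated with a small sketch that derives a sample name from a FASTQ file name and writes a samplesheet with the five mandatory columns. The function names `sample_name` and `make_samplesheet` are hypothetical and not part of CPIPES. Two assumptions are worth flagging: that "all elements before this index (1-based)" includes the element at the index itself (so the default of 1 keeps the first field), and that a long-read samplesheet leaves `fq2` empty and writes `single_end` as `true`. Check both against the pipeline's own auto-generated samplesheet before relying on them.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of cpipes.

# Join the first IDX delimiter-separated fields of a FASTQ file name.
# Assumption: the element at IDX (1-based) is included, so IDX=1 keeps
# only the first field. A name without the delimiter is returned whole.
sample_name() {
    local file delim="${2:-_}" idx="${3:-1}"
    file="$(basename "$1")"
    printf '%s\n' "$file" | cut -d "$delim" -f "1-$idx"
}

# Print a samplesheet for single-end (long-read) FASTQ files in DIR.
# Assumptions: fq2 is left empty and single_end is written as "true".
# Pass an absolute DIR so that the fq1 column holds absolute paths.
make_samplesheet() {
    local dir="$1" suffix="${2:-.fastq.gz}" f
    echo "sample,fq1,fq2,strandedness,single_end"
    for f in "$dir"/*"$suffix"; do
        [ -f "$f" ] || continue   # skip an unmatched glob
        printf '%s,%s,,unstranded,true\n' "$(sample_name "$f")" "$f"
    done
}

# Usage (paths are placeholders):
#   make_samplesheet /path/to/fastq_pass > samplesheet.csv
#   cpipes --pipeline centriflaken --metadata "$PWD/samplesheet.csv" --output /path/to/output
```

For a file named `SAMPLE1_S1_L001.fastq.gz`, this sketch yields `SAMPLE1` with the default index and `SAMPLE1_S1` with an index of 2.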

### **BETA**

---
The modular structure and flow are still under development and may change depending on the assessment of various computational topics and other considerations.