annotate 0.4.0/readme/centriflaken_hy.md @ 101:ce6d9548fe89

"planemo upload"
author kkonganti
date Thu, 04 Aug 2022 10:45:55 -0400
# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken_hy**

---
`centriflaken_hy` is a variant of the original `centriflaken` pipeline that takes Illumina short reads, either single-end or paired-end.

#### Workflow Usage

```bash
module load cpipes/0.4.0

cpipes --pipeline centriflaken_hy [options]
```

Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
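
The `--input` description in the help text states that every file in the input directory is read, so the directory should contain only FASTQ files. A minimal pre-flight sketch for checking this before launching a run is shown here. The function name `non_fastq_files` is hypothetical and not part of CPIPES; the two default suffixes are the documented defaults of `--fq_suffix` and `--fq2_suffix`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cpipes): print every file in an input
# directory whose name does not end with the expected R1 or R2 suffix.
non_fastq_files() {
    local dir="$1"
    local r1="${2:-_R1_001.fastq.gz}"   # documented default of --fq_suffix
    local r2="${3:-_R2_001.fastq.gz}"   # documented default of --fq2_suffix
    local f
    for f in "$dir"/*; do
        [ -e "$f" ] || continue          # empty directory: the glob matched nothing
        case "$f" in
            *"$r1"|*"$r2") ;;            # expected FASTQ file, nothing to report
            *) printf '%s\n' "$f" ;;     # any other file would also be read
        esac
    done
}
```

Running `non_fastq_files /path/to/illumina/fastq/dir` prints nothing when the directory holds only files with the expected suffixes. Any path it prints is a candidate to move elsewhere before running the pipeline.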

#### `centriflaken_hy` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
N E X T F L O W ~ version 21.12.1-edge
Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311
================================================================================
(o)
___ _ __ _ _ __ ___ ___
/ __|| '_ \ | || '_ \ / _ \/ __|
| (__ | |_) || || |_) || __/\__ \
\___|| .__/ |_|| .__/ \___||___/
| | | |
|_| |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name : CPIPES
Author : Kranti.Konganti@fda.hhs.gov
Version : 0.4.0
Center : CFSAN, FDA.
================================================================================

Workflow : centriflaken_hy

Author : Kranti.Konganti@fda.hhs.gov

Version : 0.4.0


Usage : cpipes --pipeline centriflaken_hy [options]


Required :

--input : Absolute path to directory containing FASTQ
files. The directory should contain only
FASTQ files as all the files within the
mentioned directory will be read. Ex: --
input /path/to/fastq_pass

--output : Absolute path to directory where all the
pipeline outputs should be stored. Ex: --
output /path/to/output

Other options :

--metadata : Absolute path to metadata CSV file
containing five mandatory columns: sample,
fq1,fq2,strandedness,single_end. The fq1
and fq2 columns contain absolute paths to
the FASTQ files. This option can be used in
place of --input option. This is rare. Ex: --
metadata samplesheet.csv

--fq_suffix : The suffix of FASTQ files (Unpaired reads
or R1 reads or Long reads) if an input
directory is mentioned via --input option.
Default: _R1_001.fastq.gz

--fq2_suffix : The suffix of FASTQ files (Paired-end reads
or R2 reads) if an input directory is
mentioned via --input option. Default:
_R2_001.fastq.gz

--fq_filter_by_len : Remove FASTQ reads that are less than this
many bases. Default: 75

--fq_strandedness : The strandedness of the sequencing run.
This is mostly needed if your sequencing
run is RNA-SEQ. For most of the other runs,
it is probably safe to use unstranded for
the option. Default: unstranded

--fq_single_end : SINGLE-END information will be auto-
detected but this option forces PAIRED-END
FASTQ files to be treated as SINGLE-END so
only read 1 information is included in auto-
generated samplesheet. Default: false

--fq_filename_delim : Delimiter by which the file name is split
to obtain sample name. Default: _

--fq_filename_delim_idx : After splitting FASTQ file name by using
the --fq_filename_delim option, all
elements before this index (1-based) will
be joined to create final sample name.
Default: 1

--seqkit_rmdup_run : Remove duplicate sequences using seqkit
rmdup. Default: false

--seqkit_rmdup_n : Match and remove duplicate sequences by
full name instead of just ID. Defaut: false

--seqkit_rmdup_s : Match and remove duplicate sequences by
sequence content. Defaut: true

--seqkit_rmdup_d : Save the duplicated sequences to a file.
Defaut: false

--seqkit_rmdup_D : Save the number and list of duplicated
sequences to a file. Defaut: false

--seqkit_rmdup_i : Ignore case while using seqkit rmdup.
Defaut: false

--seqkit_rmdup_P : Only consider positive strand (i.e. 5')
when comparing by sequence content. Defaut:
false

--kraken2_db : Absolute path to kraken database. Default: /
hpc/db/kraken2/standard-210914

--kraken2_confidence : Confidence score threshold which must be
between 0 and 1. Default: 0.0

--kraken2_quick : Quick operation (use first hit or hits).
Default: false

--kraken2_use_mpa_style : Report output like Kraken 1's kraken-mpa-
report. Default: false

--kraken2_minimum_base_quality : Minimum base quality used in classification
which is only effective with FASTQ input.
Default: 0

--kraken2_report_zero_counts : Report counts for ALL taxa, even if counts
are zero. Default: false

--kraken2_report_minmizer_data : Report minimizer and distinct minimizer
count information in addition to normal
Kraken report. Default: false

--kraken2_use_names : Print scientific names instead of just
taxids. Default: true

--kraken2_extract_bug : Extract the reads or contigs beloging to
this bug. Default: Escherichia coli

--centrifuge_x : Absolute path to centrifuge database.
Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned : Save SINGLE-END reads that did not align.
For PAIRED-END reads, save read pairs that
did not align concordantly. Default: false

--centrifuge_save_aligned : Save SINGLE-END reads that aligned. For
PAIRED-END reads, save read pairs that
aligned concordantly. Default: false

--centrifuge_out_fmt_sam : Centrifuge output should be in SAM. Default:
false

--centrifuge_extract_bug : Extract this bug from centrifuge results.
Default: Escherichia coli

--centrifuge_ignore_quals : Treat all quality values as 30 on Phred
scale. Default: false

--megahit_run : Run MEGAHIT assembler. Default: true

--megahit_min_count : <int>. Minimum multiplicity for filtering (
k_min+1)-mers. Defaut: false

--megahit_k_list : Comma-separated list of kmer size. All
values must be odd, in the range 15-255,
increment should be <= 28. Ex: '21,29,39,59,
79,99,119,141'. Default: false

--megahit_no_mercy : Do not add mercy k-mers. Default: false

--megahit_bubble_level : <int>. Intensity of bubble merging (0-2), 0
to disable. Default: false

--megahit_merge_level : <l,s>. Merge complex bubbles of length <= l*
kmer_size and similarity >= s. Default:
false

--megahit_prune_level : <int>. Strength of low depth pruning (0-3).
Default: false

--megahit_prune_depth : <int>. Remove unitigs with avg k-mer depth
less than this value. Default: false

--megahit_low_local_ratio : <float>. Ratio threshold to define low
local coverage contigs. Default: false

--megahit_max_tip_len : <int>. remove tips less than this value [<
int> * k]. Default: false

--megahit_no_local : Disable local assembly. Default: false

--megahit_kmin_1pass : Use 1pass mode to build SdBG of k_min.
Default: false

--megahit_preset : <str>. Override a group of parameters.
Valid values are meta-sensitive which
enforces '--min-count 1 --k-list 21,29,39,
49,...,129,141', meta-large (large &
complex metagenomes, like soil) which
enforces '--k-min 27 --k-max 127 --k-step
10'. Default: meta-sensitive

--megahit_mem_flag : <int>. SdBG builder memory mode. 0: minimum;
1: moderate; 2: use all memory specified.
Default: 2

--megahit_min_contig_len : <int>. Minimum length of contigs to output.
Default: false

--spades_run : Run SPAdes assembler. Default: false

--spades_isolate : This flag is highly recommended for high-
coverage isolate and multi-cell data.
Defaut: false

--spades_sc : This flag is required for MDA (single-cell)
data. Default: false

--spades_meta : This flag is required for metagenomic data.
Default: true

--spades_bio : This flag is required for biosytheticSPAdes
mode. Default: false

--spades_corona : This flag is required for coronaSPAdes mode.
Default: false

--spades_rna : This flag is required for RNA-Seq data.
Default: false

--spades_plasmid : Runs plasmidSPAdes pipeline for plasmid
detection. Default: false

--spades_metaviral : Runs metaviralSPAdes pipeline for virus
detection. Default: false

--spades_metaplasmid : Runs metaplasmidSPAdes pipeline for plasmid
detection in metagenomics datasets. Default:
false

--spades_rnaviral : This flag enables virus assembly module
from RNA-Seq data. Default: false

--spades_iontorrent : This flag is required for IonTorrent data.
Default: false

--spades_only_assembler : Runs only the SPAdes assembler module (
without read error correction). Default:
false

--spades_careful : Tries to reduce the number of mismatches
and short indels in the assembly. Default:
false

--spades_cov_cutoff : Coverage cutoff value (a positive float
number). Default: false

--spades_k : List of k-mer sizes (must be odd and less
than 128). Default: false

--spades_hmm : Directory with custom hmms that replace the
default ones (very rare). Default: false

--serotypefinder_run : Run SerotypeFinder tool. Default: true

--serotypefinder_x : Generate extended output files. Default:
true

--serotypefinder_db : Path to SerotypeFinder databases. Default: /
hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold : Minimum percent identity (in float)
required for calling a hit. Default: 0.85

--serotypefinder_min_cov : Minumum percent coverage (in float)
required for calling a hit. Default: 0.80

--seqsero2_run : Run SeqSero2 tool. Default: false

--seqsero2_t : '1' for interleaved paired-end reads, '2'
for separated paired-end reads, '3' for
single reads, '4' for genome assembly, '5'
for nanopore reads (fasta/fastq). Default:
4

--seqsero2_m : Which workflow to apply, 'a'(raw reads
allele micro-assembly), 'k'(raw reads and
genome assembly k-mer). Default: k

--seqsero2_c : SeqSero2 will only output serotype
prediction without the directory containing
log files. Default: false

--seqsero2_s : SeqSero2 will not output header in
SeqSero_result.tsv. Default: false

--mlst_run : Run MLST tool. Default: true

--mlst_minid : DNA %identity of full allelle to consider '
similar' [~]. Default: 95

--mlst_mincov : DNA %cov to report partial allele at all [?].
Default: 10

--mlst_minscore : Minumum score out of 100 to match a scheme.
Default: 50

--abricate_run : Run ABRicate tool. Default: true

--abricate_minid : Minimum DNA %identity. Defaut: 90

--abricate_mincov : Minimum DNA %coverage. Defaut: 80

--abricate_datadir : ABRicate databases folder. Defaut: /hpc/db/
abricate/1.0.1/db

Help options :

--help : Display this message.
```
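
The `--metadata` option expects a CSV file with five columns (`sample`, `fq1`, `fq2`, `strandedness`, `single_end`), and `--fq_filename_delim` with `--fq_filename_delim_idx` control how a sample name is derived from a FASTQ file name. The sketch below illustrates one plausible reading of those descriptions. Several things here are assumptions rather than documented CPIPES behaviour: the helper names `sample_name` and `make_samplesheet`, the `true`/`false` values written to `single_end`, the empty `fq2` field for unpaired files, and the interpretation that the first `idx` fields are kept (so the default index 1 yields the text before the first `_`).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of cpipes); see the assumptions stated above.

# Join the first <idx> delimiter-separated fields of a FASTQ file name.
sample_name() {
    local file delim idx
    file=$(basename "$1")
    delim="${2:-_}"   # documented default of --fq_filename_delim
    idx="${3:-1}"     # documented default of --fq_filename_delim_idx
    printf '%s\n' "$file" | cut -d "$delim" -f "1-$idx"
}

# Print a five-column samplesheet for the FASTQ files found in <dir>.
# Pass an absolute directory so that fq1 and fq2 hold absolute paths.
make_samplesheet() {
    local dir="$1"
    local r1="${2:-_R1_001.fastq.gz}"   # documented default of --fq_suffix
    local r2="${3:-_R2_001.fastq.gz}"   # documented default of --fq2_suffix
    local fq1 fq2
    echo "sample,fq1,fq2,strandedness,single_end"
    for fq1 in "$dir"/*"$r1"; do
        [ -e "$fq1" ] || continue        # no R1 files: the glob matched nothing
        fq2="${fq1%"$r1"}$r2"            # swap the R1 suffix for the R2 suffix
        if [ -e "$fq2" ]; then
            echo "$(sample_name "$fq1"),$fq1,$fq2,unstranded,false"
        else
            # Assumed encoding for unpaired reads: empty fq2, single_end=true.
            echo "$(sample_name "$fq1"),$fq1,,unstranded,true"
        fi
    done
}
```

For example, `make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv` writes a file that could then be passed with `--metadata` in place of `--input`. Compare the result against a samplesheet auto-generated by the pipeline before relying on it.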

### **BETA**

---
The modular structure and flow are still under development and may change as computational and other considerations are assessed.