annotate 0.3.0/readme/centriflaken_hy.md @ 92:295c2597a475

"planemo upload"
author kkonganti
date Tue, 19 Jul 2022 10:07:24 -0400
# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken_hy**

---
`centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.

#### Workflow Usage

```bash
module load cpipes/0.2.1

cpipes --pipeline centriflaken_hy [options]
```

Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
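
The help text for `--input` notes that every file in the given directory is read, so the directory should contain only FASTQ files. A minimal pre-flight sketch of such a check follows. The function name `check_fastq_dir` and the list of accepted extensions are assumptions made for illustration; neither is part of CPIPES.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CPIPES): flag directory entries that do
# not look like FASTQ files before the directory is passed to --input.
# Assumption: FASTQ files end in .fastq, .fq, .fastq.gz or .fq.gz.
check_fastq_dir() {
    local dir="$1" f bad=0
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 2; }
    for f in "$dir"/*; do
        [ -e "$f" ] || continue   # empty directory: the glob matched nothing
        case "$f" in
            *.fastq|*.fq|*.fastq.gz|*.fq.gz) ;;              # looks like FASTQ
            *) echo "Unexpected entry: $f" >&2; bad=1 ;;     # anything else
        esac
    done
    return "$bad"
}
```

One possible use is `check_fastq_dir /path/to/illumina/fastq/dir` before launching the pipeline; a non-zero exit status means something other than FASTQ files was found.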

Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SerotypeFinder` tool is replaced with the `SeqSero2` tool.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

#### `centriflaken_hy` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
N E X T F L O W ~ version 21.12.1-edge
Launching `/nfs/software/apps/cpipes/0.2.1/cpipes` [wise_noyce] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : CPIPES
Author                          : Kranti.Konganti@fda.hhs.gov
Version                         : 0.2.1
Center                          : CFSAN, FDA.
================================================================================

Workflow                        : centriflaken_hy

Author                          : Kranti.Konganti@fda.hhs.gov

Version                         : 0.2.0


Usage                           : cpipes --pipeline centriflaken_hy [options]


Required                        :

--input                         : Absolute path to directory containing FASTQ
                                  files. The directory should contain only
                                  FASTQ files as all the files within the
                                  mentioned directory will be read. Ex: --
                                  input /path/to/fastq_pass

--output                        : Absolute path to directory where all the
                                  pipeline outputs should be stored. Ex: --
                                  output /path/to/output

Other options                   :

--metadata                      : Absolute path to metadata CSV file
                                  containing five mandatory columns: sample,
                                  fq1,fq2,strandedness,single_end. The fq1
                                  and fq2 columns contain absolute paths to
                                  the FASTQ files. This option can be used in
                                  place of --input option. This is rare. Ex: --
                                  metadata samplesheet.csv

--fq_suffix                     : The suffix of FASTQ files (Unpaired reads
                                  or R1 reads or Long reads) if an input
                                  directory is mentioned via --input option.
                                  Default: _R1_001.fastq.gz

--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads
                                  or R2 reads) if an input directory is
                                  mentioned via --input option. Default:
                                  _R2_001.fastq.gz

--fq_filter_by_len              : Remove FASTQ reads that are less than this
                                  many bases. Default: 75

--fq_strandedness               : The strandedness of the sequencing run.
                                  This is mostly needed if your sequencing
                                  run is RNA-SEQ. For most of the other runs,
                                  it is probably safe to use unstranded for
                                  the option. Default: unstranded

--fq_single_end                 : SINGLE-END information will be auto-
                                  detected but this option forces PAIRED-END
                                  FASTQ files to be treated as SINGLE-END so
                                  only read 1 information is included in auto-
                                  generated samplesheet. Default: false

--fq_filename_delim             : Delimiter by which the file name is split
                                  to obtain sample name. Default: _

--fq_filename_delim_idx         : After splitting FASTQ file name by using
                                  the --fq_filename_delim option, all
                                  elements before this index (1-based) will
                                  be joined to create final sample name.
                                  Default: 1

--kraken2_db                    : Absolute path to kraken database. Default: /
                                  hpc/db/kraken2/standard-210914

--kraken2_confidence            : Confidence score threshold which must be
                                  between 0 and 1. Default: 0.0

--kraken2_quick                 : Quick operation (use first hit or hits).
                                  Default: false

--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
                                  report. Default: false

--kraken2_minimum_base_quality  : Minimum base quality used in classification
                                  which is only effective with FASTQ input.
                                  Default: 0

--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts
                                  are zero. Default: false

--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer
                                  count information in addition to normal
                                  Kraken report. Default: false

--kraken2_use_names             : Print scientific names instead of just
                                  taxids. Default: true

--kraken2_extract_bug           : Extract the reads or contigs belonging to
                                  this bug. Default: Escherichia coli

--centrifuge_x                  : Absolute path to centrifuge database.
                                  Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align.
                                  For PAIRED-END reads, save read pairs that
                                  did not align concordantly. Default: false

--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For
                                  PAIRED-END reads, save read pairs that
                                  aligned concordantly. Default: false

--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default:
                                  false

--centrifuge_extract_bug        : Extract this bug from centrifuge results.
                                  Default: Escherichia coli

--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred
                                  scale. Default: false

--spades_isolate                : This flag is highly recommended for high-
                                  coverage isolate and multi-cell data.
                                  Default: false

--spades_sc                     : This flag is required for MDA (single-cell)
                                  data. Default: false

--spades_meta                   : This flag is required for metagenomic data.
                                  Default: true

--spades_bio                    : This flag is required for biosyntheticSPAdes
                                  mode. Default: false

--spades_corona                 : This flag is required for coronaSPAdes mode.
                                  Default: false

--spades_rna                    : This flag is required for RNA-Seq data.
                                  Default: false

--spades_plasmid                : Runs plasmidSPAdes pipeline for plasmid
                                  detection. Default: false

--spades_metaviral              : Runs metaviralSPAdes pipeline for virus
                                  detection. Default: false

--spades_metaplasmid            : Runs metaplasmidSPAdes pipeline for plasmid
                                  detection in metagenomics datasets. Default:
                                  false

--spades_rnaviral               : This flag enables virus assembly module
                                  from RNA-Seq data. Default: false

--spades_iontorrent             : This flag is required for IonTorrent data.
                                  Default: false

--spades_only_assembler         : Runs only the SPAdes assembler module (
                                  without read error correction). Default:
                                  false

--spades_careful                : Tries to reduce the number of mismatches
                                  and short indels in the assembly. Default:
                                  false

--spades_cov_cutoff             : Coverage cutoff value (a positive float
                                  number). Default: false

--spades_k                      : List of k-mer sizes (must be odd and less
                                  than 128). Default: false

--spades_hmm                    : Directory with custom hmms that replace the
                                  default ones (very rare). Default: false

--serotypefinder_run            : Run SerotypeFinder tool. Default: true

--serotypefinder_x              : Generate extended output files. Default:
                                  true

--serotypefinder_db             : Path to SerotypeFinder databases. Default: /
                                  hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold  : Minimum percent identity (in float)
                                  required for calling a hit. Default: 0.85

--serotypefinder_min_cov        : Minimum percent coverage (in float)
                                  required for calling a hit. Default: 0.80

--seqsero2_run                  : Run SeqSero2 tool. Default: false

--seqsero2_t                    : '1' for interleaved paired-end reads, '2'
                                  for separated paired-end reads, '3' for
                                  single reads, '4' for genome assembly, '5'
                                  for nanopore reads (fasta/fastq). Default:
                                  4

--seqsero2_m                    : Which workflow to apply, 'a'(raw reads
                                  allele micro-assembly), 'k'(raw reads and
                                  genome assembly k-mer). Default: k

--seqsero2_c                    : SeqSero2 will only output serotype
                                  prediction without the directory containing
                                  log files. Default: false

--seqsero2_s                    : SeqSero2 will not output header in
                                  SeqSero_result.tsv. Default: false

--mlst_run                      : Run MLST tool. Default: true

--mlst_minid                    : DNA %identity of full allele to consider '
                                  similar' [~]. Default: 95

--mlst_mincov                   : DNA %cov to report partial allele at all [?].
                                  Default: 10

--mlst_minscore                 : Minimum score out of 100 to match a scheme.
                                  Default: 50

--abricate_run                  : Run ABRicate tool. Default: true

--abricate_minid                : Minimum DNA %identity. Default: 90

--abricate_mincov               : Minimum DNA %coverage. Default: 80

--abricate_datadir              : ABRicate databases folder. Default: /hpc/db/
                                  abricate/1.0.1/db

Help options                    :

--help                          : Display this message.
```
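
As a rough illustration of how `--fq_suffix`, `--fq2_suffix`, `--fq_filename_delim` and `--fq_filename_delim_idx` relate to the five-column `--metadata` CSV, the sketch below builds a samplesheet from a directory of paired-end files. It is not the pipeline's own code. The function names `sample_name` and `make_samplesheet` are hypothetical, the index is assumed to be inclusive (so the default of 1 keeps the first element), and the header line and the `unstranded` / `false` values are assumptions based on the column names and defaults listed in the help text.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; not CPIPES code.

# Join the first <idx> delimiter-separated elements of a file name.
# Assumptions: the 1-based index is inclusive, and the delimiter is a
# single character (a requirement of cut -d).
sample_name() {
    local fname="$1" delim="${2:-_}" idx="${3:-1}"
    printf '%s\n' "$fname" | cut -d "$delim" -f "1-$idx"
}

# Print a samplesheet for the paired-end FASTQ files found in a directory.
# Pass an absolute directory path so that fq1 and fq2 are absolute paths.
make_samplesheet() {
    local dir="$1"
    local r1_suffix="${2:-_R1_001.fastq.gz}" r2_suffix="${3:-_R2_001.fastq.gz}"
    local r1 r2 base
    echo "sample,fq1,fq2,strandedness,single_end"
    for r1 in "$dir"/*"$r1_suffix"; do
        [ -e "$r1" ] || continue                    # no R1 files matched
        r2="${r1%"$r1_suffix"}$r2_suffix"           # swap R1 suffix for R2
        [ -e "$r2" ] || { echo "No R2 mate for $r1" >&2; continue; }
        base="$(basename "$r1")"
        echo "$(sample_name "$base"),$r1,$r2,unstranded,false"
    done
}
```

For instance, `make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv` would write one row per R1/R2 pair, and the resulting file could then be supplied via `--metadata` in place of `--input`.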

### **BETA**

---
The development of the modular structure and flow is an ongoing effort and may change depending on assessment of various computational topics and other considerations.