cfsan_centriflaken: 0.4.2/readme/centriflaken

annotate 0.4.2/readme/centriflaken_hy.md @ 105:52045ea4679d

"planemo upload"

author	kkonganti
date	Thu, 27 Jun 2024 14:17:26 -0400

# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

CPIPES (CFSAN PIPELINES) is a collection of modular pipelines based on NEXTFLOW,
mostly for bioinformatics data analysis at CFSAN, FDA.

---

### centriflaken_hy

---
`centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.

#### Workflow Usage

```bash
module load cpipes/0.4.0

cpipes --pipeline centriflaken_hy [options]
```

Example: Run the default `centriflaken_hy` pipeline with E. coli as the taxon of interest.

```bash
cd "/hpc/scratch/$USER"
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy \
    --input /path/to/illumina/fastq/dir \
    --output /path/to/output \
    --user_email 'Kranti.Konganti@fda.hhs.gov'
```

Example: Run the `centriflaken_hy` pipeline with Salmonella as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.

```bash
cd "/hpc/scratch/$USER"
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy \
    --centrifuge_extract_bug 'Salmonella' \
    --input /path/to/illumina/fastq/dir \
    --output /path/to/output \
    --user_email 'Kranti.Konganti@fda.hhs.gov'
```
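
Because every file in the `--input` directory is read, a quick pre-flight check before launching either example can catch stray files. The sketch below is not part of CPIPES: `check_fastq_dir` is a hypothetical helper, and it assumes the default `--fq_suffix` and `--fq2_suffix` values unless other suffixes are passed as the second and third arguments.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not shipped with cpipes): report entries in
# an input directory that do not end in one of the expected FASTQ suffixes.
# The default suffixes mirror the --fq_suffix and --fq2_suffix defaults.

check_fastq_dir() {
    local dir="$1"
    local r1_suffix="${2:-_R1_001.fastq.gz}"
    local r2_suffix="${3:-_R2_001.fastq.gz}"
    local bad=0 entry name

    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue      # empty directory: the glob matched nothing
        name="${entry##*/}"              # strip the leading directory part
        case "$name" in
            *"$r1_suffix" | *"$r2_suffix") ;;   # expected FASTQ file
            *) echo "unexpected entry: $name" >&2; bad=1 ;;
        esac
    done
    return "$bad"                        # 0 if every entry matched, 1 otherwise
}
```

Calling `check_fastq_dir /path/to/illumina/fastq/dir` before the `cpipes` command, and launching only when it returns 0, keeps unrelated files out of the run.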

#### `centriflaken_hy` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
N E X T F L O W ~ version 21.12.1-edge
Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name : CPIPES
Author : Kranti.Konganti@fda.hhs.gov
Version : 0.4.0
Center : CFSAN, FDA.
================================================================================

Workflow : centriflaken_hy

Author : Kranti.Konganti@fda.hhs.gov

Version : 0.4.0


Usage : cpipes --pipeline centriflaken_hy [options]


Required :

--input : Absolute path to directory containing FASTQ
files. The directory should contain only
FASTQ files as all the files within the
mentioned directory will be read. Ex: --
input /path/to/fastq_pass

--output : Absolute path to directory where all the
pipeline outputs should be stored. Ex: --
output /path/to/output

Other options :

--metadata : Absolute path to metadata CSV file
containing five mandatory columns: sample,
fq1,fq2,strandedness,single_end. The fq1
and fq2 columns contain absolute paths to
the FASTQ files. This option can be used in
place of --input option. This is rare. Ex: --
metadata samplesheet.csv

--fq_suffix : The suffix of FASTQ files (Unpaired reads
or R1 reads or Long reads) if an input
directory is mentioned via --input option.
Default: _R1_001.fastq.gz

--fq2_suffix : The suffix of FASTQ files (Paired-end reads
or R2 reads) if an input directory is
mentioned via --input option. Default:
_R2_001.fastq.gz

--fq_filter_by_len : Remove FASTQ reads that are less than this
many bases. Default: 75

--fq_strandedness : The strandedness of the sequencing run.
This is mostly needed if your sequencing
run is RNA-SEQ. For most of the other runs,
it is probably safe to use unstranded for
the option. Default: unstranded

--fq_single_end : SINGLE-END information will be auto-
detected but this option forces PAIRED-END
FASTQ files to be treated as SINGLE-END so
only read 1 information is included in auto-
generated samplesheet. Default: false

--fq_filename_delim : Delimiter by which the file name is split
to obtain sample name. Default: _

--fq_filename_delim_idx : After splitting FASTQ file name by using
the --fq_filename_delim option, all
elements before this index (1-based) will
be joined to create final sample name.
Default: 1

--seqkit_rmdup_run : Remove duplicate sequences using seqkit
rmdup. Default: false

--seqkit_rmdup_n : Match and remove duplicate sequences by
full name instead of just ID. Defaut: false

--seqkit_rmdup_s : Match and remove duplicate sequences by
sequence content. Defaut: true

--seqkit_rmdup_d : Save the duplicated sequences to a file.
Defaut: false

--seqkit_rmdup_D : Save the number and list of duplicated
sequences to a file. Defaut: false

--seqkit_rmdup_i : Ignore case while using seqkit rmdup.
Defaut: false

--seqkit_rmdup_P : Only consider positive strand (i.e. 5')
when comparing by sequence content. Defaut:
false

--kraken2_db : Absolute path to kraken database. Default: /
hpc/db/kraken2/standard-210914

--kraken2_confidence : Confidence score threshold which must be
between 0 and 1. Default: 0.0

--kraken2_quick : Quick operation (use first hit or hits).
Default: false

--kraken2_use_mpa_style : Report output like Kraken 1's kraken-mpa-
report. Default: false

--kraken2_minimum_base_quality : Minimum base quality used in classification
which is only effective with FASTQ input.
Default: 0

--kraken2_report_zero_counts : Report counts for ALL taxa, even if counts
are zero. Default: false

--kraken2_report_minmizer_data : Report minimizer and distinct minimizer
count information in addition to normal
Kraken report. Default: false

--kraken2_use_names : Print scientific names instead of just
taxids. Default: true

--kraken2_extract_bug : Extract the reads or contigs beloging to
this bug. Default: Escherichia coli

--centrifuge_x : Absolute path to centrifuge database.
Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned : Save SINGLE-END reads that did not align.
For PAIRED-END reads, save read pairs that
did not align concordantly. Default: false

--centrifuge_save_aligned : Save SINGLE-END reads that aligned. For
PAIRED-END reads, save read pairs that
aligned concordantly. Default: false

--centrifuge_out_fmt_sam : Centrifuge output should be in SAM. Default:
false

--centrifuge_extract_bug : Extract this bug from centrifuge results.
Default: Escherichia coli

--centrifuge_ignore_quals : Treat all quality values as 30 on Phred
scale. Default: false

--megahit_run : Run MEGAHIT assembler. Default: true

--megahit_min_count : <int>. Minimum multiplicity for filtering (
k_min+1)-mers. Defaut: false

--megahit_k_list : Comma-separated list of kmer size. All
values must be odd, in the range 15-255,
increment should be <= 28. Ex: '21,29,39,59,
79,99,119,141'. Default: false

--megahit_no_mercy : Do not add mercy k-mers. Default: false

--megahit_bubble_level : <int>. Intensity of bubble merging (0-2), 0
to disable. Default: false

--megahit_merge_level : <l,s>. Merge complex bubbles of length <= l*
kmer_size and similarity >= s. Default:
false

--megahit_prune_level : <int>. Strength of low depth pruning (0-3).
Default: false

--megahit_prune_depth : <int>. Remove unitigs with avg k-mer depth
less than this value. Default: false

--megahit_low_local_ratio : <float>. Ratio threshold to define low
local coverage contigs. Default: false

--megahit_max_tip_len : <int>. remove tips less than this value [<
int> * k]. Default: false

--megahit_no_local : Disable local assembly. Default: false

--megahit_kmin_1pass : Use 1pass mode to build SdBG of k_min.
Default: false

--megahit_preset : <str>. Override a group of parameters.
Valid values are meta-sensitive which
enforces '--min-count 1 --k-list 21,29,39,
49,...,129,141', meta-large (large &
complex metagenomes, like soil) which
enforces '--k-min 27 --k-max 127 --k-step
10'. Default: meta-sensitive

--megahit_mem_flag : <int>. SdBG builder memory mode. 0: minimum;
1: moderate; 2: use all memory specified.
Default: 2

--megahit_min_contig_len : <int>. Minimum length of contigs to output.
Default: false

--spades_run : Run SPAdes assembler. Default: false

--spades_isolate : This flag is highly recommended for high-
coverage isolate and multi-cell data.
Defaut: false

--spades_sc : This flag is required for MDA (single-cell)
data. Default: false

--spades_meta : This flag is required for metagenomic data.
Default: true

--spades_bio : This flag is required for biosytheticSPAdes
mode. Default: false

--spades_corona : This flag is required for coronaSPAdes mode.
Default: false

--spades_rna : This flag is required for RNA-Seq data.
Default: false

--spades_plasmid : Runs plasmidSPAdes pipeline for plasmid
detection. Default: false

--spades_metaviral : Runs metaviralSPAdes pipeline for virus
detection. Default: false

--spades_metaplasmid : Runs metaplasmidSPAdes pipeline for plasmid
detection in metagenomics datasets. Default:
false

--spades_rnaviral : This flag enables virus assembly module
from RNA-Seq data. Default: false

--spades_iontorrent : This flag is required for IonTorrent data.
Default: false

--spades_only_assembler : Runs only the SPAdes assembler module (
without read error correction). Default:
false

--spades_careful : Tries to reduce the number of mismatches
and short indels in the assembly. Default:
false

--spades_cov_cutoff : Coverage cutoff value (a positive float
number). Default: false

--spades_k : List of k-mer sizes (must be odd and less
than 128). Default: false

--spades_hmm : Directory with custom hmms that replace the
default ones (very rare). Default: false

--serotypefinder_run : Run SerotypeFinder tool. Default: true

--serotypefinder_x : Generate extended output files. Default:
true

--serotypefinder_db : Path to SerotypeFinder databases. Default: /
hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold : Minimum percent identity (in float)
required for calling a hit. Default: 0.85

--serotypefinder_min_cov : Minumum percent coverage (in float)
required for calling a hit. Default: 0.80

--seqsero2_run : Run SeqSero2 tool. Default: false

--seqsero2_t : '1' for interleaved paired-end reads, '2'
for separated paired-end reads, '3' for
single reads, '4' for genome assembly, '5'
for nanopore reads (fasta/fastq). Default:
4

--seqsero2_m : Which workflow to apply, 'a'(raw reads
allele micro-assembly), 'k'(raw reads and
genome assembly k-mer). Default: k

--seqsero2_c : SeqSero2 will only output serotype
prediction without the directory containing
log files. Default: false

--seqsero2_s : SeqSero2 will not output header in
SeqSero_result.tsv. Default: false

--mlst_run : Run MLST tool. Default: true

--mlst_minid : DNA %identity of full allelle to consider '
similar' [~]. Default: 95

--mlst_mincov : DNA %cov to report partial allele at all [?].
Default: 10

--mlst_minscore : Minumum score out of 100 to match a scheme.
Default: 50

--abricate_run : Run ABRicate tool. Default: true

--abricate_minid : Minimum DNA %identity. Defaut: 90

--abricate_mincov : Minimum DNA %coverage. Defaut: 80

--abricate_datadir : ABRicate databases folder. Defaut: /hpc/db/
abricate/1.0.1/db

Help options :

--help : Display this message.
```
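
The `--metadata` option in the help text names five mandatory columns: `sample`, `fq1`, `fq2`, `strandedness` and `single_end`. As a rough illustration of what such a samplesheet might look like, the sketch below builds one from a directory of FASTQ files. `make_samplesheet` is a hypothetical helper, not part of CPIPES. The header row, the rule that the sample name is the R1 file name minus its suffix, the empty `fq2` field for unpaired samples and the `true`/`false` values for `single_end` are all assumptions that should be checked against the pipeline before use.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with cpipes): print a CSV with the five
# columns named in the --metadata help text.
# Assumptions: a header row is expected, the sample name is the R1 file name
# minus its suffix, fq2 is empty for unpaired samples, and single_end is
# written as true/false.

make_samplesheet() {
    local dir
    dir="$(cd "$1" && pwd)" || return 1   # resolve to an absolute path
    local r1_suffix="${2:-_R1_001.fastq.gz}"
    local r2_suffix="${3:-_R2_001.fastq.gz}"
    local fq1 fq2 sample

    echo "sample,fq1,fq2,strandedness,single_end"
    for fq1 in "$dir"/*"$r1_suffix"; do
        [ -e "$fq1" ] || continue         # no R1 files: the glob matched nothing
        sample="${fq1##*/}"               # drop the directory part
        sample="${sample%"$r1_suffix"}"   # drop the R1 suffix
        fq2="$dir/${sample}${r2_suffix}"
        if [ -e "$fq2" ]; then
            echo "${sample},${fq1},${fq2},unstranded,false"
        else
            echo "${sample},${fq1},,unstranded,true"
        fi
    done
}
```

Running `make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv` writes the table to a file that could then be passed via `--metadata` in place of `--input`.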

### BETA

---
The modular structure and flow are under active development and may change as computational requirements and other considerations are assessed.
