cfsan_centriflaken: 0.4.0/readme/centriflaken.md annotate

annotate 0.4.0/readme/centriflaken.md @ 101:ce6d9548fe89

"planemo upload"

author	kkonganti
date	Thu, 04 Aug 2022 10:45:55 -0400
parents
children	17890124001d

rev	line source
kkonganti@101	1 # CPIPES (CFSAN PIPELINES)
kkonganti@101	2
kkonganti@101	3 ## The modular pipeline repository at CFSAN, FDA
kkonganti@101	4
kkonganti@101	5 CPIPES (CFSAN PIPELINES) is a collection of modular pipelines based on NEXTFLOW,
kkonganti@101	6 mostly for bioinformatics data analysis at CFSAN, FDA.
kkonganti@101	7
kkonganti@101	8 ---
kkonganti@101	9
kkonganti@101	10 ### centriflaken
kkonganti@101	11
kkonganti@101	12 ---
kkonganti@101	13 Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing Escherichia coli.
kkonganti@101	14
kkonganti@101	15 #### Workflow Usage
kkonganti@101	16
kkonganti@101	17 ```bash
kkonganti@101	18 module load cpipes/0.3.0
kkonganti@101	19
kkonganti@101	20 cpipes --pipeline centriflaken [options]
kkonganti@101	21 ```
kkonganti@101	22
kkonganti@101	23 Example: Run the default `centriflaken` pipeline with E. coli as the taxon of interest.
kkonganti@101	24
kkonganti@101	25 ```bash
kkonganti@101	26 cd /hpc/scratch/$USER
kkonganti@101	27 mkdir nf-cpipes
kkonganti@101	28 cd nf-cpipes
kkonganti@101	29 cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
kkonganti@101	30 ```
kkonganti@101	31
kkonganti@101	32 Example: Run the `centriflaken` pipeline with Salmonella as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.
kkonganti@101	33
kkonganti@101	34 ```bash
kkonganti@101	35 cd /hpc/scratch/$USER
kkonganti@101	36 mkdir nf-cpipes
kkonganti@101	37 cd nf-cpipes
kkonganti@101	38 cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
kkonganti@101	39 ```
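
The help text below notes that the `--input` directory should contain only FASTQ files, because every file in it is read. A pre-flight check along the following lines may help catch stray files before launching. This is only a sketch: `list_non_fastq` is a hypothetical helper and not part of `cpipes`, and it assumes the documented `--fq_suffix` default of `.fastq.gz`.

```bash
# Hypothetical helper (not part of cpipes): print every regular file in a
# directory whose name does not end with the expected FASTQ suffix.
list_non_fastq() {
    local dir="$1"
    local suffix="${2:-.fastq.gz}"   # assumed to mirror the --fq_suffix default

    # Only the top level is inspected; subdirectories are ignored.
    find "$dir" -maxdepth 1 -type f ! -name "*${suffix}" | sort
}
```

Calling `list_non_fastq /path/to/fastq/dir` prints nothing when the directory holds only files with the expected suffix.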
kkonganti@101	40
kkonganti@101	41 #### `centriflaken` Help
kkonganti@101	42
kkonganti@101	43 ```text
kkonganti@101	44 [Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
kkonganti@101	45 N E X T F L O W ~ version 21.12.1-edge
kkonganti@101	46 Launching `/nfs/software/apps/cpipes/0.2.1/cpipes` [drunk_ptolemy] - revision: 72db279311
kkonganti@101	47 ================================================================================
kkonganti@101	48              (o)
kkonganti@101	49   ___  _ __   _  _ __    ___  ___
kkonganti@101	50  / __|| '_ \ | || '_ \  / _ \/ __|
kkonganti@101	51 | (__ | |_) || || |_) ||  __/\__ \
kkonganti@101	52  \___|| .__/ |_|| .__/  \___||___/
kkonganti@101	53       | |       | |
kkonganti@101	54       |_|       |_|
kkonganti@101	55 --------------------------------------------------------------------------------
kkonganti@101	56 A collection of modular pipelines at CFSAN, FDA.
kkonganti@101	57 --------------------------------------------------------------------------------
kkonganti@101	58 Name : CPIPES
kkonganti@101	59 Author : Kranti.Konganti@fda.hhs.gov
kkonganti@101	60 Version : 0.2.1
kkonganti@101	61 Center : CFSAN, FDA.
kkonganti@101	62 ================================================================================
kkonganti@101	63
kkonganti@101	64 Workflow : centriflaken
kkonganti@101	65
kkonganti@101	66 Author : Kranti.Konganti@fda.hhs.gov
kkonganti@101	67
kkonganti@101	68 Version : 0.2.0
kkonganti@101	69
kkonganti@101	70
kkonganti@101	71 Usage : cpipes --pipeline centriflaken [options]
kkonganti@101	72
kkonganti@101	73
kkonganti@101	74 Required :
kkonganti@101	75
kkonganti@101	76 --input : Absolute path to directory containing FASTQ
kkonganti@101	77 files. The directory should contain only
kkonganti@101	78 FASTQ files as all the files within the
kkonganti@101	79 mentioned directory will be read. Ex:
kkonganti@101	80 --input /path/to/fastq_pass
kkonganti@101	81
kkonganti@101	82 --output : Absolute path to directory where all the
kkonganti@101	83 pipeline outputs should be stored. Ex:
kkonganti@101	84 --output /path/to/output
kkonganti@101	85
kkonganti@101	86 Other options :
kkonganti@101	87
kkonganti@101	88 --metadata : Absolute path to metadata CSV file
kkonganti@101	89 containing five mandatory columns: sample,
kkonganti@101	90 fq1,fq2,strandedness,single_end. The fq1
kkonganti@101	91 and fq2 columns contain absolute paths to
kkonganti@101	92 the FASTQ files. This option can be used in
kkonganti@101	93 place of --input option. This is rare. Ex:
kkonganti@101	94 --metadata samplesheet.csv
kkonganti@101	95
kkonganti@101	96 --fq_suffix : The suffix of FASTQ files (Unpaired reads
kkonganti@101	97 or R1 reads or Long reads) if an input
kkonganti@101	98 directory is mentioned via --input option.
kkonganti@101	99 Default: .fastq.gz
kkonganti@101	100
kkonganti@101	101 --fq2_suffix : The suffix of FASTQ files (Paired-end reads
kkonganti@101	102 or R2 reads) if an input directory is
kkonganti@101	103 mentioned via --input option. Default:
kkonganti@101	104 false
kkonganti@101	105
kkonganti@101	106 --fq_filter_by_len : Remove FASTQ reads that are less than this
kkonganti@101	107 many bases. Default: 4000
kkonganti@101	108
kkonganti@101	109 --fq_strandedness : The strandedness of the sequencing run.
kkonganti@101	110 This is mostly needed if your sequencing
kkonganti@101	111 run is RNA-SEQ. For most of the other runs,
kkonganti@101	112 it is probably safe to use unstranded for
kkonganti@101	113 the option. Default: unstranded
kkonganti@101	114
kkonganti@101	115 --fq_single_end : SINGLE-END information will be auto-
kkonganti@101	116 detected but this option forces PAIRED-END
kkonganti@101	117 FASTQ files to be treated as SINGLE-END so
kkonganti@101	118 only read 1 information is included in auto-
kkonganti@101	119 generated samplesheet. Default: false
kkonganti@101	120
kkonganti@101	121 --fq_filename_delim : Delimiter by which the file name is split
kkonganti@101	122 to obtain sample name. Default: _
kkonganti@101	123
kkonganti@101	124 --fq_filename_delim_idx : After splitting FASTQ file name by using
kkonganti@101	125 the --fq_filename_delim option, all
kkonganti@101	126 elements before this index (1-based) will
kkonganti@101	127 be joined to create final sample name.
kkonganti@101	128 Default: 1
kkonganti@101	129
kkonganti@101	130 --kraken2_db : Absolute path to kraken database. Default:
kkonganti@101	131 /hpc/db/kraken2/standard-210914
kkonganti@101	132
kkonganti@101	133 --kraken2_confidence : Confidence score threshold which must be
kkonganti@101	134 between 0 and 1. Default: 0.0
kkonganti@101	135
kkonganti@101	136 --kraken2_quick : Quick operation (use first hit or hits).
kkonganti@101	137 Default: false
kkonganti@101	138
kkonganti@101	139 --kraken2_use_mpa_style : Report output like Kraken 1's kraken-mpa-
kkonganti@101	140 report. Default: false
kkonganti@101	141
kkonganti@101	142 --kraken2_minimum_base_quality : Minimum base quality used in classification
kkonganti@101	143 which is only effective with FASTQ input.
kkonganti@101	144 Default: 0
kkonganti@101	145
kkonganti@101	146 --kraken2_report_zero_counts : Report counts for ALL taxa, even if counts
kkonganti@101	147 are zero. Default: false
kkonganti@101	148
kkonganti@101	149 --kraken2_report_minmizer_data : Report minimizer and distinct minimizer
kkonganti@101	150 count information in addition to normal
kkonganti@101	151 Kraken report. Default: false
kkonganti@101	152
kkonganti@101	153 --kraken2_use_names : Print scientific names instead of just
kkonganti@101	154 taxids. Default: true
kkonganti@101	155
kkonganti@101	156 --kraken2_extract_bug : Extract the reads or contigs belonging to
kkonganti@101	157 this bug. Default: Escherichia coli
kkonganti@101	158
kkonganti@101	159 --centrifuge_x : Absolute path to centrifuge database.
kkonganti@101	160 Default: /hpc/db/centrifuge/2022-04-12/ab
kkonganti@101	161
kkonganti@101	162 --centrifuge_save_unaligned : Save SINGLE-END reads that did not align.
kkonganti@101	163 For PAIRED-END reads, save read pairs that
kkonganti@101	164 did not align concordantly. Default: false
kkonganti@101	165
kkonganti@101	166 --centrifuge_save_aligned : Save SINGLE-END reads that aligned. For
kkonganti@101	167 PAIRED-END reads, save read pairs that
kkonganti@101	168 aligned concordantly. Default: false
kkonganti@101	169
kkonganti@101	170 --centrifuge_out_fmt_sam : Centrifuge output should be in SAM. Default:
kkonganti@101	171 false
kkonganti@101	172
kkonganti@101	173 --centrifuge_extract_bug : Extract this bug from centrifuge results.
kkonganti@101	174 Default: Escherichia coli
kkonganti@101	175
kkonganti@101	176 --centrifuge_ignore_quals : Treat all quality values as 30 on Phred
kkonganti@101	177 scale. Default: false
kkonganti@101	178
kkonganti@101	179 --flye_pacbio_raw : Input FASTQ reads are PacBio regular CLR
kkonganti@101	180 reads (<20% error). Default: false
kkonganti@101	181
kkonganti@101	182 --flye_pacbio_corr : Input FASTQ reads are PacBio reads that
kkonganti@101	183 were corrected with other methods (<3%
kkonganti@101	184 error). Default: false
kkonganti@101	185
kkonganti@101	186 --flye_pacbio_hifi : Input FASTQ reads are PacBio HiFi reads (<1%
kkonganti@101	187 error). Default: false
kkonganti@101	188
kkonganti@101	189 --flye_nano_raw : Input FASTQ reads are ONT regular reads,
kkonganti@101	190 pre-Guppy5 (<20% error). Default: true
kkonganti@101	191
kkonganti@101	192 --flye_nano_corr : Input FASTQ reads are ONT reads that were
kkonganti@101	193 corrected with other methods (<3% error).
kkonganti@101	194 Default: false
kkonganti@101	195
kkonganti@101	196 --flye_nano_hq : Input FASTQ reads are ONT high-quality
kkonganti@101	197 reads: Guppy5+ SUP or Q20 (<5% error).
kkonganti@101	198 Default: false
kkonganti@101	199
kkonganti@101	200 --flye_genome_size : Estimated genome size (for example, 5m or
kkonganti@101	201 2.6g). Default: 5.5m
kkonganti@101	202
kkonganti@101	203 --flye_polish_iter : Number of genome polishing iterations.
kkonganti@101	204 Default: false
kkonganti@101	205
kkonganti@101	206 --flye_meta : Do a metagenome assembly (uneven coverage
kkonganti@101	207 mode). Default: true
kkonganti@101	208
kkonganti@101	209 --flye_min_overlap : Minimum overlap between reads. Default:
kkonganti@101	210 false
kkonganti@101	211
kkonganti@101	212 --flye_scaffold : Enable scaffolding using assembly graph.
kkonganti@101	213 Default: false
kkonganti@101	214
kkonganti@101	215 --serotypefinder_run : Run SerotypeFinder tool. Default: true
kkonganti@101	216
kkonganti@101	217 --serotypefinder_x : Generate extended output files. Default:
kkonganti@101	218 true
kkonganti@101	219
kkonganti@101	220 --serotypefinder_db : Path to SerotypeFinder databases. Default:
kkonganti@101	221 /hpc/db/serotypefinder/2.0.2
kkonganti@101	222
kkonganti@101	223 --serotypefinder_min_threshold : Minimum percent identity (in float)
kkonganti@101	224 required for calling a hit. Default: 0.85
kkonganti@101	225
kkonganti@101	226 --serotypefinder_min_cov : Minimum percent coverage (in float)
kkonganti@101	227 required for calling a hit. Default: 0.80
kkonganti@101	228
kkonganti@101	229 --seqsero2_run : Run SeqSero2 tool. Default: false
kkonganti@101	230
kkonganti@101	231 --seqsero2_t : '1' for interleaved paired-end reads, '2'
kkonganti@101	232 for separated paired-end reads, '3' for
kkonganti@101	233 single reads, '4' for genome assembly, '5'
kkonganti@101	234 for nanopore reads (fasta/fastq). Default:
kkonganti@101	235 4
kkonganti@101	236
kkonganti@101	237 --seqsero2_m : Which workflow to apply, 'a'(raw reads
kkonganti@101	238 allele micro-assembly), 'k'(raw reads and
kkonganti@101	239 genome assembly k-mer). Default: k
kkonganti@101	240
kkonganti@101	241 --seqsero2_c : SeqSero2 will only output serotype
kkonganti@101	242 prediction without the directory containing
kkonganti@101	243 log files. Default: false
kkonganti@101	244
kkonganti@101	245 --seqsero2_s : SeqSero2 will not output header in
kkonganti@101	246 SeqSero_result.tsv. Default: false
kkonganti@101	247
kkonganti@101	248 --mlst_run : Run MLST tool. Default: true
kkonganti@101	249
kkonganti@101	250 --mlst_minid : DNA %identity of full allele to consider
kkonganti@101	251 'similar' [~]. Default: 95
kkonganti@101	252
kkonganti@101	253 --mlst_mincov : DNA %cov to report partial allele at all [?].
kkonganti@101	254 Default: 10
kkonganti@101	255
kkonganti@101	256 --mlst_minscore : Minimum score out of 100 to match a scheme.
kkonganti@101	257 Default: 50
kkonganti@101	258
kkonganti@101	259 --abricate_run : Run ABRicate tool. Default: true
kkonganti@101	260
kkonganti@101	261 --abricate_minid : Minimum DNA %identity. Default: 90
kkonganti@101	262
kkonganti@101	263 --abricate_mincov : Minimum DNA %coverage. Default: 80
kkonganti@101	264
kkonganti@101	265 --abricate_datadir : ABRicate databases folder. Default:
kkonganti@101	266 /hpc/db/abricate/1.0.1/db
kkonganti@101	267
kkonganti@101	268 Help options :
kkonganti@101	269
kkonganti@101	270 --help : Display this message.
kkonganti@101	271 ```
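
The `--metadata` option in the help text above expects a CSV file with five mandatory columns: `sample`, `fq1`, `fq2`, `strandedness` and `single_end`. The sketch below writes a one-row sample sheet with that header. `write_samplesheet` is a hypothetical helper and the row values are placeholders. Leaving `fq2` empty for a long-read sample and writing `true` in `single_end` are assumptions about what the pipeline accepts, not confirmed behavior.

```bash
# Hypothetical helper (not part of cpipes): write a metadata CSV containing
# the five column names listed in the --metadata help text.
write_samplesheet() {
    local out="$1"
    {
        printf 'sample,fq1,fq2,strandedness,single_end\n'
        # Placeholder row. The empty fq2 field and the literal "true" are
        # assumptions; adjust them to whatever the pipeline actually expects.
        printf 'sampleA,/path/to/fastq/sampleA.fastq.gz,,unstranded,true\n'
    } > "$out"
}
```

After `write_samplesheet samplesheet.csv`, the file would be passed by absolute path in place of `--input`, for example `cpipes --pipeline centriflaken --metadata /path/to/samplesheet.csv --output /path/to/output`.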
kkonganti@101	272
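The `--fq_filename_delim` and `--fq_filename_delim_idx` options in the help text above control how a sample name is derived from a FASTQ file name. The sketch below illustrates one reading of that description: split the file name on the delimiter and keep the first `idx` fields (1-based), rejoined with the same delimiter. `sample_name` is a hypothetical helper, and the exact rule the pipeline applies may differ.

```bash
# Hypothetical helper (not part of cpipes): derive a sample name from a FASTQ
# file name, assuming the first <idx> delimiter-separated fields are kept.
sample_name() {
    local file="$1"
    local delim="${2:-_}"   # assumed to mirror the --fq_filename_delim default
    local idx="${3:-1}"     # assumed to mirror the --fq_filename_delim_idx default
    local base

    base="$(basename "$file")"
    # cut re-joins the selected fields with the same delimiter.
    printf '%s\n' "$base" | cut -d "$delim" -f "1-${idx}"
}
```

Under this reading, `sample_name /data/S1_barcode01_run2.fastq.gz` gives `S1`, and `sample_name /data/S1_barcode01_run2.fastq.gz _ 2` gives `S1_barcode01`.
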
kkonganti@101	273 ### BETA
kkonganti@101	274
kkonganti@101	275 ---
kkonganti@101	276 The modular structure and flow of these pipelines are still under development and may change as computational requirements and other considerations are assessed.
