diff 0.4.2/readme/centriflaken.md @ 130:04f6ac8ca13c
planemo upload
author | kkonganti |
---|---|
date | Wed, 03 Jul 2024 15:16:39 -0400 |
parents | 52045ea4679d |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/0.4.2/readme/centriflaken.md Wed Jul 03 15:16:39 2024 -0400
@@ -0,0 +1,276 @@
+# CPIPES (CFSAN PIPELINES)
+
+## The modular pipeline repository at CFSAN, FDA
+
+**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
+mostly for bioinformatics data analysis at **CFSAN, FDA.**
+
+---
+
+### **centriflaken**
+
+---
+Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing Escherichia coli.
+
+#### Workflow Usage
+
+```bash
+module load cpipes/0.4.0
+
+cpipes --pipeline centriflaken [options]
+```
+
+Example: Run the default `centriflaken` pipeline with taxa of interest as *E. coli*.
+
+```bash
+cd /hpc/scratch/$USER
+mkdir nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
+
+Example: Run the `centriflaken` pipeline with taxa of interest as *Salmonella*. In this mode, the `SerotypeFinder` tool is replaced by the `SeqSero2` tool.
+
+```bash
+cd /hpc/scratch/$USER
+mkdir nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
+
+#### `centriflaken` Help
+
+```text
+[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
+N E X T F L O W  ~  version 21.12.1-edge
+Launching `/nfs/software/apps/cpipes/0.4.0/cpipes` [crazy_euler] - revision: 72db279311
+================================================================================
+             (o)
+  ___  _ __   _  _ __    ___  ___
+ / __|| '_ \ | || '_ \  / _ \/ __|
+| (__ | |_) || || |_) ||  __/\__ \
+ \___|| .__/ |_|| .__/  \___||___/
+      | |       | |
+      |_|       |_|
+--------------------------------------------------------------------------------
+A collection of modular pipelines at CFSAN, FDA.
+--------------------------------------------------------------------------------
+Name                           : CPIPES
+Author                         : Kranti.Konganti@fda.hhs.gov
+Version                        : 0.4.0
+Center                         : CFSAN, FDA.
+================================================================================
+
+Workflow                       : centriflaken
+
+Author                         : Kranti.Konganti@fda.hhs.gov
+
+Version                        : 0.2.1
+
+
+Usage                          : cpipes --pipeline centriflaken [options]
+
+
+Required                       :
+
+--input                        : Absolute path to directory containing FASTQ
+                                 files. The directory should contain only
+                                 FASTQ files as all the files within the
+                                 mentioned directory will be read. Ex: --
+                                 input /path/to/fastq_pass
+
+--output                       : Absolute path to directory where all the
+                                 pipeline outputs should be stored. Ex: --
+                                 output /path/to/output
+
+Other options                  :
+
+--metadata                     : Absolute path to metadata CSV file
+                                 containing five mandatory columns: sample,
+                                 fq1,fq2,strandedness,single_end. The fq1
+                                 and fq2 columns contain absolute paths to
+                                 the FASTQ files. This option can be used in
+                                 place of --input option. This is rare. Ex: --
+                                 metadata samplesheet.csv
+
+--fq_suffix                    : The suffix of FASTQ files (Unpaired reads
+                                 or R1 reads or Long reads) if an input
+                                 directory is mentioned via --input option.
+                                 Default: .fastq.gz
+
+--fq2_suffix                   : The suffix of FASTQ files (Paired-end reads
+                                 or R2 reads) if an input directory is
+                                 mentioned via --input option. Default:
+                                 false
+
+--fq_filter_by_len             : Remove FASTQ reads that are less than this
+                                 many bases. Default: 4000
+
+--fq_strandedness              : The strandedness of the sequencing run.
+                                 This is mostly needed if your sequencing
+                                 run is RNA-SEQ. For most of the other runs,
+                                 it is probably safe to use unstranded for
+                                 the option. Default: unstranded
+
+--fq_single_end                : SINGLE-END information will be auto-
+                                 detected but this option forces PAIRED-END
+                                 FASTQ files to be treated as SINGLE-END so
+                                 only read 1 information is included in auto-
+                                 generated samplesheet. Default: false
+
+--fq_filename_delim            : Delimiter by which the file name is split
+                                 to obtain sample name. Default: _
+
+--fq_filename_delim_idx        : After splitting FASTQ file name by using
+                                 the --fq_filename_delim option, all
+                                 elements before this index (1-based) will
+                                 be joined to create final sample name.
+                                 Default: 1
+
+--kraken2_db                   : Absolute path to kraken database. Default: /
+                                 hpc/db/kraken2/standard-210914
+
+--kraken2_confidence           : Confidence score threshold which must be
+                                 between 0 and 1. Default: 0.0
+
+--kraken2_quick                : Quick operation (use first hit or hits).
+                                 Default: false
+
+--kraken2_use_mpa_style        : Report output like Kraken 1's kraken-mpa-
+                                 report. Default: false
+
+--kraken2_minimum_base_quality : Minimum base quality used in classification
+                                 which is only effective with FASTQ input.
+                                 Default: 0
+
+--kraken2_report_zero_counts   : Report counts for ALL taxa, even if counts
+                                 are zero. Default: false
+
+--kraken2_report_minmizer_data : Report minimizer and distinct minimizer
+                                 count information in addition to normal
+                                 Kraken report. Default: false
+
+--kraken2_use_names            : Print scientific names instead of just
+                                 taxids. Default: true
+
+--kraken2_extract_bug          : Extract the reads or contigs belonging to
+                                 this bug. Default: Escherichia coli
+
+--centrifuge_x                 : Absolute path to centrifuge database.
+                                 Default: /hpc/db/centrifuge/2022-04-12/ab
+
+--centrifuge_save_unaligned    : Save SINGLE-END reads that did not align.
+                                 For PAIRED-END reads, save read pairs that
+                                 did not align concordantly. Default: false
+
+--centrifuge_save_aligned      : Save SINGLE-END reads that aligned. For
+                                 PAIRED-END reads, save read pairs that
+                                 aligned concordantly. Default: false
+
+--centrifuge_out_fmt_sam       : Centrifuge output should be in SAM. Default:
+                                 false
+
+--centrifuge_extract_bug       : Extract this bug from centrifuge results.
+                                 Default: Escherichia coli
+
+--centrifuge_ignore_quals      : Treat all quality values as 30 on Phred
+                                 scale. Default: false
+
+--flye_pacbio_raw              : Input FASTQ reads are PacBio regular CLR
+                                 reads (<20% error). Default: false
+
+--flye_pacbio_corr             : Input FASTQ reads are PacBio reads that
+                                 were corrected with other methods (<3%
+                                 error). Default: false
+
+--flye_pacbio_hifi             : Input FASTQ reads are PacBio HiFi reads (<1%
+                                 error). Default: false
+
+--flye_nano_raw                : Input FASTQ reads are ONT regular reads,
+                                 pre-Guppy5 (<20% error). Default: true
+
+--flye_nano_corr               : Input FASTQ reads are ONT reads that were
+                                 corrected with other methods (<3% error).
+                                 Default: false
+
+--flye_nano_hq                 : Input FASTQ reads are ONT high-quality
+                                 reads: Guppy5+ SUP or Q20 (<5% error).
+                                 Default: false
+
+--flye_genome_size             : Estimated genome size (for example, 5m or 2.
+                                 6g). Default: 5.5m
+
+--flye_polish_iter             : Number of genome polishing iterations.
+                                 Default: false
+
+--flye_meta                    : Do a metagenome assembly (uneven coverage
+                                 mode). Default: true
+
+--flye_min_overlap             : Minimum overlap between reads. Default:
+                                 false
+
+--flye_scaffold                : Enable scaffolding using assembly graph.
+                                 Default: false
+
+--serotypefinder_run           : Run SerotypeFinder tool. Default: true
+
+--serotypefinder_x             : Generate extended output files. Default:
+                                 true
+
+--serotypefinder_db            : Path to SerotypeFinder databases. Default: /
+                                 hpc/db/serotypefinder/2.0.2
+
+--serotypefinder_min_threshold : Minimum percent identity (in float)
+                                 required for calling a hit. Default: 0.85
+
+--serotypefinder_min_cov       : Minimum percent coverage (in float)
+                                 required for calling a hit. Default: 0.80
+
+--seqsero2_run                 : Run SeqSero2 tool. Default: false
+
+--seqsero2_t                   : '1' for interleaved paired-end reads, '2'
+                                 for separated paired-end reads, '3' for
+                                 single reads, '4' for genome assembly, '5'
+                                 for nanopore reads (fasta/fastq). Default:
+                                 4
+
+--seqsero2_m                   : Which workflow to apply, 'a'(raw reads
+                                 allele micro-assembly), 'k'(raw reads and
+                                 genome assembly k-mer). Default: k
+
+--seqsero2_c                   : SeqSero2 will only output serotype
+                                 prediction without the directory containing
+                                 log files. Default: false
+
+--seqsero2_s                   : SeqSero2 will not output header in
+                                 SeqSero_result.tsv. Default: false
+
+--mlst_run                     : Run MLST tool. Default: true
+
+--mlst_minid                   : DNA %identity of full allele to consider '
+                                 similar' [~]. Default: 95
+
+--mlst_mincov                  : DNA %cov to report partial allele at all [?].
+                                 Default: 10
+
+--mlst_minscore                : Minimum score out of 100 to match a scheme.
+                                 Default: 50
+
+--abricate_run                 : Run ABRicate tool. Default: true
+
+--abricate_minid               : Minimum DNA %identity. Default: 90
+
+--abricate_mincov              : Minimum DNA %coverage. Default: 80
+
+--abricate_datadir             : ABRicate databases folder. Default: /hpc/db/
+                                 abricate/1.0.1/db
+
+Help options                   :
+
+--help                         : Display this message.
+```
+
+### **BETA**
+
+---
+The development of the modular structure and flow is an ongoing effort and may change depending on assessment of various computational topics and other considerations.
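
The `--fq_filename_delim` and `--fq_filename_delim_idx` options in the help text describe how a sample name is derived from a FASTQ file name. The sketch below illustrates that rule in plain shell, outside of `cpipes`; it is not the pipeline's own code. `sample_name` is a hypothetical helper and the file name is made up. It assumes the 1-based index is inclusive (so the default of 1 keeps only the first field) and that the delimiter is a single character, since it relies on `cut`.

```bash
#!/usr/bin/env bash

# Hypothetical helper, not part of cpipes: derive a sample name from a FASTQ
# file name by joining the first IDX delimiter-separated fields.
# Assumption: the index is inclusive, so IDX=1 keeps only the first field.
sample_name() {
    local file="$1" delim="${2:-_}" idx="${3:-1}"
    local base
    base="$(basename "$file")"
    # cut keeps fields 1..idx and rejoins them with the same delimiter.
    printf '%s\n' "$base" | cut -d "$delim" -f "1-${idx}"
}

sample_name /path/to/fastq/dir/barcode01_run7_pass.fastq.gz       # barcode01
sample_name /path/to/fastq/dir/barcode01_run7_pass.fastq.gz _ 2   # barcode01_run7
```

If the derived names collide across files, raising the index (or choosing a different delimiter) is the obvious knob to try; how `cpipes` itself resolves such collisions is not covered by the help text.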