diff 0.3.0/readme/centriflaken.md @ 92:295c2597a475

"planemo upload"
author kkonganti
date Tue, 19 Jul 2022 10:07:24 -0400
parents
children 8d7f482c64de
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/0.3.0/readme/centriflaken.md	Tue Jul 19 10:07:24 2022 -0400
@@ -0,0 +1,276 @@
+# CPIPES (CFSAN PIPELINES)
+
+## The modular pipeline repository at CFSAN, FDA
+
+**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
+mostly for bioinformatics data analysis at **CFSAN, FDA.**
+
+---
+
+### **centriflaken**
+
+---
+Precision long-read metagenomics sequencing for food safety through detection and assembly of Shiga toxin-producing *Escherichia coli*.
+
+#### Workflow Usage
+
+```bash
+module load cpipes/0.2.1
+
+cpipes --pipeline centriflaken [options]
+```
+
+Example: Run the default `centriflaken` pipeline with *E. coli* as the taxon of interest.
+
+```bash
+cd /hpc/scratch/$USER
+mkdir nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
+
+Example: Run the `centriflaken` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.
+
+```bash
+cd /hpc/scratch/$USER
+mkdir nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
+
+#### `centriflaken` Help
+
+```text
+[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
+N E X T F L O W  ~  version 21.12.1-edge
+Launching `/nfs/software/apps/cpipes/0.2.1/cpipes` [drunk_ptolemy] - revision: 72db279311
+================================================================================
+             (o)                  
+  ___  _ __   _  _ __    ___  ___ 
+ / __|| '_ \ | || '_ \  / _ \/ __|
+| (__ | |_) || || |_) ||  __/\__ \
+ \___|| .__/ |_|| .__/  \___||___/
+      | |       | |               
+      |_|       |_|
+--------------------------------------------------------------------------------
+A collection of modular pipelines at CFSAN, FDA.
+--------------------------------------------------------------------------------
+Name                            : CPIPES
+Author                          : Kranti.Konganti@fda.hhs.gov
+Version                         : 0.2.1
+Center                          : CFSAN, FDA.
+================================================================================
+
+Workflow                        : centriflaken
+
+Author                          : Kranti.Konganti@fda.hhs.gov
+
+Version                         : 0.2.0
+
+
+Usage                           : cpipes --pipeline centriflaken [options]
+
+
+Required                        : 
+
+--input                         : Absolute path to directory containing FASTQ 
+                                  files. The directory should contain only 
+                                  FASTQ files as all the files within the 
+                                  mentioned directory will be read. Ex:
+                                  --input /path/to/fastq_pass
+
+--output                        : Absolute path to directory where all the 
+                                  pipeline outputs should be stored. Ex:
+                                  --output /path/to/output
+
+Other options                   : 
+
+--metadata                      : Absolute path to metadata CSV file 
+                                  containing five mandatory columns: sample,
+                                  fq1,fq2,strandedness,single_end. The fq1 
+                                  and fq2 columns contain absolute paths to 
+                                  the FASTQ files. This option can be used in 
+                                  place of --input option. This is rare. Ex:
+                                  --metadata samplesheet.csv
+
+--fq_suffix                     : The suffix of FASTQ files (Unpaired reads 
+                                  or R1 reads or Long reads) if an input 
+                                  directory is mentioned via --input option. 
+                                  Default: .fastq.gz
+
+--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads 
+                                  or R2 reads) if an input directory is 
+                                  mentioned via --input option. Default: 
+                                  false
+
+--fq_filter_by_len              : Remove FASTQ reads that are less than this 
+                                  many bases. Default: 4000
+
+--fq_strandedness               : The strandedness of the sequencing run. 
+                                  This is mostly needed if your sequencing 
+                                  run is RNA-SEQ. For most of the other runs, 
+                                  it is probably safe to use unstranded for 
+                                  the option. Default: unstranded
+
+--fq_single_end                 : SINGLE-END information will be auto-
+                                  detected but this option forces PAIRED-END 
+                                  FASTQ files to be treated as SINGLE-END so 
+                                  only read 1 information is included in auto-
+                                  generated samplesheet. Default: false
+
+--fq_filename_delim             : Delimiter by which the file name is split 
+                                  to obtain sample name. Default: _
+
+--fq_filename_delim_idx         : After splitting FASTQ file name by using 
+                                  the --fq_filename_delim option, all 
+                                  elements before this index (1-based) will 
+                                  be joined to create final sample name. 
+                                  Default: 1
+
+--kraken2_db                    : Absolute path to kraken database. Default:
+                                  /hpc/db/kraken2/standard-210914
+
+--kraken2_confidence            : Confidence score threshold which must be 
+                                  between 0 and 1. Default: 0.0
+
+--kraken2_quick                 : Quick operation (use first hit or hits). 
+                                  Default: false
+
+--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
+                                  report. Default: false
+
+--kraken2_minimum_base_quality  : Minimum base quality used in classification  
+                                  which is only effective with FASTQ input. 
+                                  Default: 0
+
+--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts 
+                                  are zero. Default: false
+
+--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer 
+                                  count information in addition to normal 
+                                  Kraken report. Default: false
+
+--kraken2_use_names             : Print scientific names instead of just 
+                                  taxids. Default: true
+
+--kraken2_extract_bug           : Extract the reads or contigs belonging to 
+                                  this bug. Default: Escherichia coli
+
+--centrifuge_x                  : Absolute path to centrifuge database. 
+                                  Default: /hpc/db/centrifuge/2022-04-12/ab
+
+--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align. 
+                                  For PAIRED-END reads, save read pairs that 
+                                  did not align concordantly. Default: false
+
+--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For 
+                                  PAIRED-END reads, save read pairs that 
+                                  aligned concordantly. Default: false
+
+--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default: 
+                                  false
+
+--centrifuge_extract_bug        : Extract this bug from centrifuge results. 
+                                  Default: Escherichia coli
+
+--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred 
+                                  scale. Default: false
+
+--flye_pacbio_raw               : Input FASTQ reads are PacBio regular CLR 
+                                  reads (<20% error). Default: false
+
+--flye_pacbio_corr              : Input FASTQ reads are PacBio reads that 
+                                  were corrected with other methods (<3% 
+                                  error). Default: false
+
+--flye_pacbio_hifi              : Input FASTQ reads are PacBio HiFi reads (<1% 
+                                  error). Default: false
+
+--flye_nano_raw                 : Input FASTQ reads are ONT regular reads, 
+                                  pre-Guppy5 (<20% error). Default: true
+
+--flye_nano_corr                : Input FASTQ reads are ONT reads that were 
+                                  corrected with other methods (<3% error). 
+                                  Default: false
+
+--flye_nano_hq                  : Input FASTQ reads are ONT high-quality 
+                                  reads: Guppy5+ SUP or Q20 (<5% error). 
+                                  Default: false
+
+--flye_genome_size              : Estimated genome size (for example, 5m or
+                                  2.6g). Default: 5.5m
+
+--flye_polish_iter              : Number of genome polishing iterations. 
+                                  Default: false
+
+--flye_meta                     : Do a metagenome assembly (uneven coverage 
+                                  mode). Default: true
+
+--flye_min_overlap              : Minimum overlap between reads. Default: 
+                                  false
+
+--flye_scaffold                 : Enable scaffolding using assembly graph. 
+                                  Default: false
+
+--serotypefinder_run            : Run SerotypeFinder tool. Default: true
+
+--serotypefinder_x              : Generate extended output files. Default: 
+                                  true
+
+--serotypefinder_db             : Path to SerotypeFinder databases. Default:
+                                  /hpc/db/serotypefinder/2.0.2
+
+--serotypefinder_min_threshold  : Minimum percent identity (in float) 
+                                  required for calling a hit. Default: 0.85
+
+--serotypefinder_min_cov        : Minimum percent coverage (in float) 
+                                  required for calling a hit. Default: 0.80
+
+--seqsero2_run                  : Run SeqSero2 tool. Default: false
+
+--seqsero2_t                    : '1' for interleaved paired-end reads, '2' 
+                                  for separated paired-end reads, '3' for 
+                                  single reads, '4' for genome assembly, '5' 
+                                  for nanopore reads (fasta/fastq). Default: 
+                                  4
+
+--seqsero2_m                    : Which workflow to apply, 'a'(raw reads 
+                                  allele micro-assembly), 'k'(raw reads and 
+                                  genome assembly k-mer). Default: k
+
+--seqsero2_c                    : SeqSero2 will only output serotype 
+                                  prediction without the directory containing 
+                                  log files. Default: false
+
+--seqsero2_s                    : SeqSero2 will not output header in 
+                                  SeqSero_result.tsv. Default: false
+
+--mlst_run                      : Run MLST tool. Default: true
+
+--mlst_minid                    : DNA %identity of full allele to consider
+                                  'similar' [~]. Default: 95
+
+--mlst_mincov                   : DNA %cov to report partial allele at all [?].
+                                  Default: 10
+
+--mlst_minscore                 : Minimum score out of 100 to match a scheme.
+                                  Default: 50
+
+--abricate_run                  : Run ABRicate tool. Default: true
+
+--abricate_minid                : Minimum DNA %identity. Default: 90
+
+--abricate_mincov               : Minimum DNA %coverage. Default: 80
+
+--abricate_datadir              : ABRicate databases folder. Default:
+                                  /hpc/db/abricate/1.0.1/db
+
+Help options                    : 
+
+--help                          : Display this message.
+```
+
+### **BETA**
+
+---
+The modular structure and flow are under ongoing development and may change as computational requirements and other considerations are assessed.
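
The `--input` help text in the README above states that the input directory should contain only FASTQ files, because every file in it is read. A pre-flight check along the following lines could catch stray files before a run is launched. This is only a sketch: `check_fastq_dir` is a hypothetical helper and not part of CPIPES, and it assumes that only regular, non-hidden files directly inside the directory matter and that they all share one suffix (by default `.fastq.gz`, the documented `--fq_suffix` default).

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of CPIPES): succeed only if every regular,
# non-hidden file directly inside DIR ends with SUFFIX.
# Usage: check_fastq_dir DIR [SUFFIX]
check_fastq_dir() {
    local dir=$1
    local suffix=${2:-.fastq.gz}   # assumed to mirror the --fq_suffix default
    local f bad=0 seen=0

    if [[ ! -d $dir ]]; then
        echo "Not a directory: $dir" >&2
        return 2
    fi

    for f in "$dir"/*; do
        # Skip subdirectories and the literal pattern left by an unmatched glob.
        [[ -f $f ]] || continue
        seen=1
        if [[ $f != *"$suffix" ]]; then
            echo "Unexpected file: $f" >&2
            bad=1
        fi
    done

    if (( seen == 0 )); then
        echo "No files found in: $dir" >&2
        return 1
    fi

    return "$bad"
}
```

One possible use is to gate the launch on the check, for example `check_fastq_dir /path/to/fastq/dir && cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output`. How CPIPES itself treats subdirectories or hidden files is not stated in the README, so the sketch deliberately ignores them.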