
diff 0.3.0/readme/centriflaken.md @ 92:295c2597a475
"planemo upload"
author: kkonganti
date: Tue, 19 Jul 2022 10:07:24 -0400
children: 8d7f482c64de
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/0.3.0/readme/centriflaken.md	Tue Jul 19 10:07:24 2022 -0400
@@ -0,0 +1,276 @@
+# CPIPES (CFSAN PIPELINES)
+
+## The modular pipeline repository at CFSAN, FDA
+
+**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
+mostly for bioinformatics data analysis at **CFSAN, FDA.**
+
+---
+
+### **centriflaken**
+
+---
+Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing Escherichia coli.
+
+#### Workflow Usage
+
+```bash
+module load cpipes/0.2.1
+
+cpipes --pipeline centriflaken [options]
+```
+
+Example: Run the default `centriflaken` pipeline with *E. coli* as the taxon of interest.
+
+```bash
+cd /hpc/scratch/$USER
+mkdir nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
+
+Example: Run the `centriflaken` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.
+
+```bash
+cd /hpc/scratch/$USER
+mkdir nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
+
+#### `centriflaken` Help
+
+```text
+[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
+N E X T F L O W  ~  version 21.12.1-edge
+Launching `/nfs/software/apps/cpipes/0.2.1/cpipes` [drunk_ptolemy] - revision: 72db279311
+================================================================================
+             (o)                  
+  ___  _ __   _  _ __    ___  ___ 
+ / __|| '_ \ | || '_ \  / _ \/ __|
+| (__ | |_) || || |_) ||  __/\__ \
+ \___|| .__/ |_|| .__/  \___||___/
+      | |       | |               
+      |_|       |_|
+--------------------------------------------------------------------------------
+A collection of modular pipelines at CFSAN, FDA.
+--------------------------------------------------------------------------------
+Name                            : CPIPES
+Author                          : Kranti.Konganti@fda.hhs.gov
+Version                         : 0.2.1
+Center                          : CFSAN, FDA.
+================================================================================
+
+Workflow                        : centriflaken
+
+Author                          : Kranti.Konganti@fda.hhs.gov
+
+Version                         : 0.2.0
+
+
+Usage                           : cpipes --pipeline centriflaken [options]
+
+
+Required                        : 
+
+--input                         : Absolute path to directory containing FASTQ 
+                                  files. The directory should contain only 
+                                  FASTQ files as all the files within the 
+                                  mentioned directory will be read.
+                                  Ex: --input /path/to/fastq_pass
+
+--output                        : Absolute path to directory where all the 
+                                  pipeline outputs should be stored.
+                                  Ex: --output /path/to/output
+
+Other options                   : 
+
+--metadata                      : Absolute path to metadata CSV file 
+                                  containing five mandatory columns: sample,
+                                  fq1,fq2,strandedness,single_end. The fq1 
+                                  and fq2 columns contain absolute paths to 
+                                  the FASTQ files. This option can be used in 
+                                  place of --input option. This is rare.
+                                  Ex: --metadata samplesheet.csv
+
+--fq_suffix                     : The suffix of FASTQ files (Unpaired reads 
+                                  or R1 reads or Long reads) if an input 
+                                  directory is mentioned via --input option. 
+                                  Default: .fastq.gz
+
+--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads 
+                                  or R2 reads) if an input directory is 
+                                  mentioned via --input option. Default: 
+                                  false
+
+--fq_filter_by_len              : Remove FASTQ reads that are less than this 
+                                  many bases. Default: 4000
+
+--fq_strandedness               : The strandedness of the sequencing run. 
+                                  This is mostly needed if your sequencing 
+                                  run is RNA-SEQ. For most of the other runs, 
+                                  it is probably safe to use unstranded for 
+                                  the option. Default: unstranded
+
+--fq_single_end                 : SINGLE-END information will be auto-
+                                  detected but this option forces PAIRED-END 
+                                  FASTQ files to be treated as SINGLE-END so 
+                                  only read 1 information is included in auto-
+                                  generated samplesheet. Default: false
+
+--fq_filename_delim             : Delimiter by which the file name is split 
+                                  to obtain sample name. Default: _
+
+--fq_filename_delim_idx         : After splitting FASTQ file name by using 
+                                  the --fq_filename_delim option, all 
+                                  elements before this index (1-based) will 
+                                  be joined to create final sample name. 
+                                  Default: 1
+
+--kraken2_db                    : Absolute path to kraken database.
+                                  Default: /hpc/db/kraken2/standard-210914
+
+--kraken2_confidence            : Confidence score threshold which must be 
+                                  between 0 and 1. Default: 0.0
+
+--kraken2_quick                 : Quick operation (use first hit or hits). 
+                                  Default: false
+
+--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
+                                  report. Default: false
+
+--kraken2_minimum_base_quality  : Minimum base quality used in classification  
+                                  which is only effective with FASTQ input. 
+                                  Default: 0
+
+--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts 
+                                  are zero. Default: false
+
+--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer 
+                                  count information in addition to normal 
+                                  Kraken report. Default: false
+
+--kraken2_use_names             : Print scientific names instead of just 
+                                  taxids. Default: true
+
+--kraken2_extract_bug           : Extract the reads or contigs belonging to 
+                                  this bug. Default: Escherichia coli
+
+--centrifuge_x                  : Absolute path to centrifuge database. 
+                                  Default: /hpc/db/centrifuge/2022-04-12/ab
+
+--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align. 
+                                  For PAIRED-END reads, save read pairs that 
+                                  did not align concordantly. Default: false
+
+--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For 
+                                  PAIRED-END reads, save read pairs that 
+                                  aligned concordantly. Default: false
+
+--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default: 
+                                  false
+
+--centrifuge_extract_bug        : Extract this bug from centrifuge results. 
+                                  Default: Escherichia coli
+
+--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred 
+                                  scale. Default: false
+
+--flye_pacbio_raw               : Input FASTQ reads are PacBio regular CLR 
+                                  reads (<20% error). Default: false
+
+--flye_pacbio_corr              : Input FASTQ reads are PacBio reads that 
+                                  were corrected with other methods (<3% 
+                                  error). Default: false
+
+--flye_pacbio_hifi              : Input FASTQ reads are PacBio HiFi reads (<1% 
+                                  error). Default: false
+
+--flye_nano_raw                 : Input FASTQ reads are ONT regular reads, 
+                                  pre-Guppy5 (<20% error). Default: true
+
+--flye_nano_corr                : Input FASTQ reads are ONT reads that were 
+                                  corrected with other methods (<3% error). 
+                                  Default: false
+
+--flye_nano_hq                  : Input FASTQ reads are ONT high-quality 
+                                  reads: Guppy5+ SUP or Q20 (<5% error). 
+                                  Default: false
+
+--flye_genome_size              : Estimated genome size (for example, 5m or
+                                  2.6g). Default: 5.5m
+
+--flye_polish_iter              : Number of genome polishing iterations. 
+                                  Default: false
+
+--flye_meta                     : Do a metagenome assembly (uneven coverage 
+                                  mode). Default: true
+
+--flye_min_overlap              : Minimum overlap between reads. Default: 
+                                  false
+
+--flye_scaffold                 : Enable scaffolding using assembly graph. 
+                                  Default: false
+
+--serotypefinder_run            : Run SerotypeFinder tool. Default: true
+
+--serotypefinder_x              : Generate extended output files. Default: 
+                                  true
+
+--serotypefinder_db             : Path to SerotypeFinder databases.
+                                  Default: /hpc/db/serotypefinder/2.0.2
+
+--serotypefinder_min_threshold  : Minimum percent identity (in float) 
+                                  required for calling a hit. Default: 0.85
+
+--serotypefinder_min_cov        : Minimum percent coverage (in float) 
+                                  required for calling a hit. Default: 0.80
+
+--seqsero2_run                  : Run SeqSero2 tool. Default: false
+
+--seqsero2_t                    : '1' for interleaved paired-end reads, '2' 
+                                  for separated paired-end reads, '3' for 
+                                  single reads, '4' for genome assembly, '5' 
+                                  for nanopore reads (fasta/fastq). Default: 
+                                  4
+
+--seqsero2_m                    : Which workflow to apply, 'a'(raw reads 
+                                  allele micro-assembly), 'k'(raw reads and 
+                                  genome assembly k-mer). Default: k
+
+--seqsero2_c                    : SeqSero2 will only output serotype 
+                                  prediction without the directory containing 
+                                  log files. Default: false
+
+--seqsero2_s                    : SeqSero2 will not output header in 
+                                  SeqSero_result.tsv. Default: false
+
+--mlst_run                      : Run MLST tool. Default: true
+
+--mlst_minid                    : DNA %identity of full allele to consider
+                                  'similar' [~]. Default: 95
+
+--mlst_mincov                   : DNA %cov to report partial allele at all [?].
+                                  Default: 10
+
+--mlst_minscore                 : Minimum score out of 100 to match a scheme.
+                                  Default: 50
+
+--abricate_run                  : Run ABRicate tool. Default: true
+
+--abricate_minid                : Minimum DNA %identity. Default: 90
+
+--abricate_mincov               : Minimum DNA %coverage. Default: 80
+
+--abricate_datadir              : ABRicate databases folder.
+                                  Default: /hpc/db/abricate/1.0.1/db
+
+Help options                    : 
+
+--help                          : Display this message.
+```
+
+### **BETA**
+
+---
+The modular structure and flow are under ongoing development and may change depending on assessment of various computational topics and other considerations.
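
The help text above describes how `--fq_filename_delim` and `--fq_filename_delim_idx` turn a FASTQ file name into a sample name. The sketch below illustrates one plausible reading of that rule; it is not taken from the CPIPES source. The function `derive_sample_name` is hypothetical, and it assumes (1) that the index is inclusive, so the first N delimiter-separated elements are kept (with the default of 1, an exclusive reading of "before this index" would give an empty name), and (2) that the `--fq_suffix` value is stripped before splitting. The file names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of cpipes: mimics how a sample name might be
# derived from a FASTQ file name using a delimiter and a 1-based index.
# Assumptions: the index is inclusive and the FASTQ suffix is removed first.
derive_sample_name() {
    local filename="$1"
    local delim="${2:-_}"           # mirrors the --fq_filename_delim default
    local idx="${3:-1}"             # mirrors the --fq_filename_delim_idx default
    local suffix="${4:-.fastq.gz}"  # mirrors the --fq_suffix default
    local base

    base="$(basename "$filename")"  # drop any leading directory
    base="${base%"$suffix"}"        # drop the FASTQ suffix if present

    # Keep fields 1..idx; cut re-joins them with the same delimiter.
    printf '%s\n' "$base" | cut -d "$delim" -f "1-${idx}"
}

# Example calls with made-up file names.
derive_sample_name "sampleA_barcode01_pass.fastq.gz"       # prints: sampleA
derive_sample_name "sampleA_barcode01_pass.fastq.gz" _ 2   # prints: sampleA_barcode01
```

Running a check like this over the input directory before launching the pipeline can show whether two files would collapse into the same sample name under the chosen delimiter and index.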