author kkonganti
date Thu, 27 Jun 2024 14:17:26 -0400
# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken_hy**

---
`centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.

#### Workflow Usage

```bash
module load cpipes/0.4.0

cpipes --pipeline centriflaken_hy [options]
```

Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```
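
The help text for `--input` notes that every file in the input directory is read, so the directory should contain only FASTQ files. A pre-flight check along the following lines may catch stray files before a run is launched. This is a minimal sketch: `non_fastq_entries` is a hypothetical helper that is not part of CPIPES, and the list of accepted extensions is an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of cpipes): print every entry in a directory
# that does not look like a FASTQ file.
# Assumption: FASTQ files end in .fastq, .fq, .fastq.gz or .fq.gz.
non_fastq_entries() {
    local dir="$1" entry
    # Visit regular and hidden entries directly inside the directory.
    for entry in "$dir"/* "$dir"/.[!.]*; do
        [ -e "$entry" ] || continue              # skip unmatched glob patterns
        case "$entry" in
            *.fastq|*.fq|*.fastq.gz|*.fq.gz)
                [ -f "$entry" ] && continue ;;   # a FASTQ file (or symlink to one)
        esac
        printf '%s\n' "$entry"                   # anything else is reported
    done
}

# Example usage:
#   if [ -n "$(non_fastq_entries /path/to/illumina/fastq/dir)" ]; then
#       echo "Input directory contains non-FASTQ entries." >&2
#   fi
```

An empty result means that every entry matched one of the assumed extensions. It does not validate the contents of the files.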

#### `centriflaken_hy` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
N E X T F L O W  ~  version 21.12.1-edge
Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311
================================================================================
             (o)                  
  ___  _ __   _  _ __    ___  ___ 
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |               
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : CPIPES
Author                          : Kranti.Konganti@fda.hhs.gov
Version                         : 0.4.0
Center                          : CFSAN, FDA.
================================================================================

Workflow                        : centriflaken_hy

Author                          : Kranti.Konganti@fda.hhs.gov

Version                         : 0.4.0


Usage                           : cpipes --pipeline centriflaken_hy [options]


Required                        : 

--input                         : Absolute path to directory containing FASTQ 
                                  files. The directory should contain only 
                                  FASTQ files as all the files within the 
                                  mentioned directory will be read. Ex: --
                                  input /path/to/fastq_pass

--output                        : Absolute path to directory where all the 
                                  pipeline outputs should be stored. Ex: --
                                  output /path/to/output

Other options                   : 

--metadata                      : Absolute path to metadata CSV file 
                                  containing five mandatory columns: sample,
                                  fq1,fq2,strandedness,single_end. The fq1 
                                  and fq2 columns contain absolute paths to 
                                  the FASTQ files. This option can be used in 
                                  place of --input option. This is rare. Ex: --
                                  metadata samplesheet.csv

--fq_suffix                     : The suffix of FASTQ files (Unpaired reads 
                                  or R1 reads or Long reads) if an input 
                                  directory is mentioned via --input option. 
                                  Default: _R1_001.fastq.gz

--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads 
                                  or R2 reads) if an input directory is 
                                  mentioned via --input option. Default: 
                                  _R2_001.fastq.gz

--fq_filter_by_len              : Remove FASTQ reads that are less than this 
                                  many bases. Default: 75

--fq_strandedness               : The strandedness of the sequencing run. 
                                  This is mostly needed if your sequencing 
                                  run is RNA-SEQ. For most of the other runs, 
                                  it is probably safe to use unstranded for 
                                  the option. Default: unstranded

--fq_single_end                 : SINGLE-END information will be auto-
                                  detected but this option forces PAIRED-END 
                                  FASTQ files to be treated as SINGLE-END so 
                                  only read 1 information is included in auto-
                                  generated samplesheet. Default: false

--fq_filename_delim             : Delimiter by which the file name is split 
                                  to obtain sample name. Default: _

--fq_filename_delim_idx         : After splitting FASTQ file name by using 
                                  the --fq_filename_delim option, all 
                                  elements before this index (1-based) will 
                                  be joined to create final sample name. 
                                  Default: 1

--seqkit_rmdup_run              : Remove duplicate sequences using seqkit 
                                  rmdup. Default: false

--seqkit_rmdup_n                : Match and remove duplicate sequences by 
                                  full name instead of just ID. Default: false

--seqkit_rmdup_s                : Match and remove duplicate sequences by 
                                  sequence content. Default: true

--seqkit_rmdup_d                : Save the duplicated sequences to a file. 
                                  Default: false

--seqkit_rmdup_D                : Save the number and list of duplicated 
                                  sequences to a file. Default: false

--seqkit_rmdup_i                : Ignore case while using seqkit rmdup. 
                                  Default: false

--seqkit_rmdup_P                : Only consider positive strand (i.e. 5') 
                                  when comparing by sequence content. Default: 
                                  false

--kraken2_db                    : Absolute path to kraken database. Default: /
                                  hpc/db/kraken2/standard-210914

--kraken2_confidence            : Confidence score threshold which must be 
                                  between 0 and 1. Default: 0.0

--kraken2_quick                 : Quick operation (use first hit or hits). 
                                  Default: false

--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
                                  report. Default: false

--kraken2_minimum_base_quality  : Minimum base quality used in classification  
                                  which is only effective with FASTQ input. 
                                  Default: 0

--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts 
                                  are zero. Default: false

--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer 
                                  count information in addition to normal 
                                  Kraken report. Default: false

--kraken2_use_names             : Print scientific names instead of just 
                                  taxids. Default: true

--kraken2_extract_bug           : Extract the reads or contigs belonging to 
                                  this bug. Default: Escherichia coli

--centrifuge_x                  : Absolute path to centrifuge database. 
                                  Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align. 
                                  For PAIRED-END reads, save read pairs that 
                                  did not align concordantly. Default: false

--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For 
                                  PAIRED-END reads, save read pairs that 
                                  aligned concordantly. Default: false

--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default: 
                                  false

--centrifuge_extract_bug        : Extract this bug from centrifuge results. 
                                  Default: Escherichia coli

--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred 
                                  scale. Default: false

--megahit_run                   : Run MEGAHIT assembler. Default: true

--megahit_min_count             : <int>. Minimum multiplicity for filtering (
                                  k_min+1)-mers. Default: false

--megahit_k_list                : Comma-separated list of kmer size. All 
                                  values must be odd, in the range 15-255, 
                                  increment should be <= 28. Ex: '21,29,39,59,
                                  79,99,119,141'. Default: false

--megahit_no_mercy              : Do not add mercy k-mers. Default: false

--megahit_bubble_level          : <int>. Intensity of bubble merging (0-2), 0 
                                  to disable. Default: false

--megahit_merge_level           : <l,s>. Merge complex bubbles of length <= l*
                                  kmer_size and similarity >= s. Default: 
                                  false

--megahit_prune_level           : <int>. Strength of low depth pruning (0-3). 
                                  Default: false

--megahit_prune_depth           : <int>. Remove unitigs with avg k-mer depth 
                                  less than this value. Default: false

--megahit_low_local_ratio       : <float>. Ratio threshold to define low 
                                  local coverage contigs. Default: false

--megahit_max_tip_len           : <int>. remove tips less than this value [<
                                  int> * k]. Default: false

--megahit_no_local              : Disable local assembly. Default: false

--megahit_kmin_1pass            : Use 1pass mode to build SdBG of k_min. 
                                  Default: false

--megahit_preset                : <str>. Override a group of parameters. 
                                  Valid values are meta-sensitive which 
                                  enforces '--min-count 1 --k-list 21,29,39,
                                  49,...,129,141', meta-large (large & 
                                  complex metagenomes, like soil) which 
                                  enforces '--k-min 27 --k-max 127 --k-step 
                                  10'. Default: meta-sensitive

--megahit_mem_flag              : <int>. SdBG builder memory mode. 0: minimum; 
                                  1: moderate; 2: use all memory specified. 
                                  Default: 2

--megahit_min_contig_len        : <int>.  Minimum length of contigs to output. 
                                  Default: false

--spades_run                    : Run SPAdes assembler. Default: false

--spades_isolate                : This flag is highly recommended for high-
                                  coverage isolate and multi-cell data. 
                                  Default: false

--spades_sc                     : This flag is required for MDA (single-cell) 
                                  data. Default: false

--spades_meta                   : This flag is required for metagenomic data. 
                                  Default: true

--spades_bio                    : This flag is required for biosyntheticSPAdes 
                                  mode. Default: false

--spades_corona                 : This flag is required for coronaSPAdes mode. 
                                  Default: false

--spades_rna                    : This flag is required for RNA-Seq data. 
                                  Default: false

--spades_plasmid                : Runs plasmidSPAdes pipeline for plasmid 
                                  detection. Default: false

--spades_metaviral              : Runs metaviralSPAdes pipeline for virus 
                                  detection. Default: false

--spades_metaplasmid            : Runs metaplasmidSPAdes pipeline for plasmid 
                                  detection in metagenomics datasets. Default: 
                                  false

--spades_rnaviral               : This flag enables virus assembly module 
                                  from RNA-Seq data. Default: false

--spades_iontorrent             : This flag is required for IonTorrent data. 
                                  Default: false

--spades_only_assembler         : Runs only the SPAdes assembler module (
                                  without read error correction). Default: 
                                  false

--spades_careful                : Tries to reduce the number of mismatches 
                                  and short indels in the assembly. Default: 
                                  false

--spades_cov_cutoff             : Coverage cutoff value (a positive float 
                                  number). Default: false

--spades_k                      : List of k-mer sizes (must be odd and less 
                                  than 128). Default: false

--spades_hmm                    : Directory with custom hmms that replace the 
                                  default ones (very rare). Default: false

--serotypefinder_run            : Run SerotypeFinder tool. Default: true

--serotypefinder_x              : Generate extended output files. Default: 
                                  true

--serotypefinder_db             : Path to SerotypeFinder databases. Default: /
                                  hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold  : Minimum percent identity (in float) 
                                  required for calling a hit. Default: 0.85

--serotypefinder_min_cov        : Minimum percent coverage (in float) 
                                  required for calling a hit. Default: 0.80

--seqsero2_run                  : Run SeqSero2 tool. Default: false

--seqsero2_t                    : '1' for interleaved paired-end reads, '2' 
                                  for separated paired-end reads, '3' for 
                                  single reads, '4' for genome assembly, '5' 
                                  for nanopore reads (fasta/fastq). Default: 
                                  4

--seqsero2_m                    : Which workflow to apply, 'a'(raw reads 
                                  allele micro-assembly), 'k'(raw reads and 
                                  genome assembly k-mer). Default: k

--seqsero2_c                    : SeqSero2 will only output serotype 
                                  prediction without the directory containing 
                                  log files. Default: false

--seqsero2_s                    : SeqSero2 will not output header in 
                                  SeqSero_result.tsv. Default: false

--mlst_run                      : Run MLST tool. Default: true

--mlst_minid                    : DNA %identity of full allele to consider '
                                  similar' [~]. Default: 95

--mlst_mincov                   : DNA %cov to report partial allele at all [?].
                                  Default: 10

--mlst_minscore                 : Minimum score out of 100 to match a scheme.
                                  Default: 50

--abricate_run                  : Run ABRicate tool. Default: true

--abricate_minid                : Minimum DNA %identity. Default: 90

--abricate_mincov               : Minimum DNA %coverage. Default: 80

--abricate_datadir              : ABRicate databases folder. Default: /hpc/db/
                                  abricate/1.0.1/db

Help options                    : 

--help                          : Display this message.
```
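
The `--metadata` option expects a CSV file with the columns `sample,fq1,fq2,strandedness,single_end`. The sketch below shows one way such a file might be assembled by hand from a directory of FASTQ files, using the documented defaults of `--fq_suffix`, `--fq2_suffix`, `--fq_filename_delim` and `--fq_filename_delim_idx`. `make_samplesheet` is a hypothetical helper and not part of CPIPES. Three details are assumptions, not confirmed behavior of the pipeline: the sample name keeps the first `idx` delimiter-separated fields of the file name, `single_end` is written as `true` or `false`, and `fq2` is left empty for single-end samples.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of cpipes): print a samplesheet CSV for the
# FASTQ files found in a directory. Pass an absolute directory path, since
# the fq1 and fq2 columns are documented as absolute paths.
make_samplesheet() {
    local dir="$1"
    local r1_suffix="${2:-_R1_001.fastq.gz}"   # documented default of --fq_suffix
    local r2_suffix="${3:-_R2_001.fastq.gz}"   # documented default of --fq2_suffix
    local delim="${4:-_}"                      # documented default of --fq_filename_delim
    local idx="${5:-1}"                        # documented default of --fq_filename_delim_idx
    local r1 r2 sample

    echo "sample,fq1,fq2,strandedness,single_end"
    for r1 in "$dir"/*"$r1_suffix"; do
        [ -f "$r1" ] || continue               # skip an unmatched glob pattern
        # Derive the expected R2 path by swapping the suffix.
        r2="${r1%"$r1_suffix"}$r2_suffix"
        # Assumption: the sample name is the first $idx fields of the file name.
        sample=$(basename "$r1" | cut -d "$delim" -f "1-$idx")
        if [ -f "$r2" ]; then
            echo "$sample,$r1,$r2,unstranded,false"
        else
            # Assumption: single-end rows leave fq2 empty.
            echo "$sample,$r1,,unstranded,true"
        fi
    done
}

# Example usage:
#   make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv
```

In most cases `--input` is simpler, because the samplesheet is then generated automatically. The help text itself describes the use of `--metadata` as rare.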

### **BETA**

---
The modular structure and flow are under ongoing development and may change as various computational topics and other considerations are assessed.