diff 0.5.0/readme/centriflaken_hy.md @ 0:97cd2f532efe
planemo upload
author: kkonganti
date: Mon, 31 Mar 2025 14:50:40 -0400
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/readme/centriflaken_hy.md	Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,367 @@
+# CPIPES (CFSAN PIPELINES)
+
+## The modular pipeline repository at CFSAN, FDA
+
+**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
+mostly for bioinformatics data analysis at **CFSAN, FDA.**
+
+---
+
+### **centriflaken_hy**
+
+---
+`centriflaken_hy` is a variant of the original `centriflaken` pipeline, adapted for Illumina short reads (single-end or paired-end).
+
+#### Workflow Usage
+
+```bash
+module load cpipes/0.4.0
+
+cpipes --pipeline centriflaken_hy [options]
+```
+
+Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.
+
+```bash
+cd "/hpc/scratch/$USER"
+mkdir -p nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken_hy \
+  --input /path/to/illumina/fastq/dir \
+  --output /path/to/output \
+  --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
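
The `--input` option reads every file in the input directory. A quick check before launching can therefore catch stray files or unpaired reads. The function below is a minimal sketch and is not part of CPIPES. `check_fastq_dir` is a hypothetical name. By default it assumes file names end in the `--fq_suffix` (`_R1_001.fastq.gz`) and `--fq2_suffix` (`_R2_001.fastq.gz`) values; pass other suffixes as the second and third arguments.

```shell
# Hypothetical pre-flight check (not part of CPIPES). Reports files that end in
# neither FASTQ suffix, R2 files without an R1 mate, and R1 files without an
# R2 mate (the latter would be treated as single-end). Dotfiles are not checked.
check_fastq_dir() {
    local dir="$1"
    local r1_suffix="${2:-_R1_001.fastq.gz}"   # --fq_suffix default
    local r2_suffix="${3:-_R2_001.fastq.gz}"   # --fq2_suffix default
    local f name mate
    for f in "$dir"/*; do
        [[ -e "$f" ]] || continue              # empty directory: glob did not match
        name="$(basename "$f")"
        if [[ "$name" == *"$r1_suffix" ]]; then
            mate="${name%"$r1_suffix"}$r2_suffix"
            [[ -e "$dir/$mate" ]] || echo "no R2 mate (single-end): $name"
        elif [[ "$name" == *"$r2_suffix" ]]; then
            mate="${name%"$r2_suffix"}$r1_suffix"
            [[ -e "$dir/$mate" ]] || echo "R2 without R1: $name"
        else
            echo "unexpected file: $name"
        fi
    done
}
```

Running `check_fastq_dir /path/to/illumina/fastq/dir` prints nothing when every file belongs to a matched R1/R2 pair. An R1 file without a mate is not necessarily an error, because single-end input is auto-detected. It is reported only so you can confirm it was intended.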
+
+Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SerotypeFinder` tool is replaced with the `SeqSero2` tool.
+
+```bash
+cd "/hpc/scratch/$USER"
+mkdir -p nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken_hy \
+  --centrifuge_extract_bug 'Salmonella' \
+  --input /path/to/illumina/fastq/dir \
+  --output /path/to/output \
+  --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
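
The help text below describes `--metadata` as an alternative to `--input`. It takes a CSV with five mandatory columns (`sample,fq1,fq2,strandedness,single_end`), where `fq1` and `fq2` hold absolute paths. The sketch below generates such a file from a directory of FASTQ files. It is an illustration, not a CPIPES utility, and `make_samplesheet` is a hypothetical name. The following choices are all assumptions to check against your CPIPES version:

- writing `single_end` as `true`/`false`;
- leaving `fq2` empty for single-end samples;
- naming each sample after its R1 file with the suffix removed.

```shell
# Hypothetical helper (not part of CPIPES) that writes a --metadata CSV.
# Assumptions: single_end is true/false, fq2 is empty for single-end samples,
# strandedness is "unstranded" (the --fq_strandedness default), and the
# sample name is the R1 file name with the R1 suffix removed.
make_samplesheet() {
    local dir r1_suffix="${2:-_R1_001.fastq.gz}" r2_suffix="${3:-_R2_001.fastq.gz}"
    local r1 r2 sample
    dir="$(cd "$1" && pwd)" || return 1        # fq1/fq2 must be absolute paths
    echo "sample,fq1,fq2,strandedness,single_end"
    for r1 in "$dir"/*"$r1_suffix"; do
        [[ -e "$r1" ]] || continue             # no R1 files: glob did not match
        sample="$(basename "$r1" "$r1_suffix")"
        r2="${r1%"$r1_suffix"}$r2_suffix"
        if [[ -e "$r2" ]]; then
            echo "$sample,$r1,$r2,unstranded,false"
        else
            echo "$sample,$r1,,unstranded,true"
        fi
    done
}
```

For example, run `make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv`, then `cpipes --pipeline centriflaken_hy --metadata "$PWD/samplesheet.csv" --output /path/to/output`. This simple writer does no CSV quoting, so paths containing commas would break it.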
+
+#### `centriflaken_hy` Help
+
+```text
+[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
+N E X T F L O W  ~  version 21.12.1-edge
+Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311
+================================================================================
+             (o)                  
+  ___  _ __   _  _ __    ___  ___ 
+ / __|| '_ \ | || '_ \  / _ \/ __|
+| (__ | |_) || || |_) ||  __/\__ \
+ \___|| .__/ |_|| .__/  \___||___/
+      | |       | |               
+      |_|       |_|
+--------------------------------------------------------------------------------
+A collection of modular pipelines at CFSAN, FDA.
+--------------------------------------------------------------------------------
+Name                            : CPIPES
+Author                          : Kranti.Konganti@fda.hhs.gov
+Version                         : 0.4.0
+Center                          : CFSAN, FDA.
+================================================================================
+
+Workflow                        : centriflaken_hy
+
+Author                          : Kranti.Konganti@fda.hhs.gov
+
+Version                         : 0.4.0
+
+
+Usage                           : cpipes --pipeline centriflaken_hy [options]
+
+
+Required                        : 
+
+--input                         : Absolute path to directory containing FASTQ
+                                  files. The directory should contain only
+                                  FASTQ files as all the files within the
+                                  mentioned directory will be read. Ex:
+                                  --input /path/to/fastq_pass
+
+--output                        : Absolute path to directory where all the
+                                  pipeline outputs should be stored. Ex:
+                                  --output /path/to/output
+
+Other options                   : 
+
+--metadata                      : Absolute path to metadata CSV file
+                                  containing five mandatory columns:
+                                  sample,fq1,fq2,strandedness,single_end. The
+                                  fq1 and fq2 columns contain absolute paths
+                                  to the FASTQ files. This option can be used
+                                  in place of the --input option, though this
+                                  is rare. Ex: --metadata samplesheet.csv
+
+--fq_suffix                     : The suffix of FASTQ files (Unpaired reads 
+                                  or R1 reads or Long reads) if an input 
+                                  directory is mentioned via --input option. 
+                                  Default: _R1_001.fastq.gz
+
+--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads 
+                                  or R2 reads) if an input directory is 
+                                  mentioned via --input option. Default: 
+                                  _R2_001.fastq.gz
+
+--fq_filter_by_len              : Remove FASTQ reads that are less than this 
+                                  many bases. Default: 75
+
+--fq_strandedness               : The strandedness of the sequencing run. 
+                                  This is mostly needed if your sequencing 
+                                  run is RNA-SEQ. For most of the other runs, 
+                                  it is probably safe to use unstranded for 
+                                  the option. Default: unstranded
+
+--fq_single_end                 : SINGLE-END information will be auto-
+                                  detected but this option forces PAIRED-END 
+                                  FASTQ files to be treated as SINGLE-END so 
+                                  only read 1 information is included in auto-
+                                  generated samplesheet. Default: false
+
+--fq_filename_delim             : Delimiter by which the file name is split 
+                                  to obtain sample name. Default: _
+
+--fq_filename_delim_idx         : After splitting FASTQ file name by using 
+                                  the --fq_filename_delim option, all 
+                                  elements before this index (1-based) will 
+                                  be joined to create final sample name. 
+                                  Default: 1
+
+--seqkit_rmdup_run              : Remove duplicate sequences using seqkit 
+                                  rmdup. Default: false
+
+--seqkit_rmdup_n                : Match and remove duplicate sequences by
+                                  full name instead of just ID. Default: false
+
+--seqkit_rmdup_s                : Match and remove duplicate sequences by
+                                  sequence content. Default: true
+
+--seqkit_rmdup_d                : Save the duplicated sequences to a file.
+                                  Default: false
+
+--seqkit_rmdup_D                : Save the number and list of duplicated
+                                  sequences to a file. Default: false
+
+--seqkit_rmdup_i                : Ignore case while using seqkit rmdup.
+                                  Default: false
+
+--seqkit_rmdup_P                : Only consider positive strand (i.e. 5')
+                                  when comparing by sequence content. Default:
+                                  false
+
+--kraken2_db                    : Absolute path to Kraken2 database. Default:
+                                  /hpc/db/kraken2/standard-210914
+
+--kraken2_confidence            : Confidence score threshold which must be 
+                                  between 0 and 1. Default: 0.0
+
+--kraken2_quick                 : Quick operation (use first hit or hits). 
+                                  Default: false
+
+--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
+                                  report. Default: false
+
+--kraken2_minimum_base_quality  : Minimum base quality used in classification  
+                                  which is only effective with FASTQ input. 
+                                  Default: 0
+
+--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts 
+                                  are zero. Default: false
+
+--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer 
+                                  count information in addition to normal 
+                                  Kraken report. Default: false
+
+--kraken2_use_names             : Print scientific names instead of just 
+                                  taxids. Default: true
+
+--kraken2_extract_bug           : Extract the reads or contigs belonging to
+                                  this bug. Default: Escherichia coli
+
+--centrifuge_x                  : Absolute path to centrifuge database. 
+                                  Default: /hpc/db/centrifuge/2022-04-12/ab
+
+--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align. 
+                                  For PAIRED-END reads, save read pairs that 
+                                  did not align concordantly. Default: false
+
+--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For 
+                                  PAIRED-END reads, save read pairs that 
+                                  aligned concordantly. Default: false
+
+--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default: 
+                                  false
+
+--centrifuge_extract_bug        : Extract this bug from centrifuge results. 
+                                  Default: Escherichia coli
+
+--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred 
+                                  scale. Default: false
+
+--megahit_run                   : Run MEGAHIT assembler. Default: true
+
+--megahit_min_count             : <int>. Minimum multiplicity for filtering
+                                  (k_min+1)-mers. Default: false
+
+--megahit_k_list                : Comma-separated list of k-mer sizes. All
+                                  values must be odd, in the range 15-255,
+                                  and the increment should be <= 28. Ex:
+                                  '21,29,39,59,79,99,119,141'. Default: false
+
+--megahit_no_mercy              : Do not add mercy k-mers. Default: false
+
+--megahit_bubble_level          : <int>. Intensity of bubble merging (0-2), 0 
+                                  to disable. Default: false
+
+--megahit_merge_level           : <l,s>. Merge complex bubbles of length <=
+                                  l*kmer_size and similarity >= s. Default:
+                                  false
+
+--megahit_prune_level           : <int>. Strength of low depth pruning (0-3). 
+                                  Default: false
+
+--megahit_prune_depth           : <int>. Remove unitigs with avg k-mer depth 
+                                  less than this value. Default: false
+
+--megahit_low_local_ratio       : <float>. Ratio threshold to define low 
+                                  local coverage contigs. Default: false
+
+--megahit_max_tip_len           : <int>. Remove tips less than this value
+                                  [<int> * k]. Default: false
+
+--megahit_no_local              : Disable local assembly. Default: false
+
+--megahit_kmin_1pass            : Use 1pass mode to build SdBG of k_min. 
+                                  Default: false
+
+--megahit_preset                : <str>. Override a group of parameters.
+                                  Valid values are meta-sensitive, which
+                                  enforces '--min-count 1 --k-list
+                                  21,29,39,49,...,129,141', and meta-large
+                                  (large & complex metagenomes, like soil),
+                                  which enforces '--k-min 27 --k-max 127
+                                  --k-step 10'. Default: meta-sensitive
+
+--megahit_mem_flag              : <int>. SdBG builder memory mode. 0: minimum; 
+                                  1: moderate; 2: use all memory specified. 
+                                  Default: 2
+
+--megahit_min_contig_len        : <int>. Minimum length of contigs to output.
+                                  Default: false
+
+--spades_run                    : Run SPAdes assembler. Default: false
+
+--spades_isolate                : This flag is highly recommended for
+                                  high-coverage isolate and multi-cell data.
+                                  Default: false
+
+--spades_sc                     : This flag is required for MDA (single-cell) 
+                                  data. Default: false
+
+--spades_meta                   : This flag is required for metagenomic data. 
+                                  Default: true
+
+--spades_bio                    : This flag is required for biosyntheticSPAdes
+                                  mode. Default: false
+
+--spades_corona                 : This flag is required for coronaSPAdes mode. 
+                                  Default: false
+
+--spades_rna                    : This flag is required for RNA-Seq data. 
+                                  Default: false
+
+--spades_plasmid                : Runs plasmidSPAdes pipeline for plasmid 
+                                  detection. Default: false
+
+--spades_metaviral              : Runs metaviralSPAdes pipeline for virus 
+                                  detection. Default: false
+
+--spades_metaplasmid            : Runs metaplasmidSPAdes pipeline for plasmid 
+                                  detection in metagenomics datasets. Default: 
+                                  false
+
+--spades_rnaviral               : This flag enables virus assembly module 
+                                  from RNA-Seq data. Default: false
+
+--spades_iontorrent             : This flag is required for IonTorrent data. 
+                                  Default: false
+
+--spades_only_assembler         : Runs only the SPAdes assembler module
+                                  (without read error correction). Default:
+                                  false
+
+--spades_careful                : Tries to reduce the number of mismatches 
+                                  and short indels in the assembly. Default: 
+                                  false
+
+--spades_cov_cutoff             : Coverage cutoff value (a positive float 
+                                  number). Default: false
+
+--spades_k                      : List of k-mer sizes (must be odd and less 
+                                  than 128). Default: false
+
+--spades_hmm                    : Directory with custom hmms that replace the 
+                                  default ones (very rare). Default: false
+
+--serotypefinder_run            : Run SerotypeFinder tool. Default: true
+
+--serotypefinder_x              : Generate extended output files. Default: 
+                                  true
+
+--serotypefinder_db             : Path to SerotypeFinder databases. Default:
+                                  /hpc/db/serotypefinder/2.0.2
+
+--serotypefinder_min_threshold  : Minimum percent identity (in float) 
+                                  required for calling a hit. Default: 0.85
+
+--serotypefinder_min_cov        : Minimum percent coverage (in float)
+                                  required for calling a hit. Default: 0.80
+
+--seqsero2_run                  : Run SeqSero2 tool. Default: false
+
+--seqsero2_t                    : '1' for interleaved paired-end reads, '2' 
+                                  for separated paired-end reads, '3' for 
+                                  single reads, '4' for genome assembly, '5' 
+                                  for nanopore reads (fasta/fastq). Default: 
+                                  4
+
+--seqsero2_m                    : Which workflow to apply, 'a'(raw reads 
+                                  allele micro-assembly), 'k'(raw reads and 
+                                  genome assembly k-mer). Default: k
+
+--seqsero2_c                    : SeqSero2 will only output serotype 
+                                  prediction without the directory containing 
+                                  log files. Default: false
+
+--seqsero2_s                    : SeqSero2 will not output header in 
+                                  SeqSero_result.tsv. Default: false
+
+--mlst_run                      : Run MLST tool. Default: true
+
+--mlst_minid                    : DNA %identity of full allele to consider
+                                  'similar' [~]. Default: 95
+
+--mlst_mincov                   : DNA %cov to report partial allele at all [?].
+                                  Default: 10
+
+--mlst_minscore                 : Minimum score out of 100 to match a scheme.
+                                  Default: 50
+
+--abricate_run                  : Run ABRicate tool. Default: true
+
+--abricate_minid                : Minimum DNA %identity. Default: 90
+
+--abricate_mincov               : Minimum DNA %coverage. Default: 80
+
+--abricate_datadir              : ABRicate databases folder. Default:
+                                  /hpc/db/abricate/1.0.1/db
+
+Help options                    : 
+
+--help                          : Display this message.
+```
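
The `--fq_filename_delim` and `--fq_filename_delim_idx` options control how a sample name is derived from a FASTQ file name. The help wording ("all elements before this index (1-based)") is ambiguous. With the default index of 1, the only reading that yields a non-empty name is that the first *idx* fields are kept. The sketch below assumes that reading. It also assumes the kept fields are re-joined with the same delimiter. `sample_name` is a hypothetical helper, not CPIPES code.

```shell
# Hypothetical illustration (not CPIPES code) of --fq_filename_delim and
# --fq_filename_delim_idx. Assumption: the first <idx> fields of the file
# name are kept and re-joined with the delimiter.
# Note: cut only accepts a single-character delimiter.
sample_name() {
    local file="$1" delim="${2:-_}" idx="${3:-1}"
    basename "$file" | cut -d "$delim" -f "1-$idx"
}

sample_name /data/SAMPLE1_S1_L001_R1_001.fastq.gz        # SAMPLE1
sample_name /data/SAMPLE1_S1_L001_R1_001.fastq.gz _ 2    # SAMPLE1_S1
```

If your files follow a different naming scheme, running a few names through a helper like this is a cheap way to see which index gives the sample names you expect.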
+
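
The help lists constraints for user-supplied k-mer lists:

- `--megahit_k_list` values must be odd, within 15-255, with increments of at most 28.
- `--spades_k` values must be odd and less than 128.

Checking these before submitting a long job is cheap. The function below is a sketch, not part of CPIPES, and `check_k_list` is a hypothetical name. The ascending-order requirement is our own assumption. The lower bound of 1 used for SPAdes is a placeholder, because the help states none.

```shell
# Hypothetical pre-flight check (not part of CPIPES) for comma-separated k-mer
# lists. Arguments: list, min, max, and an optional maximum step between
# consecutive values. Requiring ascending order is an assumption.
check_k_list() {
    local list="$1" min="$2" max="$3" max_step="${4:-}"
    local -a ks
    local k prev=""
    IFS=',' read -r -a ks <<< "$list"
    for k in "${ks[@]}"; do
        if ! [[ "$k" =~ ^[1-9][0-9]*$ ]]; then
            echo "not a positive integer: '$k'"; return 1
        fi
        if (( k % 2 == 0 )); then
            echo "even k-mer size: $k"; return 1
        fi
        if (( k < min || k > max )); then
            echo "out of range [$min, $max]: $k"; return 1
        fi
        if [[ -n "$prev" ]] && (( k <= prev )); then
            echo "not ascending: $prev -> $k"; return 1
        fi
        if [[ -n "$prev" && -n "$max_step" ]] && (( k - prev > max_step )); then
            echo "step $prev -> $k exceeds $max_step"; return 1
        fi
        prev="$k"
    done
}

check_k_list '21,29,39,59,79,99,119,141' 15 255 28 && echo "MEGAHIT k-list ok"
check_k_list '21,33,55,77' 1 127 && echo "SPAdes k-list ok"
```

The regular expression rejects leading zeros on purpose, because bash arithmetic would otherwise read a value such as `021` as octal.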
+### **BETA**
+
+---
+The modular structure and flow of CPIPES are still under development and may change based on ongoing assessment of computational and other considerations.