kkonganti@0: # CPIPES (CFSAN PIPELINES)
kkonganti@0: 
kkonganti@0: ## The modular pipeline repository at CFSAN, FDA
kkonganti@0: 
kkonganti@0: **CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
kkonganti@0: mostly for bioinformatics data analysis at **CFSAN, FDA.**
kkonganti@0: 
kkonganti@0: ---
kkonganti@0: 
kkonganti@0: ### **centriflaken_hy**
kkonganti@0: 
kkonganti@0: ---
kkonganti@0: `centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.
kkonganti@0: 
kkonganti@0: #### Workflow Usage
kkonganti@0: 
kkonganti@0: ```bash
kkonganti@47: module load cpipes/0.2.1
kkonganti@0: 
kkonganti@0: cpipes --pipeline centriflaken_hy [options]
kkonganti@0: ```
kkonganti@0: 
kkonganti@0: Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.
kkonganti@0: 
kkonganti@0: ```bash
kkonganti@0: cd /hpc/scratch/$USER
kkonganti@0: mkdir -p nf-cpipes
kkonganti@0: cd nf-cpipes
kkonganti@0: cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
kkonganti@0: ```
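
The `--input` directory is expected to contain only FASTQ files, because every file in it is read. A minimal pre-flight sketch in bash that checks this before a run is shown here. `check_fastq_dir` is a hypothetical helper, not part of `cpipes`, and the list of accepted file extensions is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cpipes): succeed only if DIR is a non-empty
# directory in which every visible entry has a FASTQ-looking extension.
# ASSUMPTION: the accepted extensions below are illustrative, not a cpipes rule.
check_fastq_dir() {
    local dir="$1" f count=0
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
    for f in "$dir"/*; do
        [ -e "$f" ] || continue   # glob matched nothing (empty directory)
        case "$f" in
            *.fastq|*.fq|*.fastq.gz|*.fq.gz) count=$((count + 1)) ;;
            *) echo "Non-FASTQ entry: $f" >&2; return 1 ;;
        esac
    done
    [ "$count" -gt 0 ] || { echo "No FASTQ files in: $dir" >&2; return 1; }
}
```

One possible use is `check_fastq_dir /path/to/illumina/fastq/dir && cpipes --pipeline centriflaken_hy [options]`, so that the pipeline is launched only when the check passes.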
kkonganti@0: 
kkonganti@0: Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SeqSero2` tool replaces the `SerotypeFinder` tool.
kkonganti@0: 
kkonganti@0: ```bash
kkonganti@0: cd /hpc/scratch/$USER
kkonganti@0: mkdir -p nf-cpipes
kkonganti@0: cd nf-cpipes
kkonganti@0: cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
kkonganti@0: ```
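
The two examples above differ only in the `--centrifuge_extract_bug` option. The sketch below assembles the command line for a given taxon and prints it for review instead of running it. `cpipes_cmd` is a hypothetical helper, not part of `cpipes`; it uses only the options shown in the examples above, and it assumes the flag can be omitted when the taxon is the documented default, `Escherichia coli`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cpipes): print the cpipes command line for
# a taxon of interest so that it can be reviewed before it is run.
# Arguments: TAXON INPUT_DIR OUTPUT_DIR EMAIL
cpipes_cmd() {
    local taxon="$1" input="$2" output="$3" email="$4"
    local -a cmd=(cpipes --pipeline centriflaken_hy)
    # ASSUMPTION: "Escherichia coli" is the documented default, so the flag
    # is only added for other taxa.
    if [ "$taxon" != "Escherichia coli" ]; then
        cmd+=(--centrifuge_extract_bug "$taxon")
    fi
    cmd+=(--input "$input" --output "$output" --user_email "$email")
    # %q quotes each argument so the printed line can be pasted into a shell.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}
```

For instance, `cpipes_cmd 'Salmonella' /path/to/illumina/fastq/dir /path/to/output "$USER@example.org"` prints a command equivalent to the *Salmonella* example, where the e-mail address is a placeholder.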
kkonganti@0: 
kkonganti@0: #### `centriflaken_hy` Help
kkonganti@0: 
kkonganti@0: ```text
kkonganti@0: [Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
kkonganti@0: N E X T F L O W  ~  version 21.12.1-edge
kkonganti@0: Launching `/nfs/software/apps/cpipes/0.2.1/cpipes` [wise_noyce] - revision: 72db279311
kkonganti@0: ================================================================================
kkonganti@0:              (o)                  
kkonganti@0:   ___  _ __   _  _ __    ___  ___ 
kkonganti@0:  / __|| '_ \ | || '_ \  / _ \/ __|
kkonganti@0: | (__ | |_) || || |_) ||  __/\__ \
kkonganti@0:  \___|| .__/ |_|| .__/  \___||___/
kkonganti@0:       | |       | |               
kkonganti@0:       |_|       |_|
kkonganti@0: --------------------------------------------------------------------------------
kkonganti@0: A collection of modular pipelines at CFSAN, FDA.
kkonganti@0: --------------------------------------------------------------------------------
kkonganti@0: Name                            : CPIPES
kkonganti@0: Author                          : Kranti.Konganti@fda.hhs.gov
kkonganti@0: Version                         : 0.2.1
kkonganti@0: Center                          : CFSAN, FDA.
kkonganti@0: ================================================================================
kkonganti@0: 
kkonganti@0: Workflow                        : centriflaken_hy
kkonganti@0: 
kkonganti@0: Author                          : Kranti.Konganti@fda.hhs.gov
kkonganti@0: 
kkonganti@0: Version                         : 0.2.0
kkonganti@0: 
kkonganti@0: 
kkonganti@0: Usage                           : cpipes --pipeline centriflaken_hy [options]
kkonganti@0: 
kkonganti@0: 
kkonganti@0: Required                        : 
kkonganti@0: 
kkonganti@0: --input                         : Absolute path to directory containing FASTQ 
kkonganti@0:                                   files. The directory should contain only 
kkonganti@0:                                   FASTQ files as all the files within the 
kkonganti@0:                                   mentioned directory will be read. Ex: --
kkonganti@0:                                   input /path/to/fastq_pass
kkonganti@0: 
kkonganti@0: --output                        : Absolute path to directory where all the 
kkonganti@0:                                   pipeline outputs should be stored. Ex: --
kkonganti@0:                                   output /path/to/output
kkonganti@0: 
kkonganti@0: Other options                   : 
kkonganti@0: 
kkonganti@0: --metadata                      : Absolute path to metadata CSV file 
kkonganti@0:                                   containing five mandatory columns: sample,
kkonganti@0:                                   fq1,fq2,strandedness,single_end. The fq1 
kkonganti@0:                                   and fq2 columns contain absolute paths to 
kkonganti@0:                                   the FASTQ files. This option can be used in 
kkonganti@0:                                   place of --input option. This is rare. Ex: --
kkonganti@0:                                   metadata samplesheet.csv
kkonganti@0: 
kkonganti@0: --fq_suffix                     : The suffix of FASTQ files (Unpaired reads 
kkonganti@0:                                   or R1 reads or Long reads) if an input 
kkonganti@0:                                   directory is mentioned via --input option. 
kkonganti@0:                                   Default: _R1_001.fastq.gz
kkonganti@0: 
kkonganti@0: --fq2_suffix                    : The suffix of FASTQ files (Paired-end reads 
kkonganti@0:                                   or R2 reads) if an input directory is 
kkonganti@0:                                   mentioned via --input option. Default: 
kkonganti@0:                                   _R2_001.fastq.gz
kkonganti@0: 
kkonganti@0: --fq_filter_by_len              : Remove FASTQ reads that are less than this 
kkonganti@0:                                   many bases. Default: 75
kkonganti@0: 
kkonganti@0: --fq_strandedness               : The strandedness of the sequencing run. 
kkonganti@0:                                   This is mostly needed if your sequencing 
kkonganti@0:                                   run is RNA-SEQ. For most of the other runs, 
kkonganti@0:                                   it is probably safe to use unstranded for 
kkonganti@0:                                   the option. Default: unstranded
kkonganti@0: 
kkonganti@0: --fq_single_end                 : SINGLE-END information will be auto-
kkonganti@0:                                   detected but this option forces PAIRED-END 
kkonganti@0:                                   FASTQ files to be treated as SINGLE-END so 
kkonganti@0:                                   only read 1 information is included in auto-
kkonganti@0:                                   generated samplesheet. Default: false
kkonganti@0: 
kkonganti@0: --fq_filename_delim             : Delimiter by which the file name is split 
kkonganti@0:                                   to obtain sample name. Default: _
kkonganti@0: 
kkonganti@0: --fq_filename_delim_idx         : After splitting FASTQ file name by using 
kkonganti@0:                                   the --fq_filename_delim option, all 
kkonganti@0:                                   elements before this index (1-based) will 
kkonganti@0:                                   be joined to create final sample name. 
kkonganti@0:                                   Default: 1
kkonganti@0: 
kkonganti@0: --kraken2_db                    : Absolute path to kraken database. Default: /
kkonganti@0:                                   hpc/db/kraken2/standard-210914
kkonganti@0: 
kkonganti@0: --kraken2_confidence            : Confidence score threshold which must be 
kkonganti@0:                                   between 0 and 1. Default: 0.0
kkonganti@0: 
kkonganti@0: --kraken2_quick                 : Quick operation (use first hit or hits). 
kkonganti@0:                                   Default: false
kkonganti@0: 
kkonganti@0: --kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
kkonganti@0:                                   report. Default: false
kkonganti@0: 
kkonganti@0: --kraken2_minimum_base_quality  : Minimum base quality used in classification  
kkonganti@0:                                   which is only effective with FASTQ input. 
kkonganti@0:                                   Default: 0
kkonganti@0: 
kkonganti@0: --kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts 
kkonganti@0:                                   are zero. Default: false
kkonganti@0: 
kkonganti@0: --kraken2_report_minmizer_data  : Report minimizer and distinct minimizer 
kkonganti@0:                                   count information in addition to normal 
kkonganti@0:                                   Kraken report. Default: false
kkonganti@0: 
kkonganti@0: --kraken2_use_names             : Print scientific names instead of just 
kkonganti@0:                                   taxids. Default: true
kkonganti@0: 
kkonganti@0: --kraken2_extract_bug           : Extract the reads or contigs beloging to 
kkonganti@0:                                   this bug. Default: Escherichia coli
kkonganti@0: 
kkonganti@0: --centrifuge_x                  : Absolute path to centrifuge database. 
kkonganti@0:                                   Default: /hpc/db/centrifuge/2022-04-12/ab
kkonganti@0: 
kkonganti@0: --centrifuge_save_unaligned     : Save SINGLE-END reads that did not align. 
kkonganti@0:                                   For PAIRED-END reads, save read pairs that 
kkonganti@0:                                   did not align concordantly. Default: false
kkonganti@0: 
kkonganti@0: --centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For 
kkonganti@0:                                   PAIRED-END reads, save read pairs that 
kkonganti@0:                                   aligned concordantly. Default: false
kkonganti@0: 
kkonganti@0: --centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default: 
kkonganti@0:                                   false
kkonganti@0: 
kkonganti@0: --centrifuge_extract_bug        : Extract this bug from centrifuge results. 
kkonganti@0:                                   Default: Escherichia coli
kkonganti@0: 
kkonganti@0: --centrifuge_ignore_quals       : Treat all quality values as 30 on Phred 
kkonganti@0:                                   scale. Default: false
kkonganti@0: 
kkonganti@0: --spades_isolate                : This flag is highly recommended for high-
kkonganti@0:                                   coverage isolate and multi-cell data. 
kkonganti@0:                                   Defaut: false
kkonganti@0: 
kkonganti@0: --spades_sc                     : This flag is required for MDA (single-cell) 
kkonganti@0:                                   data. Default: false
kkonganti@0: 
kkonganti@0: --spades_meta                   : This flag is required for metagenomic data. 
kkonganti@0:                                   Default: true
kkonganti@0: 
kkonganti@0: --spades_bio                    : This flag is required for biosytheticSPAdes 
kkonganti@0:                                   mode. Default: false
kkonganti@0: 
kkonganti@0: --spades_corona                 : This flag is required for coronaSPAdes mode. 
kkonganti@0:                                   Default: false
kkonganti@0: 
kkonganti@0: --spades_rna                    : This flag is required for RNA-Seq data. 
kkonganti@0:                                   Default: false
kkonganti@0: 
kkonganti@0: --spades_plasmid                : Runs plasmidSPAdes pipeline for plasmid 
kkonganti@0:                                   detection. Default: false
kkonganti@0: 
kkonganti@0: --spades_metaviral              : Runs metaviralSPAdes pipeline for virus 
kkonganti@0:                                   detection. Default: false
kkonganti@0: 
kkonganti@0: --spades_metaplasmid            : Runs metaplasmidSPAdes pipeline for plasmid 
kkonganti@0:                                   detection in metagenomics datasets. Default: 
kkonganti@0:                                   false
kkonganti@0: 
kkonganti@0: --spades_rnaviral               : This flag enables virus assembly module 
kkonganti@0:                                   from RNA-Seq data. Default: false
kkonganti@0: 
kkonganti@0: --spades_iontorrent             : This flag is required for IonTorrent data. 
kkonganti@0:                                   Default: false
kkonganti@0: 
kkonganti@0: --spades_only_assembler         : Runs only the SPAdes assembler module (
kkonganti@0:                                   without read error correction).Default: 
kkonganti@0:                                   false
kkonganti@0: 
kkonganti@0: --spades_careful                : Tries to reduce the number of mismatches 
kkonganti@0:                                   and short indels in the assembly. Default: 
kkonganti@0:                                   false
kkonganti@0: 
kkonganti@0: --spades_cov_cutoff             : Coverage cutoff value (a positive float 
kkonganti@0:                                   number). Default: false
kkonganti@0: 
kkonganti@0: --spades_k                      : List of k-mer sizes (must be odd and less 
kkonganti@0:                                   than 128). Default: false
kkonganti@0: 
kkonganti@0: --spades_hmm                    : Directory with custom hmms that replace the 
kkonganti@0:                                   default ones (very rare). Default: false
kkonganti@0: 
kkonganti@0: --serotypefinder_run            : Run SerotypeFinder tool. Default: true
kkonganti@0: 
kkonganti@0: --serotypefinder_x              : Generate extended output files. Default: 
kkonganti@0:                                   true
kkonganti@0: 
kkonganti@0: --serotypefinder_db             : Path to SerotypeFinder databases. Default: /
kkonganti@0:                                   hpc/db/serotypefinder/2.0.2
kkonganti@0: 
kkonganti@0: --serotypefinder_min_threshold  : Minimum percent identity (in float) 
kkonganti@0:                                   required for calling a hit. Default: 0.85
kkonganti@0: 
kkonganti@0: --serotypefinder_min_cov        : Minumum percent coverage (in float) 
kkonganti@0:                                   required for calling a hit. Default: 0.80
kkonganti@0: 
kkonganti@0: --seqsero2_run                  : Run SeqSero2 tool. Default: false
kkonganti@0: 
kkonganti@0: --seqsero2_t                    : '1' for interleaved paired-end reads, '2' 
kkonganti@0:                                   for separated paired-end reads, '3' for 
kkonganti@0:                                   single reads, '4' for genome assembly, '5' 
kkonganti@0:                                   for nanopore reads (fasta/fastq). Default: 
kkonganti@0:                                   4
kkonganti@0: 
kkonganti@0: --seqsero2_m                    : Which workflow to apply, 'a'(raw reads 
kkonganti@0:                                   allele micro-assembly), 'k'(raw reads and 
kkonganti@0:                                   genome assembly k-mer). Default: k
kkonganti@0: 
kkonganti@0: --seqsero2_c                    : SeqSero2 will only output serotype 
kkonganti@0:                                   prediction without the directory containing 
kkonganti@0:                                   log files. Default: false
kkonganti@0: 
kkonganti@0: --seqsero2_s                    : SeqSero2 will not output header in 
kkonganti@0:                                   SeqSero_result.tsv. Default: false
kkonganti@0: 
kkonganti@0: --mlst_run                      : Run MLST tool. Default: true
kkonganti@0: 
kkonganti@0: --mlst_minid                    : DNA %identity of full allelle to consider '
kkonganti@0:                                   similar' [~]. Default: 95
kkonganti@0: 
kkonganti@0: --mlst_mincov                   : DNA %cov to report partial allele at all [?].
kkonganti@0:                                   Default: 10
kkonganti@0: 
kkonganti@0: --mlst_minscore                 : Minumum score out of 100 to match a scheme.
kkonganti@0:                                   Default: 50
kkonganti@0: 
kkonganti@0: --abricate_run                  : Run ABRicate tool. Default: true
kkonganti@0: 
kkonganti@0: --abricate_minid                : Minimum DNA %identity. Defaut: 90
kkonganti@0: 
kkonganti@0: --abricate_mincov               : Minimum DNA %coverage. Defaut: 80
kkonganti@0: 
kkonganti@0: --abricate_datadir              : ABRicate databases folder. Defaut: /hpc/db/
kkonganti@0:                                   abricate/1.0.1/db
kkonganti@0: 
kkonganti@0: Help options                    : 
kkonganti@0: 
kkonganti@0: --help                          : Display this message.
kkonganti@0: ```
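
The `--metadata` option takes a CSV file with five mandatory columns: `sample`, `fq1`, `fq2`, `strandedness` and `single_end`. The following bash sketch builds such a file from a directory of FASTQ files using the documented default suffixes. `make_samplesheet` is a hypothetical helper, not part of `cpipes`. The header row, the `true`/`false` values for `single_end`, the empty `fq2` field for unpaired files, and taking the sample name from the text before the first `_` are all assumptions; check them against a samplesheet auto-generated by the pipeline before relying on this.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cpipes): print a samplesheet CSV for the
# FASTQ files found in a directory.
# Arguments: DIR [R1_SUFFIX] [R2_SUFFIX]
make_samplesheet() {
    local dir r1 r2 sample
    local r1_suffix="${2:-_R1_001.fastq.gz}"   # documented --fq_suffix default
    local r2_suffix="${3:-_R2_001.fastq.gz}"   # documented --fq2_suffix default
    dir="$(cd "$1" && pwd)" || return 1        # resolve to an absolute path
    # ASSUMPTION: a header row with the five mandatory column names.
    echo "sample,fq1,fq2,strandedness,single_end"
    for r1 in "$dir"/*"$r1_suffix"; do
        [ -e "$r1" ] || continue               # no R1 files matched
        sample="$(basename "$r1")"
        sample="${sample%%_*}"                 # ASSUMPTION: text before first "_"
        r2="${r1%"$r1_suffix"}$r2_suffix"      # matching R2 file, if any
        if [ -e "$r2" ]; then
            echo "$sample,$r1,$r2,unstranded,false"
        else
            # ASSUMPTION: unpaired files get an empty fq2 and single_end=true.
            echo "$sample,$r1,,unstranded,true"
        fi
    done
}
```

A call such as `make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv` writes the file, which could then be passed via `--metadata samplesheet.csv` in place of `--input`.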
kkonganti@0: 
kkonganti@0: ### **PRE ALPHA**
kkonganti@0: 
kkonganti@0: ---
kkonganti@47: The modular structure and flow are still under active development and may change as computational requirements and other considerations are assessed.
kkonganti@47: