# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken_hy**

---

`centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.

#### Workflow Usage

```bash
module load cpipes/0.3.0

cpipes --pipeline centriflaken_hy [options]
```

Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.

```bash
cd /hpc/scratch/$USER
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SerotypeFinder` tool is replaced with the `SeqSero2` tool.

```bash
cd /hpc/scratch/$USER
mkdir -p nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

#### `centriflaken_hy` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
N E X T F L O W ~ version 21.12.1-edge
Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [condescending_jang] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                           : CPIPES
Author                         : Kranti.Konganti@fda.hhs.gov
Version                        : 0.3.0
Center                         : CFSAN, FDA.
================================================================================

Workflow                       : centriflaken_hy

Author                         : Kranti.Konganti@fda.hhs.gov

Version                        : 0.3.0


Usage                          : cpipes --pipeline centriflaken_hy [options]


Required                       :

--input                        : Absolute path to directory containing FASTQ
                                 files.
                                 The directory should contain only
                                 FASTQ files as all the files within the
                                 mentioned directory will be read. Ex:
                                 --input /path/to/fastq_pass

--output                       : Absolute path to directory where all the
                                 pipeline outputs should be stored. Ex:
                                 --output /path/to/output

Other options                  :

--metadata                     : Absolute path to metadata CSV file
                                 containing five mandatory columns: sample,
                                 fq1,fq2,strandedness,single_end. The fq1
                                 and fq2 columns contain absolute paths to
                                 the FASTQ files. This option can be used in
                                 place of --input option. This is rare. Ex:
                                 --metadata samplesheet.csv

--fq_suffix                    : The suffix of FASTQ files (Unpaired reads
                                 or R1 reads or Long reads) if an input
                                 directory is mentioned via --input option.
                                 Default: _R1_001.fastq.gz

--fq2_suffix                   : The suffix of FASTQ files (Paired-end reads
                                 or R2 reads) if an input directory is
                                 mentioned via --input option. Default:
                                 _R2_001.fastq.gz

--fq_filter_by_len             : Remove FASTQ reads that are less than this
                                 many bases. Default: 75

--fq_strandedness              : The strandedness of the sequencing run.
                                 This is mostly needed if your sequencing
                                 run is RNA-SEQ. For most of the other runs,
                                 it is probably safe to use unstranded for
                                 the option. Default: unstranded

--fq_single_end                : SINGLE-END information will be auto-detected
                                 but this option forces PAIRED-END FASTQ
                                 files to be treated as SINGLE-END so only
                                 read 1 information is included in
                                 auto-generated samplesheet.
                                 Default: false

--fq_filename_delim            : Delimiter by which the file name is split
                                 to obtain sample name. Default: _

--fq_filename_delim_idx        : After splitting FASTQ file name by using
                                 the --fq_filename_delim option, all
                                 elements before this index (1-based) will
                                 be joined to create final sample name.
                                 Default: 1

--kraken2_db                   : Absolute path to kraken database. Default:
                                 /hpc/db/kraken2/standard-210914

--kraken2_confidence           : Confidence score threshold which must be
                                 between 0 and 1. Default: 0.0

--kraken2_quick                : Quick operation (use first hit or hits).
                                 Default: false

--kraken2_use_mpa_style        : Report output like Kraken 1's
                                 kraken-mpa-report. Default: false

--kraken2_minimum_base_quality : Minimum base quality used in classification
                                 which is only effective with FASTQ input.
                                 Default: 0

--kraken2_report_zero_counts   : Report counts for ALL taxa, even if counts
                                 are zero. Default: false

--kraken2_report_minmizer_data : Report minimizer and distinct minimizer
                                 count information in addition to normal
                                 Kraken report. Default: false

--kraken2_use_names            : Print scientific names instead of just
                                 taxids. Default: true

--kraken2_extract_bug          : Extract the reads or contigs belonging to
                                 this bug. Default: Escherichia coli

--centrifuge_x                 : Absolute path to centrifuge database.
                                 Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned    : Save SINGLE-END reads that did not align.
                                 For PAIRED-END reads, save read pairs that
                                 did not align concordantly. Default: false

--centrifuge_save_aligned      : Save SINGLE-END reads that aligned. For
                                 PAIRED-END reads, save read pairs that
                                 aligned concordantly. Default: false

--centrifuge_out_fmt_sam       : Centrifuge output should be in SAM. Default:
                                 false

--centrifuge_extract_bug       : Extract this bug from centrifuge results.
                                 Default: Escherichia coli

--centrifuge_ignore_quals      : Treat all quality values as 30 on Phred
                                 scale. Default: false

--megahit_run                  : Run MEGAHIT assembler. Default: true

--megahit_min_count            : Minimum multiplicity for filtering
                                 (k_min+1)-mers. Default: false

--megahit_k_list               : Comma-separated list of kmer size. All
                                 values must be odd, in the range 15-255,
                                 increment should be <= 28. Ex:
                                 '21,29,39,59,79,99,119,141'. Default: false

--megahit_no_mercy             : Do not add mercy k-mers. Default: false

--megahit_bubble_level         : Intensity of bubble merging (0-2), 0
                                 to disable. Default: false

--megahit_merge_level          : Merge complex bubbles of length <=
                                 l*kmer_size and similarity >= s. Default:
                                 false

--megahit_prune_level          : Strength of low depth pruning (0-3).
                                 Default: false

--megahit_prune_depth          : Remove unitigs with avg k-mer depth
                                 less than this value. Default: false

--megahit_low_local_ratio      : Ratio threshold to define low
                                 local coverage contigs.
                                 Default: false

--megahit_max_tip_len          : Remove tips less than this value
                                 [<int> * k]. Default: false

--megahit_no_local             : Disable local assembly. Default: false

--megahit_kmin_1pass           : Use 1pass mode to build SdBG of k_min.
                                 Default: false

--megahit_preset               : Override a group of parameters.
                                 Valid values are meta-sensitive which
                                 enforces '--min-count 1 --k-list
                                 21,29,39,49,...,129,141', meta-large (large
                                 & complex metagenomes, like soil) which
                                 enforces '--k-min 27 --k-max 127 --k-step
                                 10'. Default: meta-sensitive

--megahit_mem_flag             : SdBG builder memory mode. 0: minimum;
                                 1: moderate; 2: use all memory specified.
                                 Default: 2

--megahit_min_contig_len       : Minimum length of contigs to output.
                                 Default: false

--spades_run                   : Run SPAdes assembler. Default: false

--spades_isolate               : This flag is highly recommended for
                                 high-coverage isolate and multi-cell data.
                                 Default: false

--spades_sc                    : This flag is required for MDA (single-cell)
                                 data. Default: false

--spades_meta                  : This flag is required for metagenomic data.
                                 Default: true

--spades_bio                   : This flag is required for
                                 biosyntheticSPAdes mode. Default: false

--spades_corona                : This flag is required for coronaSPAdes mode.
                                 Default: false

--spades_rna                   : This flag is required for RNA-Seq data.
                                 Default: false

--spades_plasmid               : Runs plasmidSPAdes pipeline for plasmid
                                 detection.
                                 Default: false

--spades_metaviral             : Runs metaviralSPAdes pipeline for virus
                                 detection. Default: false

--spades_metaplasmid           : Runs metaplasmidSPAdes pipeline for plasmid
                                 detection in metagenomics datasets. Default:
                                 false

--spades_rnaviral              : This flag enables virus assembly module
                                 from RNA-Seq data. Default: false

--spades_iontorrent            : This flag is required for IonTorrent data.
                                 Default: false

--spades_only_assembler        : Runs only the SPAdes assembler module
                                 (without read error correction). Default:
                                 false

--spades_careful               : Tries to reduce the number of mismatches
                                 and short indels in the assembly. Default:
                                 false

--spades_cov_cutoff            : Coverage cutoff value (a positive float
                                 number). Default: false

--spades_k                     : List of k-mer sizes (must be odd and less
                                 than 128). Default: false

--spades_hmm                   : Directory with custom hmms that replace the
                                 default ones (very rare). Default: false

--serotypefinder_run           : Run SerotypeFinder tool. Default: true

--serotypefinder_x             : Generate extended output files. Default:
                                 true

--serotypefinder_db            : Path to SerotypeFinder databases. Default:
                                 /hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold : Minimum percent identity (in float)
                                 required for calling a hit. Default: 0.85

--serotypefinder_min_cov       : Minimum percent coverage (in float)
                                 required for calling a hit.
                                 Default: 0.80

--seqsero2_run                 : Run SeqSero2 tool. Default: false

--seqsero2_t                   : '1' for interleaved paired-end reads, '2'
                                 for separated paired-end reads, '3' for
                                 single reads, '4' for genome assembly, '5'
                                 for nanopore reads (fasta/fastq). Default:
                                 4

--seqsero2_m                   : Which workflow to apply, 'a'(raw reads
                                 allele micro-assembly), 'k'(raw reads and
                                 genome assembly k-mer). Default: k

--seqsero2_c                   : SeqSero2 will only output serotype
                                 prediction without the directory containing
                                 log files. Default: false

--seqsero2_s                   : SeqSero2 will not output header in
                                 SeqSero_result.tsv. Default: false

--mlst_run                     : Run MLST tool. Default: true

--mlst_minid                   : DNA %identity of full allele to consider
                                 'similar' [~]. Default: 95

--mlst_mincov                  : DNA %cov to report partial allele at all [?].
                                 Default: 10

--mlst_minscore                : Minimum score out of 100 to match a scheme.
                                 Default: 50

--abricate_run                 : Run ABRicate tool. Default: true

--abricate_minid               : Minimum DNA %identity. Default: 90

--abricate_mincov              : Minimum DNA %coverage. Default: 80

--abricate_datadir             : ABRicate databases folder. Default:
                                 /hpc/db/abricate/1.0.1/db

Help options                   :

--help                         : Display this message.
```

### **BETA**

---

The development of the modular structure and flow is an ongoing effort; both may change as various computational and other considerations are assessed.
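
#### Sample names from FASTQ file names

The `--fq_filename_delim` and `--fq_filename_delim_idx` options in the `centriflaken_hy` help text control how a sample name is derived from each FASTQ file name. The minimal shell sketch below approximates that rule so sample names can be previewed before a run. It is not part of CPIPES: `derive_sample_name` and the example file name are hypothetical. The help text says elements "before this index (1-based)" are joined, yet the default index of 1 must still produce a name, so the sketch assumes fields 1 through the index are kept.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of CPIPES: approximates how a sample name
# might be derived from a FASTQ file name. ASSUMPTION: fields 1..IDX
# (1-based) of the split file name are kept and re-joined with DELIM.
derive_sample_name() {
  local fq_file="$1"
  local delim="${2:-_}"   # mirrors the --fq_filename_delim default
  local idx="${3:-1}"     # mirrors the --fq_filename_delim_idx default

  # Drop the directory part, then keep fields 1..idx; cut re-joins the
  # kept fields with the same single-character delimiter.
  basename -- "$fq_file" | cut -d "$delim" -f "1-${idx}"
}

derive_sample_name /path/to/fastq/SampleA_S1_L001_R1_001.fastq.gz        # SampleA
derive_sample_name /path/to/fastq/SampleA_S1_L001_R1_001.fastq.gz '_' 2  # SampleA_S1
```

`cut -d` accepts only a single-character delimiter, so this sketch would not handle a multi-character `--fq_filename_delim`; the pipeline's own implementation may behave differently.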