# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken**

---
Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing Escherichia coli.

#### Workflow Usage

```bash
module load cpipes/0.4.0

cpipes --pipeline centriflaken [options]
```

Example: Run the default `centriflaken` pipeline with taxa of interest as *E. coli*.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

Example: Run the `centriflaken` pipeline with taxa of interest as *Salmonella*. In this mode, `SerotypeFinder` tool will be replaced with `SeqSero2` tool.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

#### `centriflaken` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
N E X T F L O W  ~  version 21.12.1-edge
Launching `/nfs/software/apps/cpipes/0.4.0/cpipes` [crazy_euler] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : CPIPES
Author                          : Kranti.Konganti@fda.hhs.gov
Version                         : 0.4.0
Center                          : CFSAN, FDA.
================================================================================

Workflow                        : centriflaken

Author                          : Kranti.Konganti@fda.hhs.gov

Version                         : 0.2.1


Usage                           : cpipes --pipeline centriflaken [options]


Required                        :

--input                         : Absolute path to directory containing FASTQ
                                  files. The directory should contain only
                                  FASTQ files as all the files within the
                                  mentioned directory will be read. Ex: --
                                  input /path/to/fastq_pass

--output                        : Absolute path to directory where all the
                                  pipeline outputs should be stored. Ex: --
                                  output /path/to/output

Other options                   :

--metadata                      : Absolute path to metadata CSV file
                                  containing five mandatory columns: sample,
                                  fq1,fq2,strandedness,single_end. The fq1
                                  and fq2 columns contain absolute paths to
                                  the FASTQ files. This option can be used in
                                  place of --input option. This is rare. Ex: --
                                  metadata samplesheet.csv

--fq_suffix                     : The suffix of FASTQ files (Unpaired reads
                                  or R1 reads or Long reads) if an input
                                  directory is mentioned via --input option.
                                  Default: .fastq.gz

--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads
                                  or R2 reads) if an input directory is
                                  mentioned via --input option. Default:
                                  false

--fq_filter_by_len              : Remove FASTQ reads that are less than this
                                  many bases. Default: 4000

--fq_strandedness               : The strandedness of the sequencing run.
                                  This is mostly needed if your sequencing
                                  run is RNA-SEQ. For most of the other runs,
                                  it is probably safe to use unstranded for
                                  the option. Default: unstranded

--fq_single_end                 : SINGLE-END information will be auto-
                                  detected but this option forces PAIRED-END
                                  FASTQ files to be treated as SINGLE-END so
                                  only read 1 information is included in auto-
                                  generated samplesheet. Default: false

--fq_filename_delim             : Delimiter by which the file name is split
                                  to obtain sample name. Default: _

--fq_filename_delim_idx         : After splitting FASTQ file name by using
                                  the --fq_filename_delim option, all
                                  elements before this index (1-based) will
                                  be joined to create final sample name.
                                  Default: 1

--kraken2_db                    : Absolute path to kraken database. Default: /
                                  hpc/db/kraken2/standard-210914

--kraken2_confidence            : Confidence score threshold which must be
                                  between 0 and 1. Default: 0.0

--kraken2_quick                 : Quick operation (use first hit or hits).
                                  Default: false

--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
                                  report. Default: false

--kraken2_minimum_base_quality  : Minimum base quality used in classification
                                  which is only effective with FASTQ input.
                                  Default: 0

--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts
                                  are zero. Default: false

--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer
                                  count information in addition to normal
                                  Kraken report. Default: false

--kraken2_use_names             : Print scientific names instead of just
                                  taxids. Default: true

--kraken2_extract_bug           : Extract the reads or contigs beloging to
                                  this bug. Default: Escherichia coli

--centrifuge_x                  : Absolute path to centrifuge database.
                                  Default: /hpc/db/centrifuge/2022-04-12/ab

--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align.
                                  For PAIRED-END reads, save read pairs that
                                  did not align concordantly. Default: false

--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For
                                  PAIRED-END reads, save read pairs that
                                  aligned concordantly. Default: false

--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default:
                                  false

--centrifuge_extract_bug        : Extract this bug from centrifuge results.
                                  Default: Escherichia coli

--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred
                                  scale. Default: false

--flye_pacbio_raw               : Input FASTQ reads are PacBio regular CLR
                                  reads (<20% error) Defaut: false

--flye_pacbio_corr              : Input FASTQ reads are PacBio reads that
                                  were corrected with other methods (<3%
                                  error). Default: false

--flye_pacbio_hifi              : Input FASTQ reads are PacBio HiFi reads (<1%
                                  error). Default: false

--flye_nano_raw                 : Input FASTQ reads are ONT regular reads,
                                  pre-Guppy5 (<20% error). Default: true

--flye_nano_corr                : Input FASTQ reads are ONT reads that were
                                  corrected with other methods (<3% error).
                                  Default: false

--flye_nano_hq                  : Input FASTQ reads are ONT high-quality
                                  reads: Guppy5+ SUP or Q20 (<5% error).
                                  Default: false

--flye_genome_size              : Estimated genome size (for example, 5m or 2.
                                  6g). Default: 5.5m

--flye_polish_iter              : Number of genome polishing iterations.
                                  Default: false

--flye_meta                     : Do a metagenome assembly (unenven coverage
                                  mode). Default: true

--flye_min_overlap              : Minimum overlap between reads. Default:
                                  false

--flye_scaffold                 : Enable scaffolding using assembly graph.
                                  Default: false

--serotypefinder_run            : Run SerotypeFinder tool. Default: true

--serotypefinder_x              : Generate extended output files. Default:
                                  true

--serotypefinder_db             : Path to SerotypeFinder databases. Default: /
                                  hpc/db/serotypefinder/2.0.2

--serotypefinder_min_threshold  : Minimum percent identity (in float)
                                  required for calling a hit. Default: 0.85

--serotypefinder_min_cov        : Minumum percent coverage (in float)
                                  required for calling a hit. Default: 0.80

--seqsero2_run                  : Run SeqSero2 tool. Default: false

--seqsero2_t                    : '1' for interleaved paired-end reads, '2'
                                  for separated paired-end reads, '3' for
                                  single reads, '4' for genome assembly, '5'
                                  for nanopore reads (fasta/fastq). Default:
                                  4

--seqsero2_m                    : Which workflow to apply, 'a'(raw reads
                                  allele micro-assembly), 'k'(raw reads and
                                  genome assembly k-mer). Default: k

--seqsero2_c                    : SeqSero2 will only output serotype
                                  prediction without the directory containing
                                  log files. Default: false

--seqsero2_s                    : SeqSero2 will not output header in
                                  SeqSero_result.tsv. Default: false

--mlst_run                      : Run MLST tool. Default: true

--mlst_minid                    : DNA %identity of full allelle to consider '
                                  similar' [~]. Default: 95

--mlst_mincov                   : DNA %cov to report partial allele at all [?].
                                  Default: 10

--mlst_minscore                 : Minumum score out of 100 to match a scheme.
                                  Default: 50

--abricate_run                  : Run ABRicate tool. Default: true

--abricate_minid                : Minimum DNA %identity. Defaut: 90

--abricate_mincov               : Minimum DNA %coverage. Defaut: 80

--abricate_datadir              : ABRicate databases folder. Defaut: /hpc/db/
                                  abricate/1.0.1/db

Help options                    :

--help                          : Display this message.
```

### **BETA**

---
The development of the modular structure and flow is an ongoing effort and may change depending on assessment of various computational topics and other considerations.
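
The `--metadata` option in the help text above takes a CSV file with five mandatory columns (`sample`, `fq1`, `fq2`, `strandedness`, `single_end`). The following is a minimal sketch of how such a samplesheet could be written by hand for single-end long reads. The helper names `sample_name` and `make_samplesheet` are hypothetical and not part of CPIPES. The values `unstranded` and `true`, the empty `fq2` field, and the reading of `--fq_filename_delim_idx` as "keep the first N fields" are assumptions drawn from the option descriptions and defaults, not confirmed behaviour.

```bash
#!/usr/bin/env bash
# Sketch only: build a samplesheet for --metadata from a directory of
# single-end FASTQ files. Neither function below ships with CPIPES.

# Hypothetical helper: derive a sample name from a FASTQ file name by
# splitting on a single-character delimiter ($2, default "_") and keeping
# the first N fields ($3, default 1). This mirrors one plausible reading
# of --fq_filename_delim and --fq_filename_delim_idx.
sample_name() {
    local base
    base="$(basename "$1")"
    printf '%s\n' "$base" | cut -d "${2:-_}" -f "1-${3:-1}"
}

# Hypothetical helper: print the CSV header plus one row for every file
# in directory $1 whose name ends in suffix $2 (default ".fastq.gz").
make_samplesheet() {
    local dir="$1" suffix="${2:-.fastq.gz}" fq
    echo "sample,fq1,fq2,strandedness,single_end"
    for fq in "$dir"/*"$suffix"; do
        [ -e "$fq" ] || continue  # skip when the glob matches nothing
        # fq2 is left empty; "unstranded" and "true" are assumed values.
        echo "$(sample_name "$fq"),${fq},,unstranded,true"
    done
}
```

With the functions defined in the current shell, `make_samplesheet /path/to/fastq_pass > samplesheet.csv` writes the file, which could then be passed as `--metadata /path/to/samplesheet.csv` in place of `--input`. Pass an absolute directory path, since the help text states that `fq1` and `fq2` hold absolute paths.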