view 0.5.0/readme/centriflaken.md @ 0:97cd2f532efe
planemo upload
author | kkonganti |
---|---|
date | Mon, 31 Mar 2025 14:50:40 -0400 |
# CPIPES (CFSAN PIPELINES)

## The modular pipeline repository at CFSAN, FDA

**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**, mostly for bioinformatics data analysis at **CFSAN, FDA.**

---

### **centriflaken**

---

Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing *Escherichia coli*.

#### Workflow Usage

```bash
module load cpipes/0.4.0

cpipes --pipeline centriflaken [options]
```

Example: Run the default `centriflaken` pipeline with taxa of interest as *E. coli*.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

Example: Run the `centriflaken` pipeline with taxa of interest as *Salmonella*. In this mode, the `SerotypeFinder` tool is replaced with the `SeqSero2` tool.

```bash
cd /hpc/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov'
```

#### `centriflaken` Help

```text
[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help
N E X T F L O W  ~  version 21.12.1-edge
Launching `/nfs/software/apps/cpipes/0.4.0/cpipes` [crazy_euler] - revision: 72db279311
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : CPIPES
Author                          : Kranti.Konganti@fda.hhs.gov
Version                         : 0.4.0
Center                          : CFSAN, FDA.
================================================================================
Workflow                        : centriflaken
Author                          : Kranti.Konganti@fda.hhs.gov
Version                         : 0.2.1

Usage                           : cpipes --pipeline centriflaken [options]

Required                        :

--input                         : Absolute path to directory containing FASTQ
                                  files. The directory should contain only
                                  FASTQ files as all the files within the
                                  mentioned directory will be read. Ex:
                                  --input /path/to/fastq_pass
--output                        : Absolute path to directory where all the
                                  pipeline outputs should be stored. Ex:
                                  --output /path/to/output

Other options                   :

--metadata                      : Absolute path to metadata CSV file
                                  containing five mandatory columns:
                                  sample,fq1,fq2,strandedness,single_end. The
                                  fq1 and fq2 columns contain absolute paths
                                  to the FASTQ files. This option can be used
                                  in place of --input option. This is rare.
                                  Ex: --metadata samplesheet.csv
--fq_suffix                     : The suffix of FASTQ files (Unpaired reads or
                                  R1 reads or Long reads) if an input
                                  directory is mentioned via --input option.
                                  Default: .fastq.gz
--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads
                                  or R2 reads) if an input directory is
                                  mentioned via --input option. Default: false
--fq_filter_by_len              : Remove FASTQ reads that are less than this
                                  many bases. Default: 4000
--fq_strandedness               : The strandedness of the sequencing run. This
                                  is mostly needed if your sequencing run is
                                  RNA-SEQ. For most of the other runs, it is
                                  probably safe to use unstranded for the
                                  option. Default: unstranded
--fq_single_end                 : SINGLE-END information will be auto-detected
                                  but this option forces PAIRED-END FASTQ
                                  files to be treated as SINGLE-END so only
                                  read 1 information is included in
                                  auto-generated samplesheet. Default: false
--fq_filename_delim             : Delimiter by which the file name is split to
                                  obtain sample name. Default: _
--fq_filename_delim_idx         : After splitting FASTQ file name by using the
                                  --fq_filename_delim option, all elements
                                  before this index (1-based) will be joined
                                  to create final sample name.
                                  Default: 1
--kraken2_db                    : Absolute path to kraken database. Default:
                                  /hpc/db/kraken2/standard-210914
--kraken2_confidence            : Confidence score threshold which must be
                                  between 0 and 1. Default: 0.0
--kraken2_quick                 : Quick operation (use first hit or hits).
                                  Default: false
--kraken2_use_mpa_style         : Report output like Kraken 1's
                                  kraken-mpa-report. Default: false
--kraken2_minimum_base_quality  : Minimum base quality used in classification
                                  which is only effective with FASTQ input.
                                  Default: 0
--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts
                                  are zero. Default: false
--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer
                                  count information in addition to normal
                                  Kraken report. Default: false
--kraken2_use_names             : Print scientific names instead of just
                                  taxids. Default: true
--kraken2_extract_bug           : Extract the reads or contigs beloging to
                                  this bug. Default: Escherichia coli
--centrifuge_x                  : Absolute path to centrifuge database.
                                  Default: /hpc/db/centrifuge/2022-04-12/ab
--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align.
                                  For PAIRED-END reads, save read pairs that
                                  did not align concordantly. Default: false
--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For
                                  PAIRED-END reads, save read pairs that
                                  aligned concordantly. Default: false
--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default:
                                  false
--centrifuge_extract_bug        : Extract this bug from centrifuge results.
                                  Default: Escherichia coli
--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred
                                  scale. Default: false
--flye_pacbio_raw               : Input FASTQ reads are PacBio regular CLR
                                  reads (<20% error) Defaut: false
--flye_pacbio_corr              : Input FASTQ reads are PacBio reads that were
                                  corrected with other methods (<3% error).
                                  Default: false
--flye_pacbio_hifi              : Input FASTQ reads are PacBio HiFi reads (<1%
                                  error). Default: false
--flye_nano_raw                 : Input FASTQ reads are ONT regular reads,
                                  pre-Guppy5 (<20% error).
                                  Default: true
--flye_nano_corr                : Input FASTQ reads are ONT reads that were
                                  corrected with other methods (<3% error).
                                  Default: false
--flye_nano_hq                  : Input FASTQ reads are ONT high-quality
                                  reads: Guppy5+ SUP or Q20 (<5% error).
                                  Default: false
--flye_genome_size              : Estimated genome size (for example, 5m or
                                  2.6g). Default: 5.5m
--flye_polish_iter              : Number of genome polishing iterations.
                                  Default: false
--flye_meta                     : Do a metagenome assembly (unenven coverage
                                  mode). Default: true
--flye_min_overlap              : Minimum overlap between reads. Default:
                                  false
--flye_scaffold                 : Enable scaffolding using assembly graph.
                                  Default: false
--serotypefinder_run            : Run SerotypeFinder tool. Default: true
--serotypefinder_x              : Generate extended output files. Default:
                                  true
--serotypefinder_db             : Path to SerotypeFinder databases. Default:
                                  /hpc/db/serotypefinder/2.0.2
--serotypefinder_min_threshold  : Minimum percent identity (in float) required
                                  for calling a hit. Default: 0.85
--serotypefinder_min_cov        : Minumum percent coverage (in float) required
                                  for calling a hit. Default: 0.80
--seqsero2_run                  : Run SeqSero2 tool. Default: false
--seqsero2_t                    : '1' for interleaved paired-end reads, '2'
                                  for separated paired-end reads, '3' for
                                  single reads, '4' for genome assembly, '5'
                                  for nanopore reads (fasta/fastq). Default: 4
--seqsero2_m                    : Which workflow to apply, 'a'(raw reads
                                  allele micro-assembly), 'k'(raw reads and
                                  genome assembly k-mer). Default: k
--seqsero2_c                    : SeqSero2 will only output serotype
                                  prediction without the directory containing
                                  log files. Default: false
--seqsero2_s                    : SeqSero2 will not output header in
                                  SeqSero_result.tsv. Default: false
--mlst_run                      : Run MLST tool. Default: true
--mlst_minid                    : DNA %identity of full allelle to consider
                                  'similar' [~]. Default: 95
--mlst_mincov                   : DNA %cov to report partial allele at all
                                  [?]. Default: 10
--mlst_minscore                 : Minumum score out of 100 to match a scheme.
                                  Default: 50
--abricate_run                  : Run ABRicate tool. Default: true
--abricate_minid                : Minimum DNA %identity.
                                  Defaut: 90
--abricate_mincov               : Minimum DNA %coverage. Defaut: 80
--abricate_datadir              : ABRicate databases folder. Defaut:
                                  /hpc/db/abricate/1.0.1/db

Help options                    :

--help                          : Display this message.
```

### **BETA**

---

The development of the modular structure and flow is an ongoing effort and may change depending on assessment of various computational topics and other considerations.
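
The help text above describes how `--fq_filename_delim` and `--fq_filename_delim_idx` turn a FASTQ file name into a sample name. The sketch below illustrates that rule in plain shell; it is not code from CPIPES. It assumes that "all elements before this index (1-based)" means the first N delimiter-separated fields are kept, so that the default index of 1 yields the first field. The function name `sample_name` and the file name `sampleA_barcode01_pass.fastq.gz` are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative sketch only, not part of CPIPES.
# ASSUMPTION: the first <idx> delimiter-separated fields of the file name
# are joined back together to form the sample name.
sample_name() {
    local file="$1"        # path to a FASTQ file
    local delim="${2:-_}"  # mirrors the --fq_filename_delim default
    local idx="${3:-1}"    # mirrors the --fq_filename_delim_idx default
    # Drop the directory part, then keep fields 1..idx joined by the delimiter.
    basename "$file" | cut -d "$delim" -f "1-${idx}"
}

sample_name /path/to/fastq/dir/sampleA_barcode01_pass.fastq.gz      # prints: sampleA
sample_name /path/to/fastq/dir/sampleA_barcode01_pass.fastq.gz _ 2  # prints: sampleA_barcode01
```

If the pipeline's actual behavior differs from this assumption, the sample names in the auto-generated samplesheet are the authoritative result to check.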