cfsan_centriflaken: 0.4.0/readme/centriflaken

comparison 0.4.0/readme/centriflaken_hy.md @ 101:ce6d9548fe89

"planemo upload"

author	kkonganti
date	Thu, 04 Aug 2022 10:45:55 -0400

-:9d9537c907bd
+:ce6d9548fe89
+# CPIPES (CFSAN PIPELINES)
+## The modular pipeline repository at CFSAN, FDA
+**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**,
+mostly for bioinformatics data analysis at **CFSAN, FDA.**
+---
+### **centriflaken_hy**
+---
+`centriflaken_hy` is a variant of the original `centriflaken` pipeline for Illumina short reads, either single-end or paired-end.
+#### Workflow Usage
+```bash
+module load cpipes/0.4.0
+cpipes --pipeline centriflaken_hy [options]
+```
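The `--input` description in the help output says that every file in the input directory is read, so the directory should contain only FASTQ files. The sketch below is a hypothetical pre-flight check (`check_fastq_dir` is not part of cpipes) that flags anything else before a run is launched. The list of accepted extensions (`.fastq` and `.fq`, with or without `.gz`) is an assumption, so adjust it to your data.

```bash
# Hypothetical pre-flight check (not part of cpipes): --input reads every file
# in the directory, so report entries that do not look like FASTQ files.
# Assumption: FASTQ files end in .fastq, .fq, .fastq.gz or .fq.gz.
check_fastq_dir() {
  local dir="$1" f bad=0
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # the glob matched nothing (empty directory)
    case "$f" in
      *.fastq|*.fq|*.fastq.gz|*.fq.gz) ;;   # looks like FASTQ, so accept it
      *) echo "non-FASTQ entry: $f" >&2; bad=1 ;;
    esac
  done
  return "$bad"   # 0 if the directory is clean, 1 otherwise
}
```

You could chain it before a run, for example `check_fastq_dir /path/to/illumina/fastq/dir && cpipes --pipeline centriflaken_hy ...`. Hidden files are not matched by the `*` glob, so this sketch does not check them.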
+Example: Run the default `centriflaken_hy` pipeline with *E. coli* as the taxon of interest.
+```bash
+cd /hpc/scratch/$USER
+mkdir -p nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken_hy \
+  --input /path/to/illumina/fastq/dir \
+  --output /path/to/output \
+  --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
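When `--input` is used, sample names are derived from the FASTQ file names using `--fq_filename_delim` (default `_`) and `--fq_filename_delim_idx` (default `1`). The help wording ("all elements before this index") is ambiguous. The hypothetical `sample_name` function below is therefore only a sketch. It assumes that fields 1 through the index are kept and re-joined with the delimiter, so the defaults keep the text before the first `_`.

```bash
# Hypothetical illustration (not cpipes code) of sample-name derivation.
# Assumption: fields 1..idx (1-based) are kept and re-joined with the delimiter.
# Note: cut only accepts a single-character delimiter.
sample_name() {
  local file delim="${2:-_}" idx="${3:-1}"
  file=$(basename "$1")                      # drop any leading directory
  printf '%s\n' "$file" | cut -d "$delim" -f "1-$idx"
}
```

Under this assumption, `sample_name SampleA_S1_L001_R1_001.fastq.gz` prints `SampleA`. Passing `_ 2` for `Sample_A_S1_L001_R1_001.fastq.gz` prints `Sample_A`.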
+Example: Run the `centriflaken_hy` pipeline with *Salmonella* as the taxon of interest. In this mode, the `SerotypeFinder` tool is replaced with the `SeqSero2` tool.
+```bash
+cd /hpc/scratch/$USER
+mkdir -p nf-cpipes
+cd nf-cpipes
+cpipes --pipeline centriflaken_hy \
+  --centrifuge_extract_bug 'Salmonella' \
+  --input /path/to/illumina/fastq/dir \
+  --output /path/to/output \
+  --user_email 'Kranti.Konganti@fda.hhs.gov'
+```
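The examples above pass a directory with `--input`. The help output also lists `--metadata`, a rarely needed alternative that takes a CSV file with five mandatory columns (`sample`, `fq1`, `fq2`, `strandedness`, `single_end`), where `fq1` and `fq2` hold absolute FASTQ paths. The hypothetical `make_samplesheet` helper below sketches how such a file could be built from a directory using the default `--fq_suffix` and `--fq2_suffix` values. Several details are assumptions, not documented behavior:

- the sample-name rule;
- the `unstranded` value;
- the `true`/`false` spelling for `single_end`;
- leaving `fq2` empty for single-end samples.

```bash
# Hypothetical helper (not part of cpipes): print a samplesheet CSV for --metadata.
# Assumptions: sample name = text before the first "_"; strandedness is
# "unstranded"; single-end rows leave fq2 empty and set single_end to true.
make_samplesheet() {
  local fq_dir fq_suffix="${2:-_R1_001.fastq.gz}" fq2_suffix="${3:-_R2_001.fastq.gz}"
  local fq1 fq2 base sample
  fq_dir=$(cd "$1" && pwd)   # fq1/fq2 must be absolute paths
  echo "sample,fq1,fq2,strandedness,single_end"
  for fq1 in "$fq_dir"/*"$fq_suffix"; do
    [ -e "$fq1" ] || continue                  # no R1 files matched
    base=$(basename "$fq1")
    sample="${base%%_*}"
    fq2="$fq_dir/${base%"$fq_suffix"}$fq2_suffix"   # swap R1 suffix for R2 suffix
    if [ -e "$fq2" ]; then
      echo "$sample,$fq1,$fq2,unstranded,false"
    else
      echo "$sample,$fq1,,unstranded,true"
    fi
  done
}
```

For example, run `make_samplesheet /path/to/illumina/fastq/dir > samplesheet.csv`. Then run `cpipes --pipeline centriflaken_hy --metadata /path/to/samplesheet.csv --output /path/to/output`.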
+#### `centriflaken_hy` Help
+```text
+[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help
+N E X T F L O W  ~  version 21.12.1-edge
+Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311
+================================================================================
+             (o)
+  ___  _ __   _  _ __    ___  ___
+ / __|| '_ \ | || '_ \  / _ \/ __|
+| (__ | |_) || || |_) ||  __/\__ \
+ \___|| .__/ |_|| .__/  \___||___/
+      | |       | |
+      |_|       |_|
+--------------------------------------------------------------------------------
+A collection of modular pipelines at CFSAN, FDA.
+--------------------------------------------------------------------------------
+Name                            : CPIPES
+Author                          : Kranti.Konganti@fda.hhs.gov
+Version                         : 0.4.0
+Center                          : CFSAN, FDA.
+================================================================================
+Workflow                        : centriflaken_hy
+Author                          : Kranti.Konganti@fda.hhs.gov
+Version                         : 0.4.0
+Usage                           : cpipes --pipeline centriflaken_hy [options]
+Required                        :
+--input                         : Absolute path to directory containing FASTQ
+                                  files. The directory should contain only
+                                  FASTQ files as all the files within the
+                                  mentioned directory will be read. Ex: --
+                                  input /path/to/fastq_pass
+--output                        : Absolute path to directory where all the
+                                  pipeline outputs should be stored. Ex: --
+                                  output /path/to/output
+Other options                   :
+--metadata                      : Absolute path to metadata CSV file
+                                  containing five mandatory columns: sample,
+                                  fq1,fq2,strandedness,single_end. The fq1
+                                  and fq2 columns contain absolute paths to
+                                  the FASTQ files. This option can be used in
+                                  place of --input option. This is rare. Ex: --
+                                  metadata samplesheet.csv
+--fq_suffix                     : The suffix of FASTQ files (Unpaired reads
+                                  or R1 reads or Long reads) if an input
+                                  directory is mentioned via --input option.
+                                  Default: _R1_001.fastq.gz
+--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads
+                                  or R2 reads) if an input directory is
+                                  mentioned via --input option. Default:
+                                  _R2_001.fastq.gz
+--fq_filter_by_len              : Remove FASTQ reads that are less than this
+                                  many bases. Default: 75
+--fq_strandedness               : The strandedness of the sequencing run.
+                                  This is mostly needed if your sequencing
+                                  run is RNA-SEQ. For most of the other runs,
+                                  it is probably safe to use unstranded for
+                                  the option. Default: unstranded
+--fq_single_end                 : SINGLE-END information will be auto-
+                                  detected but this option forces PAIRED-END
+                                  FASTQ files to be treated as SINGLE-END so
+                                  only read 1 information is included in auto-
+                                  generated samplesheet. Default: false
+--fq_filename_delim             : Delimiter by which the file name is split
+                                  to obtain sample name. Default: _
+--fq_filename_delim_idx         : After splitting FASTQ file name by using
+                                  the --fq_filename_delim option, all
+                                  elements before this index (1-based) will
+                                  be joined to create final sample name.
+                                  Default: 1
+--seqkit_rmdup_run              : Remove duplicate sequences using seqkit
+                                  rmdup. Default: false
+--seqkit_rmdup_n                : Match and remove duplicate sequences by
+                                  full name instead of just ID. Defaut: false
+--seqkit_rmdup_s                : Match and remove duplicate sequences by
+                                  sequence content. Defaut: true
+--seqkit_rmdup_d                : Save the duplicated sequences to a file.
+                                  Defaut: false
+--seqkit_rmdup_D                : Save the number and list of duplicated
+                                  sequences to a file. Defaut: false
+--seqkit_rmdup_i                : Ignore case while using seqkit rmdup.
+                                  Defaut: false
+--seqkit_rmdup_P                : Only consider positive strand (i.e. 5')
+                                  when comparing by sequence content. Defaut:
+                                  false
+--kraken2_db                    : Absolute path to kraken database. Default: /
+                                  hpc/db/kraken2/standard-210914
+--kraken2_confidence            : Confidence score threshold which must be
+                                  between 0 and 1. Default: 0.0
+--kraken2_quick                 : Quick operation (use first hit or hits).
+                                  Default: false
+--kraken2_use_mpa_style         : Report output like Kraken 1's kraken-mpa-
+                                  report. Default: false
+--kraken2_minimum_base_quality  : Minimum base quality used in classification
+                                  which is only effective with FASTQ input.
+                                  Default: 0
+--kraken2_report_zero_counts    : Report counts for ALL taxa, even if counts
+                                  are zero. Default: false
+--kraken2_report_minmizer_data  : Report minimizer and distinct minimizer
+                                  count information in addition to normal
+                                  Kraken report. Default: false
+--kraken2_use_names             : Print scientific names instead of just
+                                  taxids. Default: true
+--kraken2_extract_bug           : Extract the reads or contigs beloging to
+                                  this bug. Default: Escherichia coli
+--centrifuge_x                  : Absolute path to centrifuge database.
+                                  Default: /hpc/db/centrifuge/2022-04-12/ab
+--centrifuge_save_unaligned     : Save SINGLE-END reads that did not align.
+                                  For PAIRED-END reads, save read pairs that
+                                  did not align concordantly. Default: false
+--centrifuge_save_aligned       : Save SINGLE-END reads that aligned. For
+                                  PAIRED-END reads, save read pairs that
+                                  aligned concordantly. Default: false
+--centrifuge_out_fmt_sam        : Centrifuge output should be in SAM. Default:
+                                  false
+--centrifuge_extract_bug        : Extract this bug from centrifuge results.
+                                  Default: Escherichia coli
+--centrifuge_ignore_quals       : Treat all quality values as 30 on Phred
+                                  scale. Default: false
+--megahit_run                   : Run MEGAHIT assembler. Default: true
+--megahit_min_count             : <int>. Minimum multiplicity for filtering (
+                                  k_min+1)-mers. Defaut: false
+--megahit_k_list                : Comma-separated list of kmer size. All
+                                  values must be odd, in the range 15-255,
+                                  increment should be <= 28. Ex: '21,29,39,59,
+                                  79,99,119,141'. Default: false
+--megahit_no_mercy              : Do not add mercy k-mers. Default: false
+--megahit_bubble_level          : <int>. Intensity of bubble merging (0-2), 0
+                                  to disable. Default: false
+--megahit_merge_level           : <l,s>. Merge complex bubbles of length <= l*
+                                  kmer_size and similarity >= s. Default:
+                                  false
+--megahit_prune_level           : <int>. Strength of low depth pruning (0-3).
+                                  Default: false
+--megahit_prune_depth           : <int>. Remove unitigs with avg k-mer depth
+                                  less than this value. Default: false
+--megahit_low_local_ratio       : <float>. Ratio threshold to define low
+                                  local coverage contigs. Default: false
+--megahit_max_tip_len           : <int>. remove tips less than this value [<
+                                  int> * k]. Default: false
+--megahit_no_local              : Disable local assembly. Default: false
+--megahit_kmin_1pass            : Use 1pass mode to build SdBG of k_min.
+                                  Default: false
+--megahit_preset                : <str>. Override a group of parameters.
+                                  Valid values are meta-sensitive which
+                                  enforces '--min-count 1 --k-list 21,29,39,
+                                  49,...,129,141', meta-large (large &
+                                  complex metagenomes, like soil) which
+                                  enforces '--k-min 27 --k-max 127 --k-step
+                                  10'. Default: meta-sensitive
+--megahit_mem_flag              : <int>. SdBG builder memory mode. 0: minimum;
+                                  1: moderate; 2: use all memory specified.
+                                  Default: 2
+--megahit_min_contig_len        : <int>.  Minimum length of contigs to output.
+                                  Default: false
+--spades_run                    : Run SPAdes assembler. Default: false
+--spades_isolate                : This flag is highly recommended for high-
+                                  coverage isolate and multi-cell data.
+                                  Defaut: false
+--spades_sc                     : This flag is required for MDA (single-cell)
+                                  data. Default: false
+--spades_meta                   : This flag is required for metagenomic data.
+                                  Default: true
+--spades_bio                    : This flag is required for biosytheticSPAdes
+                                  mode. Default: false
+--spades_corona                 : This flag is required for coronaSPAdes mode.
+                                  Default: false
+--spades_rna                    : This flag is required for RNA-Seq data.
+                                  Default: false
+--spades_plasmid                : Runs plasmidSPAdes pipeline for plasmid
+                                  detection. Default: false
+--spades_metaviral              : Runs metaviralSPAdes pipeline for virus
+                                  detection. Default: false
+--spades_metaplasmid            : Runs metaplasmidSPAdes pipeline for plasmid
+                                  detection in metagenomics datasets. Default:
+                                  false
+--spades_rnaviral               : This flag enables virus assembly module
+                                  from RNA-Seq data. Default: false
+--spades_iontorrent             : This flag is required for IonTorrent data.
+                                  Default: false
+--spades_only_assembler         : Runs only the SPAdes assembler module (
+                                  without read error correction). Default:
+                                  false
+--spades_careful                : Tries to reduce the number of mismatches
+                                  and short indels in the assembly. Default:
+                                  false
+--spades_cov_cutoff             : Coverage cutoff value (a positive float
+                                  number). Default: false
+--spades_k                      : List of k-mer sizes (must be odd and less
+                                  than 128). Default: false
+--spades_hmm                    : Directory with custom hmms that replace the
+                                  default ones (very rare). Default: false
+--serotypefinder_run            : Run SerotypeFinder tool. Default: true
+--serotypefinder_x              : Generate extended output files. Default:
+                                  true
+--serotypefinder_db             : Path to SerotypeFinder databases. Default: /
+                                  hpc/db/serotypefinder/2.0.2
+--serotypefinder_min_threshold  : Minimum percent identity (in float)
+                                  required for calling a hit. Default: 0.85
+--serotypefinder_min_cov        : Minumum percent coverage (in float)
+                                  required for calling a hit. Default: 0.80
+--seqsero2_run                  : Run SeqSero2 tool. Default: false
+--seqsero2_t                    : '1' for interleaved paired-end reads, '2'
+                                  for separated paired-end reads, '3' for
+                                  single reads, '4' for genome assembly, '5'
+                                  for nanopore reads (fasta/fastq). Default:
+                                  4
+--seqsero2_m                    : Which workflow to apply, 'a'(raw reads
+                                  allele micro-assembly), 'k'(raw reads and
+                                  genome assembly k-mer). Default: k
+--seqsero2_c                    : SeqSero2 will only output serotype
+                                  prediction without the directory containing
+                                  log files. Default: false
+--seqsero2_s                    : SeqSero2 will not output header in
+                                  SeqSero_result.tsv. Default: false
+--mlst_run                      : Run MLST tool. Default: true
+--mlst_minid                    : DNA %identity of full allelle to consider '
+                                  similar' [~]. Default: 95
+--mlst_mincov                   : DNA %cov to report partial allele at all [?].
+                                  Default: 10
+--mlst_minscore                 : Minumum score out of 100 to match a scheme.
+                                  Default: 50
+--abricate_run                  : Run ABRicate tool. Default: true
+--abricate_minid                : Minimum DNA %identity. Defaut: 90
+--abricate_mincov               : Minimum DNA %coverage. Defaut: 80
+--abricate_datadir              : ABRicate databases folder. Defaut: /hpc/db/
+                                  abricate/1.0.1/db
+Help options                    :
+--help                          : Display this message.
+```
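The help text states that `--megahit_k_list` values must be odd, lie in the range 15-255, and increase by at most 28. The hypothetical `check_k_list` function below (not part of cpipes) is one way to validate a list before submitting a run. It also assumes that the list must be strictly increasing, which the word "increment" implies but the help text does not state.

```bash
# Hypothetical pre-flight check (not part of cpipes) for --megahit_k_list,
# following the rules in the help text: odd values, 15-255, step <= 28.
# Assumption: values must also be strictly increasing.
check_k_list() {
  local list="$1" prev="" k
  local IFS=','                      # split the list on commas
  for k in $list; do
    case "$k" in
      ''|*[!0-9]*) echo "not a positive integer: '$k'" >&2; return 1 ;;
    esac
    if [ $((k % 2)) -eq 0 ]; then
      echo "even k-mer size: $k" >&2; return 1
    fi
    if [ "$k" -lt 15 ] || [ "$k" -gt 255 ]; then
      echo "outside 15-255: $k" >&2; return 1
    fi
    if [ -n "$prev" ] && { [ "$k" -le "$prev" ] || [ $((k - prev)) -gt 28 ]; }; then
      echo "invalid step from $prev to $k" >&2; return 1
    fi
    prev="$k"
  done
}
```

Under these rules, `check_k_list '21,29,39,59,79,99,119,141'` (the example from the help text) succeeds. `check_k_list '21,51'` fails because the step is 30.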
+### **BETA**
+---
+The modular structure and flow of these pipelines are under active development and may change based on ongoing assessments of computational and other considerations.
