author | kkonganti |
---|---|
date | Mon, 05 Jun 2023 18:48:51 -0400 |
# bettercallsal

`bettercallsal` is an automated workflow to assign *Salmonella* serotypes based on the [NCBI Pathogens Database](https://www.ncbi.nlm.nih.gov/pathogens). It uses `MASH` to reduce the search space, followed by additional genome filtering with `sourmash`. It then performs genome-based alignment with `kma`, followed by count generation using `salmon`. This workflow is especially useful when a sample is a mixture of multiple serovars.

\

<!-- TOC -->

- [Minimum Requirements](#minimum-requirements)
- [Usage and Examples](#usage-and-examples)
    - [Database](#database)
    - [Input](#input)
    - [Output](#output)
    - [Computational resources](#computational-resources)
    - [Runtime profiles](#runtime-profiles)
    - [your_institution.config](#your_institutionconfig)
    - [Cloud computing](#cloud-computing)
    - [Example data](#example-data)
- [Using sourmash](#using-sourmash)
- [bettercallsal CLI Help](#bettercallsal-cli-help)

<!-- /TOC -->

\

## Minimum Requirements

1. [Nextflow version 22.10.0](https://github.com/nextflow-io/nextflow/releases/download/v22.10.0/nextflow).
    - Make the `nextflow` binary executable (`chmod 755 nextflow`) and make sure that it is available in your `$PATH`.
    - If your existing `JAVA` install does not support the newest **Nextflow** version, you can try **Amazon**'s `JAVA` (OpenJDK): [Corretto](https://corretto.aws/downloads/latest/amazon-corretto-17-x64-linux-jdk.tar.gz).
2. One of `micromamba`, `docker` or `singularity` installed and available in your `$PATH`.
    - Running the workflow via `micromamba` software provisioning is **preferred**, as it does not require `sudo` or `admin` privileges or any configuration specific to the various container providers.
    - To install `micromamba` for your system type, please follow these [installation steps](https://mamba.readthedocs.io/en/latest/installation.html#manual-installation) and make sure that the `micromamba` binary is available in your `$PATH`.
        - Only the `curl` step is needed to download the binary for running the workflows.
3. A minimum of 10 CPU cores and about 16 GB of memory for the main workflow steps. More memory may be required if your **FASTQ** files are large.

\

## Usage and Examples

Clone or download this repository and then call `cpipes`.

```bash
cpipes --pipeline bettercallsal [options]
```

\

**Example**: Run the default `bettercallsal` pipeline in single-end mode.

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes --pipeline bettercallsal \
    --input /path/to/illumina/fastq/dir \
    --output /path/to/output \
    --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db
```

\

**Example**: Run the `bettercallsal` pipeline in paired-end mode. In this mode, the `R1` and `R2` files are concatenated. We have found that concatenated reads yield better calling rates. Please refer to the **Methods** and **Results** sections in our [preprint](https://www.biorxiv.org/content/10.1101/2023.04.06.535929v1.full) for more information. Users can still choose to use `bbmerge.sh` instead by adding the following options on the command line: `--bbmerge_run true --bcs_concat_pe false`.

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
    --pipeline bettercallsal \
    --input /path/to/illumina/fastq/dir \
    --output /path/to/output \
    --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db \
    --fq_single_end false \
    --fq_suffix '_R1_001.fastq.gz'
```

\

### Database

---

The workflow needs certain database flat files specific to it in order to run successfully. Please refer to the `bettercallsal_db` [README](./bettercallsal_db.md) if you would like to run the workflow on the latest version of the **PDG** release.

### Input

---

The input to the workflow is a folder containing compressed (`.gz`) FASTQ files. Please note that sample grouping happens automatically based on the FASTQ file names.
If, for example, a single sample is sequenced across multiple sequencing lanes, you can group those FASTQ files into one sample by using the `--fq_filename_delim` and `--fq_filename_delim_idx` options. By default, `--fq_filename_delim` is set to `_` (underscore) and `--fq_filename_delim_idx` is set to 1. For example, suppose the directory contains the FASTQ files shown below:

- KB-01_apple_L001_R1.fastq.gz
- KB-01_apple_L001_R2.fastq.gz
- KB-01_apple_L002_R1.fastq.gz
- KB-01_apple_L002_R2.fastq.gz
- KB-02_mango_L001_R1.fastq.gz
- KB-02_mango_L001_R2.fastq.gz
- KB-02_mango_L002_R1.fastq.gz
- KB-02_mango_L002_R2.fastq.gz

Then, to create 2 sample groups, `KB-01_apple` and `KB-02_mango`, we split the file name by the delimiter (underscore in this case, which is the default) and group by the first 2 words (`--fq_filename_delim_idx 2`).

It goes without saying that all FASTQ files should follow a uniform naming pattern so that the `--fq_filename_delim` and `--fq_filename_delim_idx` options do not adversely affect collecting the files and creating the sample metadata sheet.

\

### Output

---

All outputs for each step are stored inside the folder specified with the `--output` option. The `multiqc_report.html` file inside the `bettercallsal-multiqc` folder contains a brief consolidated report and can be opened in any browser on your local workstation.

\

### Computational resources

---

The `bettercallsal` workflow requires a minimum of 16 GB of memory to finish successfully. By default, `bettercallsal` uses 10 CPU cores where possible. You can adjust the number of CPU cores with the `--max_cpus` option.
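To see how a given machine compares with these minimums before launching, a quick check along the lines of the sketch below can help pick a `--max_cpus` value. It is a hypothetical helper, not part of `cpipes`: it assumes a Linux host (it reads `/proc/meminfo`), treats 16 GB as 16,000,000 kB, and caps its suggestion at the default of 10 cores.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of cpipes). Linux only: reads
# /proc/meminfo. Compares this host with the documented minimums of
# ~16 GB of memory and 10 CPU cores, then suggests a --max_cpus value.

# Print the smaller of the available core count and the default (10).
pick_max_cpus() {
  available="$1"
  default_cpus="${2:-10}"
  if [ "$available" -lt "$default_cpus" ]; then
    echo "$available"
  else
    echo "$default_cpus"
  fi
}

# Online CPU cores: nproc (GNU coreutils), falling back to getconf.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

if [ -r /proc/meminfo ]; then
  # MemTotal is in kB and is usually slightly below the installed RAM.
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "Memory: ~$((mem_kb / 1000 / 1000)) GB"
  if [ "$mem_kb" -lt 16000000 ]; then
    echo "Warning: less than ~16 GB of memory; the workflow may not finish." >&2
  fi
fi

echo "CPU cores available: ${cores}"
echo "Suggested option: --max_cpus $(pick_max_cpus "$cores" 10)"
```

Run it on the machine that will actually execute the jobs; on a grid, the compute nodes are what matter, not the login host.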
\

Example:

```bash
cpipes \
    --pipeline bettercallsal \
    --input /path/to/bettercallsal_sim_reads \
    --output /path/to/bettercallsal_sim_reads_output \
    --bcs_root_dbdir /path/to/PDG000000002.2537 \
    --kmaalign_ignorequals \
    --max_cpus 5 \
    -profile stdkondagac \
    -resume
```

\

### Runtime profiles

---

You can use different runtime profiles to suit your specific compute environment, i.e., you can run the workflow locally on your machine or on a grid computing infrastructure.

\

Example:

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
    --pipeline bettercallsal \
    --input /path/to/fastq_pass_dir \
    --output /path/to/where/output/should/go \
    -profile your_institution
```

The above command runs the pipeline and stores the output at the location given by the `--output` flag. The **NEXTFLOW** reports are always stored in the current working directory from which `cpipes` is run. For the above command, a directory called `CPIPES-bettercallsal` would hold all the **NEXTFLOW**-related logs, reports and trace files.

\

### `your_institution.config`

---

In the above example, the runtime profile is specified as `your_institution`. For this to work, add the following lines at the end of the [`computeinfra.config`](../conf/computeinfra.config) file, which should be located inside the `conf` folder. For example, if your institution uses **SGE** or **UNIVA** for grid computing instead of **SLURM** and has a job queue named `normal.q`, add these lines:

\

```groovy
your_institution {
    process.executor = 'sge'
    process.queue = 'normal.q'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = false
    params.enable_conda = true
    conda.enabled = true
    conda.useMicromamba = true
    params.enable_module = false
}
```

In the above example, all software provisioning choices except `conda` are disabled.
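Because this profile enables only `conda` (with `conda.useMicromamba = true`), the jobs expect `micromamba` to be found on `$PATH`. A hypothetical pre-submission check, not part of `cpipes`, could look like the sketch below. It only inspects the host it runs on, so on a grid it says nothing about the compute nodes unless it is run there as well.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submission check (not part of cpipes): report whether the
# tools a conda/micromamba-only profile relies on are on PATH of this host.

# check_tools TOOL...: print where each tool is found; return non-zero if
# any tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool" 2>/dev/null); then
      echo "found:   ${tool} -> ${path}"
    else
      echo "missing: ${tool}" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

if ! check_tools nextflow micromamba; then
  echo "Add the missing tools to PATH before using -profile your_institution." >&2
fi
```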
You can also remove the `process.queue` line altogether; the `bettercallsal` workflow requests the appropriate memory and number of CPU cores for each job automatically, ranging from 1 CPU core, 1 GB and 1 hour up to 10 CPU cores, 1 TB and 120 hours per job.

\

### Cloud computing

---

You can run the workflow in the cloud (this works only with a proper setup of AWS resources). Add new runtime profiles with the required parameters per the [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html):

\

Example:

```groovy
my_aws_batch {
    process.executor = 'awsbatch'
    process.queue = 'my-batch-queue'
    aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
    aws.region = 'us-east-1'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = true
    params.enable_conda = false
    params.enable_module = false
}
```

\

### Example data

---

After making sure that you meet all the [minimum requirements](#minimum-requirements) to run the workflow, you can try the `bettercallsal` pipeline on some simulated reads. The following input dataset contains simulated reads for `Montevideo` and `I 4,[5],12:i:-` in roughly equal proportions.

- Download simulated reads: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads.tar.bz2) (~ 3 GB).
- Download pre-formatted test database: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2491.test-db.tar.bz2) (~ 75 MB). This test database works only with the simulated reads.
- Download pre-formatted full database (**Optional**): If you would like to do a complete run with your own **FASTQ** datasets, you can either create your own [database](./bettercallsal_db.md) or use the [PDG000000002.2537](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2537.tar.bz2) version of the database (~ 37 GB).
- After a successful run of the workflow, your **MultiQC** report should look something like [this](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads_mqc.html).

Now run the workflow while ignoring quality values, since these are simulated base qualities:

\

```bash
cpipes \
    --pipeline bettercallsal \
    --input /path/to/bettercallsal_sim_reads \
    --output /path/to/bettercallsal_sim_reads_output \
    --bcs_root_dbdir /path/to/PDG000000002.2537 \
    --kmaalign_ignorequals \
    -profile stdkondagac \
    -resume
```

Please note that the runtime profile `stdkondagac` runs jobs locally using `micromamba` for software provisioning. The first time you run the command, a new folder called `kondagac_cache` will be created, and subsequent runs should use this `conda` cache.

\

## Using `sourmash`

Beginning with `v0.3.0` of the `bettercallsal` workflow, `sourmash` sketching is used to further narrow down possible serotype hits. It is **ON** by default. This enables the generation of an **ANI Containment** matrix for **Samples** vs **Genomes**. There may be multiple hits for the same serotype in the final **MultiQC** report, as multiple genome accessions can belong to a single serotype. You can turn **OFF** this feature with the `--sourmashsketch_run false` option.

\

## `bettercallsal` CLI Help

```text
[Kranti_Konganti@my-unix-box ]$ cpipes --pipeline bettercallsal --help
N E X T F L O W  ~  version 22.10.0
Launching `./bettercallsal/cpipes` [awesome_chandrasekhar] DSL2 - revision: 8da4e11078
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name          : CPIPES
Author        : Kranti Konganti
Version       : 0.5.0
Center        : CFSAN, FDA.
================================================================================
Workflow      : bettercallsal
Author        : Kranti Konganti
Version       : 0.5.0
Usage         : cpipes --pipeline bettercallsal [options]
Required      :
  --input : Absolute path to directory containing FASTQ files. The directory should contain only FASTQ files as all the files within the mentioned directory will be read.
      Ex: --input /path/to/fastq_pass
  --output : Absolute path to directory where all the pipeline outputs should be stored.
      Ex: --output /path/to/output
Other options :
  --metadata : Absolute path to metadata CSV file containing five mandatory columns: sample,fq1,fq2,strandedness,single_end. The fq1 and fq2 columns contain absolute paths to the FASTQ files. This option can be used in place of --input option. This is rare.
      Ex: --metadata samplesheet.csv
  --fq_suffix : The suffix of FASTQ files (Unpaired reads or R1 reads or Long reads) if an input directory is mentioned via --input option.
      Default: .fastq.gz
  --fq2_suffix : The suffix of FASTQ files (Paired-end reads or R2 reads) if an input directory is mentioned via --input option.
      Default: _R2_001.fastq.gz
  --fq_filter_by_len : Remove FASTQ reads that are less than this many bases.
      Default: 0
  --fq_strandedness : The strandedness of the sequencing run. This is mostly needed if your sequencing run is RNA-SEQ. For most of the other runs, it is probably safe to use unstranded for the option.
      Default: unstranded
  --fq_single_end : SINGLE-END information will be auto-detected but this option forces PAIRED-END FASTQ files to be treated as SINGLE-END so only read 1 information is included in auto-generated samplesheet.
      Default: true
  --fq_filename_delim : Delimiter by which the file name is split to obtain sample name.
      Default: _
  --fq_filename_delim_idx : After splitting FASTQ file name by using the --fq_filename_delim option, all elements before this index (1-based) will be joined to create final sample name.
      Default: 1
  --bcs_concat_pe : Concatenate paired-end files.
      Default: true
  --bbmerge_run : Run BBMerge tool.
      Default: false
  --bbmerge_reads : Quit after this many read pairs (-1 means all)
      Default: -1
  --bbmerge_adapters : Absolute UNIX path pointing to the adapters file in FASTA format.
      Default: false
  --bbmerge_ziplevel : Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster.
      Default: 1
  --bbmerge_ordered : Output reads in the same order as input.
      Default: false
  --bbmerge_qtrim : Trim read ends to remove bases with quality below --bbmerge_minq. Trims BEFORE merging. Values: t (trim both ends), f (neither end), r (right end only), l (left end only).
      Default: true
  --bbmerge_qtrim2 : May be specified instead of --bbmerge_qtrim to perform trimming only if merging is unsuccesful. then retry merging.
      Default: false
  --bbmerge_trimq : Trim quality threshold. This may be comma-delimited list (ascending) to try multiple values.
      Default: 10
  --bbmerge_minlength : (ml) Reads shorter than this after trimming, but before merging, will be discarded. Pairs will be discarded onlyif both are shorter.
      Default: 1
  --bbmerge_tbo : (trimbyoverlap). Trim overlapping reads to remove right most (3') non-overlaping portion instead of joining
      Default: false
  --bbmerge_minavgquality : (maq). Reads with average quality below this after trimming will not be attempted to merge.
      Default: 30
  --bbmerge_trimpolya : Trim trailing poly-A tail from adapter output. Only affects outadapter. This also trims poly-A followed by poly-G, which occurs on NextSeq.
      Default: true
  --bbmerge_pfilter : Ban improbable overlaps. Higher is more strict. 0 will disable the filter; 1 will allow only perfect overlaps.
      Default: 1
  --bbmerge_ouq : Calculate best overlap using quality values.
      Default: false
  --bbmerge_owq : Calculate best overlap without using quality values.
      Default: true
  --bbmerge_strict : Decrease false positive rate and merging rate.
      Default: false
  --bbmerge_verystrict : Greatly decrease false positive rate and merging rate.
      Default: false
  --bbmerge_ultrastrict : Decrease false positive rate and merging rate even more.
      Default: true
  --bbmerge_maxstrict : Maxiamally decrease false positive rate and merging rate.
      Default: false
  --bbmerge_loose : Increase false positive rate and merging rate.
      Default: false
  --bbmerge_veryloose : Greatly increase false positive rate and merging rate.
      Default: false
  --bbmerge_ultraloose : Increase false positive rate and merging rate even more.
      Default: false
  --bbmerge_maxloose : Maximally increase false positive rate and merging rate.
      Default: false
  --bbmerge_fast : Fastest possible preset.
      Default: false
  --bbmerge_k : Kmer length. 31 (or less) is fastest and uses the least memory, but higher values may be more accurate. 60 tends to work well for 150bp reads.
      Default: 60
  --bbmerge_prealloc : Pre-allocate memory rather than dynamically growing. Faster and more memory-efficient for large datasets. A float fraction (0-1) may be specified, default 1.
      Default: true
  --fastp_run : Run fastp tool.
      Default: true
  --fastp_failed_out : Specify whether to store reads that cannot pass the filters.
      Default: false
  --fastp_merged_out : Specify whether to store merged output or not.
      Default: false
  --fastp_overlapped_out : For each read pair, output the overlapped region if it has no mismatched base.
      Default: false
  --fastp_6 : Indicate that the input is using phred64 scoring (it'll be converted to phred33, so the output will still be phred33).
      Default: false
  --fastp_reads_to_process : Specify how many reads/pairs are to be processed. Default value 0 means process all reads.
      Default: 0
  --fastp_fix_mgi_id : The MGI FASTQ ID format is not compatible with many BAM operation tools, enable this option to fix it.
      Default: false
  --fastp_A : Disable adapter trimming. On by default.
      Default: false
  --fastp_adapter_fasta : Specify a FASTA file to trim both read1 and read2 (if PE) by all the sequences in this FASTA file.
      Default: false
  --fastp_f : Trim how many bases in front of read1.
      Default: 0
  --fastp_t : Trim how many bases at the end of read1.
      Default: 0
  --fastp_b : Max length of read1 after trimming.
      Default: 0
  --fastp_F : Trim how many bases in front of read2.
      Default: 0
  --fastp_T : Trim how many bases at the end of read2.
      Default: 0
  --fastp_B : Max length of read2 after trimming.
      Default: 0
  --fastp_dedup : Enable deduplication to drop the duplicated reads/pairs.
      Default: true
  --fastp_dup_calc_accuracy : Accuracy level to calculate duplication (1~6), higher level uses more memory (1G, 2G, 4G, 8G, 16G, 24G). Default 1 for no-dedup mode, and 3 for dedup mode.
      Default: 6
  --fastp_poly_g_min_len : The minimum length to detect polyG in the read tail.
      Default: 10
  --fastp_G : Disable polyG tail trimming.
      Default: true
  --fastp_x : Enable polyX trimming in 3' ends.
      Default: false
  --fastp_poly_x_min_len : The minimum length to detect polyX in the read tail.
      Default: 10
  --fastp_cut_front : Move a sliding window from front (5') to tail, drop the bases in the window if its mean quality < threshold, stop otherwise.
      Default: true
  --fastp_cut_tail : Move a sliding window from tail (3') to front, drop the bases in the window if its mean quality < threshold, stop otherwise.
      Default: false
  --fastp_cut_right : Move a sliding window from tail, drop the bases in the window and the right part if its mean quality < threshold, and then stop.
      Default: true
  --fastp_W : Sliding window size shared by --fastp_cut_front, --fastp_cut_tail and --fastp_cut_right.
      Default: 20
  --fastp_M : The mean quality requirement shared by --fastp_cut_front, --fastp_cut_tail and --fastp_cut_right.
      Default: 30
  --fastp_q : The quality value below which a base should is not qualified.
      Default: 30
  --fastp_u : What percent of bases are allowed to be unqualified.
      Default: 40
  --fastp_n : How many N's can a read have.
      Default: 5
  --fastp_e : If the full reads' average quality is below this value, then it is discarded.
      Default: 0
  --fastp_l : Reads shorter than this length will be discarded.
      Default: 35
  --fastp_max_len : Reads longer than this length will be discarded.
      Default: 0
  --fastp_y : Enable low complexity filter. The complexity is defined as the percentage of bases that are different from its next base (base[i] != base[i+1]).
      Default: true
  --fastp_Y : The threshold for low complexity filter (0~100). Ex: A value of 30 means 30% complexity is required.
      Default: 30
  --fastp_U : Enable Unique Molecular Identifier (UMI) pre-processing.
      Default: false
  --fastp_umi_loc : Specify the location of UMI, can be one of index1/index2/read1/read2/per_index/per_read.
      Default: false
  --fastp_umi_len : If the UMI is in read1 or read2, its length should be provided.
      Default: false
  --fastp_umi_prefix : If specified, an underline will be used to connect prefix and UMI (i.e. prefix=UMI, UMI=AATTCG, final=UMI_AATTCG).
      Default: false
  --fastp_umi_skip : If the UMI is in read1 or read2, fastp can skip several bases following the UMI.
      Default: false
  --fastp_p : Enable overrepresented sequence analysis.
      Default: true
  --fastp_P : One in this many number of reads will be computed for overrepresentation analysis (1~10000), smaller is slower.
      Default: 20
  --fastp_use_custom_adapaters : Use custom adapter FASTA with fastp on top of built-in adapter sequence auto-detection. Enabling this option will attempt to find and remove all possible Illumina adapter and primer sequences but will make the workflow run slow.
      Default: false
  --mashscreen_run : Run `mash screen` tool.
      Default: true
  --mashscreen_w : Winner-takes-all strategy for identity estimates.
      After counting hashes for each query, hashes that appear in multiple queries will be removed from all except the one with the best identity (ties broken by larger query), and other identities will be reduced. This removes output redundancy, providing a rough compositional outline.
      Default: false
  --mashscreen_i : Minimum identity to report. Inclusive unless set to zero, in which case only identities greater than zero (i.e. with at least one shared hash) will be reported. Set to -1 to output everything. (-1-1).
      Default: false
  --mashscreen_v : Maximum p-value to report (0-1).
      Default: false
  --tuspy_run : Run the get_top_unique_mash_hits_genomes.py script.
      Default: true
  --tuspy_s : Absolute UNIX path to metadata text file with the field separator, | and 5 fields: serotype|asm_lvl|asm_url|snp_cluster_id
      Ex: serotype=Derby,antigen_formula=4:f,g:-|Scaffold|402440|ftp://...|PDS000096654.2. Mentioning this option will create a pickle file for the provided metadata and exits.
      Default: false
  --tuspy_m : Absolute UNIX path to mash screen results file.
      Default: false
  --tuspy_ps : Absolute UNIX Path to serialized metadata object in a pickle file.
      Default: /hpc/db/bettercallsal/latest/index_metadata/per_snp_cluster.ACC2SERO.pickle
  --tuspy_gd : Absolute UNIX Path to directory containing gzipped genome FASTA files.
      Default: /hpc/db/bettercallsal/latest/scaffold_genomes
  --tuspy_gds : Genome FASTA file suffix to search for in the genome directory.
      Default: _scaffolded_genomic.fna.gz
  --tuspy_n : Return up to this many number of top N unique genome accession hits.
      Default: 10
  --sourmashsketch_run : Run `sourmash sketch dna` tool.
      Default: true
  --sourmashsketch_mode : Select which type of signatures to be created: dna, protein, fromfile or translate.
      Default: dna
  --sourmashsketch_p : Signature parameters to use.
      Default: abund,scaled=1000,k=51,k=61,k=71
  --sourmashsketch_file : <path> A text file containing a list of sequence files to load.
      Default: false
  --sourmashsketch_f : Recompute signatures even if the file exists.
      Default: false
  --sourmashsketch_merge : Merge all input files into one signature file with the specified name.
      Default: false
  --sourmashsketch_singleton : Compute a signature for each sequence record individually.
      Default: true
  --sourmashsketch_name : Name the signature generated from each file after the first record in the file.
      Default: false
  --sourmashsketch_randomize : Shuffle the list of input files randomly.
      Default: false
  --sourmashgather_run : Run `sourmash gather` tool.
      Default: true
  --sourmashgather_n : Number of results to report. By default, will terminate at --sourmashgather_thr_bp value.
      Default: false
  --sourmashgather_thr_bp : Reporting threshold (in bp) for estimated overlap with remaining query.
      Default: false
  --sourmashgather_ignoreabn : Do NOT use k-mer abundances if present.
      Default: false
  --sourmashgather_prefetch : Use prefetch before gather.
      Default: false
  --sourmashgather_noprefetch : Do not use prefetch before gather.
      Default: false
  --sourmashgather_ani_ci : Output confidence intervals for ANI estimates.
      Default: true
  --sourmashgather_k : The k-mer size to select.
      Default: 71
  --sourmashgather_protein : Choose a protein signature.
      Default: false
  --sourmashgather_noprotein : Do not choose a protein signature.
      Default: false
  --sourmashgather_dayhoff : Choose Dayhoff-encoded amino acid signatures.
      Default: false
  --sourmashgather_nodayhoff : Do not choose Dayhoff-encoded amino acid signatures.
      Default: false
  --sourmashgather_hp : Choose hydrophobic-polar-encoded amino acid signatures.
      Default: false
  --sourmashgather_nohp : Do not choose hydrophobic-polar-encoded amino acid signatures.
      Default: false
  --sourmashgather_dna : Choose DNA signature.
      Default: true
  --sourmashgather_nodna : Do not choose DNA signature.
      Default: false
  --sourmashgather_scaled : Scaled value should be between 100 and 1e6.
      Default: false
  --sourmashgather_inc_pat : Search only signatures that match this pattern in name, filename, or md5.
      Default: false
  --sourmashgather_exc_pat : Search only signatures that do not match this pattern in name, filename, or md5.
      Default: false
  --sourmashsearch_run : Run `sourmash search` tool.
      Default: false
  --sourmashsearch_n : Number of results to report. By default, will terminate at --sourmashsearch_thr value.
      Default: false
  --sourmashsearch_thr : Reporting threshold (similarity) to return results.
      Default: 0
  --sourmashsearch_contain : Score based on containment rather than similarity.
      Default: false
  --sourmashsearch_maxcontain : Score based on max containment rather than similarity.
      Default: false
  --sourmashsearch_ignoreabn : Do NOT use k-mer abundances if present.
      Default: true
  --sourmashsearch_ani_ci : Output confidence intervals for ANI estimates.
      Default: false
  --sourmashsearch_k : The k-mer size to select.
      Default: 71
  --sourmashsearch_protein : Choose a protein signature.
      Default: false
  --sourmashsearch_noprotein : Do not choose a protein signature.
      Default: false
  --sourmashsearch_dayhoff : Choose Dayhoff-encoded amino acid signatures.
      Default: false
  --sourmashsearch_nodayhoff : Do not choose Dayhoff-encoded amino acid signatures.
      Default: false
  --sourmashsearch_hp : Choose hydrophobic-polar-encoded amino acid signatures.
      Default: false
  --sourmashsearch_nohp : Do not choose hydrophobic-polar-encoded amino acid signatures.
      Default: false
  --sourmashsearch_dna : Choose DNA signature.
      Default: true
  --sourmashsearch_nodna : Do not choose DNA signature.
      Default: false
  --sourmashsearch_scaled : Scaled value should be between 100 and 1e6.
      Default: false
  --sourmashsearch_inc_pat : Search only signatures that match this pattern in name, filename, or md5.
      Default: false
  --sourmashsearch_exc_pat : Search only signatures that do not match this pattern in name, filename, or md5.
      Default: false
  --sfhpy_run : Run the sourmash_filter_hits.py script.
      Default: true
  --sfhpy_fcn : Column name by which filtering of rows should be applied.
      Default: f_match
  --sfhpy_fcv : Remove genomes whose match with the query FASTQ is less than this much.
      Default: 0.1
  --sfhpy_gt : Apply greather than or equal to condition on numeric values of --sfhpy_fcn column.
      Default: true
  --sfhpy_lt : Apply less than or equal to condition on numeric values of --sfhpy_fcn column.
      Default: false
  --kmaindex_run : Run kma index tool.
      Default: true
  --kmaindex_t_db : Add to existing DB.
      Default: false
  --kmaindex_k : k-mer size.
      Default: 31
  --kmaindex_m : Minimizer size.
      Default: false
  --kmaindex_hc : Homopolymer compression.
      Default: false
  --kmaindex_ML : Minimum length of templates. Defaults to --kmaindex_k
      Default: false
  --kmaindex_ME : Mega DB.
      Default: false
  --kmaindex_Sparse : Make Sparse DB.
      Default: false
  --kmaindex_ht : Homology template.
      Default: false
  --kmaindex_hq : Homology query.
      Default: false
  --kmaindex_and : Both homology thresholds have to reach.
      Default: false
  --kmaindex_nbp : No bias print.
      Default: false
  --kmaalign_run : Run kma tool.
      Default: true
  --kmaalign_int : Input file has interleaved reads.
      Default: false
  --kmaalign_ef : Output additional features.
      Default: false
  --kmaalign_vcf : Output vcf file. 2 to apply FT.
      Default: false
  --kmaalign_sam : Output SAM, 4/2096 for mapped/aligned.
      Default: false
  --kmaalign_nc : No consensus file.
      Default: true
  --kmaalign_na : No aln file.
      Default: true
  --kmaalign_nf : No frag file.
      Default: true
  --kmaalign_a : Output all template mappings.
      Default: false
  --kmaalign_and : Use both -mrs and p-value on consensus.
      Default: false
  --kmaalign_oa : Use neither -mrs or p-value on consensus.
      Default: false
  --kmaalign_bc : Minimum support to call bases.
      Default: false
  --kmaalign_bcNano : Altered indel calling for ONT data.
      Default: false
  --kmaalign_bcd : Minimum depth to call bases.
      Default: false
  --kmaalign_bcg : Maintain insignificant gaps.
      Default: false
  --kmaalign_ID : Minimum consensus ID.
      Default: false
  --kmaalign_md : Minimum depth.
      Default: false
  --kmaalign_dense : Skip insertion in consensus.
      Default: false
  --kmaalign_ref_fsa : Use Ns on indels.
      Default: false
  --kmaalign_Mt1 : Map everything to one template.
      Default: false
  --kmaalign_1t1 : Map one query to one template.
      Default: false
  --kmaalign_mrs : Minimum relative alignment score.
      Default: false
  --kmaalign_mrc : Minimum query coverage.
      Default: 0.99
  --kmaalign_mp : Minimum phred score of trailing and leading bases.
      Default: 30
  --kmaalign_mq : Set the minimum mapping quality.
      Default: false
  --kmaalign_eq : Minimum average quality score.
      Default: 30
  --kmaalign_5p : Trim 5 prime by this many bases.
      Default: false
  --kmaalign_3p : Trim 3 prime by this many bases
      Default: false
  --kmaalign_apm : Sets both -pm and -fpm
      Default: false
  --kmaalign_cge : Set CGE penalties and rewards
      Default: false
  --salmonidx_run : Run `salmon index` tool.
      Default: true
  --salmonidx_k : The size of k-mers that should be used for the quasi index.
      Default: false
  --salmonidx_gencode : This flag will expect the input transcript FASTA to be in GENCODE format, and will split the transcript name at the first `|` character. These reduced names will be used in the output and when looking for these transcripts in a gene to transcript GTF.
      Default: false
  --salmonidx_features : This flag will expect the input reference to be in the tsv file format, and will split the feature name at the first `tab` character. These reduced names will be used in the output and when looking for the sequence of the features. GTF.
      Default: false
  --salmonidx_keepDuplicates : This flag will disable the default indexing behavior of discarding sequence-identical duplicate transcripts. If this flag is passed then duplicate transcripts that appear in the input will be retained and quantified separately.
      Default: false
  --salmonidx_keepFixedFasta : Retain the fixed fasta file (without short transcripts and duplicates, clipped, etc.) generated during indexing.
      Default: false
  --salmonidx_filterSize : The size of the Bloom filter that will be used by TwoPaCo during indexing. The filter will be of size 2^{filterSize}. A value of -1 means that the filter size will be automatically set based on the number of distinct k-mers in the input, as estimated by nthll.
      Default: false
  --salmonidx_sparse : Build the index using a sparse sampling of k-mer positions This will require less memory (especially during quantification), but will take longer to constructand can slow down mapping / alignment.
      Default: false
  --salmonidx_n : Do not clip poly-A tails from the ends of target sequences.
      Default: false
  --gsrpy_run : Run the gen_salmon_res_table.py script.
      Default: true
  --gsrpy_url : Generate an additional column in final results table which links out to NCBI Pathogens Isolate Browser.
      Default: true
Help options  :
  --help : Display this message.
```
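As the help text notes, `--metadata` accepts a CSV with the columns `sample,fq1,fq2,strandedness,single_end` in place of `--input`. The function below is a hypothetical sketch (not part of `cpipes`) for building such a file from a directory of paired-end reads. It assumes R1/R2 files end in `_R1_001.fastq.gz`/`_R2_001.fastq.gz` and, mirroring the description of `--fq_filename_delim`/`--fq_filename_delim_idx` in [Input](#input), keeps the first N underscore-delimited fields as the sample name. The `unstranded` and `false` values follow the help-text default and paired-end usage; whether your `cpipes` version expects exactly these literals is an assumption worth checking.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cpipes): write a --metadata samplesheet
# for paired-end FASTQ files named like <sample>_..._R1_001.fastq.gz.

# make_samplesheet DIR [N]
#   DIR : absolute path to the FASTQ directory
#   N   : keep the first N "_"-delimited fields as the sample name (default 1)
make_samplesheet() {
  fastq_dir="$1"
  idx="${2:-1}"
  echo "sample,fq1,fq2,strandedness,single_end"
  for fq1 in "$fastq_dir"/*_R1_001.fastq.gz; do
    [ -e "$fq1" ] || continue            # the glob matched nothing
    fq2="${fq1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [ ! -e "$fq2" ]; then
      echo "Skipping ${fq1}: no matching R2 file" >&2
      continue
    fi
    sample=$(basename "$fq1" | cut -d '_' -f "1-${idx}")
    echo "${sample},${fq1},${fq2},unstranded,false"
  done
}
```

For example, `make_samplesheet /path/to/fastq_dir 2 > samplesheet.csv`, then `cpipes --pipeline bettercallsal --metadata samplesheet.csv --output /path/to/output --bcs_root_dbdir /path/to/bettercallsal_db`. Lane-split files produce one row each with the same sample name; how `cpipes` treats such rows is not covered here, so the `--input` route described above remains the documented way to group lanes.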