annotate 0.5.0/readme/bettercallsal.md @ 1:365849f031fd

"planemo upload"
author kkonganti
date Mon, 05 Jun 2023 18:48:51 -0400
rev   line source
kkonganti@1 1 # bettercallsal
kkonganti@1 2
`bettercallsal` is an automated workflow that assigns *Salmonella* serotypes based on the [NCBI Pathogens Database](https://www.ncbi.nlm.nih.gov/pathogens). It uses `MASH` to reduce the search space, followed by additional genome filtering with `sourmash`. It then performs genome-based alignment with `kma`, followed by count generation with `salmon`. This workflow is especially useful when a sample contains a mixture of multiple serovars.
kkonganti@1 4
kkonganti@1 5 \
kkonganti@1 6  
kkonganti@1 7
kkonganti@1 8 <!-- TOC -->
kkonganti@1 9
- [Minimum Requirements](#minimum-requirements)
- [Usage and Examples](#usage-and-examples)
  - [Database](#database)
  - [Input](#input)
  - [Output](#output)
  - [Computational resources](#computational-resources)
  - [Runtime profiles](#runtime-profiles)
  - [your_institution.config](#your_institutionconfig)
  - [Cloud computing](#cloud-computing)
  - [Example data](#example-data)
- [Using sourmash](#using-sourmash)
- [bettercallsal CLI Help](#bettercallsal-cli-help)
kkonganti@1 22
kkonganti@1 23 <!-- /TOC -->
kkonganti@1 24
kkonganti@1 25 \
kkonganti@1 26 &nbsp;
kkonganti@1 27
kkonganti@1 28 ## Minimum Requirements
kkonganti@1 29
1. [Nextflow version 22.10.0](https://github.com/nextflow-io/nextflow/releases/download/v22.10.0/nextflow).
    - Make the `nextflow` binary executable (`chmod 755 nextflow`) and make sure that it is available in your `$PATH`.
    - If your existing `JAVA` install does not support the newest **Nextflow** version, you can try **Amazon**'s `JAVA` (OpenJDK): [Corretto](https://corretto.aws/downloads/latest/amazon-corretto-17-x64-linux-jdk.tar.gz).
2. One of `micromamba`, `docker` or `singularity` installed and available in your `$PATH`.
    - Running the workflow via `micromamba` software provisioning is **preferred** as it does not require `sudo` or `admin` privileges or any additional configuration for the various container providers.
    - To install `micromamba` for your system type, please follow these [installation steps](https://mamba.readthedocs.io/en/latest/installation.html#manual-installation) and make sure that the `micromamba` binary is available in your `$PATH`.
    - Just the `curl` step is sufficient to download the binary as far as running the workflows is concerned.
3. A minimum of 10 CPU cores and about 16 GB of memory for the main workflow steps. More memory may be required if your **FASTQ** files are large.
kkonganti@1 38
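The requirements above can be checked before a first run. The following is a minimal sketch, not part of the workflow itself: the helper names `have_cmd` and `first_provisioner` are hypothetical, and the check only confirms that the tools are found in `$PATH`, not that their versions are suitable.

```bash
# Return success if the named command is found in $PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print the first available software provisioning tool, or fail if none is found.
first_provisioner() {
    for tool in micromamba docker singularity; do
        if have_cmd "$tool"; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Report anything that is missing without aborting the shell session.
have_cmd nextflow || echo "nextflow not found in \$PATH" >&2
first_provisioner || echo "none of micromamba, docker or singularity found in \$PATH" >&2
```

If both checks pass silently apart from the name of the detected provisioning tool, the software side of the minimum requirements is in place.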
kkonganti@1 39 \
kkonganti@1 40 &nbsp;
kkonganti@1 41
kkonganti@1 42 ## Usage and Examples
kkonganti@1 43
kkonganti@1 44 Clone or download this repository and then call `cpipes`.
kkonganti@1 45
kkonganti@1 46 ```bash
kkonganti@1 47 cpipes --pipeline bettercallsal [options]
kkonganti@1 48 ```
kkonganti@1 49
kkonganti@1 50 \
kkonganti@1 51 &nbsp;
kkonganti@1 52
kkonganti@1 53 **Example**: Run the default `bettercallsal` pipeline in single-end mode.
kkonganti@1 54
```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
# Each continued line must end with a backslash.
cpipes \
    --pipeline bettercallsal \
    --input /path/to/illumina/fastq/dir \
    --output /path/to/output \
    --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db
```
kkonganti@1 65
kkonganti@1 66 \
kkonganti@1 67 &nbsp;
kkonganti@1 68
**Example**: Run the `bettercallsal` pipeline in paired-end mode. In this mode, the `R1` and `R2` files are concatenated. We have found that concatenated reads yield better calling rates. Please refer to the **Methods** and **Results** sections of our [preprint](https://www.biorxiv.org/content/10.1101/2023.04.06.535929v1.full) for more information. Users can still choose to use `bbmerge.sh` by adding the following options on the command line: `--bbmerge_run true --bcs_concat_pe false`.
kkonganti@1 70
kkonganti@1 71 ```bash
kkonganti@1 72 cd /data/scratch/$USER
kkonganti@1 73 mkdir nf-cpipes
kkonganti@1 74 cd nf-cpipes
kkonganti@1 75 cpipes \
kkonganti@1 76 --pipeline bettercallsal \
kkonganti@1 77 --input /path/to/illumina/fastq/dir \
kkonganti@1 78 --output /path/to/output \
kkonganti@1 79 --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db \
kkonganti@1 80 --fq_single_end false \
kkonganti@1 81 --fq_suffix '_R1_001.fastq.gz'
kkonganti@1 82 ```
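For the `bbmerge.sh` alternative mentioned above, the paired-end command can be wrapped in a small shell function. This is only a sketch: the function name `run_pe_bbmerge` and the `CPIPES` override variable are hypothetical conveniences, while the options are the ones listed in this README.

```bash
# Command to launch; override it (e.g. CPIPES=echo) to preview the arguments without running anything.
CPIPES=${CPIPES:-cpipes}

# run_pe_bbmerge INPUT_DIR OUTPUT_DIR DB_DIR
# Paired-end run that merges reads with bbmerge.sh instead of concatenating R1 and R2.
run_pe_bbmerge() {
    "$CPIPES" \
        --pipeline bettercallsal \
        --input "$1" \
        --output "$2" \
        --bcs_root_dbdir "$3" \
        --fq_single_end false \
        --fq_suffix '_R1_001.fastq.gz' \
        --bbmerge_run true \
        --bcs_concat_pe false
}

# Usage (not run here):
# run_pe_bbmerge /path/to/illumina/fastq/dir /path/to/output /path/to/bettercallsal_db
```

Setting `CPIPES=echo` before calling the function prints the full argument list, which is a cheap way to review the options before submitting a real run.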
kkonganti@1 83
kkonganti@1 84 \
kkonganti@1 85 &nbsp;
kkonganti@1 86
kkonganti@1 87 ### Database
kkonganti@1 88
kkonganti@1 89 ---
kkonganti@1 90
A successful run of the workflow requires database flat files that are specific to the workflow.
kkonganti@1 92
kkonganti@1 93 Please refer to `bettercallsal_db` [README](./bettercallsal_db.md) if you would like to run the workflow on the latest version of the **PDG** release.
kkonganti@1 94
kkonganti@1 95 &nbsp;
kkonganti@1 96
kkonganti@1 97 ### Input
kkonganti@1 98
kkonganti@1 99 ---
kkonganti@1 100
The input to the workflow is a folder containing compressed (`.gz`) FASTQ files. Samples are grouped automatically by FASTQ file name. If, for example, a single sample is sequenced across multiple sequencing lanes, you can group those FASTQ files into one sample by using the `--fq_filename_delim` and `--fq_filename_delim_idx` options. By default, `--fq_filename_delim` is set to `_` (underscore) and `--fq_filename_delim_idx` is set to 1.
kkonganti@1 102
kkonganti@1 103 For example, if the directory contains FASTQ files as shown below:
kkonganti@1 104
kkonganti@1 105 - KB-01_apple_L001_R1.fastq.gz
kkonganti@1 106 - KB-01_apple_L001_R2.fastq.gz
kkonganti@1 107 - KB-01_apple_L002_R1.fastq.gz
kkonganti@1 108 - KB-01_apple_L002_R2.fastq.gz
kkonganti@1 109 - KB-02_mango_L001_R1.fastq.gz
kkonganti@1 110 - KB-02_mango_L001_R2.fastq.gz
kkonganti@1 111 - KB-02_mango_L002_R1.fastq.gz
kkonganti@1 112 - KB-02_mango_L002_R2.fastq.gz
kkonganti@1 113
Then, to create 2 sample groups, `apple` and `mango`, we split the file name by the delimiter (underscore in this case, which is the default) and group by the first 2 words (`--fq_filename_delim_idx 2`).
kkonganti@1 115
It goes without saying that all the FASTQ files should have uniform naming patterns so that the `--fq_filename_delim` and `--fq_filename_delim_idx` options do not have any adverse effect when collecting the files and creating a sample metadata sheet.
kkonganti@1 117
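The grouping rule described above can be previewed on the command line before launching the workflow. The sketch below only imitates the rule (split by the delimiter, join the first N fields) with `cut`; the helper name `sample_name` is hypothetical and this is not the workflow's own implementation.

```bash
# sample_name FILE DELIM IDX
# Print the first IDX delimiter-separated fields of the file name, joined by DELIM.
sample_name() {
    basename "$1" | cut -d "$2" -f "1-$3"
}

# Preview the sample groups for a few of the file names listed above.
for fq in \
    KB-01_apple_L001_R1.fastq.gz \
    KB-01_apple_L002_R1.fastq.gz \
    KB-02_mango_L001_R1.fastq.gz \
    KB-02_mango_L002_R1.fastq.gz
do
    sample_name "$fq" _ 2
done | sort -u
```

With a delimiter of `_` and an index of 2, the four lane-level files collapse into two names, `KB-01_apple` and `KB-02_mango`, one per sample group.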
kkonganti@1 118 \
kkonganti@1 119 &nbsp;
kkonganti@1 120
kkonganti@1 121 ### Output
kkonganti@1 122
kkonganti@1 123 ---
kkonganti@1 124
All the outputs for each step are stored inside the folder specified with the `--output` option. The `multiqc_report.html` file inside the `bettercallsal-multiqc` folder contains a brief consolidated report and can be opened in any browser on your local workstation.
kkonganti@1 126
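A quick way to confirm that a run produced its report is to test for the file described above. This sketch assumes that the `bettercallsal-multiqc` folder sits directly under the `--output` directory; the helper name `multiqc_report` is hypothetical.

```bash
# multiqc_report OUTPUT_DIR
# Print the expected report path and succeed only if the file exists.
multiqc_report() {
    # Strip one trailing slash so the printed path has no double slash.
    report="${1%/}/bettercallsal-multiqc/multiqc_report.html"
    printf '%s\n' "$report"
    [ -f "$report" ]
}

# Usage (not run here):
# multiqc_report /path/to/output || echo "report not found, check the Nextflow logs" >&2
```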
kkonganti@1 127 \
kkonganti@1 128 &nbsp;
kkonganti@1 129
kkonganti@1 130 ### Computational resources
kkonganti@1 131
kkonganti@1 132 ---
kkonganti@1 133
The `bettercallsal` workflow requires a minimum of 16 GB of memory to finish successfully. By default, `bettercallsal` uses 10 CPU cores where possible. You can adjust the number of CPU cores with the `--max_cpus` option.
kkonganti@1 135
kkonganti@1 136 \
kkonganti@1 137 &nbsp;
kkonganti@1 138
kkonganti@1 139 Example:
kkonganti@1 140
```bash
# Each continued line must end with a backslash.
cpipes \
    --pipeline bettercallsal \
    --input /path/to/bettercallsal_sim_reads \
    --output /path/to/bettercallsal_sim_reads_output \
    --bcs_root_dbdir /path/to/PDG000000002.2537 \
    --kmaalign_ignorequals \
    --max_cpus 5 \
    -profile stdkondagac \
    -resume
```
kkonganti@1 152
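On machines with fewer than 10 cores, the value passed to `--max_cpus` can be derived from the number of available cores. The following is a minimal sketch, assuming `getconf _NPROCESSORS_ONLN` is available on the system; the helper name `capped_cpus` is hypothetical.

```bash
# capped_cpus CAP
# Print the smaller of CAP and the number of online CPU cores.
capped_cpus() {
    cap=$1
    # Fall back to a single core if getconf cannot report the core count.
    avail=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || avail=1
    if [ "$avail" -lt "$cap" ]; then
        printf '%s\n' "$avail"
    else
        printf '%s\n' "$cap"
    fi
}

# Example: never ask for more than the workflow's default of 10 cores.
echo "--max_cpus $(capped_cpus 10)"
```

The printed option can be appended to any of the `cpipes` commands in this README.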
kkonganti@1 153 \
kkonganti@1 154 &nbsp;
kkonganti@1 155
kkonganti@1 156 ### Runtime profiles
kkonganti@1 157
kkonganti@1 158 ---
kkonganti@1 159
You can use different run time profiles that suit your specific compute environment, i.e., you can run the workflow locally on your machine or on a grid computing infrastructure.
kkonganti@1 161
kkonganti@1 162 \
kkonganti@1 163 &nbsp;
kkonganti@1 164
kkonganti@1 165 Example:
kkonganti@1 166
kkonganti@1 167 ```bash
kkonganti@1 168 cd /data/scratch/$USER
kkonganti@1 169 mkdir nf-cpipes
kkonganti@1 170 cd nf-cpipes
kkonganti@1 171 cpipes \
kkonganti@1 172 --pipeline bettercallsal \
kkonganti@1 173 --input /path/to/fastq_pass_dir \
kkonganti@1 174 --output /path/to/where/output/should/go \
kkonganti@1 175 -profile your_institution
kkonganti@1 176 ```
kkonganti@1 177
The above command runs the pipeline and stores the output at the location given by the `--output` flag. The **NEXTFLOW** reports are always stored in the current working directory from where `cpipes` is run. For example, for the above command, a directory called `CPIPES-bettercallsal` would hold all the **NEXTFLOW** related logs, reports and trace files.
kkonganti@1 179
kkonganti@1 180 \
kkonganti@1 181 &nbsp;
kkonganti@1 182
kkonganti@1 183 ### `your_institution.config`
kkonganti@1 184
kkonganti@1 185 ---
kkonganti@1 186
In the above example, the run time profile is set to `your_institution`. For this to work, add the following lines at the end of the [`computeinfra.config`](../conf/computeinfra.config) file, which should be located inside the `conf` folder. For example, if your institution uses **SGE** or **UNIVA** for grid computing instead of **SLURM** and has a job queue named `normal.q`, then add these lines:
kkonganti@1 188
kkonganti@1 189 \
kkonganti@1 190 &nbsp;
kkonganti@1 191
```groovy
your_institution {
    process.executor = 'sge'
    process.queue = 'normal.q'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = false
    params.enable_conda = true
    conda.enabled = true
    conda.useMicromamba = true
    params.enable_module = false
}
```
kkonganti@1 205
In the above example, all the software provisioning choices are disabled except `conda`. You can also remove the `process.queue` line altogether, and the `bettercallsal` workflow will request the appropriate memory and number of CPU cores automatically, ranging from 1 CPU core, 1 GB and 1 hour for job completion up to 10 CPU cores, 1 TB and 120 hours for job completion.
kkonganti@1 207
kkonganti@1 208 \
kkonganti@1 209 &nbsp;
kkonganti@1 210
kkonganti@1 211 ### Cloud computing
kkonganti@1 212
kkonganti@1 213 ---
kkonganti@1 214
You can run the workflow in the cloud (this works only if the AWS resources are set up properly). Add new run time profiles with the required parameters per the [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html):
kkonganti@1 216
kkonganti@1 217 \
kkonganti@1 218 &nbsp;
kkonganti@1 219
kkonganti@1 220 Example:
kkonganti@1 221
```groovy
my_aws_batch {
    executor = 'awsbatch'
    queue = 'my-batch-queue'
    aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
    aws.batch.region = 'us-east-1'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = true
    params.conda_enabled = false
    params.enable_module = false
}
```
kkonganti@1 235
kkonganti@1 236 \
kkonganti@1 237 &nbsp;
kkonganti@1 238
kkonganti@1 239 ### Example data
kkonganti@1 240
kkonganti@1 241 ---
kkonganti@1 242
After you make sure that you have all the [minimum requirements](#minimum-requirements) to run the workflow, you can try the `bettercallsal` pipeline on some simulated reads. The following input dataset contains simulated reads for `Montevideo` and `I 4,[5],12:i:-` in roughly equal proportions.
kkonganti@1 244
kkonganti@1 245 - Download simulated reads: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads.tar.bz2) (~ 3 GB).
kkonganti@1 246 - Download pre-formatted test database: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2491.test-db.tar.bz2) (~ 75 MB). This test database works only with the simulated reads.
kkonganti@1 247 - Download pre-formatted full database (**Optional**): If you would like to do a complete run with your own **FASTQ** datasets, you can either create your own [database](./bettercallsal_db.md) or use [PDG000000002.2537](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2537.tar.bz2) version of the database (~ 37 GB).
- After a successful run of the workflow, your **MultiQC** report should look something like [this](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads_mqc.html).
kkonganti@1 249
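The archives listed above can be fetched and unpacked from the command line. This is a sketch under stated assumptions: `curl` and `tar` with bzip2 support are installed, the helper name `fetch_and_extract` and the `BCS_S3` variable are hypothetical, and the base URL is taken from the download links above.

```bash
# Base URL shared by the download links above.
BCS_S3="https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal"

# fetch_and_extract ARCHIVE [DEST]
# Download ARCHIVE from $BCS_S3 into DEST (default: current directory) and unpack it there.
fetch_and_extract() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" &&
        curl -fL -o "$dest/$archive" "$BCS_S3/$archive" &&
        tar -xjf "$dest/$archive" -C "$dest"
}

# Usage (not run here; the downloads are about 3 GB and 75 MB):
# fetch_and_extract bettercallsal_sim_reads.tar.bz2 /path/to/data
# fetch_and_extract PDG000000002.2491.test-db.tar.bz2 /path/to/data
```

The `&&` chain stops at the first failing step, so a failed download is never passed on to `tar`.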
Now run the workflow, ignoring quality values since these are simulated base qualities:
kkonganti@1 251
kkonganti@1 252 \
kkonganti@1 253 &nbsp;
kkonganti@1 254
```bash
# Each continued line must end with a backslash.
cpipes \
    --pipeline bettercallsal \
    --input /path/to/bettercallsal_sim_reads \
    --output /path/to/bettercallsal_sim_reads_output \
    --bcs_root_dbdir /path/to/PDG000000002.2537 \
    --kmaalign_ignorequals \
    -profile stdkondagac \
    -resume
```
kkonganti@1 265
kkonganti@1 266 Please note that the run time profile `stdkondagac` will run jobs locally using `micromamba` for software provisioning. The first time you run the command, a new folder called `kondagac_cache` will be created and subsequent runs should use this `conda` cache.
kkonganti@1 267
kkonganti@1 268 \
kkonganti@1 269 &nbsp;
kkonganti@1 270
kkonganti@1 271 ## Using `sourmash`
kkonganti@1 272
Beginning with `v0.3.0` of the `bettercallsal` workflow, `sourmash` sketching is used to further narrow down possible serotype hits. It is **ON** by default and enables the generation of an **ANI Containment** matrix for **Samples** vs **Genomes**. There may be multiple hits for the same serotype in the final **MultiQC** report, as multiple genome accessions can belong to a single serotype.
kkonganti@1 274
You can turn this feature **OFF** with the `--sourmashsketch_run false` option.
kkonganti@1 276
kkonganti@1 277 \
kkonganti@1 278 &nbsp;
kkonganti@1 279
kkonganti@1 280 ## `bettercallsal` CLI Help
kkonganti@1 281
kkonganti@1 282 ```text
kkonganti@1 283 [Kranti_Konganti@my-unix-box ]$ cpipes --pipeline bettercallsal --help
kkonganti@1 284 N E X T F L O W ~ version 22.10.0
kkonganti@1 285 Launching `./bettercallsal/cpipes` [awesome_chandrasekhar] DSL2 - revision: 8da4e11078
kkonganti@1 286 ================================================================================
kkonganti@1 287 (o)
kkonganti@1 288 ___ _ __ _ _ __ ___ ___
kkonganti@1 289 / __|| '_ \ | || '_ \ / _ \/ __|
kkonganti@1 290 | (__ | |_) || || |_) || __/\__ \
kkonganti@1 291 \___|| .__/ |_|| .__/ \___||___/
kkonganti@1 292 | | | |
kkonganti@1 293 |_| |_|
kkonganti@1 294 --------------------------------------------------------------------------------
kkonganti@1 295 A collection of modular pipelines at CFSAN, FDA.
kkonganti@1 296 --------------------------------------------------------------------------------
kkonganti@1 297 Name : CPIPES
kkonganti@1 298 Author : Kranti Konganti
kkonganti@1 299 Version : 0.5.0
kkonganti@1 300 Center : CFSAN, FDA.
kkonganti@1 301 ================================================================================
kkonganti@1 302
kkonganti@1 303 Workflow : bettercallsal
kkonganti@1 304
kkonganti@1 305 Author : Kranti Konganti
kkonganti@1 306
kkonganti@1 307 Version : 0.5.0
kkonganti@1 308
kkonganti@1 309
kkonganti@1 310 Usage : cpipes --pipeline bettercallsal [options]
kkonganti@1 311
kkonganti@1 312
kkonganti@1 313 Required :
kkonganti@1 314
kkonganti@1 315 --input : Absolute path to directory containing FASTQ
kkonganti@1 316 files. The directory should contain only
kkonganti@1 317 FASTQ files as all the files within the
kkonganti@1 318 mentioned directory will be read. Ex: --
kkonganti@1 319 input /path/to/fastq_pass
kkonganti@1 320
kkonganti@1 321 --output : Absolute path to directory where all the
kkonganti@1 322 pipeline outputs should be stored. Ex: --
kkonganti@1 323 output /path/to/output
kkonganti@1 324
kkonganti@1 325 Other options :
kkonganti@1 326
kkonganti@1 327 --metadata : Absolute path to metadata CSV file
kkonganti@1 328 containing five mandatory columns: sample,
kkonganti@1 329 fq1,fq2,strandedness,single_end. The fq1
kkonganti@1 330 and fq2 columns contain absolute paths to
kkonganti@1 331 the FASTQ files. This option can be used in
kkonganti@1 332 place of --input option. This is rare. Ex
kkonganti@1 333 : --metadata samplesheet.csv
kkonganti@1 334
kkonganti@1 335 --fq_suffix : The suffix of FASTQ files (Unpaired reads
kkonganti@1 336 or R1 reads or Long reads) if an input
kkonganti@1 337 directory is mentioned via --input option.
kkonganti@1 338 Default: .fastq.gz
kkonganti@1 339
kkonganti@1 340 --fq2_suffix : The suffix of FASTQ files (Paired-end reads
kkonganti@1 341 or R2 reads) if an input directory is
kkonganti@1 342 mentioned via --input option. Default:
kkonganti@1 343 _R2_001.fastq.gz
kkonganti@1 344
kkonganti@1 345 --fq_filter_by_len : Remove FASTQ reads that are less than this
kkonganti@1 346 many bases. Default: 0
kkonganti@1 347
kkonganti@1 348 --fq_strandedness : The strandedness of the sequencing run.
kkonganti@1 349 This is mostly needed if your sequencing
kkonganti@1 350 run is RNA-SEQ. For most of the other runs
kkonganti@1 351 , it is probably safe to use unstranded for
kkonganti@1 352 the option. Default: unstranded
kkonganti@1 353
kkonganti@1 354 --fq_single_end : SINGLE-END information will be auto-
kkonganti@1 355 detected but this option forces PAIRED-END
kkonganti@1 356 FASTQ files to be treated as SINGLE-END so
kkonganti@1 357 only read 1 information is included in auto
kkonganti@1 358 -generated samplesheet. Default: true
kkonganti@1 359
kkonganti@1 360 --fq_filename_delim : Delimiter by which the file name is split
kkonganti@1 361 to obtain sample name. Default: _
kkonganti@1 362
kkonganti@1 363 --fq_filename_delim_idx : After splitting FASTQ file name by using
kkonganti@1 364 the --fq_filename_delim option, all
kkonganti@1 365 elements before this index (1-based) will
kkonganti@1 366 be joined to create final sample name.
kkonganti@1 367 Default: 1
kkonganti@1 368
kkonganti@1 369 --bcs_concat_pe : Concatenate paired-end files. Default: true
kkonganti@1 370
kkonganti@1 371 --bbmerge_run : Run BBMerge tool. Default: false
kkonganti@1 372
kkonganti@1 373 --bbmerge_reads : Quit after this many read pairs (-1 means
kkonganti@1 374 all) Default: -1
kkonganti@1 375
kkonganti@1 376 --bbmerge_adapters : Absolute UNIX path pointing to the adapters
kkonganti@1 377 file in FASTA format. Default: false
kkonganti@1 378
kkonganti@1 379 --bbmerge_ziplevel : Set to 1 (lowest) through 9 (max) to change
kkonganti@1 380 compression level; lower compression is
kkonganti@1 381 faster. Default: 1
kkonganti@1 382
kkonganti@1 383 --bbmerge_ordered : Output reads in the same order as input.
kkonganti@1 384 Default: false
kkonganti@1 385
kkonganti@1 386 --bbmerge_qtrim : Trim read ends to remove bases with quality
kkonganti@1 387 below --bbmerge_minq. Trims BEFORE merging
kkonganti@1 388 . Values: t (trim both ends), f (neither
kkonganti@1 389 end), r (right end only), l (left end only
kkonganti@1 390 ). Default: true
kkonganti@1 391
kkonganti@1 392 --bbmerge_qtrim2 : May be specified instead of --bbmerge_qtrim
kkonganti@1 393 to perform trimming only if merging is
kkonganti@1 394 unsuccesful. then retry merging. Default:
kkonganti@1 395 false
kkonganti@1 396
kkonganti@1 397 --bbmerge_trimq : Trim quality threshold. This may be comma-
kkonganti@1 398 delimited list (ascending) to try multiple
kkonganti@1 399 values. Default: 10
kkonganti@1 400
kkonganti@1 401 --bbmerge_minlength : (ml) Reads shorter than this after trimming
kkonganti@1 402 , but before merging, will be discarded.
kkonganti@1 403 Pairs will be discarded onlyif both are
kkonganti@1 404 shorter. Default: 1
kkonganti@1 405
kkonganti@1 406 --bbmerge_tbo : (trimbyoverlap). Trim overlapping reads to
kkonganti@1 407 remove right most (3') non-overlaping
kkonganti@1 408 portion instead of joining Default: false
kkonganti@1 409
kkonganti@1 410 --bbmerge_minavgquality : (maq). Reads with average quality below
kkonganti@1 411 this after trimming will not be attempted
kkonganti@1 412 to merge. Default: 30
kkonganti@1 413
kkonganti@1 414 --bbmerge_trimpolya : Trim trailing poly-A tail from adapter
kkonganti@1 415 output. Only affects outadapter. This also
kkonganti@1 416 trims poly-A followed by poly-G, which
kkonganti@1 417 occurs on NextSeq. Default: true
kkonganti@1 418
kkonganti@1 419 --bbmerge_pfilter : Ban improbable overlaps. Higher is more
kkonganti@1 420 strict. 0 will disable the filter; 1 will
kkonganti@1 421 allow only perfect overlaps. Default: 1
kkonganti@1 422
kkonganti@1 423 --bbmerge_ouq : Calculate best overlap using quality values
kkonganti@1 424 . Default: false
kkonganti@1 425
kkonganti@1 426 --bbmerge_owq : Calculate best overlap without using
kkonganti@1 427 quality values. Default: true
kkonganti@1 428
kkonganti@1 429 --bbmerge_strict : Decrease false positive rate and merging
kkonganti@1 430 rate. Default: false
kkonganti@1 431
kkonganti@1 432 --bbmerge_verystrict : Greatly decrease false positive rate and
kkonganti@1 433 merging rate. Default: false
kkonganti@1 434
kkonganti@1 435 --bbmerge_ultrastrict : Decrease false positive rate and merging
kkonganti@1 436 rate even more. Default: true
kkonganti@1 437
kkonganti@1 438 --bbmerge_maxstrict : Maxiamally decrease false positive rate and
kkonganti@1 439 merging rate. Default: false
kkonganti@1 440
kkonganti@1 441 --bbmerge_loose : Increase false positive rate and merging
kkonganti@1 442 rate. Default: false
kkonganti@1 443
kkonganti@1 444 --bbmerge_veryloose : Greatly increase false positive rate and
kkonganti@1 445 merging rate. Default: false
kkonganti@1 446
kkonganti@1 447 --bbmerge_ultraloose : Increase false positive rate and merging
kkonganti@1 448 rate even more. Default: false
kkonganti@1 449
kkonganti@1 450 --bbmerge_maxloose : Maximally increase false positive rate and
kkonganti@1 451 merging rate. Default: false
kkonganti@1 452
kkonganti@1 453 --bbmerge_fast : Fastest possible preset. Default: false
kkonganti@1 454
kkonganti@1 455 --bbmerge_k : Kmer length. 31 (or less) is fastest and
kkonganti@1 456 uses the least memory, but higher values
kkonganti@1 457 may be more accurate. 60 tends to work well
kkonganti@1 458 for 150bp reads. Default: 60
kkonganti@1 459
kkonganti@1 460 --bbmerge_prealloc : Pre-allocate memory rather than dynamically
kkonganti@1 461 growing. Faster and more memory-efficient
kkonganti@1 462 for large datasets. A float fraction (0-1)
kkonganti@1 463 may be specified, default 1. Default: true
kkonganti@1 464
kkonganti@1 465 --fastp_run : Run fastp tool. Default: true
kkonganti@1 466
kkonganti@1 467 --fastp_failed_out : Specify whether to store reads that cannot
kkonganti@1 468 pass the filters. Default: false
kkonganti@1 469
kkonganti@1 470 --fastp_merged_out : Specify whether to store merged output or
kkonganti@1 471 not. Default: false
kkonganti@1 472
kkonganti@1 473 --fastp_overlapped_out : For each read pair, output the overlapped
kkonganti@1 474 region if it has no mismatched base.
kkonganti@1 475 Default: false
kkonganti@1 476
kkonganti@1 477 --fastp_6 : Indicate that the input is using phred64
kkonganti@1 478 scoring (it'll be converted to phred33, so
kkonganti@1 479 the output will still be phred33). Default
kkonganti@1 480 : false
kkonganti@1 481
kkonganti@1 482 --fastp_reads_to_process : Specify how many reads/pairs are to be
kkonganti@1 483 processed. Default value 0 means process
kkonganti@1 484 all reads. Default: 0
kkonganti@1 485
kkonganti@1 486 --fastp_fix_mgi_id : The MGI FASTQ ID format is not compatible
kkonganti@1 487 with many BAM operation tools, enable this
kkonganti@1 488 option to fix it. Default: false
kkonganti@1 489
kkonganti@1 490 --fastp_A : Disable adapter trimming. On by default.
kkonganti@1 491 Default: false
kkonganti@1 492
kkonganti@1 493 --fastp_adapter_fasta : Specify a FASTA file to trim both read1 and
kkonganti@1 494 read2 (if PE) by all the sequences in this
kkonganti@1 495 FASTA file. Default: false
kkonganti@1 496
kkonganti@1 497 --fastp_f : Trim how many bases in front of read1.
kkonganti@1 498 Default: 0
kkonganti@1 499
kkonganti@1 500 --fastp_t : Trim how many bases at the end of read1.
kkonganti@1 501 Default: 0
kkonganti@1 502
kkonganti@1 503 --fastp_b : Max length of read1 after trimming. Default
kkonganti@1 504 : 0
kkonganti@1 505
kkonganti@1 506 --fastp_F : Trim how many bases in front of read2.
kkonganti@1 507 Default: 0
kkonganti@1 508
kkonganti@1 509 --fastp_T : Trim how many bases at the end of read2.
kkonganti@1 510 Default: 0
kkonganti@1 511
kkonganti@1 512 --fastp_B : Max length of read2 after trimming. Default
kkonganti@1 513 : 0
kkonganti@1 514
kkonganti@1 515 --fastp_dedup : Enable deduplication to drop the duplicated
kkonganti@1 516 reads/pairs. Default: true
kkonganti@1 517
kkonganti@1 518 --fastp_dup_calc_accuracy : Accuracy level to calculate duplication (1~
kkonganti@1 519 6), higher level uses more memory (1G, 2G,
kkonganti@1 520 4G, 8G, 16G, 24G). Default 1 for no-dedup
kkonganti@1 521 mode, and 3 for dedup mode. Default: 6
kkonganti@1 522
kkonganti@1 523 --fastp_poly_g_min_len : The minimum length to detect polyG in the
kkonganti@1 524 read tail. Default: 10
kkonganti@1 525
kkonganti@1 526 --fastp_G : Disable polyG tail trimming. Default: true
kkonganti@1 527
kkonganti@1 528 --fastp_x : Enable polyX trimming in 3' ends. Default:
kkonganti@1 529 false
kkonganti@1 530
kkonganti@1 531 --fastp_poly_x_min_len : The minimum length to detect polyX in the
kkonganti@1 532 read tail. Default: 10
kkonganti@1 533
kkonganti@1 534 --fastp_cut_front : Move a sliding window from front (5') to
kkonganti@1 535 tail, drop the bases in the window if its
kkonganti@1 536 mean quality < threshold, stop otherwise.
kkonganti@1 537 Default: true
kkonganti@1 538
kkonganti@1 539 --fastp_cut_tail : Move a sliding window from tail (3') to
kkonganti@1 540 front, drop the bases in the window if its
kkonganti@1 541 mean quality < threshold, stop otherwise.
kkonganti@1 542 Default: false
kkonganti@1 543
kkonganti@1 544 --fastp_cut_right : Move a sliding window from tail, drop the
kkonganti@1 545 bases in the window and the right part if
kkonganti@1 546 its mean quality < threshold, and then stop
kkonganti@1 547 . Default: true
kkonganti@1 548
kkonganti@1 549 --fastp_W : Sliding window size shared by --
kkonganti@1 550 fastp_cut_front, --fastp_cut_tail and --
kkonganti@1 551 fastp_cut_right. Default: 20
kkonganti@1 552
kkonganti@1 553 --fastp_M : The mean quality requirement shared by --
kkonganti@1 554 fastp_cut_front, --fastp_cut_tail and --
kkonganti@1 555 fastp_cut_right. Default: 30
kkonganti@1 556
kkonganti@1 557 --fastp_q : The quality value below which a base should
kkonganti@1 558 is not qualified. Default: 30
kkonganti@1 559
kkonganti@1 560 --fastp_u : What percent of bases are allowed to be
kkonganti@1 561 unqualified. Default: 40
kkonganti@1 562
kkonganti@1 563 --fastp_n : How many N's can a read have. Default: 5
kkonganti@1 564
kkonganti@1 565 --fastp_e : If the full reads' average quality is below
kkonganti@1 566 this value, then it is discarded. Default
kkonganti@1 567 : 0
kkonganti@1 568
kkonganti@1 569 --fastp_l : Reads shorter than this length will be
kkonganti@1 570 discarded. Default: 35
kkonganti@1 571
kkonganti@1 572 --fastp_max_len : Reads longer than this length will be
kkonganti@1 573 discarded. Default: 0
kkonganti@1 574
kkonganti@1 575 --fastp_y : Enable low complexity filter. The
kkonganti@1 576 complexity is defined as the percentage of
kkonganti@1 577 bases that are different from its next base
kkonganti@1 578 (base[i] != base[i+1]). Default: true
kkonganti@1 579
kkonganti@1 580 --fastp_Y : The threshold for low complexity filter (0~
kkonganti@1 581 100). Ex: A value of 30 means 30%
kkonganti@1 582 complexity is required. Default: 30
kkonganti@1 583
kkonganti@1 584 --fastp_U : Enable Unique Molecular Identifier (UMI)
kkonganti@1 585 pre-processing. Default: false
kkonganti@1 586
kkonganti@1 587 --fastp_umi_loc : Specify the location of UMI, can be one of
kkonganti@1 588 index1/index2/read1/read2/per_index/
kkonganti@1 589 per_read. Default: false
kkonganti@1 590
kkonganti@1 591 --fastp_umi_len : If the UMI is in read1 or read2, its length
kkonganti@1 592 should be provided. Default: false
kkonganti@1 593
kkonganti@1 594 --fastp_umi_prefix : If specified, an underline will be used to
kkonganti@1 595 connect prefix and UMI (i.e. prefix=UMI,
kkonganti@1 596 UMI=AATTCG, final=UMI_AATTCG). Default:
kkonganti@1 597 false
kkonganti@1 598
kkonganti@1 599 --fastp_umi_skip : If the UMI is in read1 or read2, fastp can
kkonganti@1 600 skip several bases following the UMI.
kkonganti@1 601 Default: false
kkonganti@1 602
kkonganti@1 603 --fastp_p : Enable overrepresented sequence analysis.
kkonganti@1 604 Default: true
kkonganti@1 605
kkonganti@1 606 --fastp_P : One in this many number of reads will be
kkonganti@1 607 computed for overrepresentation analysis (1
kkonganti@1 608 ~10000), smaller is slower. Default: 20
kkonganti@1 609
kkonganti@1 610 --fastp_use_custom_adapaters : Use custom adapter FASTA with fastp on top
kkonganti@1 611 of built-in adapter sequence auto-detection
kkonganti@1 612 . Enabling this option will attempt to find
kkonganti@1 613 and remove all possible Illumina adapter
kkonganti@1 614 and primer sequences but will make the
kkonganti@1 615 workflow run slow. Default: false
kkonganti@1 616
kkonganti@1 617 --mashscreen_run : Run `mash screen` tool. Default: true
kkonganti@1 618
kkonganti@1 619 --mashscreen_w : Winner-takes-all strategy for identity
kkonganti@1 620 estimates. After counting hashes for each
kkonganti@1 621 query, hashes that appear in multiple
kkonganti@1 622 queries will be removed from all except the
kkonganti@1 623 one with the best identity (ties broken by
kkonganti@1 624 larger query), and other identities will
kkonganti@1 625 be reduced. This removes output redundancy
kkonganti@1 626 , providing a rough compositional outline
kkonganti@1 627 . Default: false
kkonganti@1 628
kkonganti@1 629 --mashscreen_i : Minimum identity to report. Inclusive
kkonganti@1 630 unless set to zero, in which case only
kkonganti@1 631 identities greater than zero (i.e. with at
kkonganti@1 632 least one shared hash) will be reported.
kkonganti@1 633 Set to -1 to output everything. (-1-1).
kkonganti@1 634 Default: false
kkonganti@1 635
kkonganti@1 636 --mashscreen_v : Maximum p-value to report (0-1). Default:
kkonganti@1 637 false
kkonganti@1 638
kkonganti@1 639 --tuspy_run : Run the get_top_unique_mash_hits_genomes.py
kkonganti@1 640 script. Default: true
kkonganti@1 641
kkonganti@1 642 --tuspy_s : Absolute UNIX path to metadata text file
kkonganti@1 643 with the field separator, | and 5 fields:
kkonganti@1 644 serotype|asm_lvl|asm_url|snp_cluster_id Ex:
kkonganti@1 645 serotype=Derby,antigen_formula=4:f,g:-|
kkonganti@1 646 Scaffold|402440|ftp://...|PDS000096654.2.
kkonganti@1 647 Specifying this option will create a pickle
kkonganti@1 648 file for the provided metadata and exit.
kkonganti@1 649 Default: false
kkonganti@1 650
kkonganti@1 651 --tuspy_m : Absolute UNIX path to mash screen results
kkonganti@1 652 file. Default: false
kkonganti@1 653
kkonganti@1 654 --tuspy_ps : Absolute UNIX Path to serialized metadata
kkonganti@1 655 object in a pickle file. Default: /hpc/db/
kkonganti@1 656 bettercallsal/latest/index_metadata/
kkonganti@1 657 per_snp_cluster.ACC2SERO.pickle
kkonganti@1 658
kkonganti@1 659 --tuspy_gd : Absolute UNIX Path to directory containing
kkonganti@1 660 gzipped genome FASTA files. Default: /hpc/
kkonganti@1 661 db/bettercallsal/latest/scaffold_genomes
kkonganti@1 662
kkonganti@1 663 --tuspy_gds : Genome FASTA file suffix to search for in
kkonganti@1 664 the genome directory. Default:
kkonganti@1 665 _scaffolded_genomic.fna.gz
kkonganti@1 666
kkonganti@1 667 --tuspy_n : Return up to this many number of top N
kkonganti@1 668 unique genome accession hits. Default: 10
kkonganti@1 669
kkonganti@1 670 --sourmashsketch_run : Run `sourmash sketch dna` tool. Default:
kkonganti@1 671 true
kkonganti@1 672
kkonganti@1 673 --sourmashsketch_mode : Select which type of signatures to be
kkonganti@1 674 created: dna, protein, fromfile or
kkonganti@1 675 translate. Default: dna
kkonganti@1 676
kkonganti@1 677 --sourmashsketch_p : Signature parameters to use. Default: abund
kkonganti@1 678 ,scaled=1000,k=51,k=61,k=71
kkonganti@1 679
kkonganti@1 680 --sourmashsketch_file : <path> A text file containing a list of
kkonganti@1 681 sequence files to load. Default: false
kkonganti@1 682
kkonganti@1 683 --sourmashsketch_f : Recompute signatures even if the file
kkonganti@1 684 exists. Default: false
kkonganti@1 685
kkonganti@1 686 --sourmashsketch_merge : Merge all input files into one signature
kkonganti@1 687 file with the specified name. Default:
kkonganti@1 688 false
kkonganti@1 689
kkonganti@1 690 --sourmashsketch_singleton : Compute a signature for each sequence
kkonganti@1 691 record individually. Default: true
kkonganti@1 692
kkonganti@1 693 --sourmashsketch_name : Name the signature generated from each file
kkonganti@1 694 after the first record in the file.
kkonganti@1 695 Default: false
kkonganti@1 696
kkonganti@1 697 --sourmashsketch_randomize : Shuffle the list of input files randomly.
kkonganti@1 698 Default: false
kkonganti@1 699
kkonganti@1 700 --sourmashgather_run : Run `sourmash gather` tool. Default: true
kkonganti@1 701
kkonganti@1 702 --sourmashgather_n : Number of results to report. By default,
kkonganti@1 703 will terminate at --sourmashgather_thr_bp
kkonganti@1 704 value. Default: false
kkonganti@1 705
kkonganti@1 706 --sourmashgather_thr_bp : Reporting threshold (in bp) for estimated
kkonganti@1 707 overlap with remaining query. Default:
kkonganti@1 708 false
kkonganti@1 709
kkonganti@1 710 --sourmashgather_ignoreabn : Do NOT use k-mer abundances if present.
kkonganti@1 711 Default: false
kkonganti@1 712
kkonganti@1 713 --sourmashgather_prefetch : Use prefetch before gather. Default: false
kkonganti@1 714
kkonganti@1 715 --sourmashgather_noprefetch : Do not use prefetch before gather. Default
kkonganti@1 716 : false
kkonganti@1 717
kkonganti@1 718 --sourmashgather_ani_ci : Output confidence intervals for ANI
kkonganti@1 719 estimates. Default: true
kkonganti@1 720
kkonganti@1 721 --sourmashgather_k : The k-mer size to select. Default: 71
kkonganti@1 722
kkonganti@1 723 --sourmashgather_protein : Choose a protein signature. Default: false
kkonganti@1 724
kkonganti@1 725 --sourmashgather_noprotein : Do not choose a protein signature. Default
kkonganti@1 726 : false
kkonganti@1 727
kkonganti@1 728 --sourmashgather_dayhoff : Choose Dayhoff-encoded amino acid
kkonganti@1 729 signatures. Default: false
kkonganti@1 730
kkonganti@1 731 --sourmashgather_nodayhoff : Do not choose Dayhoff-encoded amino acid
kkonganti@1 732 signatures. Default: false
kkonganti@1 733
kkonganti@1 734 --sourmashgather_hp : Choose hydrophobic-polar-encoded amino acid
kkonganti@1 735 signatures. Default: false
kkonganti@1 736
kkonganti@1 737 --sourmashgather_nohp : Do not choose hydrophobic-polar-encoded
kkonganti@1 738 amino acid signatures. Default: false
kkonganti@1 739
kkonganti@1 740 --sourmashgather_dna : Choose DNA signature. Default: true
kkonganti@1 741
kkonganti@1 742 --sourmashgather_nodna : Do not choose DNA signature. Default: false
kkonganti@1 743
kkonganti@1 744 --sourmashgather_scaled : Scaled value should be between 100 and 1e6
kkonganti@1 745 . Default: false
kkonganti@1 746
kkonganti@1 747 --sourmashgather_inc_pat : Search only signatures that match this
kkonganti@1 748 pattern in name, filename, or md5. Default
kkonganti@1 749 : false
kkonganti@1 750
kkonganti@1 751 --sourmashgather_exc_pat : Search only signatures that do not match
kkonganti@1 752 this pattern in name, filename, or md5.
kkonganti@1 753 Default: false
kkonganti@1 754
kkonganti@1 755 --sourmashsearch_run : Run `sourmash search` tool. Default: false
kkonganti@1 756
kkonganti@1 757 --sourmashsearch_n : Number of results to report. By default,
kkonganti@1 758 will terminate at --sourmashsearch_thr
kkonganti@1 759 value. Default: false
kkonganti@1 760
kkonganti@1 761 --sourmashsearch_thr : Reporting threshold (similarity) to return
kkonganti@1 762 results. Default: 0
kkonganti@1 763
kkonganti@1 764 --sourmashsearch_contain : Score based on containment rather than
kkonganti@1 765 similarity. Default: false
kkonganti@1 766
kkonganti@1 767 --sourmashsearch_maxcontain : Score based on max containment rather than
kkonganti@1 768 similarity. Default: false
kkonganti@1 769
kkonganti@1 770 --sourmashsearch_ignoreabn : Do NOT use k-mer abundances if present.
kkonganti@1 771 Default: true
kkonganti@1 772
kkonganti@1 773 --sourmashsearch_ani_ci : Output confidence intervals for ANI
kkonganti@1 774 estimates. Default: false
kkonganti@1 775
kkonganti@1 776 --sourmashsearch_k : The k-mer size to select. Default: 71
kkonganti@1 777
kkonganti@1 778 --sourmashsearch_protein : Choose a protein signature. Default: false
kkonganti@1 779
kkonganti@1 780 --sourmashsearch_noprotein : Do not choose a protein signature. Default
kkonganti@1 781 : false
kkonganti@1 782
kkonganti@1 783 --sourmashsearch_dayhoff : Choose Dayhoff-encoded amino acid
kkonganti@1 784 signatures. Default: false
kkonganti@1 785
kkonganti@1 786 --sourmashsearch_nodayhoff : Do not choose Dayhoff-encoded amino acid
kkonganti@1 787 signatures. Default: false
kkonganti@1 788
kkonganti@1 789 --sourmashsearch_hp : Choose hydrophobic-polar-encoded amino acid
kkonganti@1 790 signatures. Default: false
kkonganti@1 791
kkonganti@1 792 --sourmashsearch_nohp : Do not choose hydrophobic-polar-encoded
kkonganti@1 793 amino acid signatures. Default: false
kkonganti@1 794
kkonganti@1 795 --sourmashsearch_dna : Choose DNA signature. Default: true
kkonganti@1 796
kkonganti@1 797 --sourmashsearch_nodna : Do not choose DNA signature. Default: false
kkonganti@1 798
kkonganti@1 799 --sourmashsearch_scaled : Scaled value should be between 100 and 1e6
kkonganti@1 800 . Default: false
kkonganti@1 801
kkonganti@1 802 --sourmashsearch_inc_pat : Search only signatures that match this
kkonganti@1 803 pattern in name, filename, or md5. Default
kkonganti@1 804 : false
kkonganti@1 805
kkonganti@1 806 --sourmashsearch_exc_pat : Search only signatures that do not match
kkonganti@1 807 this pattern in name, filename, or md5.
kkonganti@1 808 Default: false
kkonganti@1 809
kkonganti@1 810 --sfhpy_run : Run the sourmash_filter_hits.py script.
kkonganti@1 811 Default: true
kkonganti@1 812
kkonganti@1 813 --sfhpy_fcn : Column name by which filtering of rows
kkonganti@1 814 should be applied. Default: f_match
kkonganti@1 815
kkonganti@1 816 --sfhpy_fcv : Remove genomes whose match with the query
kkonganti@1 817 FASTQ is less than this much. Default: 0.1
kkonganti@1 818
kkonganti@1 819 --sfhpy_gt : Apply greater than or equal to condition
kkonganti@1 820 on numeric values of --sfhpy_fcn column.
kkonganti@1 821 Default: true
kkonganti@1 822
kkonganti@1 823 --sfhpy_lt : Apply less than or equal to condition on
kkonganti@1 824 numeric values of --sfhpy_fcn column.
kkonganti@1 825 Default: false
kkonganti@1 826
kkonganti@1 827 --kmaindex_run : Run kma index tool. Default: true
kkonganti@1 828
kkonganti@1 829 --kmaindex_t_db : Add to existing DB. Default: false
kkonganti@1 830
kkonganti@1 831 --kmaindex_k : k-mer size. Default: 31
kkonganti@1 832
kkonganti@1 833 --kmaindex_m : Minimizer size. Default: false
kkonganti@1 834
kkonganti@1 835 --kmaindex_hc : Homopolymer compression. Default: false
kkonganti@1 836
kkonganti@1 837 --kmaindex_ML : Minimum length of templates. Defaults to
kkonganti@1 838 --kmaindex_k. Default: false
kkonganti@1 839
kkonganti@1 840 --kmaindex_ME : Mega DB. Default: false
kkonganti@1 841
kkonganti@1 842 --kmaindex_Sparse : Make Sparse DB. Default: false
kkonganti@1 843
kkonganti@1 844 --kmaindex_ht : Homology template. Default: false
kkonganti@1 845
kkonganti@1 846 --kmaindex_hq : Homology query. Default: false
kkonganti@1 847
kkonganti@1 848 --kmaindex_and : Both homology thresholds have to reach.
kkonganti@1 849 Default: false
kkonganti@1 850
kkonganti@1 851 --kmaindex_nbp : No bias print. Default: false
kkonganti@1 852
kkonganti@1 853 --kmaalign_run : Run kma tool. Default: true
kkonganti@1 854
kkonganti@1 855 --kmaalign_int : Input file has interleaved reads. Default
kkonganti@1 856 : false
kkonganti@1 857
kkonganti@1 858 --kmaalign_ef : Output additional features. Default: false
kkonganti@1 859
kkonganti@1 860 --kmaalign_vcf : Output vcf file. 2 to apply FT. Default:
kkonganti@1 861 false
kkonganti@1 862
kkonganti@1 863 --kmaalign_sam : Output SAM, 4/2096 for mapped/aligned.
kkonganti@1 864 Default: false
kkonganti@1 865
kkonganti@1 866 --kmaalign_nc : No consensus file. Default: true
kkonganti@1 867
kkonganti@1 868 --kmaalign_na : No aln file. Default: true
kkonganti@1 869
kkonganti@1 870 --kmaalign_nf : No frag file. Default: true
kkonganti@1 871
kkonganti@1 872 --kmaalign_a : Output all template mappings. Default:
kkonganti@1 873 false
kkonganti@1 874
kkonganti@1 875 --kmaalign_and : Use both -mrs and p-value on consensus.
kkonganti@1 876 Default: false
kkonganti@1 877
kkonganti@1 878 --kmaalign_oa : Use neither -mrs nor p-value on consensus.
kkonganti@1 879 Default: false
kkonganti@1 880
kkonganti@1 881 --kmaalign_bc : Minimum support to call bases. Default:
kkonganti@1 882 false
kkonganti@1 883
kkonganti@1 884 --kmaalign_bcNano : Altered indel calling for ONT data. Default
kkonganti@1 885 : false
kkonganti@1 886
kkonganti@1 887 --kmaalign_bcd : Minimum depth to call bases. Default: false
kkonganti@1 888
kkonganti@1 889 --kmaalign_bcg : Maintain insignificant gaps. Default: false
kkonganti@1 890
kkonganti@1 891 --kmaalign_ID : Minimum consensus ID. Default: false
kkonganti@1 892
kkonganti@1 893 --kmaalign_md : Minimum depth. Default: false
kkonganti@1 894
kkonganti@1 895 --kmaalign_dense : Skip insertion in consensus. Default: false
kkonganti@1 896
kkonganti@1 897 --kmaalign_ref_fsa : Use Ns on indels. Default: false
kkonganti@1 898
kkonganti@1 899 --kmaalign_Mt1 : Map everything to one template. Default:
kkonganti@1 900 false
kkonganti@1 901
kkonganti@1 902 --kmaalign_1t1 : Map one query to one template. Default:
kkonganti@1 903 false
kkonganti@1 904
kkonganti@1 905 --kmaalign_mrs : Minimum relative alignment score. Default:
kkonganti@1 906 false
kkonganti@1 907
kkonganti@1 908 --kmaalign_mrc : Minimum query coverage. Default: 0.99
kkonganti@1 909
kkonganti@1 910 --kmaalign_mp : Minimum phred score of trailing and leading
kkonganti@1 911 bases. Default: 30
kkonganti@1 912
kkonganti@1 913 --kmaalign_mq : Set the minimum mapping quality. Default:
kkonganti@1 914 false
kkonganti@1 915
kkonganti@1 916 --kmaalign_eq : Minimum average quality score. Default: 30
kkonganti@1 917
kkonganti@1 918 --kmaalign_5p : Trim 5 prime by this many bases. Default:
kkonganti@1 919 false
kkonganti@1 920
kkonganti@1 921 --kmaalign_3p : Trim 3 prime by this many bases. Default:
kkonganti@1 922 false
kkonganti@1 923
kkonganti@1 924 --kmaalign_apm : Sets both -pm and -fpm. Default: false
kkonganti@1 925
kkonganti@1 926 --kmaalign_cge : Set CGE penalties and rewards. Default:
kkonganti@1 927 false
kkonganti@1 928
kkonganti@1 929 --salmonidx_run : Run `salmon index` tool. Default: true
kkonganti@1 930
kkonganti@1 931 --salmonidx_k : The size of k-mers that should be used for
kkonganti@1 932 the quasi index. Default: false
kkonganti@1 933
kkonganti@1 934 --salmonidx_gencode : This flag will expect the input transcript
kkonganti@1 935 FASTA to be in GENCODE format, and will
kkonganti@1 936 split the transcript name at the first `|`
kkonganti@1 937 character. These reduced names will be used
kkonganti@1 938 in the output and when looking for these
kkonganti@1 939 transcripts in a gene to transcript GTF.
kkonganti@1 940 Default: false
kkonganti@1 941
kkonganti@1 942 --salmonidx_features : This flag will expect the input reference
kkonganti@1 943 to be in the tsv file format, and will
kkonganti@1 944 split the feature name at the first `tab`
kkonganti@1 945 character. These reduced names will be used
kkonganti@1 946 in the output and when looking for the
kkonganti@1 947 sequence of the features. GTF. Default:
kkonganti@1 948 false
kkonganti@1 949
kkonganti@1 950 --salmonidx_keepDuplicates : This flag will disable the default indexing
kkonganti@1 951 behavior of discarding sequence-identical
kkonganti@1 952 duplicate transcripts. If this flag is
kkonganti@1 953 passed then duplicate transcripts that
kkonganti@1 954 appear in the input will be retained and
kkonganti@1 955 quantified separately. Default: false
kkonganti@1 956
kkonganti@1 957 --salmonidx_keepFixedFasta : Retain the fixed fasta file (without short
kkonganti@1 958 transcripts and duplicates, clipped, etc.)
kkonganti@1 959 generated during indexing. Default: false
kkonganti@1 960
kkonganti@1 961 --salmonidx_filterSize : The size of the Bloom filter that will be
kkonganti@1 962 used by TwoPaCo during indexing. The filter
kkonganti@1 963 will be of size 2^{filterSize}. A value of
kkonganti@1 964 -1 means that the filter size will be
kkonganti@1 965 automatically set based on the number of
kkonganti@1 966 distinct k-mers in the input, as estimated
kkonganti@1 967 by nthll. Default: false
kkonganti@1 968
kkonganti@1 969 --salmonidx_sparse : Build the index using a sparse sampling of
kkonganti@1 970 k-mer positions. This will require less
kkonganti@1 971 memory (especially during quantification),
kkonganti@1 972 but will take longer to construct and can
kkonganti@1 973 slow down mapping / alignment. Default:
kkonganti@1 974 false
kkonganti@1 975
kkonganti@1 976 --salmonidx_n : Do not clip poly-A tails from the ends of
kkonganti@1 977 target sequences. Default: false
kkonganti@1 978
kkonganti@1 979 --gsrpy_run : Run the gen_salmon_res_table.py script.
kkonganti@1 980 Default: true
kkonganti@1 981
kkonganti@1 982 --gsrpy_url : Generate an additional column in final
kkonganti@1 983 results table which links out to NCBI
kkonganti@1 984 Pathogens Isolate Browser. Default: true
kkonganti@1 985
kkonganti@1 986 Help options :
kkonganti@1 987
kkonganti@1 988 --help : Display this message.
kkonganti@1 989
kkonganti@1 990 ```
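
The `--sfhpy_fcn`, `--sfhpy_fcv`, `--sfhpy_gt` and `--sfhpy_lt` options above describe a threshold filter on one numeric column. The following is a minimal sketch of that idea only, not the actual `sourmash_filter_hits.py` code: the function name `filter_hits`, the sample table and the accession names are hypothetical, and only the column name `f_match` and the cutoff `0.1` come from the defaults listed in the help text.

```python
import csv
import io


def filter_hits(rows, column="f_match", cutoff=0.1, greater_equal=True):
    """Keep rows whose numeric `column` value passes the cutoff.

    greater_equal=True mimics the idea behind --sfhpy_gt (value >= cutoff);
    greater_equal=False mimics the idea behind --sfhpy_lt (value <= cutoff).
    """
    kept = []
    for row in rows:
        value = float(row[column])
        passes = value >= cutoff if greater_equal else value <= cutoff
        if passes:
            kept.append(row)
    return kept


# Hypothetical gather-style table; the accessions and values are made up.
example_csv = """name,f_match
GCA_000000001.1,0.85
GCA_000000002.1,0.05
GCA_000000003.1,0.10
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
kept = filter_hits(rows)

for row in kept:
    print(row["name"], row["f_match"])
```

With the default cutoff, the sketch drops the row whose `f_match` is below 0.1 and keeps the other two, including the row that sits exactly at the threshold because the comparison is inclusive.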