comparison 0.5.0/readme/nowayout.md @ 0:3c767f9cfd88 draft default tip
planemo upload
| author | galaxytrakr |
|---|---|
| date | Fri, 29 May 2026 13:37:56 +0000 |
| parents | |
| children |
comparison
| -1:000000000000 | 0:3c767f9cfd88 |
|---|---|
| 1 <p align="center"> | |
| 2 <img src="../assets/nowayout-icon.png" width="20%" height="20%" /> | |
| 3 </p> | |
| 4 | |
| 5 --- | |
| 6 | |
| 7 `nowayout` is a **super-fast** automated software pipeline for taxonomic classification of eukaryotic mitochondrial reads. It first uses a custom database to identify mitochondrial reads and then classifies only those reads. | |
| 8 | |
| 9 --- | |
| 10 | |
| 11 <!-- TOC --> | |
| 12 | |
| 13 - [Minimum Requirements](#minimum-requirements) | |
| 14 - [HFP GalaxyTrakr](#hfp-galaxytrakr) | |
| 15 - [Usage and Examples](#usage-and-examples) | |
| 16 - [Databases](#databases) | |
| 17 - [Input](#input) | |
| 18 - [Output](#output) | |
| 19 - [Preset filters](#preset-filters) | |
| 20 - [Computational resources](#computational-resources) | |
| 21 - [Runtime profiles](#runtime-profiles) | |
| 22 - [your_institution.config](#your_institutionconfig) | |
| 23 - [Test Run](#test-run) | |
| 24 - [nowayout CLI Help](#nowayout-cli-help) | |
| 25 | |
| 26 <!-- /TOC --> | |
| 27 | |
| 28 \ | |
| 29 | |
| 30 | |
| 31 ## Minimum Requirements | |
| 32 | |
| 33 1. [Nextflow version 25.04.6](https://github.com/nextflow-io/nextflow/releases/download/v25.04.6/nextflow). | |
| 34 - Make the `nextflow` binary executable (`chmod 755 nextflow`) and make sure that it is available in your `$PATH`. | |
| 35 - If your existing `JAVA` install does not support the newest **Nextflow** version, you can try **Amazon**'s `JAVA` (OpenJDK): [Corretto](https://docs.aws.amazon.com/corretto/latest/corretto-21-ug/downloads-list.html). | |
| 36 2. One of `micromamba` (version `1.5.9`), `docker` or `singularity` installed and available in your `$PATH`. | |
| 37 - Running the workflow via `micromamba` software provisioning is **preferred** as it does not require any `sudo` or `admin` privileges or any other configurations with respect to the various container providers. | |
| 38 - To install `micromamba` for your system type, please follow these [installation steps](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html#linux-and-macos) and make sure that the `micromamba` binary is made available in your `$PATH`. | |
| 39 - As far as running the workflows is concerned, the `curl` step alone is sufficient to download the binary. | |
| 40 - Once you have finished the installation, **it is important that you downgrade `micromamba` to version `1.5.9`**. | |
| 41 - First check your version. If it is anything other than `1.5.9`, do the downgrade. | |
| 42 | |
| 43 ```bash | |
| 44 micromamba --version | |
| 45 micromamba self-update --version 1.5.9 -c conda-forge | |
| 46 ``` | |
| 47 | |
| 48 3. A minimum of 10 CPU cores and about 60 GB of memory for the main workflow steps. More memory may be required if your **FASTQ** files are large. | |
| 49 | |
| 50 \ | |
| 51 | |
| 52 | |
| 53 ## HFP GalaxyTrakr | |
| 54 | |
| 55 The `nowayout` pipeline **will** be made available for use on the newest version of [Galaxy instance supported by HFP, FDA](https://galaxytrakr.org/) (`version >= 24.x`). Please check this space for announcements in this regard. | |
| 56 | |
| 57 Please note that the pipeline on [HFP GalaxyTrakr](https://galaxytrakr.org) in most cases may be a version older than the one on **GitHub** due to testing prioritization. | |
| 58 | |
| 59 \ | |
| 60 | |
| 61 | |
| 62 ## Usage and Examples | |
| 63 | |
| 64 Clone or download this repository and then call `cpipes`. | |
| 65 | |
| 66 ```bash | |
| 67 cpipes --pipeline nowayout [options] | |
| 68 ``` | |
| 69 | |
| 70 Alternatively, you can use `nextflow` to directly pull and run the pipeline. | |
| 71 | |
| 72 ```bash | |
| 73 nextflow pull CFSAN-Biostatistics/nowayout | |
| 74 nextflow list | |
| 75 nextflow info CFSAN-Biostatistics/nowayout | |
| 76 nextflow run CFSAN-Biostatistics/nowayout --pipeline nowayout --help | |
| 77 ``` | |
| 78 | |
| 79 \ | |
| 80 | |
| 81 | |
| 82 ### Databases | |
| 83 | |
| 84 --- | |
| 85 | |
| 86 The successful run of the workflow requires proper setup of the custom database files: | |
| 87 | |
| 88 - `nowayout_dbs`: [Download](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/nowayout/nowayout_dbs.tar.bz2) (~ 22 GB). | |
| 89 | |
| 90 Once you have downloaded the databases, uncompress them and create **UNIX symbolic links** to the database folders inside the [assets](../assets/) folder as follows: | |
| 91 | |
| 92 ```bash | |
| 93 mkdir assets/dbfiles | |
| 94 cd assets/dbfiles | |
| 95 ln -s /path/to/nowayout_dbs/kma kma | |
| 96 ln -s /path/to/nowayout_dbs/reference reference | |
| 97 ln -s /path/to/nowayout_dbs/taxonomy taxonomy | |
| 98 ``` | |
| 99 | |
| 100 That's it! | |
| 101 | |
| 102 \ | |
| 103 | |
| 104 | |
| 105 ### Input | |
| 106 | |
| 107 --- | |
| 108 | |
| 109 The input to the workflow is a folder containing compressed (`.gz`) FASTQ files of long reads or short reads. Please note that the sample grouping happens automatically by the file name of the FASTQ file. If for example, a single sample is sequenced across multiple sequencing lanes, you can choose to group those FASTQ files into one sample by using the `--fq_filename_delim` and `--fq_filename_delim_idx` options. By default, `--fq_filename_delim` is set to `_` (underscore) and `--fq_filename_delim_idx` is set to 1. | |
| 110 | |
| 111 For example, if the directory contains FASTQ files as shown below: | |
| 112 | |
| 113 - KB-01_apple_L001_R1.fastq.gz | |
| 114 - KB-01_apple_L001_R2.fastq.gz | |
| 115 - KB-01_apple_L002_R1.fastq.gz | |
| 116 - KB-01_apple_L002_R2.fastq.gz | |
| 117 - KB-02_mango_L001_R1.fastq.gz | |
| 118 - KB-02_mango_L001_R2.fastq.gz | |
| 119 - KB-02_mango_L002_R1.fastq.gz | |
| 120 - KB-02_mango_L002_R2.fastq.gz | |
| 121 | |
| 122 Then, to create 2 sample groups, `apple` and `mango`, we split the file name by the delimiter (underscore in this case, which is the default) and group by the first 2 words (`--fq_filename_delim_idx 2`). | |
| 123 | |
| 124 It goes without saying that all the FASTQ files should follow a uniform naming pattern so that the `--fq_filename_delim` and `--fq_filename_delim_idx` options do not have any adverse effect on collecting the files and creating a sample metadata sheet. | |
| 125 | |
| 126 \ | |
| 127 | |
| 128 | |
| 129 ### Output | |
| 130 | |
| 131 --- | |
| 132 | |
| 133 All the outputs for each step are stored inside the folder given with the `--output` option. The `multiqc_report.html` file inside the `nowayout-multiqc` folder contains a brief consolidated report and can be opened in any browser on your local workstation. | |
| 134 | |
| 135 Please note that the percentage relative abundances seen are relative to the total number of mitochondrial reads and not the total number of reads per sample. | |
| 136 | |
| 137 \ | |
| 138 | |
| 139 | |
| 140 ### Preset filters | |
| 141 | |
| 142 --- | |
| 143 | |
| 144 There are three preset threshold filters that are available with the pipeline: `--nowo_thresholds strict`, `--nowo_thresholds mild` and `--nowo_thresholds relax`. Use these options for exploration of results via multiple runs on the same input dataset. The default is `strict` thresholds. | |
| 145 | |
| 146 \ | |
| 147 | |
| 148 | |
| 149 ### Computational resources | |
| 150 | |
| 151 --- | |
| 152 | |
| 153 The `nowayout` workflow requires a minimum of 10 CPU cores and 60 GB of memory to finish successfully. | |
| 154 | |
| 155 \ | |
| 156 | |
| 157 | |
| 158 ### Runtime profiles | |
| 159 | |
| 160 --- | |
| 161 | |
| 162 You can use different run time profiles to suit your compute environment, i.e., you can run the workflow locally on your machine or on a grid computing infrastructure. | |
| 163 | |
| 164 \ | |
| 165 | |
| 166 | |
| 167 Example: | |
| 168 | |
| 169 ```bash | |
| 170 cd /data/scratch/$USER | |
| 171 mkdir nf-cpipes | |
| 172 cd nf-cpipes | |
| 173 cpipes \ | |
| 174 --pipeline nowayout \ | |
| 175 --input /path/to/fastq_pass_dir \ | |
| 176 --output /path/to/where/output/should/go \ | |
| 177 -profile your_institution | |
| 178 ``` | |
| 179 | |
| 180 The above command runs the pipeline and stores the output at the location given by the `--output` flag. The **NEXTFLOW** reports are always stored in the current working directory from which `cpipes` is run. For example, for the above command, a directory called `CPIPES-nowayout` would hold all the **NEXTFLOW** related logs, reports and trace files. | |
| 181 | |
| 182 \ | |
| 183 | |
| 184 | |
| 185 ### `your_institution.config` | |
| 186 | |
| 187 --- | |
| 188 | |
| 189 In the above example, we can see that we have mentioned the run time profile as `your_institution`. For this to work, add the following lines at the end of [`computeinfra.config`](../conf/computeinfra.config) file which should be located inside the `conf` folder. For example, if your institution uses **SGE** or **UNIVA** for grid computing instead of **SLURM** and has a job queue named `normal.q`, then add these lines: | |
| 190 | |
| 191 \ | |
| 192 | |
| 193 | |
| 194 ```groovy | |
| 195 your_institution { | |
| 196 process.executor = 'sge' | |
| 197 process.queue = 'normal.q' | |
| 198 singularity.enabled = false | |
| 199 singularity.autoMounts = true | |
| 200 docker.enabled = false | |
| 201 params.enable_conda = true | |
| 202 conda.enabled = true | |
| 203 conda.useMicromamba = true | |
| 204 params.enable_module = false | |
| 205 } | |
| 206 ``` | |
| 207 | |
| 208 In the above example, by default, all the software provisioning choices are disabled except `conda`. You can also choose to remove the `process.queue` line altogether and the `nowayout` workflow will request the appropriate memory and number of CPU cores automatically, which ranges from 1 CPU, 1 GB and 1 hour for job completion up to 10 CPU cores, 1 TB and 120 hours for job completion. | |
| 209 | |
| 210 \ | |
| 211 | |
| 212 | |
| 213 ### Cloud computing | |
| 214 | |
| 215 --- | |
| 216 | |
| 217 You can run the workflow in the cloud (works only with proper set up of AWS resources). Add new run time profiles with required parameters per [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html): | |
| 218 | |
| 219 \ | |
| 220 | |
| 221 | |
| 222 Example: | |
| 223 | |
| 224 ```groovy | |
| 225 my_aws_batch { | |
| 226 process.executor = 'awsbatch' | |
| 227 process.queue = 'my-batch-queue' | |
| 228 aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws' | |
| 229 aws.region = 'us-east-1' | |
| 230 singularity.enabled = false | |
| 231 singularity.autoMounts = true | |
| 232 docker.enabled = true | |
| 233 params.enable_conda = false | |
| 234 params.enable_module = false | |
| 235 } | |
| 236 ``` | |
| 237 | |
| 238 \ | |
| 239 | |
| 240 | |
| 241 ## Test Run | |
| 242 | |
| 243 After you make sure that you have all the [minimum requirements](#minimum-requirements) to run the workflow, you can try `nowayout` on a test dataset. | |
| 244 | |
| 245 - Download input reads [from S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/nowayout/nowayout_test_reads.tar.bz2) (~ 8 GB). | |
| 246 - This dataset was part of the research for detecting and identifying insects or insect fragments in food, an essential component of food safety and regulatory monitoring. Insects such as **_Plodia interpunctella_** (Indian meal moth) and _**Tribolium castaneum**_ (red flour beetle) were intentionally spiked into wheat flour at varying concentrations to create benchmark samples. These serve as reference materials to test and validate molecular detection workflows. | |
| 247 - Download pre-formatted databases (**MANDATORY**) [from S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/nowayout/nowayout_dbs.tar.bz2) (~ 22 GB). | |
| 248 - After successful download, untar and add **symbolic links** in [assets](../assets) folder as described in the [Databases](#databases) section. | |
| 249 - It is always a best practice to use absolute UNIX paths and real destinations of symbolic links during pipeline execution. For example, find out the real path(s) of your absolute UNIX path(s) and use that for the `--input` and `--output` options of the pipeline. | |
| 250 | |
| 251 ```bash | |
| 252 realpath /hpc/scratch/user/input/srr | |
| 253 ``` | |
| 254 | |
| 255 - Now run the workflow by ignoring quality values since these are simulated base qualities: | |
| 256 | |
| 257 ```bash | |
| 258 cpipes \ | |
| 259 --pipeline nowayout \ | |
| 260 --input /path/to/nowayout_test_reads \ | |
| 261 --output /path/to/nowayout_test_output \ | |
| 262 --fq_single_end true \ | |
| 263 -profile stdkondagac \ | |
| 264 -resume | |
| 265 ``` | |
| 266 | |
| 267 - After a successful run of the workflow, your **MultiQC** report should look something like [this](https://cfsan-pub-xfer.s3.us-east-1.amazonaws.com/Kranti.Konganti/nowayout/CPIPES-Report_multiqc_report.html). | |
| 268 | |
| 269 - `nowayout` also automatically generates [Krona](https://github.com/marbl/Krona) charts. The **Krona** chart for the above test run should look something like [this](https://cfsan-pub-xfer.s3.us-east-1.amazonaws.com/Kranti.Konganti/nowayout/CPIPES_nowayout_krona.html). | |
| 270 | |
| 271 Please note that the run time profile `stdkondagac` will run jobs locally using `micromamba` for software provisioning. The first time you run the command, a new folder called `kondagac_cache` will be created and subsequent runs should use this `conda` cache. | |
| 272 | |
| 273 \ | |
| 274 | |
| 275 | |
| 276 ## `nowayout` CLI Help | |
| 277 | |
| 278 ```text | |
| 279 cpipes --pipeline nowayout --help | |
| 280 | |
| 281 N E X T F L O W ~ version 24.10.4 | |
| 282 | |
| 283 Launching `/home/user/nowayout/cpipes` [sleepy_pauling] DSL2 - revision: 55d6f63710 | |
| 284 | |
| 285 ================================================================================ | |
| 286 (o) | |
| 287 ___ _ __ _ _ __ ___ ___ | |
| 288 / __|| '_ \ | || '_ \ / _ \/ __| | |
| 289 | (__ | |_) || || |_) || __/\__ \ | |
| 290 \___|| .__/ |_|| .__/ \___||___/ | |
| 291 | | | | | |
| 292 |_| |_| | |
| 293 -------------------------------------------------------------------------------- | |
| 294 A collection of modular pipelines at CFSAN, FDA. | |
| 295 -------------------------------------------------------------------------------- | |
| 296 Name : CPIPES | |
| 297 Author : Kranti.Konganti@fda.hhs.gov | |
| 298 Version : 0.8.0 | |
| 299 Center : CFSAN, FDA. | |
| 300 ================================================================================ | |
| 301 | |
| 302 Workflow : nowayout | |
| 303 | |
| 304 Author : Kranti Konganti | |
| 305 | |
| 306 Version : 0.5.0 | |
| 307 | |
| 308 | |
| 309 Usage : cpipes --pipeline nowayout [options] | |
| 310 | |
| 311 | |
| 312 Required : | |
| 313 | |
| 314 --input : Absolute path to directory containing FASTQ | |
| 315 files. The directory should contain only | |
| 316 FASTQ files as all the files within the | |
| 317 mentioned directory will be read. Ex: -- | |
| 318 input /path/to/fastq_pass | |
| 319 | |
| 320 --output : Absolute path to directory where all the | |
| 321 pipeline outputs should be stored. Ex: -- | |
| 322 output /path/to/output | |
| 323 | |
| 324 Other options : | |
| 325 | |
| 326 --metadata : Absolute path to metadata CSV file | |
| 327 containing five mandatory columns: sample, | |
| 328 fq1,fq2,strandedness,single_end. The fq1 | |
| 329 and fq2 columns contain absolute paths to | |
| 330 the FASTQ files. This option can be used in | |
| 331 place of --input option. This is rare. Ex | |
| 332 : --metadata samplesheet.csv | |
| 333 | |
| 334 --fq_suffix : The suffix of FASTQ files (Unpaired reads | |
| 335 or R1 reads or Long reads) if an input | |
| 336 directory is mentioned via --input option. | |
| 337 Default: _R1_001.fastq.gz | |
| 338 | |
| 339 --fq2_suffix : The suffix of FASTQ files (Paired-end reads | |
| 340 or R2 reads) if an input directory is | |
| 341 mentioned via --input option. Default: | |
| 342 _R2_001.fastq.gz | |
| 343 | |
| 344 --fq_filter_by_len : Remove FASTQ reads that are less than this | |
| 345 many bases. Default: 0 | |
| 346 | |
| 347 --fq_strandedness : The strandedness of the sequencing run. | |
| 348 This is mostly needed if your sequencing | |
| 349 run is RNA-SEQ. For most of the other runs | |
| 350 , it is probably safe to use unstranded for | |
| 351 the option. Default: unstranded | |
| 352 | |
| 353 --fq_single_end : SINGLE-END information will be auto- | |
| 354 detected but this option forces PAIRED-END | |
| 355 FASTQ files to be treated as SINGLE-END so | |
| 356 only read 1 information is included in auto | |
| 357 -generated samplesheet. Default: false | |
| 358 | |
| 359 --fq_filename_delim : Delimiter by which the file name is split | |
| 360 to obtain sample name. Default: _ | |
| 361 | |
| 362 --fq_filename_delim_idx : After splitting FASTQ file name by using | |
| 363 the --fq_filename_delim option, all | |
| 364 elements before this index (1-based) will | |
| 365 be joined to create final sample name. | |
| 366 Default: 1 | |
| 367 | |
| 368 --fastp_run : Run fastp tool. Default: true | |
| 369 | |
| 370 --fastp_failed_out : Specify whether to store reads that cannot | |
| 371 pass the filters. Default: false | |
| 372 | |
| 373 --fastp_merged_out : Specify whether to store merged output or | |
| 374 not. Default: false | |
| 375 | |
| 376 --fastp_overlapped_out : For each read pair, output the overlapped | |
| 377 region if it has no mismatched base. | |
| 378 Default: false | |
| 379 | |
| 380 --fastp_6 : Indicate that the input is using phred64 | |
| 381 scoring (it'll be converted to phred33, so | |
| 382 the output will still be phred33). Default | |
| 383 : false | |
| 384 | |
| 385 --fastp_reads_to_process : Specify how many reads/pairs are to be | |
| 386 processed. Default value 0 means process | |
| 387 all reads. Default: 0 | |
| 388 | |
| 389 --fastp_fix_mgi_id : The MGI FASTQ ID format is not compatible | |
| 390 with many BAM operation tools, enable this | |
| 391 option to fix it. Default: false | |
| 392 | |
| 393 --fastp_A : Disable adapter trimming. On by default. | |
| 394 Default: false | |
| 395 | |
| 396 --fastp_adapter_fasta : Specify a FASTA file to trim both read1 and | |
| 397 read2 (if PE) by all the sequences in this | |
| 398 FASTA file. Default: false | |
| 399 | |
| 400 --fastp_f : Trim how many bases in front of read1. | |
| 401 Default: 0 | |
| 402 | |
| 403 --fastp_t : Trim how many bases at the end of read1. | |
| 404 Default: 0 | |
| 405 | |
| 406 --fastp_b : Max length of read1 after trimming. Default | |
| 407 : 0 | |
| 408 | |
| 409 --fastp_F : Trim how many bases in front of read2. | |
| 410 Default: 0 | |
| 411 | |
| 412 --fastp_T : Trim how many bases at the end of read2. | |
| 413 Default: 0 | |
| 414 | |
| 415 --fastp_B : Max length of read2 after trimming. Default | |
| 416 : 0 | |
| 417 | |
| 418 --fastp_dedup : Enable deduplication to drop the duplicated | |
| 419 reads/pairs. Default: true | |
| 420 | |
| 421 --fastp_dup_calc_accuracy : Accuracy level to calculate duplication (1~ | |
| 422 6), higher level uses more memory (1G, 2G, | |
| 423 4G, 8G, 16G, 24G). Default 1 for no-dedup | |
| 424 mode, and 3 for dedup mode. Default: 6 | |
| 425 | |
| 426 --fastp_poly_g_min_len : The minimum length to detect polyG in the | |
| 427 read tail. Default: 10 | |
| 428 | |
| 429 --fastp_G : Disable polyG tail trimming. Default: true | |
| 430 | |
| 431 --fastp_x : Enable polyX trimming in 3' ends. Default: | |
| 432 false | |
| 433 | |
| 434 --fastp_poly_x_min_len : The minimum length to detect polyX in the | |
| 435 read tail. Default: 10 | |
| 436 | |
| 437 --fastp_cut_front : Move a sliding window from front (5') to | |
| 438 tail, drop the bases in the window if its | |
| 439 mean quality < threshold, stop otherwise. | |
| 440 Default: true | |
| 441 | |
| 442 --fastp_cut_tail : Move a sliding window from tail (3') to | |
| 443 front, drop the bases in the window if its | |
| 444 mean quality < threshold, stop otherwise. | |
| 445 Default: false | |
| 446 | |
| 447 --fastp_cut_right : Move a sliding window from tail, drop the | |
| 448 bases in the window and the right part if | |
| 449 its mean quality < threshold, and then stop | |
| 450 . Default: true | |
| 451 | |
| 452 --fastp_W : Sliding window size shared by -- | |
| 453 fastp_cut_front, --fastp_cut_tail and -- | |
| 454 fastp_cut_right. Default: 20 | |
| 455 | |
| 456 --fastp_M : The mean quality requirement shared by -- | |
| 457 fastp_cut_front, --fastp_cut_tail and -- | |
| 458 fastp_cut_right. Default: 30 | |
| 459 | |
| 460 --fastp_q : The quality value below which a base should | |
| 461 is not qualified. Default: 30 | |
| 462 | |
| 463 --fastp_u : What percent of bases are allowed to be | |
| 464 unqualified. Default: 40 | |
| 465 | |
| 466 --fastp_n : How many N's can a read have. Default: 5 | |
| 467 | |
| 468 --fastp_e : If the full reads' average quality is below | |
| 469 this value, then it is discarded. Default | |
| 470 : 0 | |
| 471 | |
| 472 --fastp_l : Reads shorter than this length will be | |
| 473 discarded. Default: 35 | |
| 474 | |
| 475 --fastp_max_len : Reads longer than this length will be | |
| 476 discarded. Default: 0 | |
| 477 | |
| 478 --fastp_y : Enable low complexity filter. The | |
| 479 complexity is defined as the percentage of | |
| 480 bases that are different from its next base | |
| 481 (base[i] != base[i+1]). Default: true | |
| 482 | |
| 483 --fastp_Y : The threshold for low complexity filter (0~ | |
| 484 100). Ex: A value of 30 means 30% | |
| 485 complexity is required. Default: 30 | |
| 486 | |
| 487 --fastp_U : Enable Unique Molecular Identifier (UMI) | |
| 488 pre-processing. Default: false | |
| 489 | |
| 490 --fastp_umi_loc : Specify the location of UMI, can be one of | |
| 491 index1/index2/read1/read2/per_index/ | |
| 492 per_read. Default: false | |
| 493 | |
| 494 --fastp_umi_len : If the UMI is in read1 or read2, its length | |
| 495 should be provided. Default: false | |
| 496 | |
| 497 --fastp_umi_prefix : If specified, an underline will be used to | |
| 498 connect prefix and UMI (i.e. prefix=UMI, | |
| 499 UMI=AATTCG, final=UMI_AATTCG). Default: | |
| 500 false | |
| 501 | |
| 502 --fastp_umi_skip : If the UMI is in read1 or read2, fastp can | |
| 503 skip several bases following the UMI. | |
| 504 Default: false | |
| 505 | |
| 506 --fastp_p : Enable overrepresented sequence analysis. | |
| 507 Default: true | |
| 508 | |
| 509 --fastp_P : One in this many number of reads will be | |
| 510 computed for overrepresentation analysis (1 | |
| 511 ~10000), smaller is slower. Default: 20 | |
| 512 | |
| 513 --kmaalign_run : Run kma tool. Default: true | |
| 514 | |
| 515 --kmaalign_int : Input file has interleaved reads. Default | |
| 516 : false | |
| 517 | |
| 518 --kmaalign_ef : Output additional features. Default: false | |
| 519 | |
| 520 --kmaalign_vcf : Output vcf file. 2 to apply FT. Default: | |
| 521 false | |
| 522 | |
| 523 --kmaalign_sam : Output SAM, 4/2096 for mapped/aligned. | |
| 524 Default: false | |
| 525 | |
| 526 --kmaalign_nc : No consensus file. Default: true | |
| 527 | |
| 528 --kmaalign_na : No aln file. Default: true | |
| 529 | |
| 530 --kmaalign_nf : No frag file. Default: false | |
| 531 | |
| 532 --kmaalign_a : Output all template mappings. Default: | |
| 533 false | |
| 534 | |
| 535 --kmaalign_and : Use both -mrs and p-value on consensus. | |
| 536 Default: true | |
| 537 | |
| 538 --kmaalign_oa : Use neither -mrs or p-value on consensus. | |
| 539 Default: false | |
| 540 | |
| 541 --kmaalign_bc : Minimum support to call bases. Default: | |
| 542 false | |
| 543 | |
| 544 --kmaalign_bcNano : Altered indel calling for ONT data. Default | |
| 545 : false | |
| 546 | |
| 547 --kmaalign_bcd : Minimum depth to call bases. Default: false | |
| 548 | |
| 549 --kmaalign_bcg : Maintain insignificant gaps. Default: false | |
| 550 | |
| 551 --kmaalign_ID : Minimum consensus ID. Default: 85.0 | |
| 552 | |
| 553 --kmaalign_md : Minimum depth. Default: false | |
| 554 | |
| 555 --kmaalign_dense : Skip insertion in consensus. Default: false | |
| 556 | |
| 557 --kmaalign_ref_fsa : Use Ns on indels. Default: false | |
| 558 | |
| 559 --kmaalign_Mt1 : Map everything to one template. Default: | |
| 560 false | |
| 561 | |
| 562 --kmaalign_1t1 : Map one query to one template. Default: | |
| 563 false | |
| 564 | |
| 565 --kmaalign_mrs : Minimum relative alignment score. Default: | |
| 566 0.99 | |
| 567 | |
| 568 --kmaalign_mrc : Minimum query coverage. Default: 0.99 | |
| 569 | |
| 570 --kmaalign_mp : Minimum phred score of trailing and leading | |
| 571 bases. Default: 30 | |
| 572 | |
| 573 --kmaalign_mq : Set the minimum mapping quality. Default: | |
| 574 false | |
| 575 | |
| 576 --kmaalign_eq : Minimum average quality score. Default: 30 | |
| 577 | |
| 578 --kmaalign_5p : Trim 5 prime by this many bases. Default: | |
| 579 false | |
| 580 | |
| 581 --kmaalign_3p : Trim 3 prime by this many bases Default: | |
| 582 false | |
| 583 | |
| 584 --kmaalign_apm : Sets both -pm and -fpm Default: false | |
| 585 | |
| 586 --kmaalign_cge : Set CGE penalties and rewards Default: | |
| 587 false | |
| 588 | |
| 589 --seqkit_grep_run : Run the seqkit `grep` tool. Default: true | |
| 590 | |
| 591 --seqkit_grep_n : Match by full name instead of just ID. | |
| 592 Default: undefined | |
| 593 | |
| 594 --seqkit_grep_s : Search subseq on seq, both positive and | |
| 595 negative strand are searched, and mismatch | |
| 596 allowed using flag --seqkit_grep_m. Default | |
| 597 : undefined | |
| 598 | |
| 599 --seqkit_grep_c : Input is circular genome Default: undefined | |
| 600 | |
| 601 --seqkit_grep_C : Just print a count of matching records. | |
| 602 With the --seqkit_grep_v flag, count non- | |
| 603 matching records. Default: undefined | |
| 604 | |
| 605 --seqkit_grep_i : Ignore case while using seqkit grep. | |
| 606 Default: undefined | |
| 607 | |
| 608 --seqkit_grep_v : Invert the match i.e. select non-matching | |
| 609 records. Default: undefined | |
| 610 | |
| 611 --seqkit_grep_m : Maximum mismatches when matching by | |
| 612 sequence. Default: undefined | |
| 613 | |
| 614 --seqkit_grep_r : Input patters are regular expressions. | |
| 615 Default: undefined | |
| 616 | |
| 617 --salmonidx_run : Run `salmon index` tool. Default: true | |
| 618 | |
| 619 --salmonidx_k : The size of k-mers that should be used for | |
| 620 the quasi index. Default: false | |
| 621 | |
| 622 --salmonidx_gencode : This flag will expect the input transcript | |
| 623 FASTA to be in GENCODE format, and will | |
| 624 split the transcript name at the first `|` | |
| 625 character. These reduced names will be used | |
| 626 in the output and when looking for these | |
| 627 transcripts in a gene to transcript GTF. | |
| 628 Default: false | |
| 629 | |
| 630 --salmonidx_features : This flag will expect the input reference | |
| 631 to be in the tsv file format, and will | |
| 632 split the feature name at the first `tab` | |
| 633 character. These reduced names will be used | |
| 634 in the output and when looking for the | |
| 635 sequence of the features. GTF. Default: | |
| 636 false | |
| 637 | |
| 638 --salmonidx_keepDuplicates : This flag will disable the default indexing | |
| 639 behavior of discarding sequence-identical | |
| 640 duplicate transcripts. If this flag is | |
| 641 passed then duplicate transcripts that | |
| 642 appear in the input will be retained and | |
| 643 quantified separately. Default: true | |
| 644 | |
| 645 --salmonidx_keepFixedFasta : Retain the fixed fasta file (without short | |
| 646 transcripts and duplicates, clipped, etc.) | |
| 647 generated during indexing. Default: false | |
| 648 | |
| 649 --salmonidx_filterSize : The size of the Bloom filter that will be | |
| 650 used by TwoPaCo during indexing. The filter | |
| 651 will be of size 2^{filterSize}. A value of | |
| 652 -1 means that the filter size will be | |
| 653 automatically set based on the number of | |
| 654 distinct k-mers in the input, as estimated | |
| 655 by nthll. Default: false | |
| 656 | |
| 657 --salmonidx_sparse : Build the index using a sparse sampling of | |
| 658 k-mer positions This will require less | |
| 659 memory (especially during quantification), | |
| 660 but will take longer to constructand can | |
| 661 slow down mapping / alignment. Default: | |
| 662 false | |
| 663 | |
| 664 --salmonidx_n : Do not clip poly-A tails from the ends of | |
| 665 target sequences. Default: true | |
| 666 | |
| 667 --sourmashsketch_run : Run `sourmash sketch dna` tool. Default: | |
| 668 true | |
| 669 | |
| 670 --sourmashsketch_mode : Select which type of signatures to be | |
| 671 created: dna, protein, fromfile or | |
| 672 translate. Default: dna | |
| 673 | |
| 674 --sourmashsketch_p : Signature parameters to use. Default: ' | |
| 675 abund,scaled=100,k=71 | |
| 676 | |
| 677 --sourmashsketch_file : <path> A text file containing a list of | |
| 678 sequence files to load. Default: false | |
| 679 | |
| 680 --sourmashsketch_f : Recompute signatures even if the file | |
| 681 exists. Default: false | |
| 682 | |
| 683 --sourmashsketch_name : Name the signature generated from each file | |
| 684 after the first record in the file. | |
| 685 Default: false | |
| 686 | |
| 687 --sourmashsketch_randomize : Shuffle the list of input files randomly. | |
| 688 Default: false | |
| 689 | |
| 690 --sourmashgather_run : Run `sourmash gather` tool. Default: true | |
| 691 | |
| 692 --sourmashgather_n : Number of results to report. By default, | |
| 693 will terminate at --sourmashgather_thr_bp | |
| 694 value. Default: false | |
| 695 | |
| 696 --sourmashgather_thr_bp : Reporting threshold (in bp) for estimated | |
| 697 overlap with remaining query. Default: 100 | |
| 698 | |
| 699 --sourmashgather_ani_ci : Output confidence intervals for ANI | |
| 700 estimates. Default: true | |
| 701 | |
| 702 --sourmashgather_k : The k-mer size to select. Default: 71 | |
| 703 | |
| 704 --sourmashgather_dna : Choose DNA signature. Default: true | |
| 705 | |
| 706 --sourmashgather_rna : Choose RNA signature. Default: false | |
| 707 | |
| 708 --sourmashgather_nuc : Choose Nucleotide signature. Default: false | |
| 709 | |
| 710 --sourmashgather_scaled : Scaled value should be between 100 and 1e6 | |
| 711 . Default: false | |
| 712 | |
| 713 --sourmashgather_inc_pat : Search only signatures that match this | |
| 714 pattern in name, filename, or md5. Default | |
| 715 : false | |
| 716 | |
| 717 --sourmashgather_exc_pat : Search only signatures that do not match | |
| 718 this pattern in name, filename, or md5. | |
| 719 Default: false | |
| 720 | |
| 721 --sfhpy_run : Run the sourmash_filter_hits.py script. | |
| 722 Default: true | |
| 723 | |
| 724 --sfhpy_fcn : Column name by which filtering of rows | |
| 725 should be applied. Default: f_match | |
| 726 | |
| 727 --sfhpy_fcv : Remove genomes whose match with the query | |
| 728 FASTQ is less than this much. Default: 0.8 | |
| 729 | |
| 730 --sfhpy_gt : Apply greather than or equal to condition | |
| 731 on numeric values of --sfhpy_fcn column. | |
| 732 Default: true | |
| 733 | |
| 734 --sfhpy_lt : Apply less than or equal to condition on | |
| 735 numeric values of --sfhpy_fcn column. | |
| 736 Default: false | |
| 737 | |
| 738 --sfhpy_all : Instead of just the column value, print | |
| 739 entire row. Default: true | |
| 740 | |
| 741 --gsalkronapy_run : Run the `gen_salmon_tph_and_krona_tsv.py` | |
| 742 script. Default: true | |
| 743 | |
| 744 --gsalkronapy_sf : Set the scaling factor by which TPM values | |
| 745 are scaled down. Default: 10000 | |
| 746 | |
| 747 --gsalkronapy_smres_suffix : Find the `sourmash gather` result files | |
| 748 ending in this suffix. Default: false | |
| 749 | |
| 750 --gsalkronapy_failed_suffix : Find the sample names which failed | |
| 751 classification stored inside the files | |
| 752 ending in this suffix. Default: false | |
| 753 | |
| 754 --gsalkronapy_num_lin_cols : Number of columns expected in the lineages | |
| 755 CSV file. Default: false | |
| 756 | |
| 757 --gsalkronapy_lin_regex : Number of columns expected in the lineages | |
| 758 CSV file. Default: false | |
| 759 | |
| 760 --krona_ktIT_run : Run the ktImportText (ktIT) from krona. | |
| 761 Default: true | |
| 762 | |
| 763 --krona_ktIT_n : Name of the highest level. Default: all | |
| 764 | |
| 765 --krona_ktIT_q : Input file(s) do not have a field for | |
| 766 quantity. Default: false | |
| 767 | |
| 768 --krona_ktIT_c : Combine data from each file, rather than | |
| 769 creating separate datasets within the chart | |
| 770 . Default: false | |
| 771 | |
| 772 Help options : | |
| 773 | |
| 774 --help : Display this message. | |
| 775 ``` |
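
The sample grouping described in the [Input](#input) section can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: the function names `sample_name` and `group_fastqs` are hypothetical, and the sketch assumes, based on the `apple` / `mango` example, that the first `--fq_filename_delim_idx` fields of the file name are joined back with the delimiter to form the sample name.

```python
from collections import defaultdict


def sample_name(filename, delim="_", idx=1):
    """Join the first `idx` delimiter-separated fields of a file name.

    Hypothetical helper mirroring --fq_filename_delim and
    --fq_filename_delim_idx (assumed to keep the first `idx` fields).
    """
    return delim.join(filename.split(delim)[:idx])


def group_fastqs(filenames, delim="_", idx=1):
    """Group FASTQ file names by their derived sample name."""
    groups = defaultdict(list)
    for name in filenames:
        groups[sample_name(name, delim, idx)].append(name)
    return dict(groups)


# File names taken from the Input section example.
files = [
    "KB-01_apple_L001_R1.fastq.gz",
    "KB-01_apple_L001_R2.fastq.gz",
    "KB-01_apple_L002_R1.fastq.gz",
    "KB-01_apple_L002_R2.fastq.gz",
    "KB-02_mango_L001_R1.fastq.gz",
    "KB-02_mango_L001_R2.fastq.gz",
    "KB-02_mango_L002_R1.fastq.gz",
    "KB-02_mango_L002_R2.fastq.gz",
]

# Equivalent in spirit to: --fq_filename_delim _ --fq_filename_delim_idx 2
groups = group_fastqs(files, delim="_", idx=2)
for sample, members in sorted(groups.items()):
    print(sample, len(members))
```

Under this assumption, all lanes and read pairs of a sample collapse into one group (`KB-01_apple` and `KB-02_mango`), whereas the default index of 1 would name the groups `KB-01` and `KB-02`.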

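The note in the [Output](#output) section about relative abundances can be made concrete with a small worked example. The read counts below are hypothetical and the function `percent_relative_abundance` is illustrative only; the sketch does not reproduce how the pipeline derives its abundance values, it only shows the effect of choosing mitochondrial reads rather than total reads as the denominator.

```python
def percent_relative_abundance(taxon_reads, denominator_reads):
    """Return taxon_reads as a percentage of denominator_reads."""
    if denominator_reads <= 0:
        raise ValueError("denominator_reads must be positive")
    return 100.0 * taxon_reads / denominator_reads


# Hypothetical counts for a single sample.
total_reads = 1_000_000   # all reads in the sample
mito_reads = 20_000       # reads identified as mitochondrial
taxon_reads = 5_000       # mitochondrial reads assigned to one taxon

# Reported style: relative to mitochondrial reads only.
relative_to_mito = percent_relative_abundance(taxon_reads, mito_reads)

# For comparison only: relative to all reads in the sample.
relative_to_total = percent_relative_abundance(taxon_reads, total_reads)

print(relative_to_mito, relative_to_total)
```

With these made-up numbers the same taxon is 25% of the mitochondrial reads but only 0.5% of all reads, so a high percentage in the report does not by itself imply a high share of the whole sample.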