comparison 0.5.0/readme/nowayout.md @ 0:3c767f9cfd88 draft default tip

planemo upload
author galaxytrakr
date Fri, 29 May 2026 13:37:56 +0000
-1:000000000000 0:3c767f9cfd88
1 <p align="center">
2 <img src="../assets/nowayout-icon.png" width="20%" height="20%" />
3 </p>
4
5 ---
6
7 `nowayout` is a **super-fast** automated software pipeline for taxonomic classification of eukaryotic mitochondrial reads. It first uses a custom database to identify mitochondrial reads and then performs read classification on those identified reads.
8
9 ---
10
11 <!-- TOC -->
12
13 - [Minimum Requirements](#minimum-requirements)
14 - [HFP GalaxyTrakr](#hfp-galaxytrakr)
15 - [Usage and Examples](#usage-and-examples)
16 - [Databases](#databases)
17 - [Input](#input)
18 - [Output](#output)
19 - [Preset filters](#preset-filters)
20 - [Computational resources](#computational-resources)
21 - [Runtime profiles](#runtime-profiles)
22 - [your_institution.config](#your_institutionconfig)
23 - [Test Run](#test-run)
24 - [nowayout CLI Help](#nowayout-cli-help)
25
26 <!-- /TOC -->
27
28 \
29 &nbsp;
30
31 ## Minimum Requirements
32
33 1. [Nextflow version 25.04.6](https://github.com/nextflow-io/nextflow/releases/download/v25.04.6/nextflow).
34 - Make the `nextflow` binary executable (`chmod 755 nextflow`) and make sure that it is available in your `$PATH`.
35 - If your existing `JAVA` install does not support the newest **Nextflow** version, you can try **Amazon**'s `JAVA` (OpenJDK): [Corretto](https://docs.aws.amazon.com/corretto/latest/corretto-21-ug/downloads-list.html).
36 2. One of `micromamba` (version `1.5.9`), `docker` or `singularity` installed and made available in your `$PATH`.
37 - Running the workflow via `micromamba` software provisioning is **preferred** as it does not require any `sudo` or `admin` privileges or any other configurations with respect to the various container providers.
38 - To install `micromamba` for your system type, please follow these [installation steps](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html#linux-and-macos) and make sure that the `micromamba` binary is made available in your `$PATH`.
39 - Just the `curl` step is sufficient to download the binary as far as running the workflows is concerned.
40 - Once you have finished the installation, **it is important that you downgrade `micromamba` to version `1.5.9`**.
41 - First check your installed version and, if it is anything other than `1.5.9`, do the downgrade.
42
43 ```bash
44 micromamba --version
45 micromamba self-update --version 1.5.9 -c conda-forge
46 ```
47
48 3. Minimum of 10 CPU cores and about 60 GB of memory for the main workflow steps. More memory may be required if your **FASTQ** files are big.
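
The `micromamba` version check from step 2 can be scripted. The following is a minimal sketch, assuming that `micromamba --version` prints only the version string; the helper name `needs_downgrade` is hypothetical and not part of the pipeline. The script only reports what to do and does not run the downgrade itself.

```bash
#!/usr/bin/env bash
# Version required by this README.
required_version="1.5.9"

# Hypothetical helper: succeeds (exit 0) when the given version
# differs from the required one, i.e. a downgrade is needed.
needs_downgrade() {
    [ "$1" != "$required_version" ]
}

# Only inspect the version if micromamba is actually on the PATH.
if command -v micromamba >/dev/null 2>&1; then
    current_version="$(micromamba --version)"
    if needs_downgrade "$current_version"; then
        echo "Found micromamba ${current_version}. To downgrade, run:"
        echo "  micromamba self-update --version ${required_version} -c conda-forge"
    else
        echo "micromamba ${current_version} is already the required version."
    fi
fi
```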
49
50 \
51 &nbsp;
52
53 ## HFP GalaxyTrakr
54
55 The `nowayout` pipeline **will** be made available for use on the newest version of the [Galaxy instance supported by HFP, FDA](https://galaxytrakr.org/) (`version >= 24.x`). Please check this space for announcements in this regard.
56
57 Please note that the pipeline on [HFP GalaxyTrakr](https://galaxytrakr.org) may, in most cases, be a version older than the one on **GitHub** due to testing prioritization.
58
59 \
60 &nbsp;
61
62 ## Usage and Examples
63
64 Clone or download this repository and then call `cpipes`.
65
66 ```bash
67 cpipes --pipeline nowayout [options]
68 ```
69
70 Alternatively, you can use `nextflow` to directly pull and run the pipeline.
71
72 ```bash
73 nextflow pull CFSAN-Biostatistics/nowayout
74 nextflow list
75 nextflow info CFSAN-Biostatistics/nowayout
76 nextflow run CFSAN-Biostatistics/nowayout --pipeline nowayout --help
77 ```
78
79 \
80 &nbsp;
81
82 ### Databases
83
84 ---
85
86 The successful run of the workflow requires proper setup of the custom database files:
87
88 - `nowayout_dbs`: [Download](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/nowayout/nowayout_dbs.tar.bz2) (~ 22 GB).
89
90 Once you have downloaded the databases, uncompress them and set **UNIX symbolic links** to the database folders in the [assets](../assets/) folder as follows:
91
92 ```bash
93 mkdir assets/dbfiles
94 cd assets/dbfiles
95 ln -s /path/to/nowayout_dbs/kma kma
96 ln -s /path/to/nowayout_dbs/reference reference
97 ln -s /path/to/nowayout_dbs/taxonomy taxonomy
98 ```
99
100 That's it!
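
To confirm that the links created above are in place, a small check such as the following may help. It is only a sketch: the function name `check_db_links` is hypothetical, and the commented `tar` call assumes that the archive unpacks into a top-level `nowayout_dbs` folder, as the link targets above suggest.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that the three database links inside a
# dbfiles folder resolve to real directories ([ -d ] follows symlinks).
check_db_links() {
    local dbdir="$1" name status=0
    for name in kma reference taxonomy; do
        if [ -d "${dbdir}/${name}" ]; then
            echo "ok: ${name}"
        else
            echo "missing or broken: ${name}" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage (paths are placeholders):
#   tar -xjf nowayout_dbs.tar.bz2 -C /path/to
#   check_db_links assets/dbfiles
```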
101
102 \
103 &nbsp;
104
105 ### Input
106
107 ---
108
109 The input to the workflow is a folder containing compressed (`.gz`) FASTQ files of long reads or short reads. Please note that sample grouping happens automatically based on the FASTQ file name. If, for example, a single sample is sequenced across multiple sequencing lanes, you can choose to group those FASTQ files into one sample by using the `--fq_filename_delim` and `--fq_filename_delim_idx` options. By default, `--fq_filename_delim` is set to `_` (underscore) and `--fq_filename_delim_idx` is set to 1.
110
111 For example, if the directory contains FASTQ files as shown below:
112
113 - KB-01_apple_L001_R1.fastq.gz
114 - KB-01_apple_L001_R2.fastq.gz
115 - KB-01_apple_L002_R1.fastq.gz
116 - KB-01_apple_L002_R2.fastq.gz
117 - KB-02_mango_L001_R1.fastq.gz
118 - KB-02_mango_L001_R2.fastq.gz
119 - KB-02_mango_L002_R1.fastq.gz
120 - KB-02_mango_L002_R2.fastq.gz
121
122 Then, to create 2 sample groups, `apple` and `mango`, we split the file name by the delimiter (underscore in this case, which is the default) and group by the first 2 words (`--fq_filename_delim_idx 2`).
123
124 It goes without saying that all the FASTQ files should have uniform naming patterns so that the `--fq_filename_delim` and `--fq_filename_delim_idx` options do not have any adverse effect on collecting the files and creating a sample metadata sheet.
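
The grouping described above can be illustrated with a few lines of shell. This is a sketch of the described behavior (split on the delimiter, join the first N fields), not the pipeline's actual implementation; the function name `sample_name` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a sample name by splitting the file name
# on a delimiter and keeping the first `idx` fields.
sample_name() {
    local file="$1" delim="${2:-_}" idx="${3:-1}"
    basename "$file" | cut -d "$delim" -f "1-${idx}"
}

# File names from the example above, grouped with an index of 2.
for fq in \
    KB-01_apple_L001_R1.fastq.gz KB-01_apple_L002_R1.fastq.gz \
    KB-02_mango_L001_R1.fastq.gz KB-02_mango_L002_R1.fastq.gz; do
    sample_name "$fq" _ 2
done | sort -u
# prints:
# KB-01_apple
# KB-02_mango
```

Under this reading, all lanes of the apple sample collapse into `KB-01_apple` and all lanes of the mango sample into `KB-02_mango`.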
125
126 \
127 &nbsp;
128
129 ### Output
130
131 ---
132
133 All the outputs for each step are stored inside the folder specified with the `--output` option. A `multiqc_report.html` file inside the `nowayout-multiqc` folder contains a brief consolidated report and can be opened in any browser on your local workstation.
134
135 Please note that the percentage relative abundances seen are relative to the total number of mitochondrial reads and not the total number of reads per sample.
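
A quick worked example shows why the choice of denominator matters. The read counts below are hypothetical and not taken from any real run, and the helper `pct` is only an illustration; how the pipeline computes its values internally is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of `part` out of `total`,
# rounded to two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f", (part / total) * 100 }'
}

# Made-up counts for a single sample.
total_reads=1000000   # all reads in the sample
mito_reads=1000       # reads identified as mitochondrial
taxon_reads=200       # mitochondrial reads assigned to one taxon

echo "Relative to mitochondrial reads: $(pct "$taxon_reads" "$mito_reads")%"
echo "Relative to all reads:           $(pct "$taxon_reads" "$total_reads")%"
```

With these numbers, the taxon accounts for 20.00% of the mitochondrial reads but only 0.02% of all reads; the report follows the first convention.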
136
137 \
138 &nbsp;
139
140 ### Preset filters
141
142 ---
143
144 There are three preset threshold filters that are available with the pipeline: `--nowo_thresholds strict`, `--nowo_thresholds mild` and `--nowo_thresholds relax`. Use these options for exploration of results via multiple runs on the same input dataset. The default is `strict` thresholds.
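
One way to explore the three presets on the same dataset is to give each run its own output folder. The sketch below only prints the commands rather than executing them; the paths are placeholders, the per-preset output folder layout is an assumption, and `build_cmd` is a hypothetical helper.

```bash
#!/usr/bin/env bash
input_dir="/path/to/fastq_dir"       # placeholder
output_root="/path/to/output_root"   # placeholder

# Hypothetical helper: assemble the command line for one preset.
build_cmd() {
    local preset="$1"
    echo "cpipes --pipeline nowayout --input ${input_dir} --output ${output_root}/${preset} --nowo_thresholds ${preset} -profile your_institution"
}

# Print one command per preset; review them, then run each one manually.
for preset in strict mild relax; do
    build_cmd "$preset"
done
```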
145
146 \
147 &nbsp;
148
149 ### Computational resources
150
151 ---
152
153 The `nowayout` workflow requires at least 10 CPU cores and 60 GB of memory to finish successfully.
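
Before launching a run, the stated minimum can be compared against the current machine. This is a Linux-only sketch that assumes `nproc` and `/proc/meminfo` are available; the helper `meets_minimum` is hypothetical. Note that the reported memory is approximate, since `MemTotal` is usually slightly lower than the installed amount.

```bash
#!/usr/bin/env bash
min_cores=10    # minimum CPU cores stated above
min_mem_gb=60   # minimum memory (GB) stated above

# Hypothetical helper: succeeds when both values meet the minimum.
meets_minimum() {
    local cores="$1" mem_gb="$2"
    [ "$cores" -ge "$min_cores" ] && [ "$mem_gb" -ge "$min_mem_gb" ]
}

# Detect resources only where the Linux tools are present.
if command -v nproc >/dev/null 2>&1 && [ -r /proc/meminfo ]; then
    cores="$(nproc)"
    # MemTotal is reported in kB; convert to whole GiB.
    mem_gb="$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)"
    if meets_minimum "$cores" "$mem_gb"; then
        echo "OK: ${cores} cores and about ${mem_gb} GB of memory."
    else
        echo "Below the stated minimum: ${cores} cores and about ${mem_gb} GB of memory."
    fi
fi
```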
154
155 \
156 &nbsp;
157
158 ### Runtime profiles
159
160 ---
161
162 You can use different runtime profiles to suit your specific compute environment. For example, you can run the workflow locally on your machine or on a grid computing infrastructure.
163
164 \
165 &nbsp;
166
167 Example:
168
169 ```bash
170 cd /data/scratch/$USER
171 mkdir nf-cpipes
172 cd nf-cpipes
173 cpipes \
174 --pipeline nowayout \
175 --input /path/to/fastq_pass_dir \
176 --output /path/to/where/output/should/go \
177 -profile your_institution
178 ```
179
180 The above command runs the pipeline and stores the output at the location given by the `--output` flag. The **NEXTFLOW** reports are always stored in the current working directory from which `cpipes` is run. For example, for the above command, a directory called `CPIPES-nowayout` would hold all the **NEXTFLOW** related logs, reports and trace files.
181
182 \
183 &nbsp;
184
185 ### `your_institution.config`
186
187 ---
188
189 The above example sets the runtime profile to `your_institution`. For this to work, add the following lines at the end of the [`computeinfra.config`](../conf/computeinfra.config) file, which should be located inside the `conf` folder. For example, if your institution uses **SGE** or **UNIVA** for grid computing instead of **SLURM** and has a job queue named `normal.q`, then add these lines:
190
191 \
192 &nbsp;
193
194 ```groovy
195 your_institution {
196     process.executor = 'sge'
197     process.queue = 'normal.q'
198     singularity.enabled = false
199     singularity.autoMounts = true
200     docker.enabled = false
201     params.enable_conda = true
202     conda.enabled = true
203     conda.useMicromamba = true
204     params.enable_module = false
205 }
206 ```
207
208 In the above example, by default, all the software provisioning choices are disabled except `conda`. You can also choose to remove the `process.queue` line altogether and the `nowayout` workflow will request the appropriate memory and number of CPU cores automatically, which ranges from 1 CPU, 1 GB and 1 hour for job completion up to 10 CPU cores, 1 TB and 120 hours for job completion.
209
210 \
211 &nbsp;
212
213 ### Cloud computing
214
215 ---
216
217 You can run the workflow in the cloud (works only with proper set up of AWS resources). Add new run time profiles with required parameters per [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html):
218
219 \
220 &nbsp;
221
222 Example:
223
224 ```groovy
225 my_aws_batch {
226     executor = 'awsbatch'
227     queue = 'my-batch-queue'
228     aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
229     aws.batch.region = 'us-east-1'
230     singularity.enabled = false
231     singularity.autoMounts = true
232     docker.enabled = true
233     params.enable_conda = false
234     params.enable_module = false
235 }
236 ```
237
238 \
239 &nbsp;
240
241 ## Test Run
242
243 After you make sure that you have all the [minimum requirements](#minimum-requirements) to run the workflow, you can try `nowayout` on a test dataset.
244
245 - Download input reads [from S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/nowayout/nowayout_test_reads.tar.bz2) (~ 8 GB).
246 - This dataset was part of the research for detecting and identifying insects or insect fragments in food, an essential component of food safety and regulatory monitoring. Insects such as **_Plodia interpunctella_** (Indian meal moth) and _**Tribolium castaneum**_ (red flour beetle) were intentionally spiked into wheat flour at varying concentrations to create benchmark samples. These serve as reference materials to test and validate molecular detection workflows.
247 - Download pre-formatted databases (**MANDATORY**) [from S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/nowayout/nowayout_dbs.tar.bz2) (~ 22 GB).
248 - After a successful download, untar and add **symbolic links** in the [assets](../assets) folder as described in the [Databases](#databases) section.
249 - It is always a best practice to use absolute UNIX paths and the real destinations of symbolic links during pipeline execution. For example, find out the real path(s) of your absolute UNIX path(s) and use those for the `--input` and `--output` options of the pipeline.
250
251 ```bash
252 realpath /hpc/scratch/user/input/srr
253 ```
254
255 - Now run the workflow. The `--fq_single_end true` option forces the FASTQ files to be treated as single-end:
256
257 ```bash
258 cpipes \
259 --pipeline nowayout \
260 --input /path/to/nowayout_test_reads \
261 --output /path/to/nowayout_test_output \
262 --fq_single_end true \
263 -profile stdkondagac \
264 -resume
265 ```
266
267 - After a successful run of the workflow, your **MultiQC** report should look something like [this](https://cfsan-pub-xfer.s3.us-east-1.amazonaws.com/Kranti.Konganti/nowayout/CPIPES-Report_multiqc_report.html).
268
269 - `nowayout` also automatically generates [Krona](https://github.com/marbl/Krona) charts. The **Krona** chart for the above test run should look something like [this](https://cfsan-pub-xfer.s3.us-east-1.amazonaws.com/Kranti.Konganti/nowayout/CPIPES_nowayout_krona.html).
270
271 Please note that the runtime profile `stdkondagac` will run jobs locally using `micromamba` for software provisioning. The first time you run the command, a new folder called `kondagac_cache` will be created and subsequent runs should use this `conda` cache.
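
The advice above about real paths can be wrapped in a small helper so that `--input` always receives a resolved directory. This is a sketch, assuming the `realpath` utility is installed; `resolve_dir` is a hypothetical name, and the paths in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the real (symlink-free) absolute path of an
# existing directory, or fail with a message if it is not a directory.
resolve_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "Not a directory: ${dir}" >&2
        return 1
    fi
    realpath "$dir"
}

# Example usage (placeholder paths):
#   input_dir="$(resolve_dir /path/to/nowayout_test_reads)" || exit 1
#   cpipes --pipeline nowayout --input "$input_dir" \
#       --output /path/to/nowayout_test_output \
#       --fq_single_end true -profile stdkondagac -resume
```

The helper insists that the directory exists, which suits `--input`; an output folder that has not been created yet would need to be created first or resolved differently.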
272
273 \
274 &nbsp;
275
276 ## `nowayout` CLI Help
277
278 ```text
279 cpipes --pipeline nowayout --help
280
281 N E X T F L O W ~ version 24.10.4
282
283 Launching `/home/user/nowayout/cpipes` [sleepy_pauling] DSL2 - revision: 55d6f63710
284
285 ================================================================================
286              (o)
287   ___  _ __   _  _ __    ___  ___
288  / __|| '_ \ | || '_ \  / _ \/ __|
289 | (__ | |_) || || |_) ||  __/\__ \
290  \___|| .__/ |_|| .__/  \___||___/
291       | |       | |
292       |_|       |_|
293 --------------------------------------------------------------------------------
294 A collection of modular pipelines at CFSAN, FDA.
295 --------------------------------------------------------------------------------
296 Name : CPIPES
297 Author : Kranti.Konganti@fda.hhs.gov
298 Version : 0.8.0
299 Center : CFSAN, FDA.
300 ================================================================================
301
302 Workflow : nowayout
303
304 Author : Kranti Konganti
305
306 Version : 0.5.0
307
308
309 Usage : cpipes --pipeline nowayout [options]
310
311
312 Required :
313
314 --input : Absolute path to directory containing FASTQ
315 files. The directory should contain only
316 FASTQ files as all the files within the
317 mentioned directory will be read. Ex: --
318 input /path/to/fastq_pass
319
320 --output : Absolute path to directory where all the
321 pipeline outputs should be stored. Ex: --
322 output /path/to/output
323
324 Other options :
325
326 --metadata : Absolute path to metadata CSV file
327 containing five mandatory columns: sample,
328 fq1,fq2,strandedness,single_end. The fq1
329 and fq2 columns contain absolute paths to
330 the FASTQ files. This option can be used in
331 place of --input option. This is rare. Ex
332 : --metadata samplesheet.csv
333
334 --fq_suffix : The suffix of FASTQ files (Unpaired reads
335 or R1 reads or Long reads) if an input
336 directory is mentioned via --input option.
337 Default: _R1_001.fastq.gz
338
339 --fq2_suffix : The suffix of FASTQ files (Paired-end reads
340 or R2 reads) if an input directory is
341 mentioned via --input option. Default:
342 _R2_001.fastq.gz
343
344 --fq_filter_by_len : Remove FASTQ reads that are less than this
345 many bases. Default: 0
346
347 --fq_strandedness : The strandedness of the sequencing run.
348 This is mostly needed if your sequencing
349 run is RNA-SEQ. For most of the other runs
350 , it is probably safe to use unstranded for
351 the option. Default: unstranded
352
353 --fq_single_end : SINGLE-END information will be auto-
354 detected but this option forces PAIRED-END
355 FASTQ files to be treated as SINGLE-END so
356 only read 1 information is included in auto
357 -generated samplesheet. Default: false
358
359 --fq_filename_delim : Delimiter by which the file name is split
360 to obtain sample name. Default: _
361
362 --fq_filename_delim_idx : After splitting FASTQ file name by using
363 the --fq_filename_delim option, all
364 elements before this index (1-based) will
365 be joined to create final sample name.
366 Default: 1
367
368 --fastp_run : Run fastp tool. Default: true
369
370 --fastp_failed_out : Specify whether to store reads that cannot
371 pass the filters. Default: false
372
373 --fastp_merged_out : Specify whether to store merged output or
374 not. Default: false
375
376 --fastp_overlapped_out : For each read pair, output the overlapped
377 region if it has no mismatched base.
378 Default: false
379
380 --fastp_6 : Indicate that the input is using phred64
381 scoring (it'll be converted to phred33, so
382 the output will still be phred33). Default
383 : false
384
385 --fastp_reads_to_process : Specify how many reads/pairs are to be
386 processed. Default value 0 means process
387 all reads. Default: 0
388
389 --fastp_fix_mgi_id : The MGI FASTQ ID format is not compatible
390 with many BAM operation tools, enable this
391 option to fix it. Default: false
392
393 --fastp_A : Disable adapter trimming. On by default.
394 Default: false
395
396 --fastp_adapter_fasta : Specify a FASTA file to trim both read1 and
397 read2 (if PE) by all the sequences in this
398 FASTA file. Default: false
399
400 --fastp_f : Trim how many bases in front of read1.
401 Default: 0
402
403 --fastp_t : Trim how many bases at the end of read1.
404 Default: 0
405
406 --fastp_b : Max length of read1 after trimming. Default
407 : 0
408
409 --fastp_F : Trim how many bases in front of read2.
410 Default: 0
411
412 --fastp_T : Trim how many bases at the end of read2.
413 Default: 0
414
415 --fastp_B : Max length of read2 after trimming. Default
416 : 0
417
418 --fastp_dedup : Enable deduplication to drop the duplicated
419 reads/pairs. Default: true
420
421 --fastp_dup_calc_accuracy : Accuracy level to calculate duplication (1~
422 6), higher level uses more memory (1G, 2G,
423 4G, 8G, 16G, 24G). Default 1 for no-dedup
424 mode, and 3 for dedup mode. Default: 6
425
426 --fastp_poly_g_min_len : The minimum length to detect polyG in the
427 read tail. Default: 10
428
429 --fastp_G : Disable polyG tail trimming. Default: true
430
431 --fastp_x : Enable polyX trimming in 3' ends. Default:
432 false
433
434 --fastp_poly_x_min_len : The minimum length to detect polyX in the
435 read tail. Default: 10
436
437 --fastp_cut_front : Move a sliding window from front (5') to
438 tail, drop the bases in the window if its
439 mean quality < threshold, stop otherwise.
440 Default: true
441
442 --fastp_cut_tail : Move a sliding window from tail (3') to
443 front, drop the bases in the window if its
444 mean quality < threshold, stop otherwise.
445 Default: false
446
447 --fastp_cut_right : Move a sliding window from tail, drop the
448 bases in the window and the right part if
449 its mean quality < threshold, and then stop
450 . Default: true
451
452 --fastp_W : Sliding window size shared by --
453 fastp_cut_front, --fastp_cut_tail and --
454 fastp_cut_right. Default: 20
455
456 --fastp_M : The mean quality requirement shared by --
457 fastp_cut_front, --fastp_cut_tail and --
458 fastp_cut_right. Default: 30
459
460 --fastp_q : The quality value below which a base should
461 is not qualified. Default: 30
462
463 --fastp_u : What percent of bases are allowed to be
464 unqualified. Default: 40
465
466 --fastp_n : How many N's can a read have. Default: 5
467
468 --fastp_e : If the full reads' average quality is below
469 this value, then it is discarded. Default
470 : 0
471
472 --fastp_l : Reads shorter than this length will be
473 discarded. Default: 35
474
475 --fastp_max_len : Reads longer than this length will be
476 discarded. Default: 0
477
478 --fastp_y : Enable low complexity filter. The
479 complexity is defined as the percentage of
480 bases that are different from its next base
481 (base[i] != base[i+1]). Default: true
482
483 --fastp_Y : The threshold for low complexity filter (0~
484 100). Ex: A value of 30 means 30%
485 complexity is required. Default: 30
486
487 --fastp_U : Enable Unique Molecular Identifier (UMI)
488 pre-processing. Default: false
489
490 --fastp_umi_loc : Specify the location of UMI, can be one of
491 index1/index2/read1/read2/per_index/
492 per_read. Default: false
493
494 --fastp_umi_len : If the UMI is in read1 or read2, its length
495 should be provided. Default: false
496
497 --fastp_umi_prefix : If specified, an underline will be used to
498 connect prefix and UMI (i.e. prefix=UMI,
499 UMI=AATTCG, final=UMI_AATTCG). Default:
500 false
501
502 --fastp_umi_skip : If the UMI is in read1 or read2, fastp can
503 skip several bases following the UMI.
504 Default: false
505
506 --fastp_p : Enable overrepresented sequence analysis.
507 Default: true
508
509 --fastp_P : One in this many number of reads will be
510 computed for overrepresentation analysis (1
511 ~10000), smaller is slower. Default: 20
512
513 --kmaalign_run : Run kma tool. Default: true
514
515 --kmaalign_int : Input file has interleaved reads. Default
516 : false
517
518 --kmaalign_ef : Output additional features. Default: false
519
520 --kmaalign_vcf : Output vcf file. 2 to apply FT. Default:
521 false
522
523 --kmaalign_sam : Output SAM, 4/2096 for mapped/aligned.
524 Default: false
525
526 --kmaalign_nc : No consensus file. Default: true
527
528 --kmaalign_na : No aln file. Default: true
529
530 --kmaalign_nf : No frag file. Default: false
531
532 --kmaalign_a : Output all template mappings. Default:
533 false
534
535 --kmaalign_and : Use both -mrs and p-value on consensus.
536 Default: true
537
538 --kmaalign_oa : Use neither -mrs or p-value on consensus.
539 Default: false
540
541 --kmaalign_bc : Minimum support to call bases. Default:
542 false
543
544 --kmaalign_bcNano : Altered indel calling for ONT data. Default
545 : false
546
547 --kmaalign_bcd : Minimum depth to call bases. Default: false
548
549 --kmaalign_bcg : Maintain insignificant gaps. Default: false
550
551 --kmaalign_ID : Minimum consensus ID. Default: 85.0
552
553 --kmaalign_md : Minimum depth. Default: false
554
555 --kmaalign_dense : Skip insertion in consensus. Default: false
556
557 --kmaalign_ref_fsa : Use Ns on indels. Default: false
558
559 --kmaalign_Mt1 : Map everything to one template. Default:
560 false
561
562 --kmaalign_1t1 : Map one query to one template. Default:
563 false
564
565 --kmaalign_mrs : Minimum relative alignment score. Default:
566 0.99
567
568 --kmaalign_mrc : Minimum query coverage. Default: 0.99
569
570 --kmaalign_mp : Minimum phred score of trailing and leading
571 bases. Default: 30
572
573 --kmaalign_mq : Set the minimum mapping quality. Default:
574 false
575
576 --kmaalign_eq : Minimum average quality score. Default: 30
577
578 --kmaalign_5p : Trim 5 prime by this many bases. Default:
579 false
580
581 --kmaalign_3p : Trim 3 prime by this many bases Default:
582 false
583
584 --kmaalign_apm : Sets both -pm and -fpm Default: false
585
586 --kmaalign_cge : Set CGE penalties and rewards Default:
587 false
588
589 --seqkit_grep_run : Run the seqkit `grep` tool. Default: true
590
591 --seqkit_grep_n : Match by full name instead of just ID.
592 Default: undefined
593
594 --seqkit_grep_s : Search subseq on seq, both positive and
595 negative strand are searched, and mismatch
596 allowed using flag --seqkit_grep_m. Default
597 : undefined
598
599 --seqkit_grep_c : Input is circular genome Default: undefined
600
601 --seqkit_grep_C : Just print a count of matching records.
602 With the --seqkit_grep_v flag, count non-
603 matching records. Default: undefined
604
605 --seqkit_grep_i : Ignore case while using seqkit grep.
606 Default: undefined
607
608 --seqkit_grep_v : Invert the match i.e. select non-matching
609 records. Default: undefined
610
611 --seqkit_grep_m : Maximum mismatches when matching by
612 sequence. Default: undefined
613
614 --seqkit_grep_r : Input patters are regular expressions.
615 Default: undefined
616
617 --salmonidx_run : Run `salmon index` tool. Default: true
618
619 --salmonidx_k : The size of k-mers that should be used for
620 the quasi index. Default: false
621
622 --salmonidx_gencode : This flag will expect the input transcript
623 FASTA to be in GENCODE format, and will
624 split the transcript name at the first `|`
625 character. These reduced names will be used
626 in the output and when looking for these
627 transcripts in a gene to transcript GTF.
628 Default: false
629
630 --salmonidx_features : This flag will expect the input reference
631 to be in the tsv file format, and will
632 split the feature name at the first `tab`
633 character. These reduced names will be used
634 in the output and when looking for the
635 sequence of the features. GTF. Default:
636 false
637
638 --salmonidx_keepDuplicates : This flag will disable the default indexing
639 behavior of discarding sequence-identical
640 duplicate transcripts. If this flag is
641 passed then duplicate transcripts that
642 appear in the input will be retained and
643 quantified separately. Default: true
644
645 --salmonidx_keepFixedFasta : Retain the fixed fasta file (without short
646 transcripts and duplicates, clipped, etc.)
647 generated during indexing. Default: false
648
649 --salmonidx_filterSize : The size of the Bloom filter that will be
650 used by TwoPaCo during indexing. The filter
651 will be of size 2^{filterSize}. A value of
652 -1 means that the filter size will be
653 automatically set based on the number of
654 distinct k-mers in the input, as estimated
655 by nthll. Default: false
656
657 --salmonidx_sparse : Build the index using a sparse sampling of
658 k-mer positions This will require less
659 memory (especially during quantification),
660 but will take longer to constructand can
661 slow down mapping / alignment. Default:
662 false
663
664 --salmonidx_n : Do not clip poly-A tails from the ends of
665 target sequences. Default: true
666
667 --sourmashsketch_run : Run `sourmash sketch dna` tool. Default:
668 true
669
670 --sourmashsketch_mode : Select which type of signatures to be
671 created: dna, protein, fromfile or
672 translate. Default: dna
673
674 --sourmashsketch_p : Signature parameters to use. Default: '
675 abund,scaled=100,k=71
676
677 --sourmashsketch_file : <path> A text file containing a list of
678 sequence files to load. Default: false
679
680 --sourmashsketch_f : Recompute signatures even if the file
681 exists. Default: false
682
683 --sourmashsketch_name : Name the signature generated from each file
684 after the first record in the file.
685 Default: false
686
687 --sourmashsketch_randomize : Shuffle the list of input files randomly.
688 Default: false
689
690 --sourmashgather_run : Run `sourmash gather` tool. Default: true
691
692 --sourmashgather_n : Number of results to report. By default,
693 will terminate at --sourmashgather_thr_bp
694 value. Default: false
695
696 --sourmashgather_thr_bp : Reporting threshold (in bp) for estimated
697 overlap with remaining query. Default: 100
698
699 --sourmashgather_ani_ci : Output confidence intervals for ANI
700 estimates. Default: true
701
702 --sourmashgather_k : The k-mer size to select. Default: 71
703
704 --sourmashgather_dna : Choose DNA signature. Default: true
705
706 --sourmashgather_rna : Choose RNA signature. Default: false
707
708 --sourmashgather_nuc : Choose Nucleotide signature. Default: false
709
710 --sourmashgather_scaled : Scaled value should be between 100 and 1e6
711 . Default: false
712
713 --sourmashgather_inc_pat : Search only signatures that match this
714 pattern in name, filename, or md5. Default
715 : false
716
717 --sourmashgather_exc_pat : Search only signatures that do not match
718 this pattern in name, filename, or md5.
719 Default: false
720
721 --sfhpy_run : Run the sourmash_filter_hits.py script.
722 Default: true
723
724 --sfhpy_fcn : Column name by which filtering of rows
725 should be applied. Default: f_match
726
727 --sfhpy_fcv : Remove genomes whose match with the query
728 FASTQ is less than this much. Default: 0.8
729
730 --sfhpy_gt : Apply greather than or equal to condition
731 on numeric values of --sfhpy_fcn column.
732 Default: true
733
734 --sfhpy_lt : Apply less than or equal to condition on
735 numeric values of --sfhpy_fcn column.
736 Default: false
737
738 --sfhpy_all : Instead of just the column value, print
739 entire row. Default: true
740
741 --gsalkronapy_run : Run the `gen_salmon_tph_and_krona_tsv.py`
742 script. Default: true
743
744 --gsalkronapy_sf : Set the scaling factor by which TPM values
745 are scaled down. Default: 10000
746
747 --gsalkronapy_smres_suffix : Find the `sourmash gather` result files
748 ending in this suffix. Default: false
749
750 --gsalkronapy_failed_suffix : Find the sample names which failed
751 classification stored inside the files
752 ending in this suffix. Default: false
753
754 --gsalkronapy_num_lin_cols : Number of columns expected in the lineages
755 CSV file. Default: false
756
757 --gsalkronapy_lin_regex : Number of columns expected in the lineages
758 CSV file. Default: false
759
760 --krona_ktIT_run : Run the ktImportText (ktIT) from krona.
761 Default: true
762
763 --krona_ktIT_n : Name of the highest level. Default: all
764
765 --krona_ktIT_q : Input file(s) do not have a field for
766 quantity. Default: false
767
768 --krona_ktIT_c : Combine data from each file, rather than
769 creating separate datasets within the chart
770 . Default: false
771
772 Help options :
773
774 --help : Display this message.
775 ```