# bettercallsal

`bettercallsal` is an automated workflow that assigns Salmonella serotypes based on the [NCBI Pathogens Database](https://www.ncbi.nlm.nih.gov/pathogens). It uses `MASH` to reduce the search space, followed by additional genome filtering with `sourmash`. It then performs genome-based alignment with `kma`, followed by count generation using `salmon`. This workflow is especially useful when a sample is a multi-serovar mixture.

\


<!-- TOC -->

- [Minimum Requirements](#minimum-requirements)
- [Usage and Examples](#usage-and-examples)
    - [Database](#database)
    - [Input](#input)
    - [Output](#output)
    - [Computational resources](#computational-resources)
    - [Runtime profiles](#runtime-profiles)
    - [your_institution.config](#your_institutionconfig)
    - [Cloud computing](#cloud-computing)
    - [Example data](#example-data)
- [Using sourmash](#using-sourmash)
- [bettercallsal CLI Help](#bettercallsal-cli-help)

<!-- /TOC -->

\


## Minimum Requirements

1. [Nextflow version 22.10.0](https://github.com/nextflow-io/nextflow/releases/download/v22.10.0/nextflow).
    - Make the `nextflow` binary executable (`chmod 755 nextflow`) and make sure that it is available in your `$PATH`.
    - If your existing `JAVA` install does not support the newest **Nextflow** version, you can try **Amazon**'s `JAVA` (OpenJDK): [Corretto](https://corretto.aws/downloads/latest/amazon-corretto-17-x64-linux-jdk.tar.gz).
2. One of `micromamba`, `docker`, or `singularity` installed and available in your `$PATH`.
    - Running the workflow via `micromamba` software provisioning is **preferred**, as it does not require any `sudo` or `admin` privileges or any other configuration with respect to the various container providers.
    - To install `micromamba` for your system type, please follow these [installation steps](https://mamba.readthedocs.io/en/latest/installation.html#manual-installation) and make sure that the `micromamba` binary is available in your `$PATH`.
        - Just the `curl` step is sufficient to download the binary as far as running the workflow is concerned.
3. Minimum of 10 CPU cores and about 16 GB of memory for the main workflow steps. More memory may be required if your **FASTQ** files are large.
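
The requirements above can be checked with a short script before the first run. This is a minimal sketch and not part of the workflow: `have_cmd` is a hypothetical helper, and the script only reports what it finds on `$PATH` and how many CPU cores are online. It does not check the Nextflow version or the available memory.

```bash
# Hypothetical helper: succeed if the named command is available on $PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Nextflow must be on $PATH.
if have_cmd nextflow; then
  echo "nextflow: found"
else
  echo "nextflow: not found on \$PATH"
fi

# At least one software provisioning tool must be on $PATH.
provisioner=""
for tool in micromamba docker singularity; do
  if have_cmd "$tool"; then
    provisioner="$tool"
    break
  fi
done
echo "software provisioning: ${provisioner:-none found}"

# Number of online CPU cores (the workflow uses 10 by default).
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)
echo "CPU cores online: $cores"
```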

\


## Usage and Examples

Clone or download this repository and then call `cpipes`.

```bash
cpipes --pipeline bettercallsal [options]
```

\


**Example**: Run the default `bettercallsal` pipeline in single-end mode.

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
  --pipeline bettercallsal \
  --input /path/to/illumina/fastq/dir \
  --output /path/to/output \
  --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db
```

\


**Example**: Run the `bettercallsal` pipeline in paired-end mode. In this mode, the `R1` and `R2` files are concatenated. We have found that concatenated reads yield better calling rates. Please refer to the **Methods** and **Results** sections in our [preprint](https://www.biorxiv.org/content/10.1101/2023.04.06.535929v1.full) for more information. Users can still choose to use `bbmerge.sh` by adding the following options on the command line: `--bbmerge_run true --bcs_concat_pe false`.

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
  --pipeline bettercallsal \
  --input /path/to/illumina/fastq/dir \
  --output /path/to/output \
  --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db \
  --fq_single_end false \
  --fq_suffix '_R1_001.fastq.gz'
```
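
For the `bbmerge.sh` alternative mentioned above, the same paired-end command can be wrapped in a small shell function. This is only a sketch: `run_bcs_bbmerge` and the `CPIPES` override variable are hypothetical names introduced here for illustration, while every `cpipes` option comes from the examples and help text in this README.

```bash
# Hypothetical wrapper (not part of cpipes): run the paired-end example with
# BBMerge instead of R1/R2 concatenation. Setting CPIPES=echo prints the
# command line instead of running the workflow.
run_bcs_bbmerge() {
  # $1 = FASTQ input directory, $2 = output directory, $3 = database directory
  "${CPIPES:-cpipes}" \
    --pipeline bettercallsal \
    --input "$1" \
    --output "$2" \
    --bcs_root_dbdir "$3" \
    --fq_single_end false \
    --fq_suffix '_R1_001.fastq.gz' \
    --bbmerge_run true \
    --bcs_concat_pe false
}
```

Calling `run_bcs_bbmerge /path/to/illumina/fastq/dir /path/to/output /data/Kranti_Konganti/bettercallsal_db` from the `nf-cpipes` directory then starts the run. With `CPIPES=echo` set beforehand, the function only prints the assembled command, which is a convenient way to review it first.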

\


### Database

---

A successful run of the workflow requires certain database flat files specific to the workflow.

Please refer to the `bettercallsal_db` [README](./bettercallsal_db.md) if you would like to run the workflow on the latest version of the **PDG** release.



### Input

---

The input to the workflow is a folder containing compressed (`.gz`) FASTQ files. Please note that sample grouping happens automatically based on the FASTQ file name. If, for example, a single sample is sequenced across multiple sequencing lanes, you can group those FASTQ files into one sample by using the `--fq_filename_delim` and `--fq_filename_delim_idx` options. By default, `--fq_filename_delim` is set to `_` (underscore) and `--fq_filename_delim_idx` is set to 1.

For example, if the directory contains FASTQ files as shown below:

- KB-01_apple_L001_R1.fastq.gz
- KB-01_apple_L001_R2.fastq.gz
- KB-01_apple_L002_R1.fastq.gz
- KB-01_apple_L002_R2.fastq.gz
- KB-02_mango_L001_R1.fastq.gz
- KB-02_mango_L001_R2.fastq.gz
- KB-02_mango_L002_R1.fastq.gz
- KB-02_mango_L002_R2.fastq.gz

Then, to create 2 sample groups, `apple` and `mango`, we split the file name by the delimiter (underscore in this case, which is the default) and group by the first 2 words (`--fq_filename_delim_idx 2`).

It goes without saying that all the FASTQ files should have uniform naming patterns so that the `--fq_filename_delim` and `--fq_filename_delim_idx` options do not have any adverse effect on collecting the files and creating a sample metadata sheet.
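
The grouping rule can be illustrated with plain shell tools. The sketch below is not the workflow's own code: `sample_name` is a hypothetical helper, and it assumes that the sample name is simply the first `--fq_filename_delim_idx` fields of the file name, re-joined with the delimiter.

```bash
# Hypothetical helper: derive a sample name from a FASTQ file name by keeping
# the first <count> delimiter-separated fields.
sample_name() {
  # $1 = file name, $2 = delimiter (default "_"), $3 = field count (default 1)
  printf '%s\n' "$1" | cut -d "${2:-_}" -f "1-${3:-1}"
}

# Group the R1 files from the listing above by their first 2 fields.
for fq in KB-01_apple_L001_R1.fastq.gz KB-01_apple_L002_R1.fastq.gz \
          KB-02_mango_L001_R1.fastq.gz KB-02_mango_L002_R1.fastq.gz; do
  sample_name "$fq" _ 2
done | sort -u
```

Under that assumption, the loop prints `KB-01_apple` and `KB-02_mango`, one name per sample group. With the default field count of 1, both lanes of both fruits would instead collapse to `KB-01` and `KB-02`.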

\


### Output

---

All the outputs for each step are stored inside the folder specified with the `--output` option. The `multiqc_report.html` file inside the `bettercallsal-multiqc` folder contains a brief consolidated report and can be opened in any browser on your local workstation.

\


### Computational resources

---

The `bettercallsal` workflow requires at least 16 GB of memory to finish successfully. By default, `bettercallsal` uses 10 CPU cores where possible. You can adjust the number of CPU cores with the `--max_cpus` option.

\


Example:

```bash
cpipes \
  --pipeline bettercallsal \
  --input /path/to/bettercallsal_sim_reads \
  --output /path/to/bettercallsal_sim_reads_output \
  --bcs_root_dbdir /path/to/PDG000000002.2537 \
  --kmaalign_ignorequals \
  --max_cpus 5 \
  -profile stdkondagac \
  -resume
```

\


### Runtime profiles

---

You can use different runtime profiles to suit your compute environment, i.e., you can run the workflow locally on your machine or on a grid computing infrastructure.

\


Example:

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
  --pipeline bettercallsal \
  --input /path/to/fastq_pass_dir \
  --output /path/to/where/output/should/go \
  -profile your_institution
```

The above command runs the pipeline and stores the output at the location given by the `--output` flag. The **NEXTFLOW** reports are always stored in the current working directory from which `cpipes` is run. For example, for the above command, a directory called `CPIPES-bettercallsal` would hold all the **NEXTFLOW**-related logs, reports and trace files.

\


### `your_institution.config`

---

In the above example, the runtime profile is specified as `your_institution`. For this to work, add the following lines at the end of the [`computeinfra.config`](../conf/computeinfra.config) file, which should be located inside the `conf` folder. For example, if your institution uses **SGE** or **UNIVA** for grid computing instead of **SLURM** and has a job queue named `normal.q`, then add these lines:

\


```groovy
your_institution {
    process.executor = 'sge'
    process.queue = 'normal.q'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = false
    params.enable_conda = true
    conda.enabled = true
    conda.useMicromamba = true
    params.enable_module = false
}
```

In the above example, all the software provisioning choices are disabled except `conda`. You can also remove the `process.queue` line altogether, and the `bettercallsal` workflow will request the appropriate memory and number of CPU cores automatically, ranging from 1 CPU core, 1 GB and 1 hour for job completion up to 10 CPU cores, 1 TB and 120 hours for job completion.

\


### Cloud computing

---

You can run the workflow in the cloud (this works only with a proper setup of AWS resources). Add new runtime profiles with the required parameters per the [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html):

\


Example:

```groovy
my_aws_batch {
    executor = 'awsbatch'
    queue = 'my-batch-queue'
    aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
    aws.batch.region = 'us-east-1'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = true
    params.conda_enabled = false
    params.enable_module = false
}
```

\


### Example data

---

After you make sure that you have all the [minimum requirements](#minimum-requirements) to run the workflow, you can try the `bettercallsal` pipeline on some simulated reads. The following input dataset contains simulated reads for `Montevideo` and `I 4,[5],12:i:-` in roughly equal proportions.

- Download simulated reads: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads.tar.bz2) (~ 3 GB).
- Download pre-formatted test database: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2491.test-db.tar.bz2) (~ 75 MB). This test database works only with the simulated reads.
- Download pre-formatted full database (**Optional**): If you would like to do a complete run with your own **FASTQ** datasets, you can either create your own [database](./bettercallsal_db.md) or use the [PDG000000002.2537](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2537.tar.bz2) version of the database (~ 37 GB).
- After a successful run of the workflow, your **MultiQC** report should look something like [this](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads_mqc.html).
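
The downloads listed above can also be scripted. The following is a minimal sketch that assumes `curl`, `tar` and `bzip2` are installed; `fetch_and_extract` is a hypothetical helper introduced here, and only the two URLs are taken from the list above. Defining the function downloads nothing until it is called.

```bash
# URLs copied from the download list above.
BASE_URL="https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal"
READS_URL="$BASE_URL/bettercallsal_sim_reads.tar.bz2"
TESTDB_URL="$BASE_URL/PDG000000002.2491.test-db.tar.bz2"

# Hypothetical helper: download an archive into a directory and unpack it there.
fetch_and_extract() {
  # $1 = archive URL, $2 = destination directory
  mkdir -p "$2" || return 1
  archive="$2/$(basename "$1")"
  # -f: fail on HTTP errors, -L: follow redirects, -o: write to the given file
  curl -fL -o "$archive" "$1" || return 1
  # -x: extract, -j: bzip2, -f: archive file, -C: unpack into the directory
  tar -xjf "$archive" -C "$2"
}
```

For example, `fetch_and_extract "$READS_URL" /path/to/data` followed by `fetch_and_extract "$TESTDB_URL" /path/to/data` fetches both archives, where `/path/to/data` is a placeholder for a directory with enough free space.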

Now run the workflow, ignoring quality values since these are simulated base qualities:

\


```bash
cpipes \
  --pipeline bettercallsal \
  --input /path/to/bettercallsal_sim_reads \
  --output /path/to/bettercallsal_sim_reads_output \
  --bcs_root_dbdir /path/to/PDG000000002.2537 \
  --kmaalign_ignorequals \
  -profile stdkondagac \
  -resume
```

Please note that the runtime profile `stdkondagac` will run jobs locally using `micromamba` for software provisioning. The first time you run the command, a new folder called `kondagac_cache` will be created, and subsequent runs should use this `conda` cache.

\


## Using `sourmash`

Beginning with `v0.3.0` of the `bettercallsal` workflow, `sourmash` sketching is used to further narrow down possible serotype hits. It is **ON** by default. This enables the generation of an **ANI Containment** matrix for **Samples** vs **Genomes**. There may be multiple hits for the same serotype in the final **MultiQC** report, as multiple genome accessions can belong to a single serotype.

You can turn this feature **OFF** with the `--sourmashsketch_run false` option.

\


## `bettercallsal` CLI Help

```text
[Kranti_Konganti@my-unix-box ]$ cpipes --pipeline bettercallsal --help
N E X T F L O W  ~  version 22.10.0
Launching `./bettercallsal/cpipes` [awesome_chandrasekhar] DSL2 - revision: 8da4e11078
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : CPIPES
Author                          : Kranti Konganti
Version                         : 0.5.0
Center                          : CFSAN, FDA.
================================================================================

Workflow                        : bettercallsal

Author                          : Kranti Konganti

Version                         : 0.5.0


Usage                           : cpipes --pipeline bettercallsal [options]


Required                        :

--input                         : Absolute path to directory containing FASTQ
                                  files. The directory should contain only
                                  FASTQ files as all the files within the
                                  mentioned directory will be read. Ex: --
                                  input /path/to/fastq_pass

--output                        : Absolute path to directory where all the
                                  pipeline outputs should be stored. Ex: --
                                  output /path/to/output

Other options                   :

--metadata                      : Absolute path to metadata CSV file
                                  containing five mandatory columns: sample,
                                  fq1,fq2,strandedness,single_end. The fq1
                                  and fq2 columns contain absolute paths to
                                  the FASTQ files. This option can be used in
                                  place of --input option. This is rare. Ex
                                  : --metadata samplesheet.csv

--fq_suffix                     : The suffix of FASTQ files (Unpaired reads
                                  or R1 reads or Long reads) if an input
                                  directory is mentioned via --input option.
                                  Default: .fastq.gz

--fq2_suffix                    : The suffix of FASTQ files (Paired-end reads
                                  or R2 reads) if an input directory is
                                  mentioned via --input option. Default:
                                  _R2_001.fastq.gz

--fq_filter_by_len              : Remove FASTQ reads that are less than this
                                  many bases. Default: 0

--fq_strandedness               : The strandedness of the sequencing run.
                                  This is mostly needed if your sequencing
                                  run is RNA-SEQ. For most of the other runs
                                  , it is probably safe to use unstranded for
                                  the option. Default: unstranded

--fq_single_end                 : SINGLE-END information will be auto-
                                  detected but this option forces PAIRED-END
                                  FASTQ files to be treated as SINGLE-END so
                                  only read 1 information is included in auto
                                  -generated samplesheet. Default: true

--fq_filename_delim             : Delimiter by which the file name is split
                                  to obtain sample name. Default: _

--fq_filename_delim_idx         : After splitting FASTQ file name by using
                                  the --fq_filename_delim option, all
                                  elements before this index (1-based) will
                                  be joined to create final sample name.
                                  Default: 1

--bcs_concat_pe                 : Concatenate paired-end files. Default: true

--bbmerge_run                   : Run BBMerge tool. Default: false

--bbmerge_reads                 : Quit after this many read pairs (-1 means
                                  all) Default: -1

--bbmerge_adapters              : Absolute UNIX path pointing to the adapters
                                  file in FASTA format. Default: false

--bbmerge_ziplevel              : Set to 1 (lowest) through 9 (max) to change
                                  compression level; lower compression is
                                  faster. Default: 1

--bbmerge_ordered               : Output reads in the same order as input.
                                  Default: false

--bbmerge_qtrim                 : Trim read ends to remove bases with quality
                                  below --bbmerge_minq. Trims BEFORE merging
                                  . Values: t (trim both ends), f (neither
                                  end), r (right end only), l (left end only
                                  ). Default: true

--bbmerge_qtrim2                : May be specified instead of --bbmerge_qtrim
                                  to perform trimming only if merging is
                                  unsuccesful. then retry merging. Default:
                                  false

--bbmerge_trimq                 : Trim quality threshold. This may be comma-
                                  delimited list (ascending) to try multiple
                                  values. Default: 10

--bbmerge_minlength             : (ml) Reads shorter than this after trimming
                                  , but before merging, will be discarded.
                                  Pairs will be discarded onlyif both are
                                  shorter. Default: 1

--bbmerge_tbo                   : (trimbyoverlap). Trim overlapping reads to
                                  remove right most (3') non-overlaping
                                  portion instead of joining Default: false

--bbmerge_minavgquality         : (maq). Reads with average quality below
                                  this after trimming will not be attempted
                                  to merge. Default: 30

--bbmerge_trimpolya             : Trim trailing poly-A tail from adapter
                                  output. Only affects outadapter. This also
                                  trims poly-A followed by poly-G, which
                                  occurs on NextSeq. Default: true

--bbmerge_pfilter               : Ban improbable overlaps. Higher is more
                                  strict. 0 will disable the filter; 1 will
                                  allow only perfect overlaps. Default: 1

--bbmerge_ouq                   : Calculate best overlap using quality values
                                  . Default: false

--bbmerge_owq                   : Calculate best overlap without using
                                  quality values. Default: true

--bbmerge_strict                : Decrease false positive rate and merging
                                  rate. Default: false

--bbmerge_verystrict            : Greatly decrease false positive rate and
                                  merging rate. Default: false

--bbmerge_ultrastrict           : Decrease false positive rate and merging
                                  rate even more. Default: true

--bbmerge_maxstrict             : Maxiamally decrease false positive rate and
                                  merging rate. Default: false

--bbmerge_loose                 : Increase false positive rate and merging
                                  rate. Default: false

--bbmerge_veryloose             : Greatly increase false positive rate and
                                  merging rate. Default: false

--bbmerge_ultraloose            : Increase false positive rate and merging
                                  rate even more. Default: false

--bbmerge_maxloose              : Maximally increase false positive rate and
                                  merging rate. Default: false

--bbmerge_fast                  : Fastest possible preset. Default: false

--bbmerge_k                     : Kmer length. 31 (or less) is fastest and
                                  uses the least memory, but higher values
                                  may be more accurate. 60 tends to work well
                                  for 150bp reads. Default: 60

--bbmerge_prealloc              : Pre-allocate memory rather than dynamically
                                  growing. Faster and more memory-efficient
                                  for large datasets. A float fraction (0-1)
                                  may be specified, default 1. Default: true

--fastp_run                     : Run fastp tool. Default: true

--fastp_failed_out              : Specify whether to store reads that cannot
                                  pass the filters. Default: false

--fastp_merged_out              : Specify whether to store merged output or
                                  not. Default: false

--fastp_overlapped_out          : For each read pair, output the overlapped
                                  region if it has no mismatched base.
                                  Default: false

--fastp_6                       : Indicate that the input is using phred64
                                  scoring (it'll be converted to phred33, so
                                  the output will still be phred33). Default
                                  : false

--fastp_reads_to_process        : Specify how many reads/pairs are to be
                                  processed. Default value 0 means process
                                  all reads. Default: 0

--fastp_fix_mgi_id              : The MGI FASTQ ID format is not compatible
                                  with many BAM operation tools, enable this
                                  option to fix it. Default: false

--fastp_A                       : Disable adapter trimming. On by default.
                                  Default: false

--fastp_adapter_fasta           : Specify a FASTA file to trim both read1 and
                                  read2 (if PE) by all the sequences in this
                                  FASTA file. Default: false

--fastp_f                       : Trim how many bases in front of read1.
                                  Default: 0

--fastp_t                       : Trim how many bases at the end of read1.
                                  Default: 0

--fastp_b                       : Max length of read1 after trimming. Default
                                  : 0

--fastp_F                       : Trim how many bases in front of read2.
                                  Default: 0

--fastp_T                       : Trim how many bases at the end of read2.
                                  Default: 0

--fastp_B                       : Max length of read2 after trimming. Default
                                  : 0

--fastp_dedup                   : Enable deduplication to drop the duplicated
                                  reads/pairs. Default: true

--fastp_dup_calc_accuracy       : Accuracy level to calculate duplication (1~
                                  6), higher level uses more memory (1G, 2G,
                                  4G, 8G, 16G, 24G). Default 1 for no-dedup
                                  mode, and 3 for dedup mode. Default: 6

--fastp_poly_g_min_len          : The minimum length to detect polyG in the
                                  read tail. Default: 10

--fastp_G                       : Disable polyG tail trimming. Default: true

--fastp_x                       : Enable polyX trimming in 3' ends. Default:
                                  false

--fastp_poly_x_min_len          : The minimum length to detect polyX in the
                                  read tail. Default: 10

--fastp_cut_front               : Move a sliding window from front (5') to
                                  tail, drop the bases in the window if its
                                  mean quality < threshold, stop otherwise.
                                  Default: true

--fastp_cut_tail                : Move a sliding window from tail (3') to
                                  front, drop the bases in the window if its
                                  mean quality < threshold, stop otherwise.
                                  Default: false

--fastp_cut_right               : Move a sliding window from tail, drop the
                                  bases in the window and the right part if
                                  its mean quality < threshold, and then stop
                                  . Default: true

--fastp_W                       : Sliding window size shared by --
                                  fastp_cut_front, --fastp_cut_tail and --
                                  fastp_cut_right. Default: 20

--fastp_M                       : The mean quality requirement shared by --
                                  fastp_cut_front, --fastp_cut_tail and --
                                  fastp_cut_right. Default: 30

--fastp_q                       : The quality value below which a base should
                                  is not qualified. Default: 30

--fastp_u                       : What percent of bases are allowed to be
                                  unqualified. Default: 40

--fastp_n                       : How many N's can a read have. Default: 5

--fastp_e                       : If the full reads' average quality is below
                                  this value, then it is discarded. Default
                                  : 0

--fastp_l                       : Reads shorter than this length will be
                                  discarded. Default: 35

--fastp_max_len                 : Reads longer than this length will be
                                  discarded. Default: 0

--fastp_y                       : Enable low complexity filter. The
                                  complexity is defined as the percentage of
                                  bases that are different from its next base
                                  (base[i] != base[i+1]). Default: true

--fastp_Y                       : The threshold for low complexity filter (0~
                                  100). Ex: A value of 30 means 30%
                                  complexity is required. Default: 30

--fastp_U                       : Enable Unique Molecular Identifier (UMI)
                                  pre-processing. Default: false

--fastp_umi_loc                 : Specify the location of UMI, can be one of
                                  index1/index2/read1/read2/per_index/
                                  per_read. Default: false

--fastp_umi_len                 : If the UMI is in read1 or read2, its length
                                  should be provided. Default: false

--fastp_umi_prefix              : If specified, an underline will be used to
                                  connect prefix and UMI (i.e. prefix=UMI,
                                  UMI=AATTCG, final=UMI_AATTCG). Default:
                                  false

--fastp_umi_skip                : If the UMI is in read1 or read2, fastp can
                                  skip several bases following the UMI.
                                  Default: false

--fastp_p                       : Enable overrepresented sequence analysis.
                                  Default: true

--fastp_P                       : One in this many number of reads will be
                                  computed for overrepresentation analysis (1
                                  ~10000), smaller is slower. Default: 20

--fastp_use_custom_adapaters    : Use custom adapter FASTA with fastp on top
                                  of built-in adapter sequence auto-detection
                                  . Enabling this option will attempt to find
                                  and remove all possible Illumina adapter
                                  and primer sequences but will make the
                                  workflow run slow. Default: false

kkonganti@1
|
617 --mashscreen_run : Run `mash screen` tool. Default: true
|
kkonganti@1
|
618
|
kkonganti@1
|
619 --mashscreen_w : Winner-takes-all strategy for identity
|
kkonganti@1
|
620 estimates. After counting hashes for each
|
kkonganti@1
|
621 query, hashes that appear in multiple
|
kkonganti@1
|
622 queries will be removed from all except the
|
kkonganti@1
|
623 one with the best identity (ties broken by
|
kkonganti@1
|
624 larger query), and other identities will
|
kkonganti@1
|
625 be reduced. This removes output redundancy
|
kkonganti@1
|
626 , providing a rough compositional outline
|
kkonganti@1
|
627 . Default: false
|
kkonganti@1
|
628
|
kkonganti@1
|
629 --mashscreen_i : Minimum identity to report. Inclusive
|
kkonganti@1
|
630 unless set to zero, in which case only
|
kkonganti@1
|
631 identities greater than zero (i.e. with at
|
kkonganti@1
|
632 least one shared hash) will be reported.
|
kkonganti@1
|
633 Set to -1 to output everything. (-1-1).
|
kkonganti@1
|
634 Default: false
|
kkonganti@1
|
635
|
kkonganti@1
|
636 --mashscreen_v : Maximum p-value to report (0-1). Default:
|
kkonganti@1
|
637 false
|
kkonganti@1
|
638
|
kkonganti@1
|
639 --tuspy_run : Run the get_top_unique_mash_hits_genomes.py
|
kkonganti@1
|
640 script. Default: true
|
kkonganti@1
|
641
|
kkonganti@1
|
642 --tuspy_s : Absolute UNIX path to metadata text file
|
kkonganti@1
|
643 with the field separator, | and 5 fields:
|
kkonganti@1
|
644 serotype|asm_lvl|asm_url|snp_cluster_idEx:
|
kkonganti@1
|
645 serotype=Derby,antigen_formula=4:f,g:-|
|
kkonganti@1
|
646 Scaffold|402440|ftp://...|PDS000096654.2.
|
kkonganti@1
|
647 Mentioning this option will create a pickle
|
kkonganti@1
|
648 file for the provided metadata and exits.
|
kkonganti@1
|
649 Default: false
|
kkonganti@1
|
650
|
kkonganti@1
|
651 --tuspy_m : Absolute UNIX path to mash screen results
|
kkonganti@1
|
652 file. Default: false
|
kkonganti@1
|
653
|
kkonganti@1
|
654 --tuspy_ps : Absolute UNIX Path to serialized metadata
|
kkonganti@1
|
655 object in a pickle file. Default: /hpc/db/
|
kkonganti@1
|
656 bettercallsal/latest/index_metadata/
|
kkonganti@1
|
657 per_snp_cluster.ACC2SERO.pickle
|
kkonganti@1
|
658
|
kkonganti@1
|
659 --tuspy_gd : Absolute UNIX Path to directory containing
|
kkonganti@1
|
660 gzipped genome FASTA files. Default: /hpc/
|
kkonganti@1
|
661 db/bettercallsal/latest/scaffold_genomes
|
kkonganti@1
|
662
|
kkonganti@1
|
663 --tuspy_gds : Genome FASTA file suffix to search for in
|
kkonganti@1
|
664 the genome directory. Default:
|
kkonganti@1
|
665 _scaffolded_genomic.fna.gz
|
kkonganti@1
|
666
|
kkonganti@1
|
667 --tuspy_n : Return up to this many number of top N
|
kkonganti@1
|
668 unique genome accession hits. Default: 10
|
kkonganti@1
|
669
|
kkonganti@1
|
670 --sourmashsketch_run : Run `sourmash sketch dna` tool. Default:
|
kkonganti@1
|
671 true
|
kkonganti@1
|
672
|
kkonganti@1
|
673 --sourmashsketch_mode : Select which type of signatures to be
|
kkonganti@1
|
674 created: dna, protein, fromfile or
|
kkonganti@1
|
675 translate. Default: dna
|
kkonganti@1
|
676
|
kkonganti@1
|
677 --sourmashsketch_p : Signature parameters to use. Default: abund
|
kkonganti@1
|
678 ,scaled=1000,k=51,k=61,k=71
|
kkonganti@1
|
679
|
kkonganti@1
|
680 --sourmashsketch_file : <path> A text file containing a list of
|
kkonganti@1
|
681 sequence files to load. Default: false
|
kkonganti@1
|
682
|
kkonganti@1
|
683 --sourmashsketch_f : Recompute signatures even if the file
|
kkonganti@1
|
684 exists. Default: false
|
kkonganti@1
|
685
|
kkonganti@1
|
686 --sourmashsketch_merge : Merge all input files into one signature
|
kkonganti@1
|
687 file with the specified name. Default:
|
kkonganti@1
|
688 false
|
kkonganti@1
|
689
|
kkonganti@1
|
690 --sourmashsketch_singleton : Compute a signature for each sequence
|
kkonganti@1
|
691 record individually. Default: true
|
kkonganti@1
|
692
|
kkonganti@1
|
693 --sourmashsketch_name : Name the signature generated from each file
|
kkonganti@1
|
694 after the first record in the file.
|
kkonganti@1
|
695 Default: false
|
kkonganti@1
|
696
|
kkonganti@1
|
697 --sourmashsketch_randomize : Shuffle the list of input files randomly.
|
kkonganti@1
|
698 Default: false
|
kkonganti@1
|
699
|
kkonganti@1
|
700 --sourmashgather_run : Run `sourmash gather` tool. Default: true
|
kkonganti@1
|
701
|
kkonganti@1
|
702 --sourmashgather_n : Number of results to report. By default,
|
kkonganti@1
|
703 will terminate at --sourmashgather_thr_bp
|
kkonganti@1
|
704 value. Default: false
|
kkonganti@1
|
705
|
kkonganti@1
|
706 --sourmashgather_thr_bp : Reporting threshold (in bp) for estimated
|
kkonganti@1
|
707 overlap with remaining query. Default:
|
kkonganti@1
|
708 false
|
kkonganti@1
|
709
|
kkonganti@1
|
710 --sourmashgather_ignoreabn : Do NOT use k-mer abundances if present.
|
kkonganti@1
|
711 Default: false
|
kkonganti@1
|
712
|
kkonganti@1
|
713 --sourmashgather_prefetch : Use prefetch before gather. Default: false
|
kkonganti@1
|
714
|
kkonganti@1
|
715 --sourmashgather_noprefetch : Do not use prefetch before gather. Default
|
kkonganti@1
|
716 : false
|
kkonganti@1
|
717
|
kkonganti@1
|
718 --sourmashgather_ani_ci : Output confidence intervals for ANI
|
kkonganti@1
|
719 estimates. Default: true
|
kkonganti@1
|
720
|
kkonganti@1
|
721 --sourmashgather_k : The k-mer size to select. Default: 71
|
kkonganti@1
|
722
|
kkonganti@1
|
723 --sourmashgather_protein : Choose a protein signature. Default: false
|
kkonganti@1
|
724
|
kkonganti@1
|
725 --sourmashgather_noprotein : Do not choose a protein signature. Default
|
kkonganti@1
|
726 : false
|
kkonganti@1
|
727
|
kkonganti@1
|
728 --sourmashgather_dayhoff : Choose Dayhoff-encoded amino acid
|
kkonganti@1
|
729 signatures. Default: false
|
kkonganti@1
|
730
|
kkonganti@1
|
731 --sourmashgather_nodayhoff : Do not choose Dayhoff-encoded amino acid
|
kkonganti@1
|
732 signatures. Default: false
|
kkonganti@1
|
733
|
kkonganti@1
|
734 --sourmashgather_hp : Choose hydrophobic-polar-encoded amino acid
|
kkonganti@1
|
735 signatures. Default: false
|
kkonganti@1
|
736
|
kkonganti@1
|
737 --sourmashgather_nohp : Do not choose hydrophobic-polar-encoded
|
kkonganti@1
|
738 amino acid signatures. Default: false
|
kkonganti@1
|
739
|
kkonganti@1
|
740 --sourmashgather_dna : Choose DNA signature. Default: true
|
kkonganti@1
|
741
|
kkonganti@1
|
742 --sourmashgather_nodna : Do not choose DNA signature. Default: false
|
kkonganti@1
|
743
|
kkonganti@1
|
744 --sourmashgather_scaled : Scaled value should be between 100 and 1e6
|
kkonganti@1
|
745 . Default: false
|
kkonganti@1
|
746
|
kkonganti@1
|
747 --sourmashgather_inc_pat : Search only signatures that match this
|
kkonganti@1
|
748 pattern in name, filename, or md5. Default
|
kkonganti@1
|
749 : false
|
kkonganti@1
|
750
|
kkonganti@1
|
751 --sourmashgather_exc_pat : Search only signatures that do not match
|
kkonganti@1
|
752 this pattern in name, filename, or md5.
|
kkonganti@1
|
753 Default: false
|
kkonganti@1
|
754
|
kkonganti@1
|
755 --sourmashsearch_run : Run `sourmash search` tool. Default: false
|
kkonganti@1
|
756
|
kkonganti@1
|
757 --sourmashsearch_n : Number of results to report. By default,
|
kkonganti@1
|
758 will terminate at --sourmashsearch_thr
|
kkonganti@1
|
759 value. Default: false
|
kkonganti@1
|
760
|
kkonganti@1
|
761 --sourmashsearch_thr : Reporting threshold (similarity) to return
|
kkonganti@1
|
762 results. Default: 0
|
kkonganti@1
|
763
|
kkonganti@1
|
764 --sourmashsearch_contain : Score based on containment rather than
|
kkonganti@1
|
765 similarity. Default: false
|
kkonganti@1
|
766
|
kkonganti@1
|
767 --sourmashsearch_maxcontain : Score based on max containment rather than
|
kkonganti@1
|
768 similarity. Default: false
|
kkonganti@1
|
769
|
kkonganti@1
|
770 --sourmashsearch_ignoreabn : Do NOT use k-mer abundances if present.
|
kkonganti@1
|
771 Default: true
|
kkonganti@1
|
772
|
kkonganti@1
|
773 --sourmashsearch_ani_ci : Output confidence intervals for ANI
|
kkonganti@1
|
774 estimates. Default: false
|
kkonganti@1
|
775
|
kkonganti@1
|
776 --sourmashsearch_k : The k-mer size to select. Default: 71
|
kkonganti@1
|
777
|
kkonganti@1
|
778 --sourmashsearch_protein : Choose a protein signature. Default: false
|
kkonganti@1
|
779
|
kkonganti@1
|
780 --sourmashsearch_noprotein : Do not choose a protein signature. Default
|
kkonganti@1
|
781 : false
|
kkonganti@1
|
782
|
kkonganti@1
|
783 --sourmashsearch_dayhoff : Choose Dayhoff-encoded amino acid
|
kkonganti@1
|
784 signatures. Default: false
|
kkonganti@1
|
785
|
kkonganti@1
|
786 --sourmashsearch_nodayhoff : Do not choose Dayhoff-encoded amino acid
|
kkonganti@1
|
787 signatures. Default: false
|
kkonganti@1
|
788
|
kkonganti@1
|
789 --sourmashsearch_hp : Choose hydrophobic-polar-encoded amino acid
|
kkonganti@1
|
790 signatures. Default: false
|
kkonganti@1
|
791
|
kkonganti@1
|
792 --sourmashsearch_nohp : Do not choose hydrophobic-polar-encoded
|
kkonganti@1
|
793 amino acid signatures. Default: false
|
kkonganti@1
|
794
|
kkonganti@1
|
795 --sourmashsearch_dna : Choose DNA signature. Default: true
|
kkonganti@1
|
796
|
kkonganti@1
|
797 --sourmashsearch_nodna : Do not choose DNA signature. Default: false
|
kkonganti@1
|
798
|
kkonganti@1
|
799 --sourmashsearch_scaled : Scaled value should be between 100 and 1e6
|
kkonganti@1
|
800 . Default: false
|
kkonganti@1
|
801
|
kkonganti@1
|
802 --sourmashsearch_inc_pat : Search only signatures that match this
|
kkonganti@1
|
803 pattern in name, filename, or md5. Default
|
kkonganti@1
|
804 : false
|
kkonganti@1
|
805
|
kkonganti@1
|
806 --sourmashsearch_exc_pat : Search only signatures that do not match
|
kkonganti@1
|
807 this pattern in name, filename, or md5.
|
kkonganti@1
|
808 Default: false
|
kkonganti@1
|
809
|
kkonganti@1
|
810 --sfhpy_run : Run the sourmash_filter_hits.py script.
|
kkonganti@1
|
811 Default: true
|
kkonganti@1
|
812
|
kkonganti@1
|
813 --sfhpy_fcn : Column name by which filtering of rows
|
kkonganti@1
|
814 should be applied. Default: f_match
|
kkonganti@1
|
815
|
kkonganti@1
|
816 --sfhpy_fcv : Remove genomes whose match with the query
|
kkonganti@1
|
817 FASTQ is less than this much. Default: 0.1
|
kkonganti@1
|
818
|
kkonganti@1
|
819 --sfhpy_gt : Apply greather than or equal to condition
|
kkonganti@1
|
820 on numeric values of --sfhpy_fcn column.
|
kkonganti@1
|
821 Default: true
|
kkonganti@1
|
822
|
kkonganti@1
|
823 --sfhpy_lt : Apply less than or equal to condition on
|
kkonganti@1
|
824 numeric values of --sfhpy_fcn column.
|
kkonganti@1
|
825 Default: false
|
kkonganti@1
|
826
|
kkonganti@1
|
827 --kmaindex_run : Run kma index tool. Default: true
|
kkonganti@1
|
828
|
kkonganti@1
|
829 --kmaindex_t_db : Add to existing DB. Default: false
|
kkonganti@1
|
830
|
kkonganti@1
|
831 --kmaindex_k : k-mer size. Default: 31
|
kkonganti@1
|
832
|
kkonganti@1
|
833 --kmaindex_m : Minimizer size. Default: false
|
kkonganti@1
|
834
|
kkonganti@1
|
835 --kmaindex_hc : Homopolymer compression. Default: false
|
kkonganti@1
|
836
|
kkonganti@1
|
837 --kmaindex_ML : Minimum length of templates. Defaults to --
|
kkonganti@1
|
838 kmaindex_k Default: false
|
kkonganti@1
|
839
|
kkonganti@1
|
840 --kmaindex_ME : Mega DB. Default: false
|
kkonganti@1
|
841
|
kkonganti@1
|
842 --kmaindex_Sparse : Make Sparse DB. Default: false
|
kkonganti@1
|
843
|
kkonganti@1
|
844 --kmaindex_ht : Homology template. Default: false
|
kkonganti@1
|
845
|
kkonganti@1
|
846 --kmaindex_hq : Homology query. Default: false
|
kkonganti@1
|
847
|
kkonganti@1
|
848 --kmaindex_and : Both homology thresholds have to reach.
|
kkonganti@1
|
849 Default: false
|
kkonganti@1
|
850
|
kkonganti@1
|
851 --kmaindex_nbp : No bias print. Default: false
|
kkonganti@1
|
852
|
kkonganti@1
|
853 --kmaalign_run : Run kma tool. Default: true
|
kkonganti@1
|
854
|
kkonganti@1
|
855 --kmaalign_int : Input file has interleaved reads. Default
|
kkonganti@1
|
856 : false
|
kkonganti@1
|
857
|
kkonganti@1
|
858 --kmaalign_ef : Output additional features. Default: false
|
kkonganti@1
|
859
|
kkonganti@1
|
860 --kmaalign_vcf : Output vcf file. 2 to apply FT. Default:
|
kkonganti@1
|
861 false
|
kkonganti@1
|
862
|
kkonganti@1
|
863 --kmaalign_sam : Output SAM, 4/2096 for mapped/aligned.
|
kkonganti@1
|
864 Default: false
|
kkonganti@1
|
865
|
kkonganti@1
|
866 --kmaalign_nc : No consensus file. Default: true
|
kkonganti@1
|
867
|
kkonganti@1
|
868 --kmaalign_na : No aln file. Default: true
|
kkonganti@1
|
869
|
kkonganti@1
|
870 --kmaalign_nf : No frag file. Default: true
|
kkonganti@1
|
871
|
kkonganti@1
|
872 --kmaalign_a : Output all template mappings. Default:
|
kkonganti@1
|
873 false
|
kkonganti@1
|
874
|
kkonganti@1
|
875 --kmaalign_and : Use both -mrs and p-value on consensus.
|
kkonganti@1
|
876 Default: false
|
kkonganti@1
|
877
|
kkonganti@1
|
878 --kmaalign_oa : Use neither -mrs or p-value on consensus.
|
kkonganti@1
|
879 Default: false
|
kkonganti@1
|
880
|
kkonganti@1
|
881 --kmaalign_bc : Minimum support to call bases. Default:
|
kkonganti@1
|
882 false
|
kkonganti@1
|
883
|
kkonganti@1
|
884 --kmaalign_bcNano : Altered indel calling for ONT data. Default
|
kkonganti@1
|
885 : false
|
kkonganti@1
|
886
|
kkonganti@1
|
887 --kmaalign_bcd : Minimum depth to call bases. Default: false
|
kkonganti@1
|
888
|
kkonganti@1
|
889 --kmaalign_bcg : Maintain insignificant gaps. Default: false
|
kkonganti@1
|
890
|
kkonganti@1
|
891 --kmaalign_ID : Minimum consensus ID. Default: false
|
kkonganti@1
|
892
|
kkonganti@1
|
893 --kmaalign_md : Minimum depth. Default: false
|
kkonganti@1
|
894
|
kkonganti@1
|
895 --kmaalign_dense : Skip insertion in consensus. Default: false
|
kkonganti@1
|
896
|
kkonganti@1
|
897 --kmaalign_ref_fsa : Use Ns on indels. Default: false
|
kkonganti@1
|
898
|
kkonganti@1
|
899 --kmaalign_Mt1 : Map everything to one template. Default:
|
kkonganti@1
|
900 false
|
kkonganti@1
|
901
|
kkonganti@1
|
902 --kmaalign_1t1 : Map one query to one template. Default:
|
kkonganti@1
|
903 false
|
kkonganti@1
|
904
|
kkonganti@1
|
905 --kmaalign_mrs : Minimum relative alignment score. Default:
|
kkonganti@1
|
906 false
|
kkonganti@1
|
907
|
kkonganti@1
|
908 --kmaalign_mrc : Minimum query coverage. Default: 0.99
|
kkonganti@1
|
909
|
kkonganti@1
|
910 --kmaalign_mp : Minimum phred score of trailing and leading
|
kkonganti@1
|
911 bases. Default: 30
|
kkonganti@1
|
912
|
kkonganti@1
|
913 --kmaalign_mq : Set the minimum mapping quality. Default:
|
kkonganti@1
|
914 false
|
kkonganti@1
|
915
|
kkonganti@1
|
916 --kmaalign_eq : Minimum average quality score. Default: 30
|
kkonganti@1
|
917
|
kkonganti@1
|
918 --kmaalign_5p : Trim 5 prime by this many bases. Default:
|
kkonganti@1
|
919 false
|
kkonganti@1
|
920
|
kkonganti@1
|
921 --kmaalign_3p : Trim 3 prime by this many bases Default:
|
kkonganti@1
|
922 false
|
kkonganti@1
|
923
|
kkonganti@1
|
924 --kmaalign_apm : Sets both -pm and -fpm Default: false
|
kkonganti@1
|
925
|
kkonganti@1
|
926 --kmaalign_cge : Set CGE penalties and rewards Default:
|
kkonganti@1
|
927 false
|
kkonganti@1
|
928
|
kkonganti@1
|
929 --salmonidx_run : Run `salmon index` tool. Default: true
|
kkonganti@1
|
930
|
kkonganti@1
|
931 --salmonidx_k : The size of k-mers that should be used for
|
kkonganti@1
|
932 the quasi index. Default: false
|
kkonganti@1
|
933
|
kkonganti@1
|
934 --salmonidx_gencode : This flag will expect the input transcript
|
kkonganti@1
|
935 FASTA to be in GENCODE format, and will
|
kkonganti@1
|
936 split the transcript name at the first `|`
|
kkonganti@1
|
937 character. These reduced names will be used
|
kkonganti@1
|
938 in the output and when looking for these
|
kkonganti@1
|
939 transcripts in a gene to transcript GTF.
|
kkonganti@1
|
940 Default: false
|
kkonganti@1
|
941
|
kkonganti@1
|
942 --salmonidx_features : This flag will expect the input reference
|
kkonganti@1
|
943 to be in the tsv file format, and will
|
kkonganti@1
|
944 split the feature name at the first `tab`
|
kkonganti@1
|
945 character. These reduced names will be used
|
kkonganti@1
|
946 in the output and when looking for the
|
kkonganti@1
|
947 sequence of the features. GTF. Default:
|
kkonganti@1
|
948 false
|
kkonganti@1
|
949
|
kkonganti@1
|
950 --salmonidx_keepDuplicates : This flag will disable the default indexing
|
kkonganti@1
|
951 behavior of discarding sequence-identical
|
kkonganti@1
|
952 duplicate transcripts. If this flag is
|
kkonganti@1
|
953 passed then duplicate transcripts that
|
kkonganti@1
|
954 appear in the input will be retained and
|
kkonganti@1
|
955 quantified separately. Default: false
|
kkonganti@1
|
956
|
kkonganti@1
|
957 --salmonidx_keepFixedFasta : Retain the fixed fasta file (without short
|
kkonganti@1
|
958 transcripts and duplicates, clipped, etc.)
|
kkonganti@1
|
959 generated during indexing. Default: false
|
kkonganti@1
|
960
|
kkonganti@1
|
961 --salmonidx_filterSize : The size of the Bloom filter that will be
|
kkonganti@1
|
962 used by TwoPaCo during indexing. The filter
|
kkonganti@1
|
963 will be of size 2^{filterSize}. A value of
|
kkonganti@1
|
964 -1 means that the filter size will be
|
kkonganti@1
|
965 automatically set based on the number of
|
kkonganti@1
|
966 distinct k-mers in the input, as estimated
|
kkonganti@1
|
967 by nthll. Default: false
|
kkonganti@1
|
968
|
kkonganti@1
|
969 --salmonidx_sparse : Build the index using a sparse sampling of
|
kkonganti@1
|
970 k-mer positions This will require less
|
kkonganti@1
|
971 memory (especially during quantification),
|
kkonganti@1
|
972 but will take longer to constructand can
|
kkonganti@1
|
973 slow down mapping / alignment. Default:
|
kkonganti@1
|
974 false
|
kkonganti@1
|
975
|
kkonganti@1
|
976 --salmonidx_n : Do not clip poly-A tails from the ends of
|
kkonganti@1
|
977 target sequences. Default: false
|
kkonganti@1
|
978
|
kkonganti@1
|
979 --gsrpy_run : Run the gen_salmon_res_table.py script.
|
kkonganti@1
|
980 Default: true
|
kkonganti@1
|
981
|
kkonganti@1
|
982 --gsrpy_url : Generate an additional column in final
|
kkonganti@1
|
983 results table which links out to NCBI
|
kkonganti@1
|
984 Pathogens Isolate Browser. Default: true
|
kkonganti@1
|
985
|
kkonganti@1
|
986 Help options :
|
kkonganti@1
|
987
|
kkonganti@1
|
988 --help : Display this message.
|
kkonganti@1
|
989
|
kkonganti@1
|
990 ```
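
The help text for `--fastp_y` defines complexity as the percentage of bases that differ from the base that follows them, and `--fastp_Y` sets the minimum percentage a read must reach. A minimal Python sketch of that definition is given here for illustration. The function names `complexity_percent` and `passes_complexity_filter` are hypothetical, the choice of the number of adjacent base pairs as the denominator is an assumption, and this is not the code `fastp` itself runs.

```python
def complexity_percent(read: str) -> float:
    """Percentage of bases that differ from the base that follows them."""
    if len(read) < 2:
        # A read with fewer than two bases has no adjacent pair to compare.
        return 0.0
    # Count positions i where base[i] != base[i+1].
    changes = sum(1 for a, b in zip(read, read[1:]) if a != b)
    # Assumption: the denominator is the number of adjacent pairs (length - 1).
    return 100.0 * changes / (len(read) - 1)


def passes_complexity_filter(read: str, threshold: float = 30.0) -> bool:
    """Mimic the --fastp_Y threshold (default 30): keep reads at or above it."""
    return complexity_percent(read) >= threshold


if __name__ == "__main__":
    # Two made-up reads: one fully alternating, one almost a homopolymer.
    for read in ("ACGTACGTAC", "AAAAAAAAAC"):
        print(read, round(complexity_percent(read), 1), passes_complexity_filter(read))
```

With the default threshold of 30, a read that is nearly a homopolymer falls below the cutoff and would be dropped, while a read whose neighbouring bases mostly differ is kept.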
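
The `--sfhpy_fcn`, `--sfhpy_fcv`, `--sfhpy_gt` and `--sfhpy_lt` options describe a row filter on one numeric column of the `sourmash` results table. The sketch below shows how those four options could interact. It is not the code of `sourmash_filter_hits.py`: the function `filter_hits`, the `name` column and the sample rows are made up for the example, and only the `f_match` column name and the `0.1` cutoff come from the defaults listed in the help text.

```python
import csv
import io


def filter_hits(csv_text, column="f_match", cutoff=0.1, keep_ge=True):
    """Keep rows whose `column` value is >= cutoff, or <= cutoff if keep_ge is False."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        value = float(row[column])
        # keep_ge=True mirrors --sfhpy_gt, keep_ge=False mirrors --sfhpy_lt.
        if (keep_ge and value >= cutoff) or (not keep_ge and value <= cutoff):
            kept.append(row)
    return kept


# Hypothetical results table; the genome names are placeholders.
sample = "name,f_match\ngenome_A,0.85\ngenome_B,0.04\ngenome_C,0.10\n"

if __name__ == "__main__":
    for row in filter_hits(sample):
        print(row["name"], row["f_match"])
```

Under the defaults (`f_match`, `0.1`, greater than or equal to), a genome matching 4% of the query would be removed and one matching exactly 10% would be kept.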