cfsan_bettercallsal: 0.5.0/readme/bettercallsal.md annotate

annotate 0.5.0/readme/bettercallsal.md @ 1:365849f031fd

"planemo upload"

author	kkonganti
date	Mon, 05 Jun 2023 18:48:51 -0400

rev	line source
kkonganti@1	1 # bettercallsal
kkonganti@1	2
kkonganti@1	3 `bettercallsal` is an automated workflow that assigns Salmonella serotypes based on the [NCBI Pathogens Database](https://www.ncbi.nlm.nih.gov/pathogens). It uses `MASH` to reduce the search space, followed by additional genome filtering with `sourmash`. It then performs genome-based alignment with `kma`, followed by count generation using `salmon`. This workflow is especially useful when a sample contains a mixture of multiple serovars.
kkonganti@1	4
kkonganti@1	5 \
kkonganti@1	6
kkonganti@1	7
kkonganti@1	8 <!-- TOC -->
kkonganti@1	9
kkonganti@1	10 - [Minimum Requirements](#minimum-requirements)
kkonganti@1	11 - [Usage and Examples](#usage-and-examples)
kkonganti@1	12 - [Database](#database)
kkonganti@1	13 - [Input](#input)
kkonganti@1	14 - [Output](#output)
kkonganti@1	15 - [Computational resources](#computational-resources)
kkonganti@1	16 - [Runtime profiles](#runtime-profiles)
kkonganti@1	17 - [your_institution.config](#your_institutionconfig)
kkonganti@1	18 - [Cloud computing](#cloud-computing)
kkonganti@1	19 - [Example data](#example-data)
kkonganti@1	20 - [Using sourmash](#using-sourmash)
kkonganti@1	21 - [bettercallsal CLI Help](#bettercallsal-cli-help)
kkonganti@1	22
kkonganti@1	23 <!-- /TOC -->
kkonganti@1	24
kkonganti@1	25 \
kkonganti@1	26
kkonganti@1	27
kkonganti@1	28 ## Minimum Requirements
kkonganti@1	29
kkonganti@1	30 1. [Nextflow version 22.10.0](https://github.com/nextflow-io/nextflow/releases/download/v22.10.0/nextflow).
kkonganti@1	31     - Make the `nextflow` binary executable (`chmod 755 nextflow`) and make sure that it is available in your `$PATH`.
kkonganti@1	32     - If your existing `JAVA` install does not support the newest Nextflow version, you can try Amazon's `JAVA` (OpenJDK): [Corretto](https://corretto.aws/downloads/latest/amazon-corretto-17-x64-linux-jdk.tar.gz).
kkonganti@1	33 2. One of `micromamba`, `docker`, or `singularity` installed and available in your `$PATH`.
kkonganti@1	34     - Running the workflow via `micromamba` software provisioning is preferred, as it requires neither `sudo` or `admin` privileges nor any additional configuration of a container provider.
kkonganti@1	35     - To install `micromamba` for your system type, please follow these [installation steps](https://mamba.readthedocs.io/en/latest/installation.html#manual-installation) and make sure that the `micromamba` binary is available in your `$PATH`.
kkonganti@1	36         - Just the `curl` step is sufficient to download the binary as far as running the workflows is concerned.
kkonganti@1	37 3. A minimum of 10 CPU cores and about 16 GB of memory for the main workflow steps. More memory may be required if your FASTQ files are large.
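
The requirements above can be checked before the first run. The following is a minimal sketch, not part of `bettercallsal`: the helper names `have_cmd` and `first_provisioner` are hypothetical, and the script only tests whether the named binaries are reachable through `$PATH`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed if the given command is found on $PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the first available software provisioner
# among the three options listed in the requirements.
first_provisioner() {
  local tool
  for tool in micromamba docker singularity; do
    if have_cmd "$tool"; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# Warn (on stderr) about anything that is missing.
have_cmd nextflow || echo "nextflow not found in \$PATH" >&2
first_provisioner || echo "none of micromamba, docker, singularity found in \$PATH" >&2
```

If a provisioner is found, its name is printed; otherwise a warning is written to standard error.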
kkonganti@1	38
kkonganti@1	39 \
kkonganti@1	40
kkonganti@1	41
kkonganti@1	42 ## Usage and Examples
kkonganti@1	43
kkonganti@1	44 Clone or download this repository and then call `cpipes`.
kkonganti@1	45
kkonganti@1	46 ```bash
kkonganti@1	47 cpipes --pipeline bettercallsal [options]
kkonganti@1	48 ```
kkonganti@1	49
kkonganti@1	50 \
kkonganti@1	51
kkonganti@1	52
kkonganti@1	53 Example: Run the default `bettercallsal` pipeline in single-end mode.
kkonganti@1	54
kkonganti@1	55 ```bash
kkonganti@1	56 cd /data/scratch/$USER
kkonganti@1	57 mkdir nf-cpipes
kkonganti@1	58 cd nf-cpipes
kkonganti@1	59 cpipes \
kkonganti@1	60 --pipeline bettercallsal \
kkonganti@1	61 --input /path/to/illumina/fastq/dir \
kkonganti@1	62 --output /path/to/output \
kkonganti@1	63 --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db
kkonganti@1	64 ```
kkonganti@1	65
kkonganti@1	66 \
kkonganti@1	67
kkonganti@1	68
kkonganti@1	69 Example: Run the `bettercallsal` pipeline in paired-end mode. In this mode, the `R1` and `R2` files are concatenated. We have found that concatenated reads yield better calling rates. Please refer to the Methods and Results sections of our [preprint](https://www.biorxiv.org/content/10.1101/2023.04.06.535929v1.full) for more information. Users can still choose to use `bbmerge.sh` by adding the following options on the command line: `--bbmerge_run true --bcs_concat_pe false`.
kkonganti@1	70
kkonganti@1	71 ```bash
kkonganti@1	72 cd /data/scratch/$USER
kkonganti@1	73 mkdir nf-cpipes
kkonganti@1	74 cd nf-cpipes
kkonganti@1	75 cpipes \
kkonganti@1	76 --pipeline bettercallsal \
kkonganti@1	77 --input /path/to/illumina/fastq/dir \
kkonganti@1	78 --output /path/to/output \
kkonganti@1	79 --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db \
kkonganti@1	80 --fq_single_end false \
kkonganti@1	81 --fq_suffix '_R1_001.fastq.gz'
kkonganti@1	82 ```
kkonganti@1	83
kkonganti@1	84 \
kkonganti@1	85
kkonganti@1	86
kkonganti@1	87 ### Database
kkonganti@1	88
kkonganti@1	89 ---
kkonganti@1	90
kkonganti@1	91 A successful run of the workflow requires certain database flat files that are specific to the workflow.
kkonganti@1	92
kkonganti@1	93 Please refer to the `bettercallsal_db` [README](./bettercallsal_db.md) if you would like to run the workflow on the latest version of the PDG release.
kkonganti@1	94
kkonganti@1	95
kkonganti@1	96
kkonganti@1	97 ### Input
kkonganti@1	98
kkonganti@1	99 ---
kkonganti@1	100
kkonganti@1	101 The input to the workflow is a folder containing compressed (`.gz`) FASTQ files. Please note that samples are grouped automatically by the FASTQ file name. If, for example, a single sample is sequenced across multiple sequencing lanes, you can group those FASTQ files into one sample by using the `--fq_filename_delim` and `--fq_filename_delim_idx` options. By default, `--fq_filename_delim` is set to `_` (underscore) and `--fq_filename_delim_idx` is set to 1.
kkonganti@1	102
kkonganti@1	103 For example, if the directory contains FASTQ files as shown below:
kkonganti@1	104
kkonganti@1	105 - KB-01_apple_L001_R1.fastq.gz
kkonganti@1	106 - KB-01_apple_L001_R2.fastq.gz
kkonganti@1	107 - KB-01_apple_L002_R1.fastq.gz
kkonganti@1	108 - KB-01_apple_L002_R2.fastq.gz
kkonganti@1	109 - KB-02_mango_L001_R1.fastq.gz
kkonganti@1	110 - KB-02_mango_L001_R2.fastq.gz
kkonganti@1	111 - KB-02_mango_L002_R1.fastq.gz
kkonganti@1	112 - KB-02_mango_L002_R2.fastq.gz
kkonganti@1	113
kkonganti@1	114 Then, to create 2 sample groups, `apple` and `mango`, we split the file name by the delimiter (underscore in this case, which is the default) and group by the first 2 words (`--fq_filename_delim_idx 2`).
kkonganti@1	115
kkonganti@1	116 It goes without saying that all the FASTQ files should follow a uniform naming pattern so that the `--fq_filename_delim` and `--fq_filename_delim_idx` options do not have any adverse effect when the sample metadata sheet is collected and created.
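
The grouping rule described above can be previewed on file names before launching the workflow. This is a minimal sketch under the assumption that the sample name is formed by joining the first N delimiter-separated fields, as the example describes; `sample_name` is a hypothetical helper and is not how `cpipes` itself is implemented.

```bash
#!/usr/bin/env bash

# Hypothetical helper: split a file name on a delimiter (default "_")
# and join the first N fields (default 1) to form the sample name.
sample_name() {
  local file_name="$1" delim="${2:-_}" idx="${3:-1}"
  printf '%s\n' "$file_name" | cut -d "$delim" -f "1-$idx"
}

# Preview the groups that a delimiter index of 2 would produce.
for fq in \
  KB-01_apple_L001_R1.fastq.gz \
  KB-01_apple_L002_R1.fastq.gz \
  KB-02_mango_L001_R1.fastq.gz \
  KB-02_mango_L002_R1.fastq.gz; do
  sample_name "$fq" _ 2
done | sort -u
# prints:
# KB-01_apple
# KB-02_mango
```

With the default index of 1, the same files would collapse into `KB-01` and `KB-02`.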
kkonganti@1	117
kkonganti@1	118 \
kkonganti@1	119
kkonganti@1	120
kkonganti@1	121 ### Output
kkonganti@1	122
kkonganti@1	123 ---
kkonganti@1	124
kkonganti@1	125 All the outputs for each step are stored inside the folder specified with the `--output` option. The `multiqc_report.html` file inside the `bettercallsal-multiqc` folder contains a brief consolidated report and can be opened in any browser on your local workstation.
kkonganti@1	126
kkonganti@1	127 \
kkonganti@1	128
kkonganti@1	129
kkonganti@1	130 ### Computational resources
kkonganti@1	131
kkonganti@1	132 ---
kkonganti@1	133
kkonganti@1	134 The `bettercallsal` workflow requires at least 16 GB of memory to finish successfully. By default, `bettercallsal` uses 10 CPU cores where possible. You can change this behavior and adjust the number of CPU cores with the `--max_cpus` option.
kkonganti@1	135
kkonganti@1	136 \
kkonganti@1	137
kkonganti@1	138
kkonganti@1	139 Example:
kkonganti@1	140
kkonganti@1	141 ```bash
kkonganti@1	142 cpipes \
kkonganti@1	143 --pipeline bettercallsal \
kkonganti@1	144 --input /path/to/bettercallsal_sim_reads \
kkonganti@1	145 --output /path/to/bettercallsal_sim_reads_output \
kkonganti@1	146 --bcs_root_dbdir /path/to/PDG000000002.2537 \
kkonganti@1	147 --kmaalign_ignorequals \
kkonganti@1	148 --max_cpus 5 \
kkonganti@1	149 -profile stdkondagac \
kkonganti@1	150 -resume
kkonganti@1	151 ```
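
On machines with fewer than 10 cores, a value for `--max_cpus` can be derived from the detected core count. The sketch below is only an illustration: `pick_max_cpus` is a hypothetical helper, and the cap of 10 simply mirrors the default stated above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: return the smaller of the available core count
# and a cap (10 by default, matching the workflow's default).
pick_max_cpus() {
  local available="$1" cap="${2:-10}"
  if [ "$available" -lt "$cap" ]; then
    echo "$available"
  else
    echo "$cap"
  fi
}

# Number of online processors as reported by the system.
cores="$(getconf _NPROCESSORS_ONLN)"
echo "--max_cpus $(pick_max_cpus "$cores")"
```

The printed option can then be appended to the `cpipes` command line.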
kkonganti@1	152
kkonganti@1	153 \
kkonganti@1	154
kkonganti@1	155
kkonganti@1	156 ### Runtime profiles
kkonganti@1	157
kkonganti@1	158 ---
kkonganti@1	159
kkonganti@1	160 You can use different runtime profiles to suit your specific compute environment. For example, you can run the workflow locally on your machine or on a grid computing infrastructure.
kkonganti@1	161
kkonganti@1	162 \
kkonganti@1	163
kkonganti@1	164
kkonganti@1	165 Example:
kkonganti@1	166
kkonganti@1	167 ```bash
kkonganti@1	168 cd /data/scratch/$USER
kkonganti@1	169 mkdir nf-cpipes
kkonganti@1	170 cd nf-cpipes
kkonganti@1	171 cpipes \
kkonganti@1	172 --pipeline bettercallsal \
kkonganti@1	173 --input /path/to/fastq_pass_dir \
kkonganti@1	174 --output /path/to/where/output/should/go \
kkonganti@1	175 -profile your_institution
kkonganti@1	176 ```
kkonganti@1	177
kkonganti@1	178 The above command runs the pipeline and stores the output at the location given by the `--output` flag. The Nextflow reports are always stored in the current working directory from which `cpipes` is run. For example, for the above command, a directory called `CPIPES-bettercallsal` would hold all the Nextflow-related logs, reports and trace files.
kkonganti@1	179
kkonganti@1	180 \
kkonganti@1	181
kkonganti@1	182
kkonganti@1	183 ### `your_institution.config`
kkonganti@1	184
kkonganti@1	185 ---
kkonganti@1	186
kkonganti@1	187 The above example sets the runtime profile to `your_institution`. For this to work, add the following lines at the end of the [`computeinfra.config`](../conf/computeinfra.config) file, which should be located inside the `conf` folder. For example, if your institution uses SGE or UNIVA for grid computing instead of SLURM and has a job queue named `normal.q`, then add these lines:
kkonganti@1	188
kkonganti@1	189 \
kkonganti@1	190
kkonganti@1	191
kkonganti@1	192 ```groovy
kkonganti@1	193 your_institution {
kkonganti@1	194     process.executor = 'sge'
kkonganti@1	195     process.queue = 'normal.q'
kkonganti@1	196     singularity.enabled = false
kkonganti@1	197     singularity.autoMounts = true
kkonganti@1	198     docker.enabled = false
kkonganti@1	199     params.enable_conda = true
kkonganti@1	200     conda.enabled = true
kkonganti@1	201     conda.useMicromamba = true
kkonganti@1	202     params.enable_module = false
kkonganti@1	203 }
kkonganti@1	204 ```
kkonganti@1	205
kkonganti@1	206 In the above example, by default, all the software provisioning choices are disabled except `conda`. You can also choose to remove the `process.queue` line altogether and the `bettercallsal` workflow will request the appropriate memory and number of CPU cores automatically, which ranges from 1 CPU, 1 GB and 1 hour for job completion up to 10 CPU cores, 1 TB and 120 hours for job completion.
kkonganti@1	207
kkonganti@1	208 \
kkonganti@1	209
kkonganti@1	210
kkonganti@1	211 ### Cloud computing
kkonganti@1	212
kkonganti@1	213 ---
kkonganti@1	214
kkonganti@1	215 You can run the workflow in the cloud (this works only when the AWS resources are set up properly). Add new runtime profiles with the required parameters per the [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html):
kkonganti@1	216
kkonganti@1	217 \
kkonganti@1	218
kkonganti@1	219
kkonganti@1	220 Example:
kkonganti@1	221
kkonganti@1	222 ```groovy
kkonganti@1	223 my_aws_batch {
kkonganti@1	224     process.executor = 'awsbatch'
kkonganti@1	225     process.queue = 'my-batch-queue'
kkonganti@1	226     aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
kkonganti@1	227     aws.region = 'us-east-1'
kkonganti@1	228     singularity.enabled = false
kkonganti@1	229     singularity.autoMounts = true
kkonganti@1	230     docker.enabled = true
kkonganti@1	231     params.enable_conda = false
kkonganti@1	232     params.enable_module = false
kkonganti@1	233 }
kkonganti@1	234 ```
kkonganti@1	235
kkonganti@1	236 \
kkonganti@1	237
kkonganti@1	238
kkonganti@1	239 ### Example data
kkonganti@1	240
kkonganti@1	241 ---
kkonganti@1	242
kkonganti@1	243 After you make sure that you have all the [minimum requirements](#minimum-requirements) to run the workflow, you can try the `bettercallsal` pipeline on some simulated reads. The following input dataset contains simulated reads for `Montevideo` and `I 4,[5],12:i:-` in roughly equal proportions.
kkonganti@1	244
kkonganti@1	245 - Download simulated reads: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads.tar.bz2) (~ 3 GB).
kkonganti@1	246 - Download pre-formatted test database: [S3](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2491.test-db.tar.bz2) (~ 75 MB). This test database works only with the simulated reads.
kkonganti@1	247 - Download pre-formatted full database (Optional): If you would like to do a complete run with your own FASTQ datasets, you can either create your own [database](./bettercallsal_db.md) or use [PDG000000002.2537](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/PDG000000002.2537.tar.bz2) version of the database (~ 37 GB).
kkonganti@1	248 - After a successful run of the workflow, your MultiQC report should look something like [this](https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal/bettercallsal_sim_reads_mqc.html).
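
The two required archives in the list above can be fetched and unpacked from the command line. This is a sketch only: `fetch_and_unpack` and the `DRY_RUN` switch are hypothetical, and the archives are assumed to be ordinary bzip2-compressed tarballs, as their `.tar.bz2` names suggest. By default the commands are printed rather than executed, because the downloads are large.

```bash
#!/usr/bin/env bash

# Base URL taken from the download links above.
base_url='https://cfsan-pub-xfer.s3.amazonaws.com/Kranti.Konganti/bettercallsal'

# Hypothetical helper: print the download and extraction commands for one
# archive (DRY_RUN=1, the default here), or run them when DRY_RUN=0.
fetch_and_unpack() {
  local archive="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "curl -L -O $base_url/$archive"
    echo "tar -xjf $archive"
  else
    curl -L -O "$base_url/$archive" && tar -xjf "$archive"
  fi
}

fetch_and_unpack bettercallsal_sim_reads.tar.bz2
fetch_and_unpack PDG000000002.2491.test-db.tar.bz2
```

Run the script with `DRY_RUN=0` in the environment to perform the downloads in the current directory.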
kkonganti@1	249
kkonganti@1	250 Now run the workflow by ignoring quality values since these are simulated base qualities:
kkonganti@1	251
kkonganti@1	252 \
kkonganti@1	253
kkonganti@1	254
kkonganti@1	255 ```bash
kkonganti@1	256 cpipes \
kkonganti@1	257 --pipeline bettercallsal \
kkonganti@1	258 --input /path/to/bettercallsal_sim_reads \
kkonganti@1	259 --output /path/to/bettercallsal_sim_reads_output \
kkonganti@1	260 --bcs_root_dbdir /path/to/PDG000000002.2537 \
kkonganti@1	261 --kmaalign_ignorequals \
kkonganti@1	262 -profile stdkondagac \
kkonganti@1	263 -resume
kkonganti@1	264 ```
kkonganti@1	265
kkonganti@1	266 Please note that the run time profile `stdkondagac` will run jobs locally using `micromamba` for software provisioning. The first time you run the command, a new folder called `kondagac_cache` will be created and subsequent runs should use this `conda` cache.
kkonganti@1	267
kkonganti@1	268 \
kkonganti@1	269
kkonganti@1	270
kkonganti@1	271 ## Using `sourmash`
kkonganti@1	272
kkonganti@1	273 Beginning with `v0.3.0` of the `bettercallsal` workflow, `sourmash` sketching is used to further narrow down possible serotype hits. It is ON by default. This enables the generation of an ANI containment matrix for Samples vs Genomes. There may be multiple hits for the same serotype in the final MultiQC report, as multiple genome accessions can belong to a single serotype.
kkonganti@1	274
kkonganti@1	275 You can turn OFF this feature with the `--sourmashsketch_run false` option.
kkonganti@1	276
kkonganti@1	277 \
kkonganti@1	278
kkonganti@1	279
kkonganti@1	280 ## `bettercallsal` CLI Help
kkonganti@1	281
kkonganti@1	282 ```text
kkonganti@1	283 [Kranti_Konganti@my-unix-box ]$ cpipes --pipeline bettercallsal --help
kkonganti@1	284 N E X T F L O W ~ version 22.10.0
kkonganti@1	285 Launching `./bettercallsal/cpipes` [awesome_chandrasekhar] DSL2 - revision: 8da4e11078
kkonganti@1	286 ================================================================================
kkonganti@1	287 (o)
kkonganti@1	288 ___ _ __ _ _ __ ___ ___
kkonganti@1	289 / __|| '_ \ | || '_ \ / _ \/ __|
kkonganti@1	290 | (__ | |_) || || |_) || __/\__ \
kkonganti@1	291 \___|| .__/ |_|| .__/ \___||___/
kkonganti@1	292 | | | |
kkonganti@1	293 |_| |_|
kkonganti@1	294 --------------------------------------------------------------------------------
kkonganti@1	295 A collection of modular pipelines at CFSAN, FDA.
kkonganti@1	296 --------------------------------------------------------------------------------
kkonganti@1	297 Name : CPIPES
kkonganti@1	298 Author : Kranti Konganti
kkonganti@1	299 Version : 0.5.0
kkonganti@1	300 Center : CFSAN, FDA.
kkonganti@1	301 ================================================================================
kkonganti@1	302
kkonganti@1	303 Workflow : bettercallsal
kkonganti@1	304
kkonganti@1	305 Author : Kranti Konganti
kkonganti@1	306
kkonganti@1	307 Version : 0.5.0
kkonganti@1	308
kkonganti@1	309
kkonganti@1	310 Usage : cpipes --pipeline bettercallsal [options]
kkonganti@1	311
kkonganti@1	312
kkonganti@1	313 Required :
kkonganti@1	314
kkonganti@1	315 --input : Absolute path to directory containing FASTQ
kkonganti@1	316 files. The directory should contain only
kkonganti@1	317 FASTQ files as all the files within the
kkonganti@1	318 mentioned directory will be read. Ex: --
kkonganti@1	319 input /path/to/fastq_pass
kkonganti@1	320
kkonganti@1	321 --output : Absolute path to directory where all the
kkonganti@1	322 pipeline outputs should be stored. Ex: --
kkonganti@1	323 output /path/to/output
kkonganti@1	324
kkonganti@1	325 Other options :
kkonganti@1	326
kkonganti@1	327 --metadata : Absolute path to metadata CSV file
kkonganti@1	328 containing five mandatory columns: sample,
kkonganti@1	329 fq1,fq2,strandedness,single_end. The fq1
kkonganti@1	330 and fq2 columns contain absolute paths to
kkonganti@1	331 the FASTQ files. This option can be used in
kkonganti@1	332 place of --input option. This is rare. Ex
kkonganti@1	333 : --metadata samplesheet.csv
kkonganti@1	334
kkonganti@1	335 --fq_suffix : The suffix of FASTQ files (Unpaired reads
kkonganti@1	336 or R1 reads or Long reads) if an input
kkonganti@1	337 directory is mentioned via --input option.
kkonganti@1	338 Default: .fastq.gz
kkonganti@1	339
kkonganti@1	340 --fq2_suffix : The suffix of FASTQ files (Paired-end reads
kkonganti@1	341 or R2 reads) if an input directory is
kkonganti@1	342 mentioned via --input option. Default:
kkonganti@1	343 _R2_001.fastq.gz
kkonganti@1	344
kkonganti@1	345 --fq_filter_by_len : Remove FASTQ reads that are less than this
kkonganti@1	346 many bases. Default: 0
kkonganti@1	347
kkonganti@1	348 --fq_strandedness : The strandedness of the sequencing run.
kkonganti@1	349 This is mostly needed if your sequencing
kkonganti@1	350 run is RNA-SEQ. For most of the other runs
kkonganti@1	351 , it is probably safe to use unstranded for
kkonganti@1	352 the option. Default: unstranded
kkonganti@1	353
kkonganti@1	354 --fq_single_end : SINGLE-END information will be auto-
kkonganti@1	355 detected but this option forces PAIRED-END
kkonganti@1	356 FASTQ files to be treated as SINGLE-END so
kkonganti@1	357 only read 1 information is included in auto
kkonganti@1	358 -generated samplesheet. Default: true
kkonganti@1	359
kkonganti@1	360 --fq_filename_delim : Delimiter by which the file name is split
kkonganti@1	361 to obtain sample name. Default: _
kkonganti@1	362
kkonganti@1	363 --fq_filename_delim_idx : After splitting FASTQ file name by using
kkonganti@1	364 the --fq_filename_delim option, all
kkonganti@1	365 elements before this index (1-based) will
kkonganti@1	366 be joined to create final sample name.
kkonganti@1	367 Default: 1
kkonganti@1	368
kkonganti@1	369 --bcs_concat_pe : Concatenate paired-end files. Default: true
kkonganti@1	370
kkonganti@1	371 --bbmerge_run : Run BBMerge tool. Default: false
kkonganti@1	372
kkonganti@1	373 --bbmerge_reads : Quit after this many read pairs (-1 means
kkonganti@1	374 all) Default: -1
kkonganti@1	375
kkonganti@1	376 --bbmerge_adapters : Absolute UNIX path pointing to the adapters
kkonganti@1	377 file in FASTA format. Default: false
kkonganti@1	378
kkonganti@1	379 --bbmerge_ziplevel : Set to 1 (lowest) through 9 (max) to change
kkonganti@1	380 compression level; lower compression is
kkonganti@1	381 faster. Default: 1
kkonganti@1	382
kkonganti@1	383 --bbmerge_ordered : Output reads in the same order as input.
kkonganti@1	384 Default: false
kkonganti@1	385
kkonganti@1	386 --bbmerge_qtrim : Trim read ends to remove bases with quality
kkonganti@1	387 below --bbmerge_minq. Trims BEFORE merging
kkonganti@1	388 . Values: t (trim both ends), f (neither
kkonganti@1	389 end), r (right end only), l (left end only
kkonganti@1	390 ). Default: true
kkonganti@1	391
kkonganti@1	392 --bbmerge_qtrim2 : May be specified instead of --bbmerge_qtrim
kkonganti@1	393 to perform trimming only if merging is
kkonganti@1	394 unsuccesful. then retry merging. Default:
kkonganti@1	395 false
kkonganti@1	396
kkonganti@1	397 --bbmerge_trimq : Trim quality threshold. This may be comma-
kkonganti@1	398 delimited list (ascending) to try multiple
kkonganti@1	399 values. Default: 10
kkonganti@1	400
kkonganti@1	401 --bbmerge_minlength : (ml) Reads shorter than this after trimming
kkonganti@1	402 , but before merging, will be discarded.
kkonganti@1	403 Pairs will be discarded onlyif both are
kkonganti@1	404 shorter. Default: 1
kkonganti@1	405
kkonganti@1	406 --bbmerge_tbo : (trimbyoverlap). Trim overlapping reads to
kkonganti@1	407 remove right most (3') non-overlaping
kkonganti@1	408 portion instead of joining Default: false
kkonganti@1	409
kkonganti@1	410 --bbmerge_minavgquality : (maq). Reads with average quality below
kkonganti@1	411 this after trimming will not be attempted
kkonganti@1	412 to merge. Default: 30
kkonganti@1	413
kkonganti@1	414 --bbmerge_trimpolya : Trim trailing poly-A tail from adapter
kkonganti@1	415 output. Only affects outadapter. This also
kkonganti@1	416 trims poly-A followed by poly-G, which
kkonganti@1	417 occurs on NextSeq. Default: true
kkonganti@1	418
kkonganti@1	419 --bbmerge_pfilter : Ban improbable overlaps. Higher is more
kkonganti@1	420 strict. 0 will disable the filter; 1 will
kkonganti@1	421 allow only perfect overlaps. Default: 1
kkonganti@1	422
kkonganti@1	423 --bbmerge_ouq : Calculate best overlap using quality values
kkonganti@1	424 . Default: false
kkonganti@1	425
kkonganti@1	426 --bbmerge_owq : Calculate best overlap without using
kkonganti@1	427 quality values. Default: true
kkonganti@1	428
kkonganti@1	429 --bbmerge_strict : Decrease false positive rate and merging
kkonganti@1	430 rate. Default: false
kkonganti@1	431
kkonganti@1	432 --bbmerge_verystrict : Greatly decrease false positive rate and
kkonganti@1	433 merging rate. Default: false
kkonganti@1	434
kkonganti@1	435 --bbmerge_ultrastrict : Decrease false positive rate and merging
kkonganti@1	436 rate even more. Default: true
kkonganti@1	437
kkonganti@1	438 --bbmerge_maxstrict : Maxiamally decrease false positive rate and
kkonganti@1	439 merging rate. Default: false
kkonganti@1	440
kkonganti@1	441 --bbmerge_loose : Increase false positive rate and merging
kkonganti@1	442 rate. Default: false
kkonganti@1	443
kkonganti@1	444 --bbmerge_veryloose : Greatly increase false positive rate and
kkonganti@1	445 merging rate. Default: false
kkonganti@1	446
kkonganti@1	447 --bbmerge_ultraloose : Increase false positive rate and merging
kkonganti@1	448 rate even more. Default: false
kkonganti@1	449
kkonganti@1	450 --bbmerge_maxloose : Maximally increase false positive rate and
kkonganti@1	451 merging rate. Default: false
kkonganti@1	452
kkonganti@1	453 --bbmerge_fast : Fastest possible preset. Default: false
kkonganti@1	454
kkonganti@1	455 --bbmerge_k : Kmer length. 31 (or less) is fastest and
kkonganti@1	456 uses the least memory, but higher values
kkonganti@1	457 may be more accurate. 60 tends to work well
kkonganti@1	458 for 150bp reads. Default: 60
kkonganti@1	459
kkonganti@1	460 --bbmerge_prealloc : Pre-allocate memory rather than dynamically
kkonganti@1	461 growing. Faster and more memory-efficient
kkonganti@1	462 for large datasets. A float fraction (0-1)
kkonganti@1	463 may be specified, default 1. Default: true
kkonganti@1	464
kkonganti@1	465 --fastp_run : Run fastp tool. Default: true
kkonganti@1	466
kkonganti@1	467 --fastp_failed_out : Specify whether to store reads that cannot
kkonganti@1	468 pass the filters. Default: false
kkonganti@1	469
kkonganti@1	470 --fastp_merged_out : Specify whether to store merged output or
kkonganti@1	471 not. Default: false
kkonganti@1	472
kkonganti@1	473 --fastp_overlapped_out : For each read pair, output the overlapped
kkonganti@1	474 region if it has no mismatched base.
kkonganti@1	475 Default: false
kkonganti@1	476
kkonganti@1	477 --fastp_6 : Indicate that the input is using phred64
kkonganti@1	478 scoring (it'll be converted to phred33, so
kkonganti@1	479 the output will still be phred33). Default
kkonganti@1	480 : false
kkonganti@1	481
kkonganti@1	482 --fastp_reads_to_process : Specify how many reads/pairs are to be
kkonganti@1	483 processed. Default value 0 means process
kkonganti@1	484 all reads. Default: 0
kkonganti@1	485
kkonganti@1	486 --fastp_fix_mgi_id : The MGI FASTQ ID format is not compatible
kkonganti@1	487 with many BAM operation tools, enable this
kkonganti@1	488 option to fix it. Default: false
kkonganti@1	489
kkonganti@1	490 --fastp_A : Disable adapter trimming. On by default.
kkonganti@1	491 Default: false
kkonganti@1	492
kkonganti@1	493 --fastp_adapter_fasta : Specify a FASTA file to trim both read1 and
kkonganti@1	494 read2 (if PE) by all the sequences in this
kkonganti@1	495 FASTA file. Default: false
kkonganti@1	496
kkonganti@1	497 --fastp_f : Trim how many bases in front of read1.
kkonganti@1	498 Default: 0
kkonganti@1	499
kkonganti@1	500 --fastp_t : Trim how many bases at the end of read1.
kkonganti@1	501 Default: 0
kkonganti@1	502
kkonganti@1	503 --fastp_b : Max length of read1 after trimming. Default
kkonganti@1	504 : 0
kkonganti@1	505
kkonganti@1	506 --fastp_F : Trim how many bases in front of read2.
kkonganti@1	507 Default: 0
kkonganti@1	508
kkonganti@1	509 --fastp_T : Trim how many bases at the end of read2.
kkonganti@1	510 Default: 0
kkonganti@1	511
kkonganti@1	512 --fastp_B : Max length of read2 after trimming. Default
kkonganti@1	513 : 0
kkonganti@1	514
kkonganti@1	515 --fastp_dedup : Enable deduplication to drop the duplicated
kkonganti@1	516 reads/pairs. Default: true
kkonganti@1	517
kkonganti@1	518 --fastp_dup_calc_accuracy : Accuracy level to calculate duplication (1~
kkonganti@1	519 6), higher level uses more memory (1G, 2G,
kkonganti@1	520 4G, 8G, 16G, 24G). Default 1 for no-dedup
kkonganti@1	521 mode, and 3 for dedup mode. Default: 6
kkonganti@1	522
kkonganti@1	523 --fastp_poly_g_min_len : The minimum length to detect polyG in the
kkonganti@1	524 read tail. Default: 10
kkonganti@1	525
kkonganti@1	526 --fastp_G : Disable polyG tail trimming. Default: true
kkonganti@1	527
kkonganti@1	528 --fastp_x : Enable polyX trimming in 3' ends. Default:
kkonganti@1	529 false
kkonganti@1	530
kkonganti@1	531 --fastp_poly_x_min_len : The minimum length to detect polyX in the
kkonganti@1	532 read tail. Default: 10
kkonganti@1	533
kkonganti@1	534 --fastp_cut_front : Move a sliding window from front (5') to
kkonganti@1	535 tail, drop the bases in the window if its
kkonganti@1	536 mean quality < threshold, stop otherwise.
kkonganti@1	537 Default: true
kkonganti@1	538
kkonganti@1	539 --fastp_cut_tail : Move a sliding window from tail (3') to
kkonganti@1	540 front, drop the bases in the window if its
kkonganti@1	541 mean quality < threshold, stop otherwise.
kkonganti@1	542 Default: false
kkonganti@1	543
kkonganti@1	544 --fastp_cut_right : Move a sliding window from tail, drop the
kkonganti@1	545 bases in the window and the right part if
kkonganti@1	546 its mean quality < threshold, and then stop
kkonganti@1	547 . Default: true
kkonganti@1	548
kkonganti@1	549 --fastp_W : Sliding window size shared by --
kkonganti@1	550 fastp_cut_front, --fastp_cut_tail and --
kkonganti@1	551 fastp_cut_right. Default: 20
kkonganti@1	552
kkonganti@1	553 --fastp_M : The mean quality requirement shared by --
kkonganti@1	554 fastp_cut_front, --fastp_cut_tail and --
kkonganti@1	555 fastp_cut_right. Default: 30
kkonganti@1	556
kkonganti@1	557 --fastp_q : The quality value below which a base should
kkonganti@1	558 is not qualified. Default: 30
kkonganti@1	559
kkonganti@1	560 --fastp_u : What percent of bases are allowed to be
kkonganti@1	561 unqualified. Default: 40
kkonganti@1	562
kkonganti@1	563 --fastp_n : How many N's can a read have. Default: 5
kkonganti@1	564
kkonganti@1	565 --fastp_e : If the full reads' average quality is below
kkonganti@1	566 this value, then it is discarded. Default
kkonganti@1	567 : 0
kkonganti@1	568
kkonganti@1	569 --fastp_l : Reads shorter than this length will be
kkonganti@1	570 discarded. Default: 35
kkonganti@1	571
kkonganti@1	572 --fastp_max_len : Reads longer than this length will be
kkonganti@1	573 discarded. Default: 0
kkonganti@1	574
kkonganti@1	575 --fastp_y : Enable low complexity filter. The
kkonganti@1	576 complexity is defined as the percentage of
kkonganti@1	577 bases that are different from its next base
kkonganti@1	578 (base[i] != base[i+1]). Default: true
kkonganti@1	579
kkonganti@1	580 --fastp_Y : The threshold for low complexity filter (0~
kkonganti@1	581 100). Ex: A value of 30 means 30%
kkonganti@1	582 complexity is required. Default: 30
kkonganti@1	583
kkonganti@1	584 --fastp_U : Enable Unique Molecular Identifier (UMI)
kkonganti@1	585 pre-processing. Default: false
kkonganti@1	586
kkonganti@1	587 --fastp_umi_loc : Specify the location of UMI, can be one of
kkonganti@1	588 index1/index2/read1/read2/per_index/
kkonganti@1	589 per_read. Default: false
kkonganti@1	590
kkonganti@1	591 --fastp_umi_len : If the UMI is in read1 or read2, its length
kkonganti@1	592 should be provided. Default: false
kkonganti@1	593
kkonganti@1	594 --fastp_umi_prefix : If specified, an underline will be used to
kkonganti@1	595 connect prefix and UMI (i.e. prefix=UMI,
kkonganti@1	596 UMI=AATTCG, final=UMI_AATTCG). Default:
kkonganti@1	597 false
kkonganti@1	598
kkonganti@1	599 --fastp_umi_skip : If the UMI is in read1 or read2, fastp can
kkonganti@1	600 skip several bases following the UMI.
kkonganti@1	601 Default: false
kkonganti@1	602
kkonganti@1	603 --fastp_p : Enable overrepresented sequence analysis.
kkonganti@1	604 Default: true
kkonganti@1	605
kkonganti@1	606 --fastp_P : One in this many number of reads will be
kkonganti@1	607 computed for overrepresentation analysis (1
kkonganti@1	608 ~10000), smaller is slower. Default: 20
kkonganti@1	609
kkonganti@1	610 --fastp_use_custom_adapaters : Use custom adapter FASTA with fastp on top
kkonganti@1	611 of built-in adapter sequence auto-detection
kkonganti@1	612 . Enabling this option will attempt to find
kkonganti@1	613 and remove all possible Illumina adapter
kkonganti@1	614 and primer sequences but will make the
kkonganti@1	615 workflow run slow. Default: false
kkonganti@1	616
kkonganti@1	617 --mashscreen_run : Run `mash screen` tool. Default: true
kkonganti@1	618
kkonganti@1	619 --mashscreen_w : Winner-takes-all strategy for identity
kkonganti@1	620 estimates. After counting hashes for each
kkonganti@1	621 query, hashes that appear in multiple
kkonganti@1	622 queries will be removed from all except the
kkonganti@1	623 one with the best identity (ties broken by
kkonganti@1	624 larger query), and other identities will
kkonganti@1	625 be reduced. This removes output redundancy
kkonganti@1	626 , providing a rough compositional outline
kkonganti@1	627 . Default: false
kkonganti@1	628
kkonganti@1	629 --mashscreen_i : Minimum identity to report. Inclusive
kkonganti@1	630 unless set to zero, in which case only
kkonganti@1	631 identities greater than zero (i.e. with at
kkonganti@1	632 least one shared hash) will be reported.
kkonganti@1	633 Set to -1 to output everything. (-1-1).
kkonganti@1	634 Default: false
kkonganti@1	635
kkonganti@1	636 --mashscreen_v : Maximum p-value to report (0-1). Default:
kkonganti@1	637 false
kkonganti@1	638
kkonganti@1	639 --tuspy_run : Run the get_top_unique_mash_hits_genomes.py
kkonganti@1	640 script. Default: true
kkonganti@1	641
kkonganti@1	642 --tuspy_s : Absolute UNIX path to metadata text file
kkonganti@1	643 with the field separator, | and 5 fields:
kkonganti@1	644 serotype|asm_lvl|asm_url|snp_cluster_idEx:
kkonganti@1	645 serotype=Derby,antigen_formula=4:f,g:-|
kkonganti@1	646 Scaffold|402440|ftp://...|PDS000096654.2.
kkonganti@1	647 Mentioning this option will create a pickle
kkonganti@1	648 file for the provided metadata and exits.
kkonganti@1	649 Default: false
kkonganti@1	650
kkonganti@1	651 --tuspy_m : Absolute UNIX path to mash screen results
kkonganti@1	652 file. Default: false
kkonganti@1	653
kkonganti@1	654 --tuspy_ps : Absolute UNIX Path to serialized metadata
kkonganti@1	655 object in a pickle file. Default: /hpc/db/
kkonganti@1	656 bettercallsal/latest/index_metadata/
kkonganti@1	657 per_snp_cluster.ACC2SERO.pickle
kkonganti@1	658
kkonganti@1	659 --tuspy_gd : Absolute UNIX Path to directory containing
kkonganti@1	660 gzipped genome FASTA files. Default: /hpc/
kkonganti@1	661 db/bettercallsal/latest/scaffold_genomes
kkonganti@1	662
kkonganti@1	663 --tuspy_gds : Genome FASTA file suffix to search for in
kkonganti@1	664 the genome directory. Default:
kkonganti@1	665 _scaffolded_genomic.fna.gz
kkonganti@1	666
kkonganti@1	667 --tuspy_n : Return up to this many number of top N
kkonganti@1	668 unique genome accession hits. Default: 10
kkonganti@1	669
kkonganti@1	670 --sourmashsketch_run : Run `sourmash sketch dna` tool. Default:
kkonganti@1	671 true
kkonganti@1	672
kkonganti@1	673 --sourmashsketch_mode : Select which type of signatures to be
kkonganti@1	674 created: dna, protein, fromfile or
kkonganti@1	675 translate. Default: dna
kkonganti@1	676
kkonganti@1	677 --sourmashsketch_p : Signature parameters to use. Default: abund
kkonganti@1	678 ,scaled=1000,k=51,k=61,k=71
kkonganti@1	679
kkonganti@1	680 --sourmashsketch_file : <path> A text file containing a list of
kkonganti@1	681 sequence files to load. Default: false
kkonganti@1	682
kkonganti@1	683 --sourmashsketch_f : Recompute signatures even if the file
kkonganti@1	684 exists. Default: false
kkonganti@1	685
kkonganti@1	686 --sourmashsketch_merge : Merge all input files into one signature
kkonganti@1	687 file with the specified name. Default:
kkonganti@1	688 false
kkonganti@1	689
kkonganti@1	690 --sourmashsketch_singleton : Compute a signature for each sequence
kkonganti@1	691 record individually. Default: true
kkonganti@1	692
kkonganti@1	693 --sourmashsketch_name : Name the signature generated from each file
kkonganti@1	694 after the first record in the file.
kkonganti@1	695 Default: false
kkonganti@1	696
kkonganti@1	697 --sourmashsketch_randomize : Shuffle the list of input files randomly.
kkonganti@1	698 Default: false
kkonganti@1	699
kkonganti@1	700 --sourmashgather_run : Run `sourmash gather` tool. Default: true
kkonganti@1	701
kkonganti@1	702 --sourmashgather_n : Number of results to report. By default,
kkonganti@1	703 will terminate at --sourmashgather_thr_bp
kkonganti@1	704 value. Default: false
kkonganti@1	705
kkonganti@1	706 --sourmashgather_thr_bp : Reporting threshold (in bp) for estimated
kkonganti@1	707 overlap with remaining query. Default:
kkonganti@1	708 false
kkonganti@1	709
kkonganti@1	710 --sourmashgather_ignoreabn : Do NOT use k-mer abundances if present.
kkonganti@1	711 Default: false
kkonganti@1	712
kkonganti@1	713 --sourmashgather_prefetch : Use prefetch before gather. Default: false
kkonganti@1	714
kkonganti@1	715 --sourmashgather_noprefetch : Do not use prefetch before gather. Default
kkonganti@1	716 : false
kkonganti@1	717
kkonganti@1	718 --sourmashgather_ani_ci : Output confidence intervals for ANI
kkonganti@1	719 estimates. Default: true
kkonganti@1	720
kkonganti@1	721 --sourmashgather_k : The k-mer size to select. Default: 71
kkonganti@1	722
kkonganti@1	723 --sourmashgather_protein : Choose a protein signature. Default: false
kkonganti@1	724
kkonganti@1	725 --sourmashgather_noprotein : Do not choose a protein signature. Default
kkonganti@1	726 : false
kkonganti@1	727
kkonganti@1	728 --sourmashgather_dayhoff : Choose Dayhoff-encoded amino acid
kkonganti@1	729 signatures. Default: false
kkonganti@1	730
kkonganti@1	731 --sourmashgather_nodayhoff : Do not choose Dayhoff-encoded amino acid
kkonganti@1	732 signatures. Default: false
kkonganti@1	733
kkonganti@1	734 --sourmashgather_hp : Choose hydrophobic-polar-encoded amino acid
kkonganti@1	735 signatures. Default: false
kkonganti@1	736
kkonganti@1	737 --sourmashgather_nohp : Do not choose hydrophobic-polar-encoded
kkonganti@1	738 amino acid signatures. Default: false
kkonganti@1	739
kkonganti@1	740 --sourmashgather_dna : Choose DNA signature. Default: true
kkonganti@1	741
kkonganti@1	742 --sourmashgather_nodna : Do not choose DNA signature. Default: false
kkonganti@1	743
kkonganti@1	744 --sourmashgather_scaled : Scaled value should be between 100 and 1e6
kkonganti@1	745 . Default: false
kkonganti@1	746
kkonganti@1	747 --sourmashgather_inc_pat : Search only signatures that match this
kkonganti@1	748 pattern in name, filename, or md5. Default
kkonganti@1	749 : false
kkonganti@1	750
kkonganti@1	751 --sourmashgather_exc_pat : Search only signatures that do not match
kkonganti@1	752 this pattern in name, filename, or md5.
kkonganti@1	753 Default: false
kkonganti@1	754
kkonganti@1	755 --sourmashsearch_run : Run `sourmash search` tool. Default: false
kkonganti@1	756
kkonganti@1	757 --sourmashsearch_n : Number of results to report. By default,
kkonganti@1	758 will terminate at --sourmashsearch_thr
kkonganti@1	759 value. Default: false
kkonganti@1	760
kkonganti@1	761 --sourmashsearch_thr : Reporting threshold (similarity) to return
kkonganti@1	762 results. Default: 0
kkonganti@1	763
kkonganti@1	764 --sourmashsearch_contain : Score based on containment rather than
kkonganti@1	765 similarity. Default: false
kkonganti@1	766
kkonganti@1	767 --sourmashsearch_maxcontain : Score based on max containment rather than
kkonganti@1	768 similarity. Default: false
kkonganti@1	769
kkonganti@1	770 --sourmashsearch_ignoreabn : Do NOT use k-mer abundances if present.
kkonganti@1	771 Default: true
kkonganti@1	772
kkonganti@1	773 --sourmashsearch_ani_ci : Output confidence intervals for ANI
kkonganti@1	774 estimates. Default: false
kkonganti@1	775
kkonganti@1	776 --sourmashsearch_k : The k-mer size to select. Default: 71
kkonganti@1	777
kkonganti@1	778 --sourmashsearch_protein : Choose a protein signature. Default: false
kkonganti@1	779
kkonganti@1	780 --sourmashsearch_noprotein : Do not choose a protein signature. Default
kkonganti@1	781 : false
kkonganti@1	782
kkonganti@1	783 --sourmashsearch_dayhoff : Choose Dayhoff-encoded amino acid
kkonganti@1	784 signatures. Default: false
kkonganti@1	785
kkonganti@1	786 --sourmashsearch_nodayhoff : Do not choose Dayhoff-encoded amino acid
kkonganti@1	787 signatures. Default: false
kkonganti@1	788
kkonganti@1	789 --sourmashsearch_hp : Choose hydrophobic-polar-encoded amino acid
kkonganti@1	790 signatures. Default: false
kkonganti@1	791
kkonganti@1	792 --sourmashsearch_nohp : Do not choose hydrophobic-polar-encoded
kkonganti@1	793 amino acid signatures. Default: false
kkonganti@1	794
kkonganti@1	795 --sourmashsearch_dna : Choose DNA signature. Default: true
kkonganti@1	796
kkonganti@1	797 --sourmashsearch_nodna : Do not choose DNA signature. Default: false
kkonganti@1	798
kkonganti@1	799 --sourmashsearch_scaled : Scaled value should be between 100 and 1e6
kkonganti@1	800 . Default: false
kkonganti@1	801
kkonganti@1	802 --sourmashsearch_inc_pat : Search only signatures that match this
kkonganti@1	803 pattern in name, filename, or md5. Default
kkonganti@1	804 : false
kkonganti@1	805
kkonganti@1	806 --sourmashsearch_exc_pat : Search only signatures that do not match
kkonganti@1	807 this pattern in name, filename, or md5.
kkonganti@1	808 Default: false
kkonganti@1	809
kkonganti@1	810 --sfhpy_run : Run the sourmash_filter_hits.py script.
kkonganti@1	811 Default: true
kkonganti@1	812
kkonganti@1	813 --sfhpy_fcn : Column name by which filtering of rows
kkonganti@1	814 should be applied. Default: f_match
kkonganti@1	815
kkonganti@1	816 --sfhpy_fcv : Remove genomes whose match with the query
kkonganti@1	817 FASTQ is less than this much. Default: 0.1
kkonganti@1	818
kkonganti@1	819 --sfhpy_gt : Apply greater than or equal to condition
kkonganti@1	820 on numeric values of --sfhpy_fcn column.
kkonganti@1	821 Default: true
kkonganti@1	822
kkonganti@1	823 --sfhpy_lt : Apply less than or equal to condition on
kkonganti@1	824 numeric values of --sfhpy_fcn column.
kkonganti@1	825 Default: false
kkonganti@1	826
kkonganti@1	827 --kmaindex_run : Run kma index tool. Default: true
kkonganti@1	828
kkonganti@1	829 --kmaindex_t_db : Add to existing DB. Default: false
kkonganti@1	830
kkonganti@1	831 --kmaindex_k : k-mer size. Default: 31
kkonganti@1	832
kkonganti@1	833 --kmaindex_m : Minimizer size. Default: false
kkonganti@1	834
kkonganti@1	835 --kmaindex_hc : Homopolymer compression. Default: false
kkonganti@1	836
kkonganti@1	837 --kmaindex_ML : Minimum length of templates. Defaults to
kkonganti@1	838 --kmaindex_k. Default: false
kkonganti@1	839
kkonganti@1	840 --kmaindex_ME : Mega DB. Default: false
kkonganti@1	841
kkonganti@1	842 --kmaindex_Sparse : Make Sparse DB. Default: false
kkonganti@1	843
kkonganti@1	844 --kmaindex_ht : Homology template. Default: false
kkonganti@1	845
kkonganti@1	846 --kmaindex_hq : Homology query. Default: false
kkonganti@1	847
kkonganti@1	848 --kmaindex_and : Both homology thresholds have to reach.
kkonganti@1	849 Default: false
kkonganti@1	850
kkonganti@1	851 --kmaindex_nbp : No bias print. Default: false
kkonganti@1	852
kkonganti@1	853 --kmaalign_run : Run kma tool. Default: true
kkonganti@1	854
kkonganti@1	855 --kmaalign_int : Input file has interleaved reads. Default
kkonganti@1	856 : false
kkonganti@1	857
kkonganti@1	858 --kmaalign_ef : Output additional features. Default: false
kkonganti@1	859
kkonganti@1	860 --kmaalign_vcf : Output vcf file. 2 to apply FT. Default:
kkonganti@1	861 false
kkonganti@1	862
kkonganti@1	863 --kmaalign_sam : Output SAM, 4/2096 for mapped/aligned.
kkonganti@1	864 Default: false
kkonganti@1	865
kkonganti@1	866 --kmaalign_nc : No consensus file. Default: true
kkonganti@1	867
kkonganti@1	868 --kmaalign_na : No aln file. Default: true
kkonganti@1	869
kkonganti@1	870 --kmaalign_nf : No frag file. Default: true
kkonganti@1	871
kkonganti@1	872 --kmaalign_a : Output all template mappings. Default:
kkonganti@1	873 false
kkonganti@1	874
kkonganti@1	875 --kmaalign_and : Use both -mrs and p-value on consensus.
kkonganti@1	876 Default: false
kkonganti@1	877
kkonganti@1	878 --kmaalign_oa : Use neither -mrs nor p-value on consensus.
kkonganti@1	879 Default: false
kkonganti@1	880
kkonganti@1	881 --kmaalign_bc : Minimum support to call bases. Default:
kkonganti@1	882 false
kkonganti@1	883
kkonganti@1	884 --kmaalign_bcNano : Altered indel calling for ONT data. Default
kkonganti@1	885 : false
kkonganti@1	886
kkonganti@1	887 --kmaalign_bcd : Minimum depth to call bases. Default: false
kkonganti@1	888
kkonganti@1	889 --kmaalign_bcg : Maintain insignificant gaps. Default: false
kkonganti@1	890
kkonganti@1	891 --kmaalign_ID : Minimum consensus ID. Default: false
kkonganti@1	892
kkonganti@1	893 --kmaalign_md : Minimum depth. Default: false
kkonganti@1	894
kkonganti@1	895 --kmaalign_dense : Skip insertion in consensus. Default: false
kkonganti@1	896
kkonganti@1	897 --kmaalign_ref_fsa : Use Ns on indels. Default: false
kkonganti@1	898
kkonganti@1	899 --kmaalign_Mt1 : Map everything to one template. Default:
kkonganti@1	900 false
kkonganti@1	901
kkonganti@1	902 --kmaalign_1t1 : Map one query to one template. Default:
kkonganti@1	903 false
kkonganti@1	904
kkonganti@1	905 --kmaalign_mrs : Minimum relative alignment score. Default:
kkonganti@1	906 false
kkonganti@1	907
kkonganti@1	908 --kmaalign_mrc : Minimum query coverage. Default: 0.99
kkonganti@1	909
kkonganti@1	910 --kmaalign_mp : Minimum phred score of trailing and leading
kkonganti@1	911 bases. Default: 30
kkonganti@1	912
kkonganti@1	913 --kmaalign_mq : Set the minimum mapping quality. Default:
kkonganti@1	914 false
kkonganti@1	915
kkonganti@1	916 --kmaalign_eq : Minimum average quality score. Default: 30
kkonganti@1	917
kkonganti@1	918 --kmaalign_5p : Trim 5 prime by this many bases. Default:
kkonganti@1	919 false
kkonganti@1	920
kkonganti@1	921 --kmaalign_3p : Trim 3 prime by this many bases. Default:
kkonganti@1	922 false
kkonganti@1	923
kkonganti@1	924 --kmaalign_apm : Sets both -pm and -fpm. Default: false
kkonganti@1	925
kkonganti@1	926 --kmaalign_cge : Set CGE penalties and rewards. Default:
kkonganti@1	927 false
kkonganti@1	928
kkonganti@1	929 --salmonidx_run : Run `salmon index` tool. Default: true
kkonganti@1	930
kkonganti@1	931 --salmonidx_k : The size of k-mers that should be used for
kkonganti@1	932 the quasi index. Default: false
kkonganti@1	933
kkonganti@1	934 --salmonidx_gencode : This flag will expect the input transcript
kkonganti@1	935 FASTA to be in GENCODE format, and will
kkonganti@1	936 split the transcript name at the first `\|`
kkonganti@1	937 character. These reduced names will be used
kkonganti@1	938 in the output and when looking for these
kkonganti@1	939 transcripts in a gene to transcript GTF.
kkonganti@1	940 Default: false
kkonganti@1	941
kkonganti@1	942 --salmonidx_features : This flag will expect the input reference
kkonganti@1	943 to be in the tsv file format, and will
kkonganti@1	944 split the feature name at the first `tab`
kkonganti@1	945 character. These reduced names will be used
kkonganti@1	946 in the output and when looking for the
kkonganti@1	947 sequence of the features. GTF. Default:
kkonganti@1	948 false
kkonganti@1	949
kkonganti@1	950 --salmonidx_keepDuplicates : This flag will disable the default indexing
kkonganti@1	951 behavior of discarding sequence-identical
kkonganti@1	952 duplicate transcripts. If this flag is
kkonganti@1	953 passed then duplicate transcripts that
kkonganti@1	954 appear in the input will be retained and
kkonganti@1	955 quantified separately. Default: false
kkonganti@1	956
kkonganti@1	957 --salmonidx_keepFixedFasta : Retain the fixed fasta file (without short
kkonganti@1	958 transcripts and duplicates, clipped, etc.)
kkonganti@1	959 generated during indexing. Default: false
kkonganti@1	960
kkonganti@1	961 --salmonidx_filterSize : The size of the Bloom filter that will be
kkonganti@1	962 used by TwoPaCo during indexing. The filter
kkonganti@1	963 will be of size 2^{filterSize}. A value of
kkonganti@1	964 -1 means that the filter size will be
kkonganti@1	965 automatically set based on the number of
kkonganti@1	966 distinct k-mers in the input, as estimated
kkonganti@1	967 by nthll. Default: false
kkonganti@1	968
kkonganti@1	969 --salmonidx_sparse : Build the index using a sparse sampling of
kkonganti@1	970 k-mer positions. This will require less
kkonganti@1	971 memory (especially during quantification),
kkonganti@1	972 but will take longer to construct and can
kkonganti@1	973 slow down mapping / alignment. Default:
kkonganti@1	974 false
kkonganti@1	975
kkonganti@1	976 --salmonidx_n : Do not clip poly-A tails from the ends of
kkonganti@1	977 target sequences. Default: false
kkonganti@1	978
kkonganti@1	979 --gsrpy_run : Run the gen_salmon_res_table.py script.
kkonganti@1	980 Default: true
kkonganti@1	981
kkonganti@1	982 --gsrpy_url : Generate an additional column in final
kkonganti@1	983 results table which links out to NCBI
kkonganti@1	984 Pathogens Isolate Browser. Default: true
kkonganti@1	985
kkonganti@1	986 Help options :
kkonganti@1	987
kkonganti@1	988 --help : Display this message.
kkonganti@1	989
kkonganti@1	990 ```
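
The `--sfhpy_fcn`, `--sfhpy_fcv`, `--sfhpy_gt` and `--sfhpy_lt` options above describe a threshold filter on one numeric column of the `sourmash gather` results. The following is a minimal sketch of that filtering logic, not the actual `sourmash_filter_hits.py` script: the function name `filter_hits`, its arguments, and the example accessions and values are hypothetical and exist only to illustrate how the defaults (`f_match`, `0.1`, greater than or equal to) would behave.

```python
import csv
import io


def filter_hits(rows, column="f_match", threshold=0.1, keep_gte=True):
    """Keep rows whose numeric `column` value passes the threshold test.

    keep_gte=True mirrors the documented --sfhpy_gt behaviour (>= threshold);
    keep_gte=False mirrors --sfhpy_lt (<= threshold).
    This is an illustrative assumption, not the pipeline's implementation.
    """
    kept = []
    for row in rows:
        value = float(row[column])
        passes = value >= threshold if keep_gte else value <= threshold
        if passes:
            kept.append(row)
    return kept


# Hypothetical gather-style CSV: made-up accessions and f_match values.
GATHER_CSV = """name,f_match
GCA_000000001.1,0.42
GCA_000000002.1,0.05
GCA_000000003.1,0.10
"""

rows = list(csv.DictReader(io.StringIO(GATHER_CSV)))
kept = filter_hits(rows)
print([row["name"] for row in kept])
```

With the default settings, a genome is retained only when its `f_match` value is at least `0.1`; passing `keep_gte=False` inverts the comparison in the way `--sfhpy_lt` is described.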
