annotate 1.0.0/readme/bettercallsal_db.md @ 0:0a8dda29956e draft default tip

planemo upload
author galaxytrakr
date Thu, 28 May 2026 20:41:10 +0000
# bettercallsal_db

`bettercallsal_db` is an end-to-end automated workflow that generates and consolidates the required DB flat files from the [NCBI Pathogens Database for Salmonella](https://ftp.ncbi.nlm.nih.gov/pathogen/Results/Salmonella/). It first downloads the metadata for the provided release identifier (Ex: `latest_snps` or `PDG000000002.3082`) and then creates a `mash sketch` according to the filtering strategy. It generates two types of sketches: one prioritizes genome collection based on SNP clustering (`per_snp_cluster`), and the other collects up to N genome accessions for each value in the `computed_serotype` column of the metadata file (`per_computed_serotype`).
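
As a rough illustration of the `per_computed_serotype` idea (not the workflow's actual implementation), the sketch below keeps at most N accessions per serotype from a toy tab-delimited table. The accessions, the two-column layout, and the variable names are made up for this example; the real PDG metadata file has many more columns.

```bash
#!/usr/bin/env bash
# Toy metadata table (hypothetical layout): accession <TAB> computed_serotype.
metadata=$(printf '%s\t%s\n' \
    'acc' 'computed_serotype' \
    'GCA_0001' 'Agona' \
    'GCA_0002' 'Agona' \
    'GCA_0003' 'Agona' \
    'GCA_0004' 'Newport')

# Keep at most N accessions per serotype (N=2 here, chosen arbitrarily).
per_serotype=$(printf '%s\n' "$metadata" | awk -F'\t' -v n=2 '
    NR == 1 {
        # Locate the column by its header name; awk fields are 1-based.
        for (i = 1; i <= NF; i++) if ($i == "computed_serotype") col = i
        next
    }
    # Print the accession while fewer than n were seen for this serotype.
    seen[$col]++ < n { print $1 "\t" $col }
')

printf '%s\n' "$per_serotype"
```

Looking the column up by header name gives a 1-based position, which matches the "non 0-based index" wording used by the column options in the CLI help.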

The `bettercallsal_db` workflow should finish within an hour with a stable internet connection.

\
&nbsp;

## Workflow Usage

```bash
cpipes --pipeline bettercallsal_db [options]
```

\
&nbsp;

Example: Run the `bettercallsal_db` pipeline and store the output at `/data/Kranti_Konganti/bettercallsal_db/PDG000000002.3082`.

```bash
cpipes \
      --pipeline bettercallsal_db \
      --pdg_release PDG000000002.3082 \
      --output /data/Kranti_Konganti/bettercallsal_db/PDG000000002.3082
```
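
The CLI help for this workflow states that `--output` expects an absolute path. A small wrapper like the one below can catch a relative path before launching. This is only a sketch: `build_db_cmd` and `db_root` are hypothetical names for this example, and the function prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the cpipes command for a given PDG release,
# refusing output locations that are not absolute paths.
build_db_cmd() {
    local release="$1" db_root="$2"
    case "$db_root" in
        /*) ;;  # absolute path, fine
        *)  echo "error: output root must be an absolute path: $db_root" >&2
            return 1 ;;
    esac
    printf 'cpipes --pipeline bettercallsal_db --pdg_release %s --output %s/%s\n' \
        "$release" "$db_root" "$release"
}

# Dry run: inspect the printed command before running it yourself.
build_db_cmd PDG000000002.3082 /data/Kranti_Konganti/bettercallsal_db
```

Naming the output directory after the release identifier, as the example above does, keeps databases built from different releases side by side.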

\
&nbsp;

Now you can run the `bettercallsal` workflow with the created database by passing the root path of the database to the `--bcs_root_dbdir` option.

```bash
cpipes \
      --pipeline bettercallsal \
      --input /path/to/illumina/fastq/dir \
      --output /path/to/output \
      --bcs_root_dbdir /data/Kranti_Konganti/bettercallsal_db/PDG000000002.3082
```
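
Before starting a long `bettercallsal` run, it can help to confirm that the database root actually exists and is not empty. The check below is a minimal sketch using only standard shell tools; `check_db_root` is a hypothetical helper, and it does not inspect which files the database contains.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm the database root exists and is not
# empty before starting a long run. The directory contents are not inspected.
check_db_root() {
    local dbdir="$1"
    if [ ! -d "$dbdir" ]; then
        echo "error: database root not found: $dbdir" >&2
        return 1
    fi
    if [ -z "$(ls -A "$dbdir")" ]; then
        echo "error: database root is empty: $dbdir" >&2
        return 1
    fi
    echo "ok: $dbdir"
}
```

Calling `check_db_root` with the same path that is given to `--bcs_root_dbdir` returns a non-zero status when the directory is missing or empty, so it can gate the `cpipes` call in a script.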

\
&nbsp;

## Note

Please note that the last step of the `bettercallsal_db` workflow, `SCAFFOLD_GENOMES`, spawns multiple processes and is not cached by **Nextflow**. This is intentional for this stage of the workflow to speed up database creation; as such, it is recommended that you run this workflow in a grid computing or similar cloud computing setting.

\
&nbsp;

## `bettercallsal_db` CLI Help

```text
[Kranti_Konganti@my-unix-box ]$ cpipes --pipeline bettercallsal_db --help

N E X T F L O W ~ version 24.04.3

Launching `~/apps/bettercallsal/1.0.0/cpipes` [shrivelled_hamilton] DSL2 - revision: d9b4be42be

================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : bettercallsal
Author                          : Kranti Konganti
Version                         : 0.9.0
Center                          : CFSAN, FDA.
================================================================================

Workflow                        : bettercallsal_db

Author                          : Kranti Konganti

Version                         : 1.0.0


Required                        :

--output                        : Absolute path to directory where all the
                                  pipeline outputs should be stored. Ex: --
                                  output /path/to/output

Other options                   :

--wcomp_serocol                 : Column number (non 0-based index) of the
                                  PDG metadata file by which the serotypes
                                  are collected. Default: false

--wcomp_seronamecol             : Column number (non 0-based index) of the
                                  PDG metadata file whose column name is "
                                  serovar". Default: false

--wcomp_acc_col                 : Column number (non 0-based index) of the
                                  PDG metadata file whose column name is "acc
                                  ". Default: false

--wcomp_target_acc_col          : Column number (non 0-based index) of the
                                  PDG metadata file whose column name is "
                                  target_acc". Default: false

--wcomp_complete_sero           : Skip indexing serotypes when the serotype
                                  name in the column number 49 (non 0-based)
                                  of PDG metadata file consists a "-". For
                                  example, if an accession has a serotype=
                                  string as such in column number 49 (non 0-
                                  based): "serotype=- 13:z4,z23:-" then, the
                                  indexing of that accession is skipped.
                                  Default: false

--wcomp_not_null_serovar        : Only index the computed_serotype column i.e
                                  . column number 49 (non 0-based), if the
                                  serovar column is not NULL. Default: false

--wcomp_i                       : Force include this serovar. Ignores --
                                  wcomp_complete_sero for only this serovar.
                                  Mention multiple serovars separated by a
                                  ! (Exclamation mark). Ex: --
                                  wcomp_complete_sero I 4,[5],12:i:-!Agona
                                  Default: false

--wcomp_num                     : Number of genome accessions to be collected
                                  per serotype. Default: false

--wcomp_min_contig_size         : Minimum contig size to consider a genome
                                  for indexing. Default: false

--wsnp_serocol                  : Column number (non 0-based index) of the
                                  PDG metadata file by which the serotypes
                                  are collected. Default: false

--wsnp_seronamecol              : Column number (non 0-based index) of the
                                  PDG metadata file whose column name is "
                                  serovar". Default: false

--wsnp_acc_col                  : Column number (non 0-based index) of the
                                  PDG metadata file whose column name is "acc
                                  ". Default: false

--wsnp_target_acc_col           : Column number (non 0-based index) of the
                                  PDG metadata file whose column name is "
                                  target_acc". Default: false

--wsnp_complete_sero            : Skip indexing serotypes when the serotype
                                  name in the column number 49 (non 0-based)
                                  of PDG metadata file consists a "-". For
                                  example, if an accession has a serotype=
                                  string as such in column number 49 (non 0-
                                  based): "serotype=- 13:z4,z23:-" then, the
                                  indexing of that accession is skipped.
                                  Default: true

--wsnp_not_null_serovar         : Only index the computed_serotype column i.e
                                  . column number 49 (non 0-based), if the
                                  serovar column is not NULL. Default: false

--wsnp_i                        : Force include this serovar. Ignores --
                                  wsnp_complete_sero for only this serovar.
                                  Mention multiple serovars separated by a
                                  ! (Exclamation mark). Ex: --
                                  wsnp_complete_sero I 4,[5],12:i:-!Agona
                                  Default: 'I 4,[5],12:i

--wsnp_num                      : Number of genome accessions to collect per
                                  SNP cluster. Default: false

--mashsketch_run                : Run `mash screen` tool. Default: true

--mashsketch_l                  : List input. Lines in each <input> specify
                                  paths to sequence files, one per line.
                                  Default: true

--mashsketch_I                  : <path> ID field for sketch of reads (
                                  instead of first sequence ID). Default:
                                  false

--mashsketch_C                  : <path> Comment for a sketch of reads (
                                  instead of first sequence comment). Default
                                  : false

--mashsketch_k                  : <int> K-mer size. Hashes will be based on
                                  strings of this many nucleotides.
                                  Canonical nucleotides are used by default (
                                  see Alphabet options below). (1-32) Default
                                  : 21

--mashsketch_s                  : <int> Sketch size. Each sketch will have
                                  at most this many non-redundant min-hashes
                                  . Default: 1000

--mashsketch_i                  : Sketch individual sequences, rather than
                                  whole files, e.g. for multi-fastas of
                                  single-chromosome genomes or pair-wise gene
                                  comparisons. Default: false

--mashsketch_S                  : <int> Seed to provide to the hash
                                  function. (0-4294967296) [42] Default:
                                  false

--mashsketch_w                  : <num> Probability threshold for warning
                                  about low k-mer size. (0-1) Default: false

--mashsketch_r                  : Input is a read set. See Reads options
                                  below. Incompatible with --mashsketch_i.
                                  Default: false

--mashsketch_b                  : <size> Use a Bloom filter of this size (
                                  raw bytes or with K/M/G/T) to filter out
                                  unique k-mers. This is useful if exact
                                  filtering with --mashsketch_m uses too much
                                  memory. However, some unique k-mers may
                                  pass erroneously, and copies cannot be
                                  counted beyond 2. Implies --mashsketch_r.
                                  Default: false

--mashsketch_m                  : <int> Minimum copies of each k-mer
                                  required to pass noise filter for reads.
                                  Implies --mashsketch_r. Default: false

--mashsketch_c                  : <num> Target coverage. Sketching will
                                  conclude if this coverage is reached before
                                  the end of the input file (estimated by
                                  average k-mer multiplicity). Implies --
                                  mashsketch_r. Default: false

--mashsketch_g                  : <size> Genome size (raw bases or with K/M/
                                  G/T). If specified, will be used for p-
                                  value calculation instead of an estimated
                                  size from k-mer content. Implies --
                                  mashsketch_r. Default: false

--mashsketch_n                  : Preserve strand (by default, strand is
                                  ignored by using canonical DNA k-mers,
                                  which are alphabetical minima of forward-
                                  reverse pairs). Implied if an alphabet is
                                  specified with --mashsketch_a or --
                                  mashsketch_z. Default: false

--mashsketch_a                  : Use amino acid alphabet (A-Z, except BJOUXZ
                                  ). Implies --mashsketch_n --mashsketch_k 9
                                  . Default: false

--mashsketch_z                  : <text> Alphabet to base hashes on (case
                                  ignored by default; see --mashsketch_Z). K-
                                  mers with other characters will be ignored
                                  . Implies --mashsketch_n. Default: false

--mashsketch_Z                  : Preserve case in k-mers and alphabet (case
                                  is ignored by default). Sequence letters
                                  whose case is not in the current alphabet
                                  will be skipped when sketching. Default:
                                  false

Help options                    :

--help                          : Display this message.

```
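
Values for `--wsnp_i` and `--wcomp_i` can contain characters that the shell treats specially: `!` triggers history expansion in interactive bash, and `[5]` is a glob pattern. Single quotes keep the value literal. The sketch below uses only standard shell features; it stores the example value from the help text and splits it on `!` to show the individual serovars. `serovars` and `split` are just illustrative variable names.

```bash
#!/usr/bin/env bash
# Single quotes prevent history expansion of "!" and globbing of "[5]".
serovars='I 4,[5],12:i:-!Agona'

# Split on "!" to list the serovars that would be force-included.
split=$(printf '%s\n' "$serovars" | tr '!' '\n')
printf '%s\n' "$split"

# When passing the value on, keep the variable double-quoted ("$serovars")
# so that it reaches the program as a single argument.
```

Without the quotes, the space inside `I 4,[5],12:i:-` would also split the value into two separate arguments.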