comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/mash-sketch.1 @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 16:23:26 -0400
67:0e9998148a16 68:5028fdace37b
'\" t
.\" Title: mash-sketch
.\" Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.10
.\" Date: 2019-12-13
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MASH\-SKETCH" "1" "2019-12-13" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
. mso www.tmac
. am URL
. ad l
. .
. am MTO
. ad l
. .
. LINKSTYLE blue R < >
.\}
.SH "NAME"
mash\-sketch \- create sketches (reduced representations for fast operations)
.SH "SYNOPSIS"
.sp
\fBmash sketch\fP [options] fast(a|q)[.gz] ...
.SH "DESCRIPTION"
.sp
Create a sketch file, which is a reduced representation of a sequence or set
of sequences (based on min\-hashes) that can be used for fast distance
estimations. Input can be fasta or fastq files (gzipped or not), and "\-" can
be given to read from standard input. Input files can also be files of file
names (see \fB\-l\fP). For output, one sketch file will be generated, but it can have
multiple sketches within it, divided by sequences or files (see \fB\-i\fP). By
default, the output file name will be the first input file with a \(aq.msh\(aq
extension, or \(aqstdin.msh\(aq if standard input is used (see \fB\-o\fP).
.SH "OPTIONS"
.sp
\fB\-h\fP
.RS 4
Help
.RE
.sp
\fB\-p\fP <int>
.RS 4
Parallelism. This many threads will be spawned for processing. [1]
.RE
.SS "Input"
.sp
\fB\-l\fP
.RS 4
List input. Each input file contains a list of sequence files, one per line.
.RE
.SS "Output"
.sp
\fB\-o\fP <path>
.RS 4
Output prefix (first input file used if unspecified). The suffix
\(aq.msh\(aq will be appended.
.RE
.SS "Sketching"
.sp
\fB\-k\fP <int>
.RS 4
K\-mer size. Hashes will be based on strings of this many
nucleotides. Canonical nucleotides are used by default (see
the Sketching (alphabet) options below). (1\-32) [21]
.RE
.sp
\fB\-s\fP <int>
.RS 4
Sketch size. Each sketch will have at most this many non\-redundant
min\-hashes. [1000]
.RE
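.sp
As a rough illustration of how \fB\-k\fP and \fB\-s\fP interact, the following Python sketch keeps the s smallest distinct k\-mer hashes of a sequence. It is a minimal sketch under stated assumptions, not the tool's implementation: the function names are hypothetical, SHA\-1 stands in for the hash function (this page does not say which one is used), and strand is preserved for brevity, as with \fB\-n\fP.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping substring of length k, upper-cased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Stand-in 64-bit hash: the first 8 bytes of SHA-1 (an assumption)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_sketch(seq, k=21, s=1000):
    """Return at most s distinct hashes, smallest first (cf. -k and -s)."""
    distinct = {hash64(km) for km in kmers(seq, k)}  # set removes duplicates
    return sorted(distinct)[:s]
```

A sequence with fewer than s distinct k\-mers yields a shorter sketch, which matches the "at most this many" wording above.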
.sp
\fB\-i\fP
.RS 4
Sketch individual sequences, rather than whole files.
.RE
.sp
\fB\-w\fP <num>
.RS 4
Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
.RE
.sp
\fB\-r\fP
.RS 4
Input is a read set. See the Sketching (reads) options below. Incompatible with \fB\-i\fP.
.RE
.SS "Sketching (reads)"
.sp
\fB\-b\fP <size>
.RS 4
Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
uses too much memory. However, some unique k\-mers may pass
erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
.RE
.sp
\fB\-m\fP <int>
.RS 4
Minimum copies of each k\-mer required to pass noise filter for
reads. Implies \fB\-r\fP. [1]
.RE
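.sp
The exact noise filter described for \fB\-m\fP can be pictured as a count threshold over all k\-mers in a read set. The Python below is a minimal sketch of that idea only; the function name and the in\-memory counting are assumptions for illustration and say nothing about how the tool stores its counts.

```python
from collections import Counter


def kmers_passing_filter(reads, k, min_copies=1):
    """Return the k-mers seen at least min_copies times across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # k-mers below the threshold are treated as noise and dropped
    return {kmer for kmer, n in counts.items() if n >= min_copies}
```

With the default of 1, every observed k\-mer passes; raising the threshold to 2 drops k\-mers that occur only once, which is the case the Bloom filter of \fB\-b\fP approximates.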
.sp
\fB\-c\fP <num>
.RS 4
Target coverage. Sketching will conclude if this coverage is
reached before the end of the input file (estimated by average
k\-mer multiplicity). Implies \fB\-r\fP.
.RE
.sp
\fB\-g\fP <size>
.RS 4
Genome size. If specified, will be used for p\-value calculation
instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
.RE
.SS "Sketching (alphabet)"
.sp
\fB\-n\fP
.RS 4
Preserve strand (by default, strand is ignored by using canonical
DNA k\-mers, which are alphabetical minima of forward\-reverse
pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
.RE
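.sp
The "alphabetical minima of forward\-reverse pairs" wording can be made concrete with a few lines of Python. This is a minimal sketch assuming upper\-case input over A, C, G and T only; the function names are hypothetical.

```python
# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the alphabetically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))
```

Because a k\-mer and its reverse complement map to the same canonical form, reads from either strand contribute the same hashes unless \fB\-n\fP is given.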
.sp
\fB\-a\fP
.RS 4
Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
.RE
.sp
\fB\-z\fP <text>
.RS 4
Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
K\-mers with other characters will be ignored. Implies \fB\-n\fP.
.RE
.sp
\fB\-Z\fP
.RS 4
Preserve case in k\-mers and alphabet (case is ignored by default).
Sequence letters whose case is not in the current alphabet will be
skipped when sketching.
.RE
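.sp
One way to read the interaction of \fB\-z\fP and \fB\-Z\fP is as a filter on k\-mers before hashing. The Python below is a minimal sketch of that reading; the function name is hypothetical, and dropping any k\-mer that contains a disallowed letter is an assumption about what "ignored" and "skipped" mean here.

```python
def kmers_in_alphabet(seq, k, alphabet, preserve_case=False):
    """List the k-mers of seq made up only of letters from alphabet.

    With preserve_case=False (the default, as without -Z), both the
    sequence and the alphabet are upper-cased before comparison.
    """
    if not preserve_case:
        seq = seq.upper()
        alphabet = alphabet.upper()
    allowed = set(alphabet)
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed  # every letter must be allowed
    ]
```

Under this reading, a lower\-case sequence checked against an upper\-case alphabet yields k\-mers by default but none when case is preserved.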
.SH "SEE ALSO"
.sp
mash(1)