'\" t
.\" Title: mash-sketch
.\" Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.10
.\" Date: 2019-12-13
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MASH\-SKETCH" "1" "2019-12-13" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
. mso www.tmac
. am URL
. ad l
. .
. am MTO
. ad l
. .
. LINKSTYLE blue R < >
.\}
.SH "NAME"
mash\-sketch \- create sketches (reduced representations for fast operations)
.SH "SYNOPSIS"
.sp
\fBmash sketch\fP [options] fast(a|q)[.gz] ...
.SH "DESCRIPTION"
.sp
Create a sketch file, which is a reduced representation of a sequence or set
of sequences (based on min\-hashes) that can be used for fast distance
estimations. Input can be FASTA or FASTQ files (gzipped or not), and "\-" can
be given to read from standard input. Input files can also be files of file
names (see \fB\-l\fP). One sketch file is generated as output, but it can
contain multiple sketches, one per sequence or per file (see \fB\-i\fP). By
default, the output file name is the name of the first input file with a
\(aq.msh\(aq extension, or \(aqstdin.msh\(aq if standard input is used (see \fB\-o\fP).
.SH "OPTIONS"
.sp
\fB\-h\fP
.RS 4
Help
.RE
.sp
\fB\-p\fP <int>
.RS 4
Parallelism. This many threads will be spawned for processing. [1]
.RE
.SS "Input"
.sp
\fB\-l\fP
.RS 4
List input. Each file contains a list of sequence files, one per line.
.RE
.SS "Output"
.sp
\fB\-o\fP <path>
.RS 4
Output prefix (first input file used if unspecified). The suffix
\(aq.msh\(aq will be appended.
.RE
.SS "Sketching"
.sp
\fB\-k\fP <int>
.RS 4
K\-mer size. Hashes will be based on strings of this many
nucleotides. Canonical nucleotides are used by default (see
"Sketching (alphabet)" below). (1\-32) [21]
.RE
.sp
\fB\-s\fP <int>
.RS 4
Sketch size. Each sketch will have at most this many non\-redundant
min\-hashes. [1000]
.RE
.sp
\fB\-i\fP
.RS 4
Sketch individual sequences, rather than whole files.
.RE
.sp
\fB\-w\fP <num>
.RS 4
Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
.RE
.sp
\fB\-r\fP
.RS 4
Input is a read set. See "Sketching (reads)" below. Incompatible with \fB\-i\fP.
.RE
.SS "Sketching (reads)"
.sp
\fB\-b\fP <size>
.RS 4
Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
uses too much memory. However, some unique k\-mers may pass
erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
.RE
.sp
\fB\-m\fP <int>
.RS 4
Minimum copies of each k\-mer required to pass noise filter for
reads. Implies \fB\-r\fP. [1]
.RE
.sp
\fB\-c\fP <num>
.RS 4
Target coverage. Sketching will conclude if this coverage is
reached before the end of the input file (estimated by average
k\-mer multiplicity). Implies \fB\-r\fP.
.RE
.sp
\fB\-g\fP <size>
.RS 4
Genome size. If specified, will be used for p\-value calculation
instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
.RE
.SS "Sketching (alphabet)"
.sp
\fB\-n\fP
.RS 4
Preserve strand (by default, strand is ignored by using canonical
DNA k\-mers, which are alphabetical minima of forward\-reverse
pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
.RE
.sp
\fB\-a\fP
.RS 4
Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
.RE
.sp
\fB\-z\fP <text>
.RS 4
Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
K\-mers with other characters will be ignored. Implies \fB\-n\fP.
.RE
.sp
\fB\-Z\fP
.RS 4
Preserve case in k\-mers and alphabet (case is ignored by default).
Sequence letters whose case is not in the current alphabet will be
skipped when sketching.
.RE
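.SH "EXAMPLES"
.sp
The invocations below are illustrative sketches that combine only the options
documented on this page. All file names (genome1.fna, reads.fastq.gz, and so on)
are hypothetical placeholders, and the option values are arbitrary choices, not
recommendations.
.sp
Sketch two genomes into a single file, reference.msh, with one sketch per input file:
.sp
.if n .RS 4
.nf
mash sketch \-o reference genome1.fna genome2.fna
.fi
.if n .RE
.sp
Sketch every sequence of an assembly separately, using 4 threads, 21\-mers, and
up to 10000 min\-hashes per sketch:
.sp
.if n .RS 4
.nf
mash sketch \-p 4 \-k 21 \-s 10000 \-i \-o contigs assembly.fna
.fi
.if n .RE
.sp
Sketch a read set, keeping only k\-mers seen at least twice (\fB\-m\fP implies \fB\-r\fP):
.sp
.if n .RS 4
.nf
mash sketch \-m 2 \-o reads reads.fastq.gz
.fi
.if n .RE
.sp
Sketch all sequence files named, one per line, in a list file:
.sp
.if n .RS 4
.nf
mash sketch \-l \-o collection genomes.txt
.fi
.if n .RE
.sp
Read from standard input. With no \fB\-o\fP, the output is written to stdin.msh:
.sp
.if n .RS 4
.nf
cat part1.fna part2.fna | mash sketch \-
.fi
.if n .RE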
.SH "SEE ALSO"
.sp
mash(1)