'\" t
.\" Title: mash-dist
.\" Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.10
.\" Date: 2019-12-13
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MASH\-DIST" "1" "2019-12-13" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
. mso www.tmac
. am URL
. ad l
. .
. am MTO
. ad l
. .
. LINKSTYLE blue R < >
.\}
.SH "NAME"
mash\-dist \- estimate the distance of query sequences to references
.SH "SYNOPSIS"
.sp
\fBmash dist\fP [options] <reference> <query> [<query>] ...
.SH "DESCRIPTION"
.sp
Estimate the distance of each query sequence to the reference. Both the
reference and queries can be FASTA or FASTQ, gzipped or not, or Mash sketch
files (.msh) with matching k\-mer sizes. Query files can also be lists of file
names (see \fB\-l\fP). Whole files are compared by default (see \fB\-i\fP). The output
fields are [reference\-ID, query\-ID, distance, p\-value, shared\-hashes].
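.sp
As a minimal sketch of how these fields might be consumed downstream, the
Python snippet below parses one output line. It assumes tab\-separated
fields and a shared\-hashes value written as two integers joined by a
slash; the sample line, the file names and all identifiers are hypothetical.
.sp
.if n .RS 4
.nf
```python
from typing import NamedTuple

TAB = chr(9)  # assumed field separator


class DistRecord(NamedTuple):
    reference_id: str
    query_id: str
    distance: float
    p_value: float
    shared: int  # hashes common to both sketches
    total: int   # hashes compared


def parse_dist_line(line: str) -> DistRecord:
    # Split the five fields, then split shared-hashes on the slash (assumed format).
    ref, query, dist, pval, hashes = line.strip().split(TAB)
    shared, total = hashes.split("/")
    return DistRecord(ref, query, float(dist), float(pval), int(shared), int(total))


# Made-up sample line, for illustration only.
sample = TAB.join(["ref.fna", "query.fna", "0.00124", "0", "950/1000"])
record = parse_dist_line(sample)
```
.fi
.if n .RE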
.SH "OPTIONS"
.sp
\fB\-h\fP
.RS 4
Help
.RE
.sp
\fB\-p\fP <int>
.RS 4
Parallelism. This many threads will be spawned for processing. [1]
.RE
.SS "Input"
.sp
\fB\-l\fP
.RS 4
List input. Each query file contains a list of sequence files, one
per line. The reference file is not affected.
.RE
.SS "Output"
.sp
\fB\-t\fP
.RS 4
Table output. P\-values are not reported, but fields are left blank if
they do not meet the p\-value threshold.
.RE
.sp
\fB\-v\fP <num>
.RS 4
Maximum p\-value to report. (0\-1) [1.0]
.RE
.sp
\fB\-d\fP <num>
.RS 4
Maximum distance to report. (0\-1) [1.0]
.RE
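.sp
To illustrate what a distance threshold such as \fB\-d\fP acts on, the
following Python sketch computes the Mash distance from a Jaccard estimate,
using the formula given in the Mash publication. Treating the Jaccard
estimate as shared hashes divided by hashes compared, and the exact handling
of the edge cases, are assumptions here; the function names are hypothetical.
.sp
.if n .RS 4
.nf
```python
import math


def mash_distance(jaccard: float, k: int = 21) -> float:
    # Mash distance: -(1/k) * ln(2j / (1 + j)), kept within the range 0-1.
    if jaccard <= 0.0:
        return 1.0  # assumed: no shared hashes maps to the maximum distance
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, distance))


def passes_threshold(distance: float, max_distance: float = 1.0) -> bool:
    # Mirrors the idea of a maximum distance to report.
    return distance <= max_distance


d = mash_distance(950 / 1000, k=21)
```
.fi
.if n .RE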
.SS "Sketching"
.sp
\fB\-k\fP <int>
.RS 4
K\-mer size. Hashes will be based on strings of this many
nucleotides. Canonical nucleotides are used by default (see
Alphabet options below). (1\-32) [21]
.RE
.sp
\fB\-s\fP <int>
.RS 4
Sketch size. Each sketch will have at most this many non\-redundant
min\-hashes. [1000]
.RE
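.sp
The interplay of k\-mer size and sketch size can be sketched in Python as a
bottom\-s MinHash: hash every k\-mer, drop duplicates, and keep the s smallest
values. This is an illustration only; the blake2b hash is a stand\-in chosen
to stay within the standard library (Mash uses MurmurHash3), canonical
k\-mers are not applied, and all names are hypothetical.
.sp
.if n .RS 4
.nf
```python
import hashlib


def kmers(seq: str, k: int):
    # Yield every overlapping substring of length k.
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer: str) -> int:
    # Stand-in 64-bit hash; not the hash function Mash uses.
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq: str, k: int = 21, s: int = 1000) -> list:
    # The set removes redundant hashes; sorting and slicing keeps the s smallest.
    hashes = {hash_kmer(kmer) for kmer in kmers(seq, k)}
    return sorted(hashes)[:s]


small = bottom_sketch("ACGTACGTACGTTGCA", k=4, s=5)
full = bottom_sketch("ACGTACGTACGTTGCA", k=4, s=1000)
```
.fi
.if n .RE
.sp
A sequence with fewer distinct k\-mers than s yields a shorter sketch, which
is why the option is described as an upper bound.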
.sp
\fB\-i\fP
.RS 4
Sketch individual sequences, rather than whole files.
.RE
.sp
\fB\-w\fP <num>
.RS 4
Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
.RE
.sp
\fB\-r\fP
.RS 4
Input is a read set. See Reads options below. Incompatible with \fB\-i\fP.
.RE
.SS "Sketching (reads)"
.sp
\fB\-b\fP <size>
.RS 4
Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
uses too much memory. However, some unique k\-mers may pass
erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
.RE
.sp
\fB\-m\fP <int>
.RS 4
Minimum copies of each k\-mer required to pass the noise filter for
reads. Implies \fB\-r\fP. [1]
.RE
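.sp
The exact noise filter behind \fB\-m\fP can be pictured as counting k\-mer
copies across all reads and keeping only those seen often enough. The Python
sketch below shows that idea under simplifying assumptions: it counts plain
(not canonical) k\-mers, holds every count in memory, and uses hypothetical
names and made\-up reads.
.sp
.if n .RS 4
.nf
```python
from collections import Counter


def filter_kmers(reads, k: int, min_copies: int = 1) -> set:
    # Count every k-mer across all reads.
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Keep k-mers seen at least min_copies times; rarer ones are treated as noise.
    return {kmer for kmer, n in counts.items() if n >= min_copies}


reads = ["ACGTA", "ACGTT"]  # made-up reads
kept = filter_kmers(reads, k=4, min_copies=2)
```
.fi
.if n .RE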
.sp
\fB\-c\fP <num>
.RS 4
Target coverage. Sketching will conclude if this coverage is
reached before the end of the input file (estimated by average
k\-mer multiplicity). Implies \fB\-r\fP.
.RE
.sp
\fB\-g\fP <size>
.RS 4
Genome size. If specified, will be used for p\-value calculation
instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
.RE
.SS "Sketching (alphabet)"
.sp
\fB\-n\fP
.RS 4
Preserve strand (by default, strand is ignored by using canonical
DNA k\-mers, which are alphabetical minima of forward\-reverse
pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
.RE
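.sp
The notion of a canonical DNA k\-mer described for \fB\-n\fP can be shown in
a few lines of Python: take the k\-mer and its reverse complement and keep
whichever sorts first. The helper names are hypothetical, and the sketch
assumes uppercase input restricted to A, C, G and T.
.sp
.if n .RS 4
.nf
```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    # Complement each base, then reverse the string.
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    # Alphabetical minimum of the forward and reverse-complement forms.
    return min(kmer, reverse_complement(kmer))


forward = canonical("TTGA")
reverse = canonical("TCAA")
```
.fi
.if n .RE
.sp
Both strands of the same locus map to the same canonical k\-mer, so the
sequencing strand does not affect the hashes.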
.sp
\fB\-a\fP
.RS 4
Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
.RE
.sp
\fB\-z\fP <text>
.RS 4
Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
K\-mers with other characters will be ignored. Implies \fB\-n\fP.
.RE
.sp
\fB\-Z\fP
.RS 4
Preserve case in k\-mers and alphabet (case is ignored by default).
Sequence letters whose case is not in the current alphabet will be
skipped when sketching.
.RE
.SH "SEE ALSO"
.sp
mash(1)