jpayne@68
'\" t
.\" Title: mash-triangle
.\" Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.10
.\" Date: 2019-12-13
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MASH\-TRIANGLE" "1" "2019-12-13" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
. mso www.tmac
. am URL
. ad l
. .
. am MTO
. ad l
. .
. LINKSTYLE blue R < >
.\}
.SH "NAME"
mash\-triangle \- estimate a lower\-triangular distance matrix
.SH "SYNOPSIS"
.sp
\fBmash triangle\fP [options] <seq1> [<seq2>] ...
.SH "DESCRIPTION"
.sp
Estimate the distance of each input sequence to every other input
sequence. Outputs a lower\-triangular distance matrix in relaxed Phylip
format. The input sequences can be fasta or fastq, gzipped or not, or
Mash sketch files (.msh) with matching k\-mer sizes. Input files can also
be files of file names (see \-l). If more than one input file is provided,
whole files are compared by default (see \-i).
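The lower\-triangular layout described above can be read back into a full symmetric matrix. The following is a minimal sketch, not part of Mash. It assumes the common relaxed Phylip layout: a first line holding the number of entries, then one row per entry consisting of a name followed by the distances to all earlier entries, with no diagonal value and no whitespace inside names. The function name and the sample values are made up for illustration.

```python
def parse_lower_triangle(text):
    """Parse an assumed lower-triangular relaxed Phylip matrix.

    Returns (names, dist) where dist is a full symmetric list of lists.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])          # first line: number of entries
    names = []
    dist = [[0.0] * n for _ in range(n)]  # diagonal stays at 0.0
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split()             # assumes names contain no whitespace
        names.append(fields[0])
        values = [float(v) for v in fields[1:]]
        if len(values) != i:
            raise ValueError(f"row {i} has {len(values)} values, expected {i}")
        for j, d in enumerate(values):
            dist[i][j] = dist[j][i] = d   # mirror into the upper triangle
    return names, dist


# Illustrative input only; these are not real Mash results.
example = """3
genomeA.fna
genomeB.fna 0.02
genomeC.fna 0.15 0.14
"""
names, dist = parse_lower_triangle(example)
print(names)       # ['genomeA.fna', 'genomeB.fna', 'genomeC.fna']
print(dist[0][2])  # 0.15
```

If the real output differs from these assumptions (for example, names containing spaces), split rows on tabs instead of arbitrary whitespace.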
.SH "OPTIONS"
.sp
\fB\-h\fP
.RS 4
Help
.RE
.sp
\fB\-p\fP <int>
.RS 4
Parallelism. This many threads will be spawned for processing. [1]
.RE
.SS "Input"
.sp
\fB\-l\fP
.RS 4
List input. Each input file contains a list of sequence files, one
per line.
.RE
.SS "Output"
.sp
\fB\-C\fP
.RS 4
Use comment fields for sequence names instead of IDs.
.RE
.sp
\fB\-E\fP
.RS 4
Output edge list instead of Phylip matrix, with fields [seq1, seq2,
dist, p\-val, shared\-hashes].
.RE
.sp
\fB\-v\fP <num>
.RS 4
Maximum p\-value to report in edge list. Implies \-E. (0\-1) [1.0]
.RE
.sp
\fB\-d\fP <num>
.RS 4
Maximum distance to report in edge list. Implies \-E. (0\-1) [1.0]
.RE
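The edge list fields listed above can also be filtered after the fact. The following is a minimal sketch of such post\-processing, not Mash code. It assumes the five fields are tab\-separated in the order given, treats both thresholds as inclusive, and leaves the shared\-hashes field as an uninterpreted string. The function name and sample rows are hypothetical.

```python
import csv
import io


def filter_edges(text, max_dist=1.0, max_pvalue=1.0):
    """Keep edges whose distance and p-value do not exceed the given maxima.

    Assumes tab-separated rows: seq1, seq2, dist, p-val, shared-hashes.
    """
    kept = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed rows
        seq1, seq2, dist, pvalue, shared = row[:5]
        dist, pvalue = float(dist), float(pvalue)
        if dist <= max_dist and pvalue <= max_pvalue:
            kept.append((seq1, seq2, dist, pvalue, shared))
    return kept


# Illustrative rows only; these are not real Mash results.
sample = "\n".join([
    "\t".join(["a.fna", "b.fna", "0.02", "1e-50", "450/1000"]),
    "\t".join(["a.fna", "c.fna", "0.30", "0.2", "3/1000"]),
])
close_pairs = filter_edges(sample, max_dist=0.1, max_pvalue=0.01)
print(close_pairs)
```

Passing \-d and \-v on the command line avoids writing the unwanted edges in the first place; filtering afterwards is mainly useful when several thresholds are to be tried on one run.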
.SS "Sketching"
.sp
\fB\-k\fP <int>
.RS 4
K\-mer size. Hashes will be based on strings of this many
nucleotides. Canonical nucleotides are used by default (see
Alphabet options below). (1\-32) [21]
.RE
.sp
\fB\-s\fP <int>
.RS 4
Sketch size. Each sketch will have at most this many non\-redundant
min\-hashes. [1000]
.RE
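To illustrate how the k\-mer size and sketch size interact, the following is a minimal sketch of a bottom\-s min\-hash: every k\-mer is hashed and only the s smallest distinct hash values are kept. It is an illustration under stated assumptions, not the Mash implementation. The hash function (BLAKE2b truncated to 64 bits) is a stand\-in chosen because it ships with Python, strand handling is omitted, and all function names are hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, not the one Mash uses)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k=21, s=1000):
    """Return at most s of the smallest distinct k-mer hashes, in ascending order."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}  # set removes duplicates
    return sorted(hashes)[:s]


# A short toy sequence; real inputs would be whole genomes or read sets.
sketch = bottom_sketch("ACGTACGTAC", k=4, s=3)
print(len(sketch))  # 3
```

A sequence with fewer than s distinct k\-mers yields a shorter sketch, which matches the "at most" wording of \-s.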
.sp
\fB\-i\fP
.RS 4
Sketch individual sequences, rather than whole files, e.g. for
multi\-fastas of single\-chromosome genomes or pair\-wise gene comparisons.
.RE
.sp
\fB\-w\fP <num>
.RS 4
Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
.RE
.sp
\fB\-r\fP
.RS 4
Input is a read set. See Reads options below. Incompatible with \fB\-i\fP.
.RE
.SS "Sketching (reads)"
.sp
\fB\-b\fP <size>
.RS 4
Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
uses too much memory. However, some unique k\-mers may pass
erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
.RE
.sp
\fB\-m\fP <int>
.RS 4
Minimum copies of each k\-mer required to pass noise filter for
reads. Implies \fB\-r\fP. [1]
.RE
.sp
\fB\-c\fP <num>
.RS 4
Target coverage. Sketching will conclude if this coverage is
reached before the end of the input file (estimated by average
k\-mer multiplicity). Implies \fB\-r\fP.
.RE
.sp
\fB\-g\fP <size>
.RS 4
Genome size. If specified, will be used for p\-value calculation
instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
.RE
.SS "Sketching (alphabet)"
.sp
\fB\-n\fP
.RS 4
Preserve strand (by default, strand is ignored by using canonical
DNA k\-mers, which are alphabetical minima of forward\-reverse
pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
.RE
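The canonical form mentioned under \-n can be sketched as follows: a k\-mer and its reverse complement are compared as strings and the alphabetically smaller one is kept, so both strands map to the same k\-mer. This is a minimal illustration assuming uppercase A, C, G and T only. The function names are hypothetical and not taken from Mash.

```python
# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the alphabetical minimum of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


print(canonical("TTTG"))  # CAAA
print(canonical("CAAA"))  # CAAA
print(canonical("ACGT"))  # ACGT (its own reverse complement)
```

With \-n, this step would be skipped and each k\-mer would be used exactly as read.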
.sp
\fB\-a\fP
.RS 4
Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
.RE
.sp
\fB\-z\fP <text>
.RS 4
Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
K\-mers with other characters will be ignored. Implies \fB\-n\fP.
.RE
.sp
\fB\-Z\fP
.RS 4
Preserve case in k\-mers and alphabet (case is ignored by default).
Sequence letters whose case is not in the current alphabet will be
skipped when sketching.
.RE
.SH "SEE ALSO"
.sp
mash(1)