comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/mash-triangle.1 @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 16:23:26 -0400
67:0e9998148a16 68:5028fdace37b
1 '\" t
2 .\" Title: mash-triangle
3 .\" Author: [see the "AUTHOR(S)" section]
4 .\" Generator: Asciidoctor 2.0.10
5 .\" Date: 2019-12-13
6 .\" Manual: \ \&
7 .\" Source: \ \&
8 .\" Language: English
9 .\"
10 .TH "MASH\-TRIANGLE" "1" "2019-12-13" "\ \&" "\ \&"
11 .ie \n(.g .ds Aq \(aq
12 .el .ds Aq '
13 .ss \n[.ss] 0
14 .nh
15 .ad l
16 .de URL
17 \fI\\$2\fP <\\$1>\\$3
18 ..
19 .als MTO URL
20 .if \n[.g] \{\
21 . mso www.tmac
22 . am URL
23 . ad l
24 . .
25 . am MTO
26 . ad l
27 . .
28 . LINKSTYLE blue R < >
29 .\}
30 .SH "NAME"
31 mash\-triangle \- estimate a lower\-triangular distance matrix
32 .SH "SYNOPSIS"
33 .sp
34 \fBmash triangle\fP [options] <seq1> [<seq2>] ...
35 .SH "DESCRIPTION"
36 .sp
37 Estimate the distance of each input sequence to every other input
38 sequence. Outputs a lower\-triangular distance matrix in relaxed Phylip
39 format. The input sequences can be fasta or fastq, gzipped or not, or
40 Mash sketch files (.msh) with matching k\-mer sizes. Input files can also
41 be files of file names (see \-l). If more than one input file is provided,
42 whole files are compared by default (see \-i).
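As an illustration of consuming the output described above, the following Python sketch reads a lower-triangular matrix and expands it into a full symmetric one. It assumes a layout in which the first line holds the sequence count and each later line holds a name followed by tab-separated distances to all earlier sequences; check this against real output before relying on it. The function name parse_lower_triangle is hypothetical and not part of Mash.

```python
def parse_lower_triangle(text):
    """Expand a lower-triangular distance matrix into (names, full matrix).

    Assumed layout (not confirmed by the man page): first line is the
    sequence count; row i holds a name plus i tab-separated distances.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].strip())
    names = []
    dist = [[0.0] * n for _ in range(n)]
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split("\t")
        names.append(fields[0])
        values = [float(x) for x in fields[1:] if x]
        if len(values) != i:
            raise ValueError(f"row {i} has {len(values)} distances, expected {i}")
        for j, d in enumerate(values):
            # Mirror each value so the matrix is symmetric.
            dist[i][j] = d
            dist[j][i] = d
    return names, dist
```

The diagonal is left at 0.0, since a sequence is at distance zero from itself.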
43 .SH "OPTIONS"
44 .sp
45 \fB\-h\fP
46 .RS 4
47 Help
48 .RE
49 .sp
50 \fB\-p\fP <int>
51 .RS 4
52 Parallelism. This many threads will be spawned for processing. [1]
53 .RE
54 .SS "Input"
55 .sp
56 \fB\-l\fP
57 .RS 4
58 List input. Each input file contains a list of sequence files, one
59 per line.
60 .RE
61 .SS "Output"
62 .sp
63 \fB\-C\fP
64 .RS 4
65 Use comment fields for sequence names instead of IDs.
66 .RE
67 .sp
68 \fB\-E\fP
69 .RS 4
70 Output edge list instead of Phylip matrix, with fields [seq1, seq2,
71 dist, p\-val, shared\-hashes].
72 .RE
73 .sp
74 \fB\-v\fP <num>
75 .RS 4
76 Maximum p\-value to report in edge list. Implies \-E. (0\-1) [1.0]
77 .RE
78 .sp
79 \fB\-d\fP <num>
80 .RS 4
81 Maximum distance to report in edge list. Implies \-E. (0\-1) [1.0]
82 .RE
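The edge list produced with -E can be post-processed in a few lines. The Python sketch below parses the five fields named above and applies the same kind of thresholds as -d and -v after the fact. It assumes the fields are tab-separated and that shared-hashes is written as "shared/total"; both are assumptions, not statements from this page. Edge, parse_edges and filter_edges are hypothetical names.

```python
from typing import Iterator, List, NamedTuple


class Edge(NamedTuple):
    seq1: str
    seq2: str
    dist: float
    p_value: float
    shared: int   # numerator of the shared-hashes field (assumed "x/y")
    total: int    # denominator of the shared-hashes field


def parse_edges(text: str) -> Iterator[Edge]:
    """Yield one Edge per non-empty line of an edge list."""
    for line in text.splitlines():
        if not line.strip():
            continue
        seq1, seq2, dist, pval, hashes = line.split("\t")
        shared, total = hashes.split("/")
        yield Edge(seq1, seq2, float(dist), float(pval), int(shared), int(total))


def filter_edges(edges, max_dist: float = 1.0, max_p: float = 1.0) -> List[Edge]:
    """Keep edges at or below both thresholds (defaults keep everything)."""
    return [e for e in edges if e.dist <= max_dist and e.p_value <= max_p]
```

Filtering in a script is handy when one run should feed several thresholds; passing -d or -v to the command instead keeps the output file smaller.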
83 .SS "Sketching"
84 .sp
85 \fB\-k\fP <int>
86 .RS 4
87 K\-mer size. Hashes will be based on strings of this many
88 nucleotides. Canonical nucleotides are used by default (see
89 Alphabet options below). (1\-32) [21]
90 .RE
91 .sp
92 \fB\-s\fP <int>
93 .RS 4
94 Sketch size. Each sketch will have at most this many non\-redundant
95 min\-hashes. [1000]
96 .RE
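To make the -k and -s parameters concrete, here is a minimal bottom-s MinHash sketch in Python. It is not Mash's implementation: the hash function is a stand-in (Mash uses its own hashing), strand handling is omitted, and the names hash_kmer, sketch and shared_hashes are hypothetical.

```python
import hashlib


def hash_kmer(kmer: str) -> int:
    # Stand-in 64-bit hash (assumption); Mash does not use BLAKE2.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def kmers(seq: str, k: int = 21):
    """Yield every substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sketch(kmer_iter, size: int = 1000):
    """Keep at most `size` smallest non-redundant hashes, sorted."""
    return sorted({hash_kmer(k) for k in kmer_iter})[:size]


def shared_hashes(a, b, size: int = 1000):
    """Count hashes present in both sketches among the smallest
    `size` hashes of their union; returns (shared, compared)."""
    merged = sorted(set(a) | set(b))[:size]
    both = set(a) & set(b)
    shared = sum(1 for h in merged if h in both)
    return shared, len(merged)
```

The ratio shared/compared estimates the Jaccard similarity of the two k-mer sets; a larger sketch size tightens the estimate at the cost of memory and time.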
97 .sp
98 \fB\-i\fP
99 .RS 4
100 Sketch individual sequences, rather than whole files, e.g. for
101 multi\-fastas of single\-chromosome genomes or pair\-wise gene comparisons.
102 .RE
103 .sp
104 \fB\-w\fP <num>
105 .RS 4
106 Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
107 .RE
108 .sp
109 \fB\-r\fP
110 .RS 4
111 Input is a read set. See Reads options below. Incompatible with \fB\-i\fP.
112 .RE
113 .SS "Sketching (reads)"
114 .sp
115 \fB\-b\fP <size>
116 .RS 4
117 Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
118 filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
119 uses too much memory. However, some unique k\-mers may pass
120 erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
121 .RE
122 .sp
123 \fB\-m\fP <int>
124 .RS 4
125 Minimum copies of each k\-mer required to pass noise filter for
126 reads. Implies \fB\-r\fP. [1]
127 .RE
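The noise filter behind -m can be pictured as an exact count of every k-mer followed by a threshold. The Python sketch below shows that idea only; it holds all counts in memory, which is the cost the Bloom filter option is meant to avoid. The function name filter_kmers is hypothetical.

```python
from collections import Counter


def filter_kmers(kmers, min_copies: int = 1):
    """Return the set of k-mers seen at least `min_copies` times."""
    counts = Counter(kmers)
    return {kmer for kmer, n in counts.items() if n >= min_copies}
```

With min_copies=2, k-mers that occur once (often sequencing errors in a read set) are dropped before sketching.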
128 .sp
129 \fB\-c\fP <num>
130 .RS 4
131 Target coverage. Sketching will conclude if this coverage is
132 reached before the end of the input file (estimated by average
133 k\-mer multiplicity). Implies \fB\-r\fP.
134 .RE
135 .sp
136 \fB\-g\fP <size>
137 .RS 4
138 Genome size. If specified, will be used for p\-value calculation
139 instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
140 .RE
141 .SS "Sketching (alphabet)"
142 .sp
143 \fB\-n\fP
144 .RS 4
145 Preserve strand (by default, strand is ignored by using canonical
146 DNA k\-mers, which are alphabetical minima of forward\-reverse
147 pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
148 .RE
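The canonical form described above, the alphabetical minimum of a k-mer and its reverse complement, can be sketched in Python as follows. This covers the plain ACGT alphabet only and skips k-mers containing other characters; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Alphabetical minimum of the forward and reverse-complement k-mer."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(seq: str, k: int):
    """Yield canonical k-mers, skipping any with non-ACGT characters."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)
```

Because a k-mer and its reverse complement map to the same canonical string, a sequence and its opposite strand yield the same set of k-mers, which is why strand is ignored unless -n is given.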
149 .sp
150 \fB\-a\fP
151 .RS 4
152 Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
153 .RE
154 .sp
155 \fB\-z\fP <text>
156 .RS 4
157 Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
158 K\-mers with other characters will be ignored. Implies \fB\-n\fP.
159 .RE
160 .sp
161 \fB\-Z\fP
162 .RS 4
163 Preserve case in k\-mers and alphabet (case is ignored by default).
164 Sequence letters whose case is not in the current alphabet will be
165 skipped when sketching.
166 .RE
167 .SH "SEE ALSO"
168 .sp
169 mash(1)