comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/mash-dist.1 @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 16:23:26 -0400
67:0e9998148a16 68:5028fdace37b
1 '\" t
2 .\" Title: mash-dist
3 .\" Author: [see the "AUTHOR(S)" section]
4 .\" Generator: Asciidoctor 2.0.10
5 .\" Date: 2019-12-13
6 .\" Manual: \ \&
7 .\" Source: \ \&
8 .\" Language: English
9 .\"
10 .TH "MASH\-DIST" "1" "2019-12-13" "\ \&" "\ \&"
11 .ie \n(.g .ds Aq \(aq
12 .el .ds Aq '
13 .ss \n[.ss] 0
14 .nh
15 .ad l
16 .de URL
17 \fI\\$2\fP <\\$1>\\$3
18 ..
19 .als MTO URL
20 .if \n[.g] \{\
21 . mso www.tmac
22 . am URL
23 . ad l
24 . .
25 . am MTO
26 . ad l
27 . .
28 . LINKSTYLE blue R < >
29 .\}
30 .SH "NAME"
31 mash\-dist \- estimate the distance of query sequences to references
32 .SH "SYNOPSIS"
33 .sp
34 \fBmash dist\fP [options] <reference> <query> [<query>] ...
35 .SH "DESCRIPTION"
36 .sp
37 Estimate the distance of each query sequence to the reference. Both the
38 reference and queries can be FASTA or FASTQ, gzipped or not, or Mash sketch
39 files (.msh) with matching k\-mer sizes. Query files can also be files of file
40 names (see \fB\-l\fP). Whole files are compared by default (see \fB\-i\fP). The output
41 fields are [reference\-ID, query\-ID, distance, p\-value, shared\-hashes].
42 .SH "OPTIONS"
43 .sp
44 \fB\-h\fP
45 .RS 4
46 Help
47 .RE
48 .sp
49 \fB\-p\fP <int>
50 .RS 4
51 Parallelism. This many threads will be spawned for processing. [1]
52 .RE
53 .SS "Input"
54 .sp
55 \fB\-l\fP
56 .RS 4
57 List input. Each query file contains a list of sequence files, one
58 per line. The reference file is not affected.
59 .RE
60 .SS "Output"
61 .sp
62 \fB\-t\fP
63 .RS 4
64 Table output (will not report p\-values, but fields will be blank if
65 they do not meet the p\-value threshold).
66 .RE
67 .sp
68 \fB\-v\fP <num>
69 .RS 4
70 Maximum p\-value to report. (0\-1) [1.0]
71 .RE
72 .sp
73 \fB\-d\fP <num>
74 .RS 4
75 Maximum distance to report. (0\-1) [1.0]
76 .RE
77 .SS "Sketching"
78 .sp
79 \fB\-k\fP <int>
80 .RS 4
81 K\-mer size. Hashes will be based on strings of this many
82 nucleotides. Canonical nucleotides are used by default (see
83 "Sketching (alphabet)" below). (1\-32) [21]
84 .RE
85 .sp
86 \fB\-s\fP <int>
87 .RS 4
88 Sketch size. Each sketch will have at most this many non\-redundant
89 min\-hashes. [1000]
90 .RE
91 .sp
92 \fB\-i\fP
93 .RS 4
94 Sketch individual sequences, rather than whole files.
95 .RE
96 .sp
97 \fB\-w\fP <num>
98 .RS 4
99 Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
100 .RE
101 .sp
102 \fB\-r\fP
103 .RS 4
104 Input is a read set. See "Sketching (reads)" below. Incompatible with \fB\-i\fP.
105 .RE
106 .SS "Sketching (reads)"
107 .sp
108 \fB\-b\fP <size>
109 .RS 4
110 Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
111 filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
112 uses too much memory. However, some unique k\-mers may pass
113 erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
114 .RE
115 .sp
116 \fB\-m\fP <int>
117 .RS 4
118 Minimum copies of each k\-mer required to pass noise filter for
119 reads. Implies \fB\-r\fP. [1]
120 .RE
121 .sp
122 \fB\-c\fP <num>
123 .RS 4
124 Target coverage. Sketching will conclude if this coverage is
125 reached before the end of the input file (estimated by average
126 k\-mer multiplicity). Implies \fB\-r\fP.
127 .RE
128 .sp
129 \fB\-g\fP <size>
130 .RS 4
131 Genome size. If specified, will be used for p\-value calculation
132 instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
133 .RE
134 .SS "Sketching (alphabet)"
135 .sp
136 \fB\-n\fP
137 .RS 4
138 Preserve strand (by default, strand is ignored by using canonical
139 DNA k\-mers, which are alphabetical minima of forward\-reverse
140 pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
141 .RE
142 .sp
143 \fB\-a\fP
144 .RS 4
145 Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
146 .RE
147 .sp
148 \fB\-z\fP <text>
149 .RS 4
150 Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
151 K\-mers with other characters will be ignored. Implies \fB\-n\fP.
152 .RE
153 .sp
154 \fB\-Z\fP
155 .RS 4
156 Preserve case in k\-mers and alphabet (case is ignored by default).
157 Sequence letters whose case is not in the current alphabet will be
158 skipped when sketching.
159 .RE
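The canonical k-mer rule described under -n (strand is ignored by taking the alphabetical minimum of each forward-reverse pair) can be illustrated with a short sketch. This is not Mash's implementation: the function names `reverse_complement`, `canonical_kmer`, and `canonical_kmers` are hypothetical, and only the uppercase A/C/G/T alphabet is assumed.

```python
# Illustrative sketch only; not Mash's code. All names here are hypothetical.
# Assumes sequences contain only A, C, G, and T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical_kmer(kmer):
    """Return the alphabetically smaller of a k-mer and its reverse complement."""
    kmer = kmer.upper()
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(sequence, k):
    """List the canonical form of every k-length window in the sequence."""
    seq = sequence.upper()
    return [canonical_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)]
```

Because a k-mer and its reverse complement map to the same canonical string, a sequence and its reverse-complemented copy yield the same set of k-mers, which is why strand can be ignored unless -n is given.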
160 .SH "SEE ALSO"
161 .sp
162 mash(1)
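The five output fields listed in DESCRIPTION (reference-ID, query-ID, distance, p-value, shared-hashes) can be read with a small parser. A minimal sketch, under two assumptions that this page does not state: that the fields are tab-separated, and that shared-hashes is printed as `matching/total`. The names `MashHit` and `parse_dist_line` are hypothetical.

```python
from typing import NamedTuple


class MashHit(NamedTuple):
    """One line of `mash dist` output (hypothetical container type)."""
    reference_id: str
    query_id: str
    distance: float
    p_value: float
    shared: int   # hashes shared between the two sketches
    total: int    # number of hashes compared


def parse_dist_line(line):
    """Parse one output line; assumes tab-separated fields (an assumption)."""
    ref, query, dist, pval, hashes = line.rstrip("\n").split("\t")
    # Assumes the last field looks like "matching/total".
    shared, total = hashes.split("/")
    return MashHit(ref, query, float(dist), float(pval), int(shared), int(total))
```

A caller could, for example, keep only hits whose `distance` falls below a chosen cutoff, mirroring what -d does on the command line.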