comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/mash-triangle.1 @ 68:5028fdace37b
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 16:23:26 -0400 |
parents | |
children |
67:0e9998148a16 | 68:5028fdace37b |
---|---|
1 '\" t | |
2 .\" Title: mash-triangle | |
3 .\" Author: [see the "AUTHOR(S)" section] | |
4 .\" Generator: Asciidoctor 2.0.10 | |
5 .\" Date: 2019-12-13 | |
6 .\" Manual: \ \& | |
7 .\" Source: \ \& | |
8 .\" Language: English | |
9 .\" | |
10 .TH "MASH\-TRIANGLE" "1" "2019-12-13" "\ \&" "\ \&" | |
11 .ie \n(.g .ds Aq \(aq | |
12 .el .ds Aq ' | |
13 .ss \n[.ss] 0 | |
14 .nh | |
15 .ad l | |
16 .de URL | |
17 \fI\\$2\fP <\\$1>\\$3 | |
18 .. | |
19 .als MTO URL | |
20 .if \n[.g] \{\ | |
21 . mso www.tmac | |
22 . am URL | |
23 . ad l | |
24 . . | |
25 . am MTO | |
26 . ad l | |
27 . . | |
28 . LINKSTYLE blue R < > | |
29 .\} | |
30 .SH "NAME" | |
31 mash\-triangle \- estimate a lower\-triangular distance matrix | |
32 .SH "SYNOPSIS" | |
33 .sp | |
34 \fBmash triangle\fP [options] <seq1> [<seq2>] ... | |
35 .SH "DESCRIPTION" | |
36 .sp | |
37 Estimate the distance of each input sequence to every other input | |
38 sequence. Outputs a lower\-triangular distance matrix in relaxed Phylip | |
39 format. The input sequences can be FASTA or FASTQ, gzipped or not, or | |
40 Mash sketch files (.msh) with matching k\-mer sizes. Input files can also | |
41 be files of file names (see \-l). If more than one input file is provided, | |
42 whole files are compared by default (see \-i). | |
43 .SH "OPTIONS" | |
44 .sp | |
45 \fB\-h\fP | |
46 .RS 4 | |
47 Help | |
48 .RE | |
49 .sp | |
50 \fB\-p\fP <int> | |
51 .RS 4 | |
52 Parallelism. This many threads will be spawned for processing. [1] | |
53 .RE | |
54 .SS "Input" | |
55 .sp | |
56 \fB\-l\fP | |
57 .RS 4 | |
58 List input. Each input file contains a list of sequence files, one | |
59 per line. | |
60 .RE | |
61 .SS "Output" | |
62 .sp | |
63 \fB\-C\fP | |
64 .RS 4 | |
65 Use comment fields for sequence names instead of IDs. | |
66 .RE | |
67 .sp | |
68 \fB\-E\fP | |
69 .RS 4 | |
70 Output edge list instead of Phylip matrix, with fields [seq1, seq2, | |
71 dist, p\-val, shared\-hashes]. | |
72 .RE | |
73 .sp | |
74 \fB\-v\fP <num> | |
75 .RS 4 | |
76 Maximum p\-value to report in edge list. Implies \-E. (0\-1) [1.0] | |
77 .RE | |
78 .sp | |
79 \fB\-d\fP <num> | |
80 .RS 4 | |
81 Maximum distance to report in edge list. Implies \-E. (0\-1) [1.0] | |
82 .RE | |
83 .SS "Sketching" | |
84 .sp | |
85 \fB\-k\fP <int> | |
86 .RS 4 | |
87 K\-mer size. Hashes will be based on strings of this many | |
88 nucleotides. Canonical nucleotides are used by default (see | |
89 Alphabet options below). (1\-32) [21] | |
90 .RE | |
91 .sp | |
92 \fB\-s\fP <int> | |
93 .RS 4 | |
94 Sketch size. Each sketch will have at most this many non\-redundant | |
95 min\-hashes. [1000] | |
96 .RE | |
97 .sp | |
98 \fB\-i\fP | |
99 .RS 4 | |
100 Sketch individual sequences, rather than whole files, e.g. for | |
101 multi\-fastas of single\-chromosome genomes or pair\-wise gene comparisons. | |
102 .RE | |
103 .sp | |
104 \fB\-w\fP <num> | |
105 .RS 4 | |
106 Probability threshold for warning about low k\-mer size. (0\-1) [0.01] | |
107 .RE | |
108 .sp | |
109 \fB\-r\fP | |
110 .RS 4 | |
111 Input is a read set. See Reads options below. Incompatible with \fB\-i\fP. | |
112 .RE | |
113 .SS "Sketching (reads)" | |
114 .sp | |
115 \fB\-b\fP <size> | |
116 .RS 4 | |
117 Use a Bloom filter of this size (raw bytes or with K/M/G/T) to | |
118 filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP | |
119 uses too much memory. However, some unique k\-mers may pass | |
120 erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP. | |
121 .RE | |
122 .sp | |
123 \fB\-m\fP <int> | |
124 .RS 4 | |
125 Minimum copies of each k\-mer required to pass noise filter for | |
126 reads. Implies \fB\-r\fP. [1] | |
127 .RE | |
128 .sp | |
129 \fB\-c\fP <num> | |
130 .RS 4 | |
131 Target coverage. Sketching will conclude if this coverage is | |
132 reached before the end of the input file (estimated by average | |
133 k\-mer multiplicity). Implies \fB\-r\fP. | |
134 .RE | |
135 .sp | |
136 \fB\-g\fP <size> | |
137 .RS 4 | |
138 Genome size. If specified, will be used for p\-value calculation | |
139 instead of an estimated size from k\-mer content. Implies \fB\-r\fP. | |
140 .RE | |
141 .SS "Sketching (alphabet)" | |
142 .sp | |
143 \fB\-n\fP | |
144 .RS 4 | |
145 Preserve strand (by default, strand is ignored by using canonical | |
146 DNA k\-mers, which are alphabetical minima of forward\-reverse | |
147 pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP. | |
148 .RE | |
149 .sp | |
150 \fB\-a\fP | |
151 .RS 4 | |
152 Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9. | |
153 .RE | |
154 .sp | |
155 \fB\-z\fP <text> | |
156 .RS 4 | |
157 Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP). | |
158 K\-mers with other characters will be ignored. Implies \fB\-n\fP. | |
159 .RE | |
160 .sp | |
161 \fB\-Z\fP | |
162 .RS 4 | |
163 Preserve case in k\-mers and alphabet (case is ignored by default). | |
164 Sequence letters whose case is not in the current alphabet will be | |
165 skipped when sketching. | |
166 .RE | |
167 .SH "SEE ALSO" | |
168 .sp | |
169 mash(1) |
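
The DESCRIPTION above says the default output is a lower-triangular distance matrix in relaxed Phylip format. The following is a minimal sketch of how such output might be expanded into a full symmetric matrix. It assumes a layout the man page does not spell out: a first line holding the sequence count, then one tab-separated line per sequence containing its name followed by the distances to all earlier sequences. The function name `parse_lower_triangle` and the file names in the sample are hypothetical.

```python
from typing import List, Tuple


def parse_lower_triangle(text: str) -> Tuple[List[str], List[List[float]]]:
    """Expand a lower-triangular relaxed Phylip matrix into a full matrix.

    Assumed layout (not confirmed by the man page): first non-empty line is
    the number of sequences; row i holds a name and i tab-separated distances.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0].strip())
    rows = lines[1:]
    if len(rows) != count:
        raise ValueError(f"expected {count} rows, found {len(rows)}")

    names: List[str] = []
    # The diagonal stays 0.0: each sequence is at distance zero from itself.
    matrix = [[0.0] * count for _ in range(count)]

    for i, row in enumerate(rows):
        fields = row.split("\t")
        name = fields[0]
        values = [f for f in fields[1:] if f]  # tolerate a trailing tab
        if len(values) != i:
            raise ValueError(f"row {i} ({name}) has {len(values)} distances, expected {i}")
        names.append(name)
        for j, value in enumerate(values):
            distance = float(value)
            matrix[i][j] = distance  # lower triangle, as given
            matrix[j][i] = distance  # mirrored into the upper triangle

    return names, matrix


# Hypothetical sample in the assumed layout.
example = "\t3\ngenomeA.fna\ngenomeB.fna\t0.1\ngenomeC.fna\t0.2\t0.3\n"
names, matrix = parse_lower_triangle(example)
```

Mirroring each value across the diagonal is a convenience for downstream tools that expect a square matrix. The edge-list output selected with \-E uses different fields and would need a separate reader.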