jpayne@68: '\" t
jpayne@68: .\"     Title: mash-dist
jpayne@68: .\"    Author: [see the "AUTHOR(S)" section]
jpayne@68: .\" Generator: Asciidoctor 2.0.10
jpayne@68: .\"      Date: 2019-12-13
jpayne@68: .\"    Manual: \ \&
jpayne@68: .\"    Source: \ \&
jpayne@68: .\"  Language: English
jpayne@68: .\"
jpayne@68: .TH "MASH\-DIST" "1" "2019-12-13" "\ \&" "\ \&"
jpayne@68: .ie \n(.g .ds Aq \(aq
jpayne@68: .el       .ds Aq '
jpayne@68: .ss \n[.ss] 0
jpayne@68: .nh
jpayne@68: .ad l
jpayne@68: .de URL
jpayne@68: \fI\\$2\fP <\\$1>\\$3
jpayne@68: ..
jpayne@68: .als MTO URL
jpayne@68: .if \n[.g] \{\
jpayne@68: .  mso www.tmac
jpayne@68: .  am URL
jpayne@68: .    ad l
jpayne@68: .  .
jpayne@68: .  am MTO
jpayne@68: .    ad l
jpayne@68: .  .
jpayne@68: .  LINKSTYLE blue R < >
jpayne@68: .\}
jpayne@68: .SH "NAME"
jpayne@68: mash\-dist \- estimate the distance of query sequences to references
jpayne@68: .SH "SYNOPSIS"
jpayne@68: .sp
jpayne@68: \fBmash dist\fP [options] <reference> <query> [<query>] ...
jpayne@68: .SH "DESCRIPTION"
jpayne@68: .sp
jpayne@68: Estimate the distance of each query sequence to the reference. Both the
jpayne@68: reference and queries can be FASTA or FASTQ, gzipped or not, or Mash sketch
jpayne@68: files (.msh) with matching k\-mer sizes. Query files can also be files of file
jpayne@68: names (see \fB\-l\fP). Whole files are compared by default (see \fB\-i\fP). The output
jpayne@68: fields are [reference\-ID, query\-ID, distance, p\-value, shared\-hashes].
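.sp
A minimal sketch of reading one output line in Python. It assumes the five
fields are tab\-separated and that shared\-hashes is written as two integers
joined by a slash; \fBDistRow\fP and \fBparse_dist_line\fP are illustrative
names, not part of Mash.
.sp
.nf
```python
from typing import NamedTuple


class DistRow(NamedTuple):
    reference_id: str
    query_id: str
    distance: float
    p_value: float
    shared: int  # left part of shared-hashes
    total: int   # right part of shared-hashes


def parse_dist_line(line: str) -> DistRow:
    # Assumption: tab-separated fields, shared-hashes formatted like "N/M".
    ref, query, dist, pval, hashes = line.strip().split("\t")
    shared, total = hashes.split("/")
    return DistRow(ref, query, float(dist), float(pval), int(shared), int(total))
```
.fi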
jpayne@68: .SH "OPTIONS"
jpayne@68: .sp
jpayne@68: \fB\-h\fP
jpayne@68: .RS 4
jpayne@68: Help
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-p\fP <int>
jpayne@68: .RS 4
jpayne@68: Parallelism. This many threads will be spawned for processing. [1]
jpayne@68: .RE
jpayne@68: .SS "Input"
jpayne@68: .sp
jpayne@68: \fB\-l\fP
jpayne@68: .RS 4
jpayne@68: List input. Each query file contains a list of sequence files, one
jpayne@68: per line. The reference file is not affected.
jpayne@68: .RE
jpayne@68: .SS "Output"
jpayne@68: .sp
jpayne@68: \fB\-t\fP
jpayne@68: .RS 4
jpayne@68: Table output. P\-values are not reported, but a field is left blank if
jpayne@68: its comparison does not meet the p\-value threshold (see \fB\-v\fP).
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-v\fP <num>
jpayne@68: .RS 4
jpayne@68: Maximum p\-value to report. (0\-1) [1.0]
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-d\fP <num>
jpayne@68: .RS 4
jpayne@68: Maximum distance to report. (0\-1) [1.0]
jpayne@68: .RE
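.sp
For context on the 0\-1 range, the sketch below derives a distance from a
shared\-hashes pair. It assumes the formula given in the Mash publication,
D = \-(1/k) ln(2j / (1 + j)) with j the fraction of shared hashes, and it
assumes results are clamped to 1; \fBmash_distance\fP is a hypothetical
helper, not a Mash API.
.sp
.nf
```python
import math


def mash_distance(shared: int, total: int, k: int = 21) -> float:
    """Distance from a shared-hashes pair such as shared=456, total=1000."""
    if total <= 0:
        raise ValueError("total must be positive")
    j = shared / total  # Jaccard estimate from the sketch comparison
    if j == 0:
        return 1.0  # assumption: no shared hashes maps to the maximum distance
    # Assumed formula from the Mash publication, clamped to the 0-1 range.
    d = -math.log(2 * j / (1 + j)) / k
    return min(max(d, 0.0), 1.0)
```
.fi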
jpayne@68: .SS "Sketching"
jpayne@68: .sp
jpayne@68: \fB\-k\fP <int>
jpayne@68: .RS 4
jpayne@68: K\-mer size. Hashes will be based on strings of this many
jpayne@68: nucleotides. Canonical nucleotides are used by default (see the
jpayne@68: Sketching (alphabet) options below). (1\-32) [21]
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-s\fP <int>
jpayne@68: .RS 4
jpayne@68: Sketch size. Each sketch will have at most this many non\-redundant
jpayne@68: min\-hashes. [1000]
jpayne@68: .RE
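.sp
To illustrate how \fB\-k\fP and \fB\-s\fP interact, here is a rough Python
sketch of a bottom\-s min\-hash sketch: hash every k\-mer, drop duplicates,
and keep at most the s smallest values. The hash (hashlib.blake2b) is a
stand\-in rather than the one Mash uses, and canonical k\-mers are not
applied here; \fBbottom_sketch\fP is an illustrative name.
.sp
.nf
```python
import hashlib
from typing import Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    # Every window of k consecutive characters; empty if seq is shorter than k.
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def bottom_sketch(seq: str, k: int = 21, s: int = 1000) -> List[int]:
    # A set removes repeated hashes, so the sketch is non-redundant.
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers(seq.upper(), k)
    }
    # Keep at most s of the smallest hash values.
    return sorted(hashes)[:s]
```
.fi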
jpayne@68: .sp
jpayne@68: \fB\-i\fP
jpayne@68: .RS 4
jpayne@68: Sketch individual sequences, rather than whole files.
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-w\fP <num>
jpayne@68: .RS 4
jpayne@68: Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-r\fP
jpayne@68: .RS 4
jpayne@68: Input is a read set. See the Sketching (reads) options below. Incompatible with \fB\-i\fP.
jpayne@68: .RE
jpayne@68: .SS "Sketching (reads)"
jpayne@68: .sp
jpayne@68: \fB\-b\fP <size>
jpayne@68: .RS 4
jpayne@68: Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
jpayne@68: filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
jpayne@68: uses too much memory. However, some unique k\-mers may pass
jpayne@68: erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-m\fP <int>
jpayne@68: .RS 4
jpayne@68: Minimum copies of each k\-mer required to pass noise filter for
jpayne@68: reads. Implies \fB\-r\fP. [1]
jpayne@68: .RE
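.sp
The idea behind the noise filter can be sketched in Python as exact
counting: a k\-mer passes only once it has been seen the required number
of times. This is an illustration of the concept, not Mash's
implementation; \fBfilter_by_copies\fP is a hypothetical helper.
.sp
.nf
```python
from collections import Counter
from typing import Iterable, Set


def filter_by_copies(kmers: Iterable[str], min_copies: int = 1) -> Set[str]:
    # Count every k-mer exactly, then keep those seen at least min_copies times.
    counts = Counter(kmers)
    return {kmer for kmer, n in counts.items() if n >= min_copies}
```
.fi
.sp
With min_copies set to 2, k\-mers seen only once (often sequencing errors
in read sets) are dropped.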
jpayne@68: .sp
jpayne@68: \fB\-c\fP <num>
jpayne@68: .RS 4
jpayne@68: Target coverage. Sketching stops early if this coverage, estimated
jpayne@68: from average k\-mer multiplicity, is reached before the end of the
jpayne@68: input file. Implies \fB\-r\fP.
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-g\fP <size>
jpayne@68: .RS 4
jpayne@68: Genome size. If specified, will be used for p\-value calculation
jpayne@68: instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
jpayne@68: .RE
jpayne@68: .SS "Sketching (alphabet)"
jpayne@68: .sp
jpayne@68: \fB\-n\fP
jpayne@68: .RS 4
jpayne@68: Preserve strand (by default, strand is ignored by using canonical
jpayne@68: DNA k\-mers, which are alphabetical minima of forward\-reverse
jpayne@68: pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
jpayne@68: .RE
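.sp
The canonical form described above can be sketched in Python as the
alphabetically smaller of a k\-mer and its reverse complement. The function
names are illustrative, and only the four unambiguous DNA letters are
handled.
.sp
.nf
```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    # Complement each base, then reverse the string.
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    # Alphabetical minimum of the forward and reverse-complement forms.
    kmer = kmer.upper()
    return min(kmer, reverse_complement(kmer))
```
.fi
.sp
Both strands of a sequence map to the same canonical k\-mer, which is why
strand is ignored unless \fB\-n\fP is given.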
jpayne@68: .sp
jpayne@68: \fB\-a\fP
jpayne@68: .RS 4
jpayne@68: Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-z\fP <text>
jpayne@68: .RS 4
jpayne@68: Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
jpayne@68: K\-mers with other characters will be ignored. Implies \fB\-n\fP.
jpayne@68: .RE
jpayne@68: .sp
jpayne@68: \fB\-Z\fP
jpayne@68: .RS 4
jpayne@68: Preserve case in k\-mers and alphabet (case is ignored by default).
jpayne@68: Sequence letters whose case is not in the current alphabet will be
jpayne@68: skipped when sketching.
jpayne@68: .RE
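.sp
A small Python sketch of how a custom alphabet and case handling could
restrict which k\-mers are kept, under the assumption that a k\-mer is
dropped as soon as it contains any character outside the alphabet.
\fBkmers_in_alphabet\fP and its \fBpreserve_case\fP parameter are
hypothetical and only mirror the behavior described for \fB\-z\fP and \fB\-Z\fP.
.sp
.nf
```python
from typing import List


def kmers_in_alphabet(seq: str, k: int, alphabet: str,
                      preserve_case: bool = False) -> List[str]:
    if not preserve_case:
        # Default behavior: case is ignored for both sequence and alphabet.
        seq, alphabet = seq.upper(), alphabet.upper()
    allowed = set(alphabet)
    # Keep only k-mers made up entirely of allowed characters.
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    ]
```
.fi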
jpayne@68: .SH "SEE ALSO"
jpayne@68: .sp
jpayne@68: mash(1)