'\" t
.\" Title: mash-sketch
.\" Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.10
.\" Date: 2019-12-13
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MASH\-SKETCH" "1" "2019-12-13" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
. mso www.tmac
. am URL
. ad l
. .
. am MTO
. ad l
. .
. LINKSTYLE blue R < >
.\}
.SH "NAME"
mash\-sketch \- create sketches (reduced representations for fast operations)
.SH "SYNOPSIS"
.sp
\fBmash sketch\fP [options] fast(a|q)[.gz] ...
.SH "DESCRIPTION"
.sp
Create a sketch file, which is a reduced representation of a sequence or set
of sequences (based on min\-hashes) that can be used for fast distance
estimations. Input can be fasta or fastq files (gzipped or not), and "\-" can
be given to read from standard input. Input files can also be files of file
names (see \fB\-l\fP). For output, one sketch file will be generated, but it can have
multiple sketches within it, divided by sequences or files (see \fB\-i\fP). By
default, the output file name will be the first input file with a \(aq.msh\(aq
extension, or \(aqstdin.msh\(aq if standard input is used (see \fB\-o\fP).
.SH "OPTIONS"
.sp
\fB\-h\fP
.RS 4
Help
.RE
.sp
\fB\-p\fP
.RS 4
Parallelism. This many threads will be spawned for processing. [1]
.RE
.SS "Input"
.sp
\fB\-l\fP
.RS 4
List input. Each file contains a list of sequence files, one per line.
.RE
.SS "Output"
.sp
\fB\-o\fP
.RS 4
Output prefix (first input file used if unspecified). The suffix
\(aq.msh\(aq will be appended.
.RE
.SS "Sketching"
.sp
\fB\-k\fP
.RS 4
K\-mer size. Hashes will be based on strings of this many
nucleotides. Canonical nucleotides are used by default (see
Alphabet options below). (1\-32) [21]
.RE
.sp
\fB\-s\fP
.RS 4
Sketch size. Each sketch will have at most this many non\-redundant
min\-hashes. [1000]
.RE
.sp
\fB\-i\fP
.RS 4
Sketch individual sequences, rather than whole files.
.RE
.sp
\fB\-w\fP
.RS 4
Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
.RE
.sp
\fB\-r\fP
.RS 4
Input is a read set. See Reads options below. Incompatible with \fB\-i\fP.
.RE
.SS "Sketching (reads)"
.sp
\fB\-b\fP
.RS 4
Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
uses too much memory. However, some unique k\-mers may pass
erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
.RE
.sp
\fB\-m\fP
.RS 4
Minimum copies of each k\-mer required to pass noise filter for
reads. Implies \fB\-r\fP. [1]
.RE
.sp
\fB\-c\fP
.RS 4
Target coverage. Sketching will conclude if this coverage is
reached before the end of the input file (estimated by average
k\-mer multiplicity). Implies \fB\-r\fP.
.RE
.sp
\fB\-g\fP
.RS 4
Genome size. If specified, will be used for p\-value calculation
instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
.RE
.SS "Sketching (alphabet)"
.sp
\fB\-n\fP
.RS 4
Preserve strand (by default, strand is ignored by using canonical
DNA k\-mers, which are alphabetical minima of forward\-reverse
pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
.RE
.sp
\fB\-a\fP
.RS 4
Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
.RE
.sp
\fB\-z\fP
.RS 4
Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
K\-mers with other characters will be ignored. Implies \fB\-n\fP.
.RE
.sp
\fB\-Z\fP
.RS 4
Preserve case in k\-mers and alphabet (case is ignored by default).
Sequence letters whose case is not in the current alphabet will be
skipped when sketching.
.RE
.SH "SEE ALSO"
.sp
mash(1)
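.SH "EXAMPLES"
.sp
The invocations below are illustrative sketches, not tested recipes. They
combine only the options documented in this page, and every file name
(genome1.fna, genome2.fna, contigs.fna, reads.fastq.gz, genomes.txt) is
hypothetical.
.sp
.if n .RS 4
.nf
# Sketch two genomes into one file, reference.msh, one sketch per input file.
mash sketch \-o reference genome1.fna genome2.fna

# Sketch each sequence separately with explicit k\-mer and sketch sizes.
# With no \-o, the output is named after the first input (contigs.fna.msh).
mash sketch \-i \-k 21 \-s 1000 contigs.fna

# Treat the input as a read set and keep only k\-mers seen at least twice.
mash sketch \-r \-m 2 \-o reads reads.fastq.gz

# Take the input file names, one per line, from a list file.
mash sketch \-l \-o collection genomes.txt

# Read sequence from standard input. With no \-o, this writes stdin.msh.
gzip \-dc genome1.fna.gz | mash sketch \-
.fi
.if n .RE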