-=- MUMmer3.x README -=-

** NOTE **
A comprehensive HTML user manual is available in the docs/web/manual
subdirectory or at http://mummer.sourceforge.net/manual

MUMmer is now an open source package! Please contact us if you would like
to contribute to the MUMmer project. For more information or the latest
release please visit the MUMmer homepage at http://mummer.sourceforge.net

Please refer to the INSTALL file for installation instructions. This file
contains brief descriptions of all executables in the base directory and
general information about the MUMmer package.



-- DESCRIPTION --
MUMmer is a system for rapidly aligning entire genomes. The current
version (release 3.0) can find all 20 base pair maximal exact matches between
two bacterial genomes of ~5 million base pairs each in 20 seconds, using 90 MB
of memory, on a typical 1.8 GHz Linux desktop computer. MUMmer can also align
incomplete genomes; it handles the 100s or 1000s of contigs from a shotgun
sequencing project with ease, and will align them to another set of contigs or
a genome, using the nucmer utility included with the system. The promer
utility takes this a step further by generating alignments based upon the
six-frame translations of both input sequences. promer permits the alignment
of genomes for which the proteins are similar but the DNA sequence is too
divergent to detect similarity. See the nucmer and promer readme files in the
"docs/" subdirectory for more details.
MUMmer is open source, so all we ask is that you cite our most recent
paper in any publications that use this system:

    (Version 3.0 described)
    Versatile and open software for comparing large genomes.
    S. Kurtz, A. Phillippy, A.L. Delcher,
    M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.
    Genome Biology (2004), 5:R12.

    (Version 2.1 described)
    Fast algorithms for large-scale genome alignment and comparison.
    A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg.
    Nucleic Acids Research 30:11 (2002), 2478-2483.

    (Version 1.0 described)
    Alignment of Whole Genomes.
    A.L. Delcher, S. Kasif,
    R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.
    Nucleic Acids Research, 27:11 (1999), 2369-2376.


-- RUNNING MUMmer3.0 --
MUMmer3.0 comprises many utilities and scripts. For general
purposes, the scripts "run-mummer1", "run-mummer3", "nucmer", and "promer"
will be all that is needed. See their descriptions in the "RUNNING THE MUMmer
SCRIPTS" section, or refer to their individual documentation in the "docs/"
subdirectory. Refer to the "RUNNING THE MUMmer UTILITIES" section for a brief
description of all of the utilities in this directory.

Simple use case:
Given a file containing a single reference sequence (ref.seq) in
FastA format and another file containing multiple sequences in FastA
format (qry.seq), type the following at the command line:

    './nucmer -p ref.seq qry.seq'

To produce the following files:
    .delta

or

    './run-mummer3.csh ref.seq qry.seq '

To produce the following files:
    .out
    .gaps
    .align
    .errorsgaps

Please read the utility-specific documentation in the "docs/" subdirectory
for descriptions of these files and information on how to change the
alignment parameters for the scripts (minimum match length, etc.), or see
the notes below in the "RUNNING THE MUMmer SCRIPTS" section for a brief
explanation.

To see a simple gnuplot output, if you have gnuplot installed, run
the perl script 'mummerplot' on the output files. This script can be run
on mummer output (.out), or nucmer/promer output (.delta). Edit the
.gp file that is created to change colors, line thicknesses, etc., or
explore the .[fr]plot file to see the data collection.

    './mummerplot -p .out'

Or you can use the web viewer for completed microbial genomes:
http://www.tigr.org/CMR



-- RUNNING THE MUMmer SCRIPTS --
Because of MUMmer's modular design, it may be necessary to use a number
of separate programs to produce the desired output. The MUMmer scripts
attempt to simplify this process by wrapping various utilities into packages
that can perform standard alignment requests.
Listed below are brief descriptions and usage definitions for these
scripts. Please refer to the "docs/" subdirectory for a more detailed
description of each script.


** nucmer **

  DESCRIPTION:
    nucmer is for the all-vs-all comparison of nucleotide sequences
    contained in multi-FastA data files. It is best used for highly
    similar sequences that may have large rearrangements. Common use
    cases are: comparing two unfinished shotgun sequencing assemblies,
    mapping an unfinished sequencing assembly to a finished genome, and
    comparing two fairly similar genomes that may have large
    rearrangements and duplications. Please refer to "docs/nucmer.README"
    for more information regarding this script and its output, or type
    'nucmer -h' for a list of its options.

  USAGE:
    nucmer [options]

    [options]    type 'nucmer -h' for a list of options.
                 specifies the multi-FastA sequence file that contains
                 the reference sequences, to be aligned with the queries.
                 specifies the multi-FastA sequence file that contains
                 the query sequences, to be aligned with the references.

  OUTPUT:
    out.delta    the delta encoded alignments between the reference and
                 query sequences. This file can be parsed with any of
                 the show-* programs which are described in the "RUNNING
                 THE MUMmer UTILITIES" section.

  NOTES:
    All output coordinates reference the forward strand of the involved
    sequence, regardless of the match direction. Also, nucmer now uses
    only matches that are unique in the reference sequence by default;
    use the '--mum' or '--maxmatch' options to change this behavior.
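
As a rough illustration of the anchor-match switches mentioned in the notes
above, the following Python sketch assembles a nucmer command line without
running it. The helper name nucmer_command and its "anchors" parameter are
hypothetical and not part of MUMmer; only the '-p', '--mum' and '--maxmatch'
switches and the reference-then-query argument order come from this README,
and '-p' is assumed here to take the output prefix. 'nucmer -h' remains the
authority on what an installed version accepts.

```python
import shlex

# Hypothetical helper (not part of MUMmer): it only builds an argument list.
ANCHOR_SWITCHES = {
    "default": [],              # matches unique in the reference (README default)
    "mum": ["--mum"],
    "maxmatch": ["--maxmatch"],
}


def nucmer_command(reference, query, prefix="out", anchors="default"):
    """Return the argv list for a nucmer run; nothing is executed here."""
    if anchors not in ANCHOR_SWITCHES:
        raise ValueError(f"unknown anchor mode: {anchors!r}")
    # Assumption: '-p' sets the prefix of the .delta file that nucmer writes.
    return ["nucmer", *ANCHOR_SWITCHES[anchors], "-p", prefix, reference, query]


argv = nucmer_command("ref.seq", "qry.seq", prefix="ref_qry", anchors="maxmatch")
print(shlex.join(argv))
```

The list form can be handed to subprocess.run() unchanged, which avoids
shell quoting problems with unusual file names.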


** promer **

  DESCRIPTION:
    promer is for the protein level, all-vs-all comparison of nucleotide
    sequences contained in multi-FastA data files. The nucleotide input
    files are translated in all 6 reading frames and then aligned to one
    another via the same methods as nucmer. It is best used for highly
    divergent sequences that may have moderate to high similarity on the
    protein level. Common use cases are: identifying syntenic regions
    between highly divergent genomes, comparative genome annotation (i.e.
    using an already annotated genome to help in the annotation of a
    newly sequenced genome), and the general comparison of two fairly
    divergent genomes that have large rearrangements and may only be
    similar on the protein level. Please refer to "docs/promer.README"
    for more information regarding this script and its output, or type
    'promer -h' for a list of its options.

  USAGE:
    promer [options]

    [options]    type 'promer -h' for a list of options.
                 specifies the multi-FastA sequence file that contains
                 the reference sequences, to be aligned with the queries.
                 specifies the multi-FastA sequence file that contains
                 the query sequences, to be aligned with the references.

  OUTPUT:
    out.delta    the delta encoded alignments between the reference and
                 query sequences. This file can be parsed with any of
                 the show-* programs which are described in the "RUNNING
                 THE MUMmer UTILITIES" section.

  NOTES:
    All output coordinates reference the forward strand of the involved
    sequence, regardless of the match direction, and are measured in
    nucleotides with the exception of the delta integers which are
    measured in amino acids (1 delta int = 3 nucleotides). Also, promer
    now uses only matches that are unique in the reference sequence by
    default; use the '--mum' or '--maxmatch' options to change this
    behavior.


** run-mummer1 **

  DESCRIPTION:
    This script is taken directly from MUMmer1.0 and is best used to
    align two sequences in which there is high similarity and no
    rearrangements. A common use case is aligning two finished bacterial
    chromosomes. Please refer to "docs/run-mummer1.README" for the
    original documentation for this script and its output.

  USAGE:
    run-mummer1 [-r]

                 specifies the file with the first sequence in FastA format.
                 No more than one sequence is allowed.
                 specifies the file with the second sequence in FastA format.
                 No more than one sequence is allowed.
                 specifies the prefix to be used for the output files.
    [-r]         is an optional parameter that will reverse complement the
                 second sequence.

  OUTPUT:
    out.align       the out.gaps file interspersed with the alignments
                    of the gaps.
    out.errorsgaps  the out.gaps file with an extra column stating the
                    number of errors contained in each gap.
    out.gaps        an ordered (clustered) list of matches with position
                    information, and gap distances between each match.
    out.out         a list of all maximal unique matches between the two
                    input sequences ordered by their start position in the
                    second sequence.

  NOTES:
    All output coordinates reference their respective strand. This means
    that if the -r switch is active, coordinates that reference the
    second sequence will be relative to the reverse complement of the
    second sequence. Please use nucmer or promer if this coordinate
    system is confusing.
    Eventually, this script's components will be rewritten to work
    with the new MUMmer format standards and phased out in favor of the
    new components and wrapping script.


** run-mummer3 **

  DESCRIPTION:
    This script is the improved version of the MUMmer1.0 run-mummer1
    script. It uses a new clustering algorithm that appropriately
    handles multiple sequence rearrangements and inversions. Because
    of this, it handles divergent sequences better than
    run-mummer1. In addition, it allows a multi-FastA query file for
    1-vs-many sequence comparisons. Please refer to
    "docs/run-mummer3.README" for more detailed documentation of this
    script and its output.

  USAGE:
    run-mummer3

                 specifies the file with the reference sequence in FastA
                 format. No more than one sequence is allowed.
                 specifies the multi-FastA sequence file that contains
                 the query sequences.
                 specifies the file prefix for the output files.

  OUTPUT:
    out.align       the out.gaps file interspersed with the alignments
                    of the gaps.
    out.errorsgaps  the out.gaps file with an extra column stating the
                    number of errors contained in each gap.
    out.gaps        an ordered (clustered) list of matches with position
                    information, and gap distances between each match.
    out.out         a list of all maximal unique matches between the two
                    input sequences ordered by their start position in the
                    second sequence.

  NOTES:
    All output coordinates reference their respective strand. This means
    that for all reverse matches, the coordinates that reference the
    query sequence will be relative to the reverse complement of the
    query sequence. Please use nucmer or promer if this coordinate
    system is confusing.


** dnadiff **

  DESCRIPTION:
    This script is a wrapper around nucmer that builds an
    alignment using default parameters, and runs many of nucmer's
    helper scripts to process the output and report alignment
    statistics, SNPs, breakpoints, etc. It is designed for
    evaluating the sequence and structural similarity of two
    highly similar sequence sets, for example comparing two different
    assemblies of the same organism, or comparing two strains of
    the same species. Please refer to "docs/dnadiff.README" for
    more information regarding this script and its output, or type
    'dnadiff -h' for a list of its options.

  USAGE: dnadiff [options]
    or   dnadiff [options] -d

                 Set the input reference multi-FASTA filename
                 Set the input query multi-FASTA filename
    or
                 Unfiltered .delta alignment file from nucmer

  OUTPUT:
    .report  - Summary of alignments, differences and SNPs
    .delta   - Standard nucmer alignment output
    .1delta  - 1-to-1 alignment from delta-filter -1
    .mdelta  - M-to-M alignment from delta-filter -m
    .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
    .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
    .snps    - SNPs from show-snps -rlTHC .1delta
    .rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
    .qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
    .unref   - Unaligned reference IDs and lengths (if applicable)
    .unqry   - Unaligned query IDs and lengths (if applicable)

  NOTES:
    The report file generated by this script can be useful for
    comparing the differences between two similar genomes or
    assemblies. The other outputs generated by this script are in
    unlabeled tabular format, so please refer to the utility
    specific documentation for interpreting them. A full
    description of the report file is given in "docs/dnadiff.README".


-- RUNNING THE MUMmer UTILITIES --
The MUMmer package consists of various utilities that can interact with
the 'mummer' program. 'mummer' performs all maximal and maximal unique
matching, and all other utilities were designed to process the input and
output of this program and its related scripts, in order to extract
additional information from the output.
Listed below are the descriptions and usage definitions for these
utilities.


** annotate **

  DESCRIPTION:
    This program reads the output of the 'gaps' program and adds alignment
    information to it. It is part of the original MUMmer1.0 pipeline and
    can only be used on the output of the 'gaps' program.

  USAGE:
    annotate

                 the output of the 'gaps' program.
                 the file containing the second sequence in the comparison.

  OUTPUT:
    stdout           the 'gaps' output interspersed with the alignments of
                     the gaps between adjacent MUMs. An alignment of a
                     gap comes after the second MUM defining the gap, and
                     alignment errors are marked with a '^' character.
    witherrors.gaps  the 'gaps' output with an appended column that lists
                     the number of alignment errors for each gap.

  NOTES:
    This program will eventually be dropped in favor of the combineMUMs
    or nucmer match extenders, but persists for the time being.


** combineMUMs **

  DESCRIPTION:
    This program reads the output of the 'mgaps' program and adds alignment
    information to it. It is part of the MUMmer3.0 pipeline and can only
    be used on the output of the 'mgaps' program. The -D option alters this
    behavior and only outputs the positions of difference, e.g. SNPs.

  USAGE:
    combineMUMs [options]

    [options]    type 'combineMUMs -h' for a list of options.
                 the FastA reference file used in the comparison.
                 the multi-FastA query file used in the comparison.
                 the output of the 'mgaps' program run on the match
                 list produced by 'mummer' for the reference and query
                 files.

  OUTPUT:
    stdout           the 'mgaps' output interspersed with the alignments
                     of the gaps between adjacent MUMs. An alignment of a
                     gap comes after the second MUM defining the gap, and
                     alignment errors are marked with a '^' character. At
                     the end of each cluster is a summary line (keyword
                     "Region") noting the bounds of the cluster in the
                     reference and query sequences, the total number of
                     errors for the region, the length of the region and
                     the percent error of the region.
    witherrors.gaps  the 'mgaps' output with an appended column that lists
                     the number of alignment errors for each gap.


** delta-filter **

  DESCRIPTION:
    This program filters a delta alignment file produced by either
    nucmer or promer, leaving only the desired alignments which
    are output to stdout in the same delta format as the
    input. Its primary function is the LIS algorithm which
    calculates the longest increasing subset of alignments. This
    allows for the calculation of a global set of alignments
    (i.e. 1-to-1 and mutually consistent order) with the -g option
    or locally consistent with -1 or -m. Reference sequences can
    be mapped to query sequences with -r, or queries to references
    with -q. This allows the user to exclude chance and repeat
    induced alignments, leaving only the "best" alignments between
    the two data sets. Filtering can also be performed on length,
    identity, and uniqueness.

  USAGE:
    delta-filter [options]

    [options]    type 'delta-filter -h' for a list of options.
                 the .delta output file from either nucmer or promer.

  OUTPUT:
    stdout       The same delta alignment format as output by nucmer and
                 promer.

  NOTES:
    For most cases the -m option is recommended; however, -1 is
    useful for applications that require a 1-to-1 mapping, such as
    SNP finding. Use the -q option for mapping query contigs to
    their best reference location.


** exact-tandems **

  DESCRIPTION:
    This script finds exact tandem repeats in a specified FastA sequence
    file. It is a post-processor for 'repeat-match' and provides a simple
    interface and output for tandem repeat detection.

  USAGE:
    exact-tandems

                 the single sequence in FastA format to search for repeats.
                 the minimum match length for the tandems.

  OUTPUT:
    stdout       4 columns: the start of the tandem repeat, the total extent
                 of the repeat region, the length of each repetitive unit, and
                 the total copies of the repetitive unit involved.


** gaps **

  DESCRIPTION:
    This program reads a list of unique matches between two strings and
    outputs the longest consistent set of matches, followed by all the
    other matches. It is part of the MUMmer1.0 pipeline, and the output of
    the 'mummer' program needs to be processed (to strip all non-match
    lines) before it can be passed to this program.

  USAGE:
    gaps [-r] <

                 The first sequence file that the match list represents.
                 A simple list of matches and NO header lines or other
                 mumbo jumbo. The columns of the match list should be
                 start in the reference, start in the query, and length
                 of the match.
    [-r]         Simply puts the string "reverse" on the header of the
                 output so 'annotate' knows to reverse the second
                 sequence.

  OUTPUT:
    stdout       an ordered set of the input matches, separated by headers.
                 The first set is the longest consistent set of matches and
                 the second set is all other matches.

  NOTES:
    This program will eventually be rewritten to be interchangeable with
    'mgaps', so that it may be plugged into the nucmer or promer
    pipelines.


** mapview **

  DESCRIPTION:
    mapview is a utility program for displaying sequence alignments as
    provided by MUMmer, nucmer or promer. This program takes the output
    from these alignment routines and converts it to a FIG, PDF or PS
    file for visual analysis. It can also break the output into multiple
    files for easier viewing and printing. Please refer to
    "docs/mapview.README" for a more detailed description and explanation.

  USAGE:
    mapview [options] [UTR coords] [CDS coords]

    [options]     type 'mapview -h' for a list of options.
                  show-coords output file
    [UTR coords]  UTR coordinate file in GFF format
    [CDS coords]  CDS coordinate file in GFF format

  OUTPUT:
    Default output format is an xfig file; however, this can be changed to
    a postscript or PDF file with the -f option. See 'mapview -h' for a
    list of available formatting options.

  NOTES:
    To produce the coords file input, 'show-coords' must be run with the
    -r -l options. To reduce redundant matches in promer output, run
    show-coords with the -k option.
    To generate output formats other than
    xfig, the fig2dev utility must be available from the system path. For
    very large reference genomes, FIG format may be the only option that
    will allow the entire display to be stored in one file, as fig2dev has
    problems if the output is too large.


** mgaps **

  DESCRIPTION:
    This program reads a list of matches between a single-FastA reference
    and a multi-FastA query file and outputs clusters of matches that lie
    on similar diagonals and within a reasonable distance. It is part of
    the MUMmer3.0 pipeline, and the output of 'mummer' need not be
    processed before passing it to this program, so long as 'mummer' was
    run on a 1-vs-many or 1-vs-1 dataset.

  USAGE:
    mgaps [options] <

    [options]    type 'mgaps -h' for a list of options.
                 A list of matches separated by their sequence FastA tags.
                 The columns of the match list should be start in
                 reference, start in query, and length of the match.

  OUTPUT:
    stdout       An ordered set of the input matches, separated by headers.
                 Individual clusters are separated by a '#' character and
                 sets of clusters from different sequences are separated by
                 the FastA header tag for the query sequence.

  NOTES:
    It is often very helpful to adjust the clustering parameters. Check
    'mgaps -h' for the list of parameters and check the source for a
    better idea of how each parameter affects the result. Often, it is
    helpful to run this program a number of times with different
    parameters until the desired result is achieved.
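
The mgaps output layout described above can be read with a few lines of
Python. This is only a minimal sketch under stated assumptions: query header
lines are taken to start with '>' as in FastA, a line starting with '#' is
taken to mark a cluster boundary, and the first three whitespace-separated
columns of each match line are taken to be reference start, query start and
match length. The function name parse_mgaps and the sample text are made up
for illustration, and any further columns on a match line are ignored.

```python
def parse_mgaps(text):
    """Group match lines into clusters per query sequence.

    Returns {query_tag: [[(ref_start, qry_start, length), ...], ...]}.
    """
    clusters = {}
    tag, current = None, []

    def flush():
        # Close the cluster being built, if it holds any matches.
        nonlocal current
        if current:
            clusters.setdefault(tag, []).append(current)
        current = []

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):        # assumed FastA-style query header
            flush()
            tag = line[1:].strip()
            clusters.setdefault(tag, [])
        elif line.startswith("#"):      # assumed cluster separator
            flush()
        else:
            ref_start, qry_start, length = (int(x) for x in line.split()[:3])
            current.append((ref_start, qry_start, length))
    flush()
    return clusters


# Made-up sample, not real mgaps output.
sample = """> contig1
100 5 30
140 45 25
#
900 300 40
> contig2
10 10 50
"""

result = parse_mgaps(sample)
for query, groups in result.items():
    print(query, len(groups), "cluster(s)")
```

Keeping the clusters as plain lists of tuples makes it easy to compare runs
made with different clustering parameters.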


** mummer **

  DESCRIPTION:
    This is the core program of the MUMmer package. It is the suffix-tree
    based match finding routine, and the main part of every MUMmer script.
    For a detailed manual describing how to use this program, please refer
    to "docs/maxmat3man.pdf" or in LaTeX format "docs/maxmat3man.tex". By
    default, 'mummer' now finds maximal matches regardless of their
    uniqueness. Limiting the output to only unique matches can be specified
    as a command line switch.

  USAGE:
    mummer [options] ...

    [options]    type 'mummer -help' for a list of options.
                 specifies the single or multi-FastA sequence file that
                 contains the reference sequence(s), to be aligned with
                 the queries.
                 specifies the multi-FastA sequence file that contains
                 the query sequences, to be aligned with the references.
                 Multiple query files are allowed, up to 32.

  OUTPUT:
    stdout       a list of exact matches. Varies depending on input, refer to
                 the manual specified in the description above.

  NOTES:
    Many thanks to Stefan Kurtz for the latest mummer version. 'mummer'
    now behaves like the old 'mummer2' program by default. The -mum switch
    forces it to behave like 'mummer1', the -mumreference switch forces it
    to behave like 'mummer2' while the -maxmatch switch forces it to behave
    like the old 'max-match' program.


** mummerplot **

  DESCRIPTION:
    mummerplot is a perl script that generates gnuplot scripts and data
    collections for plotting with the gnuplot utility.
    It can generate
    2-d dotplots and 1-d coverage plots for the output of mummer, nucmer,
    promer or show-tiling. It can also color dotplots with an identity
    color gradient.

  USAGE:
    mummerplot [options]

    [options]    type 'mummerplot -h' for a list of options.
                 the output of 'mummer', 'nucmer', 'promer', or
                 'show-tiling'. 'mummerplot' will automatically determine
                 the format of the data it was given and produce the plot
                 accordingly.

  OUTPUT:
    out.gp       The gnuplot script; type 'gnuplot out.gp' to evaluate the
                 gnuplot script.
    out.fplot
    out.rplot
    out.hplot    The forward, reverse and highlighted match information for
                 plotting with gnuplot.

    out.ps
    out.png      The plotted image file, postscript or png depending on the
                 selected terminal type.

  NOTES:
    For alignments with multiple reference or query sequences, be sure to
    use the -r -q or -R -Q options to avoid overlaying multiple plots in
    the same space. For better looking color gradient plots, try the
    postscript terminal and avoid the png terminal.


** nucmer2xfig **

  DESCRIPTION:
    Script for plotting nucmer hits against a reference sequence. See top
    of script for more information, or see if 'mummerplot' or 'mapview'
    has the functionality required as they are properly maintained.


** repeat-match **

  DESCRIPTION:
    Finds exact repeats within a single sequence.

  USAGE:
    repeat-match [options]

    [options]    type 'repeat-match -h' for a list of options.
                 the single sequence in FastA format to search for repeats.

  OUTPUT:
    stdout       3 columns: the start of the first copy of the repeat, the
                 start of the second copy of the repeat, and the length of the
                 repeat respectively.

  NOTES:
    REPuter (freely available for universities) may be better suited for
    most repeat matching, but 'repeat-match' is open-source and has some
    functionality that REPuter does not, so we include it along with the
    MUMmer package.


** show-aligns **

  DESCRIPTION:
    This program parses the delta alignment output of nucmer and promer
    and displays all of the pairwise alignments from the two sequences
    specified on the command line.

  USAGE:
    show-aligns [options]

    [options]    type 'show-aligns -h' for a list of options.
                 the .delta output file from either nucmer or promer.
                 the FastA header tag of the desired reference sequence.
                 the FastA header tag of the desired query sequence.

  OUTPUT:
    stdout       each alignment header and footer describes the frame of the
                 alignment in each sequence, and the start and finish
                 (inclusive) of the alignment in each sequence. At the
                 beginning of each line of aligned sequence are two numbers:
                 the top is the coordinate of the first reference base on that
                 line and the bottom is the coordinate of the first query base
                 on that line. ALL coordinates reference the forward strand of
                 the DNA sequence, even if it is a protein alignment. A gap
                 caused by an insertion or deletion is filled with a '.'
                 character. Errors in a DNA alignment are marked with a '^'
                 below the error.
                 Errors in an amino acid alignment are marked with a
                 whitespace in the middle consensus line, while matches are
                 marked with the consensus base and similarities are marked with
                 a '+' in the consensus line.


** show-coords **

  DESCRIPTION:
    This program parses the delta alignment output of nucmer and promer
    and displays the coordinates, and other useful information about the
    alignments.

  USAGE:
    show-coords [options]

    [options]    type 'show-coords -h' for a list of options.
                 the .delta output file from either nucmer or promer.

  OUTPUT:
    stdout       run 'show-coords' without the -H option to see the column
                 header tags. Here is a description of each tag. Note that
                 some of the below tags do not apply to nucmer data, and that
                 all coordinates are inclusive and relative to the forward DNA
                 strand.

    [S1] Start of the alignment region in the reference sequence.

    [E1] End of the alignment region in the reference sequence.

    [S2] Start of the alignment region in the query sequence.

    [E2] End of the alignment region in the query sequence.

    [LEN 1] Length of the alignment region in the reference sequence,
    measured in nucleotides.

    [LEN 2] Length of the alignment region in the query sequence, measured
    in nucleotides.

    [% IDY] Percent identity of the alignment, calculated as the
    (number of exact matches) / ([LEN 1] + insertions in the query).

    [% SIM] Percent similarity of the alignment, calculated like the above
    value, but counting positive BLOSUM matrix scores instead of exact
    matches.

        [% STP] Percent of stop codons of the alignment, calculated as
        (number of stop codons) / (([LEN 1] + insertions in the query) * 2).

        [LEN R] Length of the reference sequence.

        [LEN Q] Length of the query sequence.

        [COV R] Percent coverage of the alignment on the reference sequence,
        calculated as [LEN 1] / [LEN R].

        [COV Q] Percent coverage of the alignment on the query sequence,
        calculated as [LEN 2] / [LEN Q].

        [FRM]   Reading frame for the reference sequence and the reading frame
        for the query sequence, respectively. This is one of the columns
        absent from the nucmer data; however, match direction can easily be
        determined from the start and end coordinates.

        [TAGS]  The reference FastA ID and the query FastA ID.

        There is also an optional final column (turned on with the -w
        or -o option) that will contain some 'annotations'. The -o option will
        annotate alignments that represent overlaps between two sequences,
        while the -w option is antiquated and should no longer be used.
        Sometimes, nucmer or promer will extend adjacent clusters past one
        another, thus causing somewhat redundant output; this option will
        notify users of such rare occurrences.

    NOTES:
        The -c and -l options are useful when comparing two sets of assembly
        contigs, in that these options help determine if an alignment spans an
        entire contig, or is just a partial hit to a different read. The -b
        option is useful when the user wishes to identify syntenic regions
        between two genomes, but is not particularly interested in the actual
        alignment similarity or appearance.
        This option also disregards match orientation, so it should not be
        used if this information is needed.


** show-diff **

    DESCRIPTION:
        This program classifies alignment breakpoints for the
        quantification of macroscopic differences between two
        genomes. It takes a standard, unfiltered delta file as input,
        determines the best mapping between the two sequence sets, and
        reports on the breaks in that mapping.

    USAGE:
        show-diff  [options]  <deltafile>

        [options]    type 'show-diff -h' for a list of options.
        <deltafile>  the .delta output file from nucmer

    OUTPUT:
        stdout  Classified breakpoints are output one per line with
                the following types and column definitions. The first
                five columns of every row are seq ID, feature type,
                feature start, feature end, and feature length.

        Feature Columns

        IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff
        IDR DUP dup-start dup-end dup-length
        IDR BRK gap-start gap-end gap-length
        IDR JMP gap-start gap-end gap-length
        IDR INV gap-start gap-end gap-length
        IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence

        Feature Types

        [GAP] A gap between two mutually consistent ordered and
        oriented alignments. gap-length-R is the length of the
        alignment gap in the reference, gap-length-Q is the length of
        the alignment gap in the query, and gap-diff is the difference
        between the two gap lengths. If gap-diff is positive, sequence
        has been inserted in the reference. If gap-diff is negative,
        sequence has been deleted from the reference.
        If both gap-length-R and gap-length-Q are negative, the indel is
        a tandem duplication copy difference.

        [DUP] A duplicated sequence in the reference that occurs more
        times in the reference than in the query. The coordinate
        columns specify the bounds and length of the
        duplication. These features are often bookended by BRK
        features if there is unique sequence bounding the duplication.

        [BRK] An insertion in the reference of unknown origin, which
        indicates that no query sequence aligns to the sequence bounded
        by gap-start and gap-end. Often found around DUP elements or at
        the beginning or end of sequences.

        [JMP] A relocation event, where the consistent ordering of
        alignments is disrupted. The coordinate columns specify the
        breakpoints of the relocation in the reference, and the
        gap-length between them. A negative gap-length indicates the
        relocation occurred around a repetitive sequence, and a
        positive length indicates unique sequence between the
        alignments.

        [INV] The same as a relocation event, except that both the
        ordering and orientation of the alignments are disrupted. Note
        that for JMP and INV, generally two features will be output,
        one for the beginning of the relocated or inverted region, and
        another for the end of that region.

        [SEQ] A translocation event that requires jumping to a new
        query sequence in order to continue aligning to the
        reference. If each input sequence is a chromosome, these
        features correspond to inter-chromosomal translocations.
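
        The column definitions and the GAP rules above can be sketched
        in a few lines of Python. This is only an illustration, not part
        of the MUMmer package: the names 'parse_feature' and
        'classify_gap' are hypothetical, the sample rows are made up,
        and the sketch assumes each feature row is whitespace-delimited
        with integer coordinates and lengths.

```python
from typing import NamedTuple, Tuple


class Feature(NamedTuple):
    """One show-diff row: the five common columns plus any extras."""
    seq_id: str
    kind: str            # GAP, DUP, BRK, JMP, INV or SEQ
    start: int
    end: int
    length: int          # fifth column (gap-length-R for GAP rows)
    extra: Tuple[str, ...]


def parse_feature(line: str) -> Feature:
    """Split one row into its columns (assumes whitespace-delimited text)."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 columns: %r" % line)
    start, end, length = (int(x) for x in fields[2:5])
    return Feature(fields[0], fields[1], start, end, length, tuple(fields[5:]))


def classify_gap(feature: Feature) -> Tuple[str, bool]:
    """Interpret a GAP row using the sign rules described in the text.

    Returns (direction, is_tandem), where is_tandem is True when both
    gap-length-R and gap-length-Q are negative.
    """
    if feature.kind != "GAP" or len(feature.extra) < 2:
        raise ValueError("not a GAP row with gap-length-Q and gap-diff")
    gap_len_q = int(feature.extra[0])
    gap_diff = int(feature.extra[1])
    if gap_diff > 0:
        direction = "inserted in reference"
    elif gap_diff < 0:
        direction = "deleted from reference"
    else:
        direction = "no length difference"
    is_tandem = feature.length < 0 and gap_len_q < 0
    return direction, is_tandem


if __name__ == "__main__":
    # Hypothetical rows, invented for illustration only.
    for row in ("ref1 GAP 1000 1500 501 1 500",
                "ref1 GAP 2000 1990 -9 -30 21"):
        print(row, "->", classify_gap(parse_feature(row)))
```

        Keeping the extra columns as unparsed strings lets the same
        parser accept every feature type, including SEQ rows whose
        trailing columns are sequence names rather than numbers.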

    NOTES:
        The estimated number of features (take inversions, for example)
        represents the number of breakpoints classified as bordering
        an inversion. Therefore, since there will be a breakpoint at
        both the beginning and the end of an inversion, the feature
        counts are roughly double the number of inversion events. In
        addition, all counts are estimates and do not represent the
        exact number of each evolutionary event.

        Summing the fifth column (ignoring negative values) yields an
        estimate of the total inserted sequence in the
        reference. Summing the fifth column after removing DUP
        features yields an estimate of the total amount of unique
        (unaligned) sequence in the reference. Note that unaligned
        sequences are not counted, and could represent additional
        "unique" sequences. Use the 'dnadiff' script if you must
        recover this information. Finally, the -q option switches
        references for queries, and uses the query coordinates for the
        analysis.


** show-snps **

    DESCRIPTION:
        This program reports polymorphisms contained in a delta encoded
        alignment file output by either nucmer or promer. It catalogs
        all of the single nucleotide polymorphisms (SNPs) and
        insertions/deletions within the delta file
        alignments. Polymorphisms are reported one per line, in a
        delimited fashion similar to show-coords. Pairing this program
        with the appropriate MUMmer tools can create an easy-to-use
        SNP pipeline for the rapid identification of putative SNPs
        between any two sequence sets.

    USAGE:
        show-snps  [options]  <deltafile>

        [options]    type 'show-snps -h' for a list of options.
        <deltafile>  the .delta output file from either nucmer or promer.

    OUTPUT:
        stdout  Standard output has column headers with the following
                meanings. Not all columns will be output by default;
                see 'show-snps -h' for the switches that control the
                output.

        [P1]    SNP position in the reference.

        [SUB]   Character in the reference.

        [SUB]   Character in the query.

        [P2]    SNP position in the query.

        [BUFF]  Distance from this SNP to the nearest mismatch (end of
        alignment, indel, SNP, etc) in the same alignment.

        [DIST]  Distance from this SNP to the nearest sequence end.

        [R]     Number of repeat alignments which cover this reference
        position, >0 means repetitive sequence.

        [Q]     Number of repeat alignments which cover this query
        position, >0 means repetitive sequence.

        [LEN R] Length of the reference sequence.

        [LEN Q] Length of the query sequence.

        [CTX R] Surrounding context sequence in the reference.

        [CTX Q] Surrounding context sequence in the query.

        [FRM]   Reading frame for the reference sequence and the
        reading frame for the query sequence, respectively. Simply
        'forward' 1, or 'reverse' -1 for nucmer data.

        [TAGS]  The reference FastA ID and the query FastA ID.

    NOTES:
        It is often helpful to run this with the -C option to ensure
        that SNPs are reported only from uniquely aligned regions.


** show-tiling **

    DESCRIPTION:
        This program attempts to construct a tiling path out of the query
        contigs as mapped to the reference sequences.
        Given the delta alignment information of a few long reference
        sequences and many small query contigs, 'show-tiling' will determine
        the best location on a reference for each contig. Note that each
        contig may only be tiled once, so repetitive regions may cause this
        program some difficulty. This program is useful for aiding in the
        scaffolding and closure of an unfinished set of contigs, if a
        suitable, high similarity reference genome is available. Or, if using
        promer, 'show-tiling' will help in the identification of syntenic
        regions and their contigs' mapping to the references.

    USAGE:
        show-tiling  [options]  <deltafile>

        [options]    type 'show-tiling -h' for a list of options.
        <deltafile>  the .delta output file from either nucmer or promer.

    OUTPUT:
        stdout  Standard output has 8 columns: start in reference, end in
                reference, gap between this contig and the next, length of
                this contig, alignment coverage of this contig, average
                percent identity of the alignments for this contig,
                orientation of this contig, contig ID. All matches to a
                reference are headed by the FASTA tag of that reference.
                Output with the -a option is the same as 'show-coords -cl'
                when run on nucmer data.

    NOTES:
        When run with the -x option, 'show-tiling' will produce an XML output
        format that can be accepted by TIGR's open source scaffolding software
        'Bambus' as contig linking information.


-- CONTACT INFORMATION --

Please address questions and bug reports to:

Last Revised May 12, 2005