-=- run-mummer1 (MUMmer1.0) README -=-

** NOTE **
This manual is outdated; please refer to the HTML documentation included in
this distribution or at:

    http://mummer.sourceforge.net
    http://mummer.sourceforge.net/manual
    http://mummer.sourceforge.net/examples

This is the README for the original MUMmer 1.0 system, which is
included in the MUMmer 3 package. It is slower and uses more memory
than MUMmer 3, but its functions and output differ slightly, so we
make it available.


MUMmer 1.0 code and documentation are copyright (c) 1999 by The Institute
for Genomic Research. The principal architect for the system was
Arthur Delcher.

This directory contains the source code for the MUMmer program for
aligning long DNA sequences. If you use this code in any publication,
please cite the following:
    A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson,
    O. White, and S.L. Salzberg. Alignment of whole genomes.
    Nucleic Acids Research, 27:11 (1999), 2369-2376.

MUMmer works on Unix only. This README file is the only documentation
besides the code itself. Since it is free, we cannot provide any
other support. The system is very easy to use, but you need to invest
a few minutes up front reading this file and figuring out how to
interpret the output. If you discover bugs, we would be interested in
hearing about them so we can correct them. Address bug reports to
<mummer-help@lists.sourceforge.net>. The system uses a LOT of RAM - if
it crashes for that reason, that's not a bug. We recommend at least
512 MB to align most pairs of bacterial genomes, and 1 GB or more may be
required.

To use this system, first compile it by typing 'make' at the
command line. There is a script, 'run-mummer1.csh', that runs all
the steps of aligning two genomes. The script takes these arguments:

    run-mummer1.csh <genome1> <genome2> <tag> [-r]

The two genomes must be DNA sequences in FASTA format. Multi-FASTA
files don't work. The tag is used to create 4 output files,
all with 'tag' as a prefix. -r will reverse complement <genome2>.
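
Because multi-FASTA input does not work, it can help to check an input
file before running the script. The Python sketch below counts the '>'
header lines to confirm that a file holds a single record. It is only an
illustration and not part of the MUMmer distribution; the function names
are hypothetical.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def is_single_fasta(path):
    """Return True if the file at `path` holds exactly one FASTA record."""
    with open(path) as handle:
        return count_fasta_records(handle) == 1
```

For example, is_single_fasta("genome1.fasta") (a hypothetical file name)
should return True for a file that is safe to pass as <genome1>.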

Of the four output files, two are intended for your inspection.
<tag>.errorsgaps lists the alignment of all the MUMs (maximal unique
matches - read the paper for definitions), and includes the longest
ascending sequence of MUMs first. This sequence is the best alignment
of the two genomes found by the program. In between the MUMs are gaps,
which are aligned using a Smith-Waterman implementation of our own.
Those Smith-Waterman alignments are contained in the file <tag>.align.
This can be a very long file if there are lots of gaps. If the gap in
either genome is too long (over 5000bp), then the alignment is
not performed.
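
To make the idea of a "longest ascending sequence of MUMs" concrete, here
is a minimal Python sketch of a plain longest-increasing-subsequence
search. It is not the code in gaps.cc, which may differ (for example by
taking MUM lengths or overlaps into account). The function name and the
(pos1, pos2, length) tuple layout are assumptions made for illustration,
and each position is assumed to occur only once per genome.

```python
from bisect import bisect_left


def longest_ascending_mums(mums):
    """Return a longest chain of MUMs whose positions ascend in both genomes.

    `mums` is a list of (pos1, pos2, length) tuples (hypothetical layout).
    """
    ordered = sorted(mums)          # ascending by genome 1 position
    tails = []                      # tails[k]: index of the end of the best chain of length k+1
    tail_pos2 = []                  # genome 2 positions of those chain ends, kept sorted
    prev = [None] * len(ordered)    # back-pointers used to rebuild the chain
    for i, (_, pos2, _) in enumerate(ordered):
        k = bisect_left(tail_pos2, pos2)   # strict increase in genome 2
        prev[i] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(i)
            tail_pos2.append(pos2)
        else:
            tails[k] = i
            tail_pos2[k] = pos2
    chain = []
    i = tails[-1] if tails else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

MUMs left out of the returned chain are those that are out of order in
genome 2 relative to the chain, such as matches inside a rearranged region.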

IMPORTANT: the performance of the program can critically depend on the
minimum MUM length you use. The default is 20bp. To change it, edit
the file run-mummer1.csh and add a new length switch to the 'mummer'
call.

The other file - one that we often spend lots of time analyzing - is
<tag>.errorsgaps. The lines in that file are the MUMs, for example:
    46989    271588    23    none    3262    3304    1022
    47013    271612    24    none       1       1       1
Columns 1 and 2 are the positions of a MUM in genomes 1 and 2. Column 3
is the length of the MUM (24bp in the second line). Column 4 is the overlap
from the previous MUM - usually this will be 'none' except when there
are repeats present. Columns 5 and 6 show the gaps in genomes 1 and 2
from the *end* of the previous MUM. Column 7 shows the number of errors -
indels or mismatches - in the S-W alignment of the gap before this MUM.
Hence in the two lines shown here, the second line is a MUM of 24bp,
and it follows the previous MUM after a gap of just 1bp in each genome.
This indicates a single nucleotide polymorphism.
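
The column layout described above can be read with a few lines of Python.
This is a sketch only: it assumes seven whitespace-separated fields with
numeric values everywhere except the overlap column, exactly as in the two
example lines, and it assumes columns 1 and 2 are MUM start positions. Real
files may contain other kinds of lines that it does not handle. The names
MumLine, parse_mum_line and gaps_from_positions are hypothetical.

```python
from typing import NamedTuple


class MumLine(NamedTuple):
    pos1: int      # column 1: position in genome 1
    pos2: int      # column 2: position in genome 2
    length: int    # column 3: MUM length in bp
    overlap: str   # column 4: overlap with the previous MUM, or 'none'
    gap1: int      # column 5: gap in genome 1 since the previous MUM
    gap2: int      # column 6: gap in genome 2 since the previous MUM
    errors: int    # column 7: errors in the alignment of that gap


def parse_mum_line(line):
    """Parse one MUM line laid out like the example lines above."""
    f = line.split()
    return MumLine(int(f[0]), int(f[1]), int(f[2]), f[3],
                   int(f[4]), int(f[5]), int(f[6]))


def gaps_from_positions(prev, cur):
    """Recompute the gaps before `cur` from the positions and length of `prev`."""
    return (cur.pos1 - (prev.pos1 + prev.length),
            cur.pos2 - (prev.pos2 + prev.length))
```

For the two example lines, the first MUM covers 46989 through 47011 in
genome 1, so the second MUM at 47013 leaves a gap of 1bp, which matches
columns 5 and 6 of the second line.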

Files in this directory:

annotate.cc      Adds alignment info to gaps file produced by gaps
                 or gaps2. Reads frag info from the file named on the '>'
                 lines of the gap file. The word "reverse" after that name
                 will make that sequence get reverse-complemented.
                 Also produces file witherrors.gaps which is the same as the
                 gaps input file, but with an extra column of number of
                 errors.

gaps.cc          Finds longest consistent set of matches in list
                 produced by mummer1 program.

run-mummer1.csh  Script to run alignment programs. Format is:
                   run-mummer1.csh <genome1> <genome2> <tag> [-r]
                 <tag> will be used to make output files: <tag>.out, <tag>.gaps
                 and <tag>.align. -r will reverse complement <genome2>.
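
For readers unfamiliar with the term, reverse complementing a sequence
means complementing each base (A with T, C with G) and reversing the
order. The Python sketch below illustrates the operation; it is not the
code used by annotate.cc or the script. Leaving characters other than
A, C, G and T unchanged is an assumption of this sketch, since this file
does not say how such characters are treated.

```python
# Translation table for the four DNA bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string.

    Characters other than A, C, G and T (for example N) are left as they are.
    """
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a
convenient sanity check.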

Email questions, comments or bug reports to: <mummer-help@lists.sourceforge.net>