-=- MUMmer3.x README -=-

** NOTE **
A comprehensive HTML user manual is available in the docs/web/manual
subdirectory or at http://mummer.sourceforge.net/manual

MUMmer is now an open source package! Please contact us if you would like
to contribute to the MUMmer project. For more information or the latest
release please visit the MUMmer homepage at http://mummer.sourceforge.net

Please refer to the INSTALL file for installation instructions. This file
contains brief descriptions of all executables in the base directory and
general information about the MUMmer package.



-- DESCRIPTION --
MUMmer is a system for rapidly aligning entire genomes. The current
version (release 3.0) can find all maximal exact matches of at least 20 base
pairs between two bacterial genomes of ~5 million base pairs each in 20
seconds, using 90 MB of memory, on a typical 1.8 GHz Linux desktop computer.
MUMmer can also align incomplete genomes; it handles the 100s or 1000s of
contigs from a shotgun sequencing project with ease, and will align them to
another set of contigs or a genome, using the nucmer utility included with
the system. The promer utility takes this a step further by generating
alignments based upon the six-frame translations of both input sequences.
promer permits the alignment of genomes for which the proteins are similar
but the DNA sequence is too divergent to detect similarity. See the nucmer
and promer readme files in the "docs/" subdirectory for more details. MUMmer
is open source, so all we ask is that you cite our most recent paper in any
publications that use this system:

    (Version 3.0 described)
    Versatile and open software for comparing large genomes.
    S. Kurtz, A. Phillippy, A.L. Delcher,
    M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.
    Genome Biology (2004), 5:R12.

    (Version 2.1 described)
    Fast algorithms for large-scale genome alignment and comparison.
    A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg.
    Nucleic Acids Research 30:11 (2002), 2478-2483.

    (Version 1.0 described)
    Alignment of Whole Genomes.
    A.L. Delcher, S. Kasif,
    R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.
    Nucleic Acids Research, 27:11 (1999), 2369-2376.


-- RUNNING MUMmer3.0 --
MUMmer3.0 comprises many utilities and scripts. For general
purposes, the scripts "run-mummer1", "run-mummer3", "nucmer", and "promer"
will be all that is needed. See their descriptions in the "RUNNING THE MUMmer
SCRIPTS" section, or refer to their individual documentation in the "docs/"
subdirectory. Refer to the "RUNNING THE MUMmer UTILITIES" section for a brief
description of all of the utilities in this directory.

Simple use case:
    Given a file containing a single reference sequence (ref.seq) in
    FastA format and another file containing multiple sequences in FastA
    format (qry.seq), type the following at the command line:

        './nucmer -p <prefix> ref.seq qry.seq'

    to produce the following file:
        <prefix>.delta

    or

        './run-mummer3 ref.seq qry.seq <prefix>'

    to produce the following files:
        <prefix>.out
        <prefix>.gaps
        <prefix>.align
        <prefix>.errorsgaps

    Please read the utility-specific documentation in the "docs/" subdirectory
    for descriptions of these files and information on how to change the
    alignment parameters for the scripts (minimum match length, etc.), or see
    the notes below in the "RUNNING THE MUMmer SCRIPTS" section for a brief
    explanation.

    To see a simple gnuplot output, if you have gnuplot installed, run
    the perl script 'mummerplot' on the output files. This script can be run
    on mummer output (.out) or nucmer/promer output (.delta). Edit the
    <prefix>.gp file that is created to change colors, line thicknesses, etc.,
    or explore the <prefix>.[fr]plot files to see the data collection.

        './mummerplot -p <prefix> <prefix>.out'

    Or you can use the web viewer for completed microbial genomes:
        http://www.tigr.org/CMR
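
The simple use case above could be scripted roughly as follows. This is
only a sketch: it assumes Python 3 and that the MUMmer executables sit in
the current directory, and the helper names (nucmer_command,
mummerplot_command, align_and_plot) are illustrative and not part of
MUMmer. Only the command line forms shown above are used.

```python
import subprocess


def nucmer_command(prefix, reference, query):
    """Argument list for './nucmer -p <prefix> ref.seq qry.seq'."""
    return ["./nucmer", "-p", prefix, reference, query]


def mummerplot_command(prefix, matchfile):
    """Argument list for './mummerplot -p <prefix> <matchfile>'."""
    return ["./mummerplot", "-p", prefix, matchfile]


def align_and_plot(prefix, reference, query):
    """Run nucmer, then plot the resulting <prefix>.delta file.

    check=True makes a non-zero exit status raise CalledProcessError,
    so the plot step is skipped when the alignment step fails.
    """
    subprocess.run(nucmer_command(prefix, reference, query), check=True)
    subprocess.run(mummerplot_command(prefix, prefix + ".delta"), check=True)
```

Calling align_and_plot("out", "ref.seq", "qry.seq") would be expected to
leave out.delta and the mummerplot files in the working directory,
provided the tools and gnuplot are installed.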



-- RUNNING THE MUMmer SCRIPTS --
Because of MUMmer's modular design, it may be necessary to use a number
of separate programs to produce the desired output. The MUMmer scripts
attempt to simplify this process by wrapping various utilities into packages
that can perform standard alignment requests. Listed below are brief
descriptions and usage definitions for these scripts. Please refer to the
"docs/" subdirectory for a more detailed description of each script.


** nucmer **

DESCRIPTION:
    nucmer is for the all-vs-all comparison of nucleotide sequences
    contained in multi-FastA data files. It is best used for highly
    similar sequences that may have large rearrangements. Common use
    cases are: comparing two unfinished shotgun sequencing assemblies,
    mapping an unfinished sequencing assembly to a finished genome, and
    comparing two fairly similar genomes that may have large
    rearrangements and duplications. Please refer to "docs/nucmer.README"
    for more information regarding this script and its output, or type
    'nucmer -h' for a list of its options.

USAGE:
    nucmer [options] <reference> <query>

    [options]    type 'nucmer -h' for a list of options.
    <reference>  specifies the multi-FastA sequence file that contains
                 the reference sequences, to be aligned with the queries.
    <query>      specifies the multi-FastA sequence file that contains
                 the query sequences, to be aligned with the references.

OUTPUT:
    out.delta    the delta encoded alignments between the reference and
                 query sequences. This file can be parsed with any of
                 the show-* programs which are described in the "RUNNING
                 THE MUMmer UTILITIES" section.

NOTES:
    All output coordinates reference the forward strand of the involved
    sequence, regardless of the match direction. Also, nucmer now uses
    only matches that are unique in the reference sequence by default;
    use the '--mum' or '--maxmatch' options to change this behavior.


** promer **

DESCRIPTION:
    promer is for the protein level, all-vs-all comparison of nucleotide
    sequences contained in multi-FastA data files. The nucleotide input
    files are translated in all 6 reading frames and then aligned to one
    another via the same methods as nucmer. It is best used for highly
    divergent sequences that may have moderate to high similarity on the
    protein level. Common use cases are: identifying syntenic regions
    between highly divergent genomes, comparative genome annotation (i.e.
    using an already annotated genome to help in the annotation of a
    newly sequenced genome), and the general comparison of two fairly
    divergent genomes that have large rearrangements and may only be
    similar on the protein level. Please refer to "docs/promer.README"
    for more information regarding this script and its output, or type
    'promer -h' for a list of its options.

USAGE:
    promer [options] <reference> <query>

    [options]    type 'promer -h' for a list of options.
    <reference>  specifies the multi-FastA sequence file that contains
                 the reference sequences, to be aligned with the queries.
    <query>      specifies the multi-FastA sequence file that contains
                 the query sequences, to be aligned with the references.

OUTPUT:
    out.delta    the delta encoded alignments between the reference and
                 query sequences. This file can be parsed with any of
                 the show-* programs which are described in the "RUNNING
                 THE MUMmer UTILITIES" section.

NOTES:
    All output coordinates reference the forward strand of the involved
    sequence, regardless of the match direction, and are measured in
    nucleotides with the exception of the delta integers, which are
    measured in amino acids (1 delta int = 3 nucleotides). Also, promer
    now uses only matches that are unique in the reference sequence by
    default; use the '--mum' or '--maxmatch' options to change this
    behavior.
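
To make the six-frame translation step concrete, here is a minimal Python
sketch. It is not promer's own code: it assumes the standard genetic code,
uppercase A/C/G/T input, and an illustrative frame labelling of +1..+3 for
the forward strand and -1..-3 for the reverse complement. Codons with any
other character are reported as 'X'.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames
```

For example, six_frame_translations("ATGGCCTAA")[1] gives "MA*". Because
each amino acid covers three nucleotides, a distance measured in amino
acids (as the delta integers are) is multiplied by 3 to get nucleotides.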


** run-mummer1 **

DESCRIPTION:
    This script is taken directly from MUMmer1.0 and is best used to
    align two sequences in which there is high similarity and no
    rearrangements. A common use case is aligning two finished bacterial
    chromosomes. Please refer to "docs/run-mummer1.README" for the
    original documentation for this script and its output.

USAGE:
    run-mummer1 <seq1> <seq2> <tag> [-r]

    <seq1>  specifies the file with the first sequence in FastA format.
            No more than one sequence is allowed.
    <seq2>  specifies the file with the second sequence in FastA format.
            No more than one sequence is allowed.
    <tag>   specifies the prefix to be used for the output files.
    [-r]    is an optional parameter that will reverse complement the
            second sequence.

OUTPUT:
    out.align       the out.gaps file interspersed with the alignments
                    of the gaps.
    out.errorsgaps  the out.gaps file with an extra column stating the
                    number of errors contained in each gap.
    out.gaps        an ordered (clustered) list of matches with position
                    information, and gap distances between each match.
    out.out         a list of all maximal unique matches between the two
                    input sequences ordered by their start position in the
                    second sequence.

NOTES:
    All output coordinates reference their respective strand. This means
    that if the -r switch is active, coordinates that reference the
    second sequence will be relative to the reverse complement of the
    second sequence. Please use nucmer or promer if this coordinate
    system is confusing.
    Eventually, this script's components will be rewritten to work
    with the new MUMmer format standards and phased out in favor of the
    new components and wrapping script.


** run-mummer3 **

DESCRIPTION:
    This script is the improved version of the MUMmer1.0 run-mummer1
    script. It uses a new clustering algorithm that appropriately
    handles multiple sequence rearrangements and inversions. Because
    of this, it handles more divergent sequences better than
    run-mummer1. In addition, it allows a multi-FastA query file for
    1-vs-many sequence comparisons. Please refer to
    "docs/run-mummer3.README" for more detailed documentation of this
    script and its output.

USAGE:
    run-mummer3 <reference> <query> <prefix>

    <reference>  specifies the file with the reference sequence in FastA
                 format. No more than one sequence is allowed.
    <query>      specifies the multi-FastA sequence file that contains
                 the query sequences.
    <prefix>     specifies the file prefix for the output files.

OUTPUT:
    out.align       the out.gaps file interspersed with the alignments
                    of the gaps.
    out.errorsgaps  the out.gaps file with an extra column stating the
                    number of errors contained in each gap.
    out.gaps        an ordered (clustered) list of matches with position
                    information, and gap distances between each match.
    out.out         a list of all maximal unique matches between the two
                    input sequences ordered by their start position in the
                    second sequence.

NOTES:
    All output coordinates reference their respective strand. This means
    that for all reverse matches, the coordinates that reference the
    query sequence will be relative to the reverse complement of the
    query sequence. Please use nucmer or promer if this coordinate
    system is confusing.
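
If the reverse-complement coordinates are awkward to work with, they can
be mapped back to the forward strand by hand. The Python sketch below
shows the arithmetic; it assumes 1-based, inclusive positions and that the
length of the query sequence is known. The function name is illustrative
and not part of MUMmer.

```python
def reverse_match_to_forward(start, length, seq_length):
    """Map a match on the reverse complement to forward coordinates.

    start      1-based start of the match on the reverse complement
    length     length of the match
    seq_length length of the sequence the match lies on

    Returns (low, high), the 1-based inclusive forward-strand interval.
    Position p on the reverse complement is position
    seq_length - p + 1 on the forward strand.
    """
    high = seq_length - start + 1
    low = high - length + 1
    return low, high
```

For a 10 base sequence, a match of length 3 starting at position 1 of the
reverse complement covers forward positions 8 through 10.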


** dnadiff **

DESCRIPTION:
    This script is a wrapper around nucmer that builds an
    alignment using default parameters, and runs many of nucmer's
    helper scripts to process the output and report alignment
    statistics, SNPs, breakpoints, etc. It is designed for
    evaluating the sequence and structural similarity of two
    highly similar sequence sets, e.g. comparing two different
    assemblies of the same organism, or comparing two strains of
    the same species. Please refer to "docs/dnadiff.README" for
    more information regarding this script and its output, or type
    'dnadiff -h' for a list of its options.

USAGE:
    dnadiff [options] <reference> <query>
      or
    dnadiff [options] -d <delta file>

    <reference>   Set the input reference multi-FASTA filename
    <query>       Set the input query multi-FASTA filename
      or
    <delta file>  Unfiltered .delta alignment file from nucmer

OUTPUT:
    .report   - Summary of alignments, differences and SNPs
    .delta    - Standard nucmer alignment output
    .1delta   - 1-to-1 alignment from delta-filter -1
    .mdelta   - M-to-M alignment from delta-filter -m
    .1coords  - 1-to-1 coordinates from show-coords -THrcl .1delta
    .mcoords  - M-to-M coordinates from show-coords -THrcl .mdelta
    .snps     - SNPs from show-snps -rlTHC .1delta
    .rdiff    - Classified ref breakpoints from show-diff -rH .mdelta
    .qdiff    - Classified qry breakpoints from show-diff -qH .mdelta
    .unref    - Unaligned reference IDs and lengths (if applicable)
    .unqry    - Unaligned query IDs and lengths (if applicable)

NOTES:
    The report file generated by this script can be useful for
    comparing the differences between two similar genomes or
    assemblies. The other outputs generated by this script are in
    unlabeled tabular format, so please refer to the utility-specific
    documentation for interpreting them. A full description of the
    report file is given in "docs/dnadiff.README".


-- RUNNING THE MUMmer UTILITIES --
The MUMmer package consists of various utilities that can interact with
the 'mummer' program. 'mummer' performs all maximal and maximal unique
matching, and all other utilities were designed to process the input and
output of this program and its related scripts, in order to extract
additional information from the output. Listed below are the descriptions
and usage definitions for these utilities.


** annotate **

DESCRIPTION:
    This program reads the output of the 'gaps' program and adds alignment
    information to it. It is part of the original MUMmer1.0 pipeline and
    can only be used on the output of the 'gaps' program.

USAGE:
    annotate <gapsfile> <seq2>

    <gapsfile>  the output of the 'gaps' program.
    <seq2>      the file containing the second sequence in the comparison.

OUTPUT:
    stdout           the 'gaps' output interspersed with the alignments of
                     the gaps between adjacent MUMs. An alignment of a
                     gap comes after the second MUM defining the gap, and
                     alignment errors are marked with a '^' character.
    witherrors.gaps  the 'gaps' output with an appended column that lists
                     the number of alignment errors for each gap.

NOTES:
    This program will eventually be dropped in favor of the combineMUMs
    or nucmer match extenders, but persists for the time being.


** combineMUMs **

DESCRIPTION:
    This program reads the output of the 'mgaps' program and adds alignment
    information to it. It is part of the MUMmer3.0 pipeline and can only be
    used on the output of the 'mgaps' program. The -D option alters this
    behavior and outputs only the positions of difference, e.g. SNPs.

USAGE:
    combineMUMs [options] <reference> <query> <mgapsfile>

    [options]    type 'combineMUMs -h' for a list of options.
    <reference>  the FastA reference file used in the comparison.
    <query>      the multi-FastA query file used in the comparison.
    <mgapsfile>  the output of the 'mgaps' program run on the match
                 list produced by 'mummer' for the reference and query
                 files.

OUTPUT:
    stdout           the 'mgaps' output interspersed with the alignments
                     of the gaps between adjacent MUMs. An alignment of a
                     gap comes after the second MUM defining the gap, and
                     alignment errors are marked with a '^' character. At
                     the end of each cluster is a summary line (keyword
                     "Region") noting the bounds of the cluster in the
                     reference and query sequences, the total number of
                     errors for the region, the length of the region and
                     the percent error of the region.
    witherrors.gaps  the 'mgaps' output with an appended column that lists
                     the number of alignment errors for each gap.


** delta-filter **

DESCRIPTION:
    This program filters a delta alignment file produced by either
    nucmer or promer, leaving only the desired alignments, which
    are output to stdout in the same delta format as the
    input. Its primary function is the LIS algorithm, which
    calculates the longest increasing subset of alignments. This
    allows for the calculation of a global set of alignments
    (i.e. 1-to-1 and mutually consistent order) with the -g option
    or locally consistent with -1 or -m. Reference sequences can
    be mapped to query sequences with -r, or queries to references
    with -q. This allows the user to exclude chance and repeat
    induced alignments, leaving only the "best" alignments between
    the two data sets. Filtering can also be performed on length,
    identity, and uniqueness.

USAGE:
    delta-filter [options] <deltafile>

    [options]    type 'delta-filter -h' for a list of options.
    <deltafile>  the .delta output file from either nucmer or promer.

OUTPUT:
    stdout  The same delta alignment format as output by nucmer and promer.

NOTES:
    For most cases the -m option is recommended; however, -1 is
    useful for applications that require a 1-to-1 mapping, such as
    SNP finding. Use the -q option for mapping query contigs to
    their best reference location.
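
The idea behind a "longest increasing subset" can be illustrated with a
small dynamic program. The Python sketch below is a generic weighted
longest-increasing-subsequence routine, not delta-filter's implementation:
it assumes forward-strand alignments given as (ref_start, ref_end,
qry_start, qry_end) tuples, forbids overlaps, and scores a chain by total
reference length. All of these choices are assumptions made for
illustration.

```python
def longest_increasing_subset(alignments):
    """Pick the heaviest chain that increases in both sequences.

    alignments: list of (ref_start, ref_end, qry_start, qry_end).
    Returns the chosen alignments in reference order.
    """
    items = sorted(alignments)  # reference order
    n = len(items)
    if n == 0:
        return []
    # best[i] = best total length of a chain ending at alignment i
    best = [r_end - r_start + 1 for r_start, r_end, _, _ in items]
    prev = [None] * n
    for i in range(n):
        length_i = items[i][1] - items[i][0] + 1
        for j in range(i):
            # j may precede i only if it ends before i starts in both
            # the reference and the query.
            if items[j][1] < items[i][0] and items[j][3] < items[i][2]:
                if best[j] + length_i > best[i]:
                    best[i] = best[j] + length_i
                    prev[i] = j
    # Trace back from the highest-scoring chain end.
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i is not None:
        chain.append(items[i])
        i = prev[i]
    return chain[::-1]
```

Given three collinear alignments plus one that maps reference positions
120-150 to query positions 900-930, the out-of-order alignment is the one
dropped, which is the kind of repeat-induced hit the filter is meant to
exclude.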


** exact-tandems **

DESCRIPTION:
    This script finds exact tandem repeats in a specified FastA sequence
    file. It is a post-processor for 'repeat-match' and provides a simple
    interface and output for tandem repeat detection.

USAGE:
    exact-tandems <file> <min match>

    <file>       the single sequence in FastA format to search for repeats.
    <min match>  the minimum match length for the tandems.

OUTPUT:
    stdout  4 columns: the start of the tandem repeat, the total extent
            of the repeat region, the length of each repetitive unit, and
            the total copies of the repetitive unit involved.
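
To show what the four output columns mean, here is a brute-force Python
sketch that reports the same four quantities for a short string. It does
not reproduce how exact-tandems works (the script post-processes
'repeat-match' output); the function name and the unit-length bounds
min_unit and max_unit are hypothetical, and the start position is assumed
to be 1-based.

```python
def exact_tandems(seq, min_unit=1, max_unit=None):
    """Report (start, extent, unit_length, copies) for exact tandems.

    A tandem with unit length u is a maximal stretch in which every
    character equals the character u positions later. Only stretches
    holding at least two full copies are reported. copies may be
    fractional when the last copy is incomplete.
    """
    n = len(seq)
    if max_unit is None:
        max_unit = n // 2
    results = []
    for unit in range(min_unit, max_unit + 1):
        i = 0
        while i + unit < n:
            run = 0
            while i + run + unit < n and seq[i + run] == seq[i + run + unit]:
                run += 1
            if run >= unit:
                extent = run + unit
                results.append((i + 1, extent, unit, extent / unit))
            i += run + 1  # skip past the mismatch that ended the run
    return results
```

For "GTACACACGT" with unit length 2, the sketch reports a repeat starting
at position 3, spanning 6 bases, with 3 copies of a 2 base unit. Note that
a repeat is also reported for multiples of its unit length (for example
unit 4 as well as unit 2) when the stretch is long enough.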


** gaps **

DESCRIPTION:
    This program reads a list of unique matches between two strings and
    outputs the longest consistent set of matches, followed by all the
    other matches. It is part of the MUMmer1.0 pipeline, and the output of
    the 'mummer' program needs to be processed (to strip all non-match
    lines) before it can be passed to this program.

USAGE:
    gaps <seq1> [-r] < <matchlist>

    <seq1>       The first sequence file that the match list represents.
    <matchlist>  A simple list of matches with NO header lines or other
                 extraneous text. The columns of the match list should be
                 start in the reference, start in the query, and length
                 of the match.
    [-r]         Simply puts the string "reverse" on the header of the
                 output so 'annotate' knows to reverse the second
                 sequence.

OUTPUT:
    stdout  an ordered set of the input matches, separated by headers.
            The first set is the longest consistent set of matches and
            the second set is all other matches.

NOTES:
    This program will eventually be rewritten to be interchangeable with
    'mgaps', so that it may be plugged into the nucmer or promer
    pipelines.
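
Stripping the non-match lines can be done with any text tool. As one
possible approach, the Python sketch below keeps only lines made of
exactly three integer columns (start in the reference, start in the
query, length). It assumes that header lines never consist of three
integers; the '>' header shown in the usage note is an assumption about
the input, and the function name is illustrative.

```python
def strip_non_match_lines(lines):
    """Return (ref_start, qry_start, length) for every match line.

    Any line that is not exactly three whitespace-separated
    non-negative integers is treated as a non-match line and dropped.
    """
    matches = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and all(field.isdigit() for field in fields):
            matches.append(tuple(int(field) for field in fields))
    return matches
```

For example, strip_non_match_lines(["> query1", "  10  20  30"]) keeps
only the second line. The surviving rows can then be written out one per
line and redirected into 'gaps'.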


** mapview **

DESCRIPTION:
    mapview is a utility program for displaying sequence alignments as
    provided by MUMmer, nucmer or promer. This program takes the output
    from these alignment routines and converts it to a FIG, PDF or PS
    file for visual analysis. It can also break the output into multiple
    files for easier viewing and printing. Please refer to
    "docs/mapview.README" for a more detailed description and explanation.

USAGE:
    mapview [options] <coords file> [UTR coords] [CDS coords]

    [options]      type 'mapview -h' for a list of options.
    <coords file>  show-coords output file
    [UTR coords]   UTR coordinate file in GFF format
    [CDS coords]   CDS coordinate file in GFF format

OUTPUT:
    Default output format is an xfig file; however, this can be changed to
    a PostScript or PDF file with the -f option. See 'mapview -h' for a
    list of available formatting options.

NOTES:
    To produce the coords file input, 'show-coords' must be run with the
    -r -l options. To reduce redundant matches in promer output, run
    show-coords with the -k option. To generate output formats other than
    xfig, the fig2dev utility must be available from the system path. For
    very large reference genomes, FIG format may be the only option that
    will allow the entire display to be stored in one file, as fig2dev has
    problems if the output is too large.


** mgaps **

DESCRIPTION:
    This program reads a list of matches between a single-FastA reference
    and a multi-FastA query file and outputs clusters of matches that lie
    on similar diagonals and within a reasonable distance. It is part of
    the MUMmer3.0 pipeline, and the output of 'mummer' need not be
    processed before passing it to this program, so long as 'mummer' was
    run on a 1-vs-many or 1-vs-1 dataset.

USAGE:
    mgaps [options] < <matchlist>

    [options]    type 'mgaps -h' for a list of options.
    <matchlist>  A list of matches separated by their sequence FastA tags.
                 The columns of the match list should be start in
                 reference, start in query, and length of the match.

OUTPUT:
    stdout  An ordered set of the input matches, separated by headers.
            Individual clusters are separated by a '#' character and
            sets of clusters from different sequences are separated by
            the FastA header tag for the query sequence.

NOTES:
    It is often very helpful to adjust the clustering parameters. Check
    'mgaps -h' for the list of parameters and check the source for a
    better idea of how each parameter affects the result. Often, it is
    helpful to run this program a number of times with different
    parameters until the desired result is achieved.
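
As a rough picture of what "similar diagonals and within a reasonable
distance" means, the Python sketch below clusters matches greedily. It is
a simplification and not the mgaps algorithm: the parameter names max_gap
and max_diag_diff and their default values are hypothetical, and each
match is compared only with the last match of the most recent cluster.

```python
def cluster_matches(matches, max_gap=1000, max_diag_diff=5):
    """Group (ref_start, qry_start, length) matches into clusters.

    The diagonal of a match is ref_start - qry_start. A match joins the
    current cluster when it starts within max_gap of the end of the
    previous match in the reference and its diagonal differs from the
    previous one by at most max_diag_diff.
    """
    clusters = []
    for match in sorted(matches):  # reference order
        ref_start, qry_start, _ = match
        diagonal = ref_start - qry_start
        if clusters:
            last_ref, last_qry, last_len = clusters[-1][-1]
            gap = ref_start - (last_ref + last_len)
            diag_diff = abs(diagonal - (last_ref - last_qry))
            if gap <= max_gap and diag_diff <= max_diag_diff:
                clusters[-1].append(match)
                continue
        clusters.append([match])
    return clusters
```

Loosening max_gap or max_diag_diff merges more matches into each cluster,
and tightening them splits clusters apart, which mirrors the kind of
trial-and-error tuning described in the notes above.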


** mummer **

DESCRIPTION:
    This is the core program of the MUMmer package. It is the suffix-tree
    based match finding routine, and the main part of every MUMmer script.
    For a detailed manual describing how to use this program, please refer
    to "docs/maxmat3man.pdf" or, in LaTeX format, "docs/maxmat3man.tex". By
    default, 'mummer' now finds maximal matches that are unique in the
    reference sequence but not necessarily in the query (see NOTES). Other
    uniqueness constraints can be selected with command line switches.

USAGE:
    mummer [options] <reference> <query> ...

    [options]    type 'mummer -help' for a list of options.
    <reference>  specifies the single or multi-FastA sequence file that
                 contains the reference sequence(s), to be aligned with
                 the queries.
    <query>      specifies the multi-FastA sequence file that contains
                 the query sequences, to be aligned with the references.
                 Multiple query files are allowed, up to 32.

OUTPUT:
    stdout  a list of exact matches. Varies depending on input; refer to
            the manual specified in the description above.

NOTES:
    Many thanks to Stefan Kurtz for the latest mummer version. 'mummer'
    now behaves like the old 'mummer2' program by default. The -mum switch
    forces it to behave like 'mummer1', the -mumreference switch forces it
    to behave like 'mummer2', and the -maxmatch switch forces it to behave
    like the old 'max-match' program.
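
For readers unfamiliar with the term, a maximal exact match is an exact
match that cannot be extended in either direction. The brute-force Python
sketch below enumerates them for two short strings. It is for
illustration only: 'mummer' uses a suffix tree and is far faster, no
uniqueness filtering is applied here, and the 1-based (ref_start,
qry_start, length) output is an assumption modelled on the match list
columns described for 'gaps' and 'mgaps'.

```python
def maximal_exact_matches(ref, qry, min_len=20):
    """Return all maximal exact matches of at least min_len characters.

    Runs in O(len(ref) * len(qry)) time or worse, so it is only
    suitable for tiny examples.
    """
    matches = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            if ref[i] != qry[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (i + length < len(ref) and j + length < len(qry)
                   and ref[i + length] == qry[j + length]):
                length += 1
            if length >= min_len:
                matches.append((i + 1, j + 1, length))
    return matches
```

With ref "GGACGTACGTTT", qry "CCACGTACGAA" and min_len=5, the only match
reported is the 7 base stretch "ACGTACG" starting at position 3 in both
sequences.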
jpayne@69
|
559
|
jpayne@69
|
560
|
jpayne@69
|
561 ** mummerplot **
|
jpayne@69
|
562
|
jpayne@69
|
563 DESCRIPTION:
|
jpayne@69
|
564 mummerplot is a perl script that generates gnuplot scripts and data
|
jpayne@69
|
565 collections for plotting with the gnuplot utility. It can generate
|
jpayne@69
|
566 2-d dotplots and 1-d coverage plots for the output of mummer, nucmer,
|
jpayne@69
|
567 promer or show-tiling. It can also color dotplots with an identity
|
jpayne@69
|
568 color gradient.
|
jpayne@69
|
569
|
jpayne@69
|
570 USAGE:
|
jpayne@69
|
571 mummerplot [options] <matchfile>
|
jpayne@69
|
572
|
jpayne@69
|
573 [options] type 'mummerplot -h' for a list of options.
|
jpayne@69
|
574 <matchfile> the output of 'mummer', 'nucmer', 'promer', or
|
jpayne@69
|
575 'show-tiling'. 'mummerplot' will automatically determine
|
jpayne@69
|
576 the format of the data it was given and produce the plot
|
jpayne@69
|
577 accordingly.
|
jpayne@69
|
578
|
jpayne@69
|
579 OUTPUT:
|
jpayne@69
|
580 out.gp The gnuplot script, type 'gnuplot out.gp' to evaluate the
|
jpayne@69
|
581            gnuplot script.
|
jpayne@69
|
582 out.fplot
|
jpayne@69
|
583 out.rplot
|
jpayne@69
|
584 out.hplot The forward, reverse and highlighted match information for
|
jpayne@69
|
585 plotting with gnuplot.
|
jpayne@69
|
586
|
jpayne@69
|
587 out.ps
|
jpayne@69
|
588 out.png The plotted image file, postscript or png depending on the
|
jpayne@69
|
589 selected terminal type.
|
jpayne@69
|
590
|
jpayne@69
|
591 NOTES:
|
jpayne@69
|
592 For alignments with multiple reference or query sequences, be sure to
|
jpayne@69
|
593 use the -r -q or -R -Q options to avoid overlaying multiple plots in
|
jpayne@69
|
594 the same space. For better looking color gradient plots, try the
|
jpayne@69
|
595 postscript terminal and avoid the png terminal.
|
jpayne@69
|
596
|
jpayne@69
|
597
|
jpayne@69
|
598 ** nucmer2xfig **
|
jpayne@69
|
599
|
jpayne@69
|
600 DESCRIPTION:
|
jpayne@69
|
601 Script for plotting nucmer hits against a reference sequence. See top
|
jpayne@69
|
602 of script for more information, or see if 'mummerplot' or 'mapview'
|
jpayne@69
|
603 has the functionality required as they are properly maintained.
|
jpayne@69
|
604
|
jpayne@69
|
605
|
jpayne@69
|
606 ** repeat-match **
|
jpayne@69
|
607
|
jpayne@69
|
608 DESCRIPTION:
|
jpayne@69
|
609 Finds exact repeats within a single sequence.
|
jpayne@69
|
610
|
jpayne@69
|
611 USAGE:
|
jpayne@69
|
612 repeat-match [options] <seq>
|
jpayne@69
|
613
|
jpayne@69
|
614 [options] type 'repeat-match -h' for a list of options.
|
jpayne@69
|
615 <seq> the single sequence in FastA format to search for repeats.
|
jpayne@69
|
616
|
jpayne@69
|
617 OUTPUT:
|
jpayne@69
|
618 stdout 3 columns, the start of the first copy of the repeat, the
|
jpayne@69
|
619 start of the second copy of the repeat, and the length of the
|
jpayne@69
|
620 repeat respectively.
|
jpayne@69
|
621
|
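The three-column output described above can be read with a short Python
sketch. This is only an illustration: it assumes each record is three
whitespace-separated integers and skips every other line, so any header
lines or extra markers in the real output are ignored. The function name
and the sample text are hypothetical.

```python
def parse_repeats(text):
    """Return (start1, start2, length) tuples from three-integer lines."""
    repeats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # assumption: anything else is a header or blank line
        try:
            start1, start2, length = (int(f) for f in fields)
        except ValueError:
            continue  # assumption: non-numeric fields are not plain records
        repeats.append((start1, start2, length))
    return repeats


# Made-up sample text, not real repeat-match output.
sample = "Start1 Start2 Length\n  10  250  32\n  40  900  21\n"
longest = max(parse_repeats(sample), key=lambda r: r[2])
```

Sorting or filtering on the third field is then enough to pick out the
longest repeats.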
jpayne@69
|
622 NOTES:
|
jpayne@69
|
623 REPuter (freely available for universities) may be better suited for
|
jpayne@69
|
624 most repeat matching, but 'repeat-match' is open-source and has some
|
jpayne@69
|
625 functionality that REPuter does not so we include it along with the
|
jpayne@69
|
626 MUMmer package.
|
jpayne@69
|
627
|
jpayne@69
|
628
|
jpayne@69
|
629 ** show-aligns **
|
jpayne@69
|
630
|
jpayne@69
|
631 DESCRIPTION:
|
jpayne@69
|
632 This program parses the delta alignment output of nucmer and promer
|
jpayne@69
|
633 and displays all of the pairwise alignments from the two sequences
|
jpayne@69
|
634 specified on the command line.
|
jpayne@69
|
635
|
jpayne@69
|
636 USAGE:
|
jpayne@69
|
637 show-aligns [options] <deltafile> <IdR> <IdQ>
|
jpayne@69
|
638
|
jpayne@69
|
639 [options] type 'show-aligns -h' for a list of options.
|
jpayne@69
|
640 <deltafile> the .delta output file from either nucmer or promer.
|
jpayne@69
|
641 <IdR> the FastA header tag of the desired reference sequence.
|
jpayne@69
|
642 <IdQ> the FastA header tag of the desired query sequence.
|
jpayne@69
|
643
|
jpayne@69
|
644 OUTPUT:
|
jpayne@69
|
645 stdout each alignment header and footer describes the frame of the
|
jpayne@69
|
646 alignment in each sequence, and the start and finish
|
jpayne@69
|
647 (inclusive) of the alignment in each sequence. At the
|
jpayne@69
|
648 beginning of each line of aligned sequence are two numbers, the
|
jpayne@69
|
649 top is the coordinate of the first reference base on that line
|
jpayne@69
|
650 and the bottom is the coordinate of the first query base on
|
jpayne@69
|
651 that line. ALL coordinates reference the forward strand of the
|
jpayne@69
|
652 DNA sequence, even if it is a protein alignment. A gap caused
|
jpayne@69
|
653 by an insertion or deletion is filled with a '.' character.
|
jpayne@69
|
654 Errors in a DNA alignment are marked with a '^' below the
|
jpayne@69
|
655 error. Errors in an amino acid alignment are marked with a
|
jpayne@69
|
656 whitespace in the middle consensus line, while matches are
|
jpayne@69
|
657 marked with the consensus base and similarities are marked with
|
jpayne@69
|
658 a '+' in the consensus line.
|
jpayne@69
|
659
|
jpayne@69
|
660
|
jpayne@69
|
661 ** show-coords **
|
jpayne@69
|
662
|
jpayne@69
|
663 DESCRIPTION:
|
jpayne@69
|
664 This program parses the delta alignment output of nucmer and promer
|
jpayne@69
|
665 and displays the coordinates, and other useful information about the
|
jpayne@69
|
666 alignments.
|
jpayne@69
|
667
|
jpayne@69
|
668 USAGE:
|
jpayne@69
|
669 show-coords [options] <deltafile>
|
jpayne@69
|
670
|
jpayne@69
|
671 [options] type 'show-coords -h' for a list of options.
|
jpayne@69
|
672 <deltafile> the .delta output file from either nucmer or promer.
|
jpayne@69
|
673
|
jpayne@69
|
674 OUTPUT:
|
jpayne@69
|
675 stdout run 'show-coords' without the -H option to see the column
|
jpayne@69
|
676 header tags. Here is a description of each tag. Note that
|
jpayne@69
|
677 some of the below tags do not apply to nucmer data, and that
|
jpayne@69
|
678 all coordinates are inclusive and relative to the forward DNA
|
jpayne@69
|
679 strand.
|
jpayne@69
|
680
|
jpayne@69
|
681 [S1] Start of the alignment region in the reference sequence.
|
jpayne@69
|
682
|
jpayne@69
|
683 [E1] End of the alignment region in the reference sequence.
|
jpayne@69
|
684
|
jpayne@69
|
685 [S2] Start of the alignment region in the query sequence.
|
jpayne@69
|
686
|
jpayne@69
|
687 [E2] End of the alignment region in the query sequence.
|
jpayne@69
|
688
|
jpayne@69
|
689 [LEN 1] Length of the alignment region in the reference sequence,
|
jpayne@69
|
690 measured in nucleotides.
|
jpayne@69
|
691
|
jpayne@69
|
692 [LEN 2] Length of the alignment region in the query sequence, measured
|
jpayne@69
|
693 in nucleotides.
|
jpayne@69
|
694
|
jpayne@69
|
695 [% IDY] Percent identity of the alignment, calculated as the
|
jpayne@69
|
696 (number of exact matches) / ([LEN 1] + insertions in the query).
|
jpayne@69
|
697
|
jpayne@69
|
698 [% SIM] Percent similarity of the alignment, calculated like the above
|
jpayne@69
|
699 value, but counting positive BLOSUM matrix scores instead of exact
|
jpayne@69
|
700 matches.
|
jpayne@69
|
701
|
jpayne@69
|
702 [% STP] Percent of stop codons of the alignment, calculated as
|
jpayne@69
|
703 (number of stop codons) / (([LEN 1] + insertions in the query) * 2).
|
jpayne@69
|
704
|
jpayne@69
|
705 [LEN R] Length of the reference sequence.
|
jpayne@69
|
706
|
jpayne@69
|
707 [LEN Q] Length of the query sequence.
|
jpayne@69
|
708
|
jpayne@69
|
709 [COV R] Percent coverage of the alignment on the reference sequence,
|
jpayne@69
|
710 calculated as [LEN 1] / [LEN R].
|
jpayne@69
|
711
|
jpayne@69
|
712 [COV Q] Percent coverage of the alignment on the query sequence,
|
jpayne@69
|
713 calculated as [LEN 2] / [LEN Q].
|
jpayne@69
|
714
|
jpayne@69
|
715 [FRM] Reading frame for the reference sequence and the reading frame
|
jpayne@69
|
716 for the query sequence respectively. This is one of the columns
|
jpayne@69
|
717 absent from the nucmer data, however, match direction can easily be
|
jpayne@69
|
718 determined by the start and end coordinates.
|
jpayne@69
|
719
|
jpayne@69
|
720 [TAGS] The reference FastA ID and the query FastA ID.
|
jpayne@69
|
721
|
jpayne@69
|
722 There is also an optional final column (turned on with the -w
|
jpayne@69
|
723 or -o option) that will contain some 'annotations'. The -o option will
|
jpayne@69
|
724 annotate alignments that represent overlaps between two sequences,
|
jpayne@69
|
725 while the -w option is antiquated and should no longer be used.
|
jpayne@69
|
726 Sometimes, nucmer or promer will extend adjacent clusters past one
|
jpayne@69
|
727            another, thus causing somewhat redundant output; this option will
|
jpayne@69
|
728 notify users of such rare occurrences.
|
jpayne@69
|
729
|
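The [% IDY], [COV R] and [COV Q] formulas above can be written out as a
small Python sketch. It only restates the arithmetic given in the tag
descriptions; the function names are hypothetical, and the direction
helper assumes that a query start greater than the query end indicates a
reverse match.

```python
def percent_identity(exact_matches, len1, query_insertions):
    """[% IDY]: exact matches / ([LEN 1] + insertions in the query)."""
    return 100.0 * exact_matches / (len1 + query_insertions)


def percent_coverage(aligned_len, seq_len):
    """[COV R] or [COV Q]: aligned length / full sequence length."""
    return 100.0 * aligned_len / seq_len


def query_direction(s2, e2):
    """Assumption: start > end on the query means a reverse match."""
    return "reverse" if s2 > e2 else "forward"
```

For example, 950 exact matches over a [LEN 1] of 990 with 10 query
insertions gives percent_identity(950, 990, 10), which is 95 percent.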
jpayne@69
|
730 NOTES:
|
jpayne@69
|
731 The -c and -l options are useful when comparing two sets of assembly
|
jpayne@69
|
732 contigs, in that these options help determine if an alignment spans an
|
jpayne@69
|
733 entire contig, or is just a partial hit to a different read. The -b
|
jpayne@69
|
734     option is useful when the user wishes to identify syntenic regions
|
jpayne@69
|
735 between two genomes, but is not particularly interested in the actual
|
jpayne@69
|
736 alignment similarity or appearance. This option also disregards match
|
jpayne@69
|
737 orientation, so should not be used if this information is needed.
|
jpayne@69
|
738
|
jpayne@69
|
739
|
jpayne@69
|
740 ** show-diff **
|
jpayne@69
|
741
|
jpayne@69
|
742 DESCRIPTION:
|
jpayne@69
|
743 This program classifies alignment breakpoints for the
|
jpayne@69
|
744 quantification of macroscopic differences between two
|
jpayne@69
|
745 genomes. It takes a standard, unfiltered delta file as input,
|
jpayne@69
|
746 determines the best mapping between the two sequence sets, and
|
jpayne@69
|
747 reports on the breaks in that mapping.
|
jpayne@69
|
748
|
jpayne@69
|
749 USAGE:
|
jpayne@69
|
750 show-diff [options] <deltafile>
|
jpayne@69
|
751
|
jpayne@69
|
752 [options] type 'show-diff -h' for a list of options.
|
jpayne@69
|
753 <deltafile> the .delta output file from nucmer
|
jpayne@69
|
754
|
jpayne@69
|
755 OUTPUT:
|
jpayne@69
|
756 stdout Classified breakpoints are output one per line with
|
jpayne@69
|
757 the following types and column definitions. The first
|
jpayne@69
|
758 five columns of every row are seq ID, feature type,
|
jpayne@69
|
759 feature start, feature end, and feature length.
|
jpayne@69
|
760
|
jpayne@69
|
761 Feature Columns
|
jpayne@69
|
762
|
jpayne@69
|
763 IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff
|
jpayne@69
|
764 IDR DUP dup-start dup-end dup-length
|
jpayne@69
|
765 IDR BRK gap-start gap-end gap-length
|
jpayne@69
|
766 IDR JMP gap-start gap-end gap-length
|
jpayne@69
|
767 IDR INV gap-start gap-end gap-length
|
jpayne@69
|
768 IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence
|
jpayne@69
|
769
|
jpayne@69
|
770 Feature Types
|
jpayne@69
|
771
|
jpayne@69
|
772 [GAP] A gap between two mutually consistent ordered and
|
jpayne@69
|
773 oriented alignments. gap-length-R is the length of the
|
jpayne@69
|
774 alignment gap in the reference, gap-length-Q is the length of
|
jpayne@69
|
775 the alignment gap in the query, and gap-diff is the difference
|
jpayne@69
|
776 between the two gap lengths. If gap-diff is positive, sequence
|
jpayne@69
|
777 has been inserted in the reference. If gap-diff is negative,
|
jpayne@69
|
778 sequence has been deleted from the reference. If both
|
jpayne@69
|
779 gap-length-R and gap-length-Q are negative, the indel is
|
jpayne@69
|
780            a tandem duplication copy difference.
|
jpayne@69
|
781
|
jpayne@69
|
782 [DUP] A duplicated sequence in the reference that occurs more
|
jpayne@69
|
783 times in the reference than in the query. The coordinate
|
jpayne@69
|
784 columns specify the bounds and length of the
|
jpayne@69
|
785 duplication. These features are often bookended by BRK
|
jpayne@69
|
786 features if there is unique sequence bounding the duplication.
|
jpayne@69
|
787
|
jpayne@69
|
788 [BRK] An insertion in the reference of unknown origin, that
|
jpayne@69
|
789 indicates no query sequence aligns to the sequence bounded by
|
jpayne@69
|
790 gap-start and gap-end. Often found around DUP elements or at
|
jpayne@69
|
791 the beginning or end of sequences.
|
jpayne@69
|
792
|
jpayne@69
|
793 [JMP] A relocation event, where the consistent ordering of
|
jpayne@69
|
794 alignments is disrupted. The coordinate columns specify the
|
jpayne@69
|
795 breakpoints of the relocation in the reference, and the
|
jpayne@69
|
796 gap-length between them. A negative gap-length indicates the
|
jpayne@69
|
797 relocation occurred around a repetitive sequence, and a
|
jpayne@69
|
798 positive length indicates unique sequence between the
|
jpayne@69
|
799 alignments.
|
jpayne@69
|
800
|
jpayne@69
|
801 [INV] The same as a relocation event, however both the
|
jpayne@69
|
802            ordering and orientation of the alignments are disrupted. Note
|
jpayne@69
|
803 that for JMP and INV, generally two features will be output,
|
jpayne@69
|
804 one for the beginning of the inverted region, and another for
|
jpayne@69
|
805 the end of the inverted region.
|
jpayne@69
|
806
|
jpayne@69
|
807 [SEQ] A translocation event that requires jumping to a new
|
jpayne@69
|
808 query sequence in order to continue aligning to the
|
jpayne@69
|
809 reference. If each input sequence is a chromosome, these
|
jpayne@69
|
810 features correspond to inter-chromosomal translocations.
|
jpayne@69
|
811
|
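The sign rules given for GAP features can be summarized in a short
Python sketch. The function name and the returned labels are
hypothetical, and checking the "both negative" case first is an
assumption about precedence that the description above does not state.

```python
def describe_gap(gap_length_r, gap_length_q, gap_diff):
    """Interpret the three GAP length columns using the sign rules above."""
    # Assumption: the tandem duplication case takes precedence.
    if gap_length_r < 0 and gap_length_q < 0:
        return "tandem duplication copy difference"
    if gap_diff > 0:
        return "insertion in reference"
    if gap_diff < 0:
        return "deletion from reference"
    return "no length difference"
```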
jpayne@69
|
812 NOTES:
|
jpayne@69
|
813 The estimated number of features, take inversions for example,
|
jpayne@69
|
814 represents the number of breakpoints classified as bordering
|
jpayne@69
|
815 an inversion. Therefore, since there will be a breakpoint at
|
jpayne@69
|
816 both the beginning and the end of an inversion, the feature
|
jpayne@69
|
817 counts are roughly double the number of inversion events. In
|
jpayne@69
|
818 addition, all counts are estimates and do not represent the
|
jpayne@69
|
819 exact number of each evolutionary event.
|
jpayne@69
|
820
|
jpayne@69
|
821     Summing the fifth column (ignoring negative values) yields an
|
jpayne@69
|
822 estimate of the total inserted sequence in the
|
jpayne@69
|
823 reference. Summing the fifth column after removing DUP
|
jpayne@69
|
824 features yields an estimate of the total amount of unique
|
jpayne@69
|
825 (unaligned) sequence in the reference. Note that unaligned
|
jpayne@69
|
826 sequences are not counted, and could represent additional
|
jpayne@69
|
827 "unique" sequences. Use the 'dnadiff' script if you must
|
jpayne@69
|
828 recover this information. Finally, the -q option switches
|
jpayne@69
|
829 references for queries, and uses the query coordinates for the
|
jpayne@69
|
830 analysis.
|
jpayne@69
|
831
|
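The two sums described in these notes can be sketched in Python as
follows. The sketch assumes whitespace-separated columns with the
feature type in the second column and the feature length in the fifth,
as listed under OUTPUT, and it assumes negative lengths are ignored in
both sums. The function name and the sample rows are hypothetical.

```python
def summarize_lengths(text):
    """Return (total inserted, total without DUP features) for the reference."""
    total = 0
    total_no_dup = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a feature row
        feature_type = fields[1]
        length = int(fields[4])
        if length < 0:
            continue  # negative values are ignored
        total += length
        if feature_type != "DUP":
            total_no_dup += length
    return total, total_no_dup


# Made-up rows, not real show-diff output.
sample = (
    "ref1 GAP 100 150 51 40 11\n"
    "ref1 DUP 300 399 100\n"
    "ref1 BRK 1 20 20\n"
    "ref1 JMP 500 480 -19\n"
)
inserted, unique = summarize_lengths(sample)
```

With these sample rows the first sum counts the GAP, DUP and BRK
lengths, and the second drops the DUP row.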
jpayne@69
|
832
|
jpayne@69
|
833 ** show-snps **
|
jpayne@69
|
834
|
jpayne@69
|
835 DESCRIPTION:
|
jpayne@69
|
836     This program reports polymorphisms contained in a delta-encoded
|
jpayne@69
|
837 alignment file output by either nucmer or promer. It catalogs
|
jpayne@69
|
838 all of the single nucleotide polymorphisms (SNPs) and
|
jpayne@69
|
839 insertions/deletions within the delta file
|
jpayne@69
|
840 alignments. Polymorphisms are reported one per line, in a
|
jpayne@69
|
841 delimited fashion similar to show-coords. Pairing this program
|
jpayne@69
|
842 with the appropriate MUMmer tools can create an easy to use
|
jpayne@69
|
843 SNP pipeline for the rapid identification of putative SNPs
|
jpayne@69
|
844 between any two sequence sets.
|
jpayne@69
|
845
|
jpayne@69
|
846 USAGE:
|
jpayne@69
|
847 show-snps [options] <deltafile>
|
jpayne@69
|
848
|
jpayne@69
|
849 [options] type 'show-snps -h' for a list of options.
|
jpayne@69
|
850 <deltafile> the .delta output file from either nucmer or promer.
|
jpayne@69
|
851
|
jpayne@69
|
852 OUTPUT:
|
jpayne@69
|
853 stdout Standard output has column headers with the following
|
jpayne@69
|
854 meanings. Not all columns will be output by default,
|
jpayne@69
|
855            see 'show-snps -h' for switches that control the output.
|
jpayne@69
|
856
|
jpayne@69
|
857 [P1] SNP position in the reference.
|
jpayne@69
|
858
|
jpayne@69
|
859 [SUB] Character in the reference.
|
jpayne@69
|
860
|
jpayne@69
|
861 [SUB] Character in the query.
|
jpayne@69
|
862
|
jpayne@69
|
863 [P2] SNP position in the query.
|
jpayne@69
|
864
|
jpayne@69
|
865 [BUFF] Distance from this SNP to the nearest mismatch (end of
|
jpayne@69
|
866 alignment, indel, SNP, etc) in the same alignment.
|
jpayne@69
|
867
|
jpayne@69
|
868 [DIST] Distance from this SNP to the nearest sequence end.
|
jpayne@69
|
869
|
jpayne@69
|
870 [R] Number of repeat alignments which cover this reference
|
jpayne@69
|
871 position, >0 means repetitive sequence.
|
jpayne@69
|
872
|
jpayne@69
|
873 [Q] Number of repeat alignments which cover this query
|
jpayne@69
|
874 position, >0 means repetitive sequence.
|
jpayne@69
|
875
|
jpayne@69
|
876 [LEN R] Length of the reference sequence.
|
jpayne@69
|
877
|
jpayne@69
|
878 [LEN Q] Length of the query sequence.
|
jpayne@69
|
879
|
jpayne@69
|
880 [CTX R] Surrounding context sequence in the reference.
|
jpayne@69
|
881
|
jpayne@69
|
882 [CTX Q] Surrounding context sequence in the query.
|
jpayne@69
|
883
|
jpayne@69
|
884 [FRM] Reading frame for the reference sequence and the
|
jpayne@69
|
885 reading frame for the query sequence respectively. Simply
|
jpayne@69
|
886 'forward' 1, or 'reverse' -1 for nucmer data.
|
jpayne@69
|
887
|
jpayne@69
|
888 [TAGS] The reference FastA ID and the query FastA ID.
|
jpayne@69
|
889
|
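As a small illustration of the [R] and [Q] columns, the Python sketch
below keeps only SNPs that fall in non-repetitive sequence on both
sides. It assumes the records have already been parsed into dictionaries
keyed by column tag; that representation and the function name are
hypothetical and are not something 'show-snps' produces itself.

```python
def uniquely_aligned(snps):
    """Keep SNP records whose [R] and [Q] repeat counts are both zero."""
    return [s for s in snps if s["R"] == 0 and s["Q"] == 0]


# Made-up records for illustration only.
records = [
    {"P1": 120, "P2": 118, "R": 0, "Q": 0},
    {"P1": 450, "P2": 447, "R": 2, "Q": 0},
    {"P1": 900, "P2": 905, "R": 0, "Q": 1},
]
kept = uniquely_aligned(records)
```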
jpayne@69
|
890 NOTES:
|
jpayne@69
|
891     It is often helpful to run this with the -C option to ensure
|
jpayne@69
|
892 reported SNPs are only reported from uniquely aligned regions.
|
jpayne@69
|
893
|
jpayne@69
|
894
|
jpayne@69
|
895 ** show-tiling **
|
jpayne@69
|
896
|
jpayne@69
|
897 DESCRIPTION:
|
jpayne@69
|
898 This program attempts to construct a tiling path out of the query
|
jpayne@69
|
899 contigs as mapped to the reference sequences. Given the delta
|
jpayne@69
|
900 alignment information of a few long reference sequences and many small
|
jpayne@69
|
901 query contigs, 'show-tiling' will determine the best location on a
|
jpayne@69
|
902 reference for each contig. Note that each contig may only be tiled
|
jpayne@69
|
903 once, so repetitive regions may cause this program some difficulty.
|
jpayne@69
|
904 This program is useful for aiding in the scaffolding and closure of an
|
jpayne@69
|
905 unfinished set of contigs, if a suitable, high similarity, reference
|
jpayne@69
|
906 genome is available. Or, if using promer, 'show-tiling' will help
|
jpayne@69
|
907 in the identification of syntenic regions and their contig's mapping
|
jpayne@69
|
908     to the references.
|
jpayne@69
|
909
|
jpayne@69
|
910 USAGE:
|
jpayne@69
|
911 show-tiling [options] <deltafile>
|
jpayne@69
|
912
|
jpayne@69
|
913 [options] type 'show-tiling -h' for a list of options.
|
jpayne@69
|
914 <deltafile> the .delta output file from either nucmer or promer.
|
jpayne@69
|
915
|
jpayne@69
|
916 OUTPUT:
|
jpayne@69
|
917 stdout Standard output has 8 columns: start in reference, end in
|
jpayne@69
|
918 reference, gap between this contig and the next, length of this
|
jpayne@69
|
919 contig, alignment coverage of this contig, average percent
|
jpayne@69
|
920 identity of the alignments for this contig, orientation of this
|
jpayne@69
|
921 contig, contig ID. All matches to a reference are headed by the
|
jpayne@69
|
922 FASTA tag of that reference. Output with the -a option is the
|
jpayne@69
|
923 same as 'show-coords -cl' when run on nucmer data.
|
jpayne@69
|
924
|
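The eight-column layout described above can be read with a Python
sketch like the following. It assumes whitespace-separated columns and
that each reference header line starts with '>' followed by the FASTA
tag; both points, along with the function name and the sample text, are
assumptions made for illustration.

```python
def parse_tiling(text):
    """Group tiled contigs by reference tag."""
    tiling = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            # Assumption: the first token after '>' is the reference tag.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            tiling[current] = []
            continue
        fields = line.split()
        if current is None or len(fields) != 8:
            continue  # not an eight-column contig row
        start, end, gap, length = (int(f) for f in fields[:4])
        tiling[current].append({
            "start": start,
            "end": end,
            "gap": gap,
            "length": length,
            "coverage": float(fields[4]),
            "identity": float(fields[5]),
            "orientation": fields[6],
            "contig": fields[7],
        })
    return tiling


# Made-up sample text, not real show-tiling output.
sample = (
    ">ref1 5000 bases\n"
    "1\t1000\t50\t1000\t100.00\t99.50\t+\tcontigA\n"
    "1051\t2000\t-10\t950\t98.00\t97.25\t-\tcontigB\n"
)
tiles = parse_tiling(sample)
```

Each reference tag then maps to its contigs in the order they appear,
which is convenient when inspecting the gaps between neighboring contigs.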
jpayne@69
|
925 NOTES:
|
jpayne@69
|
926 When run with the -x option, 'show-tiling' will produce an XML output
|
jpayne@69
|
927 format that can be accepted by TIGR's open source scaffolding software
|
jpayne@69
|
928 'Bambus' as contig linking information.
|
jpayne@69
|
929
|
jpayne@69
|
930
|
jpayne@69
|
931 -- CONTACT INFORMATION --
|
jpayne@69
|
932
|
jpayne@69
|
933 Please address questions and bug reports to: <mummer-help@lists.sourceforge.net>
|
jpayne@69
|
934
|
jpayne@69
|
935 Last Revised May 12, 2005
|