comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/README @ 69:33d812a61356
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 17:55:14 -0400 |
parents | |
children |
67:0e9998148a16 | 69:33d812a61356 |
---|---|
1 -=- MUMmer3.x README -=- | |
2 | |
3 ** NOTE ** | |
4 A comprehensive HTML user manual is available in the docs/web/manual | |
5 subdirectory or at http://mummer.sourceforge.net/manual | |
6 | |
7 MUMmer is now an open source package! Please contact us if you would like | |
8 to contribute to the MUMmer project. For more information or the latest | |
9 release please visit the MUMmer homepage at http://mummer.sourceforge.net | |
10 | |
11 Please refer to the INSTALL file for installation instructions. This file | |
12 contains brief descriptions of all executables in the base directory and | |
13 general information about the MUMmer package. | |
14 | |
15 | |
16 | |
17 -- DESCRIPTION -- | |
18 MUMmer is a system for rapidly aligning entire genomes. The current | |
19 version (release 3.0) can find all 20 base pair maximal exact matches between | |
20 two bacterial genomes of ~5 million base pairs each in 20 seconds, using 90 MB | |
21 of memory, on a typical 1.8 GHz Linux desktop computer. MUMmer can also align | |
22 incomplete genomes; it handles the 100s or 1000s of contigs from a shotgun | |
23 sequencing project with ease, and will align them to another set of contigs or | |
24 a genome, using the nucmer utility included with the system. The promer | |
25 utility takes this a step further by generating alignments based upon the | |
26 six-frame translations of both input sequences. promer permits the alignment | |
27 of genomes for which the proteins are similar but the DNA sequence is too | |
28 divergent to detect similarity. See the nucmer and promer readme files in the | |
29 "docs/" subdirectory for more details. MUMmer is open source, so all we ask | |
30 is that you cite our most recent paper in any publications that use this | |
31 system: | |
32 | |
33 (Version 3.0 described) | |
34 Versatile and open software for comparing large genomes. | |
35 S. Kurtz, A. Phillippy, A.L. Delcher, | |
36 M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg. | |
37 Genome Biology (2004), 5:R12. | |
38 | |
39 (Version 2.1 described) | |
40 Fast algorithms for large-scale genome alignment and comparison. | |
41 A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. | |
42 Nucleic Acids Research 30:11 (2002), 2478-2483. | |
43 | |
44 (Version 1.0 described) | |
45 Alignment of Whole Genomes. | |
46 A.L. Delcher, S. Kasif, | |
47 R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg. | |
48 Nucleic Acids Research, 27:11 (1999), 2369-2376. | |
49 | |
50 | |
51 -- RUNNING MUMmer3.0 -- | |
52 MUMmer3.0 comprises a variety of utilities and scripts. For general | |
53 purposes, the scripts "run-mummer1", "run-mummer3", "nucmer", and "promer" | |
54 will be all that is needed. See their descriptions in the "RUNNING THE MUMmer | |
55 SCRIPTS" section, or refer to their individual documentation in the "docs/" | |
56 subdirectory. Refer to the "RUNNING THE MUMmer UTILITIES" section for a brief | |
57 description of all of the utilities in this directory. | |
58 | |
59 Simple use case: | |
60 Given a file containing a single reference sequence (ref.seq) in | |
61 FASTA format and another file containing multiple sequences in FastA | |
62 format (qry.seq) type the following at the command line: | |
63 | |
64 './nucmer -p <prefix> ref.seq qry.seq' | |
65 | |
66 To produce the following files: | |
67 <prefix>.delta | |
68 | |
69 or | |
70 | |
71 './run-mummer3.csh ref.seq qry.seq <prefix>' | |
72 | |
73 To produce the following files: | |
74 <prefix>.out | |
75 <prefix>.gaps | |
76 <prefix>.align | |
77 <prefix>.errorsgaps | |
78 | |
79 Please read the utility-specific documentation in the "docs/" subdirectory | |
80 for descriptions of these files and information on how to change the | |
81 alignment parameters for the scripts (minimum match length, etc.), or see | |
82 the notes below in the "RUNNING THE MUMmer SCRIPTS" section for a brief | |
83 explanation. | |
84 | |
85 To see a simple gnuplot output, if you have gnuplot installed, run | |
86 the perl script 'mummerplot' on the output files. This script can be run | |
87 on mummer output (.out), or nucmer/promer output (.delta). Edit the | |
88 <prefix>.gp file that is created to change colors, line thicknesses, etc. or | |
89 explore the <prefix>.[fr]plot file to see the data collection. | |
90 | |
91 './mummerplot -p <prefix> <prefix>.out' | |
92 | |
93 Or you can use the web viewer for completed microbial genomes: | |
94 http://www.tigr.org/CMR | |
95 | |
96 | |
97 | |
98 -- RUNNING THE MUMmer SCRIPTS -- | |
99 Because of MUMmer's modular design, it may be necessary to use a number | |
100 of separate programs to produce the desired output. The MUMmer scripts | |
101 attempt to simplify this process by wrapping various utilities into packages | |
102 that can perform standard alignment requests. Listed below are brief | |
103 descriptions and usage definitions for these scripts. Please refer to the | |
104 "docs/" subdirectory for a more detailed description of each script. | |
105 | |
106 | |
107 ** nucmer ** | |
108 | |
109 DESCRIPTION: | |
110 nucmer is for the all-vs-all comparison of nucleotide sequences | |
111 contained in multi-FastA data files. It is best used for highly | |
112 similar sequences that may have large rearrangements. Common use | |
113 cases are: comparing two unfinished shotgun sequencing assemblies, | |
114 mapping an unfinished sequencing assembly to a finished genome, and | |
115 comparing two fairly similar genomes that may have large | |
116 rearrangements and duplications. Please refer to "docs/nucmer.README" | |
117 for more information regarding this script and its output, or type | |
118 'nucmer -h' for a list of its options. | |
119 | |
120 USAGE: | |
121 nucmer [options] <reference> <query> | |
122 | |
123 [options] type 'nucmer -h' for a list of options. | |
124 <reference> specifies the multi-FastA sequence file that contains | |
125 the reference sequences, to be aligned with the queries. | |
126 <query> specifies the multi-FastA sequence file that contains | |
127 the query sequences, to be aligned with the references. | |
128 | |
129 OUTPUT: | |
130 out.delta the delta encoded alignments between the reference and | |
131 query sequences. This file can be parsed with any of | |
132 the show-* programs which are described in the "RUNNING | |
133 THE MUMmer UTILITIES" section. | |
134 | |
135 NOTES: | |
136 All output coordinates reference the forward strand of the involved | |
137 sequence, regardless of the match direction. Also, nucmer now uses | |
138 only matches that are unique in the reference sequence by default; | |
139 use the '--mum' or '--maxmatch' options to change this behavior. | |
140 | |
141 | |
142 ** promer ** | |
143 | |
144 DESCRIPTION: | |
145 promer is for the protein level, all-vs-all comparison of nucleotide | |
146 sequences contained in multi-FastA data files. The nucleotide input | |
147 files are translated in all 6 reading frames and then aligned to one | |
148 another via the same methods as nucmer. It is best used for highly | |
149 divergent sequences that may have moderate to high similarity on the | |
150 protein level. Common use cases are: identifying syntenic regions | |
151 between highly divergent genomes, comparative genome annotation i.e. | |
152 using an already annotated genome to help in the annotation of a | |
153 newly sequenced genome, and the general comparison of two fairly | |
154 divergent genomes that have large rearrangements and may only be | |
155 similar on the protein level. Please refer to "docs/promer.README" | |
156 for more information regarding this script and its output, or type | |
157 'promer -h' for a list of its options. | |
158 | |
159 USAGE: | |
160 promer [options] <reference> <query> | |
161 | |
162 [options] type 'promer -h' for a list of options. | |
163 <reference> specifies the multi-FastA sequence file that contains | |
164 the reference sequences, to be aligned with the queries. | |
165 <query> specifies the multi-FastA sequence file that contains | |
166 the query sequences, to be aligned with the references. | |
167 | |
168 OUTPUT: | |
169 out.delta the delta encoded alignments between the reference and | |
170 query sequences. This file can be parsed with any of | |
171 the show-* programs which are described in the "RUNNING | |
172 THE MUMmer UTILITIES" section. | |
173 | |
174 NOTES: | |
175 All output coordinates reference the forward strand of the involved | |
176 sequence, regardless of the match direction, and are measured in | |
177 nucleotides with the exception of the delta integers which are | |
178 measured in amino acids (1 delta int = 3 nucleotides). Also, promer | |
179 now uses only matches that are unique in the reference sequence by | |
180 default; use the '--mum' or '--maxmatch' options to change this | |
181 behavior. | |
182 | |
183 | |
184 ** run-mummer1 ** | |
185 | |
186 DESCRIPTION: | |
187 This script is taken directly from MUMmer1.0 and is best used to | |
188 align two sequences in which there is high similarity and no | |
189 rearrangements. A common use case is aligning two finished bacterial | |
190 chromosomes. Please refer to "docs/run-mummer1.README" for the | |
191 original documentation for this script and its output. | |
192 | |
193 USAGE: | |
194 run-mummer1 <seq1> <seq2> <tag> [-r] | |
195 | |
196 <seq1> specifies the file with the first sequence in FastA format. | |
197 No more than one sequence is allowed. | |
198 <seq2> specifies the file with the second sequence in FastA format. | |
199 No more than one sequence is allowed. | |
200 <tag> specifies the prefix to be used for the output files. | |
201 [-r] is an optional parameter that will reverse complement the | |
202 second sequence. | |
203 | |
204 OUTPUT: | |
205 out.align the out.gaps file interspersed with the alignments | |
206 of the gaps. | |
207 out.errorsgaps the out.gaps file with an extra column stating the | |
208 number of errors contained in each gap. | |
209 out.gaps an ordered (clustered) list of matches with position | |
210 information, and gap distances between each match. | |
211 out.out a list of all maximal unique matches between the two | |
212 input sequences ordered by their start position in the | |
213 second sequence. | |
214 | |
215 NOTES: | |
216 All output coordinates reference their respective strand. This means | |
217 that if the -r switch is active, coordinates that reference the | |
218 second sequence will be relative to the reverse complement of the | |
219 second sequence. Please use nucmer or promer if this coordinate | |
220 system is confusing. | |
221 Eventually, this script's components will be rewritten to work | |
222 with the new MUMmer format standards and phased out in favor of the | |
223 new components and wrapping script. | |
224 | |
225 | |
226 ** run-mummer3 ** | |
227 | |
228 DESCRIPTION: | |
229 This script is the improved version of the MUMmer1.0 run-mummer1 | |
230 script. It uses a new clustering algorithm that appropriately | |
231 handles multiple sequence rearrangements and inversions. Because | |
232 of this, it can handle more divergent sequences better than | |
233 run-mummer1. In addition, it allows a multi-FastA query file for | |
234 1-vs-many sequence comparisons. Please refer to | |
235 "docs/run-mummer3.README" for more detailed documentation of this | |
236 script and its output. | |
237 | |
238 USAGE: | |
239 run-mummer3 <reference> <query> <prefix> | |
240 | |
241 <reference> specifies the file with the reference sequence in FastA | |
242 format. No more than one sequence is allowed. | |
243 <query> specifies the multi-FastA sequence file that contains | |
244 the query sequences. | |
245 <prefix> specifies the file prefix for the output files. | |
246 | |
247 OUTPUT: | |
248 out.align the out.gaps file interspersed with the alignments | |
249 of the gaps. | |
250 out.errorsgaps the out.gaps file with an extra column stating the | |
251 number of errors contained in each gap. | |
252 out.gaps an ordered (clustered) list of matches with position | |
253 information, and gap distances between each match. | |
254 out.out a list of all maximal unique matches between the two | |
255 input sequences ordered by their start position in the | |
256 second sequence. | |
257 | |
258 NOTES: | |
259 All output coordinates reference their respective strand. This means | |
260 that for all reverse matches, the coordinates that reference the | |
261 query sequence will be relative to the reverse complement of the | |
262 query sequence. Please use nucmer or promer if this coordinate | |
263 system is confusing. | |
264 | |
265 | |
266 ** dnadiff ** | |
267 | |
268 DESCRIPTION: | |
269 This script is a wrapper around nucmer that builds an | |
270 alignment using default parameters, and runs many of nucmer's | |
271 helper scripts to process the output and report alignment | |
272 statistics, SNPs, breakpoints, etc. It is designed for | |
273 evaluating the sequence and structural similarity of two | |
274 highly similar sequence sets. E.g. comparing two different | |
275 assemblies of the same organism, or comparing two strains of | |
276 the same species. Please refer to "docs/dnadiff.README" for | |
277 more information regarding this script and its output, or type | |
278 'dnadiff -h' for a list of its options. | |
279 | |
280 USAGE: dnadiff [options] <reference> <query> | |
281 or dnadiff [options] -d <delta file> | |
282 | |
283 <reference> Set the input reference multi-FASTA filename | |
284 <query> Set the input query multi-FASTA filename | |
285 or | |
286 <delta file> Unfiltered .delta alignment file from nucmer | |
287 | |
288 OUTPUT: | |
289 .report - Summary of alignments, differences and SNPs | |
290 .delta - Standard nucmer alignment output | |
291 .1delta - 1-to-1 alignment from delta-filter -1 | |
292 .mdelta - M-to-M alignment from delta-filter -m | |
293 .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta | |
294 .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta | |
295 .snps - SNPs from show-snps -rlTHC .1delta | |
296 .rdiff - Classified ref breakpoints from show-diff -rH .mdelta | |
297 .qdiff - Classified qry breakpoints from show-diff -qH .mdelta | |
298 .unref - Unaligned reference IDs and lengths (if applicable) | |
299 .unqry - Unaligned query IDs and lengths (if applicable) | |
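
The OUTPUT list above names the helper commands behind most dnadiff files. As a rough sketch only (this is not dnadiff's actual implementation), the Python snippet below builds the equivalent command lines, using only the flags quoted in the list plus the 'nucmer -p <prefix>' form from the simple use case. The function name dnadiff_like_steps and the file names ref.seq and qry.seq are illustrative assumptions. The .report, .unref and .unqry files are written by dnadiff itself and are not covered here.

```python
import shlex


def dnadiff_like_steps(prefix, reference, query):
    """Return (argv, stdout_file) pairs mirroring the dnadiff OUTPUT list.

    stdout_file is None when the tool names its own output file
    (nucmer writes <prefix>.delta when given -p <prefix>).
    """
    delta = f"{prefix}.delta"
    one_to_one = f"{prefix}.1delta"
    many_to_many = f"{prefix}.mdelta"
    return [
        (["nucmer", "-p", prefix, reference, query], None),
        (["delta-filter", "-1", delta], one_to_one),
        (["delta-filter", "-m", delta], many_to_many),
        (["show-coords", "-THrcl", one_to_one], f"{prefix}.1coords"),
        (["show-coords", "-THrcl", many_to_many], f"{prefix}.mcoords"),
        (["show-snps", "-rlTHC", one_to_one], f"{prefix}.snps"),
        (["show-diff", "-rH", many_to_many], f"{prefix}.rdiff"),
        (["show-diff", "-qH", many_to_many], f"{prefix}.qdiff"),
    ]


# Print the steps as shell-style command lines for inspection.
for argv, stdout_file in dnadiff_like_steps("out", "ref.seq", "qry.seq"):
    command = shlex.join(argv)
    print(command if stdout_file is None else f"{command} > {stdout_file}")
```

Printing the commands instead of running them keeps the sketch usable without a MUMmer installation; each argv list could be passed to subprocess.run if the tools are on the system path.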
300 | |
301 NOTES: | |
302 The report file generated by this script can be useful for | |
303 comparing the differences between two similar genomes or | |
304 assemblies. The other outputs generated by this script are in | |
305 unlabeled tabular format, so please refer to the utility | |
306 specific documentation for interpreting them. A full | |
307 description of the report file is given in "docs/dnadiff.README". | |
308 | |
309 | |
310 -- RUNNING THE MUMmer UTILITIES -- | |
311 The MUMmer package consists of various utilities that can interact with | |
312 the 'mummer' program. 'mummer' performs all maximal and maximal unique | |
313 matching, and all other utilities were designed to process the input and | |
314 output of this program and its related scripts, in order to extract | |
315 additional information from the output. Listed below are the descriptions | |
316 and usage definitions for these utilities. | |
317 | |
318 | |
319 ** annotate ** | |
320 | |
321 DESCRIPTION: | |
322 This program reads the output of the 'gaps' program and adds alignment | |
323 information to it. It is part of the original MUMmer1.0 pipeline and can | |
324 only be used on the output of the 'gaps' program. | |
325 | |
326 USAGE: | |
327 annotate <gapsfile> <seq2> | |
328 | |
329 <gapsfile> the output of the 'gaps' program. | |
330 <seq2> the file containing the second sequence in the comparison. | |
331 | |
332 OUTPUT: | |
333 stdout the 'gaps' output interspersed with the alignments of | |
334 the gaps between adjacent MUMs. An alignment of a | |
335 gap comes after the second MUM defining the gap, and | |
336 alignment errors are marked with a '^' character. | |
337 witherrors.gaps the 'gaps' output with an appended column that lists | |
338 the number of alignment errors for each gap. | |
339 | |
340 NOTES: | |
341 This program will eventually be dropped in favor of the combineMUMs | |
342 or nucmer match extenders, but persists for the time being. | |
343 | |
344 | |
345 ** combineMUMs ** | |
346 | |
347 DESCRIPTION: | |
348 This program reads the output of the 'mgaps' program and adds alignment | |
349 information to it. It is part of the MUMmer3.0 pipeline and can only be | |
350 used on the output of the 'mgaps' program. The -D option changes the | |
351 output so that only the positions of difference, e.g. SNPs, are reported. | |
352 | |
353 USAGE: | |
354 combineMUMs [options] <reference> <query> <mgapsfile> | |
355 | |
356 [options] type 'combineMUMs -h' for a list of options. | |
357 <reference> the FastA reference file used in the comparison. | |
358 <query> the multi-FastA query file used in the comparison. | |
359 <mgapsfile> the output of the 'mgaps' program run on the match | |
360 list produced by 'mummer' for the reference and query | |
361 files. | |
362 | |
363 OUTPUT: | |
364 stdout the 'mgaps' output interspersed with the alignments | |
365 of the gaps between adjacent MUMs. An alignment of a | |
366 gap comes after the second MUM defining the gap, and | |
367 alignment errors are marked with a '^' character. At | |
368 the end of each cluster is a summary line (keyword | |
369 "Region") noting the bounds of the cluster in the | |
370 reference and query sequences, the total number of | |
371 errors for the region, the length of the region and | |
372 the percent error of the region. | |
373 witherrors.gaps the 'mgaps' output with an appended column that lists | |
374 the number of alignment errors for each gap. | |
375 | |
376 | |
377 ** delta-filter ** | |
378 | |
379 DESCRIPTION: | |
380 | |
381 This program filters a delta alignment file produced by either | |
382 nucmer or promer, leaving only the desired alignments which | |
383 are output to stdout in the same delta format as the | |
384 input. Its primary function is the LIS algorithm which | |
385 calculates the longest increasing subset of alignments. This | |
386 allows for the calculation of a global set of alignments | |
387 (i.e. 1-to-1 and mutually consistent order) with the -g option | |
388 or locally consistent with -1 or -m. Reference sequences can | |
389 be mapped to query sequences with -r, or queries to references | |
390 with -q. This allows the user to exclude chance and repeat | |
391 induced alignments, leaving only the "best" alignments between | |
392 the two data sets. Filtering can also be performed on length, | |
393 identity, and uniqueness. | |
394 | |
395 USAGE: | |
396 delta-filter [options] <deltafile> | |
397 | |
398 [options] type 'delta-filter -h' for a list of options. | |
399 <deltafile> the .delta output file from either nucmer or promer. | |
400 | |
401 OUTPUT: | |
402 stdout The same delta alignment format as output by nucmer and promer. | |
403 | |
404 NOTES: | |
405 For most cases the -m option is recommended, however -1 is | |
406 useful for applications that require a 1-to-1 mapping, such as | |
407 SNP finding. Use the -q option for mapping query contigs to | |
408 their best reference location. | |
409 | |
410 | |
411 ** exact-tandems ** | |
412 | |
413 DESCRIPTION: | |
414 This script finds exact tandem repeats in a specified FastA sequence | |
415 file. It is a post-processor for 'repeat-match' and provides a simple | |
416 interface and output for tandem repeat detection. | |
417 | |
418 USAGE: | |
419 exact-tandems <file> <min match> | |
420 | |
421 <file> the single sequence in FastA format to search for repeats. | |
422 <min match> the minimum match length for the tandems. | |
423 | |
424 OUTPUT: | |
425 stdout 4 columns, the start of the tandem repeat, the total extent | |
426 of the repeat region, the length of each repetitive unit, and | |
427 the total number of copies of the repetitive unit involved. | |
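
A minimal Python sketch for reading the four columns described above. It assumes whitespace-separated fields, with the copy count possibly printed as a decimal number; the names TandemRepeat and parse_tandem_row and the sample row are made up for illustration and are not taken from real exact-tandems output.

```python
from typing import NamedTuple


class TandemRepeat(NamedTuple):
    start: int         # start of the tandem repeat
    extent: int        # total extent of the repeat region
    unit_length: int   # length of each repetitive unit
    copies: float      # total copies of the repetitive unit


def parse_tandem_row(line):
    """Parse one 4-column row (assumed whitespace-separated)."""
    start, extent, unit_length, copies = line.split()
    return TandemRepeat(int(start), int(extent), int(unit_length), float(copies))


# Hypothetical row, for illustration only.
row = parse_tandem_row("    1200       36       12        3.0")
print(row.start, row.extent, row.unit_length, row.copies)
```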
428 | |
429 | |
430 ** gaps ** | |
431 | |
432 DESCRIPTION: | |
433 This program reads a list of unique matches between two strings and | |
434 outputs the longest consistent set of matches, followed by all the | |
435 other matches. It is part of the MUMmer1.0 pipeline, and the output of the | |
436 'mummer' program needs to be processed (to strip all non-match lines) | |
437 before it can be passed to this program. | |
438 | |
439 USAGE: | |
440 gaps <seq1> [-r] < <matchlist> | |
441 | |
442 <seq1> The first sequence file that the match list represents. | |
443 <matchlist> A simple list of matches and NO header lines or other | |
444 mumbo jumbo. The columns of the match list should be | |
445 start in the reference, start in the query, and length | |
446 of the match. | |
447 [-r] Simply puts the string "reverse" on the header of the | |
448 output so 'annotate' knows to reverse the second | |
449 sequence. | |
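
Since 'gaps' expects a bare match list, the 'mummer' output has to be stripped of header and other non-match lines first. The Python sketch below shows one way this could be done, assuming that match lines consist of exactly three whitespace-separated integers (start in the reference, start in the query, length) and that every other line can be dropped. The function name strip_non_match_lines and the sample lines are illustrative assumptions, not real mummer output.

```python
def strip_non_match_lines(lines):
    """Keep only lines made of exactly three integer columns."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and all(field.isdigit() for field in fields):
            kept.append(" ".join(fields))
    return kept


# Hypothetical input: header lines mixed with match lines.
sample = [
    "> query1",
    "     100      200       25",
    "     400      520       31",
    "> query1 Reverse",
]
for match in strip_non_match_lines(sample):
    print(match)
```

The filtered lines can then be written to a file and redirected into 'gaps' as the <matchlist> input.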
450 | |
451 OUTPUT: | |
452 stdout an ordered set of the input matches, separated by headers. | |
453 The first set is the longest consistent set of matches and | |
454 the second set is all other matches. | |
455 | |
456 NOTES: | |
457 This program will eventually be rewritten to be interchangeable with | |
458 'mgaps', so that it may be plugged into the nucmer or promer | |
459 pipelines. | |
460 | |
461 | |
462 ** mapview ** | |
463 | |
464 DESCRIPTION: | |
465 mapview is a utility program for displaying sequence alignments as | |
466 provided by MUMmer, nucmer or promer. This program takes the output | |
467 from these alignment routines and converts it to a FIG, PDF or PS | |
468 file for visual analysis. It can also break the output into multiple | |
469 files for easier viewing and printing. Please refer to | |
470 "docs/mapview.README" for a more detailed description and explanation. | |
471 | |
472 USAGE: | |
473 mapview [options] <coords file> [UTR coords] [CDS coords] | |
474 | |
475 [options] type 'mapview -h' for a list of options. | |
476 <coords file> show-coords output file | |
477 [UTR coords] UTR coordinate file in GFF format | |
478 [CDS coords] CDS coordinate file in GFF format | |
479 | |
480 OUTPUT: | |
481 Default output format is an xfig file, however this can be changed to | |
482 a postscript or PDF file with the -f option. See 'mapview -h' for a | |
483 list of available formatting options. | |
484 | |
485 NOTES: | |
486 To produce the coords file input, 'show-coords' must be run with the | |
487 -r -l options. To reduce redundant matches in promer output, run | |
488 show-coords with the -k option. To generate output formats other than | |
489 xfig, the fig2dev utility must be available from the system path. For | |
490 very large reference genomes, FIG format may be the only option that | |
491 will allow the entire display to be stored in one file, as fig2dev has | |
492 problems if the output is too large. | |
493 | |
494 | |
495 ** mgaps ** | |
496 | |
497 DESCRIPTION: | |
498 This program reads a list of matches between a single-FastA reference | |
499 and a multi-FastA query file and outputs clusters of matches that lie | |
500 on similar diagonals and within a reasonable distance. It is part of the | |
501 MUMmer3.0 pipeline, and the output of 'mummer' need not be processed | |
502 before passing it to this program, so long as 'mummer' was run on a | |
503 1-vs-many or 1-vs-1 dataset. | |
504 | |
505 USAGE: | |
506 mgaps [options] < <matchlist> | |
507 | |
508 [options] type 'mgaps -h' for a list of options. | |
509 <matchlist> A list of matches separated by their sequence FastA tags. | |
510 The columns of the match list should be start in | |
511 reference, start in query, and length of the match. | |
512 | |
513 OUTPUT: | |
514 stdout An ordered set of the input matches, separated by headers. | |
515 Individual clusters are separated by a '#' character and | |
516 sets of clusters from different sequences are separated by | |
517 the FastA header tag for the query sequence. | |
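
The layout described above (a FastA header tag per query sequence, '#' between clusters) can be read back with a short parser. The Python sketch below is an illustration only: it assumes header lines start with '>' and cluster separators start with '#', and it keeps each match row as a list of raw string fields because the exact columns are not spelled out here. The name parse_mgaps and the sample data are hypothetical.

```python
def parse_mgaps(lines):
    """Group match rows into clusters, keyed by query header tag."""
    clusters_by_query = {}
    current_query = None
    current_cluster = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new query sequence begins.
            current_query = line[1:].strip()
            clusters_by_query.setdefault(current_query, [])
            current_cluster = None
        elif line.startswith("#"):
            # The next match row starts a new cluster.
            current_cluster = None
        else:
            if current_cluster is None:
                current_cluster = []
                clusters_by_query.setdefault(current_query, []).append(current_cluster)
            current_cluster.append(line.split())
    return clusters_by_query


# Hypothetical input, for illustration only.
sample = ["> q1", "#", "1 1 20", "30 31 15", "#", "100 200 40", "> q2", "#", "5 9 22"]
for query, clusters in parse_mgaps(sample).items():
    print(query, len(clusters))
```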
518 | |
519 NOTES: | |
520 It is often very helpful to adjust the clustering parameters. Check | |
521 'mgaps -h' for the list of parameters and check the source for a | |
522 better idea of how each parameter affects the result. Often, it is | |
523 helpful to run this program a number of times with different | |
524 parameters until the desired result is achieved. | |
525 | |
526 | |
527 ** mummer ** | |
528 | |
529 DESCRIPTION: | |
530 This is the core program of the MUMmer package. It is the suffix-tree | |
531 based match finding routine, and the main part of every MUMmer script. | |
532 For a detailed manual describing how to use this program, please refer | |
533 to "docs/maxmat3man.pdf" or in LaTeX format "docs/maxmat3man.tex". By | |
534 default, 'mummer' now finds maximal matches regardless of their | |
535 uniqueness. Limiting the output to only unique matches can be specified | |
536 as a command line switch. | |
537 | |
538 USAGE: | |
539 mummer [options] <reference> <query> ... | |
540 | |
541 [options] type 'mummer -help' for a list of options. | |
542 <reference> specifies the single or multi-FastA sequence file that | |
543 contains the reference sequence(s), to be aligned with | |
544 the queries. | |
545 <query> specifies the multi-FastA sequence file that contains | |
546 the query sequences, to be aligned with the references. | |
547 Multiple query files are allowed, up to 32. | |
548 | |
549 OUTPUT: | |
550 stdout a list of exact matches. Varies depending on input, refer to | |
551 the manual specified in the description above. | |
552 | |
553 NOTES: | |
554 Many thanks to Stefan Kurtz for the latest mummer version. 'mummer' | |
555 now behaves like the old 'mummer2' program by default. The -mum switch | |
556 forces it to behave like 'mummer1', the -mumreference switch forces it | |
557 to behave like 'mummer2' while the -maxmatch switch forces it to behave | |
558 like the old 'max-match' program. | |
559 | |
560 | |
561 ** mummerplot ** | |
562 | |
563 DESCRIPTION: | |
564 mummerplot is a perl script that generates gnuplot scripts and data | |
565 collections for plotting with the gnuplot utility. It can generate | |
566 2-d dotplots and 1-d coverage plots for the output of mummer, nucmer, | |
567 promer or show-tiling. It can also color dotplots with an identity | |
568 color gradient. | |
569 | |
570 USAGE: | |
571 mummerplot [options] <matchfile> | |
572 | |
573 [options] type 'mummerplot -h' for a list of options. | |
574 <matchfile> the output of 'mummer', 'nucmer', 'promer', or | |
575 'show-tiling'. 'mummerplot' will automatically determine | |
576 the format of the data it was given and produce the plot | |
577 accordingly. | |
578 | |
579 OUTPUT: | |
580 out.gp The gnuplot script. Type 'gnuplot out.gp' to evaluate | |
581 the script. | |
582 out.fplot | |
583 out.rplot | |
584 out.hplot The forward, reverse and highlighted match information for | |
585 plotting with gnuplot. | |
586 | |
587 out.ps | |
588 out.png The plotted image file, postscript or png depending on the | |
589 selected terminal type. | |
590 | |
591 NOTES: | |
592 For alignments with multiple reference or query sequences, be sure to | |
593 use the -r -q or -R -Q options to avoid overlaying multiple plots in | |
594 the same space. For better looking color gradient plots, try the | |
595 postscript terminal and avoid the png terminal. | |
596 | |
597 | |
598 ** nucmer2xfig ** | |
599 | |
600 DESCRIPTION: | |
601 Script for plotting nucmer hits against a reference sequence. See top | |
602 of script for more information, or see if 'mummerplot' or 'mapview' | |
603 has the functionality required as they are properly maintained. | |
604 | |
605 | |
606 ** repeat-match ** | |
607 | |
608 DESCRIPTION: | |
609 Finds exact repeats within a single sequence. | |
610 | |
611 USAGE: | |
612 repeat-match [options] <seq> | |
613 | |
614 [options] type 'repeat-match -h' for a list of options. | |
615 <seq> the single sequence in FastA format to search for repeats. | |
616 | |
617 OUTPUT: | |
618 stdout 3 columns, the start of the first copy of the repeat, the | |
619 start of the second copy of the repeat, and the length of the | |
620 repeat respectively. | |
621 | |
622 NOTES: | |
623 REPuter (freely available for universities) may be better suited for | |
624 most repeat matching, but 'repeat-match' is open-source and has some | |
625 functionality that REPuter does not so we include it along with the | |
626 MUMmer package. | |
627 | |
628 | |
629 ** show-aligns ** | |
630 | |
631 DESCRIPTION: | |
632 This program parses the delta alignment output of nucmer and promer | |
633 and displays all of the pairwise alignments from the two sequences | |
634 specified on the command line. | |
635 | |
636 USAGE: | |
637 show-aligns [options] <deltafile> <IdR> <IdQ> | |
638 | |
639 [options] type 'show-aligns -h' for a list of options. | |
640 <deltafile> the .delta output file from either nucmer or promer. | |
641 <IdR> the FastA header tag of the desired reference sequence. | |
642 <IdQ> the FastA header tag of the desired query sequence. | |
643 | |
644 OUTPUT: | |
645 stdout each alignment header and footer describes the frame of the | |
646 alignment in each sequence, and the start and finish | |
647 (inclusive) of the alignment in each sequence. At the | |
648 beginning of each line of aligned sequence are two numbers, the | |
649 top is the coordinate of the first reference base on that line | |
650 and the bottom is the coordinate of the first query base on | |
651 that line. ALL coordinates reference the forward strand of the | |
652 DNA sequence, even if it is a protein alignment. A gap caused | |
653 by an insertion or deletion is filled with a '.' character. | |
654 Errors in a DNA alignment are marked with a '^' below the | |
655 error. Errors in an amino acid alignment are marked with a | |
656 whitespace in the middle consensus line, while matches are | |
657 marked with the consensus base and similarities are marked with | |
658 a '+' in the consensus line. | |
659 | |
660 | |
661 ** show-coords ** | |
662 | |
663 DESCRIPTION: | |
664 This program parses the delta alignment output of nucmer and promer | |
665 and displays the coordinates, and other useful information about the | |
666 alignments. | |
667 | |
668 USAGE: | |
669 show-coords [options] <deltafile> | |
670 | |
671 [options] type 'show-coords -h' for a list of options. | |
672 <deltafile> the .delta output file from either nucmer or promer. | |
673 | |
674 OUTPUT: | |
675 stdout run 'show-coords' without the -H option to see the column | |
676 header tags. Here is a description of each tag. Note that | |
677 some of the below tags do not apply to nucmer data, and that | |
678 all coordinates are inclusive and relative to the forward DNA | |
679 strand. | |
680 | |
681 [S1] Start of the alignment region in the reference sequence. | |
682 | |
683 [E1] End of the alignment region in the reference sequence. | |
684 | |
685 [S2] Start of the alignment region in the query sequence. | |
686 | |
687 [E2] End of the alignment region in the query sequence. | |
688 | |
689 [LEN 1] Length of the alignment region in the reference sequence, | |
690 measured in nucleotides. | |
691 | |
692 [LEN 2] Length of the alignment region in the query sequence, measured | |
693 in nucleotides. | |
694 | |
695 [% IDY] Percent identity of the alignment, calculated as the | |
696 (number of exact matches) / ([LEN 1] + insertions in the query). | |
697 | |
698 [% SIM] Percent similarity of the alignment, calculated like the above | |
699 value, but counting positive BLOSUM matrix scores instead of exact | |
700 matches. | |
701 | |
702 [% STP] Percent of stop codons of the alignment, calculated as | |
703 (number of stop codons) / (([LEN 1] + insertions in the query) * 2). | |
704 | |
705 [LEN R] Length of the reference sequence. | |
706 | |
707 [LEN Q] Length of the query sequence. | |
708 | |
709 [COV R] Percent coverage of the alignment on the reference sequence, | |
710 calculated as [LEN 1] / [LEN R]. | |
711 | |
712 [COV Q] Percent coverage of the alignment on the query sequence, | |
713 calculated as [LEN 2] / [LEN Q]. | |
714 | |
715 [FRM] Reading frame for the reference sequence and the reading frame | |
716 for the query sequence respectively. This is one of the columns | |
717 absent from nucmer data; however, match direction can easily be | |
718 determined from the start and end coordinates. | |
719 | |
720 [TAGS] The reference FastA ID and the query FastA ID. | |
721 | |
722 There is also an optional final column (turned on with the -w | |
723 or -o option) that will contain some 'annotations'. The -o option will | |
724 annotate alignments that represent overlaps between two sequences, | |
725 while the -w option is antiquated and should no longer be used. | |
726 Sometimes nucmer or promer will extend adjacent clusters past one | |
727 another, producing somewhat redundant output; this option will | |
728 notify users of such rare occurrences. | |
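The derived columns above follow directly from the coordinate columns. A minimal Python sketch, assuming the values have already been read into integers: the Alignment class and the function names are hypothetical, and the direction test is one reading of "determined by the start and end coordinates" rather than the tool's own logic.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    s1: int      # [S1]
    e1: int      # [E1]
    s2: int      # [S2]
    e2: int      # [E2]
    len_r: int   # [LEN R]
    len_q: int   # [LEN Q]


def region_length(start: int, end: int) -> int:
    """Length of a region whose coordinates are inclusive."""
    return abs(end - start) + 1


def coverage(region_len: int, seq_len: int) -> float:
    """Percent coverage, e.g. [COV R] = [LEN 1] / [LEN R] as a percentage."""
    return 100.0 * region_len / seq_len


def direction(aln: Alignment) -> str:
    """Assumption: a match is 'reverse' when the reference and query
    regions run in opposite directions along the forward strand."""
    same = (aln.s1 <= aln.e1) == (aln.s2 <= aln.e2)
    return "forward" if same else "reverse"


if __name__ == "__main__":
    aln = Alignment(s1=1, e1=100, s2=250, e2=151, len_r=1000, len_q=400)  # made-up values
    len1 = region_length(aln.s1, aln.e1)
    len2 = region_length(aln.s2, aln.e2)
    print(len1, len2, coverage(len1, aln.len_r), coverage(len2, aln.len_q), direction(aln))
```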
729 | |
730 NOTES: | |
731 The -c and -l options are useful when comparing two sets of assembly | |
732 contigs, in that these options help determine if an alignment spans an | |
733 entire contig, or is just a partial hit to a different read. The -b | |
734 option is useful when the user wishes to identify syntenic regions | |
735 between two genomes, but is not particularly interested in the actual | |
736 alignment similarity or appearance. This option also disregards match | |
737 orientation, so it should not be used if this information is needed. | |
738 | |
739 | |
740 ** show-diff ** | |
741 | |
742 DESCRIPTION: | |
743 This program classifies alignment breakpoints for the | |
744 quantification of macroscopic differences between two | |
745 genomes. It takes a standard, unfiltered delta file as input, | |
746 determines the best mapping between the two sequence sets, and | |
747 reports on the breaks in that mapping. | |
748 | |
749 USAGE: | |
750 show-diff [options] <deltafile> | |
751 | |
752 [options] type 'show-diff -h' for a list of options. | |
753 <deltafile> the .delta output file from nucmer | |
754 | |
755 OUTPUT: | |
756 stdout Classified breakpoints are output one per line with | |
757 the following types and column definitions. The first | |
758 five columns of every row are seq ID, feature type, | |
759 feature start, feature end, and feature length. | |
760 | |
761 Feature Columns | |
762 | |
763 IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff | |
764 IDR DUP dup-start dup-end dup-length | |
765 IDR BRK gap-start gap-end gap-length | |
766 IDR JMP gap-start gap-end gap-length | |
767 IDR INV gap-start gap-end gap-length | |
768 IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence | |
769 | |
770 Feature Types | |
771 | |
772 [GAP] A gap between two mutually consistent ordered and | |
773 oriented alignments. gap-length-R is the length of the | |
774 alignment gap in the reference, gap-length-Q is the length of | |
775 the alignment gap in the query, and gap-diff is the difference | |
776 between the two gap lengths. If gap-diff is positive, sequence | |
777 has been inserted in the reference. If gap-diff is negative, | |
778 sequence has been deleted from the reference. If both | |
779 gap-length-R and gap-length-Q are negative, the indel is a | |
780 tandem duplication copy difference. | |
781 | |
782 [DUP] A duplicated sequence in the reference that occurs more | |
783 times in the reference than in the query. The coordinate | |
784 columns specify the bounds and length of the | |
785 duplication. These features are often bookended by BRK | |
786 features if there is unique sequence bounding the duplication. | |
787 | |
788 [BRK] An insertion in the reference of unknown origin, | |
789 indicating that no query sequence aligns to the sequence bounded by | |
790 gap-start and gap-end. Often found around DUP elements or at | |
791 the beginning or end of sequences. | |
792 | |
793 [JMP] A relocation event, where the consistent ordering of | |
794 alignments is disrupted. The coordinate columns specify the | |
795 breakpoints of the relocation in the reference, and the | |
796 gap-length between them. A negative gap-length indicates the | |
797 relocation occurred around a repetitive sequence, and a | |
798 positive length indicates unique sequence between the | |
799 alignments. | |
800 | |
801 [INV] The same as a relocation event, except that both the | |
802 ordering and the orientation of the alignments are disrupted. Note | |
803 that for JMP and INV, generally two features will be output, | |
804 one for the beginning of the inverted region, and another for | |
805 the end of the inverted region. | |
806 | |
807 [SEQ] A translocation event that requires jumping to a new | |
808 query sequence in order to continue aligning to the | |
809 reference. If each input sequence is a chromosome, these | |
810 features correspond to inter-chromosomal translocations. | |
811 | |
812 NOTES: | |
813 The reported count for a feature type, inversions for example, | |
814 is the number of breakpoints classified as bordering | |
815 an inversion. Therefore, since there will be a breakpoint at | |
816 both the beginning and the end of an inversion, the feature | |
817 counts are roughly double the number of inversion events. In | |
818 addition, all counts are estimates and do not represent the | |
819 exact number of each evolutionary event. | |
820 | |
821 Summing the fifth column (ignoring negative values) yields an | |
822 estimate of the total inserted sequence in the | |
823 reference. Summing the fifth column after removing DUP | |
824 features yields an estimate of the total amount of unique | |
825 (unaligned) sequence in the reference. Note that unaligned | |
826 sequences are not counted, and could represent additional | |
827 "unique" sequences. Use the 'dnadiff' script if you must | |
828 recover this information. Finally, the -q option switches | |
829 references for queries, and uses the query coordinates for the | |
830 analysis. | |
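The two sums described above could be computed with a short Python sketch like the one below. It assumes whitespace-delimited lines and that negative values are also skipped in the DUP-free sum; the function name and the sample rows are hypothetical.

```python
def summarize_lengths(lines):
    """Return (inserted, unique) estimates from show-diff output lines.

    inserted: sum of the fifth column, ignoring negative values
    unique:   the same sum with DUP features removed
    """
    inserted = 0
    unique = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue                  # skip blank or malformed lines
        length = int(fields[4])
        if length < 0:
            continue                  # negative lengths are ignored
        inserted += length
        if fields[1] != "DUP":
            unique += length
    return inserted, unique


if __name__ == "__main__":
    sample = [                        # made-up rows for illustration
        "ref1 GAP 100 149 50 10 40",
        "ref1 DUP 300 399 100",
        "ref1 BRK 1 20 20",
        "ref1 JMP 700 690 -9",
    ]
    print(summarize_lengths(sample))
```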
831 | |
832 | |
833 ** show-snps ** | |
834 | |
835 DESCRIPTION: | |
836 This program reports polymorphisms contained in a delta encoded | |
837 alignment file output by either nucmer or promer. It catalogs | |
838 all of the single nucleotide polymorphisms (SNPs) and | |
839 insertions/deletions within the delta file | |
840 alignments. Polymorphisms are reported one per line, in a | |
841 delimited fashion similar to show-coords. Pairing this program | |
842 with the appropriate MUMmer tools can create an easy to use | |
843 SNP pipeline for the rapid identification of putative SNPs | |
844 between any two sequence sets. | |
845 | |
846 USAGE: | |
847 show-snps [options] <deltafile> | |
848 | |
849 [options] type 'show-snps -h' for a list of options. | |
850 <deltafile> the .delta output file from either nucmer or promer. | |
851 | |
852 OUTPUT: | |
853 stdout Standard output has column headers with the following | |
854 meanings. Not all columns will be output by default, | |
855 see 'show-snps -h' for the switches that control the output. | |
856 | |
857 [P1] SNP position in the reference. | |
858 | |
859 [SUB] Character in the reference. | |
860 | |
861 [SUB] Character in the query. | |
862 | |
863 [P2] SNP position in the query. | |
864 | |
865 [BUFF] Distance from this SNP to the nearest mismatch (end of | |
866 alignment, indel, SNP, etc) in the same alignment. | |
867 | |
868 [DIST] Distance from this SNP to the nearest sequence end. | |
869 | |
870 [R] Number of repeat alignments which cover this reference | |
871 position, >0 means repetitive sequence. | |
872 | |
873 [Q] Number of repeat alignments which cover this query | |
874 position, >0 means repetitive sequence. | |
875 | |
876 [LEN R] Length of the reference sequence. | |
877 | |
878 [LEN Q] Length of the query sequence. | |
879 | |
880 [CTX R] Surrounding context sequence in the reference. | |
881 | |
882 [CTX Q] Surrounding context sequence in the query. | |
883 | |
884 [FRM] Reading frame for the reference sequence and the | |
885 reading frame for the query sequence respectively. Simply | |
886 'forward' 1, or 'reverse' -1 for nucmer data. | |
887 | |
888 [TAGS] The reference FastA ID and the query FastA ID. | |
889 | |
890 NOTES: | |
891 It is often helpful to run this with the -C option to ensure that | |
892 SNPs are reported only from uniquely aligned regions. | |
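Once the columns above have been read into records, the [BUFF], [R] and [Q] values lend themselves to simple post-filtering. A minimal Python sketch, assuming each SNP is a dict keyed by the column tags: the function name, the records and the min_buff threshold of 20 are all hypothetical choices, not defaults of show-snps.

```python
def is_reliable(snp: dict, min_buff: int = 20) -> bool:
    """Keep a SNP only if it lies in non-repetitive sequence on both sides
    ([R] and [Q] equal to 0) and is at least min_buff bases away from the
    nearest other mismatch ([BUFF])."""
    return snp["R"] == 0 and snp["Q"] == 0 and snp["BUFF"] >= min_buff


if __name__ == "__main__":
    snps = [  # made-up records for illustration
        {"P1": 1042, "P2": 1040, "BUFF": 35, "R": 0, "Q": 0},
        {"P1": 2210, "P2": 2208, "BUFF": 3, "R": 0, "Q": 0},
        {"P1": 5120, "P2": 7001, "BUFF": 50, "R": 2, "Q": 0},
    ]
    kept = [s for s in snps if is_reliable(s)]
    print([s["P1"] for s in kept])
```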
893 | |
894 | |
895 ** show-tiling ** | |
896 | |
897 DESCRIPTION: | |
898 This program attempts to construct a tiling path out of the query | |
899 contigs as mapped to the reference sequences. Given the delta | |
900 alignment information of a few long reference sequences and many small | |
901 query contigs, 'show-tiling' will determine the best location on a | |
902 reference for each contig. Note that each contig may only be tiled | |
903 once, so repetitive regions may cause this program some difficulty. | |
904 This program is useful for aiding in the scaffolding and closure of an | |
905 unfinished set of contigs, if a suitable, high similarity, reference | |
906 genome is available. Or, if using promer, 'show-tiling' will help | |
907 in the identification of syntenic regions and the mapping of their | |
908 contigs to the references. | |
909 | |
910 USAGE: | |
911 show-tiling [options] <deltafile> | |
912 | |
913 [options] type 'show-tiling -h' for a list of options. | |
914 <deltafile> the .delta output file from either nucmer or promer. | |
915 | |
916 OUTPUT: | |
917 stdout Standard output has 8 columns: start in reference, end in | |
918 reference, gap between this contig and the next, length of this | |
919 contig, alignment coverage of this contig, average percent | |
920 identity of the alignments for this contig, orientation of this | |
921 contig, contig ID. All matches to a reference are headed by the | |
922 FASTA tag of that reference. Output with the -a option is the | |
923 same as 'show-coords -cl' when run on nucmer data. | |
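The default 8-column output can be grouped by reference with a short Python sketch such as the following. It assumes whitespace-delimited columns and that each reference header line begins with '>' followed by the FASTA tag; the function name, dict keys and sample lines are hypothetical.

```python
def parse_tiling(lines):
    """Group tiled contigs under the reference they were placed on."""
    tiling = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):              # assumed header form: '>' + FASTA tag
            current = line[1:].split()[0]
            tiling[current] = []
            continue
        if current is None:
            raise ValueError("contig line found before any reference header")
        start, end, gap, length, cov, idy, ori, contig = line.split()[:8]
        tiling[current].append({
            "start": int(start),              # start in reference
            "end": int(end),                  # end in reference
            "gap": int(gap),                  # gap to the next contig
            "length": int(length),            # length of this contig
            "coverage": float(cov),           # alignment coverage
            "identity": float(idy),           # average percent identity
            "orientation": ori,
            "contig": contig,
        })
    return tiling


if __name__ == "__main__":
    sample = [  # made-up lines for illustration
        ">ref1",
        "1 1200 50 1200 100.00 99.50 + contigA",
        "1251 2000 -10 750 98.00 97.20 - contigB",
    ]
    for ref, contigs in parse_tiling(sample).items():
        print(ref, [c["contig"] for c in contigs])
```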
924 | |
925 NOTES: | |
926 When run with the -x option, 'show-tiling' will produce an XML output | |
927 format that can be accepted by TIGR's open source scaffolding software | |
928 'Bambus' as contig linking information. | |
929 | |
930 | |
931 -- CONTACT INFORMATION -- | |
932 | |
933 Please address questions and bug reports to: <mummer-help@lists.sourceforge.net> | |
934 | |
935 Last Revised May 12, 2005 |