annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/docs/run-mummer3.README @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 17:55:14 -0400
rev   line source
jpayne@69 1 -=- run-mummer3 (MUMmer3.0) README -=-
jpayne@69 2
jpayne@69 3 ** NOTE **
jpayne@69 4 This manual is outdated, please refer to the HTML documentation included in
jpayne@69 5 this distribution or at:
jpayne@69 6
jpayne@69 7 http://mummer.sourceforge.net
jpayne@69 8 http://mummer.sourceforge.net/manual
jpayne@69 9 http://mummer.sourceforge.net/examples
jpayne@69 10
jpayne@69 11 If any of this code is used in any publication, please cite the following:
jpayne@69 12
jpayne@69 13 Fast algorithms for large-scale genome alignment and comparison.
jpayne@69 14 A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. Nucleic
jpayne@69 15 Acids Research 30:11 (2002), 2478-2483.
jpayne@69 16
jpayne@69 17
jpayne@69 18 USAGE: ./run-mummer3 <reference> <query> <file prefix>
jpayne@69 19
jpayne@69 20 <reference> specifies the file with the reference sequence in FastA
jpayne@69 21 format. No more than one sequence is allowed.
jpayne@69 22 <query> specifies the multi-FastA file that contains
jpayne@69 23 the query sequences to be aligned to the reference.
jpayne@69 24 <file prefix> specifies the file prefix for the output files.
jpayne@69 25
jpayne@69 26 NOTE:
jpayne@69 27 Coordinates from this script will be relative to their respective
jpayne@69 28 strand. Thus reverse matches will have coordinates that reference the
jpayne@69 29 reverse complemented query sequence! If this coordinate system is
jpayne@69 30 confusing, use nucmer instead.
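
The note above can be illustrated with a small conversion helper. This is
only a sketch, not part of the MUMmer distribution: the function name is
hypothetical, and it assumes 1-based, inclusive coordinates and that the
length of the query sequence is known.

```python
from typing import Tuple


def reverse_to_forward(rc_start: int, match_len: int, query_len: int) -> Tuple[int, int]:
    """Map a match reported on the reverse-complemented query back to
    forward-strand coordinates.

    Assumption: coordinates are 1-based and inclusive. Position p on the
    reverse complement corresponds to position query_len - p + 1 on the
    forward strand.
    """
    rc_end = rc_start + match_len - 1      # last base of the match on the reverse strand
    fwd_start = query_len - rc_end + 1     # smallest forward-strand coordinate
    fwd_end = query_len - rc_start + 1     # largest forward-strand coordinate
    return fwd_start, fwd_end


# A 10-base match starting at position 1 of the reverse complement of a
# 100-base query covers the last 10 bases of the forward strand.
span = reverse_to_forward(1, 10, 100)
```

The returned pair is ordered low to high, so it can be compared directly
with forward-strand annotations.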
jpayne@69 31
jpayne@69 32
jpayne@69 33 MATCHING PARAMETERS:
jpayne@69 34 It is important to customize the command line parameters of the matching
jpayne@69 35 and clustering algorithms to reflect the user's desired alignment results.
jpayne@69 36 All of the programs in the run-mummer3 script (mummer, mgaps, and combineMUMs)
jpayne@69 37 accept various parameters that alter their performance and output. To list
jpayne@69 38 these parameters, run each program from the command line with the "-h" option.
jpayne@69 39 Then, to make a permanent change, add the desired parameters to the
jpayne@69 40 script using a standard text editor.
jpayne@69 41
jpayne@69 42 NOTE: For SNP hunters, the -D option to combineMUMs is very handy for
jpayne@69 43 locating and parsing the difference positions.
jpayne@69 44
jpayne@69 45
jpayne@69 46 MATCHING ALGORITHMS:
jpayne@69 47 It is also possible to change the matching algorithm used in the alignment
jpayne@69 48 generation. Type 'mummer -help' to see the algorithm switches.
jpayne@69 49
jpayne@69 50
jpayne@69 51 OUTPUT FILES:
jpayne@69 52 The four output files of this script are:
jpayne@69 53 <prefix>.out
jpayne@69 54 <prefix>.gaps
jpayne@69 55 <prefix>.errorsgaps
jpayne@69 56 <prefix>.align
jpayne@69 57
jpayne@69 58 + <prefix>.out
jpayne@69 59 This file lists the coordinates of the matches found and the length
jpayne@69 60 of each match found. A group of matches to a specific sequence in
jpayne@69 61 the query file is headed by the FastA tag of that sequence. The first
jpayne@69 62 and second columns list the start coordinates in the reference and
jpayne@69 63 query sequences respectively. The final column is the length of the
jpayne@69 64 match.
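
A minimal sketch of reading the <prefix>.out layout described above. The
function name and the sample lines are made up for illustration. The sketch
assumes that header lines begin with ">" and that match lines are
whitespace-separated integers, which this README does not state explicitly.

```python
def parse_out(lines):
    """Group match lines under the header line that precedes them.

    Returns a list of (header, matches) pairs, where each match is a
    (ref_start, query_start, length) tuple.
    Assumption: header lines start with '>'.
    """
    groups = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = (line[1:].strip(), [])
            groups.append(current)
            continue
        fields = line.split()
        if current is None or len(fields) < 3:
            continue  # skip anything that is not a complete match line
        # First and second columns are the start coordinates; the final
        # column is the match length.
        current[1].append((int(fields[0]), int(fields[1]), int(fields[-1])))
    return groups


# Made-up sample input, not real MUMmer output.
sample = [
    "> seq1",
    "      10      20      30",
    "     100     200      25",
    "> seq2",
    "       5       7      40",
]
groups = parse_out(sample)
```

For a real file, pass an open file handle instead of the sample list, for
example parse_out(open("myprefix.out")), where "myprefix" stands for
whatever <file prefix> was given to the script.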
jpayne@69 65
jpayne@69 66 + <prefix>.gaps
jpayne@69 67 The headers and 1st - 3rd columns of this file are exactly like the
jpayne@69 68 <prefix>.out file above. However, matches are now sorted and clustered
jpayne@69 69 according to their position in each of the sequences. Clusters of
jpayne@69 70 matches are separated by a "#" character and matches from different
jpayne@69 71 sequences in the query file are still separated by the FastA header
jpayne@69 72 for that sequence.
jpayne@69 73 The additional columns in this file (4th - 6th) describe
jpayne@69 74 the gaps between adjacent matches. The 4th column represents the
jpayne@69 75 number of overlapping characters between the current match and the
jpayne@69 76 previous match. The 5th column displays the number of characters
jpayne@69 77 between the beginning of this match in the reference and the end
jpayne@69 78 of the previous match in the reference. Finally, the 6th column
jpayne@69 79 displays the number of characters between the beginning of this
jpayne@69 80 match in the query sequence and the end of the previous match in
jpayne@69 81 the query sequence.
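
The clustering layout described above could be read along these lines.
This is a sketch under assumptions: the function name and sample data are
hypothetical, header lines are assumed to begin with ">", and columns 4 - 6
are kept as raw strings because they may hold non-numeric placeholders for
the first match of a cluster.

```python
def parse_gaps(lines):
    """Return {header: [cluster, ...]} where each cluster is a list of matches.

    Each match is a dict holding the three integer columns shared with the
    <prefix>.out file plus the remaining columns as unparsed strings.
    Assumptions: headers start with '>', clusters are delimited by '#'.
    """
    result = {}
    clusters = None   # cluster list for the current query sequence
    cluster = None    # cluster currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            clusters = result.setdefault(line[1:].strip(), [])
            cluster = None
        elif line.startswith("#"):
            cluster = None            # the next match line opens a new cluster
        elif clusters is not None:
            fields = line.split()
            if len(fields) < 3:
                continue
            if cluster is None:
                cluster = []
                clusters.append(cluster)
            cluster.append({
                "ref_start": int(fields[0]),
                "query_start": int(fields[1]),
                "length": int(fields[2]),
                "gap_fields": fields[3:],   # columns 4 - 6, left as text
            })
    return result


# Made-up sample input, not real MUMmer output.
sample = [
    "> q1",
    "1 1 100 none - -",
    "150 160 50 none 49 59",
    "#",
    "500 900 30 none - -",
    "> q2",
    "7 8 20 none - -",
]
by_query = parse_gaps(sample)
```

Because a new cluster is opened lazily on the first match after a "#", the
sketch works whether the separator precedes or follows each cluster.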
jpayne@69 82
jpayne@69 83 + <prefix>.errorsgaps
jpayne@69 84 This file is identical to the <prefix>.gaps file, except for the
jpayne@69 85 addition of a 7th column that displays the number of errors in the
jpayne@69 86 gap described by columns 5 and 6. This is perhaps the most helpful
jpayne@69 87 output file of the script, as it is easy to parse and interpret.
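
As an example of how easily this file can be parsed, the sketch below sums
the 7th column for each query sequence. The function name and sample lines
are hypothetical. It assumes header lines begin with ">" and silently skips
any 7th-column value that is not an integer, such as a placeholder on the
first match of a cluster.

```python
def errors_per_sequence(lines):
    """Sum the error counts (7th column) under each header line.

    Assumptions: headers start with '>', '#' lines separate clusters, and
    non-integer values in the 7th column are placeholders to be ignored.
    """
    totals = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            totals.setdefault(header, 0)
            continue
        fields = line.split()
        if header is None or len(fields) < 7:
            continue
        try:
            totals[header] += int(fields[6])
        except ValueError:
            pass  # placeholder rather than a count
    return totals


# Made-up sample input, not real MUMmer output.
sample = [
    "> q1",
    "1 1 100 none - - -",
    "150 160 50 none 49 59 3",
    "#",
    "500 900 30 none - - -",
    "700 1100 10 none 170 170 12",
    "> q2",
    "5 5 10 none - - -",
]
totals = errors_per_sequence(sample)
```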
jpayne@69 88
jpayne@69 89 + <prefix>.align
jpayne@69 90 This file also expands on the <prefix>.gaps file, but in a different
jpayne@69 91 way than the <prefix>.errorsgaps file. This file intersperses the lines
jpayne@69 92 of the <prefix>.gaps file with an actual alignment of the gap between
jpayne@69 93 the previous two matches. Additionally, wherever there was a "#"
jpayne@69 94 character in the <prefix>.gaps file, the <prefix>.align file adds
jpayne@69 95 a line that lists the encompassing start and stop coordinates of the
jpayne@69 96 previous alignment region, an error ratio, and an error percentage.
jpayne@69 97
jpayne@69 98
jpayne@69 99 Email questions, comments or bug reports to: <mummer-help@lists.sourceforge.net>