csp2: CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/README comparison

comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/README @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d

author	jpayne
date	Tue, 18 Mar 2025 17:55:14 -0400
parents
children

comparison

equal deleted inserted replaced

-:0e9998148a16
+:33d812a61356
+-=- MUMmer3.x README -=-
+** NOTE **
+A comprehensive HTML user manual is available in the docs/web/manual
+subdirectory or at http://mummer.sourceforge.net/manual
+MUMmer is now an open source package!  Please contact us if you would like
+to contribute to the MUMmer project.  For more information or the latest
+release please visit the MUMmer homepage at http://mummer.sourceforge.net
+Please refer to the INSTALL file for installation instructions.  This file
+contains brief descriptions of all executables in the base directory and
+general information about the MUMmer package.
+-- DESCRIPTION --
+MUMmer is a system for rapidly aligning entire genomes.  The current
+version (release 3.0) can find all 20 base pair maximal exact matches between
+two bacterial genomes of ~5 million base pairs each in 20 seconds, using 90 MB
+of memory, on a typical 1.8 GHz Linux desktop computer.  MUMmer can also align
+incomplete genomes; it handles the 100s or 1000s of contigs from a shotgun
+sequencing project with ease, and will align them to another set of contigs or
+a genome, using the nucmer utility included with the system.  The promer
+utility takes this a step further by generating alignments based upon the
+six-frame translations of both input sequences.  promer permits the alignment
+of genomes for which the proteins are similar but the DNA sequence is too
+divergent to detect similarity.  See the nucmer and promer readme files in the
+"docs/" subdirectory for more details.  MUMmer is open source, so all we ask
+is that you cite our most recent paper in any publications that use this
+system:
+(Version 3.0 described)
+Versatile and open software for comparing large genomes.
+S. Kurtz, A. Phillippy, A.L. Delcher,
+M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.
+Genome Biology (2004), 5:R12.
+(Version 2.1 described)
+Fast algorithms for large-scale genome alignment and comparison.
+A.L. Delcher. A. Phillippy, J. Carlton, and S.L. Salzberg.
+Nucleic Acids Research 30:11 (2002), 2478-2483.
+(Version 1.0 described)
+Alignment of Whole Genomes.
+A.L. Delcher, S. Kasif,
+R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.
+Nucleic Acids Research, 27:11 (1999), 2369-2376.
+-- RUNNING MUMmer3.0 --
+MUMmer3.0 is comprised of many various utilities and scripts.  For general
+purposes, the scripts "run-mummer1", "run-mummer3", "nucmer", and "promer"
+will be all that is needed.  See their descriptions in the "RUNNING THE MUMmer
+SCRIPTS" section, or refer to their individual documentation in the "docs/"
+subdirectory.  Refer to the "RUNNING THE MUMmer UTILITIES" section for a brief
+description of all of the utilities in this directory.
+Simple use case:
+Given a file containing a single reference sequence (ref.seq) in
+FASTA format and another file containing multiple sequences in FastA
+format (qry.seq) type the following at the command line:
+'./nucmer  -p <prefix>  ref.seq  qry.seq'
+To produce the following files:
+<prefix>.delta
+or
+'./run-mummer3.csh  ref.seq  qry.seq  <prefix>'
+To produce the following files:
+<prefix>.out
+<prefix>.gaps
+<prefix>.align
+<prefix>.errorsgaps
+Please read the utility-specific documentation in the "docs/" subdirectory
+for descriptions of these files and information on how to change the
+alignment parameters for the scripts (minimum match length, etc.), or see
+the notes below in the "RUNNING THE MUMmer SCRIPTS" section for a brief
+explanation.
+To see a simple gnuplot output, if you have gnuplot installed, run
+the perl script 'mummerplot' on the output files. This script can be run
+on mummer output (.out), or nucmer/promer output (.delta). Edit the
+<prefix>.gp file that is created to change colors, line thicknesses, etc. or
+explore the <prefix>.[fr]plot file to see the data collection.
+'./mummerplot  -p <prefix>  <prefix>.out'
+Or you can use the web viewer for completed microbial genomes:
+http://www.tigr.org/CMR
+-- RUNNING THE MUMmer SCRIPTS --
+Because of MUMmer's modular design, it may be necessary to use a number
+of separate programs to produce the desired output.  The MUMmer scripts
+attempt to simplify this process by wrapping various utilities into packages
+that can perform standard alignment requests.  Listed below are brief
+descriptions and usage definitions for these scripts.  Please refer to the
+"docs/" subdirectory for a more detailed description of each script.
+** nucmer **
+DESCRIPTION:
+nucmer is for the all-vs-all comparison of nucleotide sequences
+contained in multi-FastA data files.  It is best used for highly
+similar sequence that may have large rearrangements.  Common use
+cases are: comparing two unfinished shotgun sequencing assemblies,
+mapping an unfinished sequencing assembly to a finished genome, and
+comparing two fairly similar genomes that may have large
+rearrangements and duplications.  Please refer to "docs/nucmer.README"
+for more information regarding this script and its output, or type
+'nucmer -h' for a list of its options.
+USAGE:
+nucmer  [options]  <reference>  <query>
+[options]    type 'nucmer -h' for a list of options.
+<reference>  specifies the multi-FastA sequence file that contains
+the reference sequences, to be aligned with the queries.
+<query>      specifies the multi-FastA sequence file that contains
+the query sequences, to be aligned with the references.
+OUTPUT:
+out.delta    the delta encoded alignments between the reference and
+query sequences.  This file can be parsed with any of
+the show-* programs which are described in the "RUNNING
+THE MUMmer UTILITIES" section.
+NOTES:
+All output coordinates reference the forward strand of the involved
+sequence, regardless of the match direction. Also, nucmer now uses
+only matches that are unique in the reference sequence by default,
+use the '--mum' or '--maxmatch' options to change this behavior.
+** promer **
+DESCRIPTION:
+promer is for the protein level, all-vs-all comparison of nucleotide
+sequences contained in multi-FastA data files.  The nucleotide input
+files are translated in all 6 reading frames and then aligned to one
+another via the same methods as nucmer.  It is best used for highly
+divergent sequences that may have moderate to high similarity on the
+protein level.  Common use cases are: identifying syntenic regions
+between highly divergent genomes, comparative genome annotation i.e.
+using an already annotated genome to help in the annotation of a
+newly sequenced genome, and the general comparison of two fairly
+divergent genomes that have large rearrangements and may only be
+similar on the protein level. Please refer to "docs/promer.README"
+for more information regarding this script and its output, or type
+'promer -h' for a list of its options.
+USAGE:
+promer  [options]  <reference>  <query>
+[options]    type 'promer -h' for a list of options.
+<reference>  specifies the multi-FastA sequence file that contains
+the reference sequences, to be aligned with the queries.
+<query>      specifies the multi-FastA sequence file that contains
+the query sequences, to be aligned with the references.
+OUTPUT:
+out.delta    the delta encoded alignments between the reference and
+query sequences.  This file can be parsed with any of
+the show-* programs which are described in the "RUNNING
+THE MUMmer UTILITIES" section.
+NOTES:
+All output coordinates reference the forward strand of the involved
+sequence, regardless of the match direction, and are measured in
+nucleotides with the exception of the delta integers which are
+measured in amino acids (1 delta int = 3 nucleotides). Also, promer
+now uses only matches that are unique in the reference sequence by
+default, use the '--mum' or '--maxmatch' options to change this
+behavior.
+** run-mummer1 **
+DESCRIPTION:
+This script is taken directly from MUMmer1.0 and is best used to
+align two sequences in which there is high similarity and no re-
+arrangements.  Common use cases are: aligning two finished bacterial
+chromosomes.  Please refer to "docs/run-mummer1.README" for the
+original documentation for this script and its output.
+USAGE:
+run-mummer1  <seq1>  <seq2>  <tag>  [-r]
+<seq1>  specifies the file with the first sequence in FastA format.
+No more than one sequence is allowed.
+<seq2>  specifies the file with the second sequence in FastA format.
+No more than one sequence is allowed.
+<tag>   specifies the prefix to be used for the output files.
+[-r]    is an optional parameter that will reverse complement the
+second sequence.
+OUTPUT:
+out.align       the out.gaps file interspersed with the alignments
+of the gaps.
+out.errorsgaps  the out.gaps file with an extra column stating the
+number of errors contained in each gap.
+out.gaps        an ordered (clustered) list of matches with position
+information, and gap distances between each match.
+out.out         a list of all maximal unique matches between the two
+input sequences ordered by their start position in the
+second sequence.
+NOTES:
+All output coordinates reference their respective strand.  This means
+that if the -r switch is active, coordinates that reference the
+second sequence will be relative to the reverse complement of the
+second sequence.  Please use nucmer or promer if this coordinate
+system is confusing.
+Eventually, this script's components will be rewritten to work
+with the new MUMmer format standards and phased out in favor of the
+new components and wrapping script.
+** run-mummer3 **
+DESCRIPTION:
+This script is the improved version of the MUMmer1.0 run-mummer1
+script.  It uses a new clustering algorithm that appropriately
+handles multiple sequence rearrangements and inversions.  Because
+of this, it can handle more divergent sequences better than
+run-mummer1.  In addition, it allows a multi-FastA query file for
+1-vs-many sequence comparisons.  Please refer to
+"docs/run-mummer3.README" for more detailed documentation of this
+script and its output.
+USAGE:
+run-mummer3  <reference>  <query>  <prefix>
+<reference>  specifies the file with the reference sequence in FastA
+format.  No more than one sequence is allowed.
+<query>      specifies the multi-FastA sequence file that contains
+the query sequences.
+<prefix>     specifies the file prefix for the output files.
+OUTPUT:
+out.align       the out.gaps file interspersed with the alignments
+of the gaps.
+out.errorsgaps  the out.gaps file with an extra column stating the
+number of errors contained in each gap.
+out.gaps        an ordered (clustered) list of matches with position
+information, and gap distances between each match.
+out.out         a list of all maximal unique matches between the two
+input sequences ordered by their start position in the
+second sequence.
+NOTES:
+All output coordinates reference their respective strand.  This means
+that for all reverse matches, the coordinates that reference the
+query sequence will be relative to the reverse complement of the
+query sequence.  Please use nucmer or promer if this coordinate
+system is confusing.
+** dnadiff **
+DESCRIPTION:
+This script is a wrapper around nucmer that builds an
+alignment using default parameters, and runs many of nucmer's
+helper scripts to process the output and report alignment
+statistics, SNPs, breakpoints, etc. It is designed for
+evaluating the sequence and structural similarity of two
+highly similar sequence sets. E.g. comparing two different
+assemblies of the same organism, or comparing two strains of
+the same species.  Please refer to "docs/dnadiff.README" for
+more information regarding this script and its output, or type
+'dnadiff -h' for a list of its options.
+USAGE: dnadiff  [options]  <reference>  <query>
+or   dnadiff  [options]  -d <delta file>
+<reference>       Set the input reference multi-FASTA filename
+<query>           Set the input query multi-FASTA filename
+or
+<delta file>      Unfiltered .delta alignment file from nucmer
+OUTPUT:
+.report  - Summary of alignments, differences and SNPs
+.delta   - Standard nucmer alignment output
+.1delta  - 1-to-1 alignment from delta-filter -1
+.mdelta  - M-to-M alignment from delta-filter -m
+.1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
+.mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
+.snps    - SNPs from show-snps -rlTHC .1delta
+.rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
+.qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
+.unref   - Unaligned reference IDs and lengths (if applicable)
+.unqry   - Unaligned query IDs and lengths (if applicable)
+NOTES:
+The report file generated by this script can be useful for
+comparing the differences between two similar genomes or
+assemblies. The other outputs generated by this script are in
+unlabeled tabular format, so please refer to the utility
+specific documentation for interpreting them. A full
+description of the report file is given in "docs/dnadiff.README".
+-- RUNNING THE MUMmer UTILITIES --
+The MUMmer package consists of various utilities that can interact with
+the 'mummer' program.  'mummer' performs all maximal and maximal unique
+matching, and all other utilities were designed to process the input and
+output of this program and its related scripts, in order to extract
+additional information from the output.  Listed below are the descriptions
+and usage definitions for these utilities.
+** annotate **
+DESCRIPTION:
+This program reads the output of the 'gaps' program and adds alignment
+information to it.  Part of the original MUMmer1.0 pipeline and can
+only be used on the output of the 'gaps' program.
+USAGE:
+annotate  <gapsfile>  <seq2>
+<gapsfile>  the output of the 'gaps' program.
+<seq2>      the file containing the second sequence in the comparison.
+OUTPUT:
+stdout           the 'gaps' output interspersed with the alignments of
+the gaps between adjacent MUMs.  An alignment of a
+gap comes after the second MUM defining the gap, and
+alignment errors are marked with a '^' character.
+witherrors.gaps  the 'gaps' output with an appended column that lists
+the number of alignment errors for each gap.
+NOTES:
+This program will eventually be dropped in favor of the combineMUMs
+or nucmer match extenders, but persists for the time being.
+** combineMUMs **
+DESCRIPTION:
+This program reads the output of the 'mgaps' program and adds alignment
+information to it.  Part of the MUMmer3.0 pipeline and can only be
+used on the output of the 'mgaps' program. This -D option alters this
+behavior and only outputs the positions of difference, e.g. SNPs.
+USAGE:
+combineMUMs  [options]  <reference>  <query>  <mgapsfile>
+[options]    type 'combineMUMs -h' for a list of options.
+<reference>  the FastA reference file used in the comparison.
+<query>      the multi-FastA reference file used in the comparison.
+<mgapsfile>  the output of the 'mgaps' program run on the match
+list produced by 'mummer' for the reference and query
+files.
+OUTPUT:
+stdout           the 'mgaps' output interspersed with the alignments
+of the gaps between adjacent MUMs.  An alignment of a
+gap comes after the second MUM defining the gap, and
+alignment errors are marked with a '^' character.  At
+the end of each cluster is a summary line (keyword
+"Region") noting the bounds of the cluster in the
+reference and query sequences, the total number of
+errors for the region, the length of the region and
+the percent error of the region.
+witherrors.gaps  the 'mgaps' output with an appended column that lists
+the number of alignment errors for each gap.
+** delta-filter **
+DESCRIPTION:
+This program filters a delta alignment file produced by either
+nucmer or promer, leaving only the desired alignments which
+are output to stdout in the same delta format as the
+input. Its primary function is the LIS algorithm which
+calculates the longest increasing subset of alignments. This
+allows for the calculation of a global set of alignments
+(i.e. 1-to-1 and mutually consistent order) with the -g option
+or locally consistent with -1 or -m. Reference sequences can
+be mapped to query sequences with -r, or queries to references
+with -q. This allows the user to exclude chance and repeat
+induced alignments, leaving only the "best" alignments between
+the two data sets. Filtering can also be performed on length,
+identity, and uniquenes.
+USAGE:
+delta-filter  [options]  <deltafile>
+[options]    type 'delta-filter -h' for a list of options.
+<deltafile>  the .delta output file from either nucmer or promer.
+OUTPUT:
+stdout  The same delta alignment format as output by nucmer and promer.
+NOTES:
+For most cases the -m option is recommended, however -1 is
+useful for applications that require a 1-to-1 mapping, such as
+SNP finding. Use the -q option for mapping query contigs to
+their best reference location.
+** exact-tandems **
+DESCRIPTION:
+This script finds exact tandem repeats in a specified FastA sequence
+file.  It is a post-processor for 'repeat-match' and provides a simple
+interface and output for tandem repeat detection.
+USAGE:
+exact-tandems  <file>  <min match>
+<file>       the single sequence in FastA format to search for repeats.
+<min match>  the minimum match length for the tandems.
+OUTPUT:
+stdout  4 columns, the start of the tandem repeat, the total extent
+of the repeat region, the length of each repetitive unit, and
+to total copies of the repetitive unit involved.
+** gaps **
+DESCRIPTION:
+This program reads a list of unique matches between two strings and
+outputs the longest consistent set of matches, followed by all the
+other matches.  Part of the MUMmer1.0 pipeline and the output of the
+'mummer' program needs to be processed (to strip all non-match lines)
+before it can be passed to this program.
+USAGE:
+gaps  <seq1>  [-r]  <  <matchlist>
+<seq1>       The first sequence file that the match list represents.
+<matchlist>  A simple list of matches and NO header lines or other
+mumbo jumbo.  The columns of the match list should be
+start in the reference, start in the query, and length
+of the match.
+[-r]         Simply puts the string "reverse" on the header of the
+output so 'annotate' knows to reverse the second
+sequence.
+OUTPUT:
+stdout  an ordered set of the input matches, separated by headers.
+The first set is the longest consistent set of matches and
+the second set is all other matches.
+NOTES:
+This program will eventually be rewritten to be interchangeable with
+'mgaps', so that it may be plugged into the nucmer or promer
+pipelines.
+** mapview **
+DESCRIPTION:
+mapview is a utility program for displaying sequence alignments as
+provided by MUMmer, nucmer or promer. This program takes the output
+from these alignment routines and converts it to a FIG, PDF or PS
+file for visual analysis. It can also break the output into multiple
+files for easier viewing and printing. Please refer to
+"docs/mapview.README" for a more detailed description and explination.
+USAGE:
+mapview  [options]  <coords file>  [UTR coords]  [CDS coords]
+[options]       type 'mapview -h' for a list of options.
+<coords file>   show-coords output file
+[UTR coords]    UTR coordinate file in GFF format
+[CDS coords]    CDS coordinate file in GFF format
+OUTPUT:
+Default output format is an xfig file, however this can be changed to
+a postscript of PDF file with the -f option. See 'mapview -h' for a
+list of available formatting options.
+NOTES:
+The produce the coords file input, 'show-coords' must be run with the
+-r -l options. To reduce redundant matches in promer output, run
+show-coords with the -k option. To generate output formats other than
+xfig, the fig2dev utility must be available from the system path. For
+very large reference genomes, FIG format may be the only option that
+will allow the entire display to be stored in one file, as fig2dev has
+problems if the output is too large.
+** mgaps **
+DESCRIPTION:
+This program reads a list of matches between a single-FastA reference
+and a multi-FastA query file and outputs clusters of matches that lie
+on similar diagonals and within a reasonable distance.  Part of the
+MUMmer3.0 pipeline and the output of 'mummer' need not be processed
+before passing it to this program, so long as 'mummer' was run on a
+1-vs-many or 1-vs-1 dataset.
+USAGE:
+mgaps  [options]  <  <matchlist>
+[options]    type 'mgaps -h' for a list of options.
+<matchlist>  A list of matches separated by their sequence FastA tags.
+The columns of the match list should be start in
+reference, start in query, and length of the match.
+OUTPUT:
+stdout  An ordered set of the input matches, separated by headers.
+Individual clusters are separated by a '#' character and
+sets of clusters from different sequences are separated by
+the FastA header tag for the query sequence.
+NOTES:
+It is often very helpful to adjust the clustering parameters.  Check
+'mgaps -h' for the list of parameters and check the source for a
+better idea of how each parameter affects the result.  Often, it is
+helpful to run this program a number of times with different
+parameters until the desired result is achieved.
+** mummer **
+DESCRIPTION:
+This is the core program of the MUMmer package.  It is the suffix-tree
+based match finding routine, and the main part of every MUMmer script.
+For a detailed manual describing how to use this program, please refer
+to "docs/maxmat3man.pdf" or in LaTeX format "docs/maxmat3man.tex". By
+default, 'mummer' now finds maximal matches regardless of their
+uniqueness. Limiting the output to only unique matches can be specified
+as a command line switch.
+USAGE:
+mummer  [options]  <reference>  <query> ...
+[options]    type 'mummer -help' for a list of options.
+<reference>  specifies the single or multi-FastA sequence file that
+contains the reference sequence(s), to be aligned with
+the queries.
+<query>      specifies the multi-FastA sequence file that contains
+the query sequences, to be aligned with the references.
+Multiple query files are allowed, up to 32.
+OUTPUT:
+stdout  a list of exact matches. Varies depending on input, refer to
+the manual specified in the description above.
+NOTES:
+Many thanks to Stefan Kurtz for the latest mummer version. 'mummer'
+now behaves like the old 'mummer2' program by default. The -mum switch
+forces it to behave like 'mummer1', the -mumreference switch forces it
+to behave like 'mummer2' while the -maxmatch switch forces it to behave
+like the old 'max-match' program.
+** mummerplot **
+DESCRIPTION:
+mummerplot is a perl script that generates gnuplot scripts and data
+collections for plotting with the gnuplot utility.  It can generate
+2-d dotplots and 1-d coverage plots for the output of mummer, nucmer,
+promer or show-tiling. It can also color dotplots with an identity
+color gradient.
+USAGE:
+mummerplot  [options]  <matchfile>
+[options]    type 'mummerplot -h' for a list of options.
+<matchfile>  the output of 'mummer', 'nucmer', 'promer', or
+'show-tiling'. 'mummerplot' will automatically determine
+the format of the data it was given and produce the plot
+accordingly.
+OUTPUT:
+out.gp     The gnuplot script, type 'gnuplot out.gp' to evaluate the
+the gnuplot script.
+out.fplot
+out.rplot
+out.hplot  The forward, reverse and highlighted match information for
+plotting with gnuplot.
+out.ps
+out.png    The plotted image file, postscript or png depending on the
+selected terminal type.
+NOTES:
+For alignments with multiple reference or query sequences, be sure to
+use the -r -q or -R -Q options to avoid overlaying multiple plots in
+the same space. For better looking color gradient plots, try the
+postscript terminal and avoid the png terminal.
+** nucmer2xfig **
+DESCRIPTION:
+Script for plotting nucmer hits against a reference sequence. See top
+of script for more information, or see if 'mummerplot' or 'mapview'
+has the functionality required as they are properly maintained.
+** repeat-match **
+DESCRIPTION:
+Finds exact repeats within a single sequence.
+USAGE:
+repeat-match  [options]  <seq>
+[options]  type 'repeat-match -h' for a list of options.
+<seq>      the single sequence in FastA format to search for repeats.
+OUTPUT:
+stdout  3 columns, the start of the first copy of the repeat, the
+start of the second copy of the repeat, and the length of the
+repeat respectively.
+NOTES:
+REPuter (freely available for universities) may be better suited for
+most repeat matching, but 'repeat-match' is open-source and has some
+functionality that REPuter does not so we include it along with the
+MUMmer package.
+** show-aligns **
+DESCRIPTION:
+This program parses the delta alignment output of nucmer and promer
+and displays all of the pairwise alignments from the two sequences
+specified on the command line.
+USAGE:
+show-aligns  [options]  <deltafile>  <IdR>  <IdQ>
+[options]    type 'show-aligns -h' for a list of options.
+<deltafile>  the .delta output file from either nucmer or promer.
+<IdR>        the FastA header tag of the desired reference sequence.
+<IdQ>        the FastA header tag of the desired query sequence.
+OUTPUT:
+stdout  each alignment header and footer describes the frame of the
+alignment in each sequence, and the start and finish
+(inclusive) of the alignment in each sequence.  At the
+beginning of each line of aligned sequence are two numbers, the
+top is the coordinate of the first reference base on that line
+and the bottom is the coordinate of the first query base on
+that line.  ALL coordinates reference the forward strand of the
+DNA sequence, even if it is a protein alignment.  A gap caused
+by an insertion or deletion is filled with a '.' character.
+Errors in a DNA alignment are marked with a '^' below the
+error.  Errors in an amino acid alignment are marked with a
+whitespace in the middle consensus line, while matches are
+marked with the consensus base and similarities are marked with
+a '+' in the consensus line.
+** show-coords **
+DESCRIPTION:
+This program parses the delta alignment output of nucmer and promer
+and displays the coordinates, and other useful information about the
+alignments.
+USAGE:
+show-coords  [options]  <deltafile>
+[options]    type 'show-coords -h' for a list of options.
+<deltafile>  the .delta output file from either nucmer or promer.
+OUTPUT:
+stdout  run 'show-coords' without the -H option to see the column
+header tags.  Here is a description of each tag.  Note that
+some of the below tags do not apply to nucmer data, and that
+all coordinates are inclusive and relative to the forward DNA
+strand.
+[S1]    Start of the alignment region in the reference sequence.
+[E1]    End of the alignment region in the reference sequence.
+[S2]    Start of the alignment region in the query sequence.
+[E2]    End of the alignment region in the query sequence.
+[LEN 1] Length of the alignment region in the reference sequence,
+measured in nucleotides.
+[LEN 2] Length of the alignment region in the query sequence, measured
+in nucleotides.
+[% IDY] Percent identity of the alignment, calculated as the
+(number of exact matches) / ([LEN 1] + insertions in the query).
+[% SIM] Percent similarity of the alignment, calculated like the above
+value, but counting positive BLOSUM matrix scores instead of exact
+matches.
+[% STP] Percent of stop codons of the alignment, calculated as
+(number of stop codons) / (([LEN 1] + insertions in the query) * 2).
+[LEN R] Length of the reference sequence.
+[LEN Q] Length of the query sequence.
+[COV R] Percent coverage of the alignment on the reference sequence,
+calculated as [LEN 1] / [LEN R].
+[COV Q] Percent coverage of the alignment on the query sequence,
+calculated as [LEN 2] / [LEN Q].
+[FRM]   Reading frame for the reference sequence and the reading frame
+for the query sequence respectively.  This is one of the columns
+absent from the nucmer data, however, match direction can easily be
+determined by the start and end coordinates.
+[TAGS]  The reference FastA ID and the query FastA ID.
+There is also an optional final column (turned on with the -w
+or -o option) that will contain some 'annotations'. The -o option will
+annotate alignments that represent overlaps between two sequences,
+while the -w option is antiquated and should no longer be used.
+Sometimes, nucmer or promer will extend adjacent clusters past one
+another, thus causing a somewhat redundant output, this option will
+notify users of such rare occurrences.
+NOTES:
+The -c and -l options are useful when comparing two sets of assembly
+contigs, in that these options help determine if an alignment spans an
+entire contig, or is just a partial hit to a different read.  The -b
+option is useful when the user wishes to identify sytenic regions
+between two genomes, but is not particularly interested in the actual
+alignment similarity or appearance.  This option also disregards match
+orientation, so should not be used if this information is needed.
+** show-diff **
+DESCRIPTION:
+This program classifies alignment breakpoints for the
+quantification of macroscopic differences between two
+genomes. It takes a standard, unfiltered delta file as input,
+determines the best mapping between the two sequence sets, and
+reports on the breaks in that mapping.
+USAGE:
+show-diff  [options]  <deltafile>
+[options]    type 'show-diff -h' for a list of options.
+<deltafile>  the .delta output file from nucmer
+OUTPUT:
+stdout  Classified breakpoints are output one per line with
+the following types and column definitions. The first
+five columns of every row are seq ID, feature type,
+feature start, feature end, and feature length.
+Feature Columns
+IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff
+IDR DUP dup-start dup-end dup-length
+IDR BRK gap-start gap-end gap-length
+IDR JMP gap-start gap-end gap-length
+IDR INV gap-start gap-end gap-length
+IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence
+Feature Types
+[GAP] A gap between two mutually consistent ordered and
+oriented alignments. gap-length-R is the length of the
+alignment gap in the reference, gap-length-Q is the length of
+the alignment gap in the query, and gap-diff is the difference
+between the two gap lengths. If gap-diff is positive, sequence
+has been inserted in the reference. If gap-diff is negative,
+sequence has been deleted from the reference. If both
+gap-length-R and gap-length-Q are negative, the indel is
+tandem duplication copy difference.
+[DUP] A duplicated sequence in the reference that occurs more
+times in the reference than in the query. The coordinate
+columns specify the bounds and length of the
+duplication. These features are often bookended by BRK
+features if there is unique sequence bounding the duplication.
+[BRK] An insertion in the reference of unknown origin, that
+indicates no query sequence aligns to the sequence bounded by
+gap-start and gap-end. Often found around DUP elements or at
+the beginning or end of sequences.
+[JMP] A relocation event, where the consistent ordering of
+alignments is disrupted. The coordinate columns specify the
+breakpoints of the relocation in the reference, and the
+gap-length between them. A negative gap-length indicates the
+relocation occurred around a repetitive sequence, and a
+positive length indicates unique sequence between the
+alignments.
+[INV] The same as a relocation event, however both the
+ordering and orientation of the alignments is disrupted. Note
+that for JMP and INV, generally two features will be output,
+one for the beginning of the inverted region, and another for
+the end of the inverted region.
+[SEQ] A translocation event that requires jumping to a new
+query sequence in order to continue aligning to the
+reference. If each input sequence is a chromosome, these
+features correspond to inter-chromosomal translocations.
+NOTES:
+The estimated number of features, take inversions for example,
+represents the number of breakpoints classified as bordering
+an inversion. Therefore, since there will be a breakpoint at
+both the beginning and the end of an inversion, the feature
+counts are roughly double the number of inversion events. In
+addition, all counts are estimates and do not represent the
+exact number of each evolutionary event.
+Summing the fifth column (ignoring negative values) yeilds an
+estimate of the total inserted sequence in the
+reference. Summing the fifth column after removing DUP
+features yields an estimate of the total amount of unique
+(unaligned) sequence in the reference. Note that unaligned
+sequences are not counted, and could represent additional
+"unique" sequences. Use the 'dnadiff' script if you must
+recover this information. Finally, the -q option switches
+references for queries, and uses the query coordinates for the
+analysis.
+** show-snps **
+DESCRIPTION:
+This program reports polymorphism contained in a delta encoded
+alignment file output by either nucmer or promer. It catalogs
+all of the single nucleotide polymorphisms (SNPs) and
+insertions/deletions within the delta file
+alignments. Polymorphisms are reported one per line, in a
+delimited fashion similar to show-coords. Pairing this program
+with the appropriate MUMmer tools can create an easy to use
+SNP pipeline for the rapid identification of putative SNPs
+between any two sequence sets.
+USAGE:
+show-snps  [options]  <deltafile>
+[options]    type 'show-snps -h' for a list of options.
+<deltafile>  the .delta output file from either nucmer or promer.
+OUTPUT:
+stdout  Standard output has column headers with the following
+meanings. Not all columns will be output by default,
+see 'show-snps -h' for switch to control the output.
+[P1]    SNP position in the reference.
+[SUB]   Character in the reference.
+[SUB]   Character in the query.
+[P2]    SNP position in the query.
+[BUFF]  Distance from this SNP to the nearest mismatch (end of
+alignment, indel, SNP, etc) in the same alignment.
+[DIST]  Distance from this SNP to the nearest sequence end.
+[R]     Number of repeat alignments which cover this reference
+position, >0 means repetitive sequence.
+[Q]     Number of repeat alignments which cover this query
+position, >0 means repetitive sequence.
+[LEN R] Length of the reference sequence.
+[LEN Q] Length of the query sequence.
+[CTX R] Surrounding context sequence in the reference.
+[CTX Q] Surrounding context sequence in the query.
+[FRM]   Reading frame for the reference sequence and the
+reading frame for the query sequence respectively. Simply
+'forward' 1, or 'reverse' -1 for nucmer data.
+[TAGS]  The reference FastA ID and the query FastA ID.
+NOTES:
+It is often helpful to run this with the -C option to assure
+reported SNPs are only reported from uniquely aligned regions.
+** show-tiling **
+DESCRIPTION:
+This program attempts to construct a tiling path out of the query
+contigs as mapped to the reference sequences.  Given the delta
+alignment information of a few long reference sequences and many small
+query contigs, 'show-tiling' will determine the best location on a
+reference for each contig.  Note that each contig may only be tiled
+once, so repetitive regions may cause this program some difficulty.
+This program is useful for aiding in the scaffolding and closure of an
+unfinished set of contigs, if a suitable, high similarity, reference
+genome is available.  Or, if using promer, 'show-tiling' will help
+in the identification of syntenic regions and their contig's mapping
+the the references.
+USAGE:
+show-tiling  [options]  <deltafile>
+[options]    type 'show-tiling -h' for a list of options.
+<deltafile>  the .delta output file from either nucmer or promer.
+OUTPUT:
+stdout  Standard output has 8 columns: start in reference, end in
+reference, gap between this contig and the next, length of this
+contig, alignment coverage of this contig, average percent
+identity of the alignments for this contig, orientation of this
+contig, contig ID. All matches to a reference are headed by the
+FASTA tag of that reference.  Output with the -a option is the
+same as 'show-coords -cl' when run on nucmer data.
+NOTES:
+When run with the -x option, 'show-tiling' will produce an XML output
+format that can be accepted by TIGR's open source scaffolding software
+'Bambus' as contig linking information.
+-- CONTACT INFORMATION --
+Please address questions and bug reports to: <mummer-help@lists.sourceforge.net>
+Last Revised May 12, 2005

Mercurial > repos > rliterman > csp2

comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/README @ 69:33d812a61356